# Almost Always Rebase: Git The Right Way

## Preamble

Git is an extremely complicated tool. But learning various concepts can make it dramatically easier. I have been asked by a friend to provide an explanation for how I believe one should use git, and why. This is that explanation. I will assume you have basic git proficiency, ie that you can `git add`, `git commit`, `git push`, and maybe even `git pull`.

This document addresses certain preliminaries (which may perhaps be skipped), then my general strategy that I call "Almost Always Rebase", then the specific question of how to enact AAR in a repo with both a dev(elopment) and prod(uction) realization.

## Preliminaries: What is a commit? What is a delta? What is a conflict, and what does it mean to resolve them?

“A delta isn't just a feature of a river in Africa.” -- Wyatt S Carpenter

What is a commit in git? Well, a commit is a snapshot of the entire codebase at a certain point in time. That's why you can "checkout" a commit in git and browse the state of the repo at a certain point in history. Git is a little bit more complicated than keeping all of your old code versions in a hidden directory, but conceptually that's what it does.

HOWEVER, because the people who made git got confused, or something, many actions that git says work on "commits" are actually performed on "deltas" (aka differences/patches/diffs). A "delta" is the difference between a commit and the previous commit. So, when you are using a git tool and you "apply" a commit from somewhere, what you're actually doing is looking at the commit, doing a diff between that commit and the commit right before it\*, which produces a delta, and then applying that delta. This is, ultimately, why you can have conflicts in git at all (but also why git is useful at all).

(Note that git commit messages are usually written as a summary of the delta as well. If I've just added a feature X to my program, my commit message is almost certainly going to be "add feature X".
Even though, conceptually, the commit is just the entire state of the program, so the commit message should perhaps be a summary of the entire capacities of the program. But this would make commit messages useless, so it is better that we write a summary of the delta in the commit message.)

\* Or, in the case of commits with multiple parents, ie merge commits, **one of** the commits right before it. This explains why you can apply a merge commit to another commit, but also why you need to specify which parent to use. Don't worry about this part if you don't understand it.

So, for instance, if you have a git history that goes:

    A - B - C (HEAD)
         \
          D (tip of another branch)

And you decide to apply the commit D to the head C, using `cherry-pick`, it will *not* just replace the state of your codebase with the state of the codebase in commit D. Instead what it will do is compute the diff between B and D (which we can call BD by analogy to geometric notation for line segments between two points) and then apply that diff to C, resulting in a new commit with a new state of the codebase (which we can call D′, to denote it's kind of like D but special).

When you were making changes to your code to make C, you might have changed the code such that now it doesn't make sense to change the code in C in the same way you changed B to get to D. For instance, maybe you changed a function name from foo to baz when you were making C, but when you were making D, you changed the function name from foo to bar. Then you will get a "conflict" when you try to apply D to C, as git won't know what to change.

In case you've never seen what this looks like, I've included an example repo here you can inspect with `git branch`, `git log`, etc. Since git absolutely refuses to version control other git repos recursively(??), and my blog is currently version controlled by git, I've had to zip it up for you.
So, you can download it and unzip it and play around with it from where it lives at assets/almost_always_rebase/example_git_repo.zip.

The repo has a conflict in it. It is in the middle of the conflict, in the conflict state, as you can see by running `git status`. The steps to produce this conflict went like so: First, I wrote some text into a file. Then, I changed it in incompatible ways on both master and a branch. Once the conflicting branches, master and example-d-branch, were set up, I was on branch master (meaning, HEAD was there) and ran the cherry-pick command to apply the tip of the example-d-branch (the D commit) to the tip of master (the C commit):

    git cherry-pick example-d-branch

Then, I was greeted by this friendly error message:

    Auto-merging script.txt
    CONFLICT (content): Merge conflict in script.txt
    error: could not apply 5294e6e... change foo to bar
    hint: After resolving the conflicts, mark them with
    hint: "git add/rm <pathspec>", then run
    hint: "git cherry-pick --continue".
    hint: You can instead skip this commit with "git cherry-pick --skip".
    hint: To abort and get back to the state before "git cherry-pick",
    hint: run "git cherry-pick --abort".

The hints here are actually pretty good; if you kind of know what you're doing, you can just follow them. script.txt now looks like this:

    <<<<<<< HEAD
    baz:
    =======
    bar:
    >>>>>>> 5294e6e (change foo to bar)
    1
    2
    3

Your task, as a conflict-resolver, is to replace everything within the `<<<<<<<` and `>>>>>>>` lines (inclusive) with whatever you want, and then this new version of the file will become the file in the new commit D′. Usually, you use your incredible human brain to figure out a version of the file that satisfies the general intention of both commits. (You generally try not to introduce **new** changes at this step (unless it makes a **lot** of sense) because that would quickly make the history more confusing.
A merge (not what we're doing here, btw) which introduces some completely new changes, for example, is colloquially called an "evil merge".) But here we will just replace baz with bar, because it makes most sense that way in this toy example (the point of the delta BD is to rename the function to bar, so BD applied to C should also rename the function to bar). So we do that, and get the result:

    bar:
    1
    2
    3

We can then continue the cherry-pick by running:

    git add script.txt
    git cherry-pick --continue

In this case, since we've completely resolved everything, the cherry-pick now concludes, and the new commit is formed. Since we didn't provide a commit message, our text editor will open to prompt us for one. The default is the commit message of D, but since this is **a new commit, D′, not the same commit as D** you can actually type any message you want. Still, the commit message of D is usually the most useful. In our particular case, you might want the commit message of D′ to read "change baz to bar" instead of the commit message of D, "change foo to bar", as we don't have a foo anymore. Well, in either case, we're done. We have fixed the conflict. Once you save the message and the editor exits (git waits for the editor process to finish), the new commit is finalized.

As far as I can tell, conflicts like these **are always called "merge conflicts" by git, even when they come from a cherry-pick or a rebase rather than from a merge commit**. So, that's annoying.

By the way, if you're in the middle of some operation, `git status` will always keep you apprised of the situation. For instance, in the middle of the cherry-pick, running `git status` tells us:

    On branch master
    You are currently cherry-picking commit 5294e6e.
      (fix conflicts and run "git cherry-pick --continue")
      (use "git cherry-pick --skip" to skip this patch)
      (use "git cherry-pick --abort" to cancel the cherry-pick operation)

    Unmerged paths:
      (use "git add <file>..."
      to mark resolution)
            both modified:   script.txt

    no changes added to commit (use "git add" and/or "git commit -a")

## Preliminaries, partie deux (deuxième partie): What is a merge commit? What is a rebase? What is fast-forwarding, anyway? While we're at it, what is squashing?

Due to time constraints, I have elected not to write this section. Instead, please enjoy this meme I have made:

assets/almost_always_rebase/two_mommies.png

You can just google these concepts; it's fine. I'm sure they're competently explained somewhere. They're pretty simple, especially now that you know the above.

## Almost Always Rebase

I hate code and I want as little of it as possible in our product. -- Jack Diederich, "Stop Writing Classes", 2012, https://www.youtube.com/watch?v=o9pEzgHorH0

Reflect on the Jack Diederich quote above. It's one of the most powerful aphorisms in software engineering. It also applies to source code revision.

Here is an argument: "Code" is details. The fewer details you need to do what you want, the easier it is to deal with. Therefore, you should try to minimize the amount of code in your product, to make it as easy to deal with as possible.

Keep in mind that "everything should be made as simple as possible, but no simpler" (nb: this quote is apparently a famous paraphrase of Einstein, I guess). You do eventually need code. You may need a lot of code to get exactly the right behavior you want. Also performance counts as a feature. Also, not all "features" count as features, if they are bad (I will not elaborate on this here). But, within those constraints, the logic still stands.

Now: THE SAME LOGIC APPLIES TO GIT COMMITS.

That's the point of this blog post. Now you can enjoy the rest of this section and then the appendices if you wish.

By the way, the developers of git also realized this thing about minimizing the number of redundant commits; that's why `git rebase` destroys your merge commits by default.
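To see that default in action, here is a minimal sketch, assuming a throwaway repo in a temp directory. The branch name `feature`, the file names, and the commit messages are all made up for illustration. It counts the merge commits on the branch before and after a plain `git rebase`:

```shell
#!/bin/sh
set -e

# Throwaway identity so the sketch does not depend on your own git setup.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Skip your global config (honored by newer git versions, ignored by older ones).
export GIT_CONFIG_GLOBAL=/dev/null

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master regardless of defaults

echo a > master.txt
git add master.txt
git commit -q -m "A"

# A feature branch with one commit of its own (hypothetical names throughout).
git checkout -q -b feature
echo f1 > feature.txt
git add feature.txt
git commit -q -m "F1"

# Meanwhile, master moves on.
git checkout -q master
echo m1 >> master.txt
git commit -q -am "M1"

# Merge master into feature: this creates a real merge commit on feature.
git checkout -q feature
git merge -q --no-edit master
echo f2 >> feature.txt
git commit -q -am "F2"

# Master moves on again, so the rebase below has real work to do.
git checkout -q master
echo m2 >> master.txt
git commit -q -am "M2"

git checkout -q feature
merges_before=$(git rev-list --count --merges master..feature)
git rebase -q master                      # plain rebase, no --rebase-merges
merges_after=$(git rev-list --count --merges master..feature)

echo "merge commits on feature before rebase: $merges_before"
echo "merge commits on feature after rebase:  $merges_after"
```

The first count should be 1 and the second 0: the rebase replays F1 and F2 on top of master and quietly leaves the merge commit out.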
(That behavior is somewhat unfortunate, given the paucity of good ways to truly manipulate the DAG in git, but it at least showed they were thinking about it.)

There are basically only three reasons to ever keep a commit around:

1. The commit history including that commit is public and widespread and you don't want to bother everyone by breaking history and forcing them all to perform complicated maneuvers to adjust to the new history for very little reason. (Thus, the common rule of thumb in git: never change the history of the master branch once it has been published to github, only ever do git reverts to it.)
2. You think you might need that commit later.
3. The commit highlights a logical change in the code, so keeping it as a distinct commit makes reviewing the commits easier. (The optimal size of reviewable commits is somewhat a matter of taste.)

kthxbai

## Appendix A: "Introduction to Git"

Now that you've read this document, treat yourself to the amusing “Handmade Hero Day 523 - Introduction to Git” https://www.youtube.com/watch?v=3mOVK0oSH2M (first half of the livestream)

## Appendix B: See A Fork In A Road? Take It.

Based note for git rebases: if you're doing a `rebase --rebase-merges` (formerly known as `--preserve-merges`, a much more self-explanatory name), git will, by default, piss its pants and cry and demand you re-solve the merge every time. And I do mean **every time**. Here are some based hints for handsome geniuses you can use in the situation:

• The crazy thing about mastering git is that you go from thinking "wtf do all these commands do" to "wtf why don't these commands do the right thing?" Anyway, in this case it's because rebase was initially built to flatten your history when rebasing your feature branch on to master, which is great if you can, but as the primary verb to manipulate the DAG of history in git it's sorely lacking. You need to invent your own tools, in the form of practices, around the broken tools you do have.

• The sane choice here is usually to forget doing a `rebase --rebase-merges`, and figure out what you can do instead to mostly satisfy whatever goal you started with. Often this involves one of the easy recipes in the famous git-filter-repo tool https://github.com/newren/git-filter-repo.

• Use jj instead of git. After mastering git, I've concluded that it's actually very poor at manipulating git histories. (Weird!) So, even though I've never used jj, it's probably better about this.

• You can **as soon as possible, ideally before you made any of those merge commits in the first place** turn on git rerere (`git config --global rerere.enabled true`). This stands for “reuse recorded resolution”, and it means if you ask git to do a merge you've already done then it will just remember how you did it and do it. Naturally, something this helpful and crucial is turned off by default in git.

  ◦ rerere is turned off by default because it uses some kind of rule to determine if it should run, which is occasionally wrong. So, if it's turned on, it will eventually ruin other merges sometimes. Great... See also: https://stackoverflow.com/questions/5519244/are-there-any-downsides-to-enabling-git-rerere/77453543#77453543 & https://github.com/wyattscarpenter/funny-little-rerere-example-repo/

• This situation is often actually very easy, don't fret! The git rebase message will say something like:

    CONFLICT (content): Merge conflict in foo.py
    error: could not apply 0eef091... mycommitmessage
    hint: Resolve all conflicts manually, mark them as resolved with
    hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
    hint: You can instead skip this commit: run "git rebase --skip".
    hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
    hint: Disable this message with "git config advice.mergeConflict false"
    Recorded preimage for 'foo.py'
    Could not apply 0eef091...
    mycommitmessage

The funny thing about this is, even with the bonus helpful message, it doesn't tell you the right and obvious solution: simply `git checkout 0eef091 -- foo.py`. This gets the version of the file from the merge that you did the first time in the original timeline. Provided that you haven't actually changed the file in question in your previous changes that you rebased, and the "merge conflict" is just spurious, this will deal with it easily. And if rerere is on, you'll never have to deal with an identical situation to that again! (If you accidentally take this advice when you shouldn't and ruin your rerere, you'll then have to get git to forget the right rerere.)

• For certain situations, you can use `git replace --graft` instead of `git rebase --rebase-merges`. It's extremely complicated and error-prone, full of special cases where it won't work right. That's how you can tell it's the intended solution to do it in git, babyyyyy! For more info, read this thread: https://stackoverflow.com/questions/3810348/setting-git-parent-pointer-to-a-different-parent, or just consult the man page for git replace.

• There is also this guy's script https://github.com/MarkLodato/git-reparent, which includes instructions about how to rebase into a reparent, which is a little finicky but probably does what you want in the trivial case.

• I may one day make a tool in https://github.com/wyattscarpenter/gyatt to **actually** reparent, because this is quite feasible even though no extant tool seems to want to do it, but I wouldn't hold your breath for that. It's not high on my priority list at all.

## Appendix C: Random tool suggestions

difftastic and mergiraf seem like they're cool. Also, jj, as I already mentioned, seems like it couldn't possibly be worse than git.

## Appendix D: Prod and dev

I've gotten all the way to this appendix before realizing that I've still got to pay off the prod & dev thing from the introduction.
That's actually why this essay was written, ultimately, although it no longer matters. Anyway, the general idea is that we'd have a master branch in dev (which was, for arcane reasons\*, the staging environment, not the dev environment, as those terms are usually used) and a master branch in prod. Whatever history we had in dev, once it was settled and tested, we would eventually push to prod, creating a nice history with only a time lag, ultimately, between dev and prod. Let me stress that it was **one** history, conceptually, just some of the commits hadn't "made it to" the prod environment yet.

\* the system wouldn't let us make an environment with "staging" in the name lol

## Appendix E: Explanation of a joke

"A delta isn't just a feature of a river in Africa" is a play on the old joke "denial (the Nile) isn't just a river in Africa", which I guess you're supposed to say to or about someone who is in denial.
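
## Appendix F: Rebuilding the example conflict by hand

If you'd rather not download a zip, the cherry-pick conflict from the preliminaries can be rebuilt from scratch. This is a rough sketch, not the exact steps behind the zipped repo: the contents of commits A and B, the commit messages for A, B, and C, and the throwaway identity are my assumptions here, and your commit hashes will differ from 5294e6e. Only script.txt, master, example-d-branch, and the "change foo to bar" message come from the walkthrough.

```shell
#!/bin/sh
set -e

# Throwaway identity so the sketch does not depend on your own git setup.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Skip your global config (honored by newer git versions, ignored by older ones).
export GIT_CONFIG_GLOBAL=/dev/null

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master regardless of defaults

# A and B: the shared history (contents and messages are guesses).
printf '1\n2\n3\n' > script.txt
git add script.txt
git commit -q -m "add some numbers"
printf 'foo:\n1\n2\n3\n' > script.txt
git commit -q -am "add foo"

# D, on its own branch: rename foo to bar.
git checkout -q -b example-d-branch
printf 'bar:\n1\n2\n3\n' > script.txt
git commit -q -am "change foo to bar"

# C, on master: rename foo to baz instead.
git checkout -q master
printf 'baz:\n1\n2\n3\n' > script.txt
git commit -q -am "change foo to baz"

# Apply the delta BD to C. A conflict is the expected outcome, so the
# command sits inside an if to keep set -e from aborting the script.
if git cherry-pick example-d-branch >/dev/null 2>&1; then
  echo "unexpected: cherry-pick applied cleanly"
  exit 1
fi
conflict_marker=$(head -n 1 script.txt)   # first line of the conflicted file

# Resolve in favor of bar, as in the walkthrough, then finish the cherry-pick.
printf 'bar:\n1\n2\n3\n' > script.txt
git add script.txt
GIT_EDITOR=true git cherry-pick --continue >/dev/null   # accept D's message unchanged

resolved_subject=$(git log -1 --format=%s)
cat script.txt
```

`GIT_EDITOR=true` is just a way to accept the default message without an editor popping up; drop it if you want to type "change baz to bar" for D′ yourself. To poke at the conflict state instead of resolving it, stop after the cherry-pick and run `git status` in the temp directory.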