Tue, Apr 30, 2013
I came across this plea on Hacker News to always use "git pull --rebase" instead of the default merge behaviour.
The logic was that if somebody has pushed to the remote branch since you last did a pull, and you've also made local commits to it, you'll get a merge commit. And if this happens frequently, you get a history that's mostly merges, which is ugly and cluttered.
I agree that having lots of merge commits is a Bad Thing. But I also contend, and will in this post try to persuade you, that not only is rebase the wrong solution to that problem, it is so often the wrong solution that (a) rebasing prevents people from learning how to use git correctly, and (b) rebasing should be regarded as a code smell - if you use it often, it's probably a symptom that you're doing something wrong.
Those are pretty sweeping claims considering that some people think that the existence of rebase is grounds for the removal of git's merging capabilities. So I've got to be pretty convincing with my arguments, I guess.
So, firstly, let's just remind ourselves: What is rebasing?
Well, if your current HEAD is commit 'a', and you make some changes and commit them, you have based your changes off commit 'a' to get commit 'b'. If somebody, in the meantime, made their own commit, commit 'c', also based off commit 'a', then you have a problem: you both want the head of the current branch to contain your changes, but you appear to have forked the branch.
The standard way of fixing this is to create a new commit that has both 'b' and 'c' as parents, merging them so the new HEAD has both your changes:
      d
     / \
    b   c
     \ /
      a
The alternative is to rebase: To apply the changes you made in commit 'b' to the changes THEY made in commit 'c'. This results in a completely new commit, commit 'b1', and a smaller & simpler commit history:
    b1
    |
    c
    |
    a
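To make the two diagrams concrete, here is a minimal sketch that builds both histories in a throwaway repository. The branch names ("mine", "theirs", "merged", "rebased") and the file names are invented for the demo; only the git commands themselves are real.

```shell
#!/bin/sh
# Sketch only: repo location, branch names and file names are invented for the demo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name, whatever the local default is
git config user.name "Demo"
git config user.email "demo@example.com"

# Commit 'a': the shared starting point.
echo a > a.txt && git add a.txt && git commit -q -m a

# You commit 'b' on top of 'a' ...
git checkout -q -b mine master
echo b > b.txt && git add b.txt && git commit -q -m b

# ... while somebody else commits 'c', also on top of 'a'.
git checkout -q -b theirs master
echo c > c.txt && git add c.txt && git commit -q -m c

# Option 1, merge: a new commit 'd' with both 'b' and 'c' as parents.
git checkout -q -b merged mine
git merge -q -m d theirs

# Option 2, rebase: 'b' is re-applied on top of 'c' as a brand new commit.
git checkout -q -b rebased mine
git rebase -q theirs

# Count the parents of 'd' (the first word of the output is the commit itself).
merge_parents=$(( $(git rev-list --parents -n 1 merged | wc -w) - 1 ))
# Subjects on the rebased branch, newest first.
rebased_log=$(git log --format=%s rebased | paste -sd' ' -)

echo "parents of d: $merge_parents"
echo "rebased history: $rebased_log"
git log --graph --oneline --all
```

The rebased copy of 'b' keeps its commit message but gets a new hash, because its parent is now 'c' rather than 'a'. That new hash is the 'b1' in the diagram.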
So that seems superior, right? Clean, linear, compact history; every commit is a meaningful change. Much nicer than the fork-away, merge-back alternative with its commits that do nothing other than bring branches back together again.
Well, it might *look* nicer, but it comes with problems, too. The biggest one is that a rebased commit keeps its original author timestamp. So if you created commit 'b' before they committed 'c', but you didn't push it until later, rebasing will give you a linear history that's no longer in chronological order.
In our simple one-off examples above, this is no big deal. But if you've got an entire history that's been heavily rebased and your problem is "This bug started happening on Tuesday last week", you have a big problem: You can't just track back through your simple, linear branch until you get to last Tuesday's commits. You have to keep going back until you're absolutely certain there are no other commits further back in the history that are newer chronologically.
If you aren't aware of this and you start running your "git bisect" using your "good" base as the last commit made on Monday last week, you'll go through a long process that will be a total waste of time because the real culprit was actually in a commit made on Tuesday that happened (according to git) before Monday!
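The out-of-order effect is easy to reproduce. The sketch below is a throwaway demo with invented dates and file names: a commit written on Monday gets rebased on top of one written on Tuesday. It relies on the fact that rebase keeps each commit's author date, which is the date "git log" shows by default. Git also records a separate committer date, and the rebase does update that one.

```shell
#!/bin/sh
# Sketch only: the dates, branch name and file names are invented for the demo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name, whatever the local default is
git config user.name "Demo"
git config user.email "demo@example.com"

# commit_at <date> <name>: create a file and commit it with a fixed timestamp.
commit_at() {
    echo "$2" > "$2.txt"
    git add "$2.txt"
    GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" git commit -q -m "$2"
}

commit_at "2013-04-19 09:00:00 +0000" a     # Friday: shared starting point

git checkout -q -b mine
commit_at "2013-04-22 10:00:00 +0000" b     # Monday: your local, unpushed commit

git checkout -q master
commit_at "2013-04-23 10:00:00 +0000" c     # Tuesday: somebody else's commit

# Replay Monday's commit on top of Tuesday's.
git checkout -q mine
git rebase -q master

# Author dates in history order, newest commit in the graph first.
dates=$(git log --format=%ad --date=short | paste -sd' ' -)
git log --format='%ad  %s' --date=short
```

Read top to bottom, the log goes Monday, Tuesday, Friday: the Tuesday commit is now an ancestor of the Monday one. Swapping %ad for %cd in the format string shows the committer dates instead.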
The "simple linear history" is a lie. The branched history might not be as pretty, but it's an accurate representation of what happened. If you see two branches in your log, you know you need to track both of them for the offending commit.
The whole reason to use a VCS is to have it record your history. Rebasing destroys your history, and therefore destroys the point of using a VCS.
"But without rebasing you get so many merge commits," I hear you cry.
I disagree. You only get the plethora of merges if you're using git wrong.
If you've never seen Linus Torvalds' talk on git, I recommend watching it. Because at one point, you'll hear him explain why he thinks Subversion is stupid - because they made it really easy to branch, but it's really hard to merge your branches back afterwards. (Having been through the process using svn, btw, I completely agree with him - it sucks.)
Git was designed from the ground up to make it easy to branch, and easy to merge. People who have the misfortune of coming to git from other VCSs tend to carry in a philosophy of "Branches are hard" and thus they try to avoid them. They'll do all their work on master, because that's what they're used to.
But this is git. Branches are easy. You should use them. You should use them more than you do. And I'm confident in saying that, because I've never yet met anyone (myself included) who branches as often as they should.
It's so tempting to stay on master, to think "It's just a quick fix, it's not worth branching for!"
Or to think "I created a new branch for my project, we will now all work on the project branch" when you should instead say "I created a new branch for my project, we will now all work on feature branches forked off from the project branch."
This is where rebasing starts to hurt your git usage: Because you can rebase to avoid a merge-filled history, you will do so. And so you won't learn that what you should have done instead is to be on a branch. Here's a golden rule for using git that far too few people follow: You should never be working on a branch that other people will be pushing commits to. Fork it.
That will seem like overkill to some people: A branch off master for the project, and then a branch off the project for each feature?? You might wind up with a dozen branches!
Yes. You might. So what?? Back to my mantra: This is git. Branches are cheap. Merging is easy. You are not using enough branches.
If all your work is on a feature branch, it doesn't matter that other people are updating master: Their commits do not affect your branch. When you want to publish your work, checkout master, pull it, then merge in your branch. You get one merge commit, and you get a history in your VCS that is a true match for what actually happened. If instead you had rebased, you'd have a "Hitler Diaries" type of history - one that might seem to match real history on a casual glance, but turns out to be a pack of lies when you look closer.
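As a rough illustration of the checkout, pull, merge sequence just described, here is a sketch using a local bare repository as a stand-in for the shared remote. The directory layout, the branch name "my-feature" and the file names are all made up for the demo.

```shell
#!/bin/sh
# Sketch only: directory layout, branch name "my-feature" and file names are invented.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work=$(mktemp -d)

# Build a shared "remote" with one commit on master, plus two clones of it.
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master    # pin the branch name, whatever the local default is
echo base > README && git add README && git commit -q -m "initial"
git clone -q --bare "$work/seed" "$work/origin.git"
git clone -q "$work/origin.git" "$work/me"
git clone -q "$work/origin.git" "$work/them"

# You work on a feature branch, not on master.
cd "$work/me"
git checkout -q -b my-feature
echo mine > mine.txt && git add mine.txt && git commit -q -m "my feature work"

# Meanwhile, somebody else pushes to master. Your branch is unaffected.
cd "$work/them"
echo theirs > theirs.txt && git add theirs.txt && git commit -q -m "their work"
git push -q origin master

# Publish: checkout master, pull it, then merge in your branch.
cd "$work/me"
git checkout -q master
git pull -q --ff-only origin master    # no local commits on master, so this only fast-forwards
git merge -q -m "Merge branch 'my-feature'" my-feature
git push -q origin master

merges=$(git rev-list --count --merges master)
echo "merge commits on master: $merges"
git log --graph --oneline master
```

The --ff-only flag on the pull is my addition, not part of the advice above. It makes git refuse to create a merge during the pull, so the only merge commit in the history is the deliberate one that brings in the feature branch.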
So what about changes that genuinely aren't worth branching for? Correcting a typo you just noticed in a comment, for instance?
Sure. Don't branch for that - I wouldn't. But do:
You have two options here: A soft reset back to origin's HEAD, and then re-commit your work. Or go right ahead and do an interactive rebase to fix up your history. It's only a tiny change. Nobody is ever likely to care. The occasional white lie is fine ("That new haircut really suits you!") - outright, ongoing deception ("I love you and want to marry you, it's nothing to do with your money") is not.
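For the soft-reset option, here is one possible reading as a sketch. It assumes "origin's HEAD" means origin/master as of your last fetch, and it adds a stash around the pull so the uncommitted fix is out of the way while master moves. The stash step, the directory layout and the file names are my assumptions, not something spelled out above.

```shell
#!/bin/sh
# Sketch only: one possible reading of the soft-reset option.
# Directory layout, file names and the stash step are assumptions made for the demo.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work=$(mktemp -d)

# A shared "remote" with one commit on master, plus two clones of it.
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master    # pin the branch name, whatever the local default is
echo "teh quick brown fox" > notes.txt && git add notes.txt && git commit -q -m "initial"
git clone -q --bare "$work/seed" "$work/origin.git"
git clone -q "$work/origin.git" "$work/me"
git clone -q "$work/origin.git" "$work/them"

# Somebody else pushes to master.
cd "$work/them"
echo theirs > theirs.txt && git add theirs.txt && git commit -q -m "their work"
git push -q origin master

# You fix a typo directly on master, not knowing about their push.
cd "$work/me"
echo "the quick brown fox" > notes.txt
git commit -q -am "fix typo"

# Un-commit the fix but keep the change, catch up with origin, then re-commit.
git reset -q --soft origin/master      # origin/master as of your last fetch
git stash -q                           # park the change while master moves
git pull -q --ff-only origin master
git stash pop -q
git commit -q -am "fix typo"

subjects=$(git log --format=%s | paste -sd'|' -)
merges=$(git rev-list --count --merges HEAD)
echo "$subjects (merge commits: $merges)"
```

The result is the same linear history a rebase would have produced, which is the point: for a one-line typo fix, either route is a harmless white lie.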
What about fixing up history for reviewers? Say, if you make a commit that introduces a bug, but don't notice it until a few commits later. The reviewer going through commit-by-commit is likely to spot it and flag it before they see that you fixed it later. Rebasing solves that.
True enough. But - like the typo fix above - this should be a very rare occasion. If you're consistently writing buggy code and rebasing to fix it, then you're coding badly. Don't fix the symptom by rebasing endlessly, figure out your problem. And you do have a problem, because not only are you writing crap code, but you're committing it as well!
Look closer at your diffs. Write more unit tests. Run them more often. Whatever, figure out what you need to do to avoid routinely making bad commits.
Because if you keep your work on the main branch and you frequently commit bad code, then the day will come when you hit the absolute no-no of rebasing: You'll push a bad commit to a remote, and then you'll be stuck because you absolutely must not rebase published history.
Rebase is like a painkiller - it's perfectly ok to use it from time to time. But if you're using it daily, then you have an underlying problem that you need to solve. Don't keep hiding the symptom, diagnose and fix the real issue.
You get merges every time you pull? Get onto a branch!
You keep committing bugs? Test and review your code before you commit it!
You need to condense a dozen "work in progress" commits into a few "worthwhile" commits? Soft reset when you come back to a branch you had a WIP commit on; don't keep unwanted bad commits in your private branches.
You need to split a single huge commit into more atomic commits? Commit more often and look up "git add -p".
You really, really need to collaborate in real-time with another dev and so must share all your code? At this point you're pair-programming - maybe look up GNU screen & its 'acladd' command to allow you to share a terminal with your collaborator. Or just tell each other when you're about to commit.
I use git every single day of my working life, on repos I share with colleagues that we all push work to. I cannot remember the last time I used rebase - I certainly haven't used it this month, I possibly haven't used it this year.
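The soft-reset advice for WIP commits can be sketched like this, on a private branch that has never been pushed. The branch name, file name and commit messages are invented for the demo.

```shell
#!/bin/sh
# Sketch only: branch name, file name and commit messages are invented for the demo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name, whatever the local default is
git config user.name "Demo"
git config user.email "demo@example.com"
echo base > notes.txt && git add notes.txt && git commit -q -m "initial"

# A private branch that collects a few throwaway WIP commits.
git checkout -q -b my-feature
for step in 1 2 3; do
    echo "step $step" >> notes.txt
    git commit -q -am "WIP $step"
done

# Move the branch pointer back to where it left master. --soft leaves the
# index and working tree alone, so the combined WIP changes stay staged.
git reset -q --soft master
git commit -q -m "Add steps to notes"

subjects=$(git log --format=%s | paste -sd'|' -)
echo "$subjects"
```

The three WIP commits are replaced by a single commit with the same final content. Like rebase, this rewrites history, so it only makes sense on a branch nobody else has seen.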
It has valid uses. But they are few and far between. You can only ever use it on non-published history; you shouldn't use it on large numbers of commits; it's rarely needed when dealing with small numbers of commits.
The best thing that could happen to rebase is that it gets relegated to "power tool that you don't find out about until you're a git wizard" because far too many people use it as a crutch to support their ability to use git without understanding it.
If you use rebase more than once a week, I maintain that you have a problem. It might be hard to spot, it might be rough on your ego, but that's my opinion. And if you can figure out what the problem is, and fix it, then you're the one who benefits.
Rebuttals or use-cases for rebase that I haven't considered are welcome in the comments.