
OneAndOneIs2


Tue, Apr 30, 2013

Please, stay away from rebase

• Post categories: Omni, FOSS, Rant, In The News, Technology, Programming, Helpful

I came across this plea on Hacker News to always use "git pull --rebase" instead of the default merge behaviour.

The logic was that if somebody has pushed to the remote branch since you last did a pull, and you've also made local commits to it, you'll get a merge commit. And if this happens frequently, you get a history that's mostly merges, and this is ugly and clutters your history.

I agree that having lots of merge commits is a Bad Thing. But I also contend, and will in this post try to persuade you, that not only is rebase the wrong solution to his problem, it is so often the wrong solution that (a) rebasing prevents people from learning how to use git correctly, and (b) rebasing should be regarded as a code smell - if you use it often, it's probably a symptom that you're doing something wrong.

Those are pretty sweeping claims considering that some people think that the existence of rebase is grounds for the removal of git's merging capabilities. So I've got to be pretty convincing with my arguments, I guess.

So, firstly, let's just remind ourselves: What is rebasing?

Well, if your current HEAD is commit 'a', and you make some changes and commit them, you have based your changes off commit 'a' to get commit 'b'. If somebody, in the meantime, made their own commit, commit 'c', also based off commit 'a', then you have a problem: you both want the head of the current branch to contain your changes, but you appear to have forked the branch.

The standard way of fixing this is to create a new commit that has both 'b' and 'c' as parents, merging them so the new HEAD has both your changes:

  d
 / \
b   c
 \ /
  a

The alternative is to rebase: to re-apply the changes you made in commit 'b' on top of the changes THEY made in commit 'c'. This results in a completely new commit, commit 'b1', and a smaller & simpler commit history:

 b1
 |
 c
 |
 a
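If you want to see both shapes for yourself, here is a minimal sketch in a throwaway repository. The branch names ('yours', 'merged', 'rebased'), the file names and the identity are all made up for the illustration; only the git commands are real.

```shell
# Throwaway repo; names, files and identity below are invented for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
trunk=$(git symbolic-ref --short HEAD)   # master or main, depending on your git

# Commit 'a': the shared starting point.
echo a > a.txt; git add a.txt; git commit -q -m a

# Commit 'b': your work, based off 'a'.
git checkout -q -b yours
echo b > b.txt; git add b.txt; git commit -q -m b

# Commit 'c': somebody else's work, also based off 'a'.
git checkout -q "$trunk"
echo c > c.txt; git add c.txt; git commit -q -m c

# Option 1: merge. Creates 'd', which has both 'b' and 'c' as parents.
git checkout -q -b merged yours
git merge -q -m d "$trunk"
git log --graph --oneline merged

# Option 2: rebase. Replays 'b' on top of 'c' as a brand new commit, 'b1'.
git checkout -q -b rebased yours
git rebase -q "$trunk"
git log --graph --oneline rebased
```

The 'yours' branch still points at the original 'b', so you can compare its hash with the tip of 'rebased' and see that 'b1' really is a different commit.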

So that seems superior, right? Clean, linear, compact history; every commit is a meaningful change. Much nicer than the fork-away, merge-back alternative with its commits that do nothing other than bring branches back together again.

Well, it might *look* nicer, but it comes with problems, too. The biggest one is that the original timestamps are preserved: the author date that "git log" shows by default still says when you first wrote the commit. So if you created commit 'b' before they committed 'c', but you didn't push it until later, rebasing will give you a linear history that's no longer in chronological order.

In our simple one-off examples above, this is no big deal. But if you've got an entire history that's been heavily rebased and your problem is "This bug started happening on Tuesday last week", you have a big problem: You can't just track back through your simple, linear branch until you get to last Tuesday's commits. You have to keep going back until you're absolutely certain there are no other commits further back in the history that are newer chronologically.

If you aren't aware of this and you start running your "git bisect" using your "good" base as the last commit made on Monday last week, you'll go through a long and time-consuming process that will be a total waste of time because the real culprit was actually in a commit made on Tuesday that happened (according to git) before Monday!
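Here is a small sketch of the out-of-order dates, again in a throwaway repository. The dates, the file names and the 'commit_on' helper are invented for the demo. It prints two dates per commit: %ad is the author date (the one plain "git log" shows) and %cd is the committer date, which git sets afresh when the commit is rewritten.

```shell
# Throwaway repo; dates, names and the helper below are invented for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
trunk=$(git symbolic-ref --short HEAD)

# Hypothetical helper: commit a new file with a fixed author/committer date.
commit_on() {   # usage: commit_on <name> <date>
  echo "$1" > "$1.txt"
  git add "$1.txt"
  GIT_AUTHOR_DATE="$2" GIT_COMMITTER_DATE="$2" git commit -q -m "$1"
}

commit_on a "2013-04-19 09:00:00 +0000"   # Friday: the shared base
git checkout -q -b yours
commit_on b "2013-04-22 09:00:00 +0000"   # Monday: your local commit
git checkout -q "$trunk"
commit_on c "2013-04-23 09:00:00 +0000"   # Tuesday: their commit, pushed first

# Rebase your Monday commit onto their Tuesday commit.
git checkout -q yours
git rebase -q "$trunk"

# Newest first: the top commit was authored on Monday, the one below on Tuesday.
git log --date=short --format='%s  authored %ad  committed %cd'
```

If you'd rather see both dates in the normal log layout, "git log --format=fuller" prints AuthorDate and CommitDate side by side.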

The "simple linear history" is a lie. The branched history might not be as pretty, but it's an accurate representation of what happened. If you see two branches in your log, you know you need to track both of them for the offending commit.

The whole reason to use a VCS is to have it record your history. Rebasing destroys your history, and therefore destroys the point of using a VCS.

"But without rebasing you get so many merge commits," I hear you cry.

I disagree. You only get the plethora of merges if you're using git wrong.

If you've never seen Linus Torvalds' talk on git, I recommend watching it. Because at one point, you'll hear him explain why he thinks Subversion is stupid - because they made it really easy to branch, but it's really hard to merge your branches back afterwards. (Having been through the process using svn, btw, I completely agree with him - it sucks)

Git was designed from the ground up to make it easy to branch, and easy to merge. People who have the misfortune of coming to git from other VCS's tend to carry in a philosophy of "Branches are hard" and thus they try to avoid them. They'll do all their work on master, because that's what they're used to.

But this is git. Branches are easy. You should use them. You should use them more than you do. And I'm confident in saying that, because I've never yet met anyone (myself included) who branches as often as they should.

It's so tempting to stay on master, to think "It's just a quick fix, it's not worth branching for!"

Or to think "I created a new branch for my project, we will now all work on the project branch" when you should instead say "I created a new branch for my project, we will now all work on feature branches forked off from the project branch."

This is where rebasing starts to hurt your git usage: Because you can rebase to avoid a merge-filled history, you will do so. And so you won't learn that what you should have done instead is to be on a branch. Here's a golden rule for using git that far too few people follow: You should never be working on a branch that other people will be pushing commits to. Fork it.

That will seem like overkill to some people: A branch off master for the project, and then a branch off the project for each feature?? You might wind up with a dozen branches!

Yes. You might. So what?? Back to my mantra: This is git. Branches are cheap. Merging is easy. You are not using enough branches.

If all your work is on a feature branch, it doesn't matter that other people are updating master: Their commits do not affect your branch. When you want to publish your work, checkout master, pull it, then merge in your branch. You get one merge commit, and you get a history in your VCS that is a true match for what actually happened. If instead you had rebased, you'd have a "Hitler Diaries" type of history - one that might seem to match real history on a casual glance, but turns out to be a pack of lies when you look closer.
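As a rough sketch of that workflow: the repository below is local-only, so "other people updating master" is faked with a local commit, and the pull step appears only as a comment. The branch name 'feature/login' and the file names are hypothetical.

```shell
# Throwaway repo; branch and file names are invented for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
trunk=$(git symbolic-ref --short HEAD)
echo start > readme.txt; git add readme.txt; git commit -q -m "Initial commit"

# Do all of your own work on a feature branch.
git checkout -q -b feature/login
echo form > login.txt;  git add login.txt;  git commit -q -m "Add login form"
echo check > check.txt; git add check.txt; git commit -q -m "Validate login input"

# Meanwhile other people update the main branch. Simulated here with a local
# commit; against a real remote this is what "git pull" would bring in.
git checkout -q "$trunk"
echo other > other.txt; git add other.txt; git commit -q -m "Somebody else's work"

# Publish: you are on the up-to-date main branch, so merge the feature in.
git merge -q -m "Merge feature/login" feature/login
git log --graph --oneline
```

However many commits pile up on either side, the main branch picks up exactly one merge commit for the feature, and the feature branch itself is never touched by anybody else's work.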

So what about changes that genuinely aren't worth branching for? Correcting a typo you just noticed in a comment, for instance?

Sure. Don't branch for that - I wouldn't. But do:

  • "git pull" before you make any changes - someone else might have fixed the typo already.
  • Make the change, commit it, pull - it's unlikely that anyone else will have pushed in the meantime, but be sure.
  • Oh noes! Somebody pushed a commit in that tiny window of opportunity! Now I have an ugly merge commit! :(

You have two options here: A soft reset back to origin's HEAD, and then re-commit your work. Or go right ahead and do an interactive rebase to fix up your history. It's only a tiny change. Nobody is ever likely to care. The occasional white lie is fine ("That new haircut really suits you!") - outright, ongoing deception ("I love you and want to marry you, it's nothing to do with your money") is not.
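The soft-reset option might look something like this. It is a sketch under assumptions: 'upstream' is a local branch standing in for origin's branch (with a real remote you would reset to something like origin/master), and the file contents are invented.

```shell
# Throwaway repo; 'upstream' and the file contents are invented for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
trunk=$(git symbolic-ref --short HEAD)
echo "teh code" > code.txt; git add code.txt; git commit -q -m "Initial commit"

# Local stand-in for the remote branch.
git branch upstream

# Your tiny typo fix, committed locally.
echo "the code" > code.txt; git commit -q -a -m "Fix typo"

# Somebody pushes in that tiny window...
git checkout -q upstream
echo other > other.txt; git add other.txt; git commit -q -m "Somebody else's work"
git checkout -q "$trunk"

# ...so your pull ends in a merge commit.
git merge -q -m "Merge upstream" upstream

# Soft reset: move the branch back to upstream's HEAD but keep your change
# staged, then commit it again on top.
git reset -q --soft upstream
git commit -q -m "Fix typo"
git log --graph --oneline
```

The merge commit and the original typo commit are gone from the branch, and a single fresh "Fix typo" commit sits on top of the other person's work.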

What about fixing up history for reviewers? Say, if you make a commit that introduces a bug, but don't notice it until a few commits later. The reviewer going through commit-by-commit is likely to spot it and flag it before they see that you fixed it later. Rebasing solves that.

True enough. But - like the typo fix above - this should be a very rare occasion. If you're consistently writing buggy code and rebasing to fix it, then you're coding badly. Don't fix the symptom by rebasing endlessly, figure out your problem. And you do have a problem, because not only are you writing crap code, but you're committing it as well!

Look closer at your diffs. Write more unit tests. Run them more often. Whatever, figure out what you need to do to avoid routinely making bad commits.

Because if you keep your work on the main branch and you frequently commit bad code, then the day will come when you hit the absolute no-no of rebasing: You'll push a bad commit to a remote, and then you'll be stuck because you absolutely must not rebase published history.

Rebase is like a painkiller - it's perfectly ok to use it from time to time. But if you're using it daily, then you have an underlying problem that you need to solve. Don't keep hiding the symptom, diagnose and fix the real issue.

You get merges every time you pull? Get onto a branch!

You keep committing bugs? Test and review your code before you commit it!

You need to condense a dozen "work in progress" commits into a few "worthwhile" commits? Soft reset when you come back to a branch you had a WIP commit on; don't keep unwanted bad commits in your private branches.

You need to split a single huge commit into more atomic commits? Commit more often and look up "add -p".

You really, really need to collaborate in real-time with another dev and so must share all your code? At this point you're pair-programming - maybe look up GNU screen & its 'acladd' command to allow you to share a terminal with your collaborator. Or just tell each other when you're about to commit.

I use git every single day of my working life. On repos I share with colleagues and we all push work to. I cannot remember the last time I used rebase - I certainly haven't used it this month, I possibly haven't used it this year.

It has valid uses. But they are few and far between. You can only ever use it on non-published history; you shouldn't use it on large numbers of commits; it's rarely needed when dealing with small numbers of commits.

The best thing that could happen to rebase is that it gets relegated to "power tool that you don't find out about until you're a git wizard" because far too many people use it as a crutch to support their ability to use git without understanding it.

If you use rebase more than once a week, I maintain that you have a problem. It might be hard to spot, it might be rough on your ego, but that's my opinion. And if you can figure out what the problem is, and fix it, then you're the one who benefits.

Rebuttals or use-cases for rebase that I haven't considered are welcome in the comments.


27 comments

Roberto
Comment from: Roberto [Visitor]
This is one good argument you have.

I've started using Git only lately, but I've used Mercurial for the past 3 years and most of the concepts are the same.

People are scared of branches but they shouldn't be; this is what distributed VCSs are about.
You do your development on your branch and then merge it back when you are done.

Good article,

R.
30/04/13 @ 14:11
max
Comment from: max [Visitor]
I am assuming you are aware of --no-merges arg to "git log" as well as ease of creating alias commands in .gitconfig?

If the biggest perceived problem with merges is the log format just add something like this to your .gitconfig:
[alias]
l = log --no-merges

so when you do "git l" it won't show them.
30/04/13 @ 15:47
Chris
Comment from: Chris [Visitor]
Thanks mate. I am not really deep into GIT, but I am using it daily now. I never figured out why people rebase, it felt wrong to me. Now I know. This is great.
30/04/13 @ 16:05
jhare
Comment from: jhare [Visitor]
As has been said, branches are cheap, _but your time is not_. Often what seems clever and shirks a standard process may work in a lot of cases... until something goes wrong. Not only is something broken, but you have a mess to unravel.

Using similar logic in development would be "This code works 80% of the time until we get an exception!" :/
30/04/13 @ 16:59
Alex
Comment from: Alex [Visitor]
What do you think about using interactive rebase to squash many commits together before merging them into master? This way, code reviews become much easier because the reviewer does not need to analyze a ton of commits ...
30/04/13 @ 18:13
Ismael Gimenez
Comment from: Ismael Gimenez [Visitor]
Great post!!!!
30/04/13 @ 19:44
Magnus Reftel
Comment from: Magnus Reftel [Visitor]
At my previous job, we had a workflow where all patches go through pre-commit review in Gerrit, and are merged to master from there when they are done. It was a rare patch that went through fewer than three patch sets before it was accepted (the reviews could be pretty strict and/or opinionated at times), so when a patch was ready to be merged, it was regularly a few versions behind. We handled that by rebasing after review but before merge.

While we could have merged to master and handled conflicts there, insisting on the submitter rebasing and resubmitting meant that new changes due to the merge could be reviewed again if necessary. Should we have skipped that? What would the benefit have been?
30/04/13 @ 19:56
Goran Petrovic
Comment from: Goran Petrovic [Visitor]
Rebase is great if you know what you are doing. It gives you an amazing power.

The guy in the post mentioned that if it fails, you abort and merge. I would probably just resolve the conflicts and keep my changes on top, I hate unnecessary nonsemantical merges.

All in all, as long as you know what you are doing, rebase is amazing.
30/04/13 @ 20:42
jessica kerr
Comment from: jessica kerr [Visitor]
Thanks for this - I didn't know about the timestamp problem.

When you're working on a branch, do you ever pull the latest from master into your branch? I find it important to do this on a daily basis, preferably more. Rebase makes it easy. Merge would leave a bunch of commits... but on the branch, which is better than master.
30/04/13 @ 20:52
Robert Cowham
Comment from: Robert Cowham [Visitor]
I agree with the basic thrust. My general preference for version control principles is "all history is good, even if it was a mistake". I do not like the gratuitous use of Git features to "rewrite history" - it just feels wrong.

Of course there are always exceptions to the rule, but these should be genuine exceptions.

Rather like deciding on the difference between a change process and an emergency change process - why can't all changes do the emergency process?! What is it missing? If nothing important then why force other changes down a different route. If it is missing important things, then don't miss them out - just make them easier/faster to do.
30/04/13 @ 21:48
Braden Shepherdson
Comment from: Braden Shepherdson [Visitor]
Chronological order matters less than you think. Git log shows commits in their order of committing to the repo (aka topological order, following the parent chain), which is precisely what you want when you're looking for good and bad commits from Tuesday. git log --before and friends use the commit time, not the authorship timestamp that git log prints. Try it.

You could use git log --before=<1am Wednesday> and git log --before=<1am Tuesday> to get good and bad commits to feed to git bisect. Just ignore their timestamps.

I will concede that seeing the inconsistent timestamps is annoying, and I don't know why one can't pass an option to git log so it will show the commit times too/instead. Git's user interface leaves a lot to be desired, film at 11.

Rebase is a useful tool and this weakness is not enough to be worth banning it. It helps when dusting off old pull requests and experimental branches, and when squashing multiple commits to keep them complete and self-contained. It's not necessary, but I fail to be convinced that the timestamp quirk makes them worth banning to the arcane fringes of use.
30/04/13 @ 23:32
Mike Perdide
Comment from: Mike Perdide [Visitor]
I agree with you: branches are cheap, branches are awesome, let's use more of them! Yay! But that does not make it a good reason to dismiss pull/rebasing altogether.

The other argument against rebase you're giving is: you don't know what was added when. I disagree: that's what CommitDates are made for. In every repo I work on, if I want to know at which point some code was added to master, I just need to look at the history and the commit dates. I don't see where the trouble lies. The date a commit was originally authored is irrelevant: it's the date this commit came into production that is.

You write that although the history will be messy and convoluted, at least it won't be a pile of lies like what the "Bad Rebase Guys" are doing. What's the point of having a history you can't read? Having a clean history makes it easier to see when a commit came into production.

Rebasing common history is an absolute no-no, don't get me wrong. But rebasing your commits on top of the current master before making a pull request is a courtesy to the dev who is going to review and merge your code. Why should he be the one to solve cataclysmic conflicts in dozens of files, just because you started a branch months ago and didn't keep up with the common work?

A clean history is a great tool to look back on a project, and rebasing is a part of that.

(FYI, our team works with branches and pull requests, and the merger does a pull --rebase on his master, then a git merge --no-ff of the branch, before pushing. That way all the commits related to a branch/feature are separated from the rest when displayed in a graphical history viewer).
01/05/13 @ 00:02
Krzysztof Hasinski
Comment from: Krzysztof Hasinski [Visitor]
Linus himself mentioned that you shouldn't merge until you really mean it and know what you merge.

Editing history is an important part of git local commit functionality.

Timestamps aren't important as git maintains another set of timestamps which are rewritten when `rebase`ing.

Avoiding bad commits just to avoid rebases is ridiculous.

I say it again - editing LOCAL history is an important feature in git that makes it a great tool. You can experiment, test, save and rearrange your work so you push good code with clean history.

Avoiding rebases is similar to using subversion with better merges. This is git, use it! ;) If you never rebase and always merge it means you broke bisects, code reviews and basic readability.
01/05/13 @ 00:44
Vincent Povirk
Comment from: Vincent Povirk [Visitor]
I agree that everyone should be using branches more.

"Test and review your code before you commit it!" This is a terrible idea because the other thing everyone should be using more is commits. I would say that, at a minimum, every time you have code that is known to do at least one thing correctly, and is different from the code that was in your repo, you should commit before you make any further changes. Otherwise, when you break that one thing that you knew worked, you won't have any diffs to consult when you're trying to figure out why.

Dividing your work into logical, small but self-contained steps is a very good thing, because that makes your changes easier to review, easier to understand, and (if they're truly self-contained) better for bisecting. Such things cannot always be easily split given a single diff; you may need some lines of code in intermediate stages that aren't exactly present in either version. And you may not be able to publish your changes until the patch set becomes fairly large; perhaps you're laying groundwork for a new feature but you won't be sure that your groundwork supports it well until that feature is almost done. What that means is that you can easily be in a situation where you have to go back and change something, and you need a rebase because splitting your changes doesn't make sense. (I'd argue that the threshold for that is 2 commits, because splitting requires me to duplicate work that I might do wrong the second time, and makes it difficult for me to test the intermediate versions.)

The so-called "timestamp problem" is a case of confusing UI. All commits, including rebased commits, have a correct and chronological commit timestamp (unless you manually change yours for some reason), but the author timestamp, which is preserved by rebase, tends to be shown more prominently. So that's not an inherent problem with rebase and could have been avoided with foreknowledge (but is imo a problem with the UI, not the users).

Here are some use cases for rebase you might not have considered:

* Contributing to a project that requires a linear history with small, self-contained commits, such as Wine. Obviously, rebase is required in this case, but then you have to ask whether such a requirement is sensible. I believe it is, because Wine regresses often (it has to, because we can't test every Windows program every time we want to make a commit, and they make a lot of assumptions about the API), and that means people bisect it often. Having a history filled with merges, large changes, or changes that are not self-contained would make the bisects more difficult and the results less useful for various reasons that I won't go into here.
* Contributing to a project that uses SVN. I will take your word that it's possible for SVN to branch and merge, but if you're a Git user you probably want to just use git-svn which demands a linear history when you try to push changes to it.
* Bisecting a merge of upstream into a true long-lived fork. If you have a bug that can only be tested in the fork, the only way to bisect is to rebase all of the changes in the merge (i.e. the right parent of the merge) on top of the previous revision of the fork (the left parent of the merge). Rebasing to update a long-lived fork is, of course, a terrible idea that causes all sorts of problems, so you have to do the rebase only when needed for a bisect.

Personally, I avoid "git pull" in all of its forms. To do updates I use:

$ git fetch
$ git merge --ff-only origin/master

But with an alias for merge --ff-only so I don't have to type that out all the time. I might be willing to support git pull if it could be made to do the equivalent.

Merge --ff-only will fail if I have local changes, committed or not. I can then decide what to do from there. Maybe the changes are in a feature branch that I forgot I had been working in, and the correct action is to commit anything remaining as a WIP and switch back to master. Maybe it's some debugging code I forgot I had left in place. Maybe it's a patch I had applied using 'git am' or reverted for testing. The odds of me actually remembering that my tree is dirty before I update are pretty low (the most common exception being updates following failed pushes), so there's no way I can prepare git ahead of time for that situation.

But my real problem with git pull (with or without --rebase) is that it hides things from users that can harm them later. If you have a local commit in your tree that you don't know about (you forgot, you did something stupid that you didn't know would produce a commit, or you don't know what a commit is because you're not a developer and you're just doing what you were told so you can test the latest version of a project or do a bisect or whatever), git pull will silently keep that commit around. (I say "silently". It actually does tell you, but the command succeeds, and git tends to tell you a lot of things that aren't important and/or have some obscure meaning that you only understand if you really know git. So people aren't going to notice that the output of 'git pull' is slightly different than it used to be, especially if they're not experienced with git.) If you're lucky, this will end when it conflicts, and maybe you'll even figure out how to resolve it without cloning again. If not, you will form the impression that git is a terrible and unreliable piece of software, and you'll never want to use it again. If you're unlucky, it'll cause some other problem that will be blamed on the project rather than the local update.

Sure, pull is fine if you know what you're doing, but we need to design guides and tutorials with new users in mind, some of whom will never bother to learn more. For those who do, fetch && fast-forward isn't a bad habit and at worst may be slightly less convenient than pull.
01/05/13 @ 03:23
Matthew
Comment from: Matthew [Visitor]
Your argument about timestamps is flawed IMHO. Yes, rebasing doesn't preserve a chronological history, but this causing problems during bisection is a code smell itself. Your discussion implies that you noticed the issue pre-rebasing and are then bisecting a post-rebased tree. You should never, ever, ever be rebasing commits that have been pushed to production (which I'm assuming is the only environment from which you'll get bug reports of the form "things broke on Tuesday"). If you noticed a bug on a dev branch, your tests that detected this should give you the commit ID.
01/05/13 @ 05:41
Matt Briggs
Comment from: Matt Briggs [Visitor]
I work on a team of 14 devs, all committing to the same git repo. Without rebasing feature branches, the history is completely unreadable, a tangled mass of lines crossing over each other.

We always do --no-ff merges back to master after a rebase. This doesn't flatten the history, and gives a single merge commit that is revertable (and also gives meaningful history). So instead of linear, things look more like this:

|\
| |
| |
|/
|\
| |
|/


You still lose the point at which the code branched when the dev started working on it (in practice I don't think that is something I have ever cared about), but what you get is a readable history, even with the gigantic team.
01/05/13 @ 06:15
oneandoneis2
Comment from: oneandoneis2 [Member] · http://geekblog.oneandoneis2.org/
Sigh.

Note to the people telling me I shouldn't advocate "banning" rebase - please read the post *I* wrote: the one that explicitly says that rebasing should be considered a "power tool" for experienced git users; not a tool used by people who don't understand it; and not a tool that should be removed.

@Alex: re. interactive rebase to squash commits for review - I'm very much against it. If the reviewer doesn't want the minutiae, he can just do something like "git diff start-commit end-commit". Squashing them with rebase destroys your ability to view the history atomically.

@Vincent - the "test your code before commit" is a possible solution for somebody who persistently commits buggy code and is trying to avoid it. Not something I would advocate the average user doing for every commit.

I stand by the "review your diff before commit" though - I've seen too many lousy commits caused by people who just commit everything without first making sure they know what they're about to commit. All manner of leaked TODO's, debugs, and outright "This should never have been committed" junk gets in because people don't look at their diff before committing it.

To the people who keep saying "rebasing isn't lying, it's just editing your commits" - this is on a par with saying "I wasn't lying, I just wasn't telling the truth". Rebasing says your work was based off a commit that it simply wasn't. Its entire raison d'etre is to lie about history.

A really simple example from my own experience: Our main code repo uses the Catalyst framework. A lot of stuff gets stored in $c->{stash}. Sometimes, enough is being called from it that somebody will do "$stash = $c->{stash}" as a micro-optimisation to cut down on typing.

So we had dev A set one function to use $stash, and he dutifully refactored all the code to remove the $c version - as is correct procedure to keep our code consistent.

And then there was dev B, who added a block of code to the same function, and used the $c call because that was what was in place when he wrote it.

Then when dev B came to merge his changes, he rebased it onto dev A's work. There were no clashes, as he was simply adding a new block.

He then sent his code into review & it got red-penned because he'd used $c->{stash} when he clearly should have used $stash - because that's what the rest of the function did, as could be clearly seen *in his own commit*

Sure, merging wouldn't have stopped the inconsistency from being introduced. But rebasing made it look like dev B had ignored prior art and was therefore at fault, whereas a merge would have kept the history intact and made it apparent that both devs had done the Right Thing at the time.

This is a trivial example that caused absolutely no actual bugs or problems. But it illustrates the point: Because history was re-written via rebase, it was no longer accurate.

Rebasing makes your history lie. That's how it works. It's what it does. It is a tool that exists specifically to allow you to make your history lie.

And as I point out in the post, the occasional lie for the right reasons is perfectly acceptable. Lying continually for no good reason is not.

Rebase when it makes sense to do so. Do not rebase when it doesn't. And until you know git well enough to be certain when rebasing does make sense, don't use it.

That last paragraph sums up the whole point of the post. Not "Never rebase", not "Rebase is always bad" - rebase is fine, when *used correctly*. It's just that most of the uses advocated for it are *not* correct, so until you can spot them, avoid them.
01/05/13 @ 09:28
Alfonso
Comment from: Alfonso [Visitor]
Hello, Dominic. I hope you don’t mind me chiming in so late after the discussion has died down…

I see the problem you illustrated in your last comment, but wouldn’t the timestamp then absolve dev B? Sure, the history makes it look like his changes were applied *after* dev A’s refactoring, but a quick glance at the timestamp reveals that his commit containing the $c call was written before the refactoring. It’s not as if rebase completely eradicated any evidence that it went down that way. In fact, what you call a flaw in rebase (the fact that the timestamp is preserved, even though a new commit date is created) actually provides the same “solution” to this problem as merge does: in both cases, you get to see when dev B’s offending commit was created. And like you said: a merge would not have avoided the inconsistency, so merge has nothing more than your preference going for it.

Your example also ignores the likelihood that a code reviewer in a team that allows rebasing (or even, you know, a reviewer who is aware of the existence of rebase and how commit histories work) will consider that a commit may have been rebased and thus will look for evidence of this. If you’re worried about someone *incorrectly accusing* dev B of being sloppy, then you don’t have a rebase/merge controversy, you have an ignorant ass making remarks based on a misread of the available information (information that is, again, available whether you rebase or merge).

You mischaracterize frequent rebasing when you compare it to “lying for no reason”. There is very good reason, for those of us who prefer to deliver a clean commit history. The fact that you consider a cleaner history not good enough reason is a question of personal preference, and while there is nothing inherently wrong with that opinion, it is just that—an opinion.

I also find your assertion that the frequent use of rebase is necessarily the product of problematic coding to be baseless and poisoning the well. And the last two paragraphs of your latest comment comprise a circular argument: frequent rebases are a sign that you don’t know what you’re doing, but you won’t know what you’re doing until you realize that frequent rebases are a sign that you don’t know what you’re doing. No disrespect, but perhaps if you actually illustrate how frequent rebasing is necessarily —or even very likely— the product of bad coding, then your assertion will cease to be conjecture and your last two paragraphs will at least have a way out of their little circle.

IMO, teams (or public repo admins) should determine what works best for their workflow, their philosophy, the nature of their project, and whether it will be public or not.
01/05/13 @ 14:33
oneandoneis2
Comment from: oneandoneis2 [Member] · http://geekblog.oneandoneis2.org/
> I hope you don’t mind me chiming in so late

Not at all

> wouldn’t the timestamp then absolve dev B

Sure, if you want to put the onus onto the reviewer of trekking back through non-chronologically-ordered history every time they see a bug on the off chance that they find a justification for it. But the whole point of the "rebase for the reviewer" argument is that rebasing makes the reviewer's life easier.

> The fact that you consider a cleaner history not good enough reason...

No, you're missing my point: If you use git properly, you get a tidy history without needing to rebase. This comes back to the point I keep making: Rebasing is a crutch that stops you learning how to use git properly. I address this very clearly in the post.

I don't maintain a "messy but accurate" history. I maintain a tidy and accurate history. As can anyone else if they put down rebase long enough to learn how.

> perhaps if you actually illustrate how frequent rebasing is ... the product of bad coding

Seriously?

You honestly need me to explain why being able to write something right the first time is better than only being able to write something that then requires re-writing later?

Or why it's better to take off your muddy shoes before walking through the house than to walk across and then mop up afterwards?

Frequent rebasing is the mindset of "Screw up now, fix up later". It's unnecessary and it's more work than just doing it properly.
01/05/13 @ 15:02
Darren Garvey
Comment from: Darren Garvey [Visitor]
You have some valid points, but you've gone too far the other way to prove your point.

Rebasing is very useful, just don't abuse it. I think your arguments here are only valid for "git pull --rebase" and aren't at all relevant to "git rebase -i". That may have been your intention, but by the end of the post, you're talking about rebase in general.

You're arguing that a rebasing-heavy workflow is ignoring the awesome power that git branching / merging gives, but that's not always true. Git is also great for committing regularly. If you commit regularly then there will be incomplete commits in there that need rebasing. Sometimes I might split what initially looks like a controversial design decision into a separate commit and decide later (before pushing) that it's perfectly fine and the separate commit isn't necessary.

Other times, there are several design iterations in a commit history when developing something non-trivial. It's good to have that iteration saved in git and leave it on a branch somewhere, but no-one should have to wade through it on a public branch.

I guess this isn't a problem if you're always following a waterfall process, or never iterate an idea, or never make any mistakes, or never want to code first, amend commit messages later.

> Don't fix the symptom by rebasing endlessly, figure out your problem. And you do have a problem, because not only are you writing crap code, but you're committing it as well!

You're using the wrong term there. What you should say is "but you're /pushing/ it as well". There's absolutely nothing wrong with committing broken or fugly code as long as you don't push it.* In fact, I find knowing I can rebase means that if I see a problem, I can fix it right there and then knowing I can move it away from my branch later. Being able to do partial commits isn't always appropriate.

Plus, committing regularly and rebasing later is a great way of seeing how you develop. Seeing that evidence and reorganising it is a really useful way of learning how to code in a more logical and partitioned way. Without rebase, I'd be basically using a subversion client that can branch and merge really well. IMHO, that's missing out on a whole load of heavenly glories.

* If you do it too often, then you're probably wasting time - "Screw up now, fix up later" - as you say. There are other times it's useful, eg. switching branches while in the middle of something else.
01/05/13 @ 17:04
Foo
Comment from: Foo [Visitor]
I'm not sure I understand your ire with commit squashing in Git.

When I create a new feature in a program, I will create a new branch, do the work in small steps with a commit at each intermediate step, and then squash the intermediate commits into an "atomic" one that represents the feature I added. This makes sense functionally IMHO.
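For readers unfamiliar with the workflow described above, here is a rough sketch of one way to do it. It uses `git merge --squash` as a non-interactive stand-in for squashing with `git rebase -i`; the branch names `stable` and `feature` and the file `app.txt` are hypothetical.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "base"
git branch stable              # hypothetical integration branch
git checkout -q -b feature     # work happens here, in small steps

for step in 1 2 3; do
    echo "feature step $step" >> app.txt
    git commit -q -am "wip: step $step"
done

# Collapse the three intermediate commits into a single commit on 'stable'.
git checkout -q stable
git merge -q --squash feature
git commit -q -m "Add feature"

stable_commits=$(git rev-list --count stable)
feature_commits=$(git rev-list --count feature)
echo "commits on stable:  $stable_commits"
echo "commits on feature: $feature_commits"
```

Here `stable` ends up with the same content as `feature` but only one commit for the whole change. The intermediate steps survive only for as long as the `feature` branch itself is kept.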
01/05/13 @ 17:09
oneandoneis2
Comment from: oneandoneis2 [Member] · http://geekblog.oneandoneis2.org/
> If you commit regularly then there will be incomplete commits in there..

So far so good

> ..that need rebasing.

Completely disagree - you can & should "commit early, commit often" and that means committing incomplete code. You should NOT then rebase away all your intermediate steps - that's useful history that will allow somebody to track back and understand why a piece of code is the way it is, and you'd be blowing it away for no reason other than some abstract "cleanliness".

> committing regularly and rebasing later is a great way of seeing how you develop

No, it's the opposite - if you rebase away all the interim steps, you've just *lost* the ability to look back at how you developed.

If you don't squash commits, you can look back and see that a year ago you went through a dozen iterations to fix a problem whereas this time you solved it in just one - yay progress! If you rebased last year's fix into one to "clean up" your interim steps, you've just lost all that useful knowledge.

> no-one should have to wade through it on a public branch.

Again, the argument that seeing the sum total of a change is somehow difficult when it's in multiple commits: If you want a single at-a-glance "what changed", just get the diff between the start and the end - this is trivial. If you rebase all your commits into one, you can never break it back down again - this loses history and is bad.
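As a small illustration of the "diff between the start and the end" point, the sketch below keeps three intermediate commits and still produces a single summary of the whole change. The tag name `start` and the file `feature.txt` are invented for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "v0" > feature.txt
git add feature.txt
git commit -q -m "start"
git tag start                  # hypothetical marker for where the work began

# Three small commits, each an intermediate step.
for step in 1 2 3; do
    echo "step $step" >> feature.txt
    git commit -q -am "iteration $step"
done

# One at-a-glance view of everything that changed since 'start'.
git --no-pager diff start HEAD

# The individual steps are still there if anyone wants them.
commit_count=$(git rev-list --count start..HEAD)
added_lines=$(git diff --numstat start HEAD | cut -f1)
echo "commits since start: $commit_count, lines added: $added_lines"
```

The combined diff reads the same as a squashed commit would, while `git log start..HEAD` still lists each iteration separately.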

Keep your options open: Keep your history.
01/05/13 @ 17:33
oneandoneis2
Comment from: oneandoneis2 [Member] · http://geekblog.oneandoneis2.org/
@Foo - I'm not going to sugar-coat it, what you describe is appallingly bad practice and you might as well stop bothering with a VCS if you're going to blow away your history like that. Seriously, it's that bad.
01/05/13 @ 17:35
csirac2
Comment from: csirac2 [Visitor]
FWIW I think Darren Garvey is completely correct. I rarely use git rebase -i on small internal stuff, but contributing to open-source projects seems to be where I find it most useful. You're in there because something is broken, it's code you use but are unfamiliar with, and it takes a bit of a journey to fix a problem and update the tests. This is git, we've been conditioning ourselves to commit more - hours or less between each commit! We no longer fear commits. Commits are our friends. I almost always commit working code, or working tests, but sometimes there's a lag between both working simultaneously - especially if I need to move away from the task for a while (for example).

However, for the sake of your pull request/patches getting accepted in a timely manner, you need the cleanest, simplest history that your newfound reviewers/collaborators can reasonably be subjected to.

I've certainly seen "your branch is too noisy, can you clean it up" type comments - but never have I seen anything to the effect of "your 3 commits have confusing dates, I'd rather all 8 original commits, including that revert and both merges please".

This should be a really clear case for rebase -i, especially in those situations where the final solution to a problem is much smaller in changes than the sum of the new contributor's exploratory journey - a journey which adds nothing for the reviewer, given that the mistakes or dead-ends are probably obvious noise to them.
01/05/13 @ 18:17
Remmelt
Comment from: Remmelt [Visitor]
Some thoughts.

1) I'm working on a team with seven members pushing to origin. If no-one would pull --rebase, this would quickly lead to a veritable spiderweb of branches. There is a trade off between some abstract "completeness" and a clean history line. There is value in knowing how you develop and how you improve over time, but - for us - at the cost of speed of development. For us, I fear your method would result in loads of branches, since we're also using the git flow method, resulting in a loss of overview (which means I would have to fix it...) How does this Keep It Simple?

2) In scrum theory, tasks should be small. In practice it's not always feasible to break things up into bite sized chunks. Tasks might take longer than a single day, sometimes even days. In the mean time, intermediate code is not pushed on the develop branch. This means that this intermediate code is not continuously integrated. Yes, code should be perfect the first time, and sometimes errors do sneak in. The CI evangelists tell me to push at least once a day or I'm not practicing "real" CI. It is not feasible to set up CI tracks for every developer, release and feature branch, besides develop.

This leads me to think that there is a middle road. Our current MO is pull --rebase so there is less overhead and more clearness in the branches department, especially on origin. Plus, we get the added benefit of having our code tested every night. For "features" we use branches. Feature branches could and probably should be used more often.
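The difference in history shape between the two pull styles can be seen in a toy setup. This is only a sketch: the developers `alice`, `bob` and `carol`, the `develop` branch, and the local bare repository standing in for origin are all assumptions made for the example.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# Seed repository with one commit on 'develop', then a bare "origin".
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/develop
echo "base" > seed/base.txt
git -C seed add base.txt
git -C seed commit -q -m "base"
git clone -q --bare seed origin.git

# Three developers clone before anyone pushes.
for dev in alice bob carol; do
    git clone -q origin.git "$dev"
done

# Alice pushes first.
echo "a" > alice/a.txt
git -C alice add a.txt
git -C alice commit -q -m "alice: add a"
git -C alice push -q origin develop

# Bob has a local commit and pulls with --rebase: no merge commit.
echo "b" > bob/b.txt
git -C bob add b.txt
git -C bob commit -q -m "bob: add b"
git -C bob pull -q --rebase origin develop

# Carol has a local commit and pulls with a merge: one merge commit.
echo "c" > carol/c.txt
git -C carol add c.txt
git -C carol commit -q -m "carol: add c"
git -C carol pull -q --no-rebase --no-edit origin develop

rebase_merges=$(git -C bob rev-list --merges --count HEAD)
merge_merges=$(git -C carol rev-list --merges --count HEAD)
echo "merge commits after pull --rebase: $rebase_merges"
echo "merge commits after plain pull:    $merge_merges"
```

Bob's history is a straight line, while Carol's gains a merge commit whenever her local work and origin have diverged.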

Still, an interesting look at rebasing. It's always good to think about why we do the things we do, find a good reason and stick with it or change if one cannot be found.
01/05/13 @ 21:37
Vincent Povirk
Comment from: Vincent Povirk [Visitor]
> rebasing made it look like dev B had ignored prior art and was therefore at fault, whereas a merge would have kept the history intact and made it apparent that both devs had done the Right Thing at the time.

It doesn't matter if it was the right thing at the time dev B wrote his patch. No one cares what the world was like when he wrote it, because the world has moved on since then. What matters is the state of the world when dev B published his code, and he had a responsibility to make sure it worked at that time. Had he done a merge, the reviewer might not have caught the problem, because that problem would have been introduced by a merge commit with no conflicts and would not have shown up in his patch.

> Completely disagree - you can & should "commit early, commit often" and that means committing incomplete code. You should NOT then rebase away all your intermediate steps - that's useful history that will allow somebody to track back and understand why a piece of code is the way it is, and you'd be blowing it away for no reason other than some abstract "cleanliness".

You absolutely should squish any intermediate steps at which the project is broken, to make things easier for people who need to bisect the project later. Of course, ideally for a large feature you'd have some intermediate steps that DON'T BREAK THE PROJECT, which are useful for understanding your changeset and shouldn't be squished away.
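For context on why bisecting is sensitive to broken commits, here is a minimal sketch of `git bisect run`. The repository, the `known-good` tag, and the "test" (a `grep` for the marker string `BUG`) are all invented for illustration; a real project would run its test suite instead.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "base"
git tag known-good             # hypothetical last known-good point

# Six commits; the fourth one introduces the "bug".
for n in 1 2 3 4 5 6; do
    if [ "$n" -eq 4 ]; then
        echo "BUG" >> app.txt
    else
        echo "line $n" >> app.txt
    fi
    git commit -q -am "change $n"
done

# Bisect between the bad tip and the good tag. The test command
# exits 0 (good) when the marker is absent and 1 (bad) when present.
git bisect start HEAD known-good >/dev/null
git bisect run sh -c '! grep -q BUG app.txt' >/dev/null
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $first_bad"
```

Bisect judges each candidate commit purely by the test's exit status. An intermediate commit that fails for an unrelated reason (for example, it does not build) would be reported as bad, or would have to be skipped by hand with `git bisect skip`.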

> If you don't squash commits, you can look back and see that a year ago you went through a dozen iterations to fix a problem whereas this time you solved it in just one - yay progress! If you rebased last year's fix into one to "clean up" your interim steps, you've just lost all that useful knowledge.

I think this gets at the heart of the disagreement: when you read your project's history, you want something different from what we who prefer rebases want. You want a complete record of the project's history and every contributor's history, even if those contributors had to fix bugs, redesign their code, or make changes at a maintainer's request before publishing their code. The rebasers, however, are only interested in the history of the project as a whole and find the individual contributors' histories to be not just irrelevant but actually counter-productive. We don't care that you got better at avoiding false starts since last year; we just don't want to see your false starts in our bisects. Therefore, we ask people to throw away their mistakes, false starts, and original topology before publishing, so that we don't have such irrelevant and counter-productive information in our projects' histories.
01/05/13 @ 22:19
oneandoneis2
Comment from: oneandoneis2 [Member] · http://geekblog.oneandoneis2.org/
@csirac2 - yeah, rebase -i to make things easier for somebody to pull your work is potentially what I'd do.

I'd draw a clear distinction between "rebasing because somebody else has changed the same code as you & you want to resolve the conflicts so nobody else has to" and "rebasing when somebody else has changed unrelated code and you're just doing it for the sake of it" though.

@everyone else - sorry, I'm tired of the constant cherry-picking of a single point I've made, removing all context, and then arguing with a point I'm not making. So whilst further (constructive) comments are welcome, I'm not going to be addressing further comments here. Catch me on irc if you really want to discuss - I'm 'djh' in #git

If you want to learn more about Git, I thoroughly recommend Pro Git
02/05/13 @ 08:40
 
