Stacked Diffs: Keeping Phabricator Diffs Small
2016-07-12
In my previous post, I argued that Large Diffs Are Hurting Your Ability To Ship. Now I'm going to show you how to keep your diffs small using Phabricator. Don't use Phabricator? This workflow has been ported to Github in Stacked Pull Requests: Keeping Github Diffs Small.
Here's the basic strategy:
- Break up a large change into a series of individual commits
- Stack the commits on top of one another.
- Create Phabricator Diffs for each commit.
Let's walk through a real world example.
1: Create the First Commit
Time to make the first bit of our change. Starting on master, we'll checkout a branch for the new feature, add a new class to our project, make our first commit, and sent it out for review:
If we look at our git log we can now see the new branch with the new commit:
And if we checkout Phabricator, we'll see our diff is up for review:
2: Commits 2 Through N
We continue coding, making an additional logical change.
For the sake of this example, let's say the class uses functionality in the NewClass.java
file we added in the first commit:
Here's what our git log looks like now:
Running arc diff HEAD^
now sends up our second commit (and only our second commit) to differential:
We now have two commits in differential, that can both be reviewed independently.
When your reviewers are looking at Part 1, they can focus solely on the changes introduced by NewClass.java
.
When your reviewers are looking at Part 2, they can just focus on the additional functionality that is added by the DependentClass.java
file.
The benefits here are the same you get with abstraction, the reviewer has to hold fewer things in their head.
3: Address Feedback
Now let's say a reviewer caught a mistake in your Part 1 diff and we need to go back and fix it.
Let's do this using an interactive rebase.
While we're still on the my_feature
branch we'll run the following:
Doing this will drop us into a text editor where we can opt to edit commits between our current commit and the master commit. Since we want to edit the first commit, we'll follow the instructions given by git and switch the pick associated with the first commit to edit. We'll then save the file.
edit 512bfc6 [Part 1] add NewClass.java
pick 7eb8f56 [Part 2] Add DependentClass.java
# Rebase 512bfc6..7eb8f56 onto 172628f (2 command(s))
#
# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
# d, drop = remove commit
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out
If you're not familiar with an interactive rebase here's essentially what's happening. Git rewinds all of your progress, uncommiting all the changes you've made. Then it plays back each of those changes in the order you specify in the file above. If any of the commits are marked with edit, git will stop at the commit and let you make changes. In fact, this is exactly what git tells you once you start this rebase:
And if we take a look at our git log now, we'll see that our HEAD is actually pointing to this commit: So following git's instructions, let's make our fix, amend, upload the fixes to Differential, and continue the rebase:
Once our rebase is finished, we'll be right back where we started, at the tip of the new_feature
branch:
And we see that our diff is now waiting to get re-reviewed:
4: Land the Diff
Alright, Part 1 has been approved and is ready to land:
We want to specifically land Part 1 of the branch we're on so we use it's git refhash when running arc land
.
Also, we still have commits on the branch that we're actively editing.
So instead of deleting the branch, we tell arc that we'd like to keep it for now.
Your diff has now landed! Our git log now looks like this:
|
|
|
You'll notice that the Part 1 commit seems to appear twice.
Let's clean this up by rebasing our my_feature
branch onto the new master.
The new master should include the commit we just landed, so after the rebase we'll see that our my_feature branch
actually only include the Part 2 commit.
This is pretty nice.
After each land and then rebase, our branch will now only include the commits that haven't yet been landed.
After doing this, we'll see that our git log is nice and tidy:
5: Rinse and Repeat
If we'd had more than just two diffs in our stack we would just keep repeating the same processes in steps 3 and 4.
Once all but the last diff in a stack has been landed, we can just run arc land my_feature
to land the last commit.
Provisos, Pro-tips, and Parting Notes
- You'll see I always have to type
arc diff HEAD^
(I never use the standardarc diff
). Because of this, I've actually created a bash alias for doing my diffs so I never forget theHEAD^
part:alias ad="arc diff HEAD^"
- I was pretty explicit and verbose with a lot of the commands above for the purposes of explaining exactly what was going on. As you get more comfortable with the work flow, I highly recommend setting up scripts and git aliases to help you do all of this. For example, since this work flow involves often doing an interactive rebase starting from master, I've aliased git rebase -i master to simply git rim.
- If you're using Harbormaster and Jenkins for CI, you'll want to setup a staging area, other wise your stacked diffs will fail your integration tests (see appendix for how to do this).
- If you're ever unsure about what code is going to be sent up to Differential when you run
arc diff HEAD^
, just runarc diff HEAD^ -preview
. - If you ever screw up a rebase, git reflog is your salvation.
- Sometimes you'll amend during a rebase and that will cause a conflict further up in the stack when you continue the rebase.
When resolving these conflicts be sure to only use
git rebase --continue
. Don't do an amend when resolving these conflicts. I've made that mistake before. It's no fun. - When doing an interactive rebase, only edit one commit at a time. Doing more than one is an invitation for forget where you are in the stack and screw something up. I've made this mistake before. It's no fun.
- Always finish a rebase. If you walk away from your computer in the middle of a rebase, you may come back to it later, forget you're in a rebase, and then try to change branches. This can seriously screw up what your repository looks like and confuse the hell out of you. I've made this mistake before. It's no fun.
Special Note
I want to give a special thanks to all of the mentors at Facebook who taught me this workflow and to all the folks at Uber who have helped me refine it. Special thanks to Grayson Koonce for his massive help with editing this post.
Appendix
Rebasing before landing
When landing an individual commit in your stack, there’s a chance there might be a conflict when landing. In order to avoid any conflict issues, we update master and rebase our branch to ensure a clean land.
Setting up a staging area
- find your repository in diffusion, e.g. https://code.mycomany.com/diffusion/GRREF/
- click “edit repository”
- copy the “remote uri” value
- edit the “staging area”, and paste in the remote uri value
- save!