Stacked Diffs: Keeping Phabricator Diffs Small

2016-07-12

In my previous post, I argued that Large Diffs Are Hurting Your Ability To Ship. Now I'm going to show you how to keep your diffs small using Phabricator. Don't use Phabricator? This workflow has been ported to Github in Stacked Pull Requests: Keeping Github Diffs Small.

Here's the basic strategy:

  1. Break up a large change into a series of individual commits
  2. Stack the commits on top of one another.
  3. Create Phabricator Diffs for each commit.

Let's walk through a real world example.

1: Create the First Commit

Time to make the first bit of our change. Starting on master, we'll checkout a branch for the new feature, add a new class to our project, make our first commit, and sent it out for review:

$ git checkout -b new_feature
$ vim NewClass.java
$ git add NewClass.java
$ git commit -m "[Part 1] add NewClass.java"
$ arc diff HEAD^

If we look at our git log we can now see the new branch with the new commit:

* 512bfc6 - (8 seconds ago) [Part 1] add NewClass.java - Kurtis Nusbaum (HEAD -> new_feature)
* 172628f - (70 seconds ago) add temporary file - Kurtis Nusbaum (origin/master, origin/HEAD, master)

And if we checkout Phabricator, we'll see our diff is up for review:

2: Commits 2 Through N

We continue coding, making an additional logical change. For the sake of this example, let's say the class uses functionality in the NewClass.java file we added in the first commit:

$ vim DependentClass.java
$ git add DependentClass.java
$ git commit -m "[Part 2] add DependentClass.java"
$ arc diff HEAD^

Here's what our git log looks like now:

* 7eb8f56 - (3 minutes ago) [Part 2] add DependentClass.java - Kurtis Nusbaum (HEAD -> new_feature)
* 512bfc6 - (5 minutes ago) [Part 1] add NewClass.java - Kurtis Nusbaum
* 172628f - (6 minutes ago) add temporary file - Kurtis Nusbaum (origin/master, origin/HEAD, master)

Running arc diff HEAD^ now sends up our second commit (and only our second commit) to differential:

We now have two commits in differential, that can both be reviewed independently. When your reviewers are looking at Part 1, they can focus solely on the changes introduced by NewClass.java. When your reviewers are looking at Part 2, they can just focus on the additional functionality that is added by the DependentClass.java file. The benefits here are the same you get with abstraction, the reviewer has to hold fewer things in their head.

3: Address Feedback

Now let's say a reviewer caught a mistake in your Part 1 diff and we need to go back and fix it.

Let's do this using an interactive rebase. While we're still on the my_feature branch we'll run the following:

$ git rebase -i master

Doing this will drop us into a text editor where we can opt to edit commits between our current commit and the master commit. Since we want to edit the first commit, we'll follow the instructions given by git and switch the pick associated with the first commit to edit. We'll then save the file.

edit 512bfc6 [Part 1] add NewClass.java
pick 7eb8f56 [Part 2] Add DependentClass.java

# Rebase 512bfc6..7eb8f56 onto 172628f (2 command(s))
#
# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
# d, drop = remove commit
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out

If you're not familiar with an interactive rebase here's essentially what's happening. Git rewinds all of your progress, uncommiting all the changes you've made. Then it plays back each of those changes in the order you specify in the file above. If any of the commits are marked with edit, git will stop at the commit and let you make changes. In fact, this is exactly what git tells you once you start this rebase:

topped at 512bfc6...[Part 1] add NewClass.java
You can amend the commit now, with

	git commit --amend 

Once you are satisfied with your changes, run

	git rebase --continue

And if we take a look at our git log now, we'll see that our HEAD is actually pointing to this commit: So following git's instructions, let's make our fix, amend, upload the fixes to Differential, and continue the rebase:

$ vim NewClass.java
$ git add NewClass.java
$ git commit --amend
$ arc diff HEAD^
$ git rebase --continue

Once our rebase is finished, we'll be right back where we started, at the tip of the new_feature branch:

* 2bf959f - (11 minutes ago) [Part 2] add DependentClass.java - Kurtis Nusbaum (HEAD -> new_fea
ture)
* f24d59b - (12 minutes ago) [Part 1] add NewClass.java - Kurtis Nusbaum
* 172628f - (13 minutes ago) add temporary file - Kurtis Nusbaum (origin/master, origin/HEAD, m

And we see that our diff is now waiting to get re-reviewed:

4: Land the Diff

Alright, Part 1 has been approved and is ready to land:

We want to specifically land Part 1 of the branch we're on so we use it's git refhash when running arc land. Also, we still have commits on the branch that we're actively editing. So instead of deleting the branch, we tell arc that we'd like to keep it for now.

$ arc land 2cccd79 --keep-branch

Your diff has now landed! Our git log now looks like this:

* d5980f3 - (12 minutes ago) [Part 1] add NewClass.java - Kurtis Nusbaum (origin/master, origin/HEAD, master)
| * 2bf959f - (11 minutes ago) [Part 2] add DependentClass.java - Kurtis Nusbaum (HEAD -> new_fea
ture)
| * f24d59b - (12 minutes ago) [Part 1] add NewClass.java - Kurtis Nusbaum
|/
* 172628f - (13 minutes ago) add temporary file - Kurtis Nusbaum 

You'll notice that the Part 1 commit seems to appear twice. Let's clean this up by rebasing our my_feature branch onto the new master. The new master should include the commit we just landed, so after the rebase we'll see that our my_feature branch actually only include the Part 2 commit. This is pretty nice. After each land and then rebase, our branch will now only include the commits that haven't yet been landed.

$ git rebase master

After doing this, we'll see that our git log is nice and tidy:

* 589aa33 - (11 minutes ago) [Part 2] add DependentClass.java - Kurtis Nusbaum (HEAD -> new_fea
* d5980f3 - (12 minutes ago) [Part 1] add NewClass.java - Kurtis Nusbaum (origin/master, origin/HEAD, master)
ture)

5: Rinse and Repeat

If we'd had more than just two diffs in our stack we would just keep repeating the same processes in steps 3 and 4. Once all but the last diff in a stack has been landed, we can just run arc land my_feature to land the last commit.

Provisos, Pro-tips, and Parting Notes

Special Note

I want to give a special thanks to all of the mentors at Facebook who taught me this workflow and to all the folks at Uber who have helped me refine it. Special thanks to Grayson Koonce for his massive help with editing this post.

Appendix

Rebasing before landing

When landing an individual commit in your stack, there’s a chance there might be a conflict when landing. In order to avoid any conflict issues, we update master and rebase our branch to ensure a clean land.

Setting up a staging area

  1. find your repository in diffusion, e.g. https://code.mycomany.com/diffusion/GRREF/
  2. click “edit repository”
  3. copy the “remote uri” value
  4. edit the “staging area”, and paste in the remote uri value
  5. save!