Large Diffs Are Hurting Your Ability To Ship

2016-07-12

I've worked at two companies now that have a culture of code review. I'm convinced that it's really the best way to develop software. But reviewing code is a whole world unto itself. There are entire tools designed around reviewing code.

If you're using Phabricator, the fundamental unit of code review is called a diff. If you're using Github, it's called a pull request (PR). If you're working at Google it's called a change-list. Whatever you choose to call it, a diff represents the set of lines you wish to add or remove from a code base.

But when diffs become too big, a certain set of problems appear.

The Case For Small Diffs

There are four main reasons why you should keep your diffs small.

Large diffs won't get reviewed

First and foremost, your reviewers are not going to want to review your diffs. I don't care how much code you churn out, if none of it gets reviewed the code will never make it to master. You've effectively written zero code. Even if you can convince someone to review your huge diff, it's going to take way longer to review than if you'd broken it up into smaller diffs. This is because it takes longer for your reviewers to understand all the things that are happening in your diff and how they all interact. When you write a big diff, you're essentially saying:

"I didn't want to take the time to split this into small diffs. As a consequence, you the reviewer will now have to spend more time reviewing my diff. This is fine because my time is more valuable than yours".

Be kind. Respect your reviewer's time. Write small diffs.

It's hard to spot bugs in large diffs

Consider the classic tweet. It's way easier to spot bugs in small diffs. This is because review fatigue is real. By the time I get to the 500th line of a diff, this is my internal monologue:

"Oh sweet jesus there's more. Dear God. Please make it stop. What does this line of code even have to do with 500 lines I've seen above."

There's no way I'm spotting a potential Null Pointer Exception when all my brain wants to do is make the pain stop.

Reverting large diffs is hard

Small diffs are much easier to revert than large diffs. Imagine you've got a large diff with four logical changes in it. If any one of those changes breaks something you now have to rollback the entire diff. Your one mistake drags down the other three good changes.

Rebasing/merging large diffs is hard

Small diffs are much easier to rebase/merge than large diffs. If you have a 700 line diff that makes changes across multiple files and directories, you've just signed your self up for a bounty of rebase/merge conflicts. The moment you try to integrate your work with that of the rest of your co-worker's, you will be boned.

What you can do about it

Hopefully by now, you're onboard the small diff train. But how do you actually achieve small diffs? The following 2 workflows illustrate how to break down large pieces of work into small diffs:

Stacked Diffs: Keeping Phabricator Diffs Small

Stacked Pull Requests: Keeping Github Diffs Small

Happy coding!