Git is a fantastic tool. Source control tools provide several vital features: a place to keep code, a history of changes, and information about who did what and when. If you and your team are good with commit messages, you may even get bonus info about why a change was made. The distributed tools such as Git and Mercurial also offer many possibilities for collaboration between multiple repositories, as well as providing effective ways of working with branches.
So what exactly is a branch in Git? Take a look at Figure 1. A branch in Git is simply a pointer to a commit. If you look at how a branch is represented in the '.git' directory, you'll typically find a text file with the same name as the branch (under 'refs/heads'), containing nothing but the commit identifier of the current 'tip', or newest commit, on that branch. The branch's history is worked out by tracing the ancestry of that single commit, since every commit knows which commit it came after.
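To see the pointer idea for yourself, here is a minimal sketch that builds a throwaway repository and compares the branch file with the commit Git resolves the branch name to. The temporary directory, the placeholder author identity and the commit messages are all assumptions made for illustration, and it assumes Git's default storage where branch references are kept as loose files.

```shell
set -e
# Throwaway repository so nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name regardless of local defaults
git config user.name "Example Author"      # placeholder identity for the demo
git config user.email "author@example.com"

git commit -q --allow-empty -m "First commit"
git commit -q --allow-empty -m "Second commit"

# The branch is a small file holding one commit identifier...
branch_file=$(cat .git/refs/heads/master)
# ...and it matches the commit that the branch name resolves to.
branch_tip=$(git rev-parse master)
echo "file: $branch_file"
echo "tip:  $branch_tip"

# Following parent links back from the tip gives the branch history.
git log --oneline master
```

Making another commit and re-reading the file shows the pointer moving along to the new tip.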
As a trainer, one thing I am always asked about is what the 'right' Git branching strategy is. My advice varies quite a lot depending on the team and situation, but there are some common patterns that show up time and time again. This post outlines those patterns and explains the situations where they can be most helpful.
Start simple with GitHub Flow
Made famous by a blog post from Scott Chacon, GitHub Flow is my favourite branching strategy. Why? Mostly because it's simple while still covering all the essential bases. GitHub Flow simply states that when working on any feature or bug fix, you should create a branch.
When the work is finished, the branch is merged back into master with a merge commit. It's a very easy way for teams and individuals to create and collaborate on separate features safely, and to deliver them when they are done. It creates a graph that looks something like the one in Figure 2.
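The branch-then-merge cycle described above can be sketched with a handful of commands. This is only an illustration in a scratch repository: the branch name 'add-contact-page', the file names and the author identity are made up, and --no-ff is used here so that a merge commit is recorded even though master has not moved in the meantime.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name regardless of local defaults
git config user.name "Example Author"      # placeholder identity for the demo
git config user.email "author@example.com"

echo "home" > index.txt
git add index.txt
git commit -q -m "Initial commit"

# Start a topic branch for the feature (the name is illustrative).
git checkout -q -b add-contact-page
echo "contact" > contact.txt
git add contact.txt
git commit -q -m "Add contact page"

# Back on master, land the finished feature with an explicit merge commit.
git checkout -q master
git merge -q --no-ff -m "Merge branch 'add-contact-page'" add-contact-page

# The graph shows the topic branch leaving and rejoining master.
git log --graph --oneline
```

In a team setting the merge step would usually happen through a pull request rather than on someone's own machine, but the resulting history has the same shape.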
These 'topic' branches can be quite long-lived, but be aware that the more a branch diverges from the master, the more likely it is that conflicts will be experienced when the feature lands. The rule is that the master branch is always deployable, so features must be finished and thoroughly tested before they are merged in. After a new feature has been merged to the master, it is normal to immediately deploy the new codebase. For this, you could either establish a true continuous deployment approach, or just set up an easy automated deployment process so the code can be deployed often.
There's a great article on Laura Thomson's blog that describes this as 'potentially continuous deployment'. It isn't considered unusual to deploy code to live platforms multiple times per day.
Since this model advocates branching for even a 'hotfix' – a quick, single-commit change – it creates a graph pattern all of its own. Figure 3 shows an example where there's just a one-commit diversion from the main line of development.
Why would someone create a single commit on a branch and then merge it with a merge commit, rather than just applying the change directly to master? There are some philosophical reasons about separation, but for me there are two main advantages.
Firstly, creating a branch gives the opportunity to open a pull request and get some review/feedback/QA on the change before it merges. Secondly, if you want to either unmerge the change or apply the same change elsewhere, both those things are easier with a branch than with a direct-on-master commit.
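As one possible illustration of the 'unmerge' point, the sketch below merges a one-commit hotfix branch and then backs the whole merge out with a single revert. Reverting with -m 1 is my assumption about one reasonable way to do this, not the only one, and the branch name, file names and identity are invented for the demo.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name regardless of local defaults
git config user.name "Example Author"      # placeholder identity for the demo
git config user.email "author@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# A one-commit hotfix on its own branch, merged with a merge commit.
git checkout -q -b hotfix-typo
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix typo"
git checkout -q master
git merge -q --no-ff -m "Merge branch 'hotfix-typo'" hotfix-typo

# Undo the whole merge with one new commit. -m 1 names the first parent
# (master before the merge) as the mainline to return to.
git revert --no-edit -m 1 HEAD

git log --graph --oneline
```

Because the change lives on a named branch behind a merge commit, there is one obvious commit to revert, however many commits the branch contained.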
Another approach is what I like to call the 'Branch per Platform' model. In addition to a master branch, there are branches that track the live platform, plus any other platforms you use in your workflow (such as test, staging and integration). In this model, master remains at the bleeding edge of your project, and as features complete, you merge to the other platform branches accordingly (see Figure 4).
The advantage of this is that you always have a branch that tracks a particular platform, which makes it very easy to hot-fix a problem if, for example, it is spotted on live. By using the live branch as the basis of your fix, you can easily deploy the existing codebase, plus one single change. Once the live platform is sorted out, you can then apply the fix to the master and any other environment branches. This means you avoid a situation where you're either trying to guess what is on the live platform, or doing a weird direct-on-live fix!
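Here is a rough sketch of that hotfix path, assuming a branch called 'live' that tracks the deployed code. The branch names, file names and identity are placeholders, and a real setup would also push these branches to a shared remote.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name regardless of local defaults
git config user.name "Example Author"      # placeholder identity for the demo
git config user.email "author@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Release 1"
git branch live                            # 'live' marks what is deployed

# master moves ahead with work that is not released yet.
echo "wip" > feature.txt
git add feature.txt
git commit -q -m "Start new feature"

# A bug is spotted on live, so the fix starts from the live branch.
git checkout -q -b hotfix-login live
echo "patched" > login.txt
git add login.txt
git commit -q -m "Fix login bug"

# live receives the existing codebase plus the one fix.
git checkout -q live
git merge -q --no-ff -m "Merge branch 'hotfix-login' into live" hotfix-login

# Once live is sorted out, bring the same fix into master.
git checkout -q master
git merge -q --no-ff -m "Merge branch 'hotfix-login'" hotfix-login

git log --graph --oneline --all
```

At the end, live contains the fix but none of the unfinished feature work, while master contains both.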
Marking with tags
A very similar alternative is to use tags to mark your releases when you make them. This way, you can look back at the history of the versions that have been live. In fact, GitHub even has a dedicated page for this. This technique is used especially to mark library releases.
Other tools, such as Composer for PHP, will use the tags to pick up and make available the newly released version, so it's important that the tags are used correctly. Tags can also be useful in your own projects, perhaps configuring your build or deployment tools to only respond to specifically named tags rather than every commit on a branch, or indeed a whole repository.
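Marking releases with tags might look something like the sketch below. The 'v1.0.0' naming scheme is an assumption on my part (it is a common convention, but check what your packaging or deployment tools expect), and the repository, messages and identity are throwaway examples.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name regardless of local defaults
git config user.name "Example Author"      # placeholder identity for the demo
git config user.email "author@example.com"

git commit -q --allow-empty -m "Prepare first release"
# An annotated tag records who tagged the release, when, and a message.
git tag -a v1.0.0 -m "Release 1.0.0"

git commit -q --allow-empty -m "More work"
git tag -a v1.1.0 -m "Release 1.1.0"

# Look back over the versions that have been released.
git tag --list 'v*'

# Find the most recent tag reachable from the current commit.
latest=$(git describe --tags --abbrev=0)
echo "latest release: $latest"
```

Tags are not sent by a plain push, so in a shared repository they need pushing explicitly, for example with 'git push origin v1.1.0'.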
Merge commit or not?
Using Git gives us the ability to merge changes in any order, regardless of how those features were actually built and combined. It also enables us to rewrite history, leading to endless debates over the benefits of a clean and pretty commit graph, versus one that reflects what actually happened.
A bit like the argument between tabs and spaces, this argument can and will run for some time. That said, both sides of the discussion have merit and it's an important consideration when you standardise the way that your team will work.
For any of this to make sense, first we need to talk about merge commits in Git. Every Git commit has a commit identifier – one of those long hex strings – that is unique for all practical purposes. It is a hash computed from the committed content, the commit message, the author, the timestamp and the parent commit that the change was applied to.
This means if the same commit is applied to a different parent, even if the resulting code ends up identical, the new commit will have a different commit identifier. Merge commits have not one, but two parents.
In the log, they look like this:
commit 41ea02cc8a1d2964c4f7b46b5f6b11cc04327959
Merge: a8d1421 e71dbf5
Author: Lorna Mitchell <email@example.com>
Date:   Mon Dec 22 19:48:34 2014 +0000

    Merge branch 'akrabat-update-homepage-with-open-cfps'
Notice the extra line in the log entry that starts with Merge:. The two values on it are the abbreviated commit identifiers of the two parents of this commit – one from each of the branches being merged together.
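Both points, that the identifier depends on the parent and that a merge commit has two parents, can be checked in a scratch repository. The sketch below copies a commit onto a different parent with cherry-pick and then makes a merge commit. The branch names, file names and identity are invented for the example, and the identifiers printed will differ on every run.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name regardless of local defaults
git config user.name "Example Author"      # placeholder identity for the demo
git config user.email "author@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

git checkout -q -b feature
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Add fix"
original=$(git rev-parse HEAD)

git checkout -q master
echo "other" > other.txt
git add other.txt
git commit -q -m "Other work"

# 1. The same change applied on top of a different parent gets a new identifier.
git checkout -q -b copy-of-fix
git cherry-pick "$original" >/dev/null
copy=$(git rev-parse HEAD)
echo "original: $original"
echo "copy:     $copy"

# 2. A merge commit records two parents, one from each branch.
git checkout -q master
git merge -q --no-ff -m "Merge branch 'feature'" feature
parents=$(git log -1 --format=%P)
echo "merge parents: $parents"
```

The second parent listed for the merge is the tip of the feature branch, which is the same relationship the Merge: line in the log output shows in abbreviated form.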
Merges don't always look like this, however. If there are changes on a branch but not, for example, on the master branch, then a 'fast-forward merge' will take place. We start with this situation: changes on a feature branch but not on master (as shown in Figure 5). If we just ask Git to merge this branch, it will 'fast-forward'.
When the merge happens, there's no merging actually needed. The commits on the feature branch continue on from the newest commit on the master branch, making a linear history. Git therefore fast-forwards, moving the master label to the tip of the feature branch so we can continue from there (Figure 6). If you are aiming for a more traditional merge pattern, you can force this using the --no-ff switch. This creates a history that looks like Figure 7.
The --no-ff switch tells Git not to fast-forward but instead to create the merge commit. Many branching strategies require that this technique be used when branches merge in. Most of the time, we'll need a merge commit anyway, because there will have been changes on both the feature branch and the branch we're merging into. In some situations – for me that's usually when creating a hotfix branch – it can be useful to force the merge commit to make the history clear on exactly which branch has been merged to where.
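The difference between the two kinds of merge is easy to try out side by side. This sketch merges one branch with a plain merge, which fast-forwards, and a second with --no-ff. It assumes Git's default merge settings, and the branch names, file names and identity are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name regardless of local defaults
git config user.name "Example Author"      # placeholder identity for the demo
git config user.email "author@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

# Changes on feature-a only: a plain merge just moves the master label.
git checkout -q -b feature-a
echo "a" > a.txt
git add a.txt
git commit -q -m "Add feature A"
git checkout -q master
git merge -q feature-a
ff_tip=$(git rev-parse master)
feature_a_tip=$(git rev-parse feature-a)

# Same situation for feature-b, but --no-ff forces a merge commit.
git checkout -q -b feature-b
echo "b" > b.txt
git add b.txt
git commit -q -m "Add feature B"
git checkout -q master
git merge -q --no-ff -m "Merge branch 'feature-b'" feature-b

# feature-a leaves no trace in the graph; feature-b shows as a side branch.
git log --graph --oneline
```

After the first merge, master and feature-a point at exactly the same commit, while the second merge leaves master on a new commit with two parents.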
Perhaps the most well-known branching strategy is Git Flow, which is a very comprehensive strategy. So comprehensive, in fact, it needs a whole set of scripts in order to use it properly! In my experience, Git Flow is too much for all but very large and technically advanced teams that are solving problems across multiple releases at one time.
However, every branching strategy I've ever implemented has drawn ideas from this approach, and I've tried to break these down in this article.
A branching strategy itself is a process; a document. It's the place where, as a team, you capture your approach and attitude to the way that you produce, track and ship code. Everyone understands how each of the moving parts in the process works (at least on the code level – social and political issues are a topic for another article), and how to collaborate to achieve the goal that you're all aiming for. A branching strategy can be as simple as a page on the company wiki – something that gives structure to the way that you work. It may also be a great place to put the Git cheatsheet for how to do the various steps.
I find that having a branching strategy in place greatly improves both the quality of the work and the confidence and communication of the team. Take a little time to agree and record your strategy, and then go forth and be even more awesome than you were before.
Words: Lorna Mitchell
Lorna Mitchell is an author, trainer and developer specialising in PHP, APIs and Git. You can get more help from her by reading her Git Workbook, which contains practical exercises to level up your skills. This article was originally published in issue 267 of net magazine.