Git: submodules vs. subtrees

This is a common topic when managing Git repositories: sometimes you need one (or more) repositories inside another repository.

The laziest way to do this is a monorepo. A Git monorepo is a single Git repository in which a team develops multiple projects, related or not, so that sharing code between them is easy. For example, if you have a library you want to use in several projects, you just put it in its own folder and reference it from your projects:

- /your-git-project
	- /one-project
	- /project-related-to-project-two
	- /project-unrelated-to-the-two-other-projects
	- /library-used-in-all-other-projects

This is very easy to do. But to keep things better organized, and to prevent your repo from becoming too big, it’s sometimes a better idea to keep things in separate repos. Git offers two features for this: Git submodules and Git subtrees. Let’s see what they are and how to use them.

Git submodules

We can think of a submodule as a kind of link that points to another repository. For example, imagine we are starting a repo for a web application that will contain both server-side and client-side code:

$ mkdir webapp
$ cd webapp
$ git init
$ echo "This will be a web app" > README.md
$ git add .
$ git commit -m "Initial commit"

Now imagine we want to add Pikaday as a submodule, since our web app needs it. Doing that is as simple as:

$ git submodule add https://github.com/owenmead/Pikaday pikaday
$ git add .
$ git commit -m "Add Pikaday as a submodule"

The repo structure will look like this:

- /webapp
	- /.git
	- /README.md
	- /.gitmodules
	- /pikaday
		- /.git
		- ... # pikaday source files

What happened here is:

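  • The Pikaday repository was cloned into the pikaday folder (in current Git versions, pikaday/.git is a small file pointing to .git/modules/pikaday, where the submodule’s repository data is actually stored)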
  • A .gitmodules file was added with metadata about the submodules in the repo
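To check both effects without network access, here is a minimal throwaway sketch. Assumptions: a local repository named lib is a hypothetical stand-in for Pikaday (it is not the real project), the Demo identity is a placeholder, and `-c protocol.file.allow=always` is needed because recent Git versions refuse to clone submodules from local paths by default.

```shell
set -eu
# Placeholder identity so commits work in a fresh environment
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical local stand-in for the Pikaday repository
git init -q -b master lib
echo "pikaday stand-in" > lib/README.md
git -C lib add README.md
git -C lib commit -q -m "lib: initial commit"

# The outer repository, as in the article
git init -q -b master webapp
cd webapp
echo "This will be a web app" > README.md
git add README.md
git commit -q -m "Initial commit"

# Allow the local-path clone that recent Git blocks for submodules by default
git -c protocol.file.allow=always submodule add "$work/lib" pikaday
git commit -q -m "Add Pikaday as a submodule"

cat .gitmodules      # the [submodule "pikaday"] entry with path and url
ls -a pikaday        # README.md plus .git, which is a file here, not a folder
cat pikaday/.git     # a "gitdir:" line pointing into the outer repo's .git/modules/
```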

Let’s take a look at the .gitmodules file:

$ cat .gitmodules
[submodule "pikaday"]
	path = pikaday
	url = https://github.com/owenmead/Pikaday

It lists each submodule with its path and the URL it is cloned from. Now let’s take a look at the commit we just made:

$ git show
commit 0387cc12229bd31fc3a4f299225ce3c1e1b6aec3
Author: You <your@email.com>
Date:   Sun Apr 3 15:06:32 2016 -0300

    Add Pikaday as a submodule

diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 0000000..affc969
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "pikaday"]
+       path = pikaday
+       url = https://github.com/owenmead/Pikaday
diff --git a/pikaday b/pikaday
new file mode 160000
index 0000000..d57fa05
--- /dev/null
+++ b/pikaday
@@ -0,0 +1 @@
+Subproject commit d57fa05193f46a1394635f11bbbcd9c55da2a54c

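Here’s the interesting part. Git does not commit the submodule’s files. Instead, the outer repository’s tree stores a special entry for pikaday (a "gitlink", with file mode 160000) that records the exact commit the submodule points to. git show renders that entry as if it were a one-line file:

# how git show displays the pikaday entry
Subproject commit d57fa05193f46a1394635f11bbbcd9c55da2a54c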
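To see exactly how the outer repository records the link, you can inspect its tree entry directly. The sketch below rebuilds a minimal setup locally; lib is a hypothetical stand-in for Pikaday, and the identity values are placeholders.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for Pikaday, and an outer repo that links to it
git init -q -b master lib
echo "pikaday stand-in" > lib/README.md
git -C lib add README.md
git -C lib commit -q -m "lib: initial commit"
git init -q -b master webapp
git -C webapp -c protocol.file.allow=always submodule add "$work/lib" pikaday
git -C webapp commit -q -m "Add Pikaday as a submodule"

# Mode 160000 and type "commit": a gitlink entry, not a blob holding text
git -C webapp ls-tree HEAD pikaday
# The gitlink's object name is the commit checked out inside the submodule
git -C webapp rev-parse HEAD:pikaday
git -C webapp/pikaday rev-parse HEAD
```

The last two commands print the same hash, which is why the outer repository only changes when you commit a new gitlink value.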

This means that the Pikaday source was not committed to the repository when we made the commit. Remember, a Git submodule is just a link to a specific commit in another repository. When someone else clones your repository, they won’t see the Pikaday source there. To get it, they have to run:

$ git submodule init
$ git submodule update

An alternative is cloning with the --recursive option (current Git documents it as --recurse-submodules, and --recursive still works as an alias):

$ git clone --recursive <repo-path>
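Both routes can be tried locally. In this sketch, lib is a hypothetical stand-in for Pikaday and webapp plays your repository; plain is a clone that fetches its submodule afterwards, while full gets it during the clone. The protocol.file.allow override is only needed because the submodule URL is a local path.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for Pikaday, and an outer repo that links to it
git init -q -b master lib
echo "pikaday stand-in" > lib/README.md
git -C lib add README.md
git -C lib commit -q -m "lib: initial commit"
git init -q -b master webapp
git -C webapp -c protocol.file.allow=always submodule add "$work/lib" pikaday
git -C webapp commit -q -m "Add Pikaday as a submodule"

# A plain clone leaves the pikaday folder empty
git clone -q webapp plain
ls -A plain/pikaday
# init + update in one command; --recursive also handles nested submodules
git -C plain -c protocol.file.allow=always submodule update --init --recursive

# Or fetch everything during the clone
git -c protocol.file.allow=always clone -q --recurse-submodules webapp full
```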

Git submodules gotchas

There are several gotchas to be aware of when dealing with submodules. One is that Git usually leaves your submodules in a detached HEAD state, because git submodule update checks out the recorded commit rather than a branch. So before making a change in the Pikaday repo, we need to make sure we are on a branch (often master):

$ cd pikaday
$ git checkout master
# now we are ready to work
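A quick way to see the detached HEAD for yourself is to clone recursively, as a teammate would, and ask which branch the submodule is on. As before, lib is a hypothetical local stand-in for Pikaday and the identity is a placeholder.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for Pikaday, and an outer repo that links to it
git init -q -b master lib
echo "pikaday stand-in" > lib/README.md
git -C lib add README.md
git -C lib commit -q -m "lib: initial commit"
git init -q -b master webapp
git -C webapp -c protocol.file.allow=always submodule add "$work/lib" pikaday
git -C webapp commit -q -m "Add Pikaday as a submodule"

# A teammate's recursive clone checks the submodule out at the recorded commit
git -c protocol.file.allow=always clone -q --recurse-submodules webapp teammate
cd teammate/pikaday

# symbolic-ref fails on a detached HEAD, so this prints "detached"
git symbolic-ref -q --short HEAD || echo "detached"
git checkout -q master
git symbolic-ref --short HEAD    # now prints "master"
```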

Another gotcha: after committing changes in Pikaday, we still have to manually update the ref in the outer repo:

/pikaday $ echo "Foo" > README.md
/pikaday $ git add .
/pikaday $ git commit -m "Update Pikaday README.md"
/pikaday $ git push <your-pikaday-fork-path> master
/pikaday $ cd ..
/ $ git add .
/ $ git commit -m "Update Pikaday submodule ref"
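The sketch below walks through the same sequence locally, minus the push, and shows how the outer repo reports a submodule that has moved but whose new commit has not been recorded yet. lib is a hypothetical stand-in for Pikaday.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for Pikaday, and an outer repo that links to it
git init -q -b master lib
echo "pikaday stand-in" > lib/README.md
git -C lib add README.md
git -C lib commit -q -m "lib: initial commit"
git init -q -b master webapp
git -C webapp -c protocol.file.allow=always submodule add "$work/lib" pikaday
git -C webapp commit -q -m "Add Pikaday as a submodule"

# Commit inside the submodule (right after submodule add, it is on master)
echo "Foo" > webapp/pikaday/README.md
git -C webapp/pikaday commit -q -am "Update Pikaday README.md"

# The outer repo notices that pikaday now points to a new commit...
git -C webapp status --short
git -C webapp diff --submodule=log
# ...but only records it once you stage and commit the new gitlink
git -C webapp add pikaday
git -C webapp commit -q -m "Update Pikaday submodule ref"
```

For the opposite mistake (recording a submodule commit that was never pushed), git push --recurse-submodules=check aborts the push of the outer repo if a recorded submodule commit is not available on any of the submodule’s remotes.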

These additional steps make everything more tedious and error-prone:

  • Someone may forget to update the ref after making changes in a submodule
  • Someone may forget to run git submodule update after pulling and end up with a different build
  • Someone less familiar with Git may struggle with detached HEADs
  • What if you don’t want to maintain your own fork of a library just to make changes to it?
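Git can take some of this off your hands. Setting submodule.recurse to true makes commands that accept --recurse-submodules (such as pull, checkout, switch and reset) update submodules automatically, which covers the forgotten git submodule update. The sketch demonstrates it with checkout because that needs no remote; lib is a hypothetical stand-in for Pikaday.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for Pikaday, and an outer repo that links to it
git init -q -b master lib
echo "pikaday stand-in" > lib/README.md
git -C lib add README.md
git -C lib commit -q -m "lib: initial commit"
git init -q -b master webapp
git -C webapp -c protocol.file.allow=always submodule add "$work/lib" pikaday
git -C webapp commit -q -m "Add Pikaday as a submodule"

# Move the submodule forward and record it: a second gitlink value
git -C webapp/pikaday commit -q --allow-empty -m "lib: second commit"
git -C webapp add pikaday
git -C webapp commit -q -m "Bump Pikaday"

# Opt in, then go back one commit: pikaday follows along automatically
git -C webapp config submodule.recurse true
git -C webapp checkout -q HEAD~1
git -C webapp/pikaday log -1 --format=%s    # prints "lib: initial commit"
```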

These and other problems make many people prefer subtrees over submodules, which we will see next.

Git subtrees

Subtrees are much simpler than submodules. Unlike submodules, a subtree’s source files are stored in the repo: it’s not just a link, the code is really there. There are also fewer steps required and fewer changes to your workflow.

Subtrees started as a set of scripts that were later shipped with Git itself (the git subtree command lives in Git’s contrib directory). They rely on conventions, such as metadata written into commit messages, so they work without changing how Git works internally. Let’s reproduce the example above, but using subtrees instead:

$ mkdir webapp
$ cd webapp
$ git init
$ echo "My webapp" > README.md
$ git add .
$ git commit -m "Initial commit"

# Here's the important part
# Do not forget the ending slash (/) in the prefix
# Also do not forget the "--squash" flag, otherwise you will
# end up with a very polluted Git history
$ git remote add pikaday https://github.com/owenmead/Pikaday
$ git subtree add --squash --prefix=pikaday/ pikaday master

Let’s take a look at the log:

$ git log
commit a0a9a576b8ce5a73422f6f3f1489faabe7b26dd0
Merge: 230ef84 0277a19
Author: You <your@email.com>
Date:   Sun Apr 3 16:07:11 2016 -0300

    Merge commit '0277a193131f68b873ab83b2618dea89217db757' as 'pikaday'

commit 0277a193131f68b873ab83b2618dea89217db757
Author: You <your@email.com>
Date:   Sun Apr 3 16:07:11 2016 -0300

    Squashed 'pikaday/' content from commit d57fa05

    git-subtree-dir: pikaday
    git-subtree-split: d57fa05193f46a1394635f11bbbcd9c55da2a54c

commit 230ef8475baeeb9ce9e9940c84d54c214135e5ce
Author: You <your@email.com>
Date:   Sun Apr 3 16:06:47 2016 -0300

    Initial commit

What happened is: Git squashed the entire Pikaday history into a single commit and merged it into our repo’s history. There is no second .git folder, just one. Unlike with submodules, someone who clones your repo won’t have to do anything else to get all the code.
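You can confirm both points with a local run. In this sketch, lib is a hypothetical stand-in for Pikaday registered as the pikaday remote, the identity values are placeholders, and it assumes your Git installation includes the git subtree command.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical local stand-in for the Pikaday repository
git init -q -b master lib
echo "pikaday stand-in" > lib/README.md
git -C lib add README.md
git -C lib commit -q -m "lib: initial commit"

# The outer repository, then the subtree add from the article
git init -q -b master webapp
cd webapp
echo "My webapp" > README.md
git add README.md
git commit -q -m "Initial commit"
git remote add pikaday "$work/lib"
git subtree add --squash --prefix=pikaday/ pikaday master

# Only one repository: prints ./.git and nothing else
find . -name .git
# The squash commit's message carries the git-subtree-* metadata
git log -1 --grep=git-subtree-dir --format=%B
```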

If, in the future, you have to pull Pikaday changes from its original repository, do:

$ git subtree pull --squash --prefix=pikaday/ pikaday master
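Here is a local sketch of that update cycle: an upstream commit lands in the stand-in repository, and the pull brings it into the subtree. lib is a hypothetical stand-in for Pikaday, and -m only sets the merge commit message so no editor is needed.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for Pikaday, added to webapp as a squashed subtree
git init -q -b master lib
echo "pikaday stand-in" > lib/README.md
git -C lib add README.md
git -C lib commit -q -m "lib: initial commit"
git init -q -b master webapp
cd webapp
echo "My webapp" > README.md
git add README.md
git commit -q -m "Initial commit"
git remote add pikaday "$work/lib"
git subtree add --squash --prefix=pikaday/ pikaday master

# A new upstream commit in the stand-in repository
echo "pikaday stand-in v2" > "$work/lib/README.md"
git -C "$work/lib" commit -q -am "lib: second commit"

# Pull it into the subtree as one more squashed commit
git subtree pull --squash -m "Update Pikaday subtree" --prefix=pikaday/ pikaday master
cat pikaday/README.md    # pikaday stand-in v2
```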

If you have write access to the original repository, you can also push the changes you made inside the subtree back to it:

$ echo "Imagine this is a bug fix" > pikaday/README.md
$ git add .
$ git commit -m "Pikaday: fix #123"
$ git subtree push --prefix=pikaday/ pikaday master

When you run git subtree push, Git splits out the commits that changed files inside the folder given by the --prefix option, rewrites them so that this folder becomes the repository root, and pushes just that history to the given repo.
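Since git subtree push is a split followed by a regular git push, you can preview what would be pushed by running the split step on its own. In this local sketch, lib is a hypothetical stand-in for Pikaday and pikaday-split is a hypothetical branch name.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for Pikaday, added to webapp as a squashed subtree
git init -q -b master lib
echo "pikaday stand-in" > lib/README.md
git -C lib add README.md
git -C lib commit -q -m "lib: initial commit"
git init -q -b master webapp
cd webapp
echo "My webapp" > README.md
git add README.md
git commit -q -m "Initial commit"
git remote add pikaday "$work/lib"
git subtree add --squash --prefix=pikaday/ pikaday master

# A local fix inside the subtree, as in the article
echo "Imagine this is a bug fix" > pikaday/README.md
git add pikaday/README.md
git commit -q -m "Pikaday: fix #123"

# Extract the history of pikaday/ as if that folder were the repository root
git subtree split --prefix=pikaday/ -b pikaday-split
git ls-tree --name-only pikaday-split    # README.md, at the top level
git log --oneline pikaday-split          # "Pikaday: fix #123" is the newest commit
```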

Recap

| Submodules | Subtrees |
|---|---|
| Harder (especially for Git beginners) | Easier |
| Just a link to a commit in another repository | Code is merged into the outer repository’s history |
| Requires the submodule’s repository to be accessible on a server (like GitHub) | Decentralized: a single clone contains everything |
| Requires additional steps | Just clone, pull and push the way you already do |
| Smaller repository size | Bigger repository size |