Recompress git repository

Git repositories are getting bigger and bigger. To reduce the disk space used by a repository, git provides some housekeeping functionality.

Unlike many other version control systems, git stores the complete history in every cloned repository unless told otherwise. This can make local clones very large. Thankfully, git offers a way to reduce the size of a repository.

The following command executed in the directory of the repository will reduce its size:

$ git gc --aggressive --prune=now

The git command 'gc' (garbage collection) cleans up unnecessary files and optimizes the local repository. It runs a number of housekeeping tasks within the repository, such as compressing file revisions to reduce disk space and removing objects that are no longer reachable.

The option '--prune=now' removes loose objects older than the specified date. With "now", every unreachable loose object is removed, instead of only those older than the default of two weeks. With '--aggressive', git optimizes the repository more thoroughly at the expense of taking much more time.
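As a rough sketch, the command can be wrapped in a small shell script that measures the .git directory before and after the cleanup. The function names git_dir_kb and recompress_repo are invented for this example, and the script assumes a normal (non-bare) repository where .git is a directory.

```shell
#!/bin/sh
# Sketch only: git_dir_kb and recompress_repo are hypothetical helper names.

# Print the size of the .git directory of the repository in $1, in KiB.
git_dir_kb() {
    du -sk "$1/.git" | cut -f1
}

# Run the gc command from this article in the repository in $1
# and print the size of .git before and after.
recompress_repo() {
    before=$(git_dir_kb "$1")
    git -C "$1" gc --aggressive --prune=now || return 1
    after=$(git_dir_kb "$1")
    echo "before: ${before} KiB, after: ${after} KiB"
}
```

After sourcing the script, calling recompress_repo with the path of a repository (for example "recompress_repo .") would run the cleanup there and print both sizes.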

$ du -sh .git
1.2G    .git
$ git gc --prune=now --aggressive
Counting objects: 15488, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (14257/14257), done.
Writing objects: 100% (15488/15488), done.
Total 15488 (delta 11406), reused 0 (delta 0)
Removing duplicate objects: 100% (256/256), done.
$ du -sh .git
20M     .git

The example above is from an etckeeper repository, as explained in "Keep track of Linux configuration changes with etckeeper". Over time, the repository collected more than 200 commits and some garbage along with them.

The example shows that the size of the .git directory, which contains all the repository data, is reduced from 1.2GB to 20MB. The amount of space that can be reclaimed depends, of course, on the content of the repository. In the case of the etckeeper repository, most of the files are text-based and compress very well.
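To put the saving into a single number, the two sizes can be compared with a little arithmetic. The sketch below assumes that du -h reports in powers of 1024 (as GNU du does), so 1.2G corresponds to about 1228.8M. The helper name reduction_percent is made up for this example.

```shell
#!/bin/sh
# reduction_percent is a hypothetical helper: it prints how much smaller
# "after" is than "before", as a percentage with one decimal place.
reduction_percent() {
    LC_ALL=C awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (1 - after / before) * 100 }'
}

# 1.2G expressed in M (1.2 * 1024 = 1228.8), compared with 20M
reduction_percent 1228.8 20
```

For the sizes in the example, this prints 98.4, meaning the .git directory shrank by roughly 98 percent.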

Cleaning up this repository with the '--aggressive' option took about 20 minutes. The time it takes depends not only on git itself but also on the performance of the computer and its components.
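To find out how long the cleanup takes on a particular machine, the run can be timed. The following is a minimal sketch with a hypothetical helper called timed; it assumes that the date command supports the +%s format (seconds since the epoch), which is the case for GNU and BSD date.

```shell
#!/bin/sh
# timed is a hypothetical helper: it runs the given command, prints the
# elapsed wall-clock time in seconds to stderr, and passes on the
# command's exit status.
timed() {
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start)) s" >&2
    return "$status"
}
```

Inside a repository, "timed git gc --aggressive --prune=now" would then report the duration of the cleanup. The shell's own time keyword does much the same without a helper function.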


Read more of my posts on my blog at http://blog.tinned-software.net/.
