Convert Subversion repository into git repositories

A long, long time ago, before git was the dominant version control system (VCS), I used Subversion for quite some time. With git now being the de facto standard VCS, it was finally time to convert my last remaining Subversion repositories, which had been collecting dust for the last decade.

Before converting Subversion (svn) repositories, it helps to understand a few key differences between svn and git, and how svn repositories were typically used and organised before git came along.

Differences between svn and git

RCS (Revision Control System) and CVS (Concurrent Versions System), as well as their successor SVN (Subversion) and now git, all try to solve the same basic problem: giving a developer the ability to store the state of a file (or files) at a particular point in time and then continue modifying them. This makes it possible to revert a file to a previous state later.

Subversion did not impose much structure on a developer. Unlike git, most svn repositories have no built-in notion of tags and branches. Subversion, simply put, provides a versioned directory structure. Branches and tags are concepts introduced by the users of the system through naming conventions. There were many conventions for structuring an svn repository. What became the de facto standard is a set of directories named “branches”, “tags” and “trunk”. Creating a tag or a branch meant nothing more than copying a certain state of the repository into the tags or branches directory.

Another very important difference is the common practice of keeping multiple projects inside one svn repository. The ability to check out only a subsection of the repository structure made this practical: one could check out and update a small directory of a huge repository.

Converting such a repository containing multiple projects into one huge git repository is not how git is intended to be used. Thankfully, there is a very easy way to separate the projects. Before attempting this, I suggest taking a simple checkout of the svn repository or using an svn browser to inspect the repository structure.

Installation of required tools

Git provides a subcommand “git svn” which can convert an svn repository to git while retaining the entire history. The “git svn” subcommand is installed as a separate package and requires Perl.

Installing git-svn on CentOS / Rocky Linux

Using yum or dnf, depending on the version of CentOS, install the following packages as root.

$ dnf install git git-svn subversion subversion-perl perl

Installing git-svn on Debian

On a Debian-based system, apt can be used to install the required packages. As usual, the package names differ between Red Hat and Debian based systems.

$ apt-get install git git-svn subversion perl svnkit

Installing git-svn on macOS

For the required “SVN::Core” Perl module, the Apache Portable Runtime (APR) is needed (packages “apr” and “apr-util”).

$ brew install git git-svn subversion perl apr apr-util

For cpan to find the APR, the paths to APR and APR-util need to be added to PATH.

$ export PATH="/usr/local/opt/apr/bin:/usr/local/opt/apr-util/bin:$PATH"
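The path above assumes Homebrew lives under /usr/local. On Apple Silicon Macs, Homebrew is normally installed under /opt/homebrew instead, so a variant that asks Homebrew for its prefix may be more portable. The following is a minimal sketch; the variable name BREW_PREFIX and the fallback to /usr/local are my own assumptions, not part of the original setup.

```shell
# Ask Homebrew where it is installed; fall back to /usr/local
# (an assumed default) if brew is not available on PATH.
BREW_PREFIX="$(brew --prefix 2>/dev/null || echo /usr/local)"

# Prepend the APR and APR-util tool directories to PATH.
export PATH="${BREW_PREFIX}/opt/apr/bin:${BREW_PREFIX}/opt/apr-util/bin:${PATH}"
```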

After that, the “SVN::Core” dependency needs to be installed via cpan.

$ cpan SVN::Core

Generate a user mapping file

In Subversion, each commit is recorded with the user name of the committer. With git, the commits get the name and email address of the committer. With that in mind, the first step is to generate a mapping between the user names in SVN and the names and email addresses used in git.

To extract the user list from the svn repository, check out the repository and execute the following inside the working copy.

$ svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > authors_mapping

The first command is “svn log -q”. The “svn log” command shows the commit history, while the “-q” option suppresses the commit messages. The resulting list contains the user names, as in this example output.

------------------------------------------------------------------------
r782 | gerhard | 2014-04-27 16:33:17 +0200 (So, 27 Apr 2014)
------------------------------------------------------------------------
r781 | gerhard | 2014-04-27 16:32:25 +0200 (So, 27 Apr 2014)
------------------------------------------------------------------------
r780 | gerhard | 2014-04-27 16:30:57 +0200 (So, 27 Apr 2014)
------------------------------------------------------------------------
r779 | gerhard | 2014-04-27 16:30:35 +0200 (So, 27 Apr 2014)
------------------------------------------------------------------------

The “awk” command processes only lines starting with an “r” character, takes the second “|”-separated field (the user name) and strips the surrounding spaces. Its output is already in the format “git svn” expects: the svn user name, followed by the name and the email address in angle brackets. The “sort -u” at the end sorts the list and, most importantly, removes the duplicates. The right-hand side is only a placeholder and can be edited to the real name and email address. The resulting mapping file will look something like this.

gerhard = gerhard <gerhard>
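Since “git svn” expects each line of the mapping file in the form “loginname = Full Name <email>”, it can be worth checking the file after editing it by hand. The following is a minimal sketch of such a check; the function name check_authors_file is hypothetical and the pattern is my own approximation of the expected format, not the exact rule git uses.

```shell
# Print every line of an authors file that does not look like
#   login = Full Name <email>
# Returns 0 if all lines are well-formed, 1 if any line is not,
# and 2 if the file cannot be read.
check_authors_file() {
    [ -r "$1" ] || return 2
    # grep -v prints the non-matching lines and succeeds if it found any.
    if grep -Ev '^[^=]+ = .+ <[^<>]*>$' "$1"; then
        return 1
    fi
    return 0
}

# Usage:
# check_authors_file authors_mapping && echo "mapping file looks fine"
```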

Cloning subversion repository using git

To migrate the Subversion repository to git, the previously installed “git svn” subcommand is used. This command will retrieve the Subversion repository including the entire svn history. Depending on the structure of the svn repository, the option “--stdlayout” can be used (it assumes the common “trunk/branches/tags” layout).

$ git svn clone svn+ssh://svn@repo.tinned-software.net/REPO_NAME -A authors_mapping --stdlayout REPO_NAME

If the repository does not follow the common layout, “git svn” allows specifying the trunk, branches and tags paths explicitly.

$ git svn clone svn+ssh://svn@repo.tinned-software.net/REPO_NAME -A authors_mapping --trunk=project/trunk --tags=project/tag --branches=project/branch REPO_NAME
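The same explicit layout options can be used to split a multi-project repository into one git repository per project. The following is a minimal sketch that only prints the clone commands so they can be reviewed before running them. It assumes a hypothetical layout in which every project directory contains its own “trunk”, “tags” and “branches” directories; the function name print_clone_commands and the project names in the usage line are placeholders.

```shell
# Print (do not run) one "git svn clone" command per project.
# Assumed layout: <repo>/<project>/trunk, <project>/tags, <project>/branches.
print_clone_commands() {
    repo_url="$1"
    shift
    for project in "$@"; do
        printf 'git svn clone %s -A authors_mapping --trunk=%s/trunk --tags=%s/tags --branches=%s/branches %s\n' \
            "$repo_url" "$project" "$project" "$project" "$project"
    done
}

# Usage (project_a and project_b are placeholders):
# print_clone_commands svn+ssh://svn@repo.tinned-software.net/REPO_NAME project_a project_b
```

Each printed command clones a single project into a directory named after it, which keeps the resulting git repositories small and focused.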

Convert tags

The cloned git repository mirrors the structure that exists in the Subversion repository. This structure might contain tags, which are created in the git repository as remote references but not as actual git tags. The following command extracts all the tag names and creates actual git tags from them.

$ git for-each-ref refs/remotes/origin/tags --format="%(refname:short)" | sed 's#origin/tags/##' | while read -r TAG; do git tag "${TAG}" "origin/tags/${TAG}"; done

The first command retrieves the svn tag names. The second command strips the path prefix from each name. Finally, the loop creates the actual tags.
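Before creating the tags, it can help to preview what the loop would do. The following is a minimal dry-run sketch that reads the short reference names on standard input and prints the corresponding “git tag” commands instead of executing them; the function name tag_commands is hypothetical.

```shell
# Read short ref names such as "origin/tags/v1.0" from stdin and print
# the "git tag" command that would be run for each of them.
tag_commands() {
    while read -r ref; do
        tag="${ref#origin/tags/}"   # strip the "origin/tags/" prefix
        printf 'git tag "%s" "%s"\n' "$tag" "$ref"
    done
}

# Usage:
# git for-each-ref refs/remotes/origin/tags --format="%(refname:short)" | tag_commands
```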

Convert branches

As with the tags, the branches are cloned but not created as actual git branches. Similar to the tags, the following command creates the git branches.

$ git for-each-ref refs/remotes/origin/ --format="%(refname:short)" | grep -Ev "origin/(tags|trunk)" | sed 's#origin/##' | while read -r BRANCH; do git branch "${BRANCH}" "origin/${BRANCH}"; done

The first command retrieves a list of all tags and branches. The “grep -Ev” filters out the trunk and the tags, leaving the branches. The “sed” command strips the path prefix again, and the loop creates the branches.
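The filter can be tried on a few sample names to see which references survive. The following is a minimal sketch with a slightly stricter, anchored pattern than the one used above, so that a branch whose name merely starts with “trunk” or “tags” is not dropped by accident. The function name branch_names and the sample names are made up for illustration.

```shell
# Read short ref names from stdin, drop the trunk and everything below
# tags/, and strip the "origin/" prefix from what is left.
branch_names() {
    grep -Ev '^origin/(tags/|trunk$)' | sed 's#^origin/##'
}

# Sample input: only the last two names are real branches.
printf '%s\n' origin/trunk origin/tags/v1.0 origin/feature-x origin/trunk-cleanup | branch_names
```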

Convert ignore patterns

In contrast to git, Subversion stores ignore patterns as properties (“svn:ignore”) inside the repository. With the “show-ignore” subcommand, the ignore patterns can be retrieved.

$ git svn show-ignore > .gitignore
$ git add .gitignore
$ git commit -m "Convert svn ignore properties to .gitignore"

With the ignore patterns written to “.gitignore”, the file is committed to the git repository.

Push to git remote

With the git repository prepared from the Subversion repository, it is time to push it to a remote. This is done by adding a remote to the repository and pushing to it.

$ git remote add origin ssh://git@git.remote.domain.tld:22/${REPO_GIT}.git
$ git push -u origin --all
$ git push origin --tags

Read more of my posts on my blog at https://blog.tinned-software.net/.
