A long, long time ago, before git became the dominant version control system (VCS), I used Subversion for quite some time. With git now the de-facto standard VCS, it was finally time to convert my last remaining Subversion repositories, which had been collecting dust for the last decade.
Before converting Subversion (svn) repositories, it helps to understand a few key differences between svn and git, as well as how svn repositories were typically used and organised before git came along.
Differences between svn and git
RCS (Revision Control System) and CVS (Concurrent Versions System), their successor SVN (Subversion) and now git all address the same basic problem. They let a developer record the state of one or more files at a particular point in time and then keep modifying them, so that a file can later be restored to any recorded state.
Subversion did not impose much structure on the developer. Unlike git, which has built-in concepts of branches and tags, svn simply provides a versioned directory structure. Branches and tags exist only as a naming convention chosen by the user. Many conventions were used to structure svn repositories, and the de-facto standard became a set of directories named “branches”, “tags” and “trunk”. Creating a tag or a branch meant nothing more than copying a certain state of the repository (usually of “trunk”) into the “tags” or “branches” directory, which is a cheap operation in svn.
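Because branches and tags are only directories, whether a path belongs to the trunk, a branch or a tag can be read from its name alone. The following POSIX shell sketch illustrates this convention. “svn_path_kind” is a hypothetical helper written for this illustration, and it assumes the standard “trunk”, “branches” and “tags” directory names; Subversion itself attaches no meaning to them.

```shell
# svn_path_kind (hypothetical helper): classify a repository path according
# to the trunk/branches/tags naming convention. Prints "trunk",
# "branch NAME", "tag NAME" or "unknown" (and returns 1 for "unknown").
svn_path_kind() {
  local rest head tail
  rest=${1#/}                      # drop a leading slash
  while [ -n "$rest" ]; do
    head=${rest%%/*}               # first path component
    if [ "$head" = "$rest" ]; then
      tail=""                      # last component, nothing follows
    else
      tail=${rest#*/}              # everything after the first component
    fi
    case $head in
      trunk)    echo "trunk"; return 0 ;;
      branches) echo "branch ${tail%%/*}"; return 0 ;;
      tags)     echo "tag ${tail%%/*}"; return 0 ;;
    esac
    rest=$tail
  done
  echo "unknown"
  return 1
}
```

The first convention directory found wins, so a path inside a tag copy always resolves to that tag, no matter what the copied directories are called.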
Another very important difference is the common practice of keeping multiple projects inside one svn repository. This was common because svn allows checking out only a subdirectory of a repository, so one could check out and update a small directory of a huge repository.
Converting such a repository with multiple projects into one huge git repository is not how git was intended to be used. Thankfully, the projects can easily be separated during the conversion. Before attempting this, I suggest taking a simple checkout of the svn repository, or using an svn browser or “svn list”, to inspect the repository structure.
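One way to find the projects is to list the repository recursively and keep every directory that contains a “trunk”. A minimal sketch, assuming each project sits directly below the repository root and uses a “trunk” directory. “list_svn_projects” is a hypothetical helper that reads such a listing from standard input, where directories end with “/” as printed by “svn list”.

```shell
# list_svn_projects (hypothetical helper): read a recursive repository
# listing (one path per line, directories ending in "/") and print every
# top-level project that directly contains a "trunk" directory.
list_svn_projects() {
  # Keep only "<project>/trunk/" entries directly below the root,
  # then strip the "/trunk/" suffix to obtain the project name.
  grep -E '^[^/]+/trunk/$' | sed 's#/trunk/$##' | sort -u
}

# Usage (not run here):
#   svn list -R svn+ssh://svn@repo.tinned-software.net/REPO_NAME | list_svn_projects
```

A recursive listing walks the whole repository, including every tag copy, so on a large repository it can take a while.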
Installation of required tools
Git provides the subcommand “git svn”, which can clone a Subversion repository into a git repository while retaining the entire history. “git svn” is written in Perl and relies on the Subversion Perl bindings, so on most systems it is installed as a separate package.
Installing git-svn on CentOS / Rocky Linux
Depending on the version of CentOS, the following packages are installed as root using yum or dnf.
$ dnf install git git-svn subversion subversion-perl perl
Installing git-svn on Debian
On a Debian based system, apt can be used to install the required packages. As usual, the package names differ between Red Hat and Debian based systems; the Subversion Perl bindings are provided by the package “libsvn-perl”.
$ apt-get install git git-svn subversion libsvn-perl perl
Installing git-svn on macOS
The required “SVN::Core” Perl module needs the Apache Portable Runtime (APR), provided by the Homebrew packages “apr” and “apr-util”.
$ brew install git git-svn subversion perl apr apr-util
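For cpan to find APR and APR-util, their “bin” directories need to be added to the PATH. Using “brew --prefix” resolves the correct location on both Intel (/usr/local) and Apple Silicon (/opt/homebrew) installations.
$ export PATH="$(brew --prefix apr)/bin:$(brew --prefix apr-util)/bin:$PATH"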
After that, the “SVN::Core” dependency needs to be installed via cpan.
$ cpan SVN::Core
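Before converting anything, it can help to verify that the tools are in place. A small sketch; “check_commands” is a hypothetical helper that reports every command it cannot find in the PATH and returns a non-zero status if any is missing.

```shell
# check_commands (hypothetical helper): print "missing: NAME" for every
# command not found in PATH; return 1 if at least one is missing.
check_commands() {
  local missing cmd
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (not run here):
#   check_commands git svn perl && git svn --version
#   perl -MSVN::Core -e 1 && echo "SVN::Core found"
```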
Generate a user mapping file
In Subversion, each commit records only the user name of the committer. git records a name and an email address for every commit. The first step is therefore to generate a mapping from the svn user names to the names and email addresses to be used in git.
To extract the list of users, run the following command inside a checkout of the svn repository (“svn log” also accepts the repository URL directly).
$ svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > authors_mapping
The first command is “svn log -q”. “svn log” shows the commit history, and the “-q” option suppresses the commit messages, so only the revision header lines remain. Each header contains the user name, as in this example output.
------------------------------------------------------------------------
r782 | gerhard | 2014-04-27 16:33:17 +0200 (So, 27 Apr 2014)
------------------------------------------------------------------------
r781 | gerhard | 2014-04-27 16:32:25 +0200 (So, 27 Apr 2014)
------------------------------------------------------------------------
r780 | gerhard | 2014-04-27 16:30:57 +0200 (So, 27 Apr 2014)
------------------------------------------------------------------------
r779 | gerhard | 2014-04-27 16:30:35 +0200 (So, 27 Apr 2014)
------------------------------------------------------------------------
The “awk” command splits each line at the “|” character and processes only lines starting with an “r” (the revision headers). It takes the second field, the user name, removes the surrounding spaces and prints a mapping line. The “sort -u” at the end sorts the list and, most importantly, removes the duplicates. The resulting mapping file will look something like this.
gerhard = gerhard <gerhard>
Before cloning, each line has to be edited into the form “svnuser = Full Name <email address>”, because “git svn” expects a name and an email address in angle brackets for every author.
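A mistake in the mapping file only shows up once “git svn” reaches the affected commit, so a quick check beforehand can save time. A sketch; “check_authors_file” is a hypothetical helper that prints (with line numbers) every line not shaped like “svnuser = Full Name <email>” and returns a non-zero status if it finds any. Empty lines are ignored.

```shell
# check_authors_file (hypothetical helper): report lines of an authors
# mapping file that lack the "svnuser = Full Name <email>" shape.
check_authors_file() {
  # -v selects lines NOT matching; -n prefixes them with their line number.
  # The "^$" alternative lets empty lines pass.
  if grep -nvE '^[^=]+=[^<]+<[^>]*>[[:space:]]*$|^$' "$1"; then
    return 1      # at least one offending line was printed
  fi
  return 0
}

# Usage (not run here):
#   check_authors_file authors_mapping && echo "authors_mapping looks fine"
```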
Cloning subversion repository using git
To migrate the Subversion repository to git, the “git svn” subcommand installed earlier is used. It retrieves the Subversion repository including its entire history. If the repository follows the common “trunk/branches/tags” layout, the option “--stdlayout” tells “git svn” where to find them.
$ git svn clone svn+ssh://svn@repo.tinned-software.net/REPO_NAME -A authors_mapping --stdlayout REPO_NAME
If the repository does not follow the common layout, the trunk, tags and branches paths can be specified individually.
$ git svn clone svn+ssh://svn@repo.tinned-software.net/REPO_NAME -A authors_mapping --trunk=project/trunk --tags=project/tag --branches=project/branch REPO_NAME
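For a repository with several projects, one “git svn clone” per project gives each project its own git repository. A sketch that only prints the commands for review. “print_clone_commands” is a hypothetical helper, and it assumes every project uses directories named “trunk”, “tags” and “branches”, which may need adjusting to the actual names (such as the “tag” and “branch” directories in the example above).

```shell
# print_clone_commands (hypothetical helper): print one "git svn clone"
# command per project. First argument: repository URL; remaining
# arguments: project directory names directly below the repository root.
print_clone_commands() {
  local url project
  url=$1
  shift
  for project in "$@"; do
    printf 'git svn clone %s -A authors_mapping --trunk=%s/trunk --tags=%s/tags --branches=%s/branches %s\n' \
      "$url" "$project" "$project" "$project" "$project"
  done
}

# Usage (not run here): review the generated script, then run it.
#   print_clone_commands svn+ssh://svn@repo.tinned-software.net/REPO_NAME projA projB > clone_all.sh
#   sh clone_all.sh
```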
Convert tags
The cloned git repository reproduces the structure of the Subversion repository. svn tags are imported as remote-tracking references under “refs/remotes/origin/tags/” (“origin/” is the default prefix of “git svn” since git 2.0), not as actual git tags. The following command turns each of them into a real git tag.
$ git for-each-ref refs/remotes/origin/tags --format="%(refname:short)" | sed 's#^origin/tags/##' | while read -r TAG; do git tag "${TAG}" "origin/tags/${TAG}"; done
The first command lists the imported svn tags. The “sed” command strips the “origin/tags/” prefix, leaving the bare tag names. Finally, the loop creates a git tag for each name.
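Because the loop creates the tags immediately, a dry run that only prints the “git tag” commands can be useful for reviewing the names first. A sketch with a hypothetical “print_tag_commands” helper that reads short reference names from standard input and ignores anything outside “origin/tags/”.

```shell
# print_tag_commands (hypothetical helper): for every "origin/tags/NAME"
# reference read from stdin, print the matching "git tag" command
# without creating anything.
print_tag_commands() {
  local ref tag
  while read -r ref; do
    case $ref in
      origin/tags/*)
        tag=${ref#origin/tags/}          # strip the "origin/tags/" prefix
        printf 'git tag "%s" "%s"\n' "$tag" "$ref" ;;
    esac
  done
}

# Usage (not run here): inspect the output, then append "| sh" to execute it.
#   git for-each-ref refs/remotes/origin/tags --format="%(refname:short)" | print_tag_commands
```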
Convert branches
As with the tags, the svn branches are imported as remote-tracking references rather than as actual git branches. The following command creates a local git branch for each of them.
$ git for-each-ref refs/remotes/origin --format="%(refname:short)" | grep -Ev '^origin/(trunk$|tags/)' | sed 's#^origin/##' | while read -r BRANCH; do git branch "${BRANCH}" "origin/${BRANCH}"; done
The first command lists all imported references: trunk, tags and branches. “grep -Ev” filters out the trunk and the tags, leaving only the branches; the pattern is anchored so that a branch whose name merely starts with “trunk” is kept. “sed” strips the “origin/” prefix, and the loop creates the branches.
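The filtering step can be checked in isolation by feeding it sample reference names. A sketch with a hypothetical “branch_names” helper. Its pattern is anchored, so only “origin/trunk” itself and the references below “origin/tags/” are dropped, while a branch such as “trunk-fixes” survives.

```shell
# branch_names (hypothetical helper): read short reference names on stdin
# and print only the svn branch names, without the "origin/" prefix.
branch_names() {
  # Drop exactly "origin/trunk" and anything under "origin/tags/".
  grep -Ev '^origin/(trunk$|tags/)' | sed 's#^origin/##'
}

# Usage (not run here):
#   git for-each-ref refs/remotes/origin --format="%(refname:short)" | branch_names
```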
Convert ignore patterns
In contrast to git, Subversion stores ignore patterns as “svn:ignore” properties on directories inside the repository. The “git svn show-ignore” subcommand collects them recursively and prints them in a format git understands.
$ git svn show-ignore > .gitignore
$ git add .gitignore
$ git commit -m "Convert svn ignore properties to .gitignore"
Once extracted, the ignore patterns are committed to the git repository like any other file.
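An “svn:ignore” pattern applies only to the directory it is set on and is not inherited by subdirectories, so a faithful translation anchors each pattern to that directory with a leading “/”. The following sketch illustrates that translation for a single directory. “svn_ignore_to_gitignore” is a hypothetical helper and not how “git svn show-ignore” is implemented.

```shell
# svn_ignore_to_gitignore (hypothetical helper): read the svn:ignore value
# of one directory (one pattern per line) on stdin and print .gitignore
# lines anchored to that directory. Argument: directory relative to the
# repository root ("." for the root itself).
svn_ignore_to_gitignore() {
  local dir pattern
  dir=${1:-.}
  dir=${dir%/}                       # drop a trailing slash
  while IFS= read -r pattern; do
    [ -z "$pattern" ] && continue    # skip empty lines
    if [ -z "$dir" ] || [ "$dir" = "." ]; then
      printf '/%s\n' "$pattern"
    else
      printf '/%s/%s\n' "$dir" "$pattern"
    fi
  done
}

# Usage (not run here):
#   svn propget svn:ignore src/app | svn_ignore_to_gitignore src/app >> .gitignore
```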
Push to git remote
With the git repository prepared from the Subversion repository, it is time to push it to a remote. This is done by adding the remote to the repository and pushing all branches and tags. “--all” pushes only the branches, so the converted tags need a separate push.
$ git remote add origin ssh://git@git.remote.domain.tld:22/${REPO_GIT}.git
$ git push -u origin --all
$ git push origin --tags
Read more of my posts on my blog at https://blog.tinned-software.net/.