Splitting up an SVN repository

When I created my SVN repository I was lazy and set up just one repository for all my projects. This was easy to administer and to use: only one repository to configure with user names and passwords, and only one URL to remember. Over time, though, this repository grew, and at some point the whole thing got quite messy. As SVN is not built to delete anything, the repository only grows, and you can’t get rid of old projects or separate things anymore. You might end up with something like this in your SVN repository:

    /
    |---project_A
    |    |----trunk
    |    |----branches
    |    |----tags
    |---project_B
    |    |----trunk
    |    |----branches
    |    |----tags
    |---directory_1
    |    |---project_C
    |    |    |----trunk
    |    |    |----branches
    |    |    |----tags
    |    |---project_D
    |         |----trunk
    |         |----branches
    |         |----tags
    |   .
    |   .
    |   .

That all works fine so far, but when it comes to backups you have to store the complete repository and all its revisions as one huge package. If you want to, though, there is a way of splitting a single repository up. In this post I want to go through the way I cleaned up my repositories.

How does it work?

The whole procedure sounds fairly easy: you dump the complete repository, then you filter out what you don’t want in it, and at the end you import the result into a new repository. It requires a fair amount of manual work and a couple of shell commands, but the result is great.

Dump the Repository

The first step is quite simple but must be run directly on the machine where the repository is located. The command “svnadmin dump” requires direct access to the repository directory.

svnadmin dump /path/to/repository/ >repository.dump

After that is finished, you will find the file ‘repository.dump’ containing a full dump of the complete repository.
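
Before moving on, it can be worth a quick sanity check that the dump is complete. The following is a minimal sketch, not part of the original procedure: the helper name dump_summary is made up for illustration, and it assumes the usual dump layout in which the first line names the format version and every revision record starts with a “Revision-number:” line.

```shell
# Hypothetical helper: print the format line and the number of revision
# records in a dump file. -a makes grep treat the (partly binary) dump as text.
# Note: file contents that happen to contain such a line would inflate the count.
dump_summary() {
    head -n 1 "$1"
    printf 'revisions: %s\n' "$(grep -a -c '^Revision-number: ' "$1")"
}
```

Calling “dump_summary repository.dump” should print a line starting with “SVN-fs-dump-format-version” followed by the revision count. A full dump also contains revision 0, so the count should be one higher than the youngest revision of the repository.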

Filter the SVN dump

The SVN dump as we have it now contains all revisions of the complete repository, but as we want just one project extracted, we need to filter it out of the complete dump. We will do this using the “svndumpfilter” program. As this program does not modify the full dump in any way, you can repeat this step for each project you want to extract into its own repository without creating the complete dump again.

To filter the dump do the following.

cat repository.dump | svndumpfilter include --drop-empty-revs --renumber-revs "directory_1/project_C" >project_C.dump

The parameter “--drop-empty-revs” removes from the dump all revisions that do not touch the filtered project. The parameter “--renumber-revs” renumbers the remaining revisions to avoid gaps in the numbering caused by the first parameter.
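
As the full dump is left untouched, the filter step can be repeated in a loop. This is a minimal sketch under a few assumptions: the function name extract_all is invented here, each output file is named after the last component of the project path, and the output is written to the current directory.

```shell
# Hypothetical helper: filter several project paths out of one full dump.
# $1 = full dump file, remaining arguments = paths inside the repository.
# Each project ends up in <last path component>.dump in the current directory.
extract_all() {
    full_dump=$1
    shift
    for path in "$@"; do
        name=${path##*/}
        svndumpfilter include --drop-empty-revs --renumber-revs "$path" \
            <"$full_dump" >"$name.dump" || return 1
    done
}
```

For example, “extract_all repository.dump project_A project_B directory_1/project_C” would write project_A.dump, project_B.dump and project_C.dump. Two projects with the same last path component would overwrite each other’s file, so adjust the naming if that applies to you.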

The resulting dump file now contains only the one project we filtered out. It’s important to note at this point that the structure within the dump is still unchanged. If we import the dump as it is now, the path “directory_1/project_C” will still be there in the new repository. This brings us to our first problem. As we have filtered a sub-directory, the SVN entry that creates its parent directory is missing. So if we imported it, we would get an error like this:

svnadmin: File not found: transaction '0-0', path 'directory_1/project_C'

This error tells us that the path “directory_1” doesn’t exist, so creating the directory “project_C” in it is not possible. This is something we have to correct by hand. Depending on how you plan to load the dump into the new repository, there are two possibilities. Either way, to get the dump working we need to edit the dump file as follows.

Warning: a short word about the right editor

Choose your editor with care, as the dump might be huge and loading big files is not a strong point of all editors. Also, some editors might try to modify the file even without asking. The editors “nano” or “pico” are very nice, but useless for this job, as they try to add a line break into lines if they are too long to show them in one line on the screen and this can result in a corrupt file. The “vi” editor is great for this job, but not everyone’s favorite editor!

You will probably find something like this in one of the first revisions. Each revision starts with a line like “Revision-number: 1”.

PROPS-END

Node-path: directory_1/project_C
Node-action: add
Node-kind: dir
Prop-content-length: 10
Content-length: 10

PROPS-END

Node-path: directory_1/project_C/trunk
Node-action: add
Node-kind: dir
Prop-content-length: 10
Content-length: 10

PROPS-END

This means that this revision created the directory “directory_1/project_C” and its “trunk” sub-directory in one go. It first creates the “project_C” directory inside “directory_1”, which does not exist in the filtered dump. This is exactly where we need to fix it.
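
In a large dump, searching by hand for the right place is tedious. As a small sketch (the function name find_node is made up here), grep can print the line numbers of every node record for an exact path. The -a option is needed because the dump also contains binary file contents.

```shell
# Hypothetical helper: list line numbers of the node records for one exact path.
# $1 = path inside the repository, $2 = dump file
# -x matches whole lines only, -F treats the pattern as a fixed string.
find_node() {
    grep -a -n -x -F "Node-path: $1" "$2"
}
```

Running “find_node directory_1/project_C project_C.dump” prints each match as a line number, a colon and the matching line. In vi you can then jump straight to that line by typing a colon followed by the number.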

Now if you want to keep the structure as it is, you simply need to add one of those PROPS blocks to let it create “directory_1” before “project_C” is created. You will then have something like this.

PROPS-END

Node-path: directory_1
Node-action: add
Node-kind: dir
Prop-content-length: 10
Content-length: 10

PROPS-END

Node-path: directory_1/project_C
Node-action: add
Node-kind: dir
Prop-content-length: 10
Content-length: 10

PROPS-END

Node-path: directory_1/project_C/trunk
Node-action: add
Node-kind: dir
Prop-content-length: 10
Content-length: 10

PROPS-END
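
If you would rather not open a huge dump in an editor at all, the same insertion can be scripted. The sketch below is an alternative under stated assumptions, not the method used in this post: the function name add_parent_dir is invented, it copies the header layout of the blocks shown above, and it assumes your awk passes binary file contents through unchanged (LC_ALL=C helps with that). Try it on a copy first.

```shell
# Hypothetical helper: write a copy of the dump to stdout in which a node
# record creating the parent directory is inserted before the first record
# for the child path.
# $1 = parent directory to create, $2 = first path that needs it, $3 = dump file
add_parent_dir() {
    LC_ALL=C awk -v parent="$1" -v child="$2" '
        !inserted && $0 == ("Node-path: " child) {
            print "Node-path: " parent
            print "Node-action: add"
            print "Node-kind: dir"
            print "Prop-content-length: 10"   # "PROPS-END" plus newline = 10 bytes
            print "Content-length: 10"
            print ""
            print "PROPS-END"
            print ""
            print ""
            inserted = 1
        }
        { print }
    ' "$3"
}
```

A call such as “add_parent_dir directory_1 directory_1/project_C project_C.dump >project_C.fixed.dump” leaves the original file untouched, so you can compare both files before loading the fixed one.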

If you don’t want to keep the structure anyway, you can just delete the PROPS block with the “Node-path: directory_1/project_C” completely. If you have a separate revision that contains just this one PROPS block with the “Node-path: directory_1/project_C”, then you need to delete the complete revision from the dump. That can be done by deleting everything from its “Revision-number: ” line up to, but not including, the next “Revision-number: ” line.

Change the project path in the repository

We have now filtered the project from “directory_1/project_C” into our dump. But we don’t want it in the new repository under the path “directory_1/project_C”. So we need to change that in our dump. To remove the path and just have the trunk, tags and branches directories directly in the root of the new repository, run the following commands. Please note that the “/” needs to be escaped here and so is shown as “\/”.

sed -i 's/Node-path: directory_1\/project_C\//Node-path: /g' project_C.dump
sed -i 's/Node-copyfrom-path: directory_1\/project_C\//Node-copyfrom-path: /g' project_C.dump
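
A quick way to check the rewrite is to list the distinct top-level entries that remain in the dump. This is only a sketch with an invented function name (top_level_paths). After the sed commands above you would expect to see just the project’s own directories, such as trunk, branches and tags.

```shell
# Hypothetical helper: print the distinct first path components of all
# Node-path headers in a dump file.
top_level_paths() {
    grep -a '^Node-path: ' "$1" | sed 's/^Node-path: //' | cut -d/ -f1 | sort -u
}
```

If “directory_1” still shows up in the output of “top_level_paths project_C.dump”, some node records were not rewritten, or the hand-edited blocks from the previous step are still in place.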

Import the dump into a new repository

The final step is to import the dump into a fresh new repository. To do this, we create a new repository (probably in the location of your other repositories). After the SVN repository is created, we load the prepared dump into the repository. This can be done with the following commands:

svnadmin create /path/to/project_C
svnadmin load /path/to/project_C/ --ignore-uuid <project_C.dump

The “--ignore-uuid” parameter is important here. Every SVN repository has a universally unique identifier, or UUID, which SVN clients use to identify the repository. If you imported the dump into the fresh repository without this parameter, it would take over the UUID stored in the dump, and you would end up with two repositories with the same UUID.
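
To confirm that the new repository really got its own UUID, you can compare it with the old one. A minimal sketch, assuming svnlook is installed alongside svnadmin and using an invented function name (uuids_differ):

```shell
# Hypothetical helper: succeed only if the two repositories report
# different UUIDs. $1 and $2 are repository directories on this machine.
uuids_differ() {
    [ "$(svnlook uuid "$1")" != "$(svnlook uuid "$2")" ]
}
```

For example, “uuids_differ /path/to/repository /path/to/project_C && echo ok” prints “ok” only when the two UUIDs are different.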

With these last commands you have a separate repository. If you have not done so already, now is the time to configure the new repository’s users and access rights, and to block access to the extracted project in the old repository to avoid confusion.


Read more of my posts on my blog at http://blog.tinned-software.net/.
