Git repository cluster setup – Part 2

Using a git cluster containing more than two servers, as described in “Git repository cluster setup”, will increase the time taken to push changes to the cluster. This is because of the required mirroring. To solve this, a more advanced version of the hook script is needed, so that a mirror-server does not mirror the changes back to the master-server.

With three git servers, for example, the above approach would cause the following situation. When you commit to server1, the server mirrors the changes to all other servers. Each of the mirror servers then also starts to mirror the changes to all other servers. This results in a lot of mirroring attempts that slow down the cluster.

The git protocol avoids any problems resulting from this mirroring, but it does add a lot of transmission overhead. Let’s assume one push takes 2 seconds: the 6 mirroring attempts for the 3 servers then add up to 12 additional seconds of waiting for the mirroring to finish. If the hook script could detect that it is currently receiving data from a mirror, it could avoid the unnecessary attempts. Then the mirroring would look like this.

[Figures: 03_git-cluster-mirroring_2 and 04_git-cluster-mirroring_3]
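The overhead estimate above can be reproduced with a little shell arithmetic. This is only a rough model: it assumes that every push takes the same time and that the mirroring stops after the second hop. The variable names are made up for this illustration.

```shell
#!/bin/bash
# Rough estimate of the mirroring overhead; all values are example assumptions.
SERVERS=3          # number of servers in the cluster
PUSH_SECONDS=2     # assumed duration of a single push

# Naive hook: the server receiving the commit pushes to every other server,
# and each of those servers pushes on to every other server again.
FIRST_HOP=$(( SERVERS - 1 ))
SECOND_HOP=$(( (SERVERS - 1) * (SERVERS - 1) ))
NAIVE_ATTEMPTS=$(( FIRST_HOP + SECOND_HOP ))
NAIVE_SECONDS=$(( NAIVE_ATTEMPTS * PUSH_SECONDS ))

# Hook with an origin check: only the first hop remains.
CHECKED_SECONDS=$(( FIRST_HOP * PUSH_SECONDS ))

echo "naive:   $NAIVE_ATTEMPTS attempts, about $NAIVE_SECONDS seconds"
echo "checked: $FIRST_HOP attempts, about $CHECKED_SECONDS seconds"
```

Under these assumptions, the three-server case gives the 6 attempts and 12 seconds mentioned above, while a hook that recognises pushes coming from a mirror would only need the 2 first-hop pushes.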

The downside of this second approach is that the script needs to know the IP addresses of all cluster servers, so they have to be configured in the script. Every time you add a mirror server or one of the IP addresses changes, you have to update this script.

#!/bin/bash
CLUSTER_IP_LIST="11.11.11.11 22.22.22.22 33.33.33.33"

REMOTE_IP=`echo $SSH_CLIENT | sed 's/ .*$//'`
MIRROR="YES"
for IP in $CLUSTER_IP_LIST; do
	if [[ "$IP" == "$REMOTE_IP" ]]; then
		MIRROR="NO"
		break
	fi
done

if [[ "$MIRROR" == "YES" ]]; then
	REMOTE_LIST=`git remote`
	for REMOTE in $REMOTE_LIST
	do
		echo "git push --all $REMOTE"
		git push --all $REMOTE
	done
else
	echo "mirroring - not mirror back"
fi

This script should be copied to the “hooks” directory of the repository you want to mirror. This should be done in both the master-server and the mirror-servers’ repositories to be able to mirror both ways.

To be executed when new updates are pushed to the repository, this script file needs to be executable by the git user. With the following commands, the file is marked executable and its owner is changed to the git user.

[Server1]$ cd /home/git/repositories/repository-name.git/hooks/
[Server1]$ chown git:gitosis post-receive
[Server1]$ chmod a+x post-receive
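The origin check at the top of the hook relies on the SSH_CLIENT variable, which sshd sets to the client IP address, the client port and the server port, separated by spaces. The following sketch shows that check in isolation. The SSH_CLIENT value is set by hand here and the function name is_cluster_member is hypothetical, so treat it as an illustration and not as part of the hook.

```shell
#!/bin/bash
# Example values only; on a real server sshd sets SSH_CLIENT itself.
SSH_CLIENT="22.22.22.22 51234 22"
CLUSTER_IP_LIST="11.11.11.11 22.22.22.22 33.33.33.33"

# Keep everything before the first space: the client IP address.
REMOTE_IP="${SSH_CLIENT%% *}"

# Return 0 if the given IP is in the cluster list, 1 otherwise.
is_cluster_member() {
	for member_ip in $CLUSTER_IP_LIST; do
		if [ "$member_ip" = "$1" ]; then
			return 0
		fi
	done
	return 1
}

if is_cluster_member "$REMOTE_IP"; then
	echo "push from cluster member $REMOTE_IP - do not mirror back"
else
	echo "push from client $REMOTE_IP - mirror to the cluster"
fi
```

The parameter expansion does the same job as the sed call in the hook; either form should work, the expansion just avoids starting an extra process.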

With the hook in place, the push output will show additional lines like those in the example below. As you can see, the repository is mirrored from Server1 to the mirror-server. The mirror-server tries to push the changes back to the master server, but that server is already up to date and therefore no update is performed. Without an update, the hook does not get called again on Server1, so it does not create a loop.

Counting objects: 5, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 306 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
remote: git push --all mirror-server
remote: remote: git push --all master-server        
remote: remote: Everything up-to-date        
remote: To ssh://git@mirror.example.com/repository-name.git
remote:    709387f..e2375b5  master -> master
To ssh://git@master.example.com/repository-name.git
   709387f..e2375b5  master -> master

With this hook, the mirroring should work without any unnecessary mirroring attempts, but it is a long way from being a perfect solution. There is still a lot of manual work to be done for every repository you want to mirror. I will now show you how to automate as much as possible of the mirroring.

Automate the cluster

The work starts with creating a new repository. You configure it in the gitosis.conf configuration file, and only after the user has pushed the first content to the repository can the hook be installed. This can be automated by placing the hook in the “hooks” subdirectory of git’s template directory “/usr/share/git-core/templates/”. This way the hook will be installed as soon as the repository is created by the initial push.
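The template mechanism can be tried out without touching the system directories. The sketch below uses a temporary template directory and the --template option of git init. The paths and the placeholder hook are assumptions for this illustration, and it presumes that the repository is created from the template directory you point git at.

```shell
#!/bin/bash
# Sketch: a hook placed in a template's hooks/ directory is copied into
# every repository created from that template. All paths are temporary.
WORK_DIR=$(mktemp -d)
TEMPLATE_DIR="$WORK_DIR/templates"
REPO_DIR="$WORK_DIR/repository-name.git"

# Create a placeholder hook inside the template's hooks directory.
mkdir -p "$TEMPLATE_DIR/hooks"
printf '#!/bin/bash\necho "post-receive hook called"\n' > "$TEMPLATE_DIR/hooks/post-receive"
chmod a+x "$TEMPLATE_DIR/hooks/post-receive"

HOOK_INSTALLED="skipped"
if command -v git >/dev/null 2>&1; then
	# --template points git at our directory instead of the system default.
	git init --quiet --bare --template="$TEMPLATE_DIR" "$REPO_DIR"
	if [ -x "$REPO_DIR/hooks/post-receive" ]; then
		HOOK_INSTALLED="yes"
	else
		HOOK_INSTALLED="no"
	fi
fi

echo "hook installed: $HOOK_INSTALLED"
rm -rf "$WORK_DIR"
```

Note that the hook has to be executable in the template already, as git copies the file together with its permissions.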

Still, the hook alone does not know where to mirror the changes to, because no remote is configured in the repository when it is created. I therefore extended the hook to check whether each remote exists in the repository. If a remote is not configured, the hook adds it.

But here comes the next problem: where to configure the list of remotes so you do not need to update the configuration on every remote if a new server is added? The simplest way is to add the configuration to the gitosis.conf file as it is available and configured on every server.

Gitosis places the gitosis.conf in the gitosis-admin repository directory. The hook can access this file and read its configuration. For this purpose I created a few configuration items in the gitosis.conf file.

With “CLUSTER_SERVER” you configure all the servers of the cluster. It’s important that the IP address listed here is the one that shows up when the servers connect to each other. I assumed here that they are public IP addresses.

“IGNORE_CHECK” defines a list of repositories for which the remote IP address is not checked. When the remote IP is not checked, the repository will always be mirrored to the other servers. This can cause mirroring back as described above, but there might be repositories where you still need it. When Server1 pushes its changes from etckeeper to the repository located on Server1 itself, they should be mirrored, but the check of the origin IP would prevent that. When the etckeeper repository is listed here, it is mirrored without checking.

All remotes that are configured in the repository but not in the gitosis.conf are removed from the repository if “REMOVE_REMOTE” is set to “YES”.

# Configure cluster server as CLUSTER_SERVER="RemoteName RemoteURL RemoteIP" (space separated)
CLUSTER_SERVER="server1 ssh://git@srv1.example.com:1234/ 123.123.123.101"
CLUSTER_SERVER="server2 ssh://git@srv2.example.com:1234/ 123.123.123.102"
CLUSTER_SERVER="server3 ssh://git@srv3.example.com:1234/ 123.123.123.103"

# Ignore the check if the remote server is a cluster server for the configured repositories.
# Define the repositories as "RepositoryName.git RepoName.git" (space separated)
IGNORE_CHECK="repository2-name.git"

# Enable or disable the functionality to remove remotes that are not configured anymore (YES or NO)
REMOVE_REMOTE="YES"
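To get a feeling for how a hook can read these settings, here is a small sketch that parses two CLUSTER_SERVER entries from a temporary sample file. The temporary file and the helper function cluster_values are assumptions for this illustration; the real hook reads the gitosis.conf directly.

```shell
#!/bin/bash
# Sketch: parse CLUSTER_SERVER entries from a temporary sample file.
CONF_FILE=$(mktemp)
cat > "$CONF_FILE" <<'EOF'
CLUSTER_SERVER="server1 ssh://git@srv1.example.com:1234/ 123.123.123.101"
CLUSTER_SERVER="server2 ssh://git@srv2.example.com:1234/ 123.123.123.102"
REMOVE_REMOTE="YES"
EOF

# Keep only the value between the double quotes (\1 is the captured group).
cluster_values() {
	grep -E '^CLUSTER_SERVER=' "$CONF_FILE" | sed -E 's/^.*"(.*)".*$/\1/'
}

# Column 1 is the remote name, column 2 the URL, column 3 the IP address.
REMOTE_NAMES=$(cluster_values | awk '{print $1}')
REMOTE_IPS=$(cluster_values | awk '{print $3}')
SERVER2_URL=$(cluster_values | awk '$1 == "server2" {print $2}')

echo "remote names: $(echo $REMOTE_NAMES)"
echo "remote IPs:   $(echo $REMOTE_IPS)"
echo "server2 URL:  $SERVER2_URL"
rm -f "$CONF_FILE"
```

Matching the remote name exactly in awk, as done for server2 here, avoids confusing names that share a prefix, such as server1 and server10.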

This configuration gets pushed to each cluster server as part of the gitosis configuration. The following hook should be placed in the repository’s “hooks” directory as a “post-receive” file and/or copied to the “hooks” subdirectory of the “/usr/share/git-core/templates/” directory on all cluster servers.

#!/bin/bash
#version=1.3.016

# Get the list of Cluster IPs from the gitosis.conf
REMOTE_IP_LIST=`grep -E "^CLUSTER_SERVER=" /home/git/repositories/gitosis-admin.git/gitosis.conf | sed -E 's/^.*"(.*)".*$/\1/' | awk '{print $3}'`
# Get the list of remote names from the gitosis.conf
REMOTE_LIST=`grep -E "^CLUSTER_SERVER=" /home/git/repositories/gitosis-admin.git/gitosis.conf | sed -E 's/^.*"(.*)".*$/\1/' | awk '{print $1}'`

# Get the list of repositories to ignore the mirror check from the gitosis.conf
IGNORE_LIST=`grep -E "^IGNORE_CHECK=" /home/git/repositories/gitosis-admin.git/gitosis.conf | sed -E 's/^.*"(.*)".*$/\1/'`

# Get the setting for removing remotes not configured in the gitosis.conf
REMOVE_REMOTE=`grep -E "^REMOVE_REMOTE=" /home/git/repositories/gitosis-admin.git/gitosis.conf | sed -E 's/^.*"(.*)".*$/\1/'`



# Check if remote is a cluster member 
REMOTE_IP=`echo $SSH_CLIENT | sed 's/ .*$//'`
MIRROR="YES"
for IP in $REMOTE_IP_LIST
do
	# Remote is cluster member - do not mirror
	if [[ "$IP" == "$REMOTE_IP" ]]
	then
		MIRROR="NO"
		break
	fi
done

# Disable mirror check for configured repositories
if [[ "$MIRROR" == "NO" ]]
then
	# get the name of current repository
	REPO_NAME=`echo $SSH_ORIGINAL_COMMAND | sed -E "s/^.*\/(.*)'.*$/\1/"`

	# Check if the repository is available in the IGNORE_CHECK list
	DO_MIRROR=`echo "$IGNORE_LIST" | grep "$REPO_NAME" |wc -l`

	if [ "$DO_MIRROR" -gt "0" ]
	then
		MIRROR="YES"
	fi
fi


# Push to cluster members
if [[ "$MIRROR" == "YES" ]]
then
	# Get the servers public IP
	OWN_IP=`curl -s icanhazip.com`

	for REMOTE in $REMOTE_LIST
	do

		# check if remote is the host itself
		REMOTE_IP_LIST=`grep -E "^CLUSTER_SERVER=\"$REMOTE" /home/git/repositories/gitosis-admin.git/gitosis.conf | sed -E 's/^.*"(.*)".*$/\1/' | awk '{print $3}'`
		if [[ "$OWN_IP" == "$REMOTE_IP_LIST" ]]
		then
			echo "*** Remote is own host - not mirror (loop)"
			continue
		fi

		REMOTE_FOUND=`git remote | grep "$REMOTE" |wc -l`
		if [[ "$REMOTE_FOUND" -eq "0" ]]
		then
			REMOTE_URL=`grep -E "^CLUSTER_SERVER=\"$REMOTE" /home/git/repositories/gitosis-admin.git/gitosis.conf | sed -E 's/^.*"(.*)".*$/\1/' | awk '{print $2}'`
			REPO_NAME=`echo $SSH_ORIGINAL_COMMAND | sed -E "s/^.*\/(.*)'.*$/\1/"`
			echo "git remote add $REMOTE $REMOTE_URL$REPO_NAME"
			git remote add $REMOTE $REMOTE_URL$REPO_NAME
		fi

		echo "git push --all $REMOTE"
		git push --all $REMOTE
	done
else
	# if the IP indicates mirroring back, nothing needs to be done.
	echo "*** Mirroring - not mirror back"
fi


# remove remotes not configured, if enabled
if [[ "$REMOVE_REMOTE" == "YES" ]]
then
	# compare the lists of configured remotes to the repository remotes
	REMOVE_LIST=$(comm -13 <(grep -E "^CLUSTER_SERVER=" /home/git/repositories/gitosis-admin.git/gitosis.conf | sed -E 's/^.*"(.*)".*$/\1/' | awk '{print $1}' | sort) <(git remote | sort))

	for REMOVE in $REMOVE_LIST
	do
		echo "git remote rm $REMOVE"
		git remote rm $REMOVE
	done
fi
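The last part of the hook uses comm to find remotes that are no longer configured. The sketch below shows that comparison on its own with example data. It uses temporary files instead of the process substitution in the hook, so that it also runs in a plain POSIX shell, and it only prints the command instead of removing anything.

```shell
#!/bin/bash
# Sketch: find remotes that exist in the repository but not in the configuration.
# Both lists are example data standing in for gitosis.conf and "git remote".
CONFIGURED=$(mktemp)
EXISTING=$(mktemp)
printf '%s\n' server1 server2 server3 | sort > "$CONFIGURED"
printf '%s\n' server1 server2 server3 oldserver | sort > "$EXISTING"

# comm needs sorted input. -13 hides the lines found only in the first file
# (column 1) and the lines common to both (column 3), which leaves the lines
# found only in the second file.
REMOVE_LIST=$(comm -13 "$CONFIGURED" "$EXISTING")

for REMOVE in $REMOVE_LIST; do
	echo "would run: git remote rm $REMOVE"
done
rm -f "$CONFIGURED" "$EXISTING"
```

Here only the hypothetical remote oldserver is reported, because it is the only name that appears in the repository list but not in the configured list.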

With the hook in the template directory, a new repository automatically has the hook installed upon creation, and the hook is executed by the initial push that creates the repository. It checks the configuration and adds the remotes to the repository. The complete mirroring of the repository is therefore set up automatically as soon as you create a new repository or add a new server to the cluster.

Adding a cluster server works in a similar way. Simply add it to the configuration, and with the next push to a repository the remote will be added and the repository pushed to it.

This is an example output when using this hook on a three server cluster setup. Note that there is no pushing back of the changes.

[Desktop]$ git push origin master 
Counting objects: 7, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 303 bytes | 0 bytes/s, done.
Total 3 (delta 2), reused 0 (delta 0)
remote: git push --all server1
remote: To ssh://git@server1.example.com:1234/repository-name.git
remote:    3df6491..a5b963b  master -> master
remote:    3067fbc..3df6491  server1/master -> server1/master
remote:    3067fbc..3df6491  server2/master -> server2/master
remote: git push --all server2
remote: To ssh://git@server2.example.com:1234/repository-name.git
remote:    3df6491..a5b963b  master -> master
remote:    3df6491..a5b963b  server1/master -> server1/master
remote:    3067fbc..3df6491  server2/master -> server2/master
remote: *** Remote is own host - not mirror (loop)
To ssh://git@repo.example.com:1234/repository-name.git
   3df6491..a5b963b  master -> master

The situation is slightly different when the repository is used as a remote etckeeper repository. Of course it should also be mirrored to all other cluster servers, but as it originates from server3 itself, the check for the IP would prevent the mirroring from happening. This is why the IGNORE_CHECK parameter was introduced.

When the etckeeper repository is listed in the “IGNORE_CHECK” list, the hook will ignore the IP check and push it to the other servers. All servers have the same configuration and therefore will act the same way. This results in some unnecessary push attempts, but these do not cause any problems for the consistency of the repository. They just slow down the push process.
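The decision described here can be sketched in isolation. The value of SSH_ORIGINAL_COMMAND is set by hand as an example of what the hook might see during a push, and the loop compares whole repository names, which is slightly stricter than the grep used in the hook.

```shell
#!/bin/bash
# Sketch of the IGNORE_CHECK decision; all values are examples.
SSH_ORIGINAL_COMMAND="git-receive-pack '/repository2-name.git'"
IGNORE_LIST="repository2-name.git"

# Take the text between the last slash and the closing single quote.
REPO_NAME=$(echo "$SSH_ORIGINAL_COMMAND" | sed -E "s/^.*\/(.*)'.*$/\1/")

# Assume the push came from a cluster member, so mirroring is off by default.
MIRROR="NO"
for IGNORED in $IGNORE_LIST; do
	if [ "$IGNORED" = "$REPO_NAME" ]; then
		# Repository is listed: mirror it despite the cluster origin.
		MIRROR="YES"
		break
	fi
done

echo "$REPO_NAME: MIRROR=$MIRROR"
```

With the repository listed, MIRROR ends up as YES even though the push was assumed to come from a cluster member, which is the behaviour visible in the etckeeper output.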

[Server3]$ etckeeper commit "commit message goes here"
[master 8ec94b9] commit message goes here
 Author: user1 <user1@server3.example.com>
 22 files changed, 43 insertions(+), 3 deletions(-)
Counting objects: 54, done.
Compressing objects: 100% (27/27), done.
Writing objects: 100% (28/28), 2.98 KiB, done.
Total 28 (delta 23), reused 0 (delta 0)
remote: git push --all server1
remote: remote: *** Remote is own host - not mirror (loop)        
remote: remote: git push --all server2        
remote: remote: remote: git push --all server1                
remote: remote: remote: Everything up-to-date                
remote: remote: remote: *** Remote is own host - not mirror (loop)                
remote: remote: remote: git push --all server3                
remote: remote: remote: Everything up-to-date                
remote: remote: To ssh://git@server2.example.com:1234/repository2-name.git        
remote: remote:    51bf750..8ec94b9  master -> master        
remote: remote: git push --all server3        
remote: remote: Everything up-to-date        
remote: To ssh://git@server1.example.com:1234/repository2-name.git
remote:    51bf750..8ec94b9  master -> master
remote: git push --all server2
remote: Everything up-to-date
remote: *** Remote is own host - not mirror (loop)
To ssh://git@repo.example.com:1234/repository2-name.git
   51bf750..8ec94b9  master -> master

The output above shows what happens when you commit and push changes via etckeeper to a repository that is listed in the “IGNORE_CHECK” list. See my previous post “Keep track of Linux configuration changes with etckeeper” for details about the etckeeper setup.


Read more of my posts on my blog at http://blog.tinned-software.net/.
