Rewrite author of entire git repository

In git repositories, every commit records a person’s name and email address as an identifier for its author and committer. If the email address used to commit is wrong, you might not want it to be shown in the git repository anymore. In general, git is very good at keeping a history of changes and information, but less comfortable when it comes to changing that history.

Git’s strength is keeping track of changes in the history, but it still provides utilities to rewrite it. Those utilities offer a lot of functionality to rewrite or modify the repository’s history.

Rewrite affected commits

When reaching deep into the repository’s history as described here, starting from a clean state reduces the risk of something going wrong. With this in mind, make sure to start with a clean repository.
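One way to check for a clean state before rewriting is sketched below. The function name repo_is_clean is made up for this illustration; it relies only on “git status --porcelain”, which prints one line per changed or untracked file.

```shell
# Hypothetical helper (the name is an assumption, not part of git).
# Returns 0 when the working tree has no uncommitted or untracked files,
# 1 when it has some, and 2 when git status itself fails
# (for example outside of a repository).
repo_is_clean() {
    status=$(git status --porcelain) || return 2
    [ -z "$status" ]
}
```

Called from inside the repository, “repo_is_clean && echo ok” prints “ok” only when there is nothing left to commit or stash.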

Rewriting the history is done with “git filter-branch” by walking through the complete history. For each commit, filters are applied after which the changes are re-committed. The different filters allow modifying different parts of the commit.

The following uses “git filter-branch” to filter the history. Instead of manipulating the files to be recommitted as explained in Remove files from git history, this command uses the “--env-filter” option to alter the environment in which the re-commit takes place.

In this particular case, the name and email will be modified in the environment filter. When the commit is made, the modified name and email will be used. The script below checks whether a commit was made with the email you want to replace. If it matches, the environment is altered by exporting the environment variables used to recommit the changes.
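Before rewriting anything, it can help to see how many commits would match. The following is only a sketch: count_affected is a hypothetical helper name, and it assumes the same matching rule as described above (author or committer email equal to the old address).

```shell
# Hypothetical helper: print the number of commits, on any ref, whose
# author email (%ae) or committer email (%ce) equals the address in $1.
count_affected() {
    git log --all --format='%ae%x09%ce' |
        awk -F '\t' -v old="$1" '$1 == old || $2 == old { n++ } END { print n + 0 }'
}
```

Running “count_affected wrong@example.com” inside the repository prints the number of commits the rewrite would touch.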

Because the check in the environment filter is too complex to fit comfortably into a one-line command, it is easiest to put it in a script. The following example script takes the old email address followed by the new name and the new email address as arguments.

#!/bin/sh

if [ $# -eq 3 ]; then
    # Export the arguments so the filter can read them from the environment.
    # This avoids splicing them into the filter text, which breaks on
    # values containing spaces such as "Right Name".
    OLD_EMAIL="$1" NEW_NAME="$2" NEW_EMAIL="$3"
    export OLD_EMAIL NEW_NAME NEW_EMAIL
    git filter-branch --env-filter '
    if [ "$GIT_COMMITTER_EMAIL" = "$OLD_EMAIL" ]; then
        export GIT_COMMITTER_NAME="$NEW_NAME"
        export GIT_COMMITTER_EMAIL="$NEW_EMAIL"
    fi
    if [ "$GIT_AUTHOR_EMAIL" = "$OLD_EMAIL" ]; then
        export GIT_AUTHOR_NAME="$NEW_NAME"
        export GIT_AUTHOR_EMAIL="$NEW_EMAIL"
    fi
    ' --tag-name-filter cat -- --all
else
    echo "usage: $0 OLD_EMAIL NEW_NAME NEW_EMAIL"
    echo ""
    echo "    OLD_EMAIL     The email address to be replaced in the commits"
    echo "    NEW_NAME      The name which should be used in the commits matching OLD_EMAIL"
    echo "    NEW_EMAIL     The new email address to be used in the commits matching OLD_EMAIL"
fi

“git filter-branch” uses the additional “--tag-name-filter” option to take care of tags referencing commits. The last option, “--all”, selects all refs to be processed. The “--all” needs to be separated from the filter-branch options by “--”. The script is then executed, as the simple help screen shows, as follows.

$ git-change-author.sh "wrong@example.com" "Right Name" "right.name@example.com"
Rewrite 76b9a5e71964ce9843daef1683c14baf9a3b7e0d (19/19) (0 seconds passed, remaining 0 predicted)
Ref 'refs/heads/master' was rewritten
Ref 'refs/remotes/origin/master' was rewritten
WARNING: Ref 'refs/remotes/origin/master' is unchanged

The output of the script shows how “git filter-branch” processes all commits and creates an alternative history. In addition to the “HEAD” and “master” references, the original history is still referenced via “refs/original/refs/remotes/origin/master” and “refs/original/refs/heads/master”. These can be considered a backup of the old history.

$ git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin

The above “git for-each-ref” prints all refs matching “refs/original” in the repository (those which reference the old history), each prefixed with the “delete” command. This output is then piped to the “git update-ref” command, which deletes every reference to the old history.
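To confirm that the old address is really gone, the commits can be searched again. The sketch below uses a hypothetical helper named email_in_history. It only inspects the author and committer fields of commits, not the tagger field of annotated tags, and as long as backup refs under “refs/original” still exist it will keep finding the old address through them.

```shell
# Hypothetical helper: succeeds (exit 0) if the address in $1 is still the
# author or committer email of any commit reachable from any ref.
email_in_history() {
    git log --all --format='%ae%n%ce' | grep -Fxq "$1"
}
```

After a successful rewrite and cleanup, “email_in_history wrong@example.com” should fail, while the same call with the new address should succeed.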

Push the rewritten history

A plain git push is rejected, because the rewritten branches no longer build on the commits that exist on the remote server. To push the rewritten repository, the “--force” option needs to be given so that the history of the remote repository is replaced. Only commits reachable from the pushed refs are transferred, so the old author information no longer appears in the remote branches. The second command additionally force pushes the tags to the remote server.

$ git push origin --force --all
$ git push origin --force --tags

Update other clones

After the repository has been filtered and the history has been rewritten, the changes were force pushed to the remote server. Now every clone of this repository has to be updated. This cannot be done with the usual “pull” alone.

The clone can be updated with the same procedure as described in Remove files from git history.

$ git fetch origin 
$ git reset --hard origin/master
$ git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
$ git reflog expire --expire=now --all
$ git gc --prune=now
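The commands above assume the branch is called “master”. As a sketch for clones that sit on a different branch, the hypothetical helper below resets whatever branch is checked out to its upstream branch. Like the commands above, it discards local commits and uncommitted changes on that branch, and it assumes the remote is named “origin”.

```shell
# Hypothetical helper: fetch the rewritten history and move the current
# branch to its remote-tracking branch. WARNING: local commits and
# uncommitted changes on the current branch are lost.
update_clone() {
    git fetch origin || return 1
    # @{upstream} resolves to the remote-tracking branch of the current branch.
    git reset --hard '@{upstream}'
}
```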

Common issue

In some cases, rewriting the history of the repository fails. If the history was already altered in the past, the backup references mentioned above might still exist and cause the following error message.

Cannot create a new backup.
A previous backup already exists in refs/original/
Force overwriting the backup with -f

Either clean up the repository as explained in the procedure after rewriting the history, or add “--force” (or “-f” for short) to the “git filter-branch” command in the described script.
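Whether such backup references are still around can be checked up front. This is a minimal sketch with a made-up helper name, has_backup_refs, built on the same “git for-each-ref” command used for the cleanup.

```shell
# Hypothetical helper: succeeds (exit 0) if any ref exists under
# refs/original, i.e. a backup from an earlier filter-branch run.
has_backup_refs() {
    [ -n "$(git for-each-ref refs/original)" ]
}
```

If “has_backup_refs” succeeds, either remove the backup references first or run the rewrite with “-f”.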

Set Author Information

To avoid future commits with the wrong email or name, set them either for this specific repository or globally. The commands below, when executed inside the repository directory, specify the author’s name and email address used to commit.

$ git config user.name "Right Name"
$ git config user.email right.name@example.com

When “--global” is added to the above commands, the settings are set globally rather than just for the current repository. The globally set name and email are used as long as there is no name or email set in the repository itself.
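To see which identity git will actually use in a given repository, the effective values can be printed. The helper name show_identity is an assumption for this sketch; “git config user.name” and “git config user.email” return the repository value when one is set and otherwise fall back to the global one.

```shell
# Hypothetical helper: print the effective identity for the current
# repository in the form "Name <email>".
show_identity() {
    printf '%s <%s>\n' "$(git config user.name)" "$(git config user.email)"
}
```

With the settings from above in place, “show_identity” prints “Right Name <right.name@example.com>”.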


Read more of my posts on my blog at https://blog.tinned-software.net/.
