Migrating from Subversion to Mercurial

by Ken | in Technology

I have an old computer that I had been using as a Subversion server. Before it failed completely, I decided to move the Subversion repository off it. Rather than putting the repository on another computer, though, I wondered whether I needed a source control server at all. As a small business, the less hardware I have to maintain and manage, the better.

At first I wondered if I could use an NAS or similar device as a source control server. I found some reports indicating that you could hack an NAS to add Subversion server capability, but that's not what I had in mind. I was wondering if my laptop could be not only the computer I use to write code but also the computer responsible for storing it. I didn't want to turn the laptop into a server that other computers could connect to. But I thought it might be able to handle storing files in a repository on its own, much as an NAS would.

After a brief education on the current state of version control systems, I realized that what I wanted is known as a “DVCS”, or Distributed Version Control System. Rather than needing a centralized server to do all the thinking about version control, with my various computers connecting to it, I could manage the repository (or repositories, as I would come to learn) entirely on my own laptop. At first, having the whole version control mess on my laptop seemed like a negative: wasted disk space, a local mess, and so on. The idea that I could lose my whole source control system if the laptop's hard drive failed was equally worrisome. But once you grasp that a DVCS lets you easily put copies of the repository (or repositories) anywhere, you see the benefit. And no computer holding a repository needs to act as a server for any other computer, which is good news for me because I don't need hardware dedicated to serving source control.

Now, which DVCS should I use? Since I don't want source control to become a big deal or to spend a lot of time managing it, I wanted something both mainstream and simple to use. The obvious first DVCS to consider is Git. But having used Git before (although I never previously understood that it is a DVCS), I knew how raw the Git experience is, and I wanted something a bit friendlier. Fossil seemed like a neat choice, but it didn't look like I'd be able to keep its storage on an NAS without running Fossil on the NAS as though it were a server. In the end, I settled on Mercurial because it runs on Windows, Mac OS, and Linux, it requires no servers, and I could still use an NAS as a master repository of sorts.

Okay, now about the NAS idea: one thing was still bothering me. If I traveled with my laptop, I'd need remote access to the NAS to safely “back up” my work to the repository. That kind of access would effectively be web access, so rather than use an NAS, I started thinking I'd be better off using my web host. I tried to figure out how to do that and was lost for a bit. All of the references I found described setting up tools on the server, which I can't do because the server is at my hosting provider. They give me shell access but not the ability to install software. (Which is the right call for them.) I could even use Git with them, but I'd have to pay for it, and with my very simple needs, paying for source control would be overkill. Remember, I just wanted to use the web server as a remote file system.

My solution ended up being to mount the web host remotely as a “drive” on my local computer. On my Windows system, I used SFTP Net Drive, which lets me mount the web host as a drive I cleverly called “W:”. I don't keep SFTP Net Drive running all the time, but it is simple to turn on when I need a connection. On the Mac, I followed instructions from Jonathan's Blog to set up SSHFS, so when the Mac starts it now mounts a volume, cleverly named WebHost, that is the file system of my web host. In both cases, I used the SSH keys I had already been using for terminal sessions. I haven't set this up on Linux yet, but SSHFS is available there as well, so I'm pretty confident something similar will work when I have time to look into it.

Cool, now I have the plan. I can write code on any computer, on any platform, and manage that code's repository locally. When I want to publish (my word) the code, I simply push it to the drive that is the web server. There, the repository is automatically backed up and available wherever I have network access.

Now all I have to do is convert the old Subversion repository to Mercurial and move it off the old server. That actually turned out to be harder than I expected. Mercurial includes the ability to convert from Subversion (I had checked to make sure before I seriously considered Mercurial) but I had a difficult time finding good information about what to do. And when I did finally muddle through it, it took more steps than it should have and I didn’t end up with exactly what I wanted due to the nature of the original Subversion repository.

The thing with a centralized source control system like Subversion is that you tend to have only one repository. I had checked all of my various projects into the same repository. Converting that repository to Mercurial as-is would give me a directory structure that forced me to rework how everything is stored on my computer. With a centralized system, you can point any folder on your file system at a particular branch or subdirectory in the repository. With a DVCS, the directory structure in the repository is by definition the directory structure on disk; that is your source control, distributed to you right there on your disk. Once I got my head around that, it became clear that you don't want to manage a DVCS the same way as a centralized system, and that I needed to split things out. That meant the migration would be more than just converting and moving to another computer: I would also need to split the one repository into individual ones.

That led to the next problem. If I split things out as they are, each project ends up with extra directory structure above it. For example, say you have three projects at the root of your Trunk in Subversion (Project1, Project2, and Project3), and the Project1 head maps to the Project1 folder in your IDE, and so on. That's a logical way to work with Subversion. But splitting things out for Mercurial, I would end up with one repository that was “Trunk/Project1”, another that was “Trunk/Project2”, and a third that was “Trunk/Project3”. Say I want to put Project1 and Project2 in the same directory but manage them separately, while Project3 goes in a different directory, also managed separately. The Mercurial management no longer happens within the Project1, Project2, and Project3 directories; it happens one level above them, because that's where the root from Subversion is. For Project3, having it managed one level up would be messy but doable. For Project1 and Project2, the two repositories conflict because they share the same parent. I could keep them in the same repository, but that could lead to problems later. So I tried to push the repositories down into the child directories. But then I effectively got duplicate directories: Project1 contained the Mercurial directory plus another directory named “Project1” where the source went, even though my original layout had the source directly in the first Project1, right alongside where I wanted the Mercurial directory to go. So instead of simply splitting things out, I'd also need to rearrange them!

I first tried to do the rearranging after the fact, but that turned out to be a big problem. One of the reasons for migrating from Subversion instead of just starting fresh with Mercurial is that I want to keep the history of the repository. A plain “hg log” of a moved file only shows the changes made since the move. So if I had to rearrange everything after building the repository, I would effectively lose the history, and there would have been no point to doing the migration at all. (Technically, “hg log --follow” can trace the history of a file that has been renamed or moved, but you have to know the file was moved to ask for it, and when I tried it, it said the original file didn't exist. So while that may work sometimes for an individual file, there's no point in doing that for every file in the whole repository.)

I finally had the plan: split out the original Subversion repository, rearrange it so each project sits at its own root, convert to Mercurial, move to a new computer, and clone to the web host drive. Whew. Here we go.

First, I installed Mercurial on the system running Subversion. Connecting to the old Subversion server through network-connected drives was easier than copying to a USB drive and carrying it around. I also installed Mercurial on the destination computer, along with the software to map my web host as described above. Although Mercurial ships with the convert extension, it is disabled by default. To enable it, I edited the mercurial.ini file in my home directory and added these lines:

[extensions]
convert=

Because my original Subversion repository was on a Windows system, I also had to download some supporting files for Mercurial's convert extension. Atlassian has the details in a wiki article. I needed the “python SWIG bindings” and the “insertpath.py” file. I put the Python bindings directory inside the TortoiseHg installation folder and placed insertpath.py in that bindings directory, alongside its Python subdirectories. Then I returned to mercurial.ini and added another line, so the file looked like this:

[extensions]
convert=
svnbindings = C:\Program Files\TortoiseHg\svn_1.7.5_py27_x86\insertpath.py

To split the Subversion repository, I used a helpful guide I found at Mugo. The guide had a mistake, and I needed to modify it a bit for my needs, so here are the commands I actually ran. In them, childname is the name of the project in Subversion, which will also become the name of the repository in Mercurial.

Beginning on the old Subversion server, dump the whole repository to a file, then filter that file to create a new dump containing only your one project.

svnadmin dump Trunk > trunk.dmp
svndumpfilter include childname --drop-empty-revs --renumber-revs < trunk.dmp > childname.dmp

Edit the dump file to change every occurrence of “Node-path: Trunk/” to “Node-path: ”, which removes the Trunk prefix and moves the contents up a level. Also change every occurrence of “Node-copyfrom-path: Trunk/” to “Node-copyfrom-path: ”. See the original post for “sed” commands. I used TextPad to do the search and replace instead, since I was on a Windows computer that already had TextPad installed.
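
If sed isn't handy and you'd rather not hand-edit a large dump in a text editor, the same two substitutions can be scripted. The Python sketch below is not from the Mugo guide; it assumes the paths in the filtered dump start with “Trunk/” exactly as described above, and it only rewrites lines that begin with those two header prefixes. It reads and writes in binary mode because a dump file can contain binary file contents, which a text editor could silently damage.

```python
import sys

# Header prefixes to rewrite (assumes the dump's paths start with "Trunk/").
# Only lines that *begin* with one of these prefixes are changed. Like the
# sed or search-and-replace approach, a line of file content that happens to
# begin the same way would also be changed, so spot-check the result.
REWRITES = [
    (b"Node-path: Trunk/", b"Node-path: "),
    (b"Node-copyfrom-path: Trunk/", b"Node-copyfrom-path: "),
]


def strip_trunk_prefix(lines):
    """Yield dump-file lines with the leading 'Trunk/' removed from path headers."""
    for line in lines:
        for old, new in REWRITES:
            if line.startswith(old):
                line = new + line[len(old):]
                break
        yield line


def rewrite_dump(src, dst):
    # Binary mode: dump files can hold binary file contents verbatim.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.writelines(strip_trunk_prefix(fin))


if __name__ == "__main__":
    # Usage: python strip_trunk.py childname.dmp childname-fixed.dmp
    rewrite_dump(sys.argv[1], sys.argv[2])
```

If you write to a new file as in the usage line, load that output file in place of childname.dmp in the next step.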

Next, load the filtered Subversion content into a new Subversion repository.

svnadmin create childname
svnadmin load childname < childname.dmp

Still on the old Subversion server, I created a directory to serve as a temporary holding place for the new Mercurial repository and then created the repository by converting from the new Subversion repository.

mkdir hg
cd hg
hg convert -s svn -d hg /full/path/to/svn/childname childname
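
An alternative worth knowing about, though not the route taken in this post: the convert extension accepts a --filemap option, and according to “hg help convert” a filemap can contain include and rename directives, where renaming a directory to “.” moves it to the root of the new repository. In principle that could replace the dump, filter, edit, and reload steps. The Python sketch below only writes such a filemap and builds the matching hg convert command. The project path (for example Trunk/childname, relative to the root being converted), the file name filemap.txt, and the command-line arguments are hypothetical, and paths containing spaces are not handled.

```python
import subprocess
import sys
from pathlib import Path


def write_filemap(filemap_path, project_dir):
    """Keep only project_dir and move its contents to the new repository's root.

    project_dir is relative to the root being converted (hypothetical example:
    "Trunk/childname"). Paths containing spaces are not handled here.
    """
    text = f"include {project_dir}\nrename {project_dir} .\n"
    Path(filemap_path).write_text(text, encoding="utf-8")
    return text


def convert_command(svn_repo, hg_dest, filemap_path):
    """Build the hg convert command as an argument list (no shell quoting needed)."""
    return ["hg", "convert", "-s", "svn", "-d", "hg",
            "--filemap", str(filemap_path), str(svn_repo), str(hg_dest)]


if __name__ == "__main__":
    # Usage (hypothetical): python convert_one.py /path/to/original/svn/repo Trunk/childname childname
    svn_repo, project_dir, hg_dest = sys.argv[1:4]
    write_filemap("filemap.txt", project_dir)
    subprocess.run(convert_command(svn_repo, hg_dest, "filemap.txt"), check=True)
```

Try it on a scratch copy first and compare the result with the dump-editing route before relying on it.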

Now I cheated and just copied the .hg directory from the Mercurial repository I created in the prior step on the old computer into the directory on my normal work computer that holds the current version of the files. I used a network share to make copying easy, but a USB drive would have worked too.

cd \path\to\local\childname
xcopy /E /I /H \path\to\remote\childname\.hg .hg
hg update
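
Whatever copy command you use, it has to copy the .hg folder recursively, including all of its subdirectories. For a cross-platform version of this step, here is a minimal Python sketch; the two directory arguments are hypothetical placeholders for the converted repository and the existing working folder, and the script refuses to overwrite an .hg directory that is already there.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def copy_hg_dir(converted_repo, working_dir):
    """Copy converted_repo/.hg into working_dir, which holds the current files."""
    src = Path(converted_repo) / ".hg"
    dst = Path(working_dir) / ".hg"
    if not src.is_dir():
        raise FileNotFoundError(f"no .hg directory in {converted_repo}")
    if dst.exists():
        # An existing .hg means working_dir is already a repository; don't clobber it.
        raise FileExistsError(f"{dst} already exists")
    shutil.copytree(src, dst)  # recursive: copies every file and subdirectory
    return dst


if __name__ == "__main__":
    # Usage: python copy_hg.py /path/to/remote/childname /path/to/local/childname
    copy_hg_dir(sys.argv[1], sys.argv[2])
    # Then check out the latest revision over the existing files, as with "hg update".
    subprocess.run(["hg", "update"], cwd=sys.argv[2], check=True)
```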

The update command reads the .hg directory and checks out the latest revision into the project directory. When I tried this before splitting out and rearranging the original Subversion repository, the update created a new child directory named for the project with the source inside. Having done it correctly now, the update simply puts the files in the right places and leaves Mercurial satisfied that everything is where it should be. Depending on the state of the files, in one case I needed to do a commit afterward, and in another I ran update with the --clean flag (which discards uncommitted changes) to get it fully synced. For reassurance, I checked the history of a source file.

hg log path/to/source/file/sourcefile.txt

The results showed the original history from Subversion. Perfect! With that working, I was finally able to clone the repository to the web host.

hg clone . /path/to/webhost/childname

Checking the web host after the clone shows the Mercurial .hg directory and all of the files under Mercurial's control.

Then I repeated the steps above for each of the projects.
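
Because the same sequence has to be repeated per project, it can help to generate the commands from a list of project names. The Python sketch below only prints the commands from this post as a checklist rather than running them, so the dump-file edit stays a visible manual step. The project names and the svn_root path are hypothetical placeholders.

```python
# Hypothetical project names; replace with the directories in your repository.
PROJECTS = ["Project1", "Project2", "Project3"]


def commands_for(project, svn_root):
    """Return the per-project steps from this post as command strings (not run)."""
    return [
        f"svndumpfilter include {project} --drop-empty-revs --renumber-revs"
        f" < trunk.dmp > {project}.dmp",
        f"# edit {project}.dmp: strip 'Trunk/' from Node-path and Node-copyfrom-path",
        f"svnadmin create {project}",
        f"svnadmin load {project} < {project}.dmp",
        f"hg convert -s svn -d hg {svn_root}/{project} hg/{project}",
    ]


if __name__ == "__main__":
    # Dry run: print a checklist. The dump and the hg holding directory are one-time steps.
    print("svnadmin dump Trunk > trunk.dmp")
    print("mkdir hg")
    for name in PROJECTS:
        for line in commands_for(name, "/full/path/to/svn"):
            print(line)
        print()
```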

All content Copyright © Katharsys LLC