As long-time users of StarTeam (since 2000), we have been reasonably content with our installation. We do, however, use a number of different operating systems. With Borland no longer giving appropriate support to OS X, and with the Visual Studio integration being extremely bulky and only available in the top tier of the product, I decided it was time to switch to something new. A number of our developers had used Subversion in the past, so this was the obvious choice - the repository would, however, need to be converted. Having gone through the conversion from MKS to StarTeam in 2000, I thought I would take responsibility for the StarTeam to Subversion conversion.
I found a utility written in Java called svnimport, from Polarion, and it did a reasonable job until it hit our main project and kept throwing out-of-heap exceptions. I tried all the Java tricks of setting a large heap size and aggressive heap usage, but it would always die at about 1.3GB of allocated memory (I had 12GB, so I think the issue was the way the utility was written). After a bit of searching I found the source code for svnimport. I downloaded a copy of MyEclipse and started debugging svnimport to see if I could fix the issues or change the way it was using memory. I am, however, not a big fan of Java, so I decided to convert the project to C# and use the Borland .NET libraries.
The first issue I discovered with svnimport was that it did not do its branching or tagging intelligently: if you had 10 branches that were created at different times, there would be 10 copies of everything up to the branch time. This meant that if you had a 1MB file with 10 revisions in your main trunk, it would be present 10 times in every branch. This design caused one project to take 18 hours to dump and 4 days to import, creating a 24GB Subversion repository.
The first thing I changed was to build up a list of branches and tags sorted by creation date (you need to check the attributes to see whether the branch is based on a label and, if so, get the appropriate date from the label item).
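To make the sorting step concrete, here is a minimal Python sketch of the idea (my actual code was C# against the StarTeam libraries). The `Label` and `CopyPoint` types and the `effective_date` helper are hypothetical names invented for illustration; they are not part of svnimport or the StarTeam API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Label:
    # Hypothetical stand-in for a StarTeam label item.
    name: str
    created: datetime


@dataclass
class CopyPoint:
    # Hypothetical stand-in for a branch or tag to create in Subversion.
    name: str
    kind: str  # "branch" or "tag"
    created: datetime
    based_on: Optional[Label] = None  # set when the branch was built from a label


def effective_date(point: CopyPoint) -> datetime:
    # A branch based on a label takes its date from the label item,
    # otherwise its own creation date is used.
    if point.based_on is not None:
        return point.based_on.created
    return point.created


def sort_copy_points(points: List[CopyPoint]) -> List[CopyPoint]:
    # Oldest first, so they can later be merged into the commit stream.
    return sorted(points, key=effective_date)
```

The point of the helper is that a branch created recently from an old label sorts by the label's date, not by the day the branch itself was made.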
Svnimport builds up a list of actions/commits based on the revisions of each file. I changed this to only process files that had a modified time later than the creation time of the branch, which cut down on the duplicate-file issue above. With these 2 changes in place, I went about updating the dump file creation. The commits are sorted by date, so all I had to do was insert code that would check whether a commit date was later than the creation date of a branch or label and, if so, insert an action to create a branch/label copied from the current revision. Tags were easy as they are just a snapshot in time.
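The two changes above can be sketched roughly as follows, again in Python rather than the C# I actually used. Everything here is an illustrative assumption: `FileRevision`, `Commit`, `CopyPoint`, `changed_after` and `build_actions` are made-up names, and the sketch assumes each action in the list becomes exactly one Subversion revision, so the current revision is simply the number of actions emitted so far.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class FileRevision:
    path: str
    modified: datetime


@dataclass
class Commit:
    date: datetime
    path: str


@dataclass
class CopyPoint:
    date: datetime  # effective creation date of the branch or tag
    name: str


def changed_after(files: List[FileRevision], branch_created: datetime) -> List[FileRevision]:
    # Only revisions modified after the branch was created belong to the
    # branch; everything older is already covered by the copy from trunk.
    return [f for f in files if f.modified > branch_created]


Action = Tuple[str, str, Optional[int]]  # (kind, target, copy-from revision)


def build_actions(commits: List[Commit], copy_points: List[CopyPoint]) -> List[Action]:
    actions: List[Action] = []
    pending = sorted(copy_points, key=lambda p: p.date)
    nxt = 0
    for commit in sorted(commits, key=lambda c: c.date):
        # Before emitting a commit, create every branch/tag that is older
        # than it, copied from the revision that is current at that moment.
        while nxt < len(pending) and commit.date > pending[nxt].date:
            actions.append(("copy", pending[nxt].name, len(actions)))
            nxt += 1
        actions.append(("commit", commit.path, None))
    # Branches/tags created after the last commit copy from the final revision.
    for point in pending[nxt:]:
        actions.append(("copy", point.name, len(actions)))
    return actions
```

Because a copy in Subversion only records where it was copied from, each branch or tag costs one small action in the dump instead of a full duplicate of the history before it.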
I did a couple of test runs - one project that had been generating an 18MB dump file was now generating a 3MB dump, and the import into Subversion was solid.
I could now start working on getting a more reliable conversion going.
Continued in part 2...