Migrating Self-Hosted GitLab Projects to GitHub

version-control github gitlab

We wanted to migrate from self-hosted GitLab projects to GitHub repos. Here is some background on how we accomplished that.

Robert M Flight true
2021-11-25

Moving to GitHub

Six years ago, we didn’t really look into what the free academic version of GitHub teams provided for private repositories, and instead decided that we should have a self-hosted GitLab instance. Outside of some weird issues around SSL certificates and reverse proxies (i.e. how our web-hosting worked), this actually worked pretty well. However, instead of making our internal projects public, we would frequently move them over to GitHub. I think this is partly because GitHub has become the de-facto location for open-source tool development. However, it seemed to cause some friction to push things between the two platforms. This may be due to our lab not using git as much as we should be, especially the collaborative aspects in issues and pull requests.

After six years, our PI decided to look into the costs for GitHub teams, and discovered that academic teams get free private repos with rather generous amounts of storage and actions minutes. Therefore, we decided it was time to migrate our projects over to GitHub. The PI delegated this task to yours truly.

Getting the List of Projects

To get the list of projects to migrate, I actually copied the project lists from the web interface. If I was doing this again, I would have just asked for the backup locations of our GitLab projects and used those to get the paths for each project. Some projects had changed their names over time and not updated the corresponding path in the database.

Users to IDs

Because I used lists from the web interface, I also had to convert the users to actual IDs that define the paths to each project. These weren’t too hard to look up.

Local Clones

With a full list of projects as “User Name / Project”, and users to IDs, I could easily script converting users to IDs and the full project path, and then system calls to GIT_SSL_NO_VERIFY=TRUE git clone --mirror gitlab_url to make mirror clones of each project. GIT_SSL_NO_VERIFY=TRUE deals with our labs weird SSL certificate issues, while using --mirror means all the branches are pulled, not just the default branch. This is useful to maintain a history of branches over the course of the project development.

Note that this means you get a directory with “projectName.git”, not just “projectName”. If you look inside the “projectName.git” folder, it’s actually all of the git contents, not the usual files you would see for a normal clone operation.

Duplicate Projects

One thing I didn’t think about when doing this is that users on GitLab can fork each others projects, resulting in duplicate projects. If I had been smart, I would have cloned the projects into user specific directories. Thankfully, we only had a few duplicate projects where I had to do a second clone to a user specific directory.

Empty Projects

Some of the projects are actually empty, either because we thought we were going to do something and didn’t, or someone just wanted issue tracking (that would be me). These we don’t want to push back up to GitHub. However, empty projects with no commits appear to have less than two files in the objects directory, and zero files in the objects/pack directory.

Creating GitHub Repos

I used the GitHub command-line-interface tools (gh) to interface with our GitHub organization. After authorizing the gh tools, for each GitLab project I created a corresponding GitHub project, and then moved to the local clone, and did

github_locs = "/path/to/github/repos"
purrr::walk(gitlab_project_dirs, function(in_dir){
  proj_name = basename(in_dir)
  setwd(github_locs)
  gh_command = paste0("gh repo create --confirm --private orgName/", proj_name)
  system(gh_command)
  setwd(in_dir)
  gh_loc = paste0("git@github.com/orgName/", proj_name)
  git_push = paste0("git push --mirror ", gh_loc)
  system(git_push)
})

You can see here I move back and forth between directories a lot, and used system to run gh and git push. It feels a little hacky, but it worked just fine.

190 Repos Later

I moved over 180 GitLab projects to GitHub repos. With the projects that already had GitHub repos, that means we have over 190 lab repos. That doesn’t count the stuff we left related to testing CI/CD, people learning how to work with GitLab, and 20 copies of the code we use for sys-admin. There were a subset I had to do by hand because of replication issues and not using user directories. I also discovered some missing repos due to a missed letter when copying one lab members user ID.

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/rmflight/researchBlog_distill, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Flight (2021, Nov. 25). Deciphering Life: One Bit at a Time: Migrating Self-Hosted GitLab Projects to GitHub. Retrieved from https://rmflight.github.io/posts/2021-11-25-migrating-self-hosted-gitlab-projects-to-github/

BibTeX citation

@misc{flight2021migrating,
  author = {Flight, Robert M},
  title = {Deciphering Life: One Bit at a Time: Migrating Self-Hosted GitLab Projects to GitHub},
  url = {https://rmflight.github.io/posts/2021-11-25-migrating-self-hosted-gitlab-projects-to-github/},
  year = {2021}
}