Migrating Self-Hosted GitLab Projects to GitHub

We wanted to migrate from self-hosted GitLab projects to GitHub repos. Here is some background on how we accomplished that.

version-control
github
gitlab
Author

Robert M Flight

Published

November 25, 2021

Moving to GitHub

Six years ago, we didn’t really look into what the free academic version of GitHub teams provided for private repositories, and instead decided that we should have a self-hosted GitLab instance. Outside of some weird issues around SSL certificates and reverse proxies (i.e. how our web-hosting worked), this actually worked pretty well. However, instead of making our internal projects public, we would frequently move them over to GitHub. I think this is partly because GitHub has become the de-facto location for open-source tool development. However, it seemed to cause some friction to push things between the two platforms. This may be due to our lab not using git as much as we should be, especially the collaborative aspects in issues and pull requests.

After six years, our PI decided to look into the costs for GitHub teams, and discovered that academic teams get free private repos with rather generous amounts of storage and actions minutes. Therefore, we decided it was time to migrate our projects over to GitHub. The PI delegated this task to yours truly.

Getting the List of Projects

To get the list of projects to migrate, I actually copied the project lists from the web interface. If I was doing this again, I would have just asked for the backup locations of our GitLab projects and used those to get the paths for each project. Some projects had changed their names over time and not updated the corresponding path in the database.

Users to IDs

Because I used lists from the web interface, I also had to convert the users to actual IDs that define the paths to each project. These weren’t too hard to look up.

Local Clones

With a full list of projects as “User Name / Project”, and users to IDs, I could easily script converting users to IDs and the full project path, and then system calls to GIT_SSL_NO_VERIFY=TRUE git clone --mirror gitlab_url to make mirror clones of each project. GIT_SSL_NO_VERIFY=TRUE deals with our labs weird SSL certificate issues, while using --mirror means all the branches are pulled, not just the default branch. This is useful to maintain a history of branches over the course of the project development.

Note that this means you get a directory with “projectName.git”, not just “projectName”. If you look inside the “projectName.git” folder, it’s actually all of the git contents, not the usual files you would see for a normal clone operation.

Duplicate Projects

One thing I didn’t think about when doing this is that users on GitLab can fork each others projects, resulting in duplicate projects. If I had been smart, I would have cloned the projects into user specific directories. Thankfully, we only had a few duplicate projects where I had to do a second clone to a user specific directory.

Empty Projects

Some of the projects are actually empty, either because we thought we were going to do something and didn’t, or someone just wanted issue tracking (that would be me). These we don’t want to push back up to GitHub. However, empty projects with no commits appear to have less than two files in the objects directory, and zero files in the objects/pack directory.

Creating GitHub Repos

I used the GitHub command-line-interface tools (gh) to interface with our GitHub organization. After authorizing the gh tools, for each GitLab project I created a corresponding GitHub project, and then moved to the local clone, and did

github_locs = "/path/to/github/repos"
purrr::walk(gitlab_project_dirs, function(in_dir){
  proj_name = basename(in_dir)
  setwd(github_locs)
  gh_command = paste0("gh repo create --confirm --private orgName/", proj_name)
  system(gh_command)
  setwd(in_dir)
  gh_loc = paste0("git@github.com/orgName/", proj_name)
  git_push = paste0("git push --mirror ", gh_loc)
  system(git_push)
})

You can see here I move back and forth between directories a lot, and used system to run gh and git push. It feels a little hacky, but it worked just fine.

190 Repos Later

I moved over 180 GitLab projects to GitHub repos. With the projects that already had GitHub repos, that means we have over 190 lab repos. That doesn’t count the stuff we left related to testing CI/CD, people learning how to work with GitLab, and 20 copies of the code we use for sys-admin. There were a subset I had to do by hand because of replication issues and not using user directories. I also discovered some missing repos due to a missed letter when copying one lab members user ID.

Reuse

Citation

BibTeX citation:
@online{mflight2021,
  author = {Robert M Flight},
  title = {Migrating {Self-Hosted} {GitLab} {Projects} to {GitHub}},
  date = {2021-11-25},
  url = {https://rmflight.github.io/posts/2021-11-25-migrating-self-hosted-gitlab-projects-to-github},
  langid = {en}
}
For attribution, please cite this work as:
Robert M Flight. 2021. “Migrating Self-Hosted GitLab Projects to GitHub.” November 25, 2021. https://rmflight.github.io/posts/2021-11-25-migrating-self-hosted-gitlab-projects-to-github.