Moving to GitHub
Six years ago, we didn’t really look into what the free academic version of GitHub teams provided for private repositories, and instead decided to run a self-hosted GitLab instance. Outside of some weird issues around SSL certificates and reverse proxies (i.e. how our web hosting worked), this actually worked pretty well. However, instead of making our internal projects public on GitLab, we would frequently move them over to GitHub. I think this is partly because GitHub has become the de facto location for open-source tool development. Pushing things between the two platforms did seem to cause some friction, though. This may be due to our lab not using git as much as we should, especially the collaborative aspects such as issues and pull requests.
After six years, our PI decided to look into the costs for GitHub teams, and discovered that academic teams get free private repos with rather generous amounts of storage and actions minutes. Therefore, we decided it was time to migrate our projects over to GitHub. The PI delegated this task to yours truly.
Getting the List of Projects
To get the list of projects to migrate, I actually copied the project lists from the web interface. If I were doing this again, I would just ask for the backup locations of our GitLab projects and use those to get the path for each project. Some projects had changed their names over time without the corresponding path in the database being updated, so the name shown in the web interface did not always match the path.
Users to IDs
Because I used lists from the web interface, I also had to convert the users to actual IDs that define the paths to each project. These weren’t too hard to look up.
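A minimal sketch of that lookup in R, assuming each copied entry looks like “User Name / Project” and that the project path matches the displayed project name (which, as noted above, was not always true). The names, IDs, and the project_entries vector are made up for illustration.

```r
# Hypothetical lookup table: display name -> GitLab user ID (values are made up)
user_ids <- c("Jane Doe" = "jdoe", "John Smith" = "jsmith")

# Hypothetical entries copied from the web interface
project_entries <- c("Jane Doe / analysis-code", "John Smith / lab-notes")

# Split each entry on " / " into the user name and the project name
split_entry <- function(entry) {
  parts <- strsplit(entry, " / ", fixed = TRUE)[[1]]
  list(user = parts[1], project = paste(parts[-1], collapse = " / "))
}

parsed <- lapply(project_entries, split_entry)
user_names <- vapply(parsed, function(x) x$user, character(1))
project_names <- vapply(parsed, function(x) x$project, character(1))

# Fail early if any user name has no ID, rather than finding out after cloning
missing_users <- setdiff(user_names, names(user_ids))
if (length(missing_users) > 0) {
  stop("No ID for: ", paste(missing_users, collapse = ", "))
}

# Full project path as "userID/project"
project_paths <- paste0(unname(user_ids[user_names]), "/", project_names)
```

Checking for missing IDs up front is cheap, and it catches a mistyped name before any cloning starts.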
Local Clones
With a full list of projects as “User Name / Project”, and a mapping of users to IDs, I could easily script converting each entry to its full project path, and then use system calls to GIT_SSL_NO_VERIFY=TRUE git clone --mirror gitlab_url to make mirror clones of each project. GIT_SSL_NO_VERIFY=TRUE deals with our lab’s weird SSL certificate issues, while using --mirror means all the branches are pulled, not just the default branch. This is useful to maintain a history of branches over the course of the project development.
Note that this means you get a directory with “projectName.git”, not just “projectName”. If you look inside the “projectName.git” folder, it’s actually all of the git contents, not the usual files you would see for a normal clone operation.
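The cloning step might be scripted roughly like this. It is a sketch under assumptions: the host gitlab.example.org, the example project paths, and the helper build_clone_command are all hypothetical, and the URL layout (base URL, then project path, then .git) is assumed rather than taken from our instance.

```r
# Hypothetical values: replace with your own GitLab host and project paths
gitlab_base <- "https://gitlab.example.org"
project_paths <- c("jdoe/analysis-code", "jsmith/lab-notes")

# Build one mirror-clone command per project path
build_clone_command <- function(project_path, base_url = gitlab_base) {
  gitlab_url <- paste0(base_url, "/", project_path, ".git")
  paste("GIT_SSL_NO_VERIFY=TRUE git clone --mirror", gitlab_url)
}

clone_commands <- vapply(project_paths, build_clone_command, character(1),
                         USE.NAMES = FALSE)

# Running the commands is left commented out so the sketch has no side effects:
# for (cmd in clone_commands) system(cmd)
```

Building the command strings first and running them afterwards makes it easy to print and inspect them before anything touches the network.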
Duplicate Projects
One thing I didn’t think about when doing this is that users on GitLab can fork each other’s projects, resulting in duplicate project names. If I had been smart, I would have cloned the projects into user-specific directories. Thankfully, we only had a few duplicate projects where I had to do a second clone to a user-specific directory.
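One way to sidestep the collisions is to keep the user ID in the destination path and to flag repeated project names up front. A minimal sketch, with made-up project paths and a placeholder clone_root:

```r
# Hypothetical project paths; two users have a project with the same name
project_paths <- c("jdoe/analysis-code", "jsmith/analysis-code", "jsmith/lab-notes")
clone_root <- "/path/to/gitlab/clones"  # placeholder location

# Project names that appear under more than one user (forks, most likely)
project_names <- basename(project_paths)
duplicated_names <- unique(project_names[duplicated(project_names)])

# Destination that keeps the user ID in the path, so forks never collide
clone_dest <- file.path(clone_root, dirname(project_paths),
                        paste0(project_names, ".git"))
```

git clone accepts a destination directory as its last argument, so each element of clone_dest could be appended to the matching clone command.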
Empty Projects
Some of the projects are actually empty, either because we thought we were going to do something and didn’t, or someone just wanted issue tracking (that would be me). These we don’t want to push to GitHub. Fortunately, they are easy to spot: empty projects with no commits appear to have fewer than two files in the objects directory, and zero files in the objects/pack directory.
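That check can be turned into a small helper. This is only a sketch of the heuristic described above, not a guaranteed test for emptiness; the function name is_empty_mirror is hypothetical, and counting files recursively under objects is one interpretation of “files in the objects directory”.

```r
# Heuristic check for an empty mirror clone: fewer than two files anywhere
# under objects/ and no files in objects/pack.
is_empty_mirror <- function(repo_dir) {
  objects_dir <- file.path(repo_dir, "objects")
  # recursive = TRUE counts files only, not the sub-directories themselves
  n_object_files <- length(list.files(objects_dir, recursive = TRUE))
  n_pack_files <- length(list.files(file.path(objects_dir, "pack")))
  n_object_files < 2 && n_pack_files == 0
}
```

With a character vector of clone directories, Filter(Negate(is_empty_mirror), dirs) keeps only the ones worth pushing.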
Creating GitHub Repos
I used the GitHub command-line-interface tools (gh) to interface with our GitHub organization. After authorizing the gh tools, for each GitLab project I created a corresponding GitHub project, and then moved to the local clone, and did a mirror push:
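# Directory to run gh from (placeholder path)
github_locs = "/path/to/github/repos"
purrr::walk(gitlab_project_dirs, function(in_dir){
  # Name of the local mirror clone directory
  proj_name = basename(in_dir)
  # Create the private repo in the organization
  setwd(github_locs)
  gh_command = paste0("gh repo create --confirm --private orgName/", proj_name)
  system(gh_command)
  # Move into the mirror clone and push every branch and tag
  setwd(in_dir)
  # SSH remotes use a colon between host and path
  gh_loc = paste0("git@github.com:orgName/", proj_name)
  git_push = paste0("git push --mirror ", gh_loc)
  system(git_push)
})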
You can see here I move back and forth between directories a lot, and used system to run gh and git push. It feels a little hacky, but it worked just fine.
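A slightly less hacky variant would avoid setwd altogether and check exit codes. The sketch below is an alternative, not what I ran: build_push_commands and run_commands are hypothetical helpers, orgName is a placeholder, the gh flags are simply carried over and may differ between gh versions, and stripping the .git suffix from the directory name is my assumption about the desired repo name.

```r
# Build the gh and git commands for one mirror clone without calling setwd().
build_push_commands <- function(in_dir, org = "orgName") {
  # Drop the ".git" suffix that mirror clone directories carry
  proj_name <- sub("\\.git$", "", basename(in_dir))
  gh_command <- paste0("gh repo create --confirm --private ", org, "/", proj_name)
  # SSH remote in scp-like form: user@host:path
  gh_loc <- paste0("git@github.com:", org, "/", proj_name, ".git")
  # git -C runs the command as if git had been started in in_dir
  git_push <- paste("git -C", shQuote(in_dir), "push --mirror", gh_loc)
  c(create = gh_command, push = git_push)
}

# Run the commands in order and stop at the first non-zero exit status
run_commands <- function(commands) {
  for (cmd in commands) {
    status <- system(cmd)
    if (status != 0) stop("Command failed: ", cmd)
  }
  invisible(TRUE)
}

# Example with a placeholder path; nothing is executed here
cmds <- build_push_commands("/path/to/gitlab/clones/projectName.git")
```

Calling run_commands(cmds) would then create the repo and push, and a failed gh repo create stops the push from being attempted.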
190 Repos Later
I moved over 180 GitLab projects to GitHub repos. With the projects that already had GitHub repos, that means we have over 190 lab repos. That doesn’t count the stuff we left behind related to testing CI/CD, people learning how to work with GitLab, and 20 copies of the code we use for sys-admin. There was a subset I had to do by hand because of the duplicate projects and not using user-specific directories. I also discovered some missing repos due to a missed letter when copying one lab member’s user ID.
Citation
@online{mflight2021,
author = {Robert M Flight},
title = {Migrating {Self-Hosted} {GitLab} {Projects} to {GitHub}},
date = {2021-11-25},
url = {https://rmflight.github.io/posts/2021-11-25-migrating-self-hosted-gitlab-projects-to-github},
langid = {en}
}