Tech Blog: Remote collaboration at scale

Posted on February 01, 2020

How Project Borealis uses Git for distributed workflows

Hello and welcome to our very first tech blog! During the development of Project Borealis we want to remain open and communicative, and while our Development Update videos aim to provide a high-level overview of our progress, we wanted to take the opportunity to delve into some specifics from time to time. This is the first in what we hope will be a series of more technical blogs about the development of the game.

Today, we’d like to talk about how we’re collaborating effectively on a large-scale game project in an entirely remote setting, while keeping costs to a minimum. We didn’t get this right on the first (or second) try, so we’ll first go over some of our less successful systems, what went wrong with them, and what we learned. Finally, we’ll detail our current setup and how it’s evolved to scale for a team of nearly 100 game developers working all around the world. Feel free to skip to the second part if you’d like to get right into the technical details on what we currently use, but we think the failures of our first attempts provide some useful context.

If at first you don’t succeed, try, try again…

In the first few days of our project, when Epistle 3 had just been published, things were hectic. We were effectively making a game development studio from scratch, and it had to be entirely remote. Game development is traditionally done at a physical studio, because of the large assets which are hard to transfer over the Internet, as well as many collaboration and iteration problems normally solved by being in the same room as people you’re working with.

Once the team had settled and we had made some larger decisions (like which engine to use!), we instinctively turned to Git and Git LFS (Large File Storage) to host our game repo.

Repo v1

We first settled on GitLab.com. We used Git LFS to store our art assets, and GitLab.com provided LFS storage for free. This worked well for a time, since our game repo was mostly code at that point: our artists were working independently in their art tools and design was in the pre-production stage, so we didn’t have to worry about levels yet either. Programmers would branch off our main branch and submit pull requests (merge requests in GitLab). So far so good.

Once artists began submitting assets, we just let them branch however they wanted and merge back in whenever they wanted. We hit some small road bumps, such as artists missing the editor binaries needed to launch the game editor; programmers just instructed artists on how to install Visual Studio so they could compile the binaries themselves. This system worked for the most part… but not very well. It was also not at all friendly to artists who weren’t familiar with these systems.

But we faced a bigger issue than technical ease of use: repo size. GitLab.com was quite unstable at the time we were using it, and with larger repositories it would cut out regularly, especially for new people cloning the entire repo. Team members were also required to download the entire project for their work, even when they didn’t need it all. On top of this, we were quickly approaching the 10GB repo size limit with all the new art assets and initial levels being added. In an attempt to mitigate these issues, we redesigned our repository structure. It was probably one of the worst technical decisions we made.

Repo v2

First, we split the repo into submodules, grouped by project type and then by asset type. The groups were Shared, for all core assets; Imported, for all Valve reference assets imported to Unreal Engine; Development, for any test assets; Ravenholm, for all tech demo specific assets; and Episode3, for all Episode 3 specific assets. Each of these groups had submodules for particles, models, materials, music, sound, etc. This was all backed by a mandatory command line script called Icebreaker that would, at least theoretically, manage all these submodules for you. It sounded great and seemed like it would solve all of our problems, but it quickly became a convoluted nightmare.

The actual result was a whole host of bugs, technical issues, and confused artists, for very little benefit. Unfortunately, we had backed ourselves into a corner with this system. The sheer number of submodules and the workflows being built around them were too unwieldy to use with anything except Icebreaker. So we dealt with it. Support requests went through the roof, and there was a lot of frustration with getting content into the repository. Ultimately it was a huge waste of time, but we couldn’t commit to changing it as the issues piled up, because we were getting closer to Development Update 4 and needed to show off Ravenholm. We couldn’t pause development to rework the repo until afterwards. After Update 4, our team took a well-deserved break and programming was free to restructure the repo once more. This time, we wanted to make a scalable and robust collaboration solution that was easy to use.

Repo v2: Episode 1

We had learned a lot from our previous repos and took a good look at where the problems were to see how we could best address them.

The first decision we made was to move to a self-hosted GitLab server. This was to avoid the downtime and instability that had plagued GitLab.com, and to remove the repo size limit. This allowed us to have a unified repository again, which was a great first step.

Next, we wanted to address the “anything-goes” branching structure, which had caused so many merge conflicts (where people modify the same file and the changes from one version can’t be kept), as well as general confusion among our artists. A formal branch structure document was made, complete with full software lifecycle details, merge policy and schedule. We created a dev branch from which programmers created branches, a main branch which contained the latest generally stable code ready to be used by creatives, and then content branches from main. After a short time, we realized that we were not getting much value out of git flow for creatives, so we made a single content-main branch which all creatives pushed to and pulled from directly, and which was merged to main for stable game build releases. Ultimately, this structure wasn’t as efficient as it could be at managing where changes flowed in preparation for a game build release.

On the hosting side, we managed a distributed object storage network with some edge server caching through CloudFlare for LFS objects, which had become the majority of our bandwidth. We also made use of Argo Smart Routing to cut down on latency for requests, mainly for file locking. This increased speeds by a lot, but we were still bottlenecked by bandwidth on git object updates and diffing on our self-hosted server. It was also a lot of new infrastructure to maintain ourselves.

Finally, for tools, we started out with a simple batch script for pulling the latest from the repo, updating our custom engine with ue4versionator, and launching the project. The rest of source control (checking out files, committing, pushing) was done in the UE Git plugin. We modified the plugin to lock files in parallel using our Git LFS fork, which sped up check-out operations within the editor compared to locking files one by one, and we made a few other improvements. However, that simple batch script got unwieldy over time as we matured our source control workflow to cover common problems, streamline getting the latest changes, and manage binary dependencies. Thus, PBSync and PBGet were born.
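To give a feel for why parallel locking helps, here is a minimal Python sketch of the idea. It is not the plugin’s actual code and does not use our Git LFS fork: it simply runs the stock `git lfs lock` command for several files at once through a thread pool. The function names `lfs_lock` and `lock_files`, and the default of 8 workers, are assumptions made up for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess


def lfs_lock(path):
    """Lock one file with the stock Git LFS CLI; True if the command succeeded."""
    result = subprocess.run(
        ["git", "lfs", "lock", path], capture_output=True, text=True
    )
    return result.returncode == 0


def lock_files(paths, lock_one=lfs_lock, max_workers=8):
    """Lock many files concurrently and report which ones failed.

    lock_one is injectable so the logic can be exercised without a Git remote.
    max_workers=8 is an arbitrary illustrative default, not a tuned value.
    """
    paths = list(paths)
    if not paths:
        return {"locked": [], "failed": []}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in the same order as the input paths.
        results = list(pool.map(lock_one, paths))
    locked = [p for p, ok in zip(paths, results) if ok]
    failed = [p for p, ok in zip(paths, results) if not ok]
    return {"locked": locked, "failed": failed}
```

Each lock is a network round trip to the LFS server, so issuing them concurrently means the total wait is closer to the slowest request than to the sum of all of them.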

Our solution to highly distributed game development

Repo v3

Third time’s the charm. For our final repository iteration, we created PBSync and PBGet (now open source!) to scale to our demands for a complete solution for repository and dependency management.

PBSync is a robust, battle-tested set of Python scripts which assist users in syncing the project, handling source control edge cases, updating binaries, restoring bad repositories to a good state, deploying game builds, and much more! This came about when our previous batch scripts became too unwieldy for us. Now, we sync project editor DLL binaries through Nuget, rather than in source control. This keeps our source control lighter, and uses a package manager for what it was made to do: handling binary packages. We are also able to maintain complex logic and edge cases to streamline project syncing even in bad repository states and fresh clones, with guided setup.

PBGet is an underlying framework for working with Nuget to download packages, which are linked to their expected location for UE4 using junctions. It also allows programmers to bump our project version, and push new source files to Nuget.

But having a programmer manage all of that wouldn’t work at scale, so using these robust tools, we were able to create an entirely automated Continuous Integration system which assists us in these pipelines and source control flows. We currently use TeamCity to automatically build binaries and merge changes for creatives to use on their own branch, and to sync content changes back to the programmer development branch. Code feature branches are automatically statically analyzed, and we also run a style check and Doxygen build with linting. This ensures that code review doesn’t miss any small changes and that our rules are enforced, keeping the code base clean and well-functioning.

As a side note, we also migrated our repository to GitHub from our self-hosted GitLab, to ease up some maintenance pressure on our end, and to relieve some performance concerns by taking advantage of GitHub’s great engineering efforts to create a reliable and efficient distributed source control infrastructure. This was largely a straightforward process thanks to Git’s decentralized nature.

Finally, we adjusted our git config to take advantage of upcoming performance features and usability enhancements to further make game development on git fast and easy. We use the commit graph for optimizing many local operations, watchman as a git filesystem monitor to passively monitor file changes, and many more optimizations for repositories with many files. We also use rebasing as our project’s default pull method, so that there are no lock conflicts in merge commits. Rebasing also makes it easier and cleaner for creatives to make sure they’re on the latest changes.
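As a rough sketch of what such a config could look like, the Python snippet below builds the `git config` commands for the features mentioned above. The exact keys and values are assumptions: they are standard Git settings that match the description, not a copy of our real config, and the `core.fsmonitor` hook path is a hypothetical placeholder for a script that queries watchman.

```python
import subprocess

# Assumed settings: standard Git config keys matching the features named in
# the post. The real project config may use different keys or values.
PERFORMANCE_SETTINGS = {
    "core.commitGraph": "true",     # read the commit-graph file for history walks
    "gc.writeCommitGraph": "true",  # refresh the commit graph during git gc
    # Hypothetical hook path; the hook is expected to query watchman.
    "core.fsmonitor": ".git/hooks/query-watchman",
    "feature.manyFiles": "true",    # Git's preset for repositories with many files
    "pull.rebase": "true",          # rebase local commits on pull instead of merging
}


def config_commands(settings):
    """Turn a settings mapping into one `git config --local` command per key."""
    return [
        ["git", "config", "--local", key, value]
        for key, value in settings.items()
    ]


def apply_settings(settings, repo_dir="."):
    """Run the commands inside repo_dir; raises if any git call fails."""
    for command in config_commands(settings):
        subprocess.run(command, cwd=repo_dir, check=True)
```

Calling `apply_settings(PERFORMANCE_SETTINGS, "path/to/clone")` would write the values into that clone’s local config. Keeping the settings in one mapping means a sync script can reapply them on every run, so every team member ends up with the same defaults.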

All of this together made for an overwhelmingly positive response. Programmers and creatives alike love the system we have built, and find it to be streamlined and pain free! Our stats reflect this: tech support tickets have gone down by over 95%! So now we are left with high-speed, low-latency source control that handles our repository of over 14,000 files totalling over 28GB, for an entirely remote team distributed around the world from California to Germany to New Zealand.

We’ll talk more about our build pipeline (up to game build releases) and merge policies in a future post, but until then, you can view our new project wiki, where we’re publishing some internal documents about our source control. Stay tuned for that on our Twitter, and feel free to ask questions on our Discord.

And as a bonus for making it to the end of the blog, we have a new video up, which visualizes our project repository’s long and rich history through Gource!