February 04, 2021

Switching from Jenkins to GitHub as a CI/CD system

Development

This is not a pipeline

For our customers, otto.de is a large webshop with a huge range of products. Behind the seamless façade, numerous systems are involved. This highly complex landscape is now managed by 30 teams; each team is responsible for a manageable number of microservices, and each microservice provides exactly one piece of functionality for otto.de. Teams can deploy and redeploy their microservices on their own initiative and largely independently, and keep them supplied with updated software.
To ensure that deployments run smoothly, each microservice has its own build pipeline that automatically tests and rolls out the software. This process is called CI/CD (continuous integration and continuous delivery) and is an integral part of otto.de. Every microservice therefore always includes a system that tests and updates it.

The story so far

In the first few years after the rollout of Lhotse, OTTO's successful in-house development, Jenkins was used by all teams. That changed over the following years, and the build-system landscape grew more diverse with GitLab CI, GoCD, and LambdaCD.
My team is responsible for static resources on otto.de and uses node.js for back-end systems. Jenkins has long been the tool of choice for us too.
When OTTO opened up more towards the cloud in 2018 and GitLab was replaced by GitHub, our team decided to switch from GitLab CI to CircleCI. In retrospect the migration was not too difficult, but it took a long time. Nevertheless, we were very happy to use a build system ‘as a service’. The combination of GitHub and CircleCI appeared very suitable to us. Many teams followed our lead and soon CircleCI became almost overcrowded. Because our resources there were contractually limited, we sometimes had to put up with longer queues. Bottlenecks like these are critical in key live deployments.

Evaluation and implementation

In April 2020, GitHub announced its own CI/CD platform, and thanks to the enterprise contract an alternative to CircleCI became available to us pretty much overnight. Since we already use the GitHub Package Registry for Docker images and NPM packages, we definitely wanted to try out the new GitHub Actions. Our first impression, however, was rather sobering: Actions seemed unfinished and unfamiliar. In addition, a 1:1 migration was impossible, because the concept of manual gates that we use for deployments on CircleCI does not exist there. We have always found it reassuring that pipelines do not update software on live systems without our intervention, even after successful checks, because some changes need to be monitored very closely. A manual gate lets us deploy code only when we have the capacity for monitoring and a rollback.
Since we had put a lot of work into our CircleCI workflows over the last two years, we had little interest in starting again from scratch.
After an initial proof of concept, our opinion changed for the better and we decided to make the move.
We have migrated a total of 64 CircleCI pipelines to GitHub over the last few weeks and can now proudly give GitHub Actions a warm welcome!

Checking the status quo

Before we launched the move we identified the similarities between our pipelines.
All our pipelines:

go through a build step (in our team, specifically: npm ci and npm run build)
run unit/interactions/E2E tests (npm run test)
audit dependencies (npm audit)
check software licences, as we are not allowed to use all licences
provide documentation in a central S3 bucket.

Besides this we also have three types of pipelines:

pipelines that build within the AWS infrastructure (terraform apply) and then roll out software
pipelines that provide software as an NPM package or Docker image
pipelines that require both.

Pipelines with an impact on the customer also report their deployment to a central monitoring function.
Besides this, we also have the following rules:

feature branches require a gate to be rolled out to a develop system
only the main branch can launch a live deployment behind a gate.

We were able to create appropriate GitHub workflows for all requirements. Only the gates could not be reproduced. Instead, we use the release event as the trigger for a deployment; a release can be published via the GitHub GUI or the GitHub API. We use prereleases for deployments that should only be rolled out as far as the develop system.

name: Deploy Terraform
'on':
  release:
    types:
      - prereleased
      - released
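
As a minimal sketch of how such a release could be published through the GitHub REST API instead of the GUI, the helper below builds the request for the "create a release" endpoint. The function name, its parameters, and the check that live releases must come from main are illustrative assumptions, not our actual tooling.

```javascript
// Hypothetical helper: builds the REST request that publishes a release and
// thereby fires the `release` trigger of the workflow above.
function buildReleaseRequest({ owner, repo, tag, branch, live, token }) {
  // Assumed rule from the text: only the main branch may go live.
  if (live && branch !== 'main') {
    throw new Error('live deployments may only be released from main');
  }
  return {
    method: 'POST',
    url: `https://api.github.com/repos/${owner}/${repo}/releases`,
    headers: {
      Accept: 'application/vnd.github+json',
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({
      tag_name: tag,
      target_commitish: branch,
      // A prerelease fires `prereleased`, a full release fires `released`.
      prerelease: !live,
    }),
  };
}
```

The returned object can be passed to any HTTP client. Keeping the request construction separate from the network call makes the branch rule easy to test.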

Another advantage is that JavaScript is a first-class citizen in GitHub Actions. This is very convenient for our team, which feels most comfortable using Node.js.

Keeping it DRY

Pipelines, especially if they are very similar, are unfortunately prone to code duplication. For CircleCI we use Orbs and YAML anchors to make sure we stay DRY – ‘Don’t Repeat Yourself’.
For example, a deployment to our develop system differs from a live deployment in precisely one place: the target environment (live instead of develop). All other steps are completely identical. In the YAML files for CircleCI we were able to create a single source of truth via anchors and aliases, so we only have to make changes in one place.
Since we have pipelines that deploy to up to 6 different AWS accounts, this let us cut out a lot of identical code and keep a clear overview of the workflow. Orbs helped us reuse code across pipelines.
Regrettably there are no anchors in GitHub Actions, and this feature is sorely missed by the community. GitHub is aware of this pain point; so far, however, there has only been an announcement that they’re working on it.
But since we didn't want to start duplicating code at all, we wrote an application called Gitty that updates workflows in all repositories via the GitHub API.
In each repo we have a config that Gitty reads via GitHub API. YAML workflows are generated from the config and checked back in by commit. Our initial intensive effort quickly paid off.
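
The idea of generating workflows from a config can be sketched as follows. This is not Gitty's actual code; the config shape, the job layout, and the Terraform variable name `environment` are assumptions made purely for illustration.

```javascript
// Hypothetical per-repository config, as a generator might read it.
const exampleConfig = {
  name: 'Deploy Terraform',
  environments: ['develop', 'live'],
};

// One shared job template; only the target environment varies.
function renderJob(environment) {
  const lines = [`  deploy-${environment}:`, `    runs-on: ubuntu-latest`];
  if (environment === 'live') {
    // Assumption: prereleases stop at develop, full releases continue to live.
    lines.push(`    if: github.event.release.prerelease == false`);
  }
  lines.push(
    `    steps:`,
    `      - uses: actions/checkout@v2`,
    `      - run: npm ci`,
    `      - run: npm run build`,
    `      - run: terraform apply -auto-approve -var environment=${environment}`
  );
  return lines.join('\n');
}

// Renders the complete workflow file as a YAML string.
function renderWorkflow(config) {
  const header = [
    `name: ${config.name}`,
    `'on':`,
    `  release:`,
    `    types:`,
    `      - prereleased`,
    `      - released`,
    `jobs:`,
  ].join('\n');
  const jobs = config.environments.map((env) => renderJob(env)).join('\n');
  return `${header}\n${jobs}\n`;
}
```

Calling `renderWorkflow(exampleConfig)` yields one workflow with two near-identical jobs. The duplication then lives only in the generated YAML, while the template remains the single source of truth.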

Example:
When it was recently announced that the set-env workflow command would soon be deactivated, we needed to adjust code in just one place to update all repositories.
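
For illustration, the change itself replaces the `::set-env` workflow command with a write to the file referenced by `$GITHUB_ENV`. The sketch below shows that rewrite as a small text transformation. The function name is hypothetical, and it only handles the simple `echo "::set-env name=X::value"` form.

```javascript
// Hypothetical migration helper: rewrites the deprecated set-env command
// into the environment-file syntax. Lines without set-env are left untouched.
function migrateSetEnv(line) {
  const pattern = /echo\s+(["'])::set-env name=([A-Za-z_]\w*)::(.*?)\1/g;
  return line.replace(
    pattern,
    (match, quote, name, value) =>
      `echo ${quote}${name}=${value}${quote} >> $GITHUB_ENV`
  );
}
```

With a generator in place, a rewrite like this only has to be made once in the template rather than in every repository.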

Secrets

Pipelines outside AWS need credentials to create resources within AWS. On CircleCI we had already distributed the AWS credentials via Lambda and API and rotated them regularly. This is also possible with GitHub. But beware! With the GitHub API, the abuse detection mechanism strikes quickly. An unthrottled Promise.all() against all repos triggers a very snappy 403.
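
One way to avoid that 403 is to cap the number of requests in flight. The following is a minimal sketch of such a limiter; `mapWithLimit` is a hypothetical helper, not part of any library, and the right limit for the GitHub API is something to determine by experiment.

```javascript
// Runs `task` for every item, but never more than `limit` at the same time.
// Results keep the order of `items`.
async function mapWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;

  // Each worker pulls the next unprocessed index until none are left.
  async function worker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }

  const workerCount = Math.min(limit, items.length);
  const workers = Array.from({ length: workerCount }, () => worker());
  await Promise.all(workers);
  return results;
}
```

A call such as `mapWithLimit(repos, 1, updateSecret)`, where `updateSecret` stands for whatever function rotates a credential in one repository, processes the repositories strictly one after another.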

Availability

Unfortunately, a CI system can always crash. This has rarely been the case with CircleCI in recent years, but when it did happen, it was always at the wrong time. We have therefore designed all pipelines so that they can also be executed locally.
That's why we use an AWS role in the pipeline that can also be ‘assumed’ by our local AWS users. This way, a local deployment does not fail because of a missing IAM policy. Fortunately, there are just a few core requirements for a development machine: Node.js, terraform, git, GITHUB_TOKEN, and AWS credentials are enough to build, test and deploy all services locally. We definitely want to maintain this independence in all situations. While CI systems handle a lot of the hard work, they should never be the only systems available to roll out code in an emergency.
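
A local fallback along these lines could look like the sketch below. The environment variable names for the AWS credentials, the Terraform variable `environment`, and all function names are assumptions for illustration; the command list simply mirrors the common pipeline steps described earlier.

```javascript
// Assumed names of the variables a development machine must provide.
const REQUIRED_ENV = ['GITHUB_TOKEN', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'];

// Returns the names of all required variables that are missing or empty.
function missingPrerequisites(env) {
  return REQUIRED_ENV.filter((name) => !env[name]);
}

// The same ordered steps the pipeline runs, for one target environment.
function deploymentPlan(environment) {
  return [
    'npm ci',
    'npm run build',
    'npm run test',
    'npm audit',
    `terraform apply -var environment=${environment}`,
  ];
}

// Executes the plan with an injected `exec`, refusing to start if
// prerequisites are missing.
function runPlan(plan, env, exec) {
  const missing = missingPrerequisites(env);
  if (missing.length > 0) {
    throw new Error(`missing prerequisites: ${missing.join(', ')}`);
  }
  for (const command of plan) {
    exec(command);
  }
}
```

Locally, `exec` could be a thin wrapper around `execSync` from Node's `child_process` module with `stdio: 'inherit'`, so that the first failing command aborts the run.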

Summing up

We are now very happy with GitHub Actions and have hardly had to make any further adjustments over the last few weeks. Jobs run quickly and reliably. There are no waiting times for us at the moment, although things may soon get tight here too.
Changing the CI system is always a good opportunity for a spring clean, and we have taken full advantage of this. The close integration with GitHub has proven to be a great advantage. We are really happy to have dared to make the change and would not want to do without this solution in future!