February 04, 2021

SLIM: Hydrating cloud native CI/CD pipelines to securely access GCP projects

What is the article about?

Kubernetes has undoubtedly established itself as the state-of-the-art orchestration layer for containers. Out of the box, it takes care of many necessary things when operating a containerized workload, such as configuration management, secret management, service discovery, autoscaling and much more. Mastering all the bits and pieces of Kubernetes has become routine for many DevOps engineers nowadays, and using a managed Kubernetes service like Google Kubernetes Engine (GKE) can make their lives easier. Nevertheless, one common task is to securely talk to APIs from inside a container, for example to use other GCP services. All too often this is handled by generating keys and copying them around, a technique that has disadvantages from a security standpoint. The situation becomes especially critical when privileged keys are involved, for example inside CI/CD pipelines that roll out infrastructure as code (IaC). In this blog post we will introduce our secret-less-identity-management system, an alternative approach that provides an identity to our CI/CD pipelines not by sharing secrets but based on their context.

Accessing secrets from a Kubernetes pod

In a cloud environment such as Google Cloud Platform (GCP), services often need to talk to cloud APIs, e.g. Cloud Storage or Cloud SQL to store data, Cloud Pub/Sub to deliver messages to other systems or Cloud Functions to trigger a serverless workflow. For authentication GCP uses Google service accounts (GSA for short) and for authorization an associated identity and access management (IAM) binding [1]. In Kubernetes the situation looks similar: if a container wants to talk to the Kubernetes API, it first needs a Kubernetes service account (KSA for short) and for authorization an associated role binding in the RBAC system [2]. So, what does a Kubernetes container need to talk to a Google API? Well, it of course still needs a GSA. The common approach is to create a GSA key and mount it into the container via a Kubernetes secret, from where the container service can pick it up, as shown in fig. 1 (1). Alternatively, a container service can access the GSA of the underlying compute instance via the GCE metadata service [3], as depicted in fig. 1 (2). While both methods work, they have their drawbacks: in the first method the GSA key is statically deployed in a Kubernetes secret and needs to be regularly rotated, whereas in the second method all containers on a node automatically share the same GSA and thus the same identity and access.

Fig. 1: Two prevalent options to access a GSA key from a pod running in GKE. (1) Mount the key into the pod's filesystem via a Kubernetes secret. (2) Fetch the GSA key from the underlying compute instance the pod is currently running on via the metadata service endpoint.
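To make the two options concrete: option (2) is a plain HTTP call against the metadata server (which in practice hands out short-lived tokens for the instance's GSA rather than the raw key), while option (1) requires creating and mounting a Kubernetes secret. The secret and file names below are only illustrative:

# Option (2): fetch an access token for the node's GSA from the metadata server.
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"

# Option (1): store a GSA key as a Kubernetes secret; the pod then mounts it as a
# file and points GOOGLE_APPLICATION_CREDENTIALS at the mounted path.
kubectl create secret generic my-gsa-key --from-file=key.json=./gsa-key.json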

So, what would be a situation where such a setup is necessary? Well, for example, we operate a dedicated Gitlab instance for hosting source code and providing CI/CD capabilities [4]. Our autonomous agile teams deploy their software products continuously to production using Gitlab CI. So, our centralized CI infrastructure needs highly privileged access to various productive GCP projects to deploy IaC and software code.
Our Gitlab instance runs inside a GKE cluster, and for each pipeline job a dedicated Gitlab runner pod is dynamically spawned. From a security standpoint it is challenging to associate an authorized identity with these Gitlab runner pods and to allow them to roll out code in a GCP project, especially because this identity needs to be accessible to the workload at any given time, as CI jobs can be scheduled at any time.
Our first idea was to create dedicated node pools per team, each with their own GSA. This works but heavily breaks the cloud native paradigm of Kubernetes, because job runner pods can no longer be scheduled wherever resources are available but only in a constrained environment. Thus, extra effort is needed to make this setup scalable. That's why we first started with a trivial solution to bootstrap our Gitlab environment and afterwards incorporated our learnings into our secret-less-identity-management (SLIM for short) system.
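For illustration, the discarded node-pool-per-team approach would have looked roughly like this; the pool, cluster and service account names are made up, and the runner pods would additionally need a matching node selector and toleration:

# Create a dedicated node pool for one team, bound to that team's GSA.
gcloud container node-pools create team-a-runners \
  --cluster=gitlab-ci --zone=europe-west1-b \
  --service-account=team-a-deployer@team-a-project.iam.gserviceaccount.com \
  --node-labels=team=team-a \
  --node-taints=team=team-a:NoSchedule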

Keep calm and carry on storing secrets in environment variables

The most straightforward way is to create a unique GSA key per GCP project and store them as Gitlab CI variables [5]. These variables can be deployed per repository or repository group and need to be marked as protected [6], which obfuscates them in the UI and restricts their accessibility to pipeline jobs running on protected branches and tags.
Obviously, this approach is very easy to implement: we have full admin privileges on our Gitlab instance, so we create a group for each GCP project. Each group belongs to a team; hence we automatically give every team member a Gitlab role to access all repositories inside this group [7]. Then, for every GCP project, we create a GSA with an Owner IAM binding [1] and deploy the GSA key to the corresponding Gitlab group as a CI variable. Making use of these GSA keys is as easy as dumping the content of the variable into a temporary file and activating it via the well-known environment variable GOOGLE_APPLICATION_CREDENTIALS [8] or via gcloud:

> echo "${CI_ENVIRONMENT_VARIABLE}" > /tmp/service_account_key_from_env.json
> export GOOGLE_APPLICATION_CREDENTIALS=/tmp/service_account_key_from_env.json
> gcloud auth activate-service-account --key-file "${GOOGLE_APPLICATION_CREDENTIALS}"
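The setup behind this is a handful of gcloud calls plus one call to the Gitlab group variables API. The project, service account and group names below are placeholders, and the Gitlab call is only a sketch of the group-level CI variable endpoint:

# Create a GSA with an Owner binding in the target project and export a key.
gcloud iam service-accounts create team-a-deployer --project=team-a-project
gcloud projects add-iam-policy-binding team-a-project \
  --member="serviceAccount:team-a-deployer@team-a-project.iam.gserviceaccount.com" \
  --role="roles/owner"
gcloud iam service-accounts keys create key.json \
  --iam-account=team-a-deployer@team-a-project.iam.gserviceaccount.com

# Store the key as a protected CI variable on the team's Gitlab group.
curl --request POST --header "PRIVATE-TOKEN: ${GITLAB_ADMIN_TOKEN}" \
  "https://gitlab.example.com/api/v4/groups/team-a/variables" \
  --form "key=GCP_SERVICE_ACCOUNT_KEY" \
  --form "value=$(cat key.json)" \
  --form "protected=true"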

Clearly this approach brings several drawbacks. First, there is no automatic rotation mechanism for the GSA keys in place. This implies security problems, as these keys are valid for 10 years [9], so we would have to come up with a rotation solution ourselves. Furthermore, access management for the keys is a nightmare, because access to these highly privileged GSA keys is reduced to whether branches and tags are protected or not. In other words: if anyone manages to commit to a protected branch or create a protected tag, they have full control over everything in the associated Google Cloud project. Additionally, we ended up leaking the GSA keys into the Gitlab backup, making the backup artifacts more security sensitive than they should be.
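The missing rotation is easy to spot on the keys themselves: listing the user-managed keys of such a GSA shows their ten-year validity window (the service account name is again a placeholder):

# List user-managed keys and their validity window for a given GSA.
gcloud iam service-accounts keys list \
  --iam-account=team-a-deployer@team-a-project.iam.gserviceaccount.com \
  --managed-by=user \
  --format="table(name.basename(), validAfterTime, validBeforeTime)"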

Cloud native secret management with SLIM

Going forward, we believe that we can build a better solution to this problem. Our main goal was to get rid of any secrets stored in Gitlab CI variables and simultaneously make it as simple as possible for developers to use. So, we came up with the idea of authenticating requests based on the context from which a secret is requested, without relying on a shared secret. We also drew some inspiration from the GKE Workload Identity feature [10] as well as the "AWS auth method" of HashiCorp's Vault project [11].

At a high level, the flow looks like this:

A script running inside a CI pipeline needs to access a secret. It authenticates against the SLIM service with metadata about its current context. The SLIM service then validates the metadata and responds with an authentication token. With this token the CI pipeline can fetch a service account key from the SLIM service for a limited time, until the token becomes invalid.
That is basically it, which keeps SLIM easy to use and reduces its overall attack surface. This is especially important since SLIM needs access to all privileged GSA keys and therefore requires strict security considerations. But let's dig deeper into how SLIM validates the context and decides whether a request is trustworthy or not.

In figure 2 the architecture and the working principle of SLIM are depicted. Let's say a CI script needs to access ‘target-project’ via a service account key. For this, the runner pod gets its initial GSA using Workload Identity [10] (1–4). We call this identity the initial context identity. The CI script then performs a signJwt request against Cloud IAM [12] with its initial context identity and sends this JWT to the SLIM server together with additional metadata from Gitlab CI [13], namely the newly introduced CI_JOB_JWT (5). We call this metadata the context identifier.

The SLIM server first validates the signatures of the JWTs with the public keys provided by Google [14] and Gitlab [15] (6). Then it adds a database semaphore for the context identifier so that it cannot be reused by any other request. After that, it checks with the help of a context provider, in our case the Gitlab API, whether the pipeline job is currently running in the specified repository (7–8). Next, it uses a user identity provider, in our case the Google Admin Directory, to check whether the user who triggered the pipeline job still exists and to retrieve the Google groups the user is a member of (9–10). With this information SLIM uses an access provider, in our case a simple Cloud SQL database table, to verify whether an associated ACL exists (11). An ACL basically specifies whether a resource, in this case the service account key, may be accessed by a user identity or group from a specific context. If this is the case, SLIM finally uses a secret provider, in our case Cloud Secret Manager [16], to retrieve the service account key for ‘target-project’. Before the key is forwarded to the client, an audit provider is used to audit the whole process. Finally, the runner pod can use the retrieved GSA key to access ‘target-project’ as usual.

Fig. 2: Workflow of how a Gitlab runner pod obtains a privileged GSA key via SLIM to access a GCP project.
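From the CI script's point of view, this reduces to a signJwt call and two HTTP requests. The sketch below assumes a hypothetical SLIM host and endpoint paths as well as a made-up runner GSA name; only the metadata server, the IAM Credentials signJwt API and Gitlab's CI_JOB_JWT variable are real interfaces, and the runner GSA is assumed to be allowed to sign JWTs as itself:

# Get an access token for the initial context identity (the Workload Identity GSA).
ACCESS_TOKEN="$(curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token" \
  | jq -r '.access_token')"

# Let Cloud IAM sign a short-lived JWT with that identity (step 5).
SA="gitlab-runner@ci-project.iam.gserviceaccount.com"   # assumed runner GSA
CLAIMS="$(jq -nc '{aud: "slim", exp: ((now | floor) + 300)}')"
SIGNED_JWT="$(curl -s -X POST \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" -H "Content-Type: application/json" \
  -d "$(jq -nc --arg p "${CLAIMS}" '{payload: $p}')" \
  "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/${SA}:signJwt" \
  | jq -r '.signedJwt')"

# Exchange the signed JWT plus Gitlab's CI_JOB_JWT for a short-lived SLIM token,
# then fetch the GSA key for 'target-project' with it (hypothetical endpoints).
SLIM_TOKEN="$(curl -s -X POST "https://slim.internal.example/v1/auth" \
  --data-urlencode "context_jwt=${SIGNED_JWT}" \
  --data-urlencode "ci_job_jwt=${CI_JOB_JWT}" | jq -r '.token')"
curl -s -H "Authorization: Bearer ${SLIM_TOKEN}" \
  -o /tmp/target-project-key.json \
  "https://slim.internal.example/v1/keys/target-project"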

This process may at first glance seem complex, but it basically boils down to two steps: an authentication and an authorization step. The latter in particular uses the same mechanics Workload Identity uses internally, namely validating context metadata.

YASM? No, SLIM!

You may wonder whether SLIM is yet another secret manager and why we did not simply use, for example, HashiCorp's Vault project in the first place. Well, SLIM is actually not a secret manager but rather a secret manager proxy; it is even possible to use Vault as a backend instead of GCP's Cloud Secret Manager. We prefer to compare SLIM with the Access Proxy in Google's BeyondCorp architecture [17], because it grants access to a secret based on the identity and the context the request came from and can thus be more dynamic, a property typical secret managers usually don't have.
We built SLIM to be extensible so that it can serve use cases beyond Gitlab CI, such as accessing secrets from a GKE cluster, a Cloud Function or an App Engine app, which would simply need another context provider implementation. What would this look like? For instance, we plan to use SLIM for Kubernetes pods to fetch their credentials to connect to a managed Cloud SQL database. In this dynamic environment, where pods come and go, SLIM could use a context provider that uses the pod name, namespace, cluster name and GCP project id to check whether the pod is actually running, and could even decide to hand out the database credentials only on pod initialization and not when the pod is already running.
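As a rough illustration of what such a Kubernetes context provider could do, the check might boil down to asking the claimed cluster whether the claimed pod exists and what phase it is in; all names, variables and the zone below are hypothetical:

# Verify the claimed context: does the pod exist in the claimed namespace of the
# claimed cluster, and is it still initializing (e.g. Pending rather than Running)?
gcloud container clusters get-credentials "${CLAIMED_CLUSTER}" \
  --project "${CLAIMED_PROJECT_ID}" --zone europe-west1-b
kubectl get pod "${CLAIMED_POD_NAME}" \
  --namespace "${CLAIMED_NAMESPACE}" \
  -o jsonpath='{.status.phase}'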
Internally, we use SLIM daily and for our most crucial CI pipelines, where system critical GCP infrastructure is rolled out. We are happy with the system itself and are thinking about how we could transfer the concept to more use cases or expand it into new areas. We would appreciate your comments, ideas or suggestions.

This article was written together with Jan Hicken, Grzegorz Rygielski and Lukas Janssen from Otto Group data.works.


Written by

Dr. Mahmoud Reza Rahbar Azad
Senior Cloud Engineer
