CI/CD Pipelines

Pipelines built with GitHub Actions, going all the way to the deployment stage.

By convention, pipelines are set up in .yml files that are committed to the repository. They should all live under .github/workflows/.

Philosophy

The same philosophy we follow for the Frontend’s pipelines is also followed on the backend.

Whilst we have already put a set of automation steps in place on each workstation through Git Hooks, there are definitely ways to circumvent those, and we shouldn’t have to rely entirely on each developer’s willingness to adhere to protocol but on a centralized and (in theory) incorruptible source of truth. Hence there are other mechanisms to keep tabs on code quality and correctness (according to an in-house set of rules):

  • Pull Requests

  • CI/CD Pipelines

For the backend we will focus on running all checking and building logic, with additional steps such as building a Docker image, uploading it to Google Cloud Artifact Registry and deploying that image to a Google Cloud Run instance. Everything necessary for how things integrate on the Google side is laid out in the Google Cloud Run and Docker image sections.

In the endeavor of making the pipelines as fast as possible, we will leverage caching. Unlike the Frontend pipelines, we are going to cache both Docker layers and NuGet packages with the actions/cache pre-built action. This approach is a bit more manual, yet it leverages the same idea: packages that were already downloaded shouldn’t need to be downloaded again.

Due to the nature of the application, there are some security concerns tied to best practices that we should discuss in order to explain the decisions behind the pipelines’ design:

  1. We should put references to versions, credentials, or anything that is by nature dynamic and/or sensitive behind variables and secrets.

GitHub has a feature to store sensitive data securely: secrets. These are simply key-value pairs attached to a repo, an environment or a whole organization. In our case we will use a repository secrets store.
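Once a secret is defined at the repository level, any workflow can read it through the secrets context and GitHub will mask its value in logs. A minimal illustration (the same step appears verbatim in the cleanup workflow near the end of this page):

# Consuming a repository secret inside a workflow step; the value is injected
# at runtime and masked wherever it would be printed
- name: Authenticate to Google Cloud
  uses: google-github-actions/auth@v2
  with:
    credentials_json: ${{ secrets.GCP_SA_KEY }}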

Secrets

All values stored behind secrets contain sensitive information and should not be exposed in the workflow file. GitHub goes out of its way to obfuscate these values in logs and any other place that could leak their actual values.

  • GCP_SA_KEY ⇒ Contains the JSON key for your Google Cloud service account, which grants access to deploy services. Must always be kept secret.

  • PROJECT_ID ⇒ While not as sensitive, it is generally a good practice to store it in secrets if you want to avoid accidental exposure.

  • ARTIFACT_REGISTRY ⇒ If it’s specific to your cloud provider and shouldn’t be exposed in logs, store it in secrets. (This is the region plus the specific domain.)

  • REPOSITORY_NAME ⇒ If you use private repositories in Artifact Registry, this should be secret.

  • IMAGE_NAME ⇒ If you use naming conventions that expose internal project details, store it in secrets.

NOTE: Once you add a secret to the vault, you will not be able to see its value ever again. So be mindful of that.

GCP

In order for the GitHub Action to be able to log into GCloud, push images and then deploy Cloud Run instances, we need to set up a Service Account Key:

The easiest way to get this is through the GCloud Console web app, downloading the service account key as a .json file. This should never be uploaded anywhere but the GitHub secrets vault; we paste the whole JSON file as the value of the secret. (After generating a key it is downloaded automatically, but you can never re-download it, so be mindful of this.)

Note: Be sure to be under the correct Google account and the correct project when generating this service key.

Variables

Following the same idea as in programming of avoiding magic numbers or strings, we should try to use variables; these are by nature safe to expose in a workflow.

  • DOTNET_VERSION ⇒ Just specifies which .NET version to use; no security concerns.

  • SERVICE_NAME ⇒ The Cloud Run service name is not sensitive.

  • REGION ⇒ The region is not sensitive unless you’re trying to obscure deployment details.

  • CACHE_VERSION ⇒ Used for cache invalidation and not security-sensitive.
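As a sketch of how these might be declared, the workflows shown later in this page set such values in env blocks; a workflow-level env block is one way to do it. The names mirror the list above, the values are examples pulled from elsewhere in this page, and the exact .NET version string is an assumption:

# Workflow-level env variables (a sketch, not the verbatim workflow)
env:
  DOTNET_VERSION: "9.0.x"        # fed to actions/setup-dotnet; exact version string is an assumption
  SERVICE_NAME: "kakeibro-api"   # Cloud Run service name
  REGION: "us-central1"          # Cloud Run region
  CACHE_VERSION: "v1"            # bumped manually to invalidate caches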

Line endings with dotnet format

When working cross-platform, a common thing we have to be aware of is that Windows uses a different line ending than Unix-based OSes: Carriage Return + Line Feed (\r\n) vs Line Feed (\n). Because of this, when running dotnet format in a pipeline on a Linux distribution, it will immediately flag the difference and fail. In order to get around this, we have to make 2 changes:

  • We can leverage git in order to enforce that it handles line endings, so that the moment we commit to a repository, regardless of the OS, the \n (Line Feed) character is used.

    For this we simply have to create a .gitattributes file at the root of the repo and add it with the following content:

    # Set the default behavior for all files
    * text=auto
    
    # Force specific line endings for certain file types
    *.cs text eol=lf
  • If we already have files with the wrong line endings, we have to fix them manually in the IDE by re-saving the file and switching the format from CRLF to LF.

Lastly, we need to make sure our .editorconfig settings are correct; for more info you can head down to the Code Quality section.
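For reference, the format gate in the pipelines below boils down to a step like the following sketch; once eol=lf is enforced by .gitattributes and the .editorconfig agrees, it passes on the Linux runners:

# Sketch of the format check step; it fails the run if any file
# (including its line endings) would need reformatting
- name: Check code format
  run: dotnet format --verify-no-changes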

PR Pipeline

This consists of one job, "Build and Test".

Build and Test

Steps:

  • Checkout code with pre-built script, and point to the second level solution folder.

  • Setup actions/setup-dotnet@v4 to use .NET 9

  • Cache NuGet (and possibly node) packages for .NET.

    • actions/cache@v4 is a built-in action used to cache files across workflow runs (a general approach, not tech-stack specific).

    • It specifies what to cache; .NET stores downloaded dependencies at ~/.nuget/packages.

    • We then build a unique key to identify if we have a current project state that has the same packages. The different components to this key are:

      • runner.os ⇒ The OS running the workflow (ubuntu-latest, windows-latest, etc)

      • nuget ⇒ A fixed identifier to describe the cache type.

      • env.CACHE_VERSION ⇒ A variable that can be incremented manually to force cache invalidation

      • A hash of .csproj and packages.lock.json files. If any of these files are changed, the cache becomes invalid and dependencies are downloaded again.

    • We also have a fallback mechanism if the exact key isn’t found that’s built only with a partial prefix: runner.os-nuget-env.CACHE_VERSION.

      • E.g: If the key ubuntu-latest-nuget-v1-abc123 isn’t found, GitHub will try a less specific key like ubuntu-latest-nuget-v1-.

    • IMPORTANT: This is probably one of the most important steps when it comes to speeding up builds: we avoid re-downloading dependencies, reduce network usage, and enforce consistency by using the same dependencies across runs. For a bit more of a breakdown of how this step is idempotent and useful you can check out the GitHub Action Caching section.

  • Restore dependencies

  • Check code format

  • Build the project

  • Run tests in Release mode; this build is optimized, so tests may run differently than in a Debug build. We skip building since that was already done in a previous step (faster times), and since we want a bit more detail on tests if something wrong pops up, we set verbosity to normal (by default it’s only minimal). A sketch of the whole job follows this list.

  • Check outdated packages

  • Run vulnerabilities check
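Putting the list above together, the job could look roughly like the following sketch (not the verbatim workflow; the solution path, job name and exact .NET version string are assumptions based on the rest of this page):

jobs:
  build-and-test:
    name: Build and Test
    runs-on: ubuntu-latest
    env:
      CACHE_VERSION: v1                    # bump manually to force cache invalidation
    defaults:
      run:
        working-directory: ./Kakeibro.API  # assumed second-level solution folder
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup .NET
        uses: actions/setup-dotnet@v4
        with:
          dotnet-version: 9.0.x            # .NET 9; exact version string is an assumption

      - name: Cache NuGet packages
        uses: actions/cache@v4
        with:
          path: ~/.nuget/packages
          key: ${{ runner.os }}-nuget-${{ env.CACHE_VERSION }}-${{ hashFiles('**/*.csproj', '**/packages.lock.json') }}
          restore-keys: |
            ${{ runner.os }}-nuget-${{ env.CACHE_VERSION }}-

      - name: Restore dependencies
        run: dotnet restore

      - name: Check code format
        run: dotnet format --verify-no-changes

      - name: Build
        run: dotnet build -c Release --no-restore

      - name: Run tests
        run: dotnet test -c Release --no-build --verbosity normal

      - name: Check outdated packages
        run: dotnet list package --outdated

      - name: Check vulnerable packages
        run: dotnet list package --vulnerable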

Master Pipeline

This consists of three jobs, "Build and Test", "Build and push docker image" and "Deploy image to Cloud Run".

Build and Test

Steps:

  • Checkout code with pre-built script, and point to the second level solution folder.

  • Setup actions/setup-dotnet@v4 to use .NET 9

  • Cache NuGet (and possibly node) packages for .NET.

    • actions/cache@v4 is a built-in action used to cache files across workflow runs (a general approach, not tech-stack specific).

    • It specifies what to cache; .NET stores downloaded dependencies at ~/.nuget/packages.

    • We then build a unique key to identify if we have a current project state that has the same packages. The different components to this key are:

      • runner.os ⇒ The OS running the workflow (ubuntu-latest, windows-latest, etc)

      • nuget ⇒ A fixed identifier to describe the cache type.

      • env.CACHE_VERSION ⇒ A variable that can be incremented manually to force cache invalidation

      • A hash of .csproj and packages.lock.json files. If any of these files are changed, the cache becomes invalid and dependencies are downloaded again.

    • We also have a fallback mechanism if the exact key isn’t found that’s built only with a partial prefix: runner.os-nuget-env.CACHE_VERSION.

      • E.g: If the key ubuntu-latest-nuget-v1-abc123 isn’t found, GitHub will try a less specific key like ubuntu-latest-nuget-v1-.

    • IMPORTANT: This is probably one of the most important steps when it comes to speeding up builds: we avoid re-downloading dependencies, reduce network usage, and enforce consistency by using the same dependencies across runs. For a bit more of a breakdown of how this step is idempotent and useful you can check out the GitHub Action Caching section.

  • Restore dependencies

  • Check code format

  • Build the project

  • Run tests in Release mode; this build is optimized, so tests may run differently than in a Debug build. We skip building since that was already done in a previous step (faster times), and since we want a bit more detail on tests if something wrong pops up, we set verbosity to normal (by default it’s only minimal).

  • Check outdated packages

  • Run vulnerabilities check

Build and push docker image

Steps:

  • Checkout code with built-in action

  • Setup another specific built-in action to build a docker image: docker/setup-buildx-action@v3

  • Following the same pattern for caching as in Build and Test, we set a specific cache key built from runner.os-buildx-env.CACHE_VERSION-github-sha

    • One difference is that the action we will use for building and then pushing will save its docker layers at /tmp/.buildx-cache, hence we reference that specific path.

    • This optimization is micro in nature, since it will only re-use layers that are specific for a commit, hence github.sha is there to make the caches unique to a specific HEAD commit.

    • The fallback key omits that hash; in any case, we try our best to restore cached layers from previous runs, and when running the docker build it will pick up the things it can reuse and rebuild the things it can’t. When the code changes (as new commits are pushed) the affected layers are rebuilt, hence we add the commit SHA to the key.

  • We then make use of another built-in recipe: docker/login-action@v3. This takes parameters such as the artifact registry (that’s the root, e.g.: your-region-docker.pkg.dev), and a username that, if hard-set to _json_key, expects the whole JSON key string to be fed in the following field under password.

  • We leverage another built-in recipe: docker/build-push-action (we ended up on @v6, see the note under Google Cloud Run and the Build Push Action Provenance section). In essence it builds a Dockerfile and then pushes the resulting image to a configured registry. A sketch of the whole job follows this list.

    • The context is given through a parameter; since we have to be at the solution folder level we adjust it to ./Kakeibro.API

    • The action can be configured not to push at the end, but we do want to push, so we set push to true.

    • A good practice is to always push two tags, one referring to the new latest and another under the specific commit id. But there’s something to take into consideration here: we are not pushing two images. We are pushing one image that can be accessed by two different tags, a latest one and a commit-id one. In subsequent runs the latest tag will point to the newer image, which will also have its new commit-id tag. Hence we are always pushing to latest and just using references to make integration seamless. Over time, though, this can generate junk in the form of stale images we won’t use at all; for more info on this you can check out the Cleaning Image Registries section.

    • Under cache-from and cache-to we can configure the action to try to restore a previous cache (notice how it’s pointing to /tmp/.buildx-cache), and also save all layers and artifacts to the same location (so that our caching action can then save them for subsequent runs); lastly, with the mode parameter it will always try to store the maximum number of layers. This way Docker will only rebuild the layers that have changed, using the cached layers for everything else.

    • Lastly, since our project is not compliance sensitive, provenance is not something we care about, and to avoid generating more artifacts that we have no use for (and that generate noise), we are turning it off.
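A condensed sketch of this job; the registry path segments come from the secrets listed earlier, while the job wiring (needs), the context and the Dockerfile path are assumptions based on the Docker image section further down:

  # (continues under the same jobs: key as the Build and Test job above)
  build-and-push-docker-image:
    name: Build and push docker image
    runs-on: ubuntu-latest
    needs: build-and-test                  # assumed job wiring
    env:
      CACHE_VERSION: v1
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Cache Docker layers
        uses: actions/cache@v4
        with:
          path: /tmp/.buildx-cache
          key: ${{ runner.os }}-buildx-${{ env.CACHE_VERSION }}-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-buildx-${{ env.CACHE_VERSION }}-

      - name: Log in to Artifact Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ secrets.ARTIFACT_REGISTRY }}   # e.g. <region>-docker.pkg.dev
          username: _json_key
          password: ${{ secrets.GCP_SA_KEY }}          # the whole service account JSON

      - name: Build and push image
        uses: docker/build-push-action@v6
        with:
          context: ./Kakeibro.API                      # assumed solution folder
          file: ./Kakeibro.API/src/KakeiBro.API/Dockerfile
          push: true
          provenance: false                            # avoid extra untagged artifacts in the registry
          tags: |
            ${{ secrets.ARTIFACT_REGISTRY }}/${{ secrets.PROJECT_ID }}/${{ secrets.REPOSITORY_NAME }}/${{ secrets.IMAGE_NAME }}:latest
            ${{ secrets.ARTIFACT_REGISTRY }}/${{ secrets.PROJECT_ID }}/${{ secrets.REPOSITORY_NAME }}/${{ secrets.IMAGE_NAME }}:${{ github.sha }}
          cache-from: type=local,src=/tmp/.buildx-cache
          cache-to: type=local,dest=/tmp/.buildx-cache,mode=max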

Deploy image to Cloud Run

Steps:

  • Checkout code with built-in script

  • We use another built-in action google-github-actions/auth@v2, that automatically receives a credentials_json and will authenticate with GCloud.

  • We make sure we have the GCloud SDK setup so that we can start consuming the respective APIs to spin up an image from the Registry that has the latest pushed image from the Build and push docker image job. Luckily google-github-actions/setup-gcloud@v2 takes care of abstracting all of that.

  • We run a deploy to our configured kakeibro-api service with the same recommendations as under Google Cloud Run, but now as a CLI command. We will always be deploying the latest tag, and on an already existing instance it will simply replace the image it is hosting with the newly built one. Every detail counts: the replacement will only take place if we are pointing to the same name and same configuration; if something like the region is different, it will create two services with the same name but in different regions.

    • This deploy also takes environment variables into account: we build an .env file at runtime and add the environment variables that swap out behavior accordingly (this follows gcloud’s specific syntax, akin to KEY_1: "VALUE_1" KEY_2: "VALUE_2"); this is then fed to the cloud run deploy command so that it parses them into its runtime configuration. We also delete this temporary .env file once the deploy is done. A sketch of the whole job follows this list.
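A sketch of the deploy job; the service name and region come from earlier sections, while the environment variable written to the temporary file (and the file name itself) are placeholders:

  # (continues under the same jobs: key as the previous jobs)
  deploy-image-to-cloud-run:
    name: Deploy image to Cloud Run
    runs-on: ubuntu-latest
    needs: build-and-push-docker-image     # assumed job wiring
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Authenticate to Google Cloud
        uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}

      - name: Set up Google Cloud SDK
        uses: google-github-actions/setup-gcloud@v2

      - name: Deploy to Cloud Run
        run: |
          # Build a temporary env file (gcloud expects YAML-style KEY: "VALUE" pairs);
          # the variable below is a placeholder for whatever the API actually needs
          cat > env.yaml <<'EOF'
          ASPNETCORE_ENVIRONMENT: "Production"
          EOF

          gcloud run deploy kakeibro-api \
            --image "${{ secrets.ARTIFACT_REGISTRY }}/${{ secrets.PROJECT_ID }}/${{ secrets.REPOSITORY_NAME }}/${{ secrets.IMAGE_NAME }}:latest" \
            --region us-central1 \
            --platform managed \
            --allow-unauthenticated \
            --env-vars-file env.yaml

          # Remove the temporary file once the deploy is done
          rm -f env.yaml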

Cache Cleanup Pipeline

Following the same principle as in the frontend cache cleanup pipeline, we will apply a clean up on the backend, with its specific conditions taken into account.

NOTE: The explanation of the code here is less verbose than in the aforementioned frontend cache cleanup link; for a further breakdown of what each command does please refer to said page. This is advisable as the knowledge builds on top of said pipeline.

This means that we now have two types of cache: one for the docker builds (with a buildx tag, e.g., Linux-buildx-v…​), and another for the NuGet packages (e.g., Linux-nuget-v…​). Our script will have to discriminate between these two types and keep the latest cache for each respectively.

name: Cleanup Action Caches

on:
  schedule:
    - cron: "0 0 * * *"
  workflow_dispatch:

permissions:
  actions: write

jobs:
  clean-cache:
    runs-on: ubuntu-latest
    steps:
      - name: Get caches list
        id: list-caches
        run: |
          response=$(curl -s -H "Authorization: Bearer ${{ secrets.GITHUB_TOKEN }}" \
            -H "Accept: application/vnd.github+json" \
            "https://api.github.com/repos/${{ github.repository }}/actions/caches")
          echo "$response" | jq -r '.actions_caches | sort_by(.created_at) | reverse' > caches.json

      - name: Filter and delete old caches
        run: |
          jq -c '[.[] | select(.key | startswith("Linux-buildx"))]' caches.json > buildx_caches.json (1)
          jq -c '[.[] | select(.key | startswith("Linux-nuget"))]' caches.json > nuget_caches.json (2)

          latest_buildx_cache_id=$(jq -r '.[0].id' buildx_caches.json)
          latest_nuget_cache_id=$(jq -r '.[0].id' nuget_caches.json) (3)

          jq -c '.[1:] | .[]' buildx_caches.json | while read -r cache; do (4)
            cache_id=$(echo $cache | jq -r '.id')
            curl -X DELETE -s -H "Authorization: Bearer ${{ secrets.GITHUB_TOKEN }}" \
              -H "Accept: application/vnd.github+json" \
              "https://api.github.com/repos/${{ github.repository }}/actions/caches/$cache_id"
          done

          jq -c '.[1:] | .[]' nuget_caches.json | while read -r cache; do (5)
            cache_id=$(echo $cache | jq -r '.id')
            curl -X DELETE -s -H "Authorization: Bearer ${{ secrets.GITHUB_TOKEN }}" \
              -H "Accept: application/vnd.github+json" \
              "https://api.github.com/repos/${{ github.repository }}/actions/caches/$cache_id"
          done
        if: success()

      - name: Cleanup files
        run: |
          rm -f buildx_caches.json
          rm -f nuget_caches.json
1 Since we have 2 types of cache now that are important, we can’t just filter the latest and delete the rest; that would effectively make it so that our pipelines always have to rebuild one of the caches. So we apply some filtering, based on the already existing raw list of cache entries ordered in descending order (most to least recent). We iterate over each element, select its key field and keep it only if the value starts with Linux-buildx. This effectively spits out all the docker caches (already ordered), which are then saved to a temporary buildx_caches.json file.
2 Just like in <1> we iterate over the main list with all caches and filter the ones whose key starts with Linux-nuget. A key part of the command is the square brackets [ ] that surround the whole jq result. We need to wrap the result in square brackets so that it is also a JSON array; without that we get a structure that isn’t a JSON array and we won’t be able to iterate over it.
3 Just as a good practice we are extracting the latest IDs for both cache types
4 And so, we slice the buildx json file, skipping the first (most recent) entry, then iterate over the remaining cache entries and delete all of those old caches.
5 Same as in <4>, we skip the most recent cache in the nuget json file, and all the remaining cache entries are then deleted by using the GitHub API.

Docker image

Since .NET 8 a big emphasis was put on working with Docker out of the box and with best practices in mind. Hence, there’s already a Dockerfile present in the repo directly from scaffolding, and it makes use of an app or APP_UID user to run the app instead of root, plus standard HTTP and HTTPS ports (8080 and 8081 respectively).

The command to build the API project’s Docker image is:

docker build -t kakeibro-api -f .\src\KakeiBro.API\Dockerfile .

By convention all scaffolded Dockerfile files expect to be run at the solution folder level, hence we have to be standing at Kakeibro.API and run the command from there.

And in order to run a container manually you can do something like this:

docker run --rm -p 5214:8080 kakeibro-api

Comment: When working with Visual Studio, you might also see a Microsoft.VisualStudio.Azure.Containers.Tools.Targets package installed. This is leveraged to generate Dockerfile files that are context aware, meaning they will have all instructions referencing the current state of the application. If we use Visual Studio this could be useful, but for other IDEs it would be redundant.

The app user convention was later superseded by a newer one, hence we don’t have to use USER app but USER $APP_UID. Reference.

It’s also worth noting that with Centralized Package Management, Dockerfile files need some additions to the default structure:

# This stage is used to build the service project
FROM mcr.microsoft.com/dotnet/sdk:9.0 AS build
ARG BUILD_CONFIGURATION=Release
WORKDIR /src
COPY ["Directory.Packages.props", "."]
COPY ["Directory.Build.props", "."]
COPY ["src/KakeiBro.API/KakeiBro.API.csproj", "src/KakeiBro.API/"]
RUN dotnet restore "./src/KakeiBro.API/KakeiBro.API.csproj"
COPY . .
WORKDIR "/src/src/KakeiBro.API"
RUN dotnet build "./KakeiBro.API.csproj" -c $BUILD_CONFIGURATION -o /app/build

It is specifically at the restore stage that, if we don’t have the Directory.* files copied at the same level, we won’t be able to resolve the NuGet dependencies and other settings tied to the specific project we are containerizing. Hence we have to copy those files first and then start the build process.

Google Cloud Run

HINT: Reference.

Leveraging the same GCloud Console Project, we will make use of the Artifact Registry and the Cloud Run services in order to have docker images hosted plus spinning them up on demand (cold starts will be assumed).

After creating an Artifact Registry repository (in our case it is called kakeibro-api), we can copy its URL (e.g.: us-east1-docker.pkg.dev/kakeibro/kakeibro-api/<image-name>), tag an existing image on your local machine with docker tag kakeibro-api <URL>, and then push it with docker push <URL>. All images that are under that repository will be pushed.

It is after an image is present in a registry that we can switch to Cloud Run and then configure a service to spin up an instance of the service under that image.

Don’t forget to allow the service to be invoked unauthenticated (unless you want to leverage Google Auth as an intermediate layer); keep in mind that with authentication turned off, any public IP will be able to talk to the instance.

We are setting it up with 1 GB of RAM and no minimum number of instances, since we would incur billing costs if we did allocate a minimum.

Once we have tied the service to our docker image and started it, after GCloud has allocated the resources and everything behind the scenes, we should be able to hit the endpoint in the cloud and get something back. We can also configure custom domains; in our case we will leverage the dsbalderrama.top domain that has been purchased, mapping a web URL specific to our KakeiBro domain to it.

This involves going into the domain provider’s website, and adding CNAME records that GCloud Console provides to us in a step-by-step window. It will take some time for provisioning though, so just wait until GCloud has synchronized itself with the DNS replication.

When constructing the URL to push the image, at the beginning you will have an empty repository, and you need to add a last segment for the name of the image itself. E.g.: <region-server>/kakeibro/<kakeibro-api>, where the last kakeibro-api is the name of the actual image. If you don’t add the last segment you will get an error.

Now, pushing will not work unless you are logged into GCloud; you need the gcloud CLI installed for that. With it, log into the account that owns the project that will host the docker image alongside the Cloud Run instance.

  • gcloud auth login

  • gcloud config set project PROJECT_ID

Once you are logged in and pointed at the specific GCloud project (the ID can be retrieved from the list of projects at the home page of GCloud Console), you should be able to push the docker image normally.

Lastly, the endpoint URL for the service is https://kakeibro-api.dsbalderrama.top

A particularity of the pre-built action in Build and push docker image is that it might generate additional untagged artifacts in Google Artifact Registry (Reference). An active decision was therefore taken to use the latest version, docker/build-push-action@v6, plus setting provenance: false; for more info on provenance you can head to the Build Push Action Provenance section.

Build Push Action Provenance

Provenance metadata includes details about:

  • The build environment (e.g, Git commit SHA, timestamp, build parameters)

  • The source used for building the image (e.g., repository URL)

  • The toolchain used (e.g., Docker BuildKit version)

This metadata helps with supply chain security by making builds traceable and verifiable, which is useful for SBOM (Software Bill of Materials) and SLSA (Supply-chain Levels for Software Artifacts) compliance.

Turning it off does the following:

  • It disables the inclusion of build provenance metadata in the final image.

  • This results in a smaller image size.

  • It may also improve build performance slightly.

  • However it removes traceability, making it harder to verify the image source in security audits.

When to use it?

  • You don’t need supply chain security features

  • You want a smaller, simpler image.

  • You’re working in a private, trusted environment.

When should you leave provenance on?

  • You need to track the origin of builds (e.g., for compliance reasons)

  • You’re in a security-sensitive environment (e.g., public images, regulated industries)

GitHub Action Caching

This is a quick breakdown of how the actions/cache@v4 works:

  • Cache exists and matches the key ⇒ Dependencies are restored from cache.

  • Cache exists but doesn’t fully match ⇒ GitHub restores from the closest matching restore-keys prefix.

  • Cache doesn’t exist ⇒ Dependencies are downloaded, then saved to cache for future runs.

In short, we try to fetch our cached dependencies from GitHub’s cache storage and restore (copy) them into the current action run, on a best-effort basis.

When heading down to Actions > Caches we will see a list of the caches that were created through GitHub Actions. So this isn’t as automated as you might think: these caches will start piling up as the project grows, so a good way to housekeep is to have scripts or some other job in charge of getting rid of stale caches that see no usage.

Cleaning Image Registries

A good practice is to periodically clean up stale image tags, especially in a CI/CD pipeline that generates a lot of tags over time (like commit-specific ones). Unmanaged or redundant image tags can lead to unnecessary storage costs and clutter your registry.

  1. Retention Policy

    The latest tag should always be updated, and we should only have one. Commit hash tags are useful for referencing specific versions, but since they can accumulate over time, we can remove old commit tags, and lastly, we can add versioning to our tags, however, we should also remove them when no longer used.

  2. Automating Cleanup

    We can leverage registry features (Google Artifact Registry, Docker Hub, etc.) if they provide ways to manage and delete unused or old tags automatically; the policies can range from deleting tags after a specific period to deleting when the number of tags exceeds a certain threshold. The second option would be a custom script, which is what we will do: we can run checks on image age, number of tags, or tags associated with unmerged or stale branches.

  3. Best Practices for Cleanup

    We should keep the latest and stable versions, so that you have the latest and also some image you can roll back to. Automate cleanup based on usage: only keep images that are actively used in deployments or are tagged as critical releases.

CRON GitHub Action

First of all, we will assume we have a scripts folder at .github so that we put in scripts that actions might want to run:

.github/
  workflows/
    cleanup-image-registry.yml
  scripts/
    image-cleanup.sh

This is the action itself:

name: CRON Job to Housekeep Google Artifact Registry

on:
  workflow_dispatch: (1)
  schedule:
    - cron: '0 0 * * *'  # Runs daily at midnight UTC (2)

jobs:
  connect-setup-and-run-script:
    name: Connect and Setup Google Cloud
    runs-on: ubuntu-latest

    steps:
      # Checkout the repository
      - name: Checkout code
        uses: actions/checkout@v3

      # Set up Google Cloud authentication
      - name: Authenticate to Google Cloud
        uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }} # JSON key for GCP service account

      # Set up Google Cloud SDK
      - name: Set up Google Cloud SDK
        uses: google-github-actions/setup-gcloud@v2

      # Run the script from the .github/scripts folder
      - name: Run custom script
        env: (3)
          SERVICE_NAME: 'kakeibro-api' # Cloud Run service name
          REGION: 'us-central1' # Cloud Run region (e.g., us-central1)
          PROJECT_ID: ${{ secrets.PROJECT_ID }}
          ARTIFACT_REGISTRY: ${{ secrets.ARTIFACT_REGISTRY }} # needed by the script to build the full image path
          REPOSITORY_NAME: ${{ secrets.REPOSITORY_NAME }}
          IMAGE_NAME: ${{ secrets.IMAGE_NAME }}

        run: |
          chmod +x .github/scripts/image-cleanup.sh (4)
          .github/scripts/image-cleanup.sh (5)
1 If workflow_dispatch is stated under the trigger conditions of the action, it will allow us to head to the Actions tab at the repo level and click a button to trigger it manually.
2 GitHub Actions can be configured to run in a CRON manner, great for periodic jobs that should run even when we are not there. In our case we will always clean up the gcloud artifact registry at midnight, and we will only keep the last 2 images alive. Remember that we have 2 tags that point to the same image, meaning that latest will take one spot, and the following two will be the latest image’s commit tag and the previous commit’s tag. So we are keeping, in essence, a latest and one previous version as backup. However, since at a physical level we are not generating 2 images, if we query through gcloud artifacts docker images list we will only see one artifact (though it might have two tags attached to it).
3 Keeping security and good practices in mind here as well, we can declare env variables at a specific step of a job; and so that GitHub obfuscates values that come from the secrets vault, we have to assign them like this.
4 We make the cleanup script executable before running it.
5 We then simply execute the script from its .github/scripts location.

And the specific script is:

#!/bin/bash

# Variables
SERVICE_NAME=$SERVICE_NAME (1)
REGION=$REGION
PROJECT_ID=$PROJECT_ID
REPOSITORY_NAME=$REPOSITORY_NAME
IMAGE_NAME=$IMAGE_NAME
ARTIFACT_REGISTRY=$ARTIFACT_REGISTRY

# Authenticate to Google Cloud (already done in the workflow, but included for completeness)
echo "Authenticating to Google Cloud..."
gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS (2)

# Get the list of tags for the specified package, sorted by creation time in descending order
echo "Fetching tags for package..."
FULL_NAME="$ARTIFACT_REGISTRY/$PROJECT_ID/$REPOSITORY_NAME/$IMAGE_NAME" (3)
IMAGES=$(gcloud artifacts docker images list $FULL_NAME \ (4)
  --project=$PROJECT_ID \
  --format="value(DIGEST)" \ (5)
  --sort-by="~CREATE_TIME") (6)

# Convert the list of digests into an array
IMAGE_ARRAY=($IMAGES)

# Check if there are more than 2 images
if [ ${#IMAGE_ARRAY[@]} -gt 2 ]; then
  echo "Found ${#IMAGE_ARRAY[@]} images. Keeping the 2 most recent ones and deleting the rest..."

  # Loop through the tags, skipping the first 2 (most recent)
  for ((i=2; i<${#IMAGE_ARRAY[@]}; i++)); do
    DIGEST=${IMAGE_ARRAY[$i]}
    echo "Deleting image: $DIGEST"
    gcloud artifacts docker images delete "$FULL_NAME@$DIGEST" \ (7)
      --project=$PROJECT_ID \
      --quiet \
      --delete-tags
  done
else
  echo "Only ${#IMAGE_ARRAY[@]} tags found. No cleanup needed."
fi

echo "Cleanup complete!"
1 These assignments mirror the env variables passed in from the workflow step (they are already available to the script’s environment; restating them here documents which ones the script needs).
2 Luckily, we get access to the env variable with our Google credentials; this should already be populated thanks to the previous steps that ran before this script. (In case this needs to be leveraged elsewhere, be sure to be logged into GCloud before trying to run this script.)
3 We build the full path to the image at the repository level (notice how this is the same path that we use when pushing an image on the master-push action).
4 The same repository name that we have under secrets, this is the root that we created at the Artifact Registry level, from there on we have to build the path all the way to the specific image that will be used by Cloud Run to spin up an instance.
5 Part of gcloud’s docs covers how we can get just specific fields of the whole payload; in this case we just want the digest so we can point specifically to each image.
6 gcloud conventions allow us to feed a sorting field reference, and with the tilde ~ we invert the default ascending order so that the newest images come first. Also, the gcloud command appears to output a couple of lines before the digests, but we don’t have to worry about that; if we pipe the output anywhere it will only contain the digests separated by new lines.
7 And so we get the digests for each docker image that’s hosted, and we will only keep the last 2 (the last one that was pushed, which should be on our Cloud Run instance, and a previous version as a backup). Any other image will be deleted, and no trace should be left behind.

Note: One small thing to take into account is that layers and other internals might be present at the registry level, so you might see way more artifacts listed than the titular 2 we decided to keep. Don’t try to delete them manually; there are alerts in place for some of them, and if we delete things without really knowing what they are, we might break other services. It’s worth noting, however, that provenance is part of what generates those untagged extra artifacts.

Since pipelines (most of the time, at least) run on a Linux build machine, we will have to leverage bash scripts and Linux-specific commands when automating our processes.