CI/CD Pipelines

First of all, the pipelines are built with GitHub Actions, and they go all the way to deployment (Continuous Integration/Continuous Deployment). With that in mind, we will break down the philosophy behind how they are set up and the key points to take into consideration.

By convention, pipelines are defined in .yml files that are committed to the repository. They should all live under .github/workflows/.
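
For reference, a minimal workflow file under that folder follows the shape sketched below (the name, trigger, and job are placeholders rather than this project’s actual values):

name: Example Workflow

on:
  push:
    branches: [master]

jobs:
  example:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "Hello from .github/workflows/example.yml"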

Philosophy

A robust CI/CD pipeline is crucial for any modern application, ensuring smooth development, testing, and deployment processes. This project draws both on personal experience and on nuggets of wisdom from experts in the field; one source for the ideas applied here is: Hands-On Continuous Integration and Delivery.

In short, manual and disorganized processes should be mitigated as much as possible, and it’s through communication with the different areas of the business that we find areas for improvement. Automation should be the key word when it comes to delegating processes that are grunt work in nature and only take time away from more important developer tasks. Beyond that, we should have a clear flow of work so that the software we build is fast to deliver and to health-check. If someone codes something that breaks the system, we should pick up on that as quickly as possible.

Whilst we have already put a set of automation steps on each workstation through Husky, there are definitely ways to circumvent those, and we shouldn’t have to rely entirely on each developer’s willingness to adhere to protocol; we want a centralized and (in theory) incorruptible source of truth. Hence there are other mechanisms to keep tabs on code quality and correctness (according to an in-house set of rules):

  • Pull Requests

  • CI/CD Pipelines

Due to the nature of this project, pull requests definitely won’t be leveraged as much, unless the need or opportunity for them arises. However, they should still be in place so that a whole team can look at code changes that are about to be integrated into the codebase, give feedback, and pick up on possible issues or opportunities for improvement.

Pipelines, on the other hand, will be leveraged in the form of GitHub Actions: there will be one pipeline for PRs and another pipeline for CI on master. (It’s here that you can see that PRs and pipelines go hand in hand; there are arguments for trimming down checks for one use case and adding others for another.)

Even if a developer has already gone through the layers we set up in the form of hooks, the pipelines will run some of the same steps, plus others specific to them, so that the process stays fast when running on the remote repository platform (GitHub).

The idea behind Continuous Deployment is Agile in nature: always be building something, testing it, and publishing it. Whether you go all the way to that point depends highly on the team and the product, since there are justified use cases in which you only want to go as far as Continuous Delivery, where the final deployment is a manual step performed by someone assigned to that task.

In our case, we are going all the way to deployment, but be aware that this is taxing in nature, since we are running on a build machine. So be sure to optimize when pipelines should run, and don’t let them become a hassle and a source of frustration when they should be there to make things easier.

And in that endeavor of making the pipelines as fast as possible, we will leverage caching, specifically caching with pnpm. pnpm is already a tool aimed at reusing node modules across multiple projects, and it has optimized performance built into it. With that, plus correct use of module caching, we can expect cold starts for our first frontend builds, but later builds will be much faster since they leverage all previously installed node modules instead of always installing them from scratch.
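
As a point of reference, the caching portion of a job can look like the excerpt below; this is a minimal sketch, and the kakeibro-web/pnpm-lock.yaml path is an assumption based on the folder layout described in the pipelines below:

- uses: pnpm/action-setup@v4
  # assumes the pnpm version is pinned via the "packageManager" field or a version input

- uses: actions/setup-node@v4
  with:
    node-version: 22
    cache: 'pnpm'                                        # setup-node caches the pnpm store
    cache-dependency-path: kakeibro-web/pnpm-lock.yaml   # the lockfile hash keys the cache

- run: pnpm install --frozen-lockfile                    # a cache hit means no full re-download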

PR Pipeline

This consists of one job called "Lint, Build, Check". It will only run on PRs targeting master. (A hedged sketch of the full workflow follows the step list below.)

CI

Steps:

  • Run on a base Ubuntu image, and from the root of the repo, head down to the ./kakeibro-web folder.

  • Establish a strategy: run two instances of the pipeline in parallel (matrix), one with Node 20 and another with Node 22

    • A good practice is to test out the build process on different Node versions. Since we are following best practices, we should be testing on the Active LTS and Maintenance LTS versions of Node.

  • We check out the code with the pre-defined actions/checkout@v4 action

    • Note: At the time of writing the pre-defined actions are at v4; this could get updated later, and we should be mindful of it in case some actions are still on other versions such as v3.

    • Note 2: We use these pre-defined sub-routines since GitHub encourages it; they abstract very general use cases such as cloning the repo first (the build machine always starts from an empty state), with the added bonus of extra syntax that builds on top of the base functionality without us having to code it from scratch (e.g. fetching specific commits, optimizing performance, attaching submodules)

  • We install pnpm with pnpm/action-setup@v4. Just like the previous step, it uses a pre-defined action aimed at abstracting logic and keeping the file clean

  • We install Node with actions/setup-node@v4. In this specific instance we want to tie it to pnpm, and because of the matrix we set up at the beginning we now have to query the matrix.node-version variable so that the respective Node version gets installed. Besides that, we can set up caching simply by pointing the action at a strategy that caches with pnpm. We also have to point specifically to the pnpm-lock.yaml file so that the pipeline and the build machine can read it and check their own cache of packages; if all the hashes match, the cache is used instead of installing everything again from scratch

    • IMPORTANT: It is for these types of use cases that committing the lockfile can be extremely helpful.

  • We install all the dependencies with pnpm and with the --frozen-lockfile flag so that it doesn’t try to overwrite something on the lockfile.

    • Note: pnpm i --frozen-lockfile is recommended for use on CI/CD pipelines to ensure consistency. The command installs dependencies without modifying the pnpm-lock.yaml file.

  • We then run ESLint by leveraging the same scripts that Husky uses.

  • We then run a build to check that all the code is okay and nothing broke.

  • We then leverage a script called debug-check.js that lives under the scripts folder of the web app. It is used by both Husky and now the CI/CD pipeline to check for console.log or debugger statements in the code, failing if it finds any.

  • We then run a check for outdated packages but fail silently; the table of outdated dependencies will still show in the pipeline summary.

  • Finally, we run a script that checks for package versions with known vulnerabilities.
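
Put together, the PR workflow looks roughly like the sketch below. This is a hedged reconstruction from the steps above, not the exact committed file; the workflow name, job id, and the lint/build script names are assumptions:

name: PR Pipeline

on:
  pull_request:
    branches: [master]

jobs:
  lint-build-check:
    name: "Lint, Build, Check"
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [20, 22]
    defaults:
      run:
        working-directory: ./kakeibro-web
    steps:
      - uses: actions/checkout@v4

      - uses: pnpm/action-setup@v4

      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
          cache: 'pnpm'
          cache-dependency-path: kakeibro-web/pnpm-lock.yaml

      - name: Install dependencies
        run: pnpm install --frozen-lockfile

      - name: Lint
        run: pnpm run lint

      - name: Build
        run: pnpm run build

      - name: Check for debug statements
        run: node scripts/debug-check.js

      - name: Outdated packages (non-blocking)
        run: pnpm outdated || echo "Outdated packages found, not failing the step"

      - name: Audit
        run: pnpm audit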

Master Pipeline

This consists of two jobs: "Lint, Build, Check" and "Deploy to Firebase".

CI

Steps:

  • Run on a base Ubuntu image, and from the root of the repo, head down to the ./kakeibro-web folder.

  • Establish a strategy: run two instances of the pipeline in parallel (matrix), one with Node 20 and another with Node 22

    • A good practice is to test out the build process on different Node versions. Since we are following best practices, we should be testing on the Active LTS and Maintenance LTS versions of Node.

  • We check out the code with the pre-defined actions/checkout@v4 action

    • Note: At the time of writing the pre-defined actions are at v4; this could get updated later, and we should be mindful of it in case some actions are still on other versions such as v3.

    • Note 2: We use these pre-defined sub-routines since GitHub encourages it; they abstract very general use cases such as cloning the repo first (the build machine always starts from an empty state), with the added bonus of extra syntax that builds on top of the base functionality without us having to code it from scratch (e.g. fetching specific commits, optimizing performance, attaching submodules)

  • We install pnpm with pnpm/action-setup@v4. Just like the previous step, it uses a pre-defined action aimed at abstracting logic and keeping the file clean

  • We install Node with actions/setup-node@v4. In this specific instance we want to tie it to pnpm, and because of the matrix we set up at the beginning we now have to query the matrix.node-version variable so that the respective Node version gets installed. Besides that, we can set up caching simply by pointing the action at a strategy that caches with pnpm. We also have to point specifically to the pnpm-lock.yaml file so that the pipeline and the build machine can read it and check their own cache of packages; if all the hashes match, the cache is used instead of installing everything again from scratch

    • IMPORTANT: It is for these types of use cases that committing the lockfile can be extremely helpful.

  • We install all the dependencies with pnpm and with the --frozen-lockfile flag so that it doesn’t try to overwrite something on the lockfile.

    • Note: pnpm i --frozen-lockfile is recommended for use on CI/CD pipelines to ensure consistency. The command installs dependencies without modifying the pnpm-lock.yaml file.

  • We then run ESLint by leveraging the same scripts that Husky uses.

  • We then run a build to check that all the code is okay and nothing broke.

  • We then leverage a script called debug-check.js that lives under the scripts folder of the web app. It is used by both Husky and now the CI/CD pipeline to check for console.log or debugger statements in the code, failing if it finds any.

  • We then run a check for outdated packages but fail silently; the table of outdated dependencies will still show in the pipeline summary.

  • Lastly within this job, we run a script that checks for package versions with known vulnerabilities.

Deploy

Steps:

  • This job depends on the previous CI job; if CI doesn’t fail, deploy runs normally. (A hedged sketch of the whole job follows this step list.)

  • Run checkout recipe

  • Run pnpm recipe

  • Run the Node recipe with a specific Node version. We aim to stay up to date with the most modern version that is still in Active LTS or Maintenance LTS, so we set up Node with version 22. Same caching setup so modules are pulled from the cache if they are there.

  • Install dependencies with frozen-lockfile

  • Retrieve the service account details from the repository’s secrets vault and save them to a temporary .json file so that the Firebase CLI can pick them up and use them for authorization.

  • Run the deploy script. Since the Firebase CLI is installed through package.json, it should be present; however, because this is a build machine, we need the service account .json credentials in a file and its path referenced through an env variable (GOOGLE_APPLICATION_CREDENTIALS)

    • NOTE: We should have configured the credentials secret on GitHub beforehand; otherwise the flow will fail, since Firebase won’t authorize the machine to do the deploy.
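
A hedged sketch of the deploy job as it might sit under jobs: in the master workflow (which itself triggers on pushes to master); the job ids and step names are illustrative, and the secret name matches the service account snippet shown later in this document:

  deploy:
    name: Deploy to Firebase
    needs: lint-build-check          # hypothetical id of the "Lint, Build, Check" job; skipped if CI fails
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ./kakeibro-web
    steps:
      - uses: actions/checkout@v4

      - uses: pnpm/action-setup@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: 'pnpm'
          cache-dependency-path: kakeibro-web/pnpm-lock.yaml

      - run: pnpm install --frozen-lockfile

      - name: Setup service account credentials
        run: echo '${{ secrets.GCLOUD_SERVICE_ACCOUNT }}' > $HOME/gcloud-service-key.json

      - name: Set GOOGLE_APPLICATION_CREDENTIALS
        run: echo "GOOGLE_APPLICATION_CREDENTIALS=$HOME/gcloud-service-key.json" >> $GITHUB_ENV

      - name: Build and Deploy to Firebase
        run: pnpm run deploy

      - name: Cleanup service account file
        run: rm -f $HOME/gcloud-service-key.json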

Cleanup Pipeline

As mentioned in the PR pipeline and the Master pipeline, we leverage caching to make builds faster, but that doesn’t come "for free": the cache has to live somewhere, whether we use pnpm or the general-purpose cache action. The moment the cache key varies, be it due to new dependencies being detected or other external factors, new caches get generated, and they start adding up. If we have no use for old caches we should clean them up, and automating that cleanup is key.

It’s with this in mind that all repositories have a cache-cleanup cron Action; the one specific to the web app is this:

name: Cleanup Action Caches

on:
  schedule:
    - cron: "0 0 * * *" (1)
  workflow_dispatch: (2)

permissions:
  actions: write (3)

jobs:
  clean-cache:
    runs-on: ubuntu-latest
    steps:
      - name: Get caches list
        id: list-caches
        run: |
          response=$(curl -s -H "Authorization: Bearer ${{ secrets.GITHUB_TOKEN }}" \ (4)
            -H "Accept: application/vnd.github+json" \ (5)
            "https://api.github.com/repos/${{ github.repository }}/actions/caches") (6)
          echo "$response" | jq -r '.actions_caches | sort_by(.created_at) | reverse' > caches.json (7)

      - name: Delete old caches
        run: |
          latest_cache_id=$(jq -r '.[0].id' caches.json) (8)
          jq -c '.[1:] | .[]' caches.json | while read -r cache; do (9)
            cache_id=$(echo $cache | jq -r '.id') (10)
            curl -X DELETE -s -H "Authorization: Bearer ${{ secrets.GITHUB_TOKEN }}" \
              -H "Accept: application/vnd.github+json" \
              "https://api.github.com/repos/${{ github.repository }}/actions/caches/$cache_id" (11)
          done
        if: success() (12)
1 This Action will be configured as a CRON that runs at 00:00 UTC every day.
2 In case we need to trigger the Action manually, this flag enables that option.
3 Every Action has its own token injected into its runtime; however, it is read-only by default, and to delete caches we need it to also have write permissions.
4 As mentioned in <3>, the Action has a token injected for it, available as secrets.GITHUB_TOKEN. We leverage the GitHub API to retrieve our cache list and clean it up, and we have to send the auth token so that the API grants us access and responds correctly.
5 application/vnd.github+json is the media type GitHub responds with, so we configure the request to accept the response as that type.
6 Another built-in utility is that, under the github context, we can retrieve variables such as .repository. Its value is the owner/name slug that, in the realm of the GitHub API, identifies the repository the Action is running in.
7 Whenever we have something saved in a variable, a good practice is to reference it in quotation marks to avoid issues with spaces or special characters. We print the contents of the variable (which holds the response from the GitHub API) and pipe it into jq, a utility for working with JSON; the -r flag tells jq to output raw strings instead of JSON-encoded ones. With jq we can pipe different instructions that help us navigate a JSON structure: .actions_caches | sort_by(.created_at) | reverse first points to the .actions_caches property of the JSON object (we know it’s an array), then sorts what’s under that key by the created_at field (if the structure were not an array this would fail), and lastly reverses the result, since we want the most recent entry at the beginning (the top). The final, transformed result is saved to a caches.json file.
8 We have no real technical use for this line; it’s there so that the step breaks if it can’t read a valid entry, and so that we can see the value on GitHub. We retrieve the latest cache ID into a variable.
9 The source is a JSON array, so in order to iterate over it and do something with the data we apply jq -c '.[1:] | .[]' caches.json. This leverages jq again to read the structure: -c makes sure each JSON object ends up on its own line, .[1:] slices the array to exclude the first element (the latest cache entry), and .[] outputs each remaining item as a separate JSON object. We then apply Bash shell constructs to that output: because read is a command that reads a single line of input, using it in a while loop means we read line by line, saving each line’s contents in a cache variable; the moment there are no more lines to read, the loop ends.
10 We then use jq again to extract the id field of the single JSON object held in the cache variable, and we save that value in a cache_id variable.
11 Just as in the previous step, we hit the GitHub API endpoint in charge of deleting a cache entry, sending the Action’s token plus the ID of the cache we extracted.
12 Part of GitHub Actions: use this if you want to enforce a condition on a step or job, so that it runs only if the previous step/job was successful. It’s a good way to enforce short-circuits, but still brittle, since you have to add it to each step or job that comes after the pivot element. (Same idea as our conditional execution for deployment; see the sketch below.)
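
For comparison, the same short-circuit idea expressed at the job level can look like the minimal sketch below (the job ids are illustrative, not the actual ones used in this project):

jobs:
  ci:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Lint, Build, Check"

  deploy:
    needs: ci             # deploy only starts after ci, and is skipped if ci failed
    if: ${{ success() }}  # explicit success() check, mirroring the step-level if above
    runs-on: ubuntu-latest
    steps:
      - run: echo "Deploy to Firebase"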

Firebase

Firebase has a service called Hosting. Following its documentation, we can integrate Firebase into our web app repository.

In the Firebase Console it’s best to create a project. We have created one for the whole kakeibro app.

A good practice, so that we adhere to DevOps principles, is to have the Firebase CLI in the form of a node package, meaning it should be declared in our package.json as a devDependency, since this package has nothing to do with the user experience; it is there for deployment purposes. This also integrates better with CI/CD pipelines because we can then cache that dependency, instead of running it through a global tool that won’t be cached unless we do extra work.

  • Head into the web app folder

  • (With the Firebase CLI installed) run firebase init

  • Type Y

  • Select Hosting

  • Use an existing project

  • Select the project that you have in Firebase

  • Since vite outputs a bundle to a ./dist folder, we will put that as the public directory.

  • Accept Configure as SPA

  • Do not setup GitHub Actions (we will do that on our own)

  • Do not overwrite the index.html (we should have run a build already)

  • Right after this whole setup we should have .firebaserc and firebase.json files in our repo. We should commit those; they are key instruments when running a firebase deploy

  • Add a script at package.json so that it automates things (you can name it deploy). It should build the whole project and then run the firebase CLI tool.

Due to the nature of Firebase, and just to test out the script ourselves, it is best to run the pnpm run deploy command locally first, to confirm we are able to deploy our app to Firebase. This could definitely be automated with IaC, such as Terraform; however, for our purposes it is a one-time manual configuration: we make sure the app deploys correctly, then configure a custom domain, wait for SSL provisioning and everything, and after that rely on the Action to deploy (we shouldn’t have to run it manually anymore). More info at parameters for scripts.

After we have wired up the Firebase domain through CNAME records for our custom domain (don’t forget to add 2 records: one for the www subdomain and another one without it), we should be able to hit both kakeibro.dsbalderrama.top and www.kakeibro.dsbalderrama.top and get our app running right there.

NOTE: Don’t forget to add the .firebase/ folder to .gitignore.

DISCLAIMER: The approach in the next paragraphs has been deprecated and might break in the future. The project uses a service account; however, this information is kept so the docs carry more context.

In order to generate a token so that our Action can run a deploy, we can run firebase login:ci. This opens a web browser with Google OAuth so that we log in with the account that has access to the Firebase project; we have to consent and everything. After the flow is done, the token is printed to the console. SAVE THIS AS A GITHUB SECRET AND NEVER TOUCH IT AGAIN. We leverage it in the GitHub Action to authenticate against Firebase so that we can then run the deploy.

The firebase login:ci command generates a token specifically meant to be used on a pipeline, and by design it is passed only at the firebase deploy --only hosting --token $FIREBASE_TOKEN level; we can’t log into Firebase with this token. It’s for this specific purpose that the deploy script has to be modified so that we can pass all of the different parameters through to the firebase command: tsc -b && vite build && firebase deploy --only hosting. With that last command in the script, the deployment always runs only for hosting, and we can attach extra arguments to it. So when we run pnpm run deploy --token ${{ secrets.FIREBASE_TOKEN }}, everything after deploy gets attached to the firebase deploy --only hosting … section of the deploy script.
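
As a reference only, since this is the deprecated path, the corresponding workflow step could have looked roughly like the sketch below, assuming the token was stored in a FIREBASE_TOKEN repository secret:

- name: Build and Deploy to Firebase (deprecated token approach)
  run: pnpm run deploy --token "${{ secrets.FIREBASE_TOKEN }}"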

Small note: A good practice, to avoid issues, is the usage of --only hosting, so that the project only tries to deploy the Hosting section of the whole Firebase project. There might be times at which running the command without the flag attempts to deploy to other services and ends up breaking something.

About www domain vs clean name

Firebase will need to have a second domain added to it so that it redirects the www subdomain to our non-www subdomain. (GitHub Pages sets that up automatically.)

A good recommendation at the DNS level is to set up the second CNAME record (www) to point to the non-www name.

The DNS configuration side isn’t enough though; the server needs to take care of redirecting things, the HTTP redirection if you will. Hence we will have to make these two configurations ourselves: one on Firebase (the server), and another on our DNS provider.

Danger of deprecation of --token

When running the previous --token parameter approach we will get a warning stating:

Authenticating with `--token` is deprecated and will be removed in a future major version of `firebase-tools`. Instead, use a service account key with `GOOGLE_APPLICATION_CREDENTIALS`: https://cloud.google.com/docs/authentication/getting-started

This indicates that we should move away from the CI token and use the service account keys approach instead. We should manage different keys for different services, since using a master .json credentials key is a single point of failure: if it gets compromised, ALL OF YOUR SERVICES ARE COMPROMISED. We have to download the key as a .json file and then upload its contents as a secret to the repository (as text).

Firebase projects end up as separate GCloud projects. The best way to manage their credentials is to head to Project Settings and, under the Service Accounts tab, click the All service accounts link. It takes us to a GCloud instance where we might see an already configured service account. We can then head down to Service Accounts in the GCloud panel and generate a key for the service account that has access to our Firebase project in charge of the website.

It’s with that in mind that the command can get rid of the --token parameter, and we can leverage env variables and the secret text like this:

- name: Setup service account credentials
  run: echo '${{ secrets.GCLOUD_SERVICE_ACCOUNT }}' > $HOME/gcloud-service-key.json (1)

- name: Set GOOGLE_APPLICATION_CREDENTIALS
  run: echo "GOOGLE_APPLICATION_CREDENTIALS=$HOME/gcloud-service-key.json" >> $GITHUB_ENV (2)

- name: Build and Deploy to Firebase
  run: pnpm run deploy

- name: Cleanup service account file
  run: rm -f $HOME/gcloud-service-key.json (3)
1 The main thing is that the env variable expects a .json file path, not the actual JSON string. Hence we have to copy the contents to a file for a brief moment.
2 We then save the path to the file containing the service account JSON string as an env variable (so that it persists for the next steps).
3 Lastly, as a security measure, we will delete the temporary file with our credentials as a form of cleanup.

References:

You can edit the levels of access for a service account by heading to IAM, selecting the service account you want to modify, and clicking the pencil icon; you will get a menu to modify the account’s levels of access.

Possible audit blockers

When the pnpm audit step finds a vulnerability in a package, the command immediately returns exit code 1 and the pipeline fails. While this is great because our controls are doing their job, it might end up blocking development if we do nothing about it, since sometimes the vulnerable package is a transitive dependency. Until it gets fixed through a version bump in one of our directly referenced packages, we would be stuck; that depends on others, and we can’t block the development flow because of it.

And so, an option is to fail silently while staying on top of updates; this can be done as easily as pnpm audit || echo 'Failing audit silently, waiting for esbuild bump'.
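
In workflow terms, that is just the audit step wrapped in a shell fallback; a minimal sketch (the step name is illustrative):

- name: Audit (non-blocking)
  run: pnpm audit || echo 'Failing audit silently, waiting for esbuild bump'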

We could even go further and automate this, but nevertheless, this is just part of the established workflow: some pains we have to accept while still getting value out of everything that was set up.

As mentioned in CI, we have a matrix. One nice thing about GitHub is that it tries to optimize runs: if one matrix instance fails while another is still in the middle of running, the running one is stopped immediately and cancelled. (The sketch below shows the setting that controls this.)
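
This behavior is governed by the strategy’s fail-fast flag, which defaults to true; a minimal sketch:

strategy:
  fail-fast: true          # the default; when one matrix job fails, in-progress siblings are cancelled
  matrix:
    node-version: [20, 22]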

A way to stay on top of updates is to keep checking forums (ideally GitHub) for how the maintainers will deal with the dependency.

E.g: GitHub Issue.

Depending on how people are treating it, you might even want to add the specific vulnerability to the exceptions:

  "pnpm":{
    "auditConfig": {
      "ignoreGhsas": [
        "GHSA-67mh-4wv8-2f99"
      ]
    }
  }

This, for example, takes care of an esbuild vulnerability that won’t be fixed until the next major release of Vite, hence we will have to make do with the vulnerability for now. (It also adds extra information for us as developers about the dependency.) The decision in this specific use case was to update to v6.2 (for example) the moment it’s available, since that will get rid of this vulnerability.

Outdated: Depending on the versions, the pnpm outdated step could fail outright instead of just adding "warnings". E.g.: globals went from version 15.15.0 to 16.0.0 and the step started failing. How do you fix this? Run pnpm update; if that doesn’t bump the package to the latest version, manually edit the version and then run pnpm update again.