knowledge-base/CICD.en.md
Stanislav Hubacek 3fa11ef0f6 comiiit
2026-06-11 15:27:28 +02:00

🔄 CI/CD and DevOps

CI/CD Pipeline

Code Commit → Build → Test → Package → Deploy to Staging → Integration Tests → Deploy to Production

Detailed Pipeline Stages

1. Checkout ──→ 2. Lint ──→ 3. Test ──→ 4. Build ──→ 5. Scan ──→ 6. Publish ──→ 7. Deploy
                   │            │                      │
              ESLint/     Unit/Integ/           SAST/SCA/
              Prettier    e2e tests             Container scan
| Stage | Tools | What Happens |
|---|---|---|
| Checkout | git clone, fetch | Retrieve code from repository, including submodules |
| Lint | ESLint, Prettier, RuboCop, golangci-lint | Static code analysis, formatting |
| Test (unit) | Jest, pytest, JUnit | Fast tests (ms to s), no external dependencies |
| Test (integration) | Testcontainers, Docker Compose | Tests with DB, message queue, external services |
| Test (e2e) | Playwright, Cypress, Selenium | Full-stack tests in the browser |
| Build | Docker build, go build, npm run build, Maven | Compilation, artifact assembly |
| Scan (SAST) | Semgrep, SonarQube, CodeQL | Static security analysis |
| Scan (DAST) | OWASP ZAP, Burp Suite | Dynamic analysis (running application) |
| Scan (SCA) | Dependabot, Snyk, Trivy | Dependency and CVE analysis |
| Publish | Docker push, npm publish, Maven deploy | Upload artifact to registry |
| Deploy | ArgoCD, Terraform, Helm, kubectl | Deploy to target environment |

Continuous Integration (CI)

  • Automatic build and tests on every commit
  • Fast feedback loop (< 10 min)
  • Linting, type checking, unit tests, security scan (SAST)

Continuous Delivery (CD)

  • Automatic deployment to staging / test environments
  • Manual approval for production (optional)
  • Smoke tests after deployment

Continuous Deployment

  • Fully automatic deployment to production
  • Requires high confidence in tests and monitoring
  • Feature flags for risk management

GitHub Actions Detail

Workflow Syntax

name: CI Pipeline
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  NODE_VERSION: "22"

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
      - run: npm ci
      - run: npm run lint

  test:
    runs-on: ubuntu-latest
    needs: lint
    strategy:
      matrix:
        node-version: [22, 24]
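    steps:
      - uses: actions/checkout@v4
      # Install the Node.js version selected by the matrix for this job
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
      - run: npm ci
      - name: Run tests
        run: npm test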

Matrix Builds

  • Run the same jobs with different parameters (OS, language version, architecture)
  • strategy.matrix — parameter combinations (Cartesian product)
  • strategy.fail-fast (default: true) cancels all in-progress and queued matrix jobs as soon as one matrix job fails
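
The Cartesian-product expansion can be illustrated with a minimal Python sketch. The function name `expand_matrix` and the sample `os` values are assumptions made for illustration; this is not how GitHub implements the feature, only a model of the resulting job list.

```python
from itertools import product


def expand_matrix(matrix: dict[str, list]) -> list[dict]:
    """Return one job configuration per combination of matrix values."""
    keys = list(matrix)
    combos = product(*(matrix[key] for key in keys))
    return [dict(zip(keys, values)) for values in combos]


# Hypothetical matrix: 2 operating systems x 2 Node.js versions = 4 jobs.
jobs = expand_matrix({
    "os": ["ubuntu-latest", "windows-latest"],
    "node-version": [22, 24],
})
for job in jobs:
    print(job)
```

Each added matrix dimension multiplies the job count, which is why large matrices are usually trimmed with exclusions.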

Reusable Workflows

# .github/workflows/deploy.yml (called)
on:
  workflow_call:
    inputs:
      environment:
        required: true
        type: string
    secrets:
      cloud_role:
        required: true

# Call in caller workflow
jobs:
  deploy:
    uses: ./.github/workflows/deploy.yml
    with:
      environment: staging
    secrets:
      cloud_role: ${{ secrets.STAGING_ROLE }}

Composite Actions

  • Custom actions without needing a separate repository
  • Combination of run, uses, shell steps
  • Use case: standardize lint/test/build across repositories

Self-hosted Runners

  • Own infrastructure for running GitHub Actions
  • Use case: private network, GPU, specific HW, compliance
  • Scaling: actions-runner-controller (Kubernetes), auto-scaling groups
  • Security: job isolation, ephemeral runners

GitLab CI Detail

stages:
  - lint
  - test
  - build
  - deploy

variables:
  DOCKER_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA

lint:
  stage: lint
  image: node:22
  script:
    - npm ci
    - npm run lint

test:
  stage: test
  image: node:22
  needs: ["lint"]
  script:
    - npm test
  artifacts:
    paths:
      - coverage/
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml

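# Illustrative build job (deploy-staging needs it); assumes a runner with Docker-in-Docker
build:
  stage: build
  image: docker:27
  services:
    - docker:27-dind
  needs: ["test"]
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker build -t "$DOCKER_IMAGE" .
    - docker push "$DOCKER_IMAGE"

deploy-staging:
  stage: deploy
  needs: ["build"]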
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  environment:
    name: staging
    url: https://staging.example.com
  script:
    - kubectl set image deployment/app app=$DOCKER_IMAGE

Concepts:

  • Stages — sequential phases (each stage can have multiple parallel jobs)
  • Rules — execution conditions (branch, tag, changes, variables) — replaces only/except
  • Needs — DAG dependencies (job doesn't have to wait for entire stage)
  • Artifacts: files passed between jobs (binaries, reports); dependency caches use the separate cache keyword
  • Environments — deployment tracking (rollback, history, approvals)

DAG Pipelines (Needs)

lint ──→ test ──→ build ──→ deploy-staging ──→ deploy-prod
                        ↓
                   build-arm ──→ test-arm
  • Defines dependencies between jobs (not necessarily stages)
  • Enables parallelization of independent jobs
  • Reduces overall pipeline time
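
A rough way to see which jobs a DAG lets run in parallel is to sort the `needs` graph topologically. The sketch below uses Python's standard `graphlib`; the job graph mirrors the diagram above under the assumption that build-arm branches off build, and `execution_waves` is a hypothetical helper, not a GitLab API.

```python
from graphlib import TopologicalSorter

# Job -> jobs it needs (assumption: build-arm depends on build).
NEEDS = {
    "lint": [],
    "test": ["lint"],
    "build": ["test"],
    "deploy-staging": ["build"],
    "deploy-prod": ["deploy-staging"],
    "build-arm": ["build"],
    "test-arm": ["build-arm"],
}


def execution_waves(needs: dict[str, list[str]]) -> list[list[str]]:
    """Group jobs into waves; all jobs inside one wave can run in parallel."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose needs are all finished
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(NEEDS), start=1):
    print(number, wave)
```

With this graph, build-arm starts alongside deploy-staging instead of waiting for a whole stage to finish.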

Infrastructure as Code (IaC)

| Tool | Type | Language |
|---|---|---|
| Terraform | Declarative | HCL |
| OpenTofu | Declarative | HCL (Terraform fork) |
| Pulumi | Declarative | TypeScript, Python, Go, C# |
| AWS CDK | Declarative | TypeScript, Python, Java, C# |
| CloudFormation | Declarative | YAML/JSON (AWS) |
| Azure ARM/Bicep | Declarative | Bicep, JSON |
| Ansible | Imperative/Config | YAML |
| Chef/Puppet | Config mgmt | Ruby DSL (Chef), Puppet DSL (Puppet) |

Infrastructure as Code (2nd Edition) — Kief Morris

Key reference for designing and operating dynamic cloud infrastructure with IaC. The book is tool-agnostic — it focuses on patterns and practices, not specific tools.

Three Fundamental Practices

| Practice | Description |
|---|---|
| Define everything as code | All infrastructure defined in code, version control, repeatability |
| Continuously test and deliver | Every change goes through a pipeline with automated tests |
| Small, independent pieces | Small, loosely coupled components that are easier to change and test |

Principles of Cloud Infrastructure

  • Systems reproducible — infrastructure can be recreated from code at any time
  • Systems disposable — instances can be destroyed and recreated
  • Systems consistent — all environments identical (no snowflake servers)
  • Processes repeatable — automation instead of manual procedures
  • Design always changing — infrastructure is constantly evolving (not build-and-forget)

Anti-patterns (Pitfalls)

| Anti-pattern | Description |
|---|---|
| Snowflake server | Each server different, cannot reproduce |
| Configuration drift | Manual changes → deviations from defined state |
| Server sprawl | Too many servers without management |
| Fragile infrastructure | Changes often break the system |
| Automation fear | Fear of automation → manual interventions |

Book Structure (4 Parts)

  1. Foundations — framework of tools and technologies for cloud platforms
  2. Working with infrastructure stacks — defining, provisioning, testing and CD of infrastructure changes
  3. Working with servers and application runtime platforms — provisioning and configuring servers and clusters
  4. Working with large systems and teams — workflow, governance, architectural patterns for multiple teams

IaC Code Organization

| Pattern | Description |
|---|---|
| Monorepo | One repository for everything: build-time integration, suitable for small teams |
| Microrepo | Separate repository for each project: isolation, suitable for large teams |
| Domain organization | Organizing code by domain concepts (not by technology) |

Recommendations:

  • Infrastructure and applications can be in the same or separate repository depending on organizational structure (Team Topologies)
  • Per-environment configuration files (test, staging, production) stored within the project
  • Tests belong to the project, integration tests can be in a separate project
  • Infrastructure code should not directly deploy applications — use OS packaging (RPM, deb)

Expand-Contract Pattern for Infrastructure Changes

Same principle as database migrations:

  1. Expand — add new resource (old version still running)
  2. Migrate — move traffic / dependencies to the new resource
  3. Contract — remove old resource

Prevents outages when refactoring infrastructure.
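
The three steps can be modelled as a sequence of traffic maps. This is a toy sketch: the resource names `lb-v1` and `lb-v2` and the function `expand_contract` are hypothetical, and real migrations usually shift traffic gradually rather than in one move.

```python
def expand_contract(traffic: dict[str, int], old: str, new: str) -> list[dict[str, int]]:
    """Return the resource -> traffic share map after each of the three steps."""
    # 1. Expand: the new resource exists but receives no traffic yet.
    expanded = {**traffic, new: 0}
    # 2. Migrate: move the old resource's traffic to the new one.
    migrated = {**expanded, new: expanded[old], old: 0}
    # 3. Contract: remove the old resource once it serves nothing.
    contracted = {name: share for name, share in migrated.items() if name != old}
    return [expanded, migrated, contracted]


for step in expand_contract({"lb-v1": 100}, old="lb-v1", new="lb-v2"):
    # Total served traffic stays at 100 after every step, so there is no outage window.
    print(step, sum(step.values()))
```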

Terraform Detail

State Locking Mechanism

| Backend | Locking Mechanism | Note |
|---|---|---|
| S3 + DynamoDB | DynamoDB (ConditionalPut) | Most common, cheap, simple |
| Terraform Cloud | Built-in (API) | SaaS, audit logs, VCS integration |
| Azure Storage | Azure Blob Lease | Similar to S3 model |
| GCS | Lock object (.tflock) in the bucket | Built into the backend |
| Consul | Consul KV session_lock | High-availability |
| PostgreSQL | pg_advisory_lock / row lock | Custom backend |
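
Most of these mechanisms reduce to a conditional write: the lock is created only if it does not already exist. The following in-memory sketch models that idea; `InMemoryLockTable` and its methods are hypothetical stand-ins, not the DynamoDB or Terraform API, and a real backend performs the check atomically on the server.

```python
class LockHeldError(Exception):
    """Raised when another run already holds the state lock."""


class InMemoryLockTable:
    """Stand-in for a lock table (hypothetical, single-process only)."""

    def __init__(self) -> None:
        self._items: dict[str, str] = {}

    def put_if_absent(self, lock_id: str, owner: str) -> None:
        # Conditional write: succeed only if no item with this key exists.
        if lock_id in self._items:
            raise LockHeldError(f"{lock_id} is locked by {self._items[lock_id]}")
        self._items[lock_id] = owner

    def delete_if_owner(self, lock_id: str, owner: str) -> None:
        # Only the current holder may release the lock.
        if self._items.get(lock_id) == owner:
            del self._items[lock_id]


table = InMemoryLockTable()
table.put_if_absent("prod/terraform.tfstate", "ci-run-1")
try:
    table.put_if_absent("prod/terraform.tfstate", "ci-run-2")
except LockHeldError as error:
    print("second run must wait:", error)
table.delete_if_owner("prod/terraform.tfstate", "ci-run-1")
```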

State Backends Comparison

| Property | S3 + DynamoDB | Terraform Cloud | Consul |
|---|---|---|---|
| Cost | $ (S3 + DynamoDB) | $$ (free tier limited) | $$ (infra) |
| Team workflow | GitHub Actions + OIDC | Native RBAC, runs | Custom |
| Locking | DynamoDB | Built-in | Consul session |
| History | S3 versioning | Full history, diff | None |
| Remote ops | No (state only) | Yes (remote runs) | No |
| Encryption | SSE-S3/KMS | At rest + in transit | TLS |

Workspaces vs Terragrunt

| Aspect | Terraform Workspaces | Terragrunt |
|---|---|---|
| State separation | One backend, key: env:/workspace | Separate backend per env |
| Code reuse | Same code, different variables | DRY configuration, modules |
| Risk | Accidentally apply to wrong workspace | Isolated backends |
| When to use | Simple projects, <5 envs | Microservices, multi-env, multi-team |
| Extra features | None | dependency, include, before_hook |

Provider Versioning

terraform {
  required_version = ">= 1.5, < 2.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.23"
    }
  }
}
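  • ~> 5.0 allows any 5.x release (>= 5.0, < 6.0): only the rightmost component given may increase
  • ~> 5.0.0 would allow patch releases only (>= 5.0.0, < 5.1.0)
  • >= 2.23 (as used above) has no upper bound; write >= 2.23, < 3.0 to stay on 2.x
  • ~> therefore blocks the next major (or minor) release, which is where breaking changes usually land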
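
The pessimistic operator can be expressed as a range check. The sketch below is a simplified model, not Terraform's own resolver: `satisfies_pessimistic` is a hypothetical helper, it ignores pre-release suffixes, and it only accepts constraints with at least two components.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn "5.31.0" into (5, 31, 0)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pessimistic(constraint: str, version: str) -> bool:
    """Check `version` against a "~> X.Y" or "~> X.Y.Z" constraint."""
    base = parse(constraint.removeprefix("~>").strip())
    if len(base) < 2:
        raise ValueError("this sketch needs at least two version components")
    # Upper bound: drop the last component and bump the one before it.
    upper = base[:-2] + (base[-2] + 1,)
    candidate = parse(version)
    candidate += (0,) * (3 - len(candidate))  # treat "5.0" like "5.0.0"
    return base <= candidate < upper


print(satisfies_pessimistic("~> 5.0", "5.31.0"))    # minor upgrade within 5.x
print(satisfies_pessimistic("~> 5.0", "6.0.0"))     # next major release
print(satisfies_pessimistic("~> 5.0.0", "5.1.0"))   # minor upgrade under a patch-only pin
```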

Terraform Workflow

terraform init     → Initialize backend, download providers and modules
terraform plan     → Show changes
terraform apply    → Apply changes
terraform destroy  → Destroy infrastructure
terraform validate → Check syntax and internal consistency
terraform fmt      → Format HCL

State Management

  • Remote state (S3, Terraform Cloud, Azure Storage)
  • State locking (DynamoDB, Consul)
  • Workspaces for environment separation

Terraform: Up and Running (3rd ed.) — Yevgeniy Brikman

Practical guide to Terraform from the co-founder of Gruntwork. The 3rd edition (2022) adds over 100 pages of new content, updates the examples from Terraform 0.12 to 1.2, and includes two new chapters.

What's New in the 3rd Edition

| New Feature | Description |
|---|---|
| Chapter: Secrets management | Managing secrets with Terraform: Vault, AWS Secrets Manager, KMS, OIDC, sensitive variables |
| Chapter: Multiple providers | Working with multiple regions, accounts, clouds including Kubernetes (AWS EKS) |
| Terraform 1.0+ | Backward compatibility promise, stability, HashiCorp IPO |
| Provider versioning | required_providers block + .terraform.lock.hcl (lock file) |
| Module iteration | count and for_each on modules (since Terraform 0.13) |
| Variable validation | validation {} blocks, precondition / postcondition |
| Refactoring | moved blocks: safe refactoring without manual state manipulation |
| CI/CD security | OIDC authentication, isolated workers for terraform apply |

Secrets Management with Terraform

# Variable marked as sensitive: redacted in plan/apply output, but still stored in plaintext in state
variable "db_password" {
  type      = string
  sensitive = true
}

# Reading secrets from AWS Secrets Manager
data "aws_secretsmanager_secret" "db" {
  name = "production/db/master"
}

data "aws_secretsmanager_secret_version" "db" {
  secret_id = data.aws_secretsmanager_secret.db.id
}

Recommended Security Hierarchy:

  1. OIDC — most secure, no creds on CI server (GitHub Actions → IAM role)
  2. IAM role — instance profile (EC2, ECS, EKS)
  3. Environment variables — limited, risk of log leakage
  4. Isolated workers — separate worker with admin permissions, API only plan/apply

Testing Terraform Code

| Layer | Tools | Description |
|---|---|---|
| Static analysis | terraform validate, tflint, tfsec, checkov | Code analysis without execution |
| Plan testing | conftest + OPA (Rego) on terraform show -json output | Plan validation against policy |
| Unit tests | Terratest (Go), terraform fmt, terraform validate | Testing modules in isolation |
| Integration tests | Terratest (Go) | Actual provisioning + assert |
| End-to-end tests | Terratest | Full stack, smoke tests |

Policy Enforcement

# OPA / conftest: deny public S3 bucket (Rego v1 syntax; needs a recent OPA/conftest release)
package main

import rego.v1

deny contains msg if {
  some resource in input.resource_changes
  resource.type == "aws_s3_bucket"
  resource.change.after.acl == "public-read"
  msg := sprintf("%s must not be public", [resource.address])
}
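
For readers less familiar with Rego, the same rule can be written as a plain Python check. This is a sketch under the assumption that the plan was exported to JSON (for example with `terraform show -json`); the function name and the sample plan are hypothetical.

```python
def public_bucket_violations(plan: dict) -> list[str]:
    """Return one message per S3 bucket whose planned ACL is public-read."""
    messages = []
    for resource in plan.get("resource_changes", []):
        # "after" is null in the plan JSON when a resource is being destroyed.
        after = (resource.get("change") or {}).get("after") or {}
        if resource.get("type") == "aws_s3_bucket" and after.get("acl") == "public-read":
            messages.append(f"{resource['address']} must not be public")
    return messages


# Hypothetical, heavily trimmed plan document.
sample_plan = {
    "resource_changes": [
        {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
         "change": {"after": {"acl": "private"}}},
        {"address": "aws_s3_bucket.site", "type": "aws_s3_bucket",
         "change": {"after": {"acl": "public-read"}}},
    ]
}
print(public_bucket_violations(sample_plan))
```

A CI job would fail the pipeline whenever the returned list is non-empty.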

Production-grade Checklist by Brikman

  1. Small modules — one module = one thing (single responsibility)
  2. Composable modules — modules can be composed into larger units
  3. Testable modules — each module has tests (Terratest)
  4. Releasable modules — versioning (Git tags, Terraform Registry)
  5. Version control — everything in git, including .terraform.lock.hcl
  6. Remote state — S3 + DynamoDB or Terraform Cloud
  7. CI/CD pipeline: plan on MR, apply after merge to main
  8. Secrets management — no secrets in plaintext in code
  9. Policy as code — OPA / Sentinel for compliance
  10. Sandbox environment — each developer has their own isolated environment

Golden Rule of Terraform

The main branch of the live repository should always match what is actually deployed in production. Never run terraform apply against production from a local machine; always go through CI/CD.

Dockerfile Best Practices

# Multi-stage build
FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
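# Install all dependencies; the build step needs devDependencies
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies so only runtime packages reach the final image
RUN npm prune --omit=dev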

# Runtime stage — distroless
FROM gcr.io/distroless/nodejs22-debian12
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
USER nonroot:nonroot
EXPOSE 3000
CMD ["dist/server.js"]

Rules:

  • Multi-stage build — separate build tools from runtime
  • Distroless images — minimal attack surface (no shell, package manager)
  • Non-root user — USER nonroot (security best practice)
  • Layer caching — copy less-frequently changing files first (package.json → npm ci → code)
  • Small base image — Alpine (5 MB), distroless (minimal), scratch (Go static binary)
  • Healthcheck — HEALTHCHECK instruction for orchestrator
  • Labels — LABEL maintainer, version, git commit
  • .dockerignore — minimize build context

Artifact Management

Docker Registries

| Registry | Public/Private | Cost | Integration |
|---|---|---|---|
| Docker Hub | Both | Public free, private $5/month | GitHub Actions, GitLab |
| ECR (AWS) | Private | $0.10/GB/month + data transfer | IAM, ECS, EKS |
| GHCR (GitHub) | Both | Public free, private 500 MB free | GitHub Actions, npm |
| GCR / Artifact Registry | Private | $0.10/GB/month | GKE, Cloud Build |
| ACR (Azure) | Private | $0.11/GB/month | AKS, Azure DevOps |
| Harbor | Private (self-hosted) | Free (open source) | Custom, CNCF |

Helm Charts

  • Repository — index.yaml + chart .tgz on HTTP server (S3, GitHub Pages, ChartMuseum)
  • OCI registry — Helm 3.8+ supports storing charts in OCI registries (ECR, GHCR, Harbor)
  • Versioning — chart version (package) + app version (application)

SBOM (Software Bill of Materials)

  • SPDX / CycloneDX — standard SBOM formats
  • Generation: Trivy, Syft (Grype consumes an SBOM to scan it for vulnerabilities)
  • Use case: supply chain security, compliance (EO 14028, EU CRA)

Configuration and Secrets

| Tool | Description |
|---|---|
| Vault (HashiCorp) | Dynamic secrets, encryption-as-a-service |
| AWS Secrets Manager | Managed, auto-rotation |
| Azure Key Vault | Managed, HSM support |
| GCP Secret Manager | Managed |
| SOPS | Encryption in git repos |
| Sealed Secrets | Encrypted secrets for Kubernetes |

Secret Management Workflows

Vault Agent Injector (Kubernetes)

  • Sidecar container (vault-agent) injects secrets into the pod
  • Secrets mounted as tmpfs volume (not into environment variables)
  • Auto-rotation: vault-agent periodically refreshes secrets

External Secrets Operator (Kubernetes)

  • CRD: ExternalSecret → creates Secret in K8s
  • Backend: AWS Secrets Manager, Azure Key Vault, GCP Secret Manager, Vault
  • Poll-based refresh: the operator re-reads the external store every refreshInterval and updates the K8s Secret

Sealed Secrets

  • kubeseal encrypts the Secret locally with the controller's public key (only the in-cluster controller holds the private key)
  • Encrypted manifest (SealedSecret) can be safely in git
  • Controller decrypts on deploy

GitOps

  • Principle: Git is the single source of truth
  • Tools: ArgoCD, Flux, Rancher Fleet
  • Pull-based deploy — agent in the cluster watches repo and applies changes
  • Auto-sync + drift detection
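
Drift detection amounts to diffing the desired state from Git against the live state in the cluster. The sketch below models both as simple name-to-version maps; `detect_drift` and the sample data are hypothetical and far simpler than what ArgoCD or Flux actually compare.

```python
def detect_drift(desired: dict[str, str], live: dict[str, str]) -> dict[str, list[str]]:
    """Compare desired state (Git) with live state (cluster)."""
    return {
        "create": sorted(desired.keys() - live.keys()),
        "delete": sorted(live.keys() - desired.keys()),
        "update": sorted(
            name for name in desired.keys() & live.keys() if desired[name] != live[name]
        ),
    }


def sync(desired: dict[str, str], live: dict[str, str]) -> dict[str, str]:
    """Auto-sync: Git wins, so the live state becomes a copy of the desired state."""
    return dict(desired)


desired = {"app": "v2", "worker": "v1"}   # what the repository declares
live = {"app": "v1", "cron": "v1"}        # what is running, including a manual change
print(detect_drift(desired, live))
live = sync(desired, live)
print(detect_drift(desired, live))
```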

Environment Promotion (dev → staging → prod)

Code → Dev (auto-deploy) → Staging (auto + smoke tests) → Prod (manual approval + gating)

Quality Gates:

  1. Unit tests — pass rate 100 %, code coverage ≥ 80 %
  2. Integration tests — all critical paths pass
  3. SAST scan — no critical/high vulnerabilities
  4. SCA scan — no known critical CVEs
  5. Container scan — all fixable vulns addressed
  6. Smoke tests — after staging deploy (health endpoint, basic flow)
  7. Manual approval: for production (optional; skipped entirely under continuous deployment)
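
The automated gates can be evaluated as one function that reports what blocks promotion. This is a minimal sketch: the `PipelineMetrics` fields are hypothetical names for numbers a pipeline would collect, and the thresholds are the ones listed above.

```python
from dataclasses import dataclass


@dataclass
class PipelineMetrics:
    unit_pass_rate: float          # fraction of unit tests passing, 0.0 to 1.0
    coverage: float                # code coverage, 0.0 to 1.0
    integration_passed: bool       # all critical paths pass
    sast_critical_or_high: int     # open critical/high SAST findings
    sca_critical_cves: int         # known critical CVEs in dependencies
    fixable_container_vulns: int   # container findings that have a fix available
    smoke_passed: bool             # health endpoint and basic flow on staging


def failed_gates(m: PipelineMetrics) -> list[str]:
    """Return the names of the gates that block promotion (empty list = promote)."""
    checks = {
        "unit tests": m.unit_pass_rate == 1.0 and m.coverage >= 0.80,
        "integration tests": m.integration_passed,
        "SAST scan": m.sast_critical_or_high == 0,
        "SCA scan": m.sca_critical_cves == 0,
        "container scan": m.fixable_container_vulns == 0,
        "smoke tests": m.smoke_passed,
    }
    return [name for name, passed in checks.items() if not passed]


metrics = PipelineMetrics(1.0, 0.79, True, 0, 0, 0, True)
print(failed_gates(metrics))
```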

Deployment Strategies

| Strategy | Description | Risk |
|---|---|---|
| Rolling update | Gradual instance replacement | Low |
| Blue/Green | Two identical environments, traffic switch | Medium |
| Canary | % traffic to new version, gradual increase | Low |
| Feature flag | Toggle feature on/off without deploy | Very low |
| A/B testing | Different versions for different users | Low |
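
A canary rollout can be sketched as a loop that raises the traffic share step by step and aborts when the error rate crosses a threshold. The step sizes, the 1 % threshold and the `error_rate_at` callback are assumptions for illustration; real controllers typically also wait and observe between steps.

```python
from typing import Callable


def canary_rollout(
    error_rate_at: Callable[[int], float],
    steps: tuple[int, ...] = (5, 25, 50, 100),
    max_error_rate: float = 0.01,
) -> tuple[int, str]:
    """Ramp traffic through `steps`; return (final traffic percent, status)."""
    for percent in steps:
        # error_rate_at stands in for querying monitoring at this traffic share.
        if error_rate_at(percent) > max_error_rate:
            return 0, f"rolled back at {percent}%"
    return steps[-1], "promoted"


# Healthy release: error rate stays flat at every step.
print(canary_rollout(lambda percent: 0.001))
# Faulty release: errors appear once a quarter of the traffic hits the new version.
print(canary_rollout(lambda percent: 0.05 if percent >= 25 else 0.0))
```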

Git Branching Strategies

| Strategy | Description | Suitable For |
|---|---|---|
| Trunk-based | Single main branch, short feature branches (< 1 day) | CD, microservices, mature teams |
| GitHub Flow | Main + feature branches, PRs, simple | Startups, web apps |
| GitLab Flow | Main + environment branches (staging, prod) + feature branches | Enterprise, regulated |
| GitFlow | Develop + main + feature/release/hotfix branches | Release-based, enterprise legacy |
| One Flow | Simplified GitFlow (no develop branch) | Medium teams |

Rollback Strategies

| Strategy | Description | Speed | Risk |
|---|---|---|---|
| Forward fix | New deploy with hotfix | Slow (build + deploy) | Low |
| Rollback (revert commit) | Git revert, new deploy | Medium | Low |
| Blue/Green switchback | Switch back to old version | Instant | DB incompatibility |
| Database rollback | Revert DB migration (migrate down) | Slow | Data loss risk |

Database Rollback Challenges

  • Breaking changes — removing a column/table means rollback problem (data lost)
  • Best practice: Expand → Migrate → Contract (never remove in a single deploy)
  • Tooling: Flyway undo (limited), Liquibase rollback, pgroll (Postgres)
  • Feature flags as prevention — new code is behind a flag, rollback = disable flag
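
A percentage-based feature flag can be modelled by hashing the user into a stable bucket; setting the percentage to 0 is then the rollback. The sketch is an assumption about one common approach, not any specific flag service, and `is_enabled` is a hypothetical helper.

```python
import hashlib


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministically decide whether `user_id` sees the feature behind `flag`."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return bucket < rollout_percent


users = [f"user-{number}" for number in range(1000)]
exposed = sum(is_enabled("new-checkout", user, 10) for user in users)
print(f"{exposed} of {len(users)} users see the new code path")

# Rollback without a deploy: set the rollout to 0 and nobody is exposed.
print(any(is_enabled("new-checkout", user, 0) for user in users))
```

Because the bucket depends only on the flag and the user, a user who is enabled at 10 % stays enabled when the rollout grows to 50 %.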

CI/CD Design Patterns

Modern CI/CD pipelines solve recurring problems using design patterns:

| Pattern | Description |
|---|---|
| Pipeline as Code | Pipeline defined in YAML/Kotlin DSL (.gitlab-ci.yml, .github/workflows/) |
| Immutable Pipeline | Each build produces an artifact that is never changed afterwards |
| Quality Gate | Branch protection, required checks, code coverage threshold |
| Deployment Strategy | Blue/Green, Canary, Rolling (see Deployment Strategies above) |
| GitOps | Pull-based deploy with auto-sync and drift detection |
| Shift-Left Security | SAST/DAST/SCA part of the pipeline |
| Dependency Caching | Cache layer between pipeline runs |

Shift Left Security

SCA (Software Composition Analysis)

| Tool | Type | Integration |
|---|---|---|
| Dependabot | GitHub native | GitHub, auto-PR for fix |
| Renovate | Multi-platform | GitHub, GitLab, Bitbucket |
| Snyk | SaaS + CLI | All platforms, Docker, IaC |
| Trivy | CLI, OSS | CI/CD pipeline (GitHub Actions, GitLab) |

SAST (Static Application Security Testing)

| Tool | Languages | Characteristics |
|---|---|---|
| Semgrep | 30+ (Python, Java, Go, JS/TS) | Fast, custom rules, CI-native |
| SonarQube | 30+ | Comprehensive, quality gates, tech debt |
| CodeQL | 12 (C++, C#, Go, Java, JS/TS, Python) | GitHub native, query-based |
| Checkmarx | 30+ | Enterprise, CxSAST, CxFlow |
| Fortify | 30+ | Enterprise, SAST + DAST |

Container Scanning

| Tool | Description |
|---|---|
| Trivy | OSS, scans OS packages + language-specific + IaC |
| Grype | OSS, from Anchore, fast, pairs with Syft for SBOM generation |
| Clair | Red Hat, OSS, OCI-compatible |
| Docker Scout | Docker Desktop / CLI, integration with Docker Hub |

AI-Native Software Delivery (2025-2026)

AI is reshaping DevOps, a shift sometimes labelled DevOps 2.0:

  • AI-assisted CI/CD — automatic pipeline failure diagnosis, resource allocation optimization
  • Agent Control Protocol (ACP) / Model Context Protocol (MCP) — standards for AI agent interaction with tooling
  • AI-driven cost management — FinOps cloud optimization
  • Intelligent test selection — ML determines which tests to run based on code changes
  • Self-healing pipelines — AI auto-detects and fixes common issues

New tools: Harness (AI-native CD), GitLab 19.0 (agentic MR workflows, secrets manager), Octopus Deploy.

Pipeline Tools

  • GitHub Actions — integrated with GitHub, large marketplace
  • GitLab CI — native in GitLab, auto DevOps
  • Jenkins — oldest, extensible, self-hosted
  • CircleCI — SaaS, fast
  • Argo Workflows — Kubernetes native
  • Buildkite — hybrid (own agents, SaaS orchestrator)

Best Practices

  • Idempotent pipeline: repeated runs give the same result
  • Immutable infrastructure: never modify a running server, always redeploy
  • Shift left: tests and security as early as possible in the pipeline
  • Artifact management: all builds versioned in registry (Docker Hub, ECR, GHCR)
  • Dependency caching: speed up pipeline (npm cache, pip cache, Docker layer caching)
  • Fail fast: pipeline fails as early as possible on error
  • Quality gates: automated checks before every promotion to the next environment
  • Pipeline visibility: dashboard with current status of all pipelines (GitHub, GitLab, ArgoCD)

Resources

Links, books and standards: sources/cicd/sources.md

| Book | Authors | ISBN | Key Contribution |
|---|---|---|---|
| The DevOps Handbook | Kim, Humble, Debois, Willis | 978-1942788003 | CALMS principles (Culture, Automation, Lean, Measurement, Sharing), flow map, deployment pipeline |
| Continuous Delivery | Humble, Farley | 978-0321601912 | Deployment pipeline, commit stage, acceptance tests, capacity testing, zero-downtime release |
| CI/CD Design Patterns | Bajpai, Schildmeijer, Piwosz, Mishra | 978-1-83588-965-7 | 30+ design patterns for CI/CD: pipeline patterns, GitOps, security, testing, deployment strategies |
| DevOps Frameworks, Techniques, and Tools | Vijayakumaran, Kofler, Öggl, Springer | 978-1-4932-2670-2 | Framework for DevOps adoption, tool comparison (Jenkins vs GitLab vs GitHub Actions), techniques for monitoring and observability |

OpenStack CI/CD

The OpenStack ecosystem uses its own CI/CD tools:

Zuul

  • CI/CD system developed by the OpenStack community (now standalone, used outside OpenStack)
  • Gating — changes are tested before merge (not after merge) — prevents breaking main branch
  • Ansible-based — jobs are Ansible playbooks
  • Nodepool — dynamic test VM allocation in the cloud (OpenStack, AWS)
  • Pipelines: check, gate, post, periodic, tag, release

OpenStack Infra (OpenDev)

  • Public CI infrastructure for OpenStack projects
  • Tools: Gerrit (code review), Zuul (CI), Nodepool (test nodes), Storyboard (issue tracking)
  • Base jobs: tempest (integration tests), grenade (upgrade tests), devstack-gate (gate tests)

Integration with External Tools

  • Terraform — OpenStack provider for provisioning (terraform-provider-openstack)
  • Ansible — openstack.cloud collection for managing OpenStack resources
  • Packer — build OpenStack images (openstack builder)
  • Jenkins — older CI, still used in some distributions

Last revised: 2026-06-03