Stanislav Hubacek 3fa11ef0f6 comiiit

2026-06-11 15:27:28 +02:00

🔄 CI/CD and DevOps

CI/CD Pipeline

Code Commit → Build → Test → Package → Deploy to Staging → Integration Tests → Deploy to Production

Detailed Pipeline Stages

1. Checkout ──→ 2. Lint ──→ 3. Test ──→ 4. Build ──→ 5. Scan ──→ 6. Publish ──→ 7. Deploy
                   │            │                      │
              ESLint/     Unit/Integ/           SAST/SCA/
              Prettier    e2e tests             Container scan

Stage	Tools	What Happens
Checkout	git clone, fetch	Retrieve code from repository, including submodules
Lint	ESLint, Prettier, RuboCop, golangci-lint	Static code analysis, formatting
Test (unit)	Jest, pytest, JUnit	Fast tests (ms to s), no external dependencies
Test (integration)	Testcontainers, Docker Compose	Tests with DB, message queue, external services
Test (e2e)	Playwright, Cypress, Selenium	Full-stack tests in the browser
Build	Docker build, go build, npm build, Maven	Compilation, artifact assembly
Scan (SAST)	Semgrep, SonarQube, CodeQL	Static security analysis
Scan (DAST)	OWASP ZAP, Burp Suite	Dynamic analysis (running application)
Scan (SCA)	Dependabot, Snyk, Trivy	Dependency and CVE analysis
Publish	Docker push, npm publish, Maven deploy	Upload artifact to registry
Deploy	ArgoCD, Terraform, Helm, kubectl	Deploy to target environment

Continuous Integration (CI)

Automatic build and tests on every commit
Fast feedback loop (< 10 min)
Linting, type checking, unit tests, security scan (SAST)

Continuous Delivery (CD)

Automatic deployment to staging / test environments
Manual approval for production (optional)
Smoke tests after deployment

Continuous Deployment

Fully automatic deployment to production
Requires high confidence in tests and monitoring
Feature flags for risk management

GitHub Actions Detail

Workflow Syntax

name: CI Pipeline
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  NODE_VERSION: "22"

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
      - run: npm ci
      - run: npm run lint

  test:
    runs-on: ubuntu-latest
    needs: lint
    strategy:
      matrix:
        node-version: [22, 24]
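    steps:
      - uses: actions/checkout@v4
      # Install the Node.js version selected by the matrix
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
      - run: npm ci
      - name: Run tests
        run: npm test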

Matrix Builds

Run the same jobs with different parameters (OS, language version, architecture)
strategy.matrix — parameter combinations (Cartesian product)
strategy.fail-fast — cancel in-progress and queued matrix jobs as soon as one fails (default: true)
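
The Cartesian-product expansion can be sketched in a few lines of Python. This is an illustration of the idea only, not how GitHub implements it; `expand_matrix` and the sample values are hypothetical.

```python
from itertools import product


def expand_matrix(matrix: dict[str, list]) -> list[dict]:
    """Return one parameter set per combination of matrix values."""
    keys = list(matrix)
    combos = product(*(matrix[key] for key in keys))
    return [dict(zip(keys, combo)) for combo in combos]


if __name__ == "__main__":
    # Two operating systems x two Node.js versions -> four jobs
    for job in expand_matrix({"os": ["ubuntu-latest", "windows-latest"],
                              "node-version": [22, 24]}):
        print(job)
```

Each added matrix dimension multiplies the job count, which is why large matrices get expensive quickly.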

Reusable Workflows

# .github/workflows/deploy.yml (called)
on:
  workflow_call:
    inputs:
      environment:
        required: true
        type: string
    secrets:
      cloud_role:
        required: true

# Call in caller workflow
jobs:
  deploy:
    uses: ./.github/workflows/deploy.yml
    with:
      environment: staging
    secrets:
      cloud_role: ${{ secrets.STAGING_ROLE }}

Composite Actions

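Custom actions defined in an action.yml; they can live in the same repository as the workflows that use them
Bundle several run and uses steps into one action (each run step must declare its shell)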
Use case: standardize lint/test/build across repositories

Self-hosted Runners

Own infrastructure for running GitHub Actions
Use case: private network, GPU, specific HW, compliance
Scaling: actions-runner-controller (Kubernetes), auto-scaling groups
Security: job isolation, ephemeral runners

GitLab CI Detail

stages:
  - lint
  - test
  - build
  - deploy

variables:
  DOCKER_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA

lint:
  stage: lint
  image: node:22
  script:
    - npm ci
    - npm run lint

test:
  stage: test
  image: node:22
  needs: ["lint"]
  script:
    - npm test
  artifacts:
    paths:
      - coverage/
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml

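# Minimal build job (assumed here; deploy-staging needs a job named "build")
build:
  stage: build
  image: docker:27
  services:
    - docker:27-dind
  needs: ["test"]
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker build -t $DOCKER_IMAGE .
    - docker push $DOCKER_IMAGE

deploy-staging: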
  stage: deploy
  needs: ["build"]
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  environment:
    name: staging
    url: https://staging.example.com
  script:
    - kubectl set image deployment/app app=$DOCKER_IMAGE

Concepts:

Stages — sequential phases (each stage can have multiple parallel jobs)
Rules — execution conditions (branch, tag, changes, variables) — replaces only/except
Needs — DAG dependencies (job doesn't have to wait for entire stage)
Artifacts — file sharing between jobs (binaries, reports); dependency caching is a separate mechanism (cache keyword)
Environments — deployment tracking (rollback, history, approvals)

DAG Pipelines (Needs)

lint ──→ test ──→ build ──→ deploy-staging ──→ deploy-prod
                        ↓
                   build-arm ──→ test-arm

Defines dependencies between jobs (not necessarily stages)
Enables parallelization of independent jobs
Reduces overall pipeline time
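
The time saving can be illustrated with a small scheduling sketch. The job names, stages and durations below are hypothetical, and the model assumes unlimited runners; it only compares "wait for the whole previous stage" against "wait for the jobs listed in needs".

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: name -> (stage, duration in minutes, needs)
JOBS = {
    "lint": ("lint", 2, []),
    "test": ("test", 6, ["lint"]),
    "build": ("build", 4, ["test"]),
    "build-arm": ("build", 9, ["test"]),
    "deploy-staging": ("deploy", 1, ["build"]),
}


def finish_times_dag(jobs: dict) -> dict[str, int]:
    """Each job starts as soon as the jobs it needs have finished."""
    graph = {name: set(needs) for name, (_, _, needs) in jobs.items()}
    finish: dict[str, int] = {}
    for name in TopologicalSorter(graph).static_order():
        _, duration, needs = jobs[name]
        start = max((finish[dep] for dep in needs), default=0)
        finish[name] = start + duration
    return finish


def finish_times_staged(jobs: dict, stage_order: list[str]) -> dict[str, int]:
    """Each job starts only after every job of the previous stage finished."""
    finish: dict[str, int] = {}
    stage_start = 0
    for stage in stage_order:
        in_stage = [n for n, (s, _, _) in jobs.items() if s == stage]
        for name in in_stage:
            finish[name] = stage_start + jobs[name][1]
        stage_start = max((finish[n] for n in in_stage), default=stage_start)
    return finish


if __name__ == "__main__":
    print(finish_times_dag(JOBS)["deploy-staging"])
    print(finish_times_staged(JOBS, ["lint", "test", "build", "deploy"])["deploy-staging"])
```

In this model the staging deploy no longer waits for the slow `build-arm` job, because it only needs `build`.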

Infrastructure as Code (IaC)

Tool	Type	Language
Terraform	Declarative	HCL
OpenTofu	Declarative	HCL (Terraform fork)
Pulumi	Declarative	TypeScript, Python, Go, C#
AWS CDK	Declarative	TypeScript, Python, Java, C#
CloudFormation	Declarative	YAML/JSON (AWS)
Azure ARM/Bicep	Declarative	Bicep, JSON
Ansible	Imperative/Config	YAML
Chef/Puppet	Config mgmt	Ruby DSL

Infrastructure as Code (2nd Edition) — Kief Morris

Key reference for designing and operating dynamic cloud infrastructure with IaC. The book is tool-agnostic — it focuses on patterns and practices, not specific tools.

Three Fundamental Practices

Practice	Description
Define everything as code	All infrastructure defined in code, version control, repeatability
Continuously test and deliver	Every change goes through a pipeline with automated tests
Small, independent pieces	Small, loosely coupled components — easier change and testing

Principles of Cloud Infrastructure

Systems reproducible — infrastructure can be recreated from code at any time
Systems disposable — instances can be destroyed and recreated
Systems consistent — all environments identical (no snowflake servers)
Processes repeatable — automation instead of manual procedures
Design always changing — infrastructure is constantly evolving (not build-and-forget)

Anti-patterns (Pitfalls)

Anti-pattern	Description
Snowflake server	Each server different, cannot reproduce
Configuration drift	Manual changes → deviations from defined state
Server sprawl	Too many servers without management
Fragile infrastructure	Changes often break the system
Automation fear	Fear of automation → manual interventions

Book Structure (4 Parts)

Foundations — framework of tools and technologies for cloud platforms
Working with infrastructure stacks — defining, provisioning, testing and CD of infrastructure changes
Working with servers and application runtime platforms — provisioning and configuring servers and clusters
Working with large systems and teams — workflow, governance, architectural patterns for multiple teams

IaC Code Organization

Pattern	Description
Monorepo	One repository for everything — build-time integration, suitable for small teams
Microrepo	Separate repository for each project — isolation, suitable for large teams
Domain organization	Organizing code by domain concepts (not by technology)

Recommendations:

Infrastructure and applications can be in the same or separate repository depending on organizational structure (Team Topologies)
Per-environment configuration files (test, staging, production) stored within the project
Tests belong to the project, integration tests can be in a separate project
Infrastructure code should not directly deploy applications — use OS packaging (RPM, deb)

Expand-Contract Pattern for Infrastructure Changes

Same principle as database migrations:

Expand — add new resource (old version still running)
Migrate — move traffic / dependencies to the new resource
Contract — remove old resource

Prevents outages when refactoring infrastructure.
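
A minimal sketch of the three phases, assuming a single traffic target and hypothetical resource names. The point is the invariant: after every phase, traffic points at a resource that still exists.

```python
from typing import Iterator


def expand_contract(resources: set[str], traffic: str,
                    old: str, new: str) -> Iterator[tuple[frozenset, str]]:
    """Yield (existing resources, traffic target) after each phase."""
    resources = set(resources)

    resources.add(new)       # Expand: new resource exists next to the old one
    yield frozenset(resources), traffic

    traffic = new            # Migrate: switch traffic / dependencies
    yield frozenset(resources), traffic

    resources.discard(old)   # Contract: old resource is no longer referenced
    yield frozenset(resources), traffic


if __name__ == "__main__":
    for existing, target in expand_contract({"lb-v1"}, "lb-v1", "lb-v1", "lb-v2"):
        print(sorted(existing), "->", target)
```

Removing the old resource in the same step as adding the new one would break the invariant for the duration of the change.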

Terraform Detail

State Locking Mechanism

Backend	Locking Mechanism	Note
S3 + DynamoDB	DynamoDB (ConditionalPut)	Most common, cheap, simple
Terraform Cloud	Built-in (API)	SaaS, audit logs, VCS integration
Azure Storage	Azure Blob Lease	Similar to S3 model
GCS	Lock object (`.tflock`) written next to the state	Built in, no extra service required
Consul	Consul KV session_lock	High-availability
PostgreSQL	pg_advisory_lock	Built-in `pg` backend
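
All of these mechanisms rely on the same primitive: a write that succeeds only if no lock record exists yet. The sketch below emulates that with an in-memory table; `LockTable`, `acquire` and `release` are hypothetical names, not part of any backend's API.

```python
import threading


class LockTable:
    """In-memory stand-in for a store that supports conditional writes."""

    def __init__(self) -> None:
        self._items: dict[str, str] = {}
        self._mutex = threading.Lock()

    def put_if_absent(self, key: str, value: str) -> bool:
        """Write only if the key does not exist yet (conditional put)."""
        with self._mutex:
            if key in self._items:
                return False
            self._items[key] = value
            return True

    def delete_if_equals(self, key: str, value: str) -> bool:
        """Delete only if the stored value matches (owner check)."""
        with self._mutex:
            if self._items.get(key) != value:
                return False
            del self._items[key]
            return True


def acquire(table: LockTable, state_path: str, owner: str) -> bool:
    return table.put_if_absent(state_path, owner)


def release(table: LockTable, state_path: str, owner: str) -> bool:
    return table.delete_if_equals(state_path, owner)


if __name__ == "__main__":
    table = LockTable()
    print(acquire(table, "prod/terraform.tfstate", "alice"))  # first writer wins
    print(acquire(table, "prod/terraform.tfstate", "bob"))    # rejected while locked
```

A crashed run leaves the record behind, which is why Terraform offers `terraform force-unlock` for manual cleanup.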

State Backends Comparison

Property	S3 + DynamoDB	Terraform Cloud	Consul
Cost	$ (S3 + DynamoDB)	$$ (free tier limited)	$$ (infra)
Team workflow	GitHub Actions + OIDC	Native RBAC, runs	Custom
Locking	DynamoDB	Built-in	Consul session
History	S3 versioning	Full history, diff	None
Remote ops	No (state only)	Yes (remote runs)	No
Encryption	SSE-S3/KMS	At rest + in transit	TLS

Workspaces vs Terragrunt

Aspect	Terraform Workspaces	Terragrunt
State separation	One backend, key: `env:/workspace`	Separate backend per env
Code reuse	Same code, different variables	DRY configuration, modules
Risk	Accidentally `apply` to wrong workspace	Isolated backends
When to use	Simple projects, <5 envs	Microservices, multi-env, multi-team
Extra features	—	Dependency, include, before_hook

Provider Versioning

terraform {
  required_version = ">= 1.5, < 2.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.23"
    }
  }
}

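~> 5.0 — any 5.x release (>= 5.0, < 6.0); to allow only patch releases, write ~> 5.0.0 (>= 5.0.0, < 5.1.0)
>= 2.23 — any version from 2.23 up, including future major versions; add < 3.0 to stay within 2.x
~> lets only the rightmost listed component increase, so ~> 5.0 blocks major upgrades and ~> 5.0.0 also blocks minor upgrades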
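
The `~>` (pessimistic) operator can be sketched as a lower and an upper bound. This is a simplified model that assumes purely numeric versions and ignores pre-release suffixes; the function names are hypothetical.

```python
def parse(version: str) -> tuple[int, ...]:
    """'5.31.0' -> (5, 31, 0)"""
    return tuple(int(part) for part in version.split("."))


def pessimistic_bounds(constraint: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Bounds for '~> X': only the rightmost listed component may increase."""
    lower = parse(constraint)
    if len(lower) < 2:
        raise ValueError("this sketch expects at least two components, e.g. '5.0'")
    # Drop the last component and bump the one before it: 5.0 -> 6, 5.0.1 -> 5.1
    upper = lower[:-2] + (lower[-2] + 1,)
    return lower, upper


def satisfies(version: str, constraint: str) -> bool:
    lower, upper = pessimistic_bounds(constraint)
    return lower <= parse(version) < upper


if __name__ == "__main__":
    print(satisfies("5.31.0", "5.0"))    # minor bump inside 5.x
    print(satisfies("6.0.0", "5.0"))     # next major
    print(satisfies("5.1.0", "5.0.1"))   # minor bump under a three-part constraint
```

Whatever range is declared, the committed `.terraform.lock.hcl` records the exact version that was selected.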

Terraform Workflow

terraform init     → Initialize backend, download providers and modules
terraform plan     → Show changes
terraform apply    → Apply changes
terraform destroy  → Destroy infrastructure
terraform validate → Validate syntax and internal consistency
terraform fmt      → Format HCL

State Management

Remote state (S3, Terraform Cloud, Azure Storage)
State locking (DynamoDB, Consul)
Workspaces for environment separation

Terraform: Up and Running (3rd ed.) — Yevgeniy Brikman

Practical guide to Terraform from the co-founder of Gruntwork. The 3rd edition (2022) adds over 100 pages of new content, updates from Terraform 0.12 to 1.2, and two new chapters.

What's New in the 3rd Edition

New Feature	Description
Chapter: Secrets management	Managing secrets with Terraform — Vault, AWS Secrets Manager, KMS, OIDC, `sensitive` variables
Chapter: Multiple providers	Working with multiple regions, accounts, clouds including Kubernetes (AWS EKS)
Terraform 1.0+	Backward compatibility promise, stability, HashiCorp IPO
Provider versioning	`required_providers` block + `.terraform.lock.hcl` (lock file)
Module iteration	`count` and `for_each` on modules (since Terraform 0.13)
Variable validation	`validation {}` blocks, `precondition` / `postcondition`
Refactoring	`moved` blocks — safe refactoring without manual state manipulation
CI/CD security	OIDC authentication, isolated workers for `terraform apply`

Secrets Management with Terraform

# Variable marked as sensitive: redacted in plan/apply output, but still stored in plaintext in the state
variable "db_password" {
  type      = string
  sensitive = true
}

# Reading secrets from AWS Secrets Manager
data "aws_secretsmanager_secret" "db" {
  name = "production/db/master"
}

data "aws_secretsmanager_secret_version" "db" {
  secret_id = data.aws_secretsmanager_secret.db.id
}

Recommended Security Hierarchy:

OIDC — most secure, no creds on CI server (GitHub Actions → IAM role)
IAM role — instance profile (EC2, ECS, EKS)
Environment variables — limited, risk of log leakage
Isolated workers — a separate worker holds the admin permissions and exposes only plan/apply through its API

Testing Terraform Code

Layer	Tools	Description
Static analysis	`terraform validate`, `tflint`, `tfsec`, `checkov`	Code analysis without execution
Plan testing	`conftest` + OPA (Rego), `terraform plan` parse	Plan validation against policy
Unit tests	Terratest (Go), `terraform fmt`, `terraform validate`	Testing modules in isolation
Integration tests	Terratest (Go)	Actual provisioning + assert
End-to-end tests	Terratest	Full stack, smoke tests

Policy Enforcement

# OPA / conftest — deny public S3 bucket
package main

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_s3_bucket"
  resource.change.after.acl == "public-read"
  msg = sprintf("%s must not be public", [resource.address])
}

Production-grade Checklist by Brikman

Small modules — one module = one thing (single responsibility)
Composable modules — modules can be composed into larger units
Testable modules — each module has tests (Terratest)
Releasable modules — versioning (Git tags, Terraform Registry)
Version control — everything in git, including .terraform.lock.hcl
Remote state — S3 + DynamoDB or Terraform Cloud
CI/CD pipeline — plan on MR, apply after merge to main
Secrets management — no secrets in plaintext in code
Policy as code — OPA / Sentinel for compliance
Sandbox environment — each developer has their own isolated environment

Golden Rule of Terraform

The main branch must always match what is actually deployed in production. Never run terraform apply against production from a local machine; always go through CI/CD.

Dockerfile Best Practices

# Multi-stage build
FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
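# Install all dependencies; the build step typically needs devDependencies
RUN npm ci
COPY . .
# Build, then drop devDependencies so only runtime packages reach the final image
RUN npm run build && npm prune --omit=dev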

# Runtime stage — distroless
FROM gcr.io/distroless/nodejs22-debian12
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
USER nonroot:nonroot
EXPOSE 3000
CMD ["dist/server.js"]

Rules:

Multi-stage build — separate build tools from runtime
Distroless images — minimal attack surface (no shell, package manager)
Non-root user — USER nonroot (security best practice)
Layer caching — copy less-frequently changing files first (package.json → npm ci → code)
Small base image — Alpine (5 MB), distroless (minimal), scratch (Go static binary)
Healthcheck — HEALTHCHECK instruction for Docker and Swarm; Kubernetes ignores it and relies on liveness/readiness probes
Labels — LABEL maintainer, version, git commit
.dockerignore — minimize build context

Artifact Management

Docker Registries

Registry	Public/Private	Cost	Integration
Docker Hub	Both	Public free, private $5/month	GitHub Actions, GitLab
ECR (AWS)	Private	$0.10/GB/month + data transfer	IAM, ECS, EKS
GHCR (GitHub)	Both	Public free, private 500 MB free	GitHub Actions, npm
GCR / Artifact Registry	Private	$0.10/GB/month	GKE, Cloud Build
ACR (Azure)	Private	$0.11/GB/month	AKS, Azure DevOps
Harbor	Private (self-hosted)	Free (open source)	Custom, CNCF

Helm Charts

Repository — index.yaml + chart .tgz on HTTP server (S3, GitHub Pages, ChartMuseum)
OCI registry — Helm 3.8+ supports storing charts in OCI registries (ECR, GHCR, Harbor)
Versioning — chart version (package) + app version (application)

SBOM (Software Bill of Materials)

SPDX / CycloneDX — standard SBOM formats
Generation: Trivy, Syft (Grype then scans the resulting SBOM for vulnerabilities)
Use case: supply chain security, compliance (EO 14028, EU CRA)

Configuration and Secrets

Tool	Description
Vault (HashiCorp)	Dynamic secrets, encryption-as-a-service
AWS Secrets Manager	Managed, auto-rotation
Azure Key Vault	Managed, HSM support
GCP Secret Manager	Managed
SOPS	Encryption in git repos
Sealed Secrets	Encrypted secrets for Kubernetes

Secret Management Workflows

Vault Agent Injector (Kubernetes)

Sidecar container (vault-agent) injects secrets into the pod
Secrets mounted as tmpfs volume (not into environment variables)
Auto-rotation: vault-agent periodically refreshes secrets

External Secrets Operator (Kubernetes)

CRD: ExternalSecret → creates Secret in K8s
Backend: AWS Secrets Manager, Azure Key Vault, GCP Secret Manager, Vault
Poll-based refresh: the operator re-reads the external store every refreshInterval and updates the K8s Secret

Sealed Secrets

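kubeseal encrypts the Secret on the client with the controller's public key (only the in-cluster controller holds the private key)
Encrypted manifest (SealedSecret) can be safely stored in git
Controller decrypts the SealedSecret into a regular Secret when it is applied to the cluster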

GitOps

Principle: Git is the single source of truth
Tools: ArgoCD, Flux, Rancher Fleet
Pull-based deploy — agent in the cluster watches repo and applies changes
Auto-sync + drift detection
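
Drift detection boils down to comparing the desired state from Git with the live state in the cluster. A minimal sketch with flat dictionaries follows; real tools diff full Kubernetes manifests, and `detect_drift` is a hypothetical helper.

```python
def detect_drift(desired: dict, actual: dict) -> dict:
    """Return {key: (desired value, actual value)} for every difference.

    A key missing on one side is reported with None for that side.
    """
    drift = {}
    for key in desired.keys() | actual.keys():
        want, have = desired.get(key), actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


if __name__ == "__main__":
    desired = {"image": "app:1.2", "replicas": 3}
    actual = {"image": "app:1.2", "replicas": 5, "debug": True}
    # Someone scaled the deployment by hand and added a debug setting
    print(detect_drift(desired, actual))
```

With auto-sync enabled, the agent would then re-apply the desired state so that the reported drift becomes empty again.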

Environment Promotion (dev → staging → prod)

Code → Dev (auto-deploy) → Staging (auto + smoke tests) → Prod (manual approval + gating)

Quality Gates:

Unit tests — pass rate 100 %, code coverage ≥ 80 %
Integration tests — all critical paths pass
SAST scan — no critical/high vulnerabilities
SCA scan — no known critical CVEs
Container scan — all fixable vulns addressed
Smoke tests — after staging deploy (health endpoint, basic flow)
Manual approval — for production under continuous delivery; omitted under continuous deployment
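
The automated gates can be expressed as a single check that returns the reasons a build must not be promoted. This is a sketch only: the `PipelineReport` fields are hypothetical, and the thresholds are the ones listed above.

```python
from dataclasses import dataclass


@dataclass
class PipelineReport:
    tests_passed: int
    tests_total: int
    coverage_percent: float
    critical_or_high_findings: int  # SAST + SCA combined


def failed_gates(report: PipelineReport, min_coverage: float = 80.0) -> list[str]:
    """Return the names of all gates that block promotion."""
    failures = []
    if report.tests_passed != report.tests_total:
        failures.append("unit tests")
    if report.coverage_percent < min_coverage:
        failures.append("coverage")
    if report.critical_or_high_findings > 0:
        failures.append("security scan")
    return failures


if __name__ == "__main__":
    report = PipelineReport(tests_passed=412, tests_total=412,
                            coverage_percent=76.5, critical_or_high_findings=0)
    print(failed_gates(report))
```

Returning every failed gate instead of stopping at the first one gives the developer the full picture in one pipeline run.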

Deployment Strategies

Strategy	Description	Risk
Rolling update	Gradual instance replacement	Low
Blue/Green	Two identical environments, traffic switch	Medium
Canary	% traffic to new version, gradual increase	Low
Feature flag	Toggle feature on/off without deploy	Very low
A/B testing	Different versions for different users	Low
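
For the canary strategy, a common approach is to assign each user to a stable bucket and send the lowest buckets to the new version. The sketch below assumes hash-based bucketing on a user ID; real setups often split traffic at the load balancer or service mesh instead, and the function names are hypothetical.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def route(user_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent % of users to the canary version."""
    return "canary" if bucket(user_id) < canary_percent else "stable"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    for percent in (5, 25, 50):
        share = sum(route(u, percent) == "canary" for u in users) / len(users)
        print(percent, share)
```

Because the bucket depends only on the user ID, a user who already sees the canary keeps seeing it as the percentage grows.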

Git Branching Strategies

Strategy	Description	Suitable For
Trunk-based	Single main branch, short feature branches (< 1 day)	CD, microservices, mature teams
GitHub Flow	Main + feature branches, PRs, simple	Startups, web apps
GitLab Flow	Main + environment branches (staging, prod) + feature branches	Enterprise, regulated
GitFlow	Develop + main + feature/release/hotfix branches	Release-based, enterprise legacy
One Flow	Simplified GitFlow (no develop branch)	Medium teams

Rollback Strategies

Strategy	Description	Speed	Risk
Forward fix	New deploy with hotfix	Slow (build + deploy)	Low
Rollback (revert commit)	Git revert, new deploy	Medium	Low
Blue/Green switchback	Switch back to old version	Instant	DB incompatibility
Database rollback	Revert DB migration (migrate down)	Slow	Data loss risk

Database Rollback Challenges

Breaking changes — removing a column/table means rollback problem (data lost)
Best practice: Expand → Migrate → Contract (never remove in a single deploy)
Tooling: Flyway undo (limited), Liquibase rollback, pgroll (Postgres)
Feature flags as prevention — new code is behind a flag, rollback = disable flag

CI/CD Design Patterns

Modern CI/CD pipelines solve recurring problems using design patterns:

Pattern	Description
Pipeline as Code	Pipeline defined in YAML/Kotlin DSL (`.gitlab-ci.yml`, `.github/workflows/`)
Immutable Pipeline	Each build produces an immutable artifact that is promoted unchanged through all environments
Quality Gate	Branch protection, required checks, code coverage threshold
Deployment Strategy	Blue/Green, Canary, Rolling (see Deployment Strategies above)
GitOps	Pull-based deploy with auto-sync and drift detection
Shift-Left Security	SAST/DAST/SCA part of the pipeline
Dependency Caching	Cache layer between pipeline runs

Shift Left Security

SCA (Software Composition Analysis)

Tool	Type	Integration
Dependabot	GitHub native	GitHub, auto-PR for fix
Renovate	Multi-platform	GitHub, GitLab, Bitbucket
Snyk	SaaS + CLI	All platforms, Docker, IaC
Trivy	CLI, OSS	CI/CD pipeline (GitHub Actions, GitLab)

SAST (Static Application Security Testing)

Tool	Languages	Characteristics
Semgrep	30+ (Python, Java, Go, JS/TS)	Fast, custom rules, CI-native
SonarQube	30+	Comprehensive, quality gates, tech debt
CodeQL	12 (C++, C#, Go, Java, JS/TS, Python)	GitHub native, query-based
Checkmarx	30+	Enterprise, CxSAST, CxFlow
Fortify	30+	Enterprise, SAST + DAST

Container Scanning

Tool	Description
Trivy	OSS, scans OS packages + language-specific + IaC
Grype	OSS, from Anchore, fast, pairs with Syft for SBOM generation
Clair	Red Hat, OSS, OCI-compatible
Docker Scout	Docker Desktop / CLI, integration with Docker Hub

AI-Native Software Delivery (2025–2026)

AI is reshaping software delivery, a shift sometimes labelled DevOps 2.0:

AI-assisted CI/CD — automatic pipeline failure diagnosis, resource allocation optimization
Agent Control Protocol (ACP) / Model Context Protocol (MCP) — standards for AI agent interaction with tooling
AI-driven cost management — FinOps cloud optimization
Intelligent test selection — ML determines which tests to run based on code changes
Self-healing pipelines — AI auto-detects and fixes common issues

New tools: Harness (AI-native CD), GitLab 19.0 (agentic MR workflows, secrets manager), Octopus Deploy.

Pipeline Tools

GitHub Actions — integrated with GitHub, large marketplace
GitLab CI — native in GitLab, Auto DevOps
Jenkins — oldest, extensible, self-hosted
CircleCI — SaaS, fast
Argo Workflows — Kubernetes native
Buildkite — hybrid (own agents, SaaS orchestrator)

Best Practices

Idempotent pipeline — repeated runs give the same result
Immutable infrastructure — never modify a running server, always redeploy
Shift left — tests and security as early as possible in the pipeline
Artifact management — all builds versioned in registry (Docker Hub, ECR, GHCR)
Dependency caching — speed up pipeline (npm cache, pip cache, Docker layer caching)
Fail fast — pipeline fails as early as possible on error
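
Dependency caching usually keys the cache on the lockfile content, so the cache is reused until dependencies actually change. A minimal sketch of that idea follows; `cache_key` and the prefix are hypothetical, and CI systems provide their own helpers for this.

```python
import hashlib


def cache_key(prefix: str, lockfile_bytes: bytes) -> str:
    """Derive a cache key that changes only when the lockfile changes."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()
    return f"{prefix}-{digest[:16]}"


if __name__ == "__main__":
    before = cache_key("npm-linux", b'{"lockfileVersion": 3, "packages": {}}')
    after = cache_key("npm-linux", b'{"lockfileVersion": 3, "packages": {"a": {}}}')
    print(before)
    print(after)
    print(before == after)
```

Including the operating system or runtime version in the prefix avoids restoring a cache built for a different platform.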

Resources

Links, books and standards: sources/cicd/sources.md

Book	Authors	ISBN	Key Contribution
The DevOps Handbook	Kim, Humble, Debois, Willis	978-1942788003	CALMS principles (Culture, Automation, Lean, Measurement, Sharing), flow map, deployment pipeline
Continuous Delivery	Humble, Farley	978-0321601912	Deployment pipeline, commit stage, acceptance tests, capacity testing, zero-downtime release
CI/CD Design Patterns	Bajpai, Schildmeijer, Piwosz, Mishra	978-1-83588-965-7	30+ design patterns for CI/CD — pipeline patterns, GitOps, security, testing, deployment strategies
DevOps Frameworks, Techniques, and Tools	Vijayakumaran, Kofler, Öggl, Springer	978-1-4932-2670-2	Framework for DevOps adoption, tool comparison (Jenkins vs GitLab vs GitHub Actions), techniques for monitoring and observability

OpenStack CI/CD

OpenStack ecosystem uses its own CI/CD tools:

Zuul

CI/CD system developed by the OpenStack community (now standalone, used outside OpenStack)
Gating — changes are tested before merge (not after merge) — prevents breaking main branch
Ansible-based — jobs are Ansible playbooks
Nodepool — dynamic test VM allocation in the cloud (OpenStack, AWS)
Pipeline — check, gate, post, periodic, tag, release

OpenStack Infra (OpenDev)

Public CI infrastructure for OpenStack projects
Tools: Gerrit (code review), Zuul (CI), Nodepool (test nodes), Storyboard (issue tracking)
Base jobs: tempest (integration tests), grenade (upgrade tests), devstack-gate (gate tests)

Integration with External Tools

Terraform — OpenStack provider for provisioning (terraform-provider-openstack)
Ansible — openstack.cloud collection for managing OpenStack resources
Packer — build OpenStack images (openstack builder)
Jenkins — older CI, still used in some distributions

Last revised: 2026-06-03
