# 🔄 CI/CD and DevOps
|
||||
|
||||
## CI/CD Pipeline
|
||||
|
||||
```
Code Commit → Build → Test → Package → Deploy to Staging → Integration Tests → Deploy to Production
```
|
||||
|
||||
### Detailed Pipeline Stages
|
||||
|
||||
```
1. Checkout ──→ 2. Lint ──→ 3. Test ──→ 4. Build ──→ 5. Scan ──→ 6. Publish ──→ 7. Deploy
                   │           │                        │
                ESLint/     Unit/Integ/              SAST/SCA/
                Prettier    e2e tests                Container scan
```
|
||||
|
||||
| Stage | Tools | What Happens |
|-------|-------|--------------|
| **Checkout** | git clone, fetch | Retrieve code from repository, including submodules |
| **Lint** | ESLint, Prettier, RuboCop, golangci-lint | Static code analysis, formatting |
| **Test (unit)** | Jest, pytest, JUnit | Fast tests (ms to s), no external dependencies |
| **Test (integration)** | Testcontainers, Docker Compose | Tests with DB, message queue, external services |
| **Test (e2e)** | Playwright, Cypress, Selenium | Full-stack tests in the browser |
| **Build** | Docker build, go build, `npm run build`, Maven | Compilation, artifact assembly |
| **Scan (SAST)** | Semgrep, SonarQube, CodeQL | Static security analysis |
| **Scan (DAST)** | OWASP ZAP, Burp Suite | Dynamic analysis (running application) |
| **Scan (SCA)** | Dependabot, Snyk, Trivy | Dependency and CVE analysis |
| **Publish** | Docker push, npm publish, Maven deploy | Upload artifact to registry |
| **Deploy** | ArgoCD, Terraform, Helm, kubectl | Deploy to target environment |
|
||||
|
||||
### Continuous Integration (CI)
|
||||
|
||||
- Automatic build and tests on every commit
|
||||
- Fast feedback loop (< 10 min)
|
||||
- Linting, type checking, unit tests, security scan (SAST)
|
||||
|
||||
### Continuous Delivery (CD)
|
||||
|
||||
- Automatic deployment to staging / test environments
|
||||
- Manual approval for production (optional)
|
||||
- Smoke tests after deployment
|
||||
|
||||
### Continuous Deployment
|
||||
|
||||
- Fully automatic deployment to production
|
||||
- Requires high confidence in tests and monitoring
|
||||
- Feature flags for risk management
|
||||
|
||||
## GitHub Actions Detail
|
||||
|
||||
### Workflow Syntax
|
||||
|
||||
```yaml
name: CI Pipeline
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  NODE_VERSION: "22"

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
      - run: npm ci
      - run: npm run lint

  test:
    runs-on: ubuntu-latest
    needs: lint
    strategy:
      matrix:
        node-version: [22, 24]
    steps:
      - uses: actions/checkout@v4
      # Install the Node.js version selected by the matrix; without this step
      # every matrix job would run on the runner's default Node.js.
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
      - run: npm ci
      - name: Run tests
        run: npm test
```
|
||||
|
||||
### Matrix Builds
|
||||
|
||||
- Run the same jobs with different parameters (OS, language version, architecture)
- `strategy.matrix` — parameter combinations (Cartesian product)
- `strategy.fail-fast` — when `true` (the default), cancels all in-progress and queued matrix jobs as soon as one of them fails
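
The Cartesian-product expansion behind `strategy.matrix` can be illustrated with a small sketch. This is not how GitHub implements it; `expand_matrix` and the example values are hypothetical names chosen for illustration, and `include`/`exclude` rules are left out.

```python
from itertools import product


def expand_matrix(matrix: dict[str, list]) -> list[dict]:
    """Return one job configuration per combination of matrix values."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


# Two operating systems x two Node.js versions -> four jobs.
jobs = expand_matrix({
    "os": ["ubuntu-latest", "windows-latest"],
    "node-version": [22, 24],
})
for job in jobs:
    print(job)
```

Every added matrix dimension multiplies the job count, which is why large matrices are usually trimmed to the combinations that matter.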
|
||||
|
||||
### Reusable Workflows
|
||||
|
||||
```yaml
# .github/workflows/deploy.yml (called)
on:
  workflow_call:
    inputs:
      environment:
        required: true
        type: string
    secrets:
      cloud_role:
        required: true

# Call in caller workflow
jobs:
  deploy:
    uses: ./.github/workflows/deploy.yml
    with:
      environment: staging
    secrets:
      cloud_role: ${{ secrets.STAGING_ROLE }}
```
|
||||
|
||||
### Composite Actions
|
||||
|
||||
- Custom actions without needing a separate repository
|
||||
- Combination of `run`, `uses`, `shell` steps
|
||||
- Use case: standardize lint/test/build across repositories
|
||||
|
||||
### Self-hosted Runners
|
||||
|
||||
- Own infrastructure for running GitHub Actions
|
||||
- Use case: private network, GPU, specific HW, compliance
|
||||
- Scaling: actions-runner-controller (Kubernetes), auto-scaling groups
|
||||
- Security: job isolation, ephemeral runners
|
||||
|
||||
## GitLab CI Detail
|
||||
|
||||
```yaml
stages:
  - lint
  - test
  - build
  - deploy

variables:
  DOCKER_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA

lint:
  stage: lint
  image: node:22
  script:
    - npm ci
    - npm run lint

test:
  stage: test
  image: node:22
  needs: ["lint"]
  script:
    # Each job starts in a fresh container, so dependencies are installed again
    - npm ci
    - npm test
  artifacts:
    paths:
      - coverage/
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml

# Minimal build job so that `needs: ["build"]` below resolves. The Docker
# setup (image tag, dind service) is an assumption and depends on the runner.
build:
  stage: build
  image: docker:27
  services:
    - docker:27-dind
  needs: ["test"]
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$DOCKER_IMAGE" .
    - docker push "$DOCKER_IMAGE"

deploy-staging:
  stage: deploy
  needs: ["build"]
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  environment:
    name: staging
    url: https://staging.example.com
  script:
    - kubectl set image deployment/app app=$DOCKER_IMAGE
```
|
||||
|
||||
**Concepts**:
- **Stages** — sequential phases (each stage can have multiple parallel jobs)
- **Rules** — execution conditions (branch, tag, changes, variables) — replaces `only/except`
- **Needs** — DAG dependencies (job doesn't have to wait for entire stage)
- **Artifacts** — files passed between jobs and kept after the run (binaries, reports); dependency caches use the separate `cache` keyword
- **Environments** — deployment tracking (rollback, history, approvals)
|
||||
|
||||
### DAG Pipelines (Needs)
|
||||
|
||||
```
lint ──→ test ──→ build ──→ deploy-staging ──→ deploy-prod
↓
build-arm ──→ test-arm
```
|
||||
|
||||
- Defines dependencies between jobs (not necessarily stages)
|
||||
- Enables parallelization of independent jobs
|
||||
- Reduces overall pipeline time
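
The scheduling effect of `needs` can be sketched with Python's standard `graphlib`. This is an illustration only, not GitLab's scheduler: `parallel_waves` is a hypothetical helper, and the dependency map assumes that `build-arm` branches off right after `lint`.

```python
from graphlib import TopologicalSorter


def parallel_waves(needs: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into waves; all jobs in one wave can run in parallel."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()  # raises CycleError if the dependencies contain a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Job -> jobs it needs (assumed layout, see the note above).
needs = {
    "lint": set(),
    "test": {"lint"},
    "build": {"test"},
    "build-arm": {"lint"},
    "test-arm": {"build-arm"},
    "deploy-staging": {"build"},
}
for number, wave in enumerate(parallel_waves(needs), start=1):
    print(number, wave)
```

With stage-only ordering, `build-arm` would wait for every job in the preceding stage; with explicit dependencies it starts as soon as `lint` finishes.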
|
||||
|
||||
## Infrastructure as Code (IaC)
|
||||
|
||||
| Tool | Type | Language |
|------|------|----------|
| Terraform | Declarative | HCL |
| OpenTofu | Declarative | HCL (Terraform fork) |
| Pulumi | Declarative | TypeScript, Python, Go, C# |
| AWS CDK | Declarative | TypeScript, Python, Java, C# |
| CloudFormation | Declarative | YAML/JSON (AWS) |
| Azure ARM/Bicep | Declarative | Bicep, JSON |
| Ansible | Imperative/Config | YAML |
| Chef/Puppet | Config mgmt | Ruby DSL (Chef), Puppet DSL (Puppet) |
|
||||
|
||||
### Infrastructure as Code (2nd Edition) — Kief Morris
|
||||
|
||||
Key reference for designing and operating dynamic cloud infrastructure with IaC. The book is tool-agnostic — it focuses on patterns and practices, not specific tools.
|
||||
|
||||
#### Three Fundamental Practices
|
||||
|
||||
| Practice | Description |
|
||||
|----------|-------------|
|
||||
| **Define everything as code** | All infrastructure defined in code, version control, repeatability |
|
||||
| **Continuously test and deliver** | Every change goes through a pipeline with automated tests |
|
||||
| **Small, independent pieces** | Small, loosely coupled components — easier change and testing |
|
||||
|
||||
#### Principles of Cloud Infrastructure
|
||||
|
||||
- **Systems are reproducible** — infrastructure can be recreated from code at any time
- **Systems are disposable** — instances can be destroyed and recreated
- **Systems are consistent** — all environments identical (no snowflake servers)
- **Processes are repeatable** — automation instead of manual procedures
- **Design is always changing** — infrastructure is constantly evolving (not build-and-forget)
|
||||
|
||||
#### Anti-patterns (Pitfalls)
|
||||
|
||||
| Anti-pattern | Description |
|
||||
|--------------|-------------|
|
||||
| **Snowflake server** | Each server different, cannot reproduce |
|
||||
| **Configuration drift** | Manual changes → deviations from defined state |
|
||||
| **Server sprawl** | Too many servers without management |
|
||||
| **Fragile infrastructure** | Changes often break the system |
|
||||
| **Automation fear** | Fear of automation → manual interventions |
|
||||
|
||||
#### Book Structure (4 Parts)
|
||||
|
||||
1. **Foundations** — framework of tools and technologies for cloud platforms
|
||||
2. **Working with infrastructure stacks** — defining, provisioning, testing and CD of infrastructure changes
|
||||
3. **Working with servers and application runtime platforms** — provisioning and configuring servers and clusters
|
||||
4. **Working with large systems and teams** — workflow, governance, architectural patterns for multiple teams
|
||||
|
||||
#### IaC Code Organization
|
||||
|
||||
| Pattern | Description |
|
||||
|---------|-------------|
|
||||
| **Monorepo** | One repository for everything — build-time integration, suitable for small teams |
|
||||
| **Microrepo** | Separate repository for each project — isolation, suitable for large teams |
|
||||
| **Domain organization** | Organizing code by domain concepts (not by technology) |
|
||||
|
||||
**Recommendations:**
|
||||
- Infrastructure and applications can be in the same or separate repository depending on organizational structure (Team Topologies)
|
||||
- Per-environment configuration files (test, staging, production) stored within the project
|
||||
- Tests belong to the project, integration tests can be in a separate project
|
||||
- Infrastructure code should not directly deploy applications — use OS packaging (RPM, deb)
|
||||
|
||||
#### Expand-Contract Pattern for Infrastructure Changes
|
||||
|
||||
Same principle as database migrations:
|
||||
1. **Expand** — add new resource (old version still running)
|
||||
2. **Migrate** — move traffic / dependencies to the new resource
|
||||
3. **Contract** — remove old resource
|
||||
|
||||
Prevents outages when refactoring infrastructure.
|
||||
|
||||
## Terraform Detail
|
||||
|
||||
#### State Locking Mechanism
|
||||
|
||||
| Backend | Locking Mechanism | Note |
|---------|-------------------|------|
| **S3 + DynamoDB** | DynamoDB conditional write (`PutItem` with a condition expression) | Most common, cheap, simple |
| **Terraform Cloud** | Built-in (API) | SaaS, audit logs, VCS integration |
| **Azure Storage** | Azure Blob Lease | Similar to S3 model |
| **GCS** | Lock object (`.tflock`) written next to the state | Built into the backend |
| **Consul** | Consul KV lock backed by a session | High-availability |
| **PostgreSQL** | Advisory locks (`pg_advisory_lock`) | Built-in `pg` backend |
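
The idea shared by these mechanisms is an atomic "write only if absent" operation. The sketch below uses an in-memory stand-in to show the pattern; `LockTable`, its methods, and the lock IDs are hypothetical and do not correspond to any backend's API. In a real backend the atomicity comes from the storage service, not from a process-local mutex.

```python
import threading


class LockTable:
    """In-memory stand-in for a store that supports conditional writes."""

    def __init__(self) -> None:
        self._items: dict[str, str] = {}
        self._mutex = threading.Lock()  # makes check-and-write atomic in-process

    def put_if_absent(self, lock_id: str, owner: str) -> bool:
        """Write the lock item only if no item with this ID exists."""
        with self._mutex:
            if lock_id in self._items:
                return False
            self._items[lock_id] = owner
            return True

    def delete_if_owner(self, lock_id: str, owner: str) -> bool:
        """Remove the lock item only if `owner` currently holds it."""
        with self._mutex:
            if self._items.get(lock_id) != owner:
                return False
            del self._items[lock_id]
            return True


table = LockTable()
first = table.put_if_absent("prod/terraform.tfstate", "ci-run-1")
second = table.put_if_absent("prod/terraform.tfstate", "ci-run-2")  # lock is held
released = table.delete_if_owner("prod/terraform.tfstate", "ci-run-1")
third = table.put_if_absent("prod/terraform.tfstate", "ci-run-2")
print(first, second, released, third)  # True False True True
```

The second writer fails fast instead of overwriting state, which is the behavior a locked `terraform apply` relies on.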
|
||||
|
||||
#### State Backends Comparison
|
||||
|
||||
| Property | S3 + DynamoDB | Terraform Cloud | Consul |
|----------|---------------|-----------------|--------|
| Cost | $ (S3 + DynamoDB) | $$ (free tier limited) | $$ (infra) |
| Team workflow | GitHub Actions + OIDC | Native RBAC, runs | Custom |
| Locking | DynamoDB | Built-in | Consul session |
| History | S3 versioning | Full history, diff | None |
| Remote ops | No (state only) | Yes (remote runs) | No |
| Encryption | SSE-S3/KMS | At rest + in transit | TLS |
|
||||
|
||||
#### Workspaces vs Terragrunt
|
||||
|
||||
| Aspect | Terraform Workspaces | Terragrunt |
|--------|---------------------|------------|
| **State separation** | One backend; with S3, non-default workspaces are stored under `env:/<workspace>/<key>` | Separate backend per env |
| **Code reuse** | Same code, different variables | DRY configuration, modules |
| **Risk** | Accidentally `apply` to wrong workspace | Isolated backends |
| **When to use** | Simple projects, <5 envs | Microservices, multi-env, multi-team |
| **Extra features** | None | Dependency, include, before_hook |
|
||||
|
||||
#### Provider Versioning
|
||||
|
||||
```hcl
terraform {
  required_version = ">= 1.5, < 2.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.23, < 3.0"
    }
  }
}
```
|
||||
|
||||
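- `~> 5.0` — any 5.x release (equivalent to `>= 5.0, < 6.0`); only the rightmost version component may increase
- `~> 5.0.0` — patch releases only (equivalent to `>= 5.0.0, < 5.1.0`)
- `>= 2.23, < 3.0` — any 2.x from 2.23
- `~>` (pessimistic) constraints block the jump to the next major version, or to the next minor version when a patch component is given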
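
The behavior of `~>` can be made concrete with a small checker. This is a simplified sketch, not Terraform's implementation: `parse_version` and `satisfies_pessimistic` are hypothetical helpers, pre-release suffixes are not handled, and the constraint is assumed to have at least a major and a minor component.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '5.31.0' into (5, 31, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies_pessimistic(version: str, constraint: str) -> bool:
    """Check a version against a '~> X.Y' or '~> X.Y.Z' constraint."""
    base = parse_version(constraint.removeprefix("~>"))
    if len(base) < 2:
        raise ValueError("this sketch expects at least MAJOR.MINOR after '~>'")
    candidate = parse_version(version)

    # Pad both tuples to the same length so (5, 0) compares like (5, 0, 0).
    width = max(len(base), len(candidate))
    lower = base + (0,) * (width - len(base))
    padded = candidate + (0,) * (width - len(candidate))

    # Upper bound: drop the rightmost component and bump the one before it.
    upper = base[:-2] + (base[-2] + 1,)
    return lower <= padded and padded[: len(upper)] < upper


for candidate in ["4.67.0", "5.0.0", "5.31.0", "6.0.0"]:
    print(candidate, satisfies_pessimistic(candidate, "~> 5.0"))
```

The same function shows why `~> 5.0.0` is much stricter than `~> 5.0`: the upper bound moves from `6` to `5.1`.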
|
||||
|
||||
### Terraform Workflow
|
||||
|
||||
```
terraform init     → Initialize the backend, download providers and modules
terraform plan     → Show changes
terraform apply    → Apply changes
terraform destroy  → Destroy infrastructure
terraform validate → Check syntax and internal consistency
terraform fmt      → Format HCL
```
|
||||
|
||||
### State Management
|
||||
|
||||
- Remote state (S3, Terraform Cloud, Azure Storage)
|
||||
- State locking (DynamoDB, Consul)
|
||||
- Workspaces for environment separation
|
||||
|
||||
### Terraform: Up and Running (3rd ed.) — Yevgeniy Brikman
|
||||
|
||||
Practical guide to Terraform by a co-founder of Gruntwork. The 3rd edition (2022) adds over 100 pages of new content, updates the material from Terraform 0.12 to 1.2, and includes two new chapters.
|
||||
|
||||
#### What's New in the 3rd Edition
|
||||
|
||||
| New Feature | Description |
|-------------|-------------|
| **Chapter: Secrets management** | Managing secrets with Terraform — Vault, AWS Secrets Manager, KMS, OIDC, `sensitive` variables |
| **Chapter: Multiple providers** | Working with multiple regions, accounts, clouds including Kubernetes (AWS EKS) |
| **Terraform 1.0+** | Backward compatibility promise, stability, HashiCorp IPO |
| **Provider versioning** | `required_providers` block + `.terraform.lock.hcl` (lock file) |
| **Module iteration** | `count` and `for_each` on modules (since Terraform 0.13) |
| **Variable validation** | `validation {}` blocks, `precondition` / `postcondition` |
| **Refactoring** | `moved` blocks — safe refactoring without manual state manipulation |
| **CI/CD security** | OIDC authentication, isolated workers for `terraform apply` |
|
||||
|
||||
#### Secrets Management with Terraform
|
||||
|
||||
```hcl
# Variable marked as sensitive: redacted in plan/apply output,
# but still stored in plaintext in the state file
variable "db_password" {
  type      = string
  sensitive = true
}

# Reading secrets from AWS Secrets Manager
data "aws_secretsmanager_secret" "db" {
  name = "production/db/master"
}

data "aws_secretsmanager_secret_version" "db" {
  secret_id = data.aws_secretsmanager_secret.db.id
}
```
|
||||
|
||||
**Recommended Security Hierarchy:**
|
||||
1. **OIDC** — most secure, no creds on CI server (GitHub Actions → IAM role)
|
||||
2. **IAM role** — instance profile (EC2, ECS, EKS)
|
||||
3. **Environment variables** — limited, risk of log leakage
|
||||
4. **Isolated workers** — separate worker with admin permissions, API only `plan`/`apply`
|
||||
|
||||
#### Testing Terraform Code
|
||||
|
||||
| Layer | Tools | Description |
|
||||
|-------|-------|-------------|
|
||||
| **Static analysis** | `terraform validate`, `tflint`, `tfsec`, `checkov` | Code analysis without execution |
|
||||
| **Plan testing** | `conftest` + OPA (Rego), `terraform plan` parse | Plan validation against policy |
|
||||
| **Unit tests** | Terratest (Go), `terraform fmt`, `terraform validate` | Testing modules in isolation |
|
||||
| **Integration tests** | Terratest (Go) | Actual provisioning + assert |
|
||||
| **End-to-end tests** | Terratest | Full stack, smoke tests |
|
||||
|
||||
#### Policy Enforcement
|
||||
|
||||
```rego
# OPA / conftest: deny public S3 bucket
package main

# Opt in to the Rego v1 syntax (`contains`, `if`, `in`)
import rego.v1

deny contains msg if {
	some resource in input.resource_changes
	resource.type == "aws_s3_bucket"
	resource.change.after.acl == "public-read"
	msg := sprintf("%s must not be public", [resource.address])
}
```
|
||||
|
||||
#### Production-grade Checklist by Brikman
|
||||
|
||||
1. **Small modules** — one module = one thing (single responsibility)
|
||||
2. **Composable modules** — modules can be composed into larger units
|
||||
3. **Testable modules** — each module has tests (Terratest)
|
||||
4. **Releasable modules** — versioning (Git tags, Terraform Registry)
|
||||
5. **Version control** — everything in git, including `.terraform.lock.hcl`
|
||||
6. **Remote state** — S3 + DynamoDB or Terraform Cloud
|
||||
7. **CI/CD pipeline** — `plan` on MR, `apply` after merge to main
|
||||
8. **Secrets management** — no secrets in plaintext in code
|
||||
9. **Policy as code** — OPA / Sentinel for compliance
|
||||
10. **Sandbox environment** — each developer has their own isolated environment
|
||||
|
||||
#### Golden Rule of Terraform
|
||||
|
||||
> **The main branch must always reflect what is actually deployed in production.**
>
> Never run `terraform apply` against production from a local machine; always go through CI/CD.
|
||||
|
||||
## Dockerfile Best Practices
|
||||
|
||||
```dockerfile
# Multi-stage build
FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install all dependencies: the build step usually needs devDependencies
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies before node_modules is copied to the runtime image
RUN npm prune --omit=dev

# Runtime stage (distroless)
FROM gcr.io/distroless/nodejs22-debian12
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
USER nonroot:nonroot
EXPOSE 3000
# The distroless Node.js image already uses `node` as its entrypoint
CMD ["dist/server.js"]
```
|
||||
|
||||
**Rules**:
|
||||
- **Multi-stage build** — separate build tools from runtime
|
||||
- **Distroless images** — minimal attack surface (no shell, package manager)
|
||||
- **Non-root user** — USER nonroot (security best practice)
|
||||
- **Layer caching** — copy less-frequently changing files first (package.json → npm ci → code)
|
||||
- **Small base image** — Alpine (5 MB), distroless (minimal), scratch (Go static binary)
|
||||
- **Healthcheck** — HEALTHCHECK instruction for orchestrator
|
||||
- **Labels** — LABEL maintainer, version, git commit
|
||||
- **.dockerignore** — minimize build context
|
||||
|
||||
## Artifact Management
|
||||
|
||||
### Docker Registries
|
||||
|
||||
| Registry | Public/Private | Cost | Integration |
|
||||
|----------|---------------|------|-------------|
|
||||
| **Docker Hub** | Both | Public free, private $5/month | GitHub Actions, GitLab |
|
||||
| **ECR (AWS)** | Private | $0.10/GB/month + data transfer | IAM, ECS, EKS |
|
||||
| **GHCR (GitHub)** | Both | Public free, private 500 MB free | GitHub Actions, npm |
|
||||
| **GCR / Artifact Registry** | Private | $0.10/GB/month | GKE, Cloud Build |
|
||||
| **ACR (Azure)** | Private | $0.11/GB/month | AKS, Azure DevOps |
|
||||
| **Harbor** | Private (self-hosted) | Free (open source) | Custom, CNCF |
|
||||
|
||||
### Helm Charts
|
||||
|
||||
- **Repository** — index.yaml + chart .tgz on HTTP server (S3, GitHub Pages, ChartMuseum)
|
||||
- **OCI registry** — Helm 3.8+ supports storing charts in OCI registries (ECR, GHCR, Harbor)
|
||||
- **Versioning** — chart version (package) + app version (application)
|
||||
|
||||
### SBOM (Software Bill of Materials)
|
||||
|
||||
- **SPDX** / **CycloneDX** — standard SBOM formats
- Generation: Trivy, Syft; vulnerability scanning of an existing SBOM: Grype
- Use case: supply chain security, compliance (EO 14028, EU CRA)
|
||||
|
||||
## Configuration and Secrets
|
||||
|
||||
| Tool | Description |
|
||||
|------|-------------|
|
||||
| Vault (HashiCorp) | Dynamic secrets, encryption-as-a-service |
|
||||
| AWS Secrets Manager | Managed, auto-rotation |
|
||||
| Azure Key Vault | Managed, HSM support |
|
||||
| GCP Secret Manager | Managed |
|
||||
| SOPS | Encryption in git repos |
|
||||
| Sealed Secrets | Encrypted secrets for Kubernetes |
|
||||
|
||||
### Secret Management Workflows
|
||||
|
||||
**Vault Agent Injector** (Kubernetes)
|
||||
- Sidecar container (vault-agent) injects secrets into the pod
|
||||
- Secrets mounted as tmpfs volume (not into environment variables)
|
||||
- Auto-rotation: vault-agent periodically refreshes secrets
|
||||
|
||||
**External Secrets Operator** (Kubernetes)
|
||||
- CRD: `ExternalSecret` → creates `Secret` in K8s
- Backend: AWS Secrets Manager, Azure Key Vault, GCP Secret Manager, Vault
- Poll-based refresh: the operator re-reads the external store every `refreshInterval` and updates the K8s `Secret`
|
||||
|
||||
**Sealed Secrets**
|
||||
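- `kubeseal` encrypts the Secret on the client with the controller's public key; only the in-cluster controller holds the private key
- The encrypted manifest (`SealedSecret`) can be safely committed to git
- The controller decrypts it into a regular `Secret` when the manifest is applied to the cluster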
|
||||
|
||||
## GitOps
|
||||
|
||||
- **Principle**: Git is the single source of truth
|
||||
- **Tools**: ArgoCD, Flux, Rancher Fleet
|
||||
- Pull-based deploy — agent in the cluster watches repo and applies changes
|
||||
- Auto-sync + drift detection
|
||||
|
||||
## Environment Promotion (dev → staging → prod)
|
||||
|
||||
```
|
||||
Code → Dev (auto-deploy) → Staging (auto + smoke tests) → Prod (manual approval + gating)
|
||||
```
|
||||
|
||||
**Quality Gates**:
|
||||
1. **Unit tests** — pass rate 100 %, code coverage ≥ 80 %
|
||||
2. **Integration tests** — all critical paths pass
|
||||
3. **SAST scan** — no critical/high vulnerabilities
|
||||
4. **SCA scan** — no known critical CVEs
|
||||
5. **Container scan** — all fixable vulns addressed
|
||||
6. **Smoke tests** — after staging deploy (health endpoint, basic flow)
|
||||
7. **Manual approval** — for production (optional with CD)
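
The automated gates (1 to 5) can be expressed as a single promotion check. The sketch below is illustrative: `BuildReport`, its field names, and `failed_gates` are hypothetical, and in practice the numbers would come from test and scanner reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildReport:
    unit_pass_rate: float         # 1.0 means every unit test passed
    coverage: float               # line coverage as a fraction, 0.0 to 1.0
    integration_passed: bool      # all critical paths pass
    sast_critical_or_high: int    # open critical/high SAST findings
    sca_critical_cves: int        # known critical CVEs in dependencies
    fixable_container_vulns: int  # container vulnerabilities with a fix available


def failed_gates(report: BuildReport, min_coverage: float = 0.80) -> list[str]:
    """Return the names of all gates that block promotion."""
    checks = [
        ("unit tests", report.unit_pass_rate == 1.0),
        ("coverage", report.coverage >= min_coverage),
        ("integration tests", report.integration_passed),
        ("SAST", report.sast_critical_or_high == 0),
        ("SCA", report.sca_critical_cves == 0),
        ("container scan", report.fixable_container_vulns == 0),
    ]
    return [name for name, passed in checks if not passed]


report = BuildReport(1.0, 0.79, True, 1, 0, 0)
print(failed_gates(report))  # ['coverage', 'SAST']
```

Returning the list of failed gates, rather than a bare boolean, lets the pipeline report exactly why a promotion was blocked.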
|
||||
|
||||
## Deployment Strategies
|
||||
|
||||
| Strategy | Description | Risk |
|----------|-------------|------|
| **Rolling update** | Gradual instance replacement | Low |
| **Blue/Green** | Two identical environments, traffic switch | Medium |
| **Canary** | % traffic to new version, gradual increase | Low |
| **Feature flag** | Toggle feature on/off without deploy | Very low |
| **A/B testing** | Different versions for different users | Low |
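
The canary row can be illustrated with a deterministic traffic split. This is one possible approach (hash-based bucketing), not a description of any particular load balancer or service mesh; `routes_to_canary` and the user IDs are hypothetical.

```python
import hashlib


def routes_to_canary(user_id: str, canary_percent: int) -> bool:
    """Decide whether a user is served by the canary version.

    The user ID is hashed into one of 100 stable buckets, so the same user
    always gets the same answer for a given percentage.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < canary_percent


users = [f"user-{n}" for n in range(1000)]
for percent in (5, 25, 50):
    share = sum(routes_to_canary(user, percent) for user in users) / len(users)
    print(f"{percent}% target -> {share:.1%} of sample users on canary")
```

Because buckets are stable, raising the percentage only adds users to the canary; nobody who already saw the new version is moved back to the old one.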
|
||||
|
||||
## Git Branching Strategies
|
||||
|
||||
| Strategy | Description | Suitable For |
|----------|-------------|--------------|
| **Trunk-based** | Single main branch, short feature branches (< 1 day) | CD, microservices, mature teams |
| **GitHub Flow** | Main + feature branches, PRs, simple | Startups, web apps |
| **GitLab Flow** | Main + environment branches (staging, prod) + feature branches | Enterprise, regulated |
| **GitFlow** | Develop + main + feature/release/hotfix branches | Release-based, enterprise legacy |
| **OneFlow** | Simplified GitFlow (no develop branch) | Medium teams |
|
||||
|
||||
## Rollback Strategies
|
||||
|
||||
| Strategy | Description | Speed | Risk |
|----------|-------------|-------|------|
| **Forward fix** | New deploy with hotfix | Slow (build + deploy) | Low |
| **Rollback (revert commit)** | Git revert, new deploy | Medium | Low |
| **Blue/Green switchback** | Switch back to old version | Instant | DB incompatibility |
| **Database rollback** | Revert DB migration (migrate down) | Slow | Data loss risk |
|
||||
|
||||
### Database Rollback Challenges
|
||||
|
||||
- **Breaking changes** — removing a column/table means rollback problem (data lost)
|
||||
- **Best practice**: Expand → Migrate → Contract (never remove in a single deploy)
|
||||
- **Tooling**: Flyway undo (limited), Liquibase rollback, pgroll (Postgres)
|
||||
- **Feature flags** as prevention — new code is behind a flag, rollback = disable flag
|
||||
|
||||
## CI/CD Design Patterns
|
||||
|
||||
Modern CI/CD pipelines solve recurring problems using design patterns:
|
||||
|
||||
| Pattern | Description |
|---------|-------------|
| **Pipeline as Code** | Pipeline defined in YAML/Kotlin DSL (`.gitlab-ci.yml`, `.github/workflows/`) |
| **Immutable Pipeline** | Each build is an artifact, never changed |
| **Quality Gate** | Branch protection, required checks, code coverage threshold |
| **Deployment Strategy** | Blue/Green, Canary, Rolling (see Deployment Strategies above) |
| **GitOps** | Pull-based deploy with auto-sync and drift detection |
| **Shift-Left Security** | SAST/DAST/SCA part of the pipeline |
| **Dependency Caching** | Cache layer between pipeline runs |
|
||||
|
||||
## Shift Left Security
|
||||
|
||||
### SCA (Software Composition Analysis)
|
||||
| Tool | Type | Integration |
|
||||
|------|------|-------------|
|
||||
| **Dependabot** | GitHub native | GitHub, auto-PR for fix |
|
||||
| **Renovate** | Multi-platform | GitHub, GitLab, Bitbucket |
|
||||
| **Snyk** | SaaS + CLI | All platforms, Docker, IaC |
|
||||
| **Trivy** | CLI, OSS | CI/CD pipeline (GitHub Actions, GitLab) |
|
||||
|
||||
### SAST (Static Application Security Testing)
|
||||
| Tool | Languages | Characteristics |
|
||||
|------|-----------|----------------|
|
||||
| **Semgrep** | 30+ (Python, Java, Go, JS/TS) | Fast, custom rules, CI-native |
|
||||
| **SonarQube** | 30+ | Comprehensive, quality gates, tech debt |
|
||||
| **CodeQL** | 12 (C++, C#, Go, Java, JS/TS, Python) | GitHub native, query-based |
|
||||
| **Checkmarx** | 30+ | Enterprise, CxSAST, CxFlow |
|
||||
| **Fortify** | 30+ | Enterprise, SAST + DAST |
|
||||
|
||||
### Container Scanning
|
||||
| Tool | Description |
|
||||
|------|-------------|
|
||||
| **Trivy** | OSS, scans OS packages + language-specific + IaC |
|
||||
| **Grype** | OSS, from Anchore, fast, Syft for SBOM |
|
||||
| **Clair** | Red Hat, OSS, OCI-compatible |
|
||||
| **Docker Scout** | Docker Desktop / CLI, integration with Docker Hub |
|
||||
|
||||
## AI-Native Software Delivery (2025–2026)
|
||||
|
||||
AI is reshaping DevOps (sometimes labeled "DevOps 2.0"):
|
||||
|
||||
- **AI-assisted CI/CD** — automatic pipeline failure diagnosis, resource allocation optimization
|
||||
- **Agent Control Protocol (ACP)** / **Model Context Protocol (MCP)** — standards for AI agent interaction with tooling
|
||||
- **AI-driven cost management** — FinOps cloud optimization
|
||||
- **Intelligent test selection** — ML determines which tests to run based on code changes
|
||||
- **Self-healing pipelines** — AI auto-detects and fixes common issues
|
||||
|
||||
New tools: Harness (AI-native CD), GitLab 19.0 (agentic MR workflows, secrets manager), Octopus Deploy.
|
||||
|
||||
## Pipeline Tools
|
||||
|
||||
- **GitHub Actions** — integrated with GitHub, large marketplace
|
||||
- **GitLab CI** — native in GitLab, Auto DevOps
|
||||
- **Jenkins** — oldest, extensible, self-hosted
|
||||
- **CircleCI** — SaaS, fast
|
||||
- **Argo Workflows** — Kubernetes native
|
||||
- **Buildkite** — hybrid (own agents, SaaS orchestrator)
|
||||
|
||||
## Best Practices
|
||||
|
||||
- **Idempotent pipeline** — repeated runs give the same result
|
||||
- **Immutable infrastructure** — never modify a running server, always redeploy
|
||||
- **Shift left** — tests and security as early as possible in the pipeline
|
||||
- **Artifact management** — all builds versioned in registry (Docker Hub, ECR, GHCR)
|
||||
- **Dependency caching** — speed up pipeline (npm cache, pip cache, Docker layer caching)
|
||||
- **Fail fast** — pipeline fails as early as possible on error
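
Dependency caching usually hinges on a cache key derived from the lockfile, so the cache is reused until dependencies change. The sketch below shows the idea; `cache_key`, the prefix, and the lockfile contents are hypothetical and not tied to a specific CI system.

```python
import hashlib


def cache_key(prefix: str, lockfile_content: bytes) -> str:
    """Build a cache key that changes only when the lockfile changes."""
    digest = hashlib.sha256(lockfile_content).hexdigest()
    return f"{prefix}-{digest[:16]}"


# Stand-ins for the contents of package-lock.json before and after an upgrade.
before = b'{"lockfileVersion": 3, "packages": {"left-pad": "1.3.0"}}'
after = b'{"lockfileVersion": 3, "packages": {"left-pad": "1.3.1"}}'

print(cache_key("npm-linux", before))
print(cache_key("npm-linux", after))
```

Identical lockfiles map to the same key (cache hit), while any dependency change produces a new key and a fresh install.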
|
||||
|
||||
## Resources
|
||||
|
||||
Links, books and standards: [sources/cicd/sources.md](sources/cicd/sources.md)
|
||||
|
||||
### Recommended Reading
|
||||
|
||||
| Book | Authors | ISBN | Key Contribution |
|
||||
|------|---------|------|-----------------|
|
||||
| The DevOps Handbook | Kim, Humble, Debois, Willis | 978-1942788003 | CALMS principles (Culture, Automation, Lean, Measurement, Sharing), flow map, deployment pipeline |
|
||||
| Continuous Delivery | Humble, Farley | 978-0321601912 | Deployment pipeline, commit stage, acceptance tests, capacity testing, zero-downtime release |
|
||||
| CI/CD Design Patterns | Bajpai, Schildmeijer, Piwosz, Mishra | 978-1-83588-965-7 | 30+ design patterns for CI/CD — pipeline patterns, GitOps, security, testing, deployment strategies |
|
||||
| DevOps Frameworks, Techniques, and Tools | Vijayakumaran, Kofler, Öggl, Springer | 978-1-4932-2670-2 | Framework for DevOps adoption, tool comparison (Jenkins vs GitLab vs GitHub Actions), techniques for monitoring and observability |
|
||||
|
||||
- **Quality gates** — automated checks before every promotion to the next environment
|
||||
- **Pipeline visibility** — dashboard with current status of all pipelines (GitHub, GitLab, ArgoCD)
|
||||
|
||||
## OpenStack CI/CD
|
||||
|
||||
OpenStack ecosystem uses its own CI/CD tools:
|
||||
|
||||
### Zuul
|
||||
|
||||
- CI/CD system developed by the OpenStack community (now standalone, used outside OpenStack)
|
||||
- **Gating** — changes are tested before merge (not after merge) — prevents breaking main branch
|
||||
- **Ansible-based** — jobs are Ansible playbooks
|
||||
- **Nodepool** — dynamic test VM allocation in the cloud (OpenStack, AWS)
|
||||
- **Pipeline** — check, gate, post, periodic, tag, release
|
||||
|
||||
### OpenStack Infra (OpenDev)
|
||||
|
||||
- Public CI infrastructure for OpenStack projects
|
||||
- Tools: Gerrit (code review), Zuul (CI), Nodepool (test nodes), Storyboard (issue tracking)
|
||||
- Base jobs: tempest (integration tests), grenade (upgrade tests), devstack-gate (gate tests)
|
||||
|
||||
### Integration with External Tools
|
||||
|
||||
- **Terraform** — OpenStack provider for provisioning (terraform-provider-openstack)
|
||||
- **Ansible** — openstack.cloud collection for managing OpenStack resources
|
||||
- **Packer** — build OpenStack images (openstack builder)
|
||||
- **Jenkins** — older CI, still used in some distributions
|
||||
|
||||
*Last revised: 2026-06-03*