Stanislav Hubacek 1ae35153b0 first commit
2026-04-07 22:42:42 +02:00


# Homelab Provisioning Portal — MVP Spec
## Goal
A small internal web application that lets me provision Proxmox LXC containers and selected VMs through a simple web form, then optionally integrate them into my existing homelab workflow:
- Proxmox provisioning
- initial package/profile setup
- Git-backed Caddy config generation
- optional DNS integration
- optional monitoring integration
The portal is not meant to be a generic public PaaS. It is an internal operator tool for a controlled homelab environment.
---
## Core principles
### 1. GUI for input, API-driven execution
The user experience should be a simple web form, but the backend should orchestrate everything through APIs, scripts, and structured jobs.
### 2. Profiles over chaos
Instead of exposing every package and configuration toggle directly, the app should use a small number of workload profiles.
### 3. Git as source of truth where possible
Reverse proxy configuration should not be edited directly. The portal should write service definitions into the Git repo and let the existing deployment pipeline handle Caddy.
### 4. Jobs, not blocking requests
Provisioning should run as background jobs with visible status, logs, and clear success/failure states.
### 5. Safe defaults
The portal should prefer prevalidated combinations of:
- node
- storage
- template
- network mode
- package sets
- exposure rules
---
## MVP scope
### In scope
- create LXC containers
- create from a selected profile
- set hostname, CPU, RAM, disk, node, storage, network mode
- optional static IP or DHCP
- optional package installation through profile provisioning
- optional generation of Caddy service YAML in Git repo
- job history and logs
### Out of scope for MVP
- full VM lifecycle
- Terraform integration
- approvals/workflows
- user accounts / RBAC beyond a single trusted operator
- advanced DNS API automation
- full secrets management UI
- rollback orchestration across all systems
---
## Profiles for MVP
### 1. Static Website
Purpose:
- simple websites / landing pages / personal sites
Provisioning:
- Debian LXC
- nginx or apache installed
- document root prepared
- optional domain
- optional Caddy service YAML generated
### 2. Generic Service
Purpose:
- small internal services
Provisioning:
- Debian LXC
- base utilities
- optional Docker
- optional node exporter
- no public exposure by default
### 3. Monitoring Node
Purpose:
- internal monitoring or utility workloads
Provisioning:
- Debian LXC
- node exporter
- curl / git / basic tools
- no public exposure by default
### 4. Web App
Purpose:
- internal or public HTTP application
Provisioning:
- Debian LXC
- web server or runtime preset
- optional Caddy service YAML
- optional health endpoint setting stored in service definition
---
## User flow
### Step 1 — Choose workload type
- LXC
- VM (disabled or marked as coming later in MVP)
### Step 2 — Choose profile
- Static Website
- Generic Service
- Monitoring Node
- Web App
### Step 3 — Basic settings
- hostname
- description
- target node
- storage
- CPU cores
- RAM (MB)
- disk size (GB)
- start after creation (yes/no)
### Step 4 — Network
- DHCP / Static IP
- static IP address (if selected)
- gateway (optional)
- bridge
- VLAN tag (optional)
### Step 5 — Service integration
- domain name (optional)
- expose via Caddy (yes/no)
- public or internal service
- enable monitoring (yes/no)
### Step 6 — Review
Show:
- chosen node
- chosen profile
- resources
- network settings
- generated service definition preview if relevant
- list of post-provision actions
### Step 7 — Create job
- portal stores job
- worker executes it
- UI shows progress and logs
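
The store-then-execute flow in Step 7 could look roughly like the sketch below. It uses only the standard library and is not the portal's actual design: the in-memory `jobs` dict stands in for the SQLite tables, and `submit_job`, `run_job`, and `worker` are hypothetical names.

```python
import queue
import threading

# Hypothetical in-memory stand-ins for the jobs table and its logs.
jobs: dict = {}
job_queue: queue.Queue = queue.Queue()


def submit_job(job_id: int, payload: dict) -> None:
    """Store the job, then hand its id to the worker; the request can return at once."""
    jobs[job_id] = {"payload": payload, "status": "pending", "logs": []}
    job_queue.put(job_id)


def run_job(job: dict) -> None:
    """Placeholder for the real provisioning steps."""
    job["logs"].append(f"would create {job['payload']['hostname']}")


def worker() -> None:
    """Process queued jobs one at a time until a None sentinel arrives."""
    while True:
        job_id = job_queue.get()
        if job_id is None:
            break
        job = jobs[job_id]
        job["status"] = "running"
        try:
            run_job(job)
            job["status"] = "succeeded"
        except Exception as exc:  # record the failure instead of killing the worker
            job["status"] = "failed"
            job["logs"].append(f"error: {exc}")


thread = threading.Thread(target=worker, daemon=True)
thread.start()
submit_job(1, {"hostname": "web01"})
job_queue.put(None)  # stop the worker once the queue is drained
thread.join()
```

A single worker thread keeps provisioning strictly sequential, which is usually enough for a one-operator tool and avoids two jobs competing for the same CTID or IP.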
---
## Proposed architecture
### Frontend
Simple server-rendered UI.
Recommended:
- FastAPI + Jinja templates
- minimal JS only where needed
Why:
- lightweight
- easy to deploy in LXC
- low maintenance
- fast iteration
### Backend
FastAPI application with:
- form handling
- validation
- job creation
- background execution
- job log viewing
### Persistence
SQLite for MVP.
Tables:
- jobs
- job_logs
- profiles
- created_resources (optional for tracking)
### Execution layer
The backend should call a dedicated internal execution layer, not shell out from route handlers directly.
Modules:
- proxmox.py
- gitops.py
- provision.py
- jobs.py
- models.py
---
## Backend modules
### proxmox.py
Responsibilities:
- create LXC
- optionally start LXC
- poll for completion
- retrieve created CTID / details
Possible implementation paths:
- Proxmox API directly
- wrapper around `pct` CLI if app runs on a trusted node
Recommendation:
- use Proxmox API for long-term cleanliness
- CLI wrapper acceptable for MVP if simpler in your environment
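
If the API path is chosen, the create call might be assembled as in the sketch below. It only builds the request and never sends it. The parameter names (`vmid`, `ostemplate`, `rootfs`, `net0`, ...) and the `PVEAPIToken` header follow the Proxmox VE API as I understand it, so check them against the API viewer for your version. The function names and every example value (host, node, token ID, template) are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def lxc_params(vmid: int, hostname: str, ostemplate: str, storage: str,
               cores: int, memory_mb: int, disk_gb: int, bridge: str,
               ip: str = "dhcp", gateway: Optional[str] = None,
               vlan: Optional[int] = None, start: bool = False) -> dict:
    """Translate validated form values into LXC create parameters."""
    net0 = f"name=eth0,bridge={bridge},ip={ip}"
    if gateway:
        net0 += f",gw={gateway}"
    if vlan is not None:
        net0 += f",tag={vlan}"
    return {
        "vmid": vmid,
        "hostname": hostname,
        "ostemplate": ostemplate,
        "cores": cores,
        "memory": memory_mb,
        "rootfs": f"{storage}:{disk_gb}",  # storage name and size in GB
        "net0": net0,
        "start": int(start),
    }


def build_create_request(base_url: str, node: str, token_id: str,
                         token_secret: str, params: dict) -> Request:
    """Build (but do not send) the POST that creates a container on `node`."""
    url = f"{base_url}/api2/json/nodes/{node}/lxc"
    headers = {"Authorization": f"PVEAPIToken={token_id}={token_secret}"}
    return Request(url, data=urlencode(params).encode(),
                   headers=headers, method="POST")


# Example with placeholder values only.
params = lxc_params(210, "web01", "local:vztmpl/TEMPLATE.tar.zst", "local-lvm",
                    cores=2, memory_mb=1024, disk_gb=8, bridge="vmbr0",
                    ip="192.168.50.210/24", gateway="192.168.50.1")
request = build_create_request("https://pve.example.internal:8006", "pve1",
                               "portal@pve!provision", "SECRET", params)
```

Keeping request construction separate from sending makes the module easy to unit-test without a live Proxmox node.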
### provision.py
Responsibilities:
- post-create setup inside container
- install profile packages
- prepare directories
- optionally write default nginx/apache config
Implementation options:
- SSH / `pct exec`
- Ansible later
Recommendation for MVP:
- use `pct exec` from a trusted node if the app runs close to Proxmox
- move to Ansible later if desired
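
A minimal sketch of the `pct exec` route, assuming the app runs on a Proxmox node. The package lists and the names `PROFILE_PACKAGES`, `pct_exec_argv`, and `install_commands` are placeholders, not final profile definitions. Commands are built as argument lists so that nothing passes through a shell.

```python
# Hypothetical whitelisted package sets, keyed by profile.
PROFILE_PACKAGES = {
    "static-website": ["nginx"],
    "generic-service": ["curl", "git"],
    "monitoring-node": ["prometheus-node-exporter", "curl", "git"],
}


def pct_exec_argv(ctid: int, command: list) -> list:
    """Wrap a command so that it runs inside container `ctid`."""
    return ["pct", "exec", str(ctid), "--", *command]


def install_commands(ctid: int, profile: str) -> list:
    """Return the argv lists that install a profile's packages.

    Unknown profiles raise KeyError, so free-form package names
    from the form can never reach apt.
    """
    packages = PROFILE_PACKAGES[profile]
    return [
        pct_exec_argv(ctid, ["apt-get", "update"]),
        pct_exec_argv(ctid, ["apt-get", "install", "-y", *packages]),
    ]
```

In a job runner, each list would be passed to `subprocess.run(argv, check=True)` and its output appended to the job log.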
### gitops.py
Responsibilities:
- write service YAML definitions into repo
- validate generated content
- git add / commit / push
Important:
- this module should only touch approved repo paths
- no arbitrary file writes from user input
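
One way to enforce the approved-path rule is sketched below. The `services` directory, the `<domain>.yaml` naming scheme, and the function names are assumptions; the real repo layout still needs to be confirmed.

```python
import re
from pathlib import Path

SERVICES_DIR = "services"  # hypothetical approved directory inside the repo

# Lowercase DNS name: dot-separated labels followed by an alphabetic TLD.
DOMAIN_RE = re.compile(r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}")


def service_file_path(repo_root: Path, domain: str) -> Path:
    """Return the only path the portal may write for `domain`."""
    if len(domain) > 253 or not DOMAIN_RE.fullmatch(domain):
        raise ValueError(f"invalid domain: {domain!r}")
    approved = (repo_root / SERVICES_DIR).resolve()
    target = (approved / f"{domain}.yaml").resolve()
    # Second line of defence: the resolved file must sit directly in the approved dir.
    if target.parent != approved:
        raise ValueError("target escapes the approved directory")
    return target


def git_commands(repo_root: Path, target: Path, domain: str) -> list:
    """Return the argv lists for add / commit / push of a single file."""
    rel = target.relative_to(repo_root.resolve()).as_posix()
    repo = str(repo_root)
    return [
        ["git", "-C", repo, "add", "--", rel],
        ["git", "-C", repo, "commit", "-m", f"Add service {domain}"],
        ["git", "-C", repo, "push"],
    ]
```

The file name is derived from the validated domain and never taken from a user-supplied path, so the form has no way to name an arbitrary file.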
### jobs.py
Responsibilities:
- create job records
- store state transitions
- append logs
- expose job status to UI
States:
- pending
- running
- succeeded
- failed
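
The four states can be guarded by a small transition table. The allowed transitions below are an assumption (pending to running, running to succeeded or failed); the spec itself only lists the states.

```python
from enum import Enum


class JobStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Assumed transitions; succeeded and failed are terminal.
ALLOWED_TRANSITIONS = {
    JobStatus.PENDING: {JobStatus.RUNNING},
    JobStatus.RUNNING: {JobStatus.SUCCEEDED, JobStatus.FAILED},
    JobStatus.SUCCEEDED: set(),
    JobStatus.FAILED: set(),
}


def transition(current: JobStatus, new: JobStatus) -> JobStatus:
    """Return `new` if the move is allowed, otherwise raise ValueError."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new
```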
---
## Data model (MVP)
### Job
- id
- created_at
- started_at
- finished_at
- type
- profile
- payload_json
- status
- summary
- result_json
- error_message
### JobLog
- id
- job_id
- timestamp
- level
- message
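
As a sketch, the Job and JobLog fields above map onto SQLite roughly as follows. Column types, defaults, and the helper names `open_db`, `create_job`, and `append_log` are assumptions made for illustration.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    started_at    TEXT,
    finished_at   TEXT,
    type          TEXT NOT NULL,
    profile       TEXT NOT NULL,
    payload_json  TEXT NOT NULL,
    status        TEXT NOT NULL DEFAULT 'pending'
                  CHECK (status IN ('pending', 'running', 'succeeded', 'failed')),
    summary       TEXT,
    result_json   TEXT,
    error_message TEXT
);
CREATE TABLE IF NOT EXISTS job_logs (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    job_id    INTEGER NOT NULL REFERENCES jobs(id),
    timestamp TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    level     TEXT NOT NULL,
    message   TEXT NOT NULL
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the tables if they are missing."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def create_job(conn: sqlite3.Connection, job_type: str, profile: str,
               payload: dict) -> int:
    """Insert a pending job and return its id."""
    cur = conn.execute(
        "INSERT INTO jobs (type, profile, payload_json) VALUES (?, ?, ?)",
        (job_type, profile, json.dumps(payload)),
    )
    conn.commit()
    return cur.lastrowid


def append_log(conn: sqlite3.Connection, job_id: int, level: str,
               message: str) -> None:
    """Append one log line to a job."""
    conn.execute(
        "INSERT INTO job_logs (job_id, level, message) VALUES (?, ?, ?)",
        (job_id, level, message),
    )
    conn.commit()
```

The `CHECK` constraint keeps the status column limited to the four states from `jobs.py`, even if a bug in the application tries to write something else.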
### ProfileDefinition
Can be hardcoded initially or stored in code/YAML.
Suggested profile fields:
- name
- description
- supported_kind
- default_packages
- allow_public_exposure
- allow_domain
- default_web_server
- post_create_actions
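
If profiles start out hardcoded, a frozen dataclass is one lightweight option. The field names come from the list above; the types, defaults, profile keys, and all example values are placeholders to be replaced by the final profile definitions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ProfileDefinition:
    name: str
    description: str
    supported_kind: str  # "lxc" for every MVP profile
    default_packages: Tuple[str, ...] = ()
    allow_public_exposure: bool = False
    allow_domain: bool = False
    default_web_server: Optional[str] = None
    post_create_actions: Tuple[str, ...] = ()


# Placeholder values for two of the four profiles.
PROFILES = {
    "static-website": ProfileDefinition(
        name="Static Website",
        description="Simple sites and landing pages",
        supported_kind="lxc",
        default_packages=("nginx",),
        allow_public_exposure=True,
        allow_domain=True,
        default_web_server="nginx",
        post_create_actions=("prepare_document_root",),
    ),
    "generic-service": ProfileDefinition(
        name="Generic Service",
        description="Small internal services",
        supported_kind="lxc",
        default_packages=("curl", "git"),
    ),
}
```

Because the defaults are the restrictive values, a new profile stays internal-only unless exposure is switched on explicitly.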
---
## Validation rules
### General
- hostname required, restricted format
- RAM / disk / CPU must be from allowed ranges
- node must be from approved node list
- storage must be from approved storage list
- profile must be from approved profile list
### Network
- if static IP selected, IP must be valid
- optional uniqueness check against known reserved IPs
### Domain/Caddy
- if expose via Caddy is enabled, domain required
- domain uniqueness check against existing service definitions
- backend IP generated from created container, not manually trusted from form
### Safety
- never accept arbitrary shell commands from UI
- never allow arbitrary package strings in MVP
- use whitelisted package sets by profile
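
A few of these rules are sketched below using only the standard library. The approved lists, the numeric ranges, the form keys, and the name `validate_request` are all hypothetical, and the form values are assumed to be parsed to the right types already.

```python
import ipaddress
import re

# Hypothetical approved values; the real lists come from the preparation checklist.
APPROVED_NODES = {"pve1"}
APPROVED_PROFILES = {"static-website", "generic-service", "monitoring-node", "web-app"}
CORES_RANGE = range(1, 5)            # 1 to 4 cores
RAM_MB_RANGE = range(256, 8193)      # 256 MB to 8 GB

# One DNS label: lowercase letters, digits, inner hyphens, at most 63 characters.
HOSTNAME_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def validate_request(form: dict) -> list:
    """Return a list of error messages; an empty list means the form is valid."""
    errors = []
    if not HOSTNAME_RE.fullmatch(form.get("hostname", "")):
        errors.append("hostname: required, lowercase letters, digits and hyphens only")
    if form.get("node") not in APPROVED_NODES:
        errors.append("node: not in approved list")
    if form.get("profile") not in APPROVED_PROFILES:
        errors.append("profile: not in approved list")
    if form.get("cores") not in CORES_RANGE:
        errors.append("cores: outside allowed range")
    if form.get("ram_mb") not in RAM_MB_RANGE:
        errors.append("ram_mb: outside allowed range")
    if form.get("network") == "static":
        try:
            ipaddress.ip_interface(form.get("ip", ""))
        except ValueError:
            errors.append("ip: not a valid address")
    if form.get("expose_via_caddy") and not form.get("domain"):
        errors.append("domain: required when Caddy exposure is enabled")
    return errors
```

Returning all errors at once, rather than raising on the first one, lets the review step show every problem in a single pass.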
---
## Caddy integration design
When a workload is marked as web-exposed, the portal should generate a service YAML in the existing homelab repo.
Example output:
```yaml
name: app.example.com
type: proxy
domain: app.example.com
headers: true
auth: false
backend: 192.168.50.210:80
```
For the Static Website profile, where a web server (nginx or apache) runs inside the container, use proxy mode as well, so that services stay containerized.
The portal should then:
1. write YAML
2. git add / commit / push
3. rely on existing Git webhook deployment to update Caddy
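
A minimal rendering sketch for step 1, assuming the six keys in the example above are the whole schema (the exact schema used by the current generator still has to be confirmed). The function name `render_service_yaml` is hypothetical, and the domain is assumed to have been validated already, since plain string formatting does no YAML escaping.

```python
import ipaddress


def render_service_yaml(domain: str, backend_ip: str, backend_port: int = 80,
                        headers: bool = True, auth: bool = False) -> str:
    """Render a proxy-type service definition in the example's key order."""
    ipaddress.ip_address(backend_ip)  # raises ValueError on a malformed address
    if not 1 <= backend_port <= 65535:
        raise ValueError("backend_port out of range")

    def yaml_bool(value: bool) -> str:
        return "true" if value else "false"

    lines = [
        f"name: {domain}",
        "type: proxy",
        f"domain: {domain}",
        f"headers: {yaml_bool(headers)}",
        f"auth: {yaml_bool(auth)}",
        f"backend: {backend_ip}:{backend_port}",
    ]
    return "\n".join(lines) + "\n"
```

The backend address is passed in by the job after the container exists, which matches the rule that it is never taken from the form.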
---
## Monitoring integration
For MVP, monitoring integration can be lightweight.
Options:
- install node exporter in container for selected profiles
- add a label/tag to the resource for later tracking
- optionally note in job result whether monitoring was enabled
Prometheus target automation can come later unless you already have a standard discovery mechanism.
---
## UI pages for MVP
### 1. Dashboard
- recent jobs
- quick create button
- summary counts
### 2. New workload form
- multi-step or grouped single-page form
### 3. Job detail page
- job status
- timestamps
- execution logs
- generated outputs
### 4. Profiles page (optional)
- read-only overview of supported profiles
---
## Recommended tech stack
### App runtime
- Python 3
- FastAPI
- Jinja2
- SQLite
- Uvicorn
### Optional frontend enhancement
- HTMX for better UX without building a full SPA
### Deployment target
- dedicated LXC container
- internal-only exposure via Caddy or VPN
---
## Security model
### Access
- internal-only
- preferably behind VPN or internal-only reverse proxy
### Secrets
Keep these outside Git:
- Proxmox API token
- Gitea token or SSH deploy key
- optional DNS API tokens
Use an environment file or systemd environment variables.
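
Loading those values at startup could look like the sketch below. The variable names (`PORTAL_PROXMOX_TOKEN_ID` and so on) and the `Secrets` and `load_secrets` names are invented for illustration. With systemd, they would typically be supplied through an `EnvironmentFile=` entry in the unit.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional

# Hypothetical variable names, mapped to dataclass fields.
REQUIRED_VARS = {
    "proxmox_token_id": "PORTAL_PROXMOX_TOKEN_ID",
    "proxmox_token_secret": "PORTAL_PROXMOX_TOKEN_SECRET",
    "gitea_token": "PORTAL_GITEA_TOKEN",
}


@dataclass(frozen=True)
class Secrets:
    proxmox_token_id: str
    # repr=False keeps secret values out of logs and tracebacks.
    proxmox_token_secret: str = field(repr=False)
    gitea_token: str = field(repr=False)


def load_secrets(env: Optional[Mapping[str, str]] = None) -> Secrets:
    """Read required secrets from the environment and fail fast if any are missing."""
    env = os.environ if env is None else env
    missing = sorted(var for var in REQUIRED_VARS.values() if not env.get(var))
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Secrets(**{name: env[var] for name, var in REQUIRED_VARS.items()})
```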
### Permissions
Use a restricted Proxmox API token rather than root password where possible.
---
## What needs to be prepared first
### A. Proxmox preparation
- choose whether MVP uses API or local CLI
- prepare at least one approved LXC template
- define approved nodes
- define approved storage targets
- define approved network bridge(s)
- define CTID strategy (manual range or automatic lookup)
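
For the automatic-lookup variant of the CTID strategy, a sketch might look like this. The 200 to 299 range and the name `next_free_ctid` are arbitrary placeholders; the set of used IDs would come from the Proxmox API or from `pct list`.

```python
def next_free_ctid(used: set, start: int = 200, end: int = 299) -> int:
    """Return the lowest unused CTID in the portal's reserved range (inclusive)."""
    for ctid in range(start, end + 1):
        if ctid not in used:
            return ctid
    raise RuntimeError(f"no free CTID between {start} and {end}")
```

Reserving a dedicated range keeps portal-created containers easy to tell apart from manually created ones.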
### B. Git / Caddy preparation
- confirm repo path and branch for service definitions
- confirm exact YAML schema used by current generator
- define naming convention for generated files
- define whether portal commits directly to main or a staging branch
### C. Host/container provisioning preparation
- decide which base OS template to use for LXC
- define profile package lists
- define default web server choice (nginx or apache)
- decide whether provisioning happens through `pct exec` or SSH
### D. App hosting preparation
- create dedicated LXC for the portal
- internal DNS name, e.g. `portal.hubacek.cloud`
- internal-only reverse proxy rule
- prepare environment file for secrets
---
## Concrete preparation checklist
### 1. Decide execution method
Choose one:
- Proxmox API only
- Proxmox API for create + `pct exec` for post-provision
- local CLI wrapper (`pct`, `qm`) if app runs on a Proxmox node
Recommended MVP:
- Proxmox API for create
- `pct exec` for post-provision if app runs on a trusted node
### 2. Define profile presets
Prepare final values for:
- Static Website
- Generic Service
- Monitoring Node
- Web App
For each profile define:
- template
- packages
- default ports
- whether Caddy integration is allowed
- whether monitoring is installed
### 3. Define allowed infrastructure values
Prepare lists for:
- nodes
- storage
- bridges
- VLAN choices if any
- RAM options
- disk options
- CPU options
### 4. Prepare secrets
Need:
- Proxmox API token
- Gitea credential/token or deploy key
### 5. Decide repo write behavior
Options:
- portal commits directly to main
- portal writes a branch and you merge manually
Recommended MVP:
- direct commit to main if only you use it
---
## Proposed MVP implementation order
### Phase 1
- app skeleton
- SQLite job model
- create LXC with Generic Service profile
- show job logs
### Phase 2
- add Static Website and Web App profiles
- add package installation / post-provision steps
- add Caddy Git integration
### Phase 3
- add Monitoring Node profile
- add better job output and validation
- add optional DNS integration
### Phase 4
- add VM workflow
- add cloud-init templates
- add role-based access or approvals if ever needed
---
## Immediate next actions
1. Finalize execution strategy
2. Finalize profile definitions
3. Finalize allowed infrastructure options
4. Create dedicated portal container
5. Add Proxmox and Gitea credentials outside Git
6. Build MVP backend skeleton
---
## Recommended execution strategy for your homelab
For your current setup, the most practical MVP is:
- portal app in dedicated LXC
- internal-only exposure
- Proxmox API for provisioning
- Git repo write for Caddy service definitions
- existing webhook pipeline deploys proxy config
- post-provisioning through trusted remote execution, added later if it is not ready on day one of the MVP
This keeps the system aligned with what you already have working instead of replacing it.