Architecture
Overview
Gubernator is an orchestrator designed to offer the simplicity of Docker Swarm with the flexibility of Nomad. It uses a centralized manager pattern with resilient edge workers and a built-in local executor that allows a single node to run a complete cluster.
Component Map
graph TD
subgraph "Manager Node (gbnt serve)"
API["REST API :4000\n(Bearer Token Auth)"]
WEB["Web UI :4001\n(Basic Auth)"]
TEL["Telemetry :4002\n(Swagger / Health / Metrics)"]
SCHED["Scheduler\n(Constraint Matching)"]
DB["SQLite\n(Stacks / Services / Tasks / Nodes)"]
EXEC["Local Executor\n(Built-in, polls every 5s)"]
AQ["Aqueducts\n(CoreDNS + Caddy writer)"]
end
subgraph "Worker Node (gbnt legion join)"
WORKER["Remote Executor\n(polls /v1/node/tasks)"]
end
subgraph "Docker Host"
DOCKER["Docker Engine\n(/var/run/docker.sock)"]
COREDNS["CoreDNS\n(:5353)"]
CADDY["Caddy\n(:80 / :443)"]
end
CLI["gbnt CLI"] -->|"POST /v1/stack/deploy"| API
API --> SCHED
SCHED --> DB
EXEC -->|"polls pending tasks"| DB
EXEC -->|"docker run -p -e -v"| DOCKER
WORKER -->|"GET /v1/node/tasks"| API
WORKER -->|"docker run -p -e -v"| DOCKER
EXEC --> AQ
AQ -->|"writes gubernator.hosts"| COREDNS
AQ -->|"writes Caddyfile"| CADDY
WEB -->|"reads state"| DB
TEL -->|"exposes metrics"| API
Port Architecture
| Port | Service | Authentication | Purpose |
|---|---|---|---|
| :4000 | REST API | Authorization: Bearer <GBNT_API_TOKEN> | All CLI and management operations |
| :4001 | Web UI | HTTP Basic Auth | Dashboard, compose editor, lifecycle management |
| :4002 | Telemetry | Public (internal use) | Prometheus metrics, Swagger, /health |
Security model: Ports 4000 and 4001 require authentication. Port 4002 is intentionally unauthenticated so that internal monitoring can scrape it, and it should be firewalled from external traffic.
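As an illustration of the port 4000 scheme, the sketch below builds an authenticated request to the deploy endpoint. The helper name `newDeployRequest` is hypothetical, and sending the compose YAML as the raw request body is an assumption; this document does not specify the body format.

```go
package main

import (
	"net/http"
	"strings"
)

// newDeployRequest builds a POST to /v1/stack/deploy carrying the
// bearer token expected on port 4000. Hypothetical helper: the raw-YAML
// body is an assumption, not a documented request format.
func newDeployRequest(baseURL, token, composeYAML string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/stack/deploy", strings.NewReader(composeYAML))
	if err != nil {
		return nil, err
	}
	// Matches the documented scheme: Authorization: Bearer <GBNT_API_TOKEN>
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}
```

The returned request can be sent with `http.DefaultClient.Do(req)`; a missing or wrong token would presumably be rejected by the API middleware.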
Core Components
1. Manager (The Senate)
- Exposes the secured REST API (:4000) for CLI and SDK communication
- Hosts the Flutter Web Dashboard (:4001) for visual cluster management
- Maintains global cluster state using SQLite (via GORM)
- Runs the Scheduler to match service constraints against node labels
- Runs the Local Executor to run containers directly without needing a separate worker
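To make the Scheduler's constraint matching concrete, here is a minimal sketch. The `key==value` constraint syntax and both function names are assumptions for illustration; the document only states that constraints are a JSON array matched against node labels.

```go
package main

import (
	"sort"
	"strings"
)

// matches reports whether a node's labels satisfy every constraint.
// The "key==value" syntax is assumed, not taken from Gubernator.
func matches(labels map[string]string, constraints []string) bool {
	for _, c := range constraints {
		key, want, ok := strings.Cut(c, "==")
		if !ok {
			return false // malformed constraint: treat as unsatisfiable
		}
		got, present := labels[strings.TrimSpace(key)]
		if !present || got != strings.TrimSpace(want) {
			return false
		}
	}
	return true
}

// eligibleNodes returns the sorted IDs of nodes that satisfy all constraints.
func eligibleNodes(nodes map[string]map[string]string, constraints []string) []string {
	var ids []string
	for id, labels := range nodes {
		if matches(labels, constraints) {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids) // map iteration order is random; sort for stable output
	return ids
}
```

With no constraints every node is eligible, which is consistent with a service that can run anywhere.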
2. Worker (The Centurions)
- The same gbnt binary in worker mode (legion join)
- Polls the Manager API every 5s for pending tasks assigned to its node ID
- Communicates with its local Docker Engine to pull images and start containers
- Reports container IP and status back to the Manager
- Sends heartbeats every 10s to keep its status=active in the DB
3. Local Executor (New in v1.3.27)
The built-in executor runs inside the Manager process, enabling true single-node deployments:
Manager starts → goroutine launched
every 5s:
SELECT tasks WHERE node_id='node-local-manager' AND status='pending'
→ docker pull <image>
→ docker run -d -p <ports> -e <env> -v <volumes> <image>
→ UPDATE task SET status='running', container_ip=..., container_name=...
→ regenerate CoreDNS hosts + Caddyfile
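The docker run step above can be illustrated by a function that assembles the argument list. This is a sketch under assumptions: `dockerRunArgs` is a hypothetical name, and the `gbnt-<taskID>` container name follows the pattern shown in the Deployment Workflow section.

```go
package main

// dockerRunArgs assembles the arguments for "docker run" from a task's
// service fields. Hypothetical helper for illustration.
func dockerRunArgs(taskID, image string, ports, env, volumes, command []string) []string {
	args := []string{"run", "-d", "--name", "gbnt-" + taskID}
	for _, p := range ports {
		args = append(args, "-p", p) // host:container port mapping
	}
	for _, e := range env {
		args = append(args, "-e", e) // KEY=value
	}
	for _, v := range volumes {
		args = append(args, "-v", v) // source:target
	}
	args = append(args, image)
	// Anything after the image overrides the image's default command.
	return append(args, command...)
}
```

The result can be passed to `exec.Command("docker", args...)` from `os/exec`. Building a slice rather than a shell string avoids quoting problems in values such as `daemon off;`.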
4. Ingress & DNS (The Aqueducts)
As containers start (via either executor), Gubernator writes two files:
| File | Used by | Format |
|---|---|---|
| gubernator.hosts | CoreDNS hosts plugin | <IP> <service>.<stack>.gbnt |
| Caddyfile | Caddy | Reverse-proxy rules from ingress.host labels |
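The gubernator.hosts format in the table can be rendered as follows. The `record` type and `renderHosts` function are hypothetical; only the `<IP> <service>.<stack>.gbnt` line format comes from this document.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// record is a hypothetical view of one running task for DNS purposes.
type record struct {
	IP, Service, Stack string
}

// renderHosts returns hosts-file content with one
// "<IP> <service>.<stack>.gbnt" entry per line.
func renderHosts(records []record) string {
	lines := make([]string, 0, len(records))
	for _, r := range records {
		lines = append(lines, fmt.Sprintf("%s %s.%s.gbnt", r.IP, r.Service, r.Stack))
	}
	sort.Strings(lines) // deterministic output for identical input
	return strings.Join(lines, "\n") + "\n"
}
```

Sorting keeps the file byte-identical when the set of containers has not changed, which makes it easy to skip unnecessary rewrites.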
5. Observability (The Watchtowers)
Port 4002 exposes:
- GET /metrics — Prometheus-format metrics (nodes, tasks, memory)
- GET /health — JSON health check ({"status":"healthy"})
- GET /swagger/index.html — Interactive Swagger API explorer
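A monitoring probe could decode the /health body like this. A minimal sketch: `healthReply` and `isHealthy` are hypothetical names, and only the `{"status":"healthy"}` shape is taken from this document.

```go
package main

import "encoding/json"

// healthReply mirrors the documented /health body: {"status":"healthy"}.
type healthReply struct {
	Status string `json:"status"`
}

// isHealthy decodes a /health response body and reports whether the
// status field equals "healthy".
func isHealthy(body []byte) (bool, error) {
	var r healthReply
	if err := json.Unmarshal(body, &r); err != nil {
		return false, err // not valid JSON
	}
	return r.Status == "healthy", nil
}
```

The body would typically come from `http.Get` against port 4002 followed by `io.ReadAll(resp.Body)`.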
6. SRE Monitor Stack (gbnt monitor init)
A built-in, one-command SRE monitoring deployment that creates a dedicated Docker network (gbnt-monitor-net) and launches:
| Container | Image | Port | Purpose |
|---|---|---|---|
| gbnt-monitor-cadvisor | gcr.io/cadvisor/cadvisor | :8081 | Container resource metrics |
| gbnt-monitor-prometheus | prom/prometheus | :9090 | Metrics collection & scraping |
| gbnt-monitor-grafana | grafana/grafana | :3000 | Dashboards (Prometheus + Loki datasources pre-configured) |
| gbnt-monitor-loki | grafana/loki | :3100 | Log aggregation |
| gbnt-monitor-promtail | grafana/promtail | — | Log shipping (Docker + system logs → Loki) |
Data Flow:
cAdvisor ──metrics──→ Prometheus ──→ Grafana
Gubernator :4002 ──metrics──→ Prometheus ──→ Grafana
Promtail ──logs──→ Loki ──→ Grafana
Config files are auto-generated in ~/.gbnt/monitor/ and can be customized.
Data Model
Nodes
├── ID, IP, Role (manager|worker), Status (active|down|drain)
└── Labels (JSON) — e.g. {"gbnt.node.gpu": "nvidia"}
Stacks
├── ID, Name
└── RawComposeFile (full YAML stored for edit/redeploy)
Services
├── ID, StackID, Name, Image
├── DesiredReplicas
├── Ports, Env, Volumes, Command ← passed directly to docker run
└── Constraints (JSON array)
Tasks
├── ID, ServiceID, NodeID
├── Status (pending|starting|running|dead)
├── ContainerName ← used for docker stop / docker rm
└── ContainerIP
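The model above could map to Go types roughly as follows. This is a sketch: the field types (string IDs, string slices) are assumptions, GORM tags are omitted, and `pendingFor` is a hypothetical in-memory equivalent of the executor's pending-task query.

```go
package main

// TaskStatus holds one of the task states listed in the data model.
type TaskStatus string

const (
	TaskPending  TaskStatus = "pending"
	TaskStarting TaskStatus = "starting"
	TaskRunning  TaskStatus = "running"
	TaskDead     TaskStatus = "dead"
)

type Node struct {
	ID, IP string
	Role   string            // "manager" or "worker"
	Status string            // "active", "down", or "drain"
	Labels map[string]string // e.g. {"gbnt.node.gpu": "nvidia"}
}

type Stack struct {
	ID, Name       string
	RawComposeFile string // full YAML kept for edit/redeploy
}

type Service struct {
	ID, StackID, Name, Image     string
	DesiredReplicas              int
	Ports, Env, Volumes, Command []string
	Constraints                  []string
}

type Task struct {
	ID, ServiceID, NodeID      string
	Status                     TaskStatus
	ContainerName, ContainerIP string
}

// pendingFor returns the pending tasks assigned to nodeID.
func pendingFor(tasks []Task, nodeID string) []Task {
	var out []Task
	for _, t := range tasks {
		if t.NodeID == nodeID && t.Status == TaskPending {
			out = append(out, t)
		}
	}
	return out
}
```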
Deployment Workflow
1. CLI: gbnt stack deploy -c compose.yml mystack
↓
2. API: POST /v1/stack/deploy (with Bearer token)
↓
3. Parse YAML → extract services (image, ports, env, volumes, replicas, constraints)
↓
4. Store: Stack → Services → scheduleService()
↓
5. Scheduler: match constraints against active nodes → create Tasks (status=pending)
↓
6. Executor (local or remote worker):
a. docker pull <image>
b. docker run -d --name gbnt-<taskID> -p ... -e ... -v ... <image> [command]
c. POST /v1/node/tasks/<taskID>/status {status:running, container_ip:..., container_name:...}
↓
7. Aqueducts: write gubernator.hosts + Caddyfile
↓
8. Done: container running, DNS resolvable, Ingress active
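Step 6c can be sketched as a small payload type plus a path helper. The JSON field names come from the workflow above; the Go names `statusReport`, `statusPath`, and `encodeReport` are hypothetical.

```go
package main

import "encoding/json"

// statusReport is the body an executor posts after starting a container.
type statusReport struct {
	Status        string `json:"status"`
	ContainerIP   string `json:"container_ip"`
	ContainerName string `json:"container_name"`
}

// statusPath returns the status endpoint for one task.
func statusPath(taskID string) string {
	return "/v1/node/tasks/" + taskID + "/status"
}

// encodeReport serializes the report as JSON for the request body.
func encodeReport(r statusReport) ([]byte, error) {
	return json.Marshal(r)
}
```

A remote worker would POST this body to `statusPath(taskID)` on port 4000 with its bearer token. The local executor, which has direct access to the database, would not need the HTTP round trip.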
Security Model
┌─────────────────────────────────────────────────────────┐
│ Port 4000 — API │
│ Middleware: Authorization: Bearer <GBNT_API_TOKEN> │
│ Default token: "admin" (change in production!) │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ Port 4001 — Web UI │
│ Middleware: HTTP Basic Auth │
│ Credentials: GBNT_WEB_USER / GBNT_WEB_PASSWORD │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ Port 4002 — Telemetry │
│ No authentication (intended for internal scraping) │
│ Firewall recommended for production │
└─────────────────────────────────────────────────────────┘
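On the server side, the port 4000 check could look like the sketch below. This is not Gubernator's actual middleware: `validBearer` is a hypothetical helper, and the constant-time comparison is a suggested hardening rather than documented behavior.

```go
package main

import (
	"crypto/subtle"
	"strings"
)

// validBearer reports whether an Authorization header value carries the
// expected token. Hypothetical helper for illustration.
func validBearer(header, token string) bool {
	const prefix = "Bearer "
	if token == "" || !strings.HasPrefix(header, prefix) {
		return false // reject empty configuration and non-bearer schemes
	}
	got := strings.TrimPrefix(header, prefix)
	// Constant-time comparison avoids leaking how many leading
	// characters of the token were correct.
	return subtle.ConstantTimeCompare([]byte(got), []byte(token)) == 1
}
```

A handler wrapper would call `validBearer(r.Header.Get("Authorization"), token)` and respond with 401 when it returns false. Rejecting an empty token prevents an unset GBNT_API_TOKEN from silently disabling authentication.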