Fleet Management with Thresh Hub
Version: 1.6.0+ (Hub connectivity), 1.7.0+ (stack orchestration, mid-tier keys)
Time: 30 minutes
Difficulty: Advanced
Overview
Thresh Hub is a centralized management plane that gives you fleet-wide visibility across all your thresh nodes. An agent on each node holds a SignalR WebSocket connection (to the mid-tier, which relays to the Hub), streams live metrics and environment status, and accepts remote commands, including stack orchestration.
Prerequisites
- thresh v1.7.0 or later installed on all nodes you want to manage
- A machine to run Thresh Hub (Windows or Linux, .NET 10 runtime or self-contained binary; building from source with dotnet run, as in the steps below, needs the .NET 10 SDK)
- PostgreSQL database for Hub persistence
- Network connectivity between nodes and Hub (default port 7200)
Architecture
Three-Tier Model
| Component | Role | Key Prefix |
|---|---|---|
| Thresh Hub | Web UI, API, PostgreSQL, fleet dashboard | — |
| Mid-Tier | Aggregates agent connections, routes commands | thresh_mid_* |
| Agent | Runs on each node, streams metrics, executes commands | thresh_live_* |
Key Types (v1.7.0)
Thresh Hub uses two distinct API key types for security isolation:
| Key | Format | Purpose |
|---|---|---|
| Agent key | thresh_live_<account>_<secret> | Node agent → Mid-tier connection |
| Mid-tier key | thresh_mid_<account>_<secret> | Mid-tier → Hub API calls |
Agent keys cannot call mid-tier management APIs, and vice versa. This prevents a compromised node from escalating to fleet management operations.
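As a quick sanity check before pasting a key into a config file, the prefix rule above can be sketched as a small shell function. This is only an illustration based on the key formats in the table; the function name thresh_key_type is hypothetical and not part of the thresh CLI, and the Hub performs its own validation.

```shell
# Classify a Thresh Hub API key by its prefix (illustrative helper, not a thresh command).
# Prints "agent", "mid-tier", or "unknown"; returns non-zero for unknown formats.
thresh_key_type() {
  case $1 in
    thresh_live_?*_?*) echo agent ;;      # thresh_live_<account>_<secret>
    thresh_mid_?*_?*)  echo mid-tier ;;   # thresh_mid_<account>_<secret>
    *)                 echo unknown; return 1 ;;
  esac
}
```

For example, thresh_key_type "$KEY" can guard a provisioning script so that a mid-tier key is never written into an agent's configuration.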
Setting Up Thresh Hub
1. Deploy the Hub
# Clone and build
git clone https://github.com/dealer426/thresh-hub.git
cd thresh-hub/src/ThreshHubV2
# Configure database connection
# Edit appsettings.json:
# "ConnectionStrings": { "DefaultConnection": "Host=localhost;Database=threshhub;Username=hubuser;Password=..." }
# Run
dotnet run
The Hub starts on port 7200 by default. Access the dashboard at https://your-hub:7200.
2. Generate API Keys
Log into the Hub web UI and navigate to Settings → API Keys:
- Agent key (for nodes to connect): thresh_live_<account>_<secret>
- Mid-tier key (for the mid-tier service): thresh_mid_<account>_<secret>
3. Deploy the Mid-Tier
# Clone and build
git clone https://github.com/dealer426/thresh-midtier.git
cd thresh-midtier/src/ThreshMidTier
# Configure (appsettings.json):
# "Hub": { "Url": "https://your-hub:7200", "ApiKey": "thresh_mid_<account>_<secret>" }
# Run
dotnet run
The mid-tier connects to the Hub and begins accepting agent connections.
Connecting Nodes
1. Configure the Agent
On each node you want to manage:
# Set the mid-tier URL (agents connect to mid-tier, not directly to Hub)
thresh agent config set midtier-url https://your-midtier:5000
# Set the agent API key
thresh agent config set api-key thresh_live_<account>_<secret>
# For self-signed certs in dev
thresh agent config set tls-verify false
2. Start the Agent
thresh agent start
3. Verify Connection
thresh agent status
Agent Status
────────────────────────────────────────
Agent ID: 5f6d5891-76d2-466f-a33f-7b87acb17653
Status: Connected ✓
Hub URL: https://192.168.4.85:7200
Transport: SignalR
Uptime: 2h 14m
Last Report: 28 seconds ago
The node should also appear in the Hub web dashboard within seconds.
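For scripted checks, the status output can be tested for the connected state. The sketch below assumes the output keeps the "Status: Connected" line shown in the sample above; the helper name agent_is_connected is hypothetical, and the match may need adjusting if the format differs in your version.

```shell
# Succeed if the text on stdin contains a "Status: Connected" line.
# Assumes the layout of the sample `thresh agent status` output shown above.
agent_is_connected() {
  grep -Eq '^[[:space:]]*Status:[[:space:]]*Connected'
}
```

Typical use would be: thresh agent status | agent_is_connected || echo "agent is not connected" >&2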
What the Hub Shows
For each connected node, the Hub dashboard displays:
| Metric | Description |
|---|---|
| Status | Online / Offline with last-seen timestamp |
| CPU | Real-time CPU utilization |
| Memory | Used / Total RAM |
| Storage | Disk usage |
| Containers | Running container count |
| Environments | List of thresh-managed environments |
| Agent Version | thresh version and platform |
| Node Name | Custom name or hostname |
Metrics stream at a configurable interval (default: 30 seconds).
Remote Deployment & Management
With agents connected, you can manage fleet nodes and deploy to them remotely using CLI commands:
Node Management
# Authenticate with your Hub
thresh auth login --hub https://your-hub:7200
# List all connected nodes
thresh node list
# View details for a specific node
thresh node info thresh-node-1
# Check real-time metrics
thresh node metrics thresh-node-1
# Deploy a blueprint to a remote node
thresh node up thresh-node-1 python-dev --name ml-training
# List available blueprints on a node
thresh node blueprints thresh-node-1
Cluster Management
# Create a cluster to group related nodes
thresh cluster create staging --description "Staging environment"
# Add nodes to the cluster
thresh cluster add-node staging thresh-node-1
thresh cluster add-node staging thresh-node-2
# View cluster details
thresh cluster info staging
# Remove a node from the cluster
thresh cluster remove-node staging thresh-node-2
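When a cluster has many nodes, the add-node step can be generated instead of typed out. The following is a minimal sketch that only prints the commands so they can be reviewed first; cluster_add_commands is a hypothetical helper name, and the only thresh command it emits is the thresh cluster add-node form shown above.

```shell
# Print one `thresh cluster add-node` command per node (dry run; nothing is executed).
# Usage: cluster_add_commands <cluster> <node>...
cluster_add_commands() {
  cluster=$1
  shift
  for node in "$@"; do
    printf 'thresh cluster add-node %s %s\n' "$cluster" "$node"
  done
}
```

After reviewing the output, it can be piped to a shell: cluster_add_commands staging thresh-node-1 thresh-node-2 | sh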
Stack Deployment (Hub-Managed)
For multi-service stacks with dependency ordering, deploy through the Hub UI or API:
# Get an auth token for API calls
TOKEN=$(thresh auth token)
# Deploy a stack to a target node
curl -X POST https://your-hub:7200/api/stacks/deploy \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d @webapp.json
# List deployed stacks
curl -H "Authorization: Bearer $TOKEN" \
https://your-hub:7200/api/stacks
See the Deploying Stacks tutorial for full details on stack definitions and deployment.
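The two curl calls above share the same base URL and headers, so a small wrapper can cut the repetition. This is a sketch under stated assumptions: hub_api, HUB_URL, and DRY_RUN are hypothetical names introduced here, TOKEN is the variable set by thresh auth token above, and only the two endpoints already shown are known from this tutorial.

```shell
# hub_api METHOD PATH [extra curl args...]
# Expects HUB_URL (e.g. https://your-hub:7200) and TOKEN in the environment.
# With DRY_RUN=1 the curl command is printed (unquoted, for display only) instead of run.
hub_api() {
  method=$1
  path=$2
  shift 2
  # Rebuild the argument list as the full curl command; a trailing "/" on HUB_URL is dropped.
  set -- curl -sS --fail -X "$method" \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    "$@" "${HUB_URL%/}$path"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

With this in place, the deployment becomes hub_api POST /api/stacks/deploy -d @webapp.json and the listing becomes hub_api GET /api/stacks.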
Transport & Resilience
SignalR WebSocket
Agents maintain a persistent WebSocket connection for low-latency bi-directional communication. If the connection drops:
- Agent detects the disconnect
- Waits the configured ReconnectDelay (default: 5s)
- Reconnects automatically
- Resumes metrics streaming
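The wait-then-reconnect cycle above is a fixed-delay retry. The agent handles this internally; the sketch below only illustrates the pattern in shell and can be reused for health checks in your own scripts. The name retry_with_delay and the attempt cap are assumptions made for this example, not documented agent behavior.

```shell
# retry_with_delay DELAY_SECONDS MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, sleeping DELAY_SECONDS between attempts.
# Returns 0 on the first success, 1 once MAX_ATTEMPTS attempts have failed.
retry_with_delay() {
  delay=$1
  max=$2
  shift 2
  attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

For instance, retry_with_delay 5 10 curl -k -fsS https://your-midtier:5000/health waits for the mid-tier health endpoint with the same 5-second spacing as the default ReconnectDelay.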
REST Fallback
For networks that block WebSocket connections, agents can use REST API polling instead. The auto transport falls back to REST on its own; to force REST polling:
thresh agent config set transport rest
| Transport | Protocol | Latency | Best For |
|---|---|---|---|
| auto | SignalR → REST | Lowest | Default |
| signalr | WebSocket only | Lowest | Trusted networks |
| rest | HTTP polling | Higher | Restricted networks |
High Availability
Configure a failover Hub for mission-critical setups:
thresh agent config set fallback-url https://backup-hub:7200
thresh agent config set auto-failover true
TLS Configuration
Production
Use valid TLS certificates on the Hub. Agents verify certificates by default.
Development
For self-signed certs on private networks:
thresh agent config set tls-verify false
Hub Behind Reverse Proxy
If the Hub runs behind nginx or Traefik, disable internal HTTPS: set Kestrel:DisableHttps=true in appsettings.json and terminate TLS at the reverse proxy.
Only disable agent TLS verification (tls-verify false) on trusted, private networks. Always use valid certificates in production.
Stale Agent Cleanup
The Hub automatically prunes agents that haven't reported within a configurable window (default: 24 hours). This keeps the dashboard clean when nodes go offline permanently.
The mid-tier also batches metrics from multiple agents for efficient delivery to the Hub, reducing database write load.
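The pruning rule is a simple age comparison against the window. The sketch below shows the arithmetic with Unix timestamps; is_stale is a hypothetical helper, and whether the Hub treats an agent exactly at the boundary as stale is an assumption here (this version requires the age to exceed the window).

```shell
# is_stale LAST_SEEN_EPOCH NOW_EPOCH [WINDOW_HOURS]
# Succeeds when the agent's last report is older than the window (default: 24 hours).
is_stale() {
  last_seen=$1
  now=$2
  window_hours=${3:-24}
  [ $((now - last_seen)) -gt $((window_hours * 3600)) ]
}
```

A call such as is_stale "$LAST_SEEN" "$(date +%s)" can help predict which nodes are about to drop off the dashboard.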
Troubleshooting
Agent Won't Connect
- Check network: Can the node reach the mid-tier URL? Run curl -k https://your-midtier:5000/health
- Check API key: Is the key a thresh_live_* key (not thresh_mid_*)?
- Check TLS: If using self-signed certs, is tls-verify set to false?
Agent Shows "Disconnected"
- Check thresh agent status on the node
- Restart the agent: thresh agent stop && thresh agent start
- Check Hub logs for authentication failures
Mid-Tier Auth Errors (403)
The mid-tier requires a thresh_mid_* key. If you see 403 errors:
- Verify the key type in appsettings.json starts with thresh_mid_
- Regenerate the key in the Hub UI if needed
Next Steps
- Agent CLI Reference — Agent command documentation
- Auth CLI Reference — Hub authentication commands
- Node CLI Reference — Remote node management commands
- Cluster CLI Reference — Organize nodes into clusters
- Stacks Tutorial — Multi-service deployment through Hub
- Blog: Fleet Management Patterns — Real-world fleet architectures