A roadmap for running multiple AI models, open-source agents, and automated workflows on a single VPS — using tools that already exist.
In "From Hierarchy to Intelligence" (April 2026), Jack Dorsey and Roelof Botha described how Block is restructuring around AI. The essay got 5 million views in 48 hours. The framework is useful. The implementation details are scarce.
```
┌─────────────────────────────────────────────────────┐
│ Layer 4 · Surfaces       Where you interact         │
├─────────────────────────────────────────────────────┤
│ Layer 3 · Intelligence   Agents that decide         │
├─────────────────────────────────────────────────────┤
│ Layer 2 · World Model    Memory that persists       │
├─────────────────────────────────────────────────────┤
│ Layer 1 · Capabilities   The models themselves      │
└─────────────────────────────────────────────────────┘
```
We've been running a version of this for months. Each layer maps to open-source tools you can deploy today.
Agents for brand, growth, engineering, and operations, each with its own personality, skills, and scope.
Models that aren't locked to one API: each task is routed to the best model for the job, open-source or proprietary.
Workflows that run without human intervention: cron jobs, scheduled reports, content pipelines, and monitoring.
A single Linux server is enough. We use a VPS with 4 vCPUs and 16GB RAM. That runs the agent, the databases, the cron scheduler, and the message gateway — all at once.
Your VPS handles orchestration, memory, routing, and scheduling. The models run through API providers — no GPU required on your end.
Security matters from day one. The moment your VPS is online, automated bots start scanning it. Lock it down before you install anything else.
Full guide: VPS security for beginners — covers SSH hardening, firewalls, fail2ban, Tailscale, Docker isolation, backups, and supply chain basics.
```
# What your VPS needs
Ubuntu 22.04+ / Debian 12+
├── Docker + Docker Compose
├── Node.js 20+
├── Python 3.11+
├── Git
├── UFW (firewall)
├── fail2ban
└── Tailscale (optional, recommended)

# What it runs
Hermes Agent
├── Message gateway (Telegram, Discord)
├── Cron scheduler
├── Memory store (SQLite)
├── Session history
├── Skills engine
└── Model router → API providers
```
Model diversity is the core idea. Different tasks need different strengths. A coding task wants a model that writes clean code. A summarisation task wants speed. A creative brief wants nuance.
Instead of paying one provider for everything, you route tasks to the right model.
```
# How model routing works

User message
       │
       ▼
┌──────────────┐
│  Main agent  │ ← strong model
│ (reasoning)  │   handles the conversation
└──────┬───────┘
       │
       ├── delegate_task() → subagent
       │     └── cheaper model, isolated context
       │
       ├── cronjob → background task
       │     └── fast model, scheduled runs
       │
       └── code execution
             └── code-specialist model
```
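The routing in the diagram can be expressed as a plain lookup table. The sketch below assumes four task kinds that mirror the branches above; the model names are placeholders rather than real identifiers, and Hermes Agent's own routing configuration may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str  # hypothetical model identifier
    why: str


# Hypothetical routing table mirroring the diagram; swap in the model
# names your providers actually expose.
ROUTES: dict[str, Route] = {
    "conversation": Route("strong-reasoning-model", "main agent handles the conversation"),
    "subagent": Route("cheap-general-model", "delegated work in an isolated context"),
    "cron": Route("fast-small-model", "scheduled background runs"),
    "code": Route("code-specialist-model", "code execution"),
}


def route(task_kind: str) -> Route:
    """Pick a model for a task kind; unknown kinds fall back to the main agent."""
    return ROUTES.get(task_kind, ROUTES["conversation"])
```

Keeping the mapping in data means swapping a model is a one-line change, and an unrecognised task kind falls back to the strong main-agent model instead of failing.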
Open-source models like Llama, Mistral, Qwen, and DeepSeek handle most background tasks — and they're free or near-free to run. You route expensive API calls to proprietary models only when the task demands it.
OpenCode-Go is a Go-based CLI that provides an OpenAI-compatible API for local and remote open-source models. It acts as a bridge: your agent talks to it like any other model provider, and it routes to whichever open-source model you've configured.
```
# The model stack

┌─────────────────────────────────┐
│          Hermes Agent           │
│    (routes tasks to models)     │
└────────┬──────────┬─────────────┘
         │          │
    ┌────▼────┐  ┌──▼──────────┐
    │ Propri- │  │ OpenCode-Go │
    │ etary   │  │             │
    │ APIs    │  │             │
    ├─────────┤  ├─────────────┤
    │ Claude  │  │ Llama 4     │
    │ GPT-4o  │  │ Mistral     │
    │ Gemini  │  │ Qwen 2.5    │
    │ mimo    │  │ DeepSeek    │
    └─────────┘  └─────────────┘
```

```yaml
# config.yaml snippet
providers:
  opencode-go:
    type: openai-compatible
    base_url: http://localhost:8080/v1
    models:
      - llama-4-scout
      - qwen-2.5-coder-32b
```
Dorsey calls it the "world model" — the data structure that lets AI understand your specific business. He's right about the concept. The implementation is simpler than he makes it sound.
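In our stack, memory works at three levels: session context (the current conversation window), persistent memory in SQLite (who you are, notes, and reusable procedures), and long-term memory (semantic search over past sessions, decisions made, and patterns learned).
SQLite's FTS5 full-text search extension handles the indexing. The agent writes memories proactively, so you don't have to tell it to remember.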
```
# Memory architecture

┌────────────────────────────────────┐
│          Session Context           │
│   (current conversation window)    │
└─────────────────┬──────────────────┘
                  │
       ┌──────────▼──────────┐
       │  Persistent Memory  │
       │   (SQLite, FTS5)    │
       │                     │
       │ user/   who you are │
       │ memory/ notes       │
       │ skills/ procedures  │
       └──────────┬──────────┘
                  │
       ┌──────────▼──────────┐
       │  Long-term Memory   │
       │  (semantic search)  │
       │                     │
       │ Past sessions       │
       │ Decisions made      │
       │ Patterns learned    │
       └─────────────────────┘
```
One agent doing everything is a chatbot with extra steps. The better approach is role-based agents with clear boundaries.
Each agent gets its own personality, skills, and scope.
When agents need to collaborate, they use a structured handoff — not an open-ended conversation. Bounded exchanges. Escalation paths. No bot-to-bot loops.
```
# Agent roles

Zuri (Brand & Growth)
├── Content strategy
├── Copywriting & editing
├── Campaign planning
├── Competitor intelligence
└── Social media operations

Froggy (Engineering)
├── Code implementation
├── Infrastructure & DevOps
├── Debugging & testing
├── Deployment & CI/CD
└── Technical discovery

Cron Agents (Background)
├── Daily briefings
├── Monitoring & alerts
├── Content pipelines
├── Data collection
└── Scheduled reports

Delegated Agents (On-demand)
├── Research tasks
├── Code review
├── Web scraping
└── Document processing
```
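The structured handoff described above, with bounded exchanges and an escalation path instead of open-ended bot-to-bot chat, can be sketched as a small class. The `respond` callable is a hypothetical stand-in for a model call that returns a reply and a done flag, and the default limit of four turns is an arbitrary illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

# respond(agent_name, message) -> (reply, done); a stand-in for a model call.
Responder = Callable[[str, str], tuple[str, bool]]


@dataclass
class Handoff:
    """Hypothetical bounded handoff between two role agents."""
    sender: str
    receiver: str
    goal: str
    max_turns: int = 4
    transcript: list[str] = field(default_factory=list)

    def run(self, respond: Responder) -> str:
        message = self.goal
        for turn in range(self.max_turns):
            # Receiver answers on even turns, sender follows up on odd turns.
            speaker = self.receiver if turn % 2 == 0 else self.sender
            reply, done = respond(speaker, message)
            self.transcript.append(f"{speaker}: {reply}")
            if done:
                return reply
            message = reply
        # Out of turns: hand the goal to a human instead of letting bots keep talking.
        return f"ESCALATE: {self.goal!r} unresolved after {self.max_turns} turns"
```

The hard turn cap is what rules out loops: however the agents behave, the exchange ends after `max_turns` and any unresolved goal goes to a person.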
The main agent makes decisions. When a task comes in, it decides: do I handle this myself, delegate it to a subagent, schedule it for later, or ask the human?
Delegation is the key pattern. The main agent spawns a subagent with a clear goal, the right tools, and enough context. The subagent works in isolation and returns a summary. The main agent reviews and delivers.
This is how you scale without adding headcount. One conversation can trigger three parallel workstreams, each running on a different model, each returning results the main agent synthesises.
The agent decides when to delegate. You don't have to micromanage the routing — that's what the personality and policy files are for.
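The four-way decision described above (handle it, delegate it, schedule it, or ask the human) can be written as an ordered set of rules. The rules, field names, and threshold here are hypothetical placeholders for whatever your personality and policy files specify.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    HANDLE = "handle it myself"
    DELEGATE = "delegate to a subagent"
    SCHEDULE = "schedule for later"
    ASK_HUMAN = "ask the human"


@dataclass(frozen=True)
class Task:
    description: str
    recurring: bool = False     # e.g. "every morning"
    irreversible: bool = False  # e.g. deleting data, publishing, paying
    ambiguous: bool = False     # goal unclear from context
    est_steps: int = 1          # rough size estimate


def decide(task: Task, delegate_above: int = 3) -> Action:
    """Hypothetical policy; real rules would come from the policy files."""
    if task.irreversible or task.ambiguous:
        return Action.ASK_HUMAN
    if task.recurring:
        return Action.SCHEDULE
    if task.est_steps > delegate_above:
        return Action.DELEGATE
    return Action.HANDLE
```

Checking the ask-the-human conditions first is deliberate: a risky or unclear task should never be silently delegated or scheduled.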
```
# How delegation flows

User: "Research competitors,
       draft a blog post, and
       check our site uptime"
              │
              ▼
         Main Agent
   (reads context, plans)
              │
              ├──→ Subagent A
              │    [web research]
              │    model: fast/cheap
              │    returns: summary
              │
              ├──→ Subagent B
              │    [content writing]
              │    model: creative
              │    returns: draft .md
              │
              └──→ Subagent C
                   [terminal commands]
                   model: code-focused
                   returns: uptime report
              │
              ▼
         Main Agent
  (reviews, synthesises,
    delivers to user)
```
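The fan-out in the diagram maps naturally onto `asyncio.gather`: each subagent runs concurrently in its own coroutine, and results come back in the order the tasks were issued. In this sketch the `worker` callable is a hypothetical stand-in for spawning a real subagent on a given model, and a timeout is turned into a summary line so one stuck subagent delays the reply by at most the timeout rather than indefinitely.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass(frozen=True)
class SubTask:
    goal: str
    model: str  # hypothetical model tier, e.g. "fast/cheap", "creative"


# A worker stands in for "spawn a subagent with this goal on this model".
Worker = Callable[[SubTask], Awaitable[str]]


async def run_subagent(task: SubTask, worker: Worker, timeout: float) -> str:
    """Run one subagent in isolation; a timeout becomes a summary, not a crash."""
    try:
        return await asyncio.wait_for(worker(task), timeout)
    except asyncio.TimeoutError:
        return f"{task.goal}: timed out after {timeout}s"


async def delegate(tasks: list[SubTask], worker: Worker,
                   timeout: float = 60.0) -> list[str]:
    """Fan out subagents in parallel; results come back in task order."""
    return await asyncio.gather(*(run_subagent(t, worker, timeout) for t in tasks))


def synthesise(summaries: list[str]) -> str:
    """Main agent's last step: combine subagent summaries into one reply."""
    return "\n".join(f"- {s}" for s in summaries)


async def fake_worker(task: SubTask) -> str:
    """Placeholder for a real subagent call."""
    await asyncio.sleep(0.01)
    return f"{task.goal}: done on {task.model} model"


if __name__ == "__main__":
    plan = [
        SubTask("web research", "fast/cheap"),
        SubTask("content writing", "creative"),
        SubTask("terminal commands", "code-focused"),
    ]
    print(synthesise(asyncio.run(delegate(plan, fake_worker))))
```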
Our agents live on Telegram and Discord — the platforms we already use every day. No custom dashboard to check. No new app to learn.
The gateway is a message router. It connects to your platforms, receives messages, routes them to the right agent, and delivers the response back. Same message, same thread, same conversation flow you're used to.
The messaging platforms are the frontend. You build the agent, not the UI.
```
# Message flow

┌──────────┐     ┌──────────────┐
│ Telegram │────▶│              │
└──────────┘     │              │
┌──────────┐     │   Gateway    │     ┌──────────┐
│ Discord  │────▶│   (router)   │────▶│  Agent   │
└──────────┘     │              │     │  Engine  │
┌──────────┐     │              │     └────┬─────┘
│   CLI    │────▶│              │          │
└──────────┘     └──────────────┘          ▼
                                    ┌──────────────┐
                                    │   Response   │
                                    │  delivered   │
                                    │   back to    │
                                    │  same chat   │
                                    └──────────────┘

# What the gateway handles
• Platform auth (OAuth, bot tokens)
• Message parsing & routing
• Media attachments (images, files)
• Thread/topic context
• Rate limiting
• Multi-agent dispatch
```
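A gateway's core loop is small: check a per-chat rate limit, pick an agent, and return the reply to the same chat. The sketch below assumes platform adapters (not shown) that turn Telegram, Discord, or CLI input into a `Message` and post the returned text back to `chat_id`. The `@name` mention convention for multi-agent dispatch is an illustrative assumption, not necessarily how Hermes Agent routes.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Message:
    platform: str  # "telegram", "discord", "cli", ...
    chat_id: str   # replies go back to this chat or thread
    text: str


Agent = Callable[[Message], str]


class Gateway:
    """Hypothetical router: rate-limit per chat, then dispatch to an agent."""

    def __init__(self, agents: dict[str, Agent], default: str,
                 max_msgs: int = 5, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.agents = agents
        self.default = default
        self.max_msgs = max_msgs
        self.window_s = window_s
        self.clock = clock
        self._seen: dict[tuple[str, str], deque] = defaultdict(deque)

    def _allow(self, key: tuple[str, str]) -> bool:
        """Sliding-window limit: at most max_msgs per window_s for each chat."""
        now = self.clock()
        recent = self._seen[key]
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()
        if len(recent) >= self.max_msgs:
            return False
        recent.append(now)
        return True

    def handle(self, msg: Message) -> str:
        """Return the reply the platform adapter should post to msg.chat_id."""
        if not self._allow((msg.platform, msg.chat_id)):
            return "Rate limited, please try again shortly."
        # Multi-agent dispatch: a leading "@name" picks the agent, e.g. "@froggy ...".
        head, _, rest = msg.text.partition(" ")
        name = head[1:].lower() if head.startswith("@") else ""
        if name in self.agents:
            return self.agents[name](Message(msg.platform, msg.chat_id, rest))
        return self.agents[self.default](msg)
```

The injectable `clock` makes the sliding-window limiter easy to test without sleeping, and keying the limit on `(platform, chat_id)` means a busy Discord channel never throttles a Telegram chat.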
```
┌─────────────────────────────────────────────────────────────────────┐
│ YOUR VPS                                                            │
│                                                                     │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │ Layer 4  Telegram · Discord · CLI · Web                       │  │
│  └──────────────────────────────┬────────────────────────────────┘  │
│                                 │                                   │
│  ┌──────────────────────────────▼────────────────────────────────┐  │
│  │ Layer 3  Hermes Agent Engine                                  │  │
│  │          ├── Zuri (brand & growth)                            │  │
│  │          ├── Froggy (engineering)                             │  │
│  │          ├── Cron scheduler                                   │  │
│  │          └── Delegation engine                                │  │
│  └──────────────────────────────┬────────────────────────────────┘  │
│                                 │                                   │
│  ┌──────────────────────────────▼────────────────────────────────┐  │
│  │ Layer 2  SQLite + FTS5 · Session store · Memory index         │  │
│  └──────────────────────────────┬────────────────────────────────┘  │
│                                 │                                   │
│  ┌──────────────────────────────▼────────────────────────────────┐  │
│  │ Layer 1  Model Router                                         │  │
│  │          ├── Proprietary APIs (Claude, GPT, Gemini)           │  │
│  │          └── OpenCode-Go (Llama, Mistral, Qwen)               │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                                                                     │
│  Docker · UFW · fail2ban · Tailscale · Backups                      │
└─────────────────────────────────────────────────────────────────────┘
```
```bash
# Quick start commands

# 1. Secure your server (allow SSH before enabling the firewall,
#    or you will lock yourself out)
sudo ufw default deny incoming
sudo ufw allow 22/tcp
sudo ufw enable
sudo apt install fail2ban

# 2. Install Docker
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker "$USER"
# Log out and back in for the docker group change to take effect

# 3. Clone Hermes
git clone https://github.com/nousresearch/hermes-agent
cd hermes-agent

# 4. Configure
cp config.example.yaml config.yaml
# Add your API keys and platform tokens

# 5. Launch
docker compose up -d

# 6. Set up OpenCode-Go
# for open-source model access (go install requires a Go toolchain)
go install github.com/opencode-ai/opencode@latest
opencode serve --port 8080
```
| Component | What you use | Monthly cost |
|---|---|---|
| VPS | 4 vCPU / 16GB RAM | $20–40 |
| Hermes Agent | Open-source, self-hosted | $0 |
| OpenCode-Go | Open-source models via API | $0 |
| Proprietary models | Claude / GPT / Gemini APIs | $10–100 |
| Telegram / Discord | Bot APIs (free tier) | $0 |
| Domain (optional) | Custom URL for web access | $1 |
Total: $30–140/month for a full AI-native operation. Compare that to SaaS platforms charging per seat, per workflow, per integration. The models are the only recurring cost that scales with usage — and open-source models keep that cost near zero for background tasks.
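The headline range is simply the table's rows summed, with the optional domain left out. A quick check in Python, using the figures copied from the table:

```python
# Monthly cost ranges in USD, copied from the table: (low, high, optional)
COSTS: dict[str, tuple[int, int, bool]] = {
    "VPS": (20, 40, False),
    "Hermes Agent": (0, 0, False),
    "OpenCode-Go": (0, 0, False),
    "Proprietary models": (10, 100, False),
    "Telegram / Discord": (0, 0, False),
    "Domain": (1, 1, True),
}


def monthly_range(include_optional: bool = False) -> tuple[int, int]:
    """Sum the low and high ends of the line items, optionally including extras."""
    rows = [(low, high) for low, high, optional in COSTS.values()
            if include_optional or not optional]
    return sum(low for low, _ in rows), sum(high for _, high in rows)


print(monthly_range())      # (30, 140)
print(monthly_range(True))  # (31, 141)
```

The proprietary-model line accounts for $90 of the $110 spread, which is why routing background work to open-source models matters more than the choice of VPS.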
```
# The stack at a glance
Hermes Agent      → orchestration
OpenCode-Go       → open-source models
SQLite + FTS5     → memory & search
Docker            → isolation
UFW + fail2ban    → security
Tailscale         → private networking
Telegram/Discord  → your interface
Cron              → automation
Skills            → reusable procedures

# Total cost of entry
Time:    one weekend
Money:   ~$30/month
Lock-in: zero
```
Multiple models. Open-source tools. One VPS. Agents with clear roles. Memory that persists. Automation that runs while you sleep.