Open-source AI infrastructure

How to build an AI-native stack
without paying SaaS prices

A roadmap for running multiple AI models, open-source agents, and automated workflows on a single VPS — using tools that already exist.

June 2026 · Inspired by @ericosiu's implementation of Dorsey's "world intelligence" framework

The framework

Jack Dorsey says AI-native companies
need four layers

In "From Hierarchy to Intelligence" (April 2026), Dorsey and Roelof Botha described how Block is restructuring around AI. The essay got 5 million views in 48 hours. The framework is useful. The implementation details are scarce.

┌─────────────────────────────────────────────────────┐
│  Layer 4 · Surfaces        Where you interact      │
├─────────────────────────────────────────────────────┤
│  Layer 3 · Intelligence    Agents that decide       │
├─────────────────────────────────────────────────────┤
│  Layer 2 · World Model    Memory that persists     │
├─────────────────────────────────────────────────────┤
│  Layer 1 · Capabilities  The models themselves     │
└─────────────────────────────────────────────────────┘

We've been running a version of this for months. Each layer maps to open-source tools you can deploy today.

What we built

One VPS. Multiple models.
Zero SaaS subscriptions for AI.

4

Agent roles

Brand, growth, engineering, and operations — each with its own personality, skills, and scope.

3+

Model providers

Not locked to one API. Routes tasks to the best model for the job — open-source and proprietary.

50+

Daily automations

Cron jobs, scheduled reports, content pipelines, monitoring — running without human intervention.

Foundation

Start with a VPS you control

A single Linux server is enough. We use a VPS with 4 vCPUs and 16GB RAM. That runs the agent, the databases, the cron scheduler, and the message gateway — all at once.

Your VPS handles orchestration, memory, routing, and scheduling. The models run through API providers — no GPU required on your end.

Security matters from day one. The moment your VPS is online, automated bots start scanning it. Lock it down before you install anything else.

Full guide: VPS security for beginners — covers SSH hardening, firewalls, fail2ban, Tailscale, Docker isolation, backups, and supply chain basics.

# What your VPS needs

Ubuntu 22.04+ / Debian 12+
├── Docker + Docker Compose
├── Node.js 20+
├── Python 3.11+
├── Git
├── UFW (firewall)
├── fail2ban
└── Tailscale (optional, recommended)

# What it runs

Hermes Agent
├── Message gateway (Telegram, Discord)
├── Cron scheduler
├── Memory store (SQLite)
├── Session history
├── Skills engine
└── Model router → API providers

Layer 1 · Capabilities

Use multiple models.
Route tasks to the right one.

Model diversity is the core idea. Different tasks need different strengths. A coding task wants a model that writes clean code. A summarisation task wants speed. A creative brief wants nuance.

Instead of paying one provider for everything, you route tasks to the right model.

  • Main conversation — a strong general model for reasoning and planning
  • Subagents — faster, cheaper models for background work
  • Code generation — models trained specifically for code
  • Creative writing — models with better voice and tone control
# How model routing works

User message
    │
    ▼
┌──────────────┐
│  Main agent  │  ← strong model
│  (reasoning) │     handles the conversation
└──────┬───────┘
       │
       ├── delegate_task() → subagent
       │   └── cheaper model, isolated context
       │
       ├── cronjob → background task
       │   └── fast model, scheduled runs
       │
       └── code execution
           └── code-specialist model

Layer 1 · Open-source bridge

OpenCode-Go gives you access
to open-source models

Open-source models like Llama, Mistral, Qwen, and DeepSeek handle most background tasks — and they're free or near-free to run. You route expensive API calls to proprietary models only when the task demands it.

OpenCode-Go is a Go-based CLI that provides an OpenAI-compatible API for local and remote open-source models. It acts as a bridge: your agent talks to it like any other model provider, and it routes to whichever open-source model you've configured.

  • Run models locally on your VPS (smaller models) or connect to hosted endpoints
  • Drop-in replacement for OpenAI's API format
  • Works as a Hermes model provider out of the box
  • Use it for subagents, cron jobs, and background tasks
# The model stack

┌─────────────────────────────────┐
│         Hermes Agent            │
│   (routes tasks to models)      │
└────────┬──────────┬─────────────┘
         │          │
    ┌────▼────┐  ┌──▼──────────┐
    │ Propri- │  │  OpenCode-  │
    │ etary   │  │  Go         │
    │ APIs    │  │             │
    ├─────────┤  ├─────────────┤
    │ Claude  │  │ Llama 4     │
    │ GPT-4o  │  │ Mistral     │
    │ Gemini  │  │ Qwen 2.5    │
    │ mimo    │  │ DeepSeek    │
    └─────────┘  └─────────────┘

# config.yaml snippet

providers:
  opencode-go:
    type: openai-compatible
    base_url: http://localhost:8080/v1
    models:
      - llama-4-scout
      - qwen-2.5-coder-32b

Layer 2 · World model

Memory is what makes
the agent useful past turn one

Dorsey calls it the "world model" — the data structure that lets AI understand your specific business. He's right about the concept. The implementation is simpler than he makes it sound.

In our stack, memory works at three levels:

  • Session memory — the conversation you're having right now. Dies when the session ends.
  • Persistent memory — facts the agent has learned about you, your preferences, your environment. Survives across sessions.
  • Long-term memory — semantic search across all past sessions. The agent can recall what you discussed three weeks ago.

SQLite and FTS5 handle the indexing. The agent writes memories proactively — you don't have to tell it to remember.

# Memory architecture

┌────────────────────────────────────┐
│          Session Context           │
│   (current conversation window)    │
└──────────────┬─────────────────────┘
               │
    ┌──────────▼──────────┐
    │  Persistent Memory  │
    │  (SQLite, FTS5)     │
    │                     │
    │  user/  — who you   │
    │  memory/ — notes    │
    │  skills/ — procedures│
    └──────────┬──────────┘
               │
    ┌──────────▼──────────┐
    │  Long-term Memory   │
    │  (semantic search)  │
    │                     │
    │  Past sessions      │
    │  Decisions made     │
    │  Patterns learned   │
    └─────────────────────┘

Layer 3 · Intelligence

Agents with jobs,
not agents doing everything

One agent doing everything is a chatbot with extra steps. The better approach is role-based agents with clear boundaries.

Each agent gets:

  • A personality file that defines how it thinks and talks
  • A policy file that defines how it operates
  • Its own skills — loaded on demand, not all at once
  • Its own memory scope
  • Clear boundaries on what it owns and what it doesn't

When agents need to collaborate, they use a structured handoff — not an open-ended conversation. Bounded exchanges. Escalation paths. No bot-to-bot loops.

# Agent roles

Zuri (Brand & Growth)
├── Content strategy
├── Copywriting & editing
├── Campaign planning
├── Competitor intelligence
└── Social media operations

Froggy (Engineering)
├── Code implementation
├── Infrastructure & DevOps
├── Debugging & testing
├── Deployment & CI/CD
└── Technical discovery

Cron Agents (Background)
├── Daily briefings
├── Monitoring & alerts
├── Content pipelines
├── Data collection
└── Scheduled reports

Delegated Agents (On-demand)
├── Research tasks
├── Code review
├── Web scraping
└── Document processing

Layer 3 · Orchestration

The agent decides what to delegate.
Then it delegates.

The main agent makes decisions. When a task comes in, it decides: do I handle this myself, delegate it to a subagent, schedule it for later, or ask the human?

Delegation is the key pattern. The main agent spawns a subagent with a clear goal, the right tools, and enough context. The subagent works in isolation and returns a summary. The main agent reviews and delivers.

This is how you scale without adding headcount. One conversation can trigger three parallel workstreams, each running on a different model, each returning results the main agent synthesises.

The agent decides when to delegate. You don't have to micromanage the routing — that's what the personality and policy files are for.

# How delegation flows

User: "Research competitors,
       draft a blog post, and
       check our site uptime"

       │
       ▼
   Main Agent
   (reads context, plans)
       │
       ├──→ Subagent A
       │    [web research]
       │    model: fast/cheap
       │    returns: summary
       │
       ├──→ Subagent B
       │    [content writing]
       │    model: creative
       │    returns: draft .md
       │
       └──→ Subagent C
            [terminal commands]
            model: code-focused
            returns: uptime report

       │
       ▼
   Main Agent
   (reviews, synthesises,
    delivers to user)

Layer 4 · Surfaces

Talk to your agents
where you already talk

Our agents live on Telegram and Discord — the platforms we already use every day. No custom dashboard to check. No new app to learn.

The gateway is a message router. It connects to your platforms, receives messages, routes them to the right agent, and delivers the response back. Same message, same thread, same conversation flow you're used to.

  • Telegram — personal chats, group channels, topics
  • Discord — servers, threads, channels
  • CLI — direct terminal access for power use
  • Web — browser-based access when needed

The messaging platforms are the frontend. You build the agent, not the UI.

# Message flow

┌──────────┐     ┌──────────────┐
│ Telegram │────▶│              │
└──────────┘     │              │
┌──────────┐     │   Gateway    │     ┌──────────┐
│ Discord  │────▶│   (router)   │────▶│  Agent   │
└──────────┘     │              │     │  Engine  │
┌──────────┐     │              │     └──────────┘
│   CLI    │────▶│              │          │
└──────────┘     └──────────────┘          │
                                           ▼
                                    ┌──────────────┐
                                    │   Response   │
                                    │   delivered  │
                                    │   back to    │
                                    │   same chat  │
                                    └──────────────┘

# What the gateway handles

• Platform auth (OAuth, bot tokens)
• Message parsing & routing
• Media attachments (images, files)
• Thread/topic context
• Rate limiting
• Multi-agent dispatch

The full picture

Everything on one VPS

┌─────────────────────────────────────────────────────────────────────┐
│                         YOUR VPS                                   │
│                                                                     │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │  Layer 4  Telegram · Discord · CLI · Web                    │   │
│  └──────────────────────────┬──────────────────────────────────┘   │
│                              │                                      │
│  ┌──────────────────────────▼──────────────────────────────────┐   │
│  │  Layer 3  Hermes Agent Engine                               │   │
│  │           ├── Zuri (brand & growth)                          │   │
│  │           ├── Froggy (engineering)                            │   │
│  │           ├── Cron scheduler                                  │   │
│  │           └── Delegation engine                               │   │
│  └──────────────────────────┬──────────────────────────────────┘   │
│                              │                                      │
│  ┌──────────────────────────▼──────────────────────────────────┐   │
│  │  Layer 2  SQLite + FTS5 · Session store · Memory index      │   │
│  └──────────────────────────┬──────────────────────────────────┘   │
│                              │                                      │
│  ┌──────────────────────────▼──────────────────────────────────┐   │
│  │  Layer 1  Model Router                                      │   │
│  │           ├── Proprietary APIs  (Claude, GPT, Gemini)        │   │
│  │           └── OpenCode-Go        (Llama, Mistral, Qwen)      │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                                                                     │
│  Docker · UFW · fail2ban · Tailscale · Backups                     │
└─────────────────────────────────────────────────────────────────────┘

Getting started

Build it in a weekend

  1. Provision a VPS — 4 vCPUs, 16GB RAM minimum. Ubuntu 22.04. Any provider works: Hetzner, DigitalOcean, Linode.
  2. Lock it down — SSH keys, UFW firewall, fail2ban, auto-updates. Follow this guide.
  3. Install Docker — The agent and its dependencies run in containers. Docker Compose handles the wiring.
  4. Set up Hermes Agent — Clone the repo, configure your first profile, connect a messaging platform.
  5. Connect model providers — Add API keys for proprietary models. Set up OpenCode-Go for open-source models.
  6. Write your first agent — Personality file, policy file, skills. Start with one role. Expand when it's working.
  7. Add automation — Cron jobs for recurring tasks. Scheduled reports. Content pipelines.
# Quick start commands

# 1. Secure your server
sudo ufw default deny incoming
sudo ufw allow 22/tcp
sudo ufw enable
sudo apt install fail2ban

# 2. Install Docker
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER

# 3. Clone Hermes
git clone https://github.com/nousresearch/hermes-agent
cd hermes-agent

# 4. Configure
cp config.example.yaml config.yaml
# Add your API keys and platform tokens

# 5. Launch
docker compose up -d

# 6. Set up OpenCode-Go
# for open-source model access
go install github.com/opencode-ai/opencode@latest
opencode serve --port 8080

Cost

What this actually costs to run

Component What you use Monthly cost
VPS 4 vCPU / 16GB RAM $20–40
Hermes Agent Open-source, self-hosted $0
OpenCode-Go Open-source models via API $0
Proprietary models Claude / GPT / Gemini APIs $10–100
Telegram / Discord Bot APIs (free tier) $0
Domain (optional) Custom URL for web access $1

Total: $30–140/month for a full AI-native operation. Compare that to SaaS platforms charging per seat, per workflow, per integration. The models are the only recurring cost that scales with usage — and open-source models keep that cost near zero for background tasks.

Start here

The tools are already built.
You just need to wire them together.

# The stack at a glance

Hermes Agent     → orchestration
OpenCode-Go      → open-source models
SQLite + FTS5    → memory & search
Docker           → isolation
UFW + fail2ban   → security
Tailscale        → private networking
Telegram/Discord → your interface
Cron             → automation
Skills           → reusable procedures

# Total cost of entry

Time:  one weekend
Money: ~$30/month
Lock-in: zero

TL;DR

You don't need Block's budget
to run Block's playbook.

Multiple models. Open-source tools. One VPS. Agents with clear roles. Memory that persists. Automation that runs while you sleep.

Built with · Hermes Agent · OpenCode-Go · Claude · Mistral · Llama · Telegram · Discord
Deployed on · A single VPS running Ubuntu