One box, twenty services

Most side projects die because nothing is watching them. I built the thing that watches.

tjserv is a single Ubuntu server (RTX 3090, 24 GB VRAM) running my entire AI ecosystem: an evaluation lab, a knowledge archive, segmentation and transcription services, LLM decision tools, and the voice/assistant stack — around 20 services at any time, with a ROCK 5B+ single-board computer running 13 more at the edge.

The interesting part isn't any one service. It's that the fleet stays alive.

The fabric

Project tracker as control plane. Every project declares itself with one TOML file (.tracker/manifest.toml); a hub service (FastAPI + PostgreSQL) discovers it, polls its health endpoint, maps its dependencies into a live force-directed graph, and renders a single dashboard that answers "what's the state of everything?" in one glance. Projects never call the tracker — they write to their own directory, the tracker reads. Zero coupling, so abandoned projects can't leave stale integrations behind.
Markdown is the work queue. Tasks live in version-controlled markdown, rendered live at /work, with a pre-computed daily LLM recommendation answering "what should I work on right now?" — I removed the kanban after markdown won in practice.
Operational safety nets, layered. Nightly restic backups to Backblaze B2 (system tier) plus a weekly local tier; nightly source and state archives with a cross-disk tarball mirror; a daily git auto-snapshot that commits dirty trees across all 33 mirrored repos; and a deadman switch that runs every 30 minutes and pings Discord if any daemon stops stamping, rate-limited so it alerts once, not forever.
Health is a state machine, not a boolean. Services report green; stale health files degrade to amber after six hours; hard dependency edges (and only those) cascade failures red. Watchdogs that exit non-zero by design are first-class, not false alarms.
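
The health rules above are small enough to sketch. This is a minimal illustration, not the tracker's actual code: the Service fields, the own_state and resolve helpers, and the example fleet are all hypothetical names. It also assumes two things the text does not spell out: that only a red dependency cascades (amber does not), and that a hard dependency missing from the fleet is ignored.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from enum import Enum

# From the text: stale health files degrade to amber after six hours.
STALE_AFTER = timedelta(hours=6)


class Health(Enum):
    GREEN = "green"
    AMBER = "amber"
    RED = "red"


@dataclass
class Service:  # hypothetical shape, for illustration only
    name: str
    reported: Health                 # what the service last wrote about itself
    stamped_at: datetime             # when that health file was last written
    hard_deps: list[str] = field(default_factory=list)  # failures cascade
    soft_deps: list[str] = field(default_factory=list)  # drawn, never cascade


def own_state(svc: Service, now: datetime) -> Health:
    """State of a service before looking at its dependencies."""
    if svc.reported is Health.RED:
        return Health.RED
    if now - svc.stamped_at > STALE_AFTER:
        return Health.AMBER
    return svc.reported


def resolve(services: list[Service], now: datetime) -> dict[str, Health]:
    """Cascade red along hard dependency edges, and only those."""
    by_name = {s.name: s for s in services}
    resolved: dict[str, Health] = {}

    def visit(name: str, stack: frozenset) -> Health:
        if name in resolved:
            return resolved[name]
        svc = by_name[name]
        state = own_state(svc, now)
        if name in stack:            # dependency cycle: stop recursing
            return state
        for dep in svc.hard_deps:
            # Assumption: unknown dependencies are skipped, not treated as red.
            if dep in by_name and visit(dep, stack | {name}) is Health.RED:
                state = Health.RED
        resolved[name] = state
        return state

    for s in services:
        visit(s.name, frozenset())
    return resolved


# Example fleet (made-up services and timestamps).
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = now - timedelta(minutes=5)
fleet = [
    Service("db", Health.RED, fresh),
    Service("api", Health.GREEN, fresh, hard_deps=["db"]),
    Service("dashboard", Health.GREEN, fresh, soft_deps=["db"]),
    Service("archive", Health.GREEN, now - timedelta(hours=7)),
    Service("worker", Health.GREEN, fresh, hard_deps=["archive"]),
]
states = resolve(fleet, now)
for name, state in states.items():
    print(f"{name}: {state.value}")
```

In this example api goes red because its hard dependency db is red, dashboard stays green because its edge to db is soft, and archive degrades to amber on staleness alone without dragging worker down with it.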

Why it matters

Anyone can start twenty projects. The platform is what lets one person operate twenty — pause thirteen of them without killing their services, recover any of them from any of three backup layers, and know within thirty minutes when something silently dies.

Status & limits — running daily as my actual morning ritual; v1 trust phase in progress. Single-node by design: high availability isn't the goal, recoverability is. Quarterly restore drills are scheduled, and the first one already caught a gap.

Stack