Brisbane, Australia · remote-friendly · Open to roles & contracts

Lee Goymer

Systems generalist · Applied-AI engineer

I design, build, and operate AI systems end-to-end — evaluation labs, GPU schedulers, knowledge archives, voice assistants — in production, on hardware I administer myself.

8,351 open models tracked

1.45M game scores indexed

246K benchmark results

13.6K LLM-judged answers

112/112 archives verified

20+ services in production

The system

Everything below runs on one Ubuntu server with an RTX 3090, plus a ROCK 5B+ at the edge — built and operated solo over a twelve-month build year. Monitored, scheduled, backed up off-site, and watched by a deadman switch.

Figure: architecture map of the tjserv ecosystem (fabric, evaluation lab, knowledge and media services, assistants, and the edge node)

Case studies

One box, twenty services

Operating daily

A personal AI platform on a single RTX 3090 server — built, instrumented, and operated solo with the discipline of a small infra team.

45 projects under management · ~24 scheduled jobs · 33 mirrored repos · 3-tier backups → Backblaze B2

Case study →

Open-model evaluation lab

Daily pipeline

A self-hosted answer to "what's the best open model for X?" — 8,351 models tracked, benchmarked across modalities, with psychometric test design and an LLM judge I actually calibrated.

8,351 models · 246,567 result rows · 13,662 judged answers · IRT (2PL) item selection

Case study →
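
The "IRT (2PL) item selection" mentioned in this card can be illustrated with a minimal sketch. It assumes the textbook two-parameter logistic model and maximum-information selection; the function names, item names, and parameter values are hypothetical and are not taken from the lab's code or item bank.

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability that a model with ability theta answers the item correctly.
    a = discrimination, b = difficulty."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def select_item(theta: float, items: dict) -> str:
    """Pick the item that is most informative at the current ability estimate."""
    return max(items, key=lambda name: item_information(theta, *items[name]))

# Hypothetical item bank: name -> (discrimination a, difficulty b).
items = {"easy": (1.0, -2.0), "medium": (1.5, 0.0), "hard": (1.2, 2.0)}

print(select_item(0.0, items))  # medium
print(select_item(2.0, items))  # hard
```

An item is most informative when its difficulty sits near the ability being measured, which is why the selected item changes as the ability estimate moves.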

ARK: verified knowledge compression

Verified archive

An offline, provenance-rich archive of open human knowledge — compressed, integrity-verified end-to-end, and queryable without the internet.

15 corpus scales · 47.7 GB Wikipedia rung · 112/112 round-trip verified · query surfaces: FTS, DuckDB, SPARQL

Case study →
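
A round-trip check of the kind the "112/112 round-trip verified" figure refers to could look roughly like the sketch below. It is an illustration only: zlib stands in for whatever codec ARK actually uses, and the function names and sample payloads are hypothetical.

```python
import hashlib
import zlib

def sha256_hex(data: bytes) -> str:
    """Hex digest used as the integrity fingerprint of a payload."""
    return hashlib.sha256(data).hexdigest()

def round_trip_ok(original: bytes) -> bool:
    """Compress, decompress, and confirm the restored bytes hash identically."""
    expected = sha256_hex(original)
    compressed = zlib.compress(original, level=9)  # stand-in codec (assumption)
    restored = zlib.decompress(compressed)
    return sha256_hex(restored) == expected

def verify_all(archives: dict[str, bytes]) -> tuple[int, int]:
    """Return (passed, total) across a set of named payloads."""
    passed = sum(1 for payload in archives.values() if round_trip_ok(payload))
    return passed, len(archives)

# Hypothetical sample payloads, including an empty one as an edge case.
sample = {"wiki-shard": b"open knowledge " * 1000, "empty": b""}
passed, total = verify_all(sample)
print(f"{passed}/{total} round-trip verified")
```

Comparing digests rather than raw bytes means the expected hash can be recorded at ingest time and checked later without keeping the original around.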

CTA3: an LLM solver for community riddles

Research engine

A full-stack research database and agentic solver for osu!'s hardest community puzzle tournament — 1.45 million scores indexed, every miss forensically analyzed.

1,453,429 scores · 219,440 beatmaps · 23,142 players · 168 clues, gap-analyzed

Case study →

GPU timeshare: multi-tenant scheduling for one RTX 3090

In production

Eight GPU-hungry projects, 24 GB of VRAM, zero OOM wars — a lease API with priority aging, reversible preemption, and an enforcer that audits every process it kills.

15-second enforcer loop · two-phase pause, VRAM-verified · RAM park ledger ≤ 32 GiB · 1,097 LOC of tests

Case study →
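
Priority aging, as named in this card, can be sketched in a few lines. This is a generic illustration of the technique, not the scheduler's actual lease API: the `LeaseRequest` type, the linear aging rule, and the aging rate are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class LeaseRequest:
    project: str          # hypothetical tenant name
    base_priority: float  # higher means more urgent
    enqueued_at: float    # timestamp in seconds

AGING_PER_SECOND = 0.01  # assumed rate: +1 priority per 100 s of waiting

def effective_priority(req: LeaseRequest, now: float) -> float:
    """Base priority plus a bonus that grows linearly with time spent waiting."""
    waited = max(0.0, now - req.enqueued_at)
    return req.base_priority + AGING_PER_SECOND * waited

def next_lease(queue: list[LeaseRequest], now: float) -> LeaseRequest:
    """Grant the GPU to the waiting request with the highest effective priority."""
    return max(queue, key=lambda r: effective_priority(r, now))

# A fresh high-priority request beats a batch job that has waited 100 s...
early = [LeaseRequest("batch-eval", 1.0, 0.0), LeaseRequest("voice", 5.0, 100.0)]
print(next_lease(early, now=100.0).project)  # voice

# ...but not one that has already waited 600 s.
late = [LeaseRequest("batch-eval", 1.0, 0.0), LeaseRequest("voice", 5.0, 600.0)]
print(next_lease(late, now=600.0).project)  # batch-eval
```

The point of aging is starvation avoidance: a low-priority job's effective priority keeps rising while it waits, so it eventually wins against newly arrived high-priority work.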

More from the lab

SAM Foreground API

Stable service

Production image-segmentation service — SAM 2.1 + depth-aware candidate ranking, benchmarked to 0.995 mIoU on a human-evaluated suite.

0.9954 mean IoU · 6-method ensemble
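
For readers unfamiliar with the metric, mean IoU averages, over a set of images, the overlap between a predicted mask and a reference mask divided by their union. The sketch below shows the standard definition on flattened binary masks; the sample masks are invented, and how the service's evaluation suite actually builds its reference masks is not shown here.

```python
def iou(pred: list[int], truth: list[int]) -> float:
    """Intersection over union for two flattened binary masks of equal length."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    # Two empty masks agree perfectly; treat that case as IoU 1.0 (a convention).
    return 1.0 if union == 0 else inter / union

def mean_iou(pairs: list[tuple[list[int], list[int]]]) -> float:
    """Average IoU over (predicted, reference) mask pairs."""
    return sum(iou(p, t) for p, t in pairs) / len(pairs)

# Invented example: one half-overlapping pair and one perfect pair.
pairs = [([1, 1, 0, 0], [1, 0, 0, 0]), ([1, 1], [1, 1])]
print(mean_iou(pairs))  # 0.75
```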

Life Coach — privacy-first personal AI

Running daily

A coaching agent with months-long memory — temporal knowledge graph, contradiction resolution, and a three-lane privacy architecture that keeps the private parts local.

temporal KG (Graphiti/Neo4j) · 3 privacy lanes

Easy Transcription

Stable service

Self-hosted speech-to-structured-notes — browser recording, live WebSocket ASR, and LLM post-structuring into summaries and action items.

faster-whisper large-v3 (CUDA) · live streaming ~8s chunks

Edge multimodal node — ROCK 5B+

13 services live

A complete multimodal AI stack on a 6-TOPS ARM board — ASR, TTS, VLM, and diversity-sampled capture, with models quantized and converted for the NPU by hand.

13 systemd services · RKNN/ONNX NPU conversions

osu!catch Completion Tracker

Live on the internet

A public web app tracking ranked-beatmap completion for osu!catch players — OAuth login, 800+ requests/minute ingestion, live WebSocket progress.

live at completion.trainjumper.com · OAuth2 (osu! API)

The idea refinery

Research pipeline

A 12-stage adversarial pipeline that generates ideas, tries to kill them, rescues the salvageable, and scores survivors with deterministically capped LLM judges.

12 pipeline stages · steelman rescue + adversarial rejudge

Speech-enhancement benchmark harness

Methodology piece

How do you benchmark denoising when no clean reference exists? A fair-comparison harness for dashcam audio with ASR-hallucination detection as the quality signal.

4 SOTA models, isolated venvs · non-reference metric design

Tab Veto'er

Working MVP

2,704 hoarded browser tabs triaged by an LLM that checks what I already own, already know, and actually need before it lets anything survive.

2,704 tabs ingested · ACT/SAVE/CLOSE/DEFER verdicts

Also in the lab: Smart Home (Home Assistant ↔ RabbitMQ event bridge, Zigbee mesh, local voice — hardware phase in progress) · Vehicular Assistant (streaming ASR→LLM→TTS voice loop for the car, pluggable stage protocols with per-stage latency instrumentation) · Universal Critic (three-pass dialectic critique engine: critique → steelman → adjudicate) · Researcher (search-synthesis service with enforced source attribution) · Solar sufficiency modeling, dashcam processing, video compression pipelines, and a long archive of community infrastructure — including Minecraft community services (modpack distribution, server lists, vote bots) run for real users for years.

About

I'm a Computer Science graduate (Griffith University) and former Amazon SDE (Amazon Fresh: proactive order-issue remediation, event pipelines at US-grocery scale) who took a deliberate build year: a low-stakes day job to pay the bills, and every other waking hour spent turning a single RTX 3090 server into the twenty-service AI platform on this page. That platform includes an evaluation lab tracking 8,000+ open models, a GPU scheduler with reversible preemption, a verified knowledge archive, and voice assistants on edge silicon, all running 24/7 under real operational discipline: monitored, scheduled, backed up off-site, and watched by a deadman switch.

Before that: software development and cyber-security work — two paid security research internships (formal verification of Solidity smart contracts at Griffith's Institute for Integrated and Intelligent Systems), a fintech penetration-testing engagement in Hong Kong, and a top-ranked first-year team finish in the Australian Cyber Security Challenge.

What I'm best at is the full loop: pick a hard, fuzzy problem; design the system; build it fast; measure it honestly; run it in production; then write down what's still broken. The case studies above each end with a "status & limits" note for exactly that reason — I'd rather you trust the numbers than admire the adjectives.

What I'm looking for: applied-AI engineering, platform work, forward-deployed/solutions roles, founding-engineer seats, or contract prototyping. Anywhere the job is "figure it out end-to-end and ship it."

Contact

Open to roles and contract engagements — applied AI, platform engineering, rapid prototyping, evaluation work.

lmgoymer@gmail.com · LinkedIn · GitHub