The idea refinery — Lee Goymer

Two cooperating systems that take "LLM as judge" seriously enough to distrust it.

The generator fuses concept pairs into candidate ideas, then runs each one through a gauntlet. A multi-judge sanity gate returns advance, salvage, or kill, each verdict with a named fatal assumption and a killer test. A steelman rescue tries to save salvageable ideas, then adversarially re-judges its own rescue and caps the verdict if the "fix" is just a rephrasing (measured by Jaccard similarity against the original flaw). The remaining stages are blind rescoring on five axes, prior-art search across seven providers with novelty verdicts, and embedding-based near-duplicate detection using corpus-relative whitening.
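
As an illustration of the rephrasing check in the steelman rescue, here is a minimal sketch in Python. It assumes the comparison is word-set Jaccard similarity between the original flaw statement and the text of the proposed fix. The function names, the 0.6 threshold, and the choice of "salvage" as the cap are hypothetical; the actual pipeline may tokenize, threshold, and cap differently.

```python
import re

# Hypothetical ordering and cutoff; neither is specified in the description above.
VERDICT_ORDER = ["kill", "salvage", "advance"]  # worst to best
REPHRASE_THRESHOLD = 0.6  # assumed cutoff for "the fix is just a rephrasing"


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word tokens, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """|A & B| / |A | B| over word sets; defined as 0.0 when both are empty."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cap_rescue_verdict(original_flaw: str, fix_text: str,
                       rejudged_verdict: str, cap: str = "salvage"):
    """Return (verdict, similarity).

    If the fix overlaps heavily with the original flaw, the re-judged
    verdict is lowered to `cap` (it is never raised).
    """
    sim = jaccard(original_flaw, fix_text)
    if sim >= REPHRASE_THRESHOLD:
        capped = min(rejudged_verdict, cap, key=VERDICT_ORDER.index)
        return capped, sim
    return rejudged_verdict, sim
```

With these assumed settings, a fix that restates the flaw word for word is held at "salvage" even if the re-judge says "advance", while a fix that shares no vocabulary with the flaw keeps its re-judged verdict.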

The scorer evaluates surviving opportunities on eight weighted axes — but the LLM doesn't get the last word. Deterministic calibration rules floor and cap scores based on evidence: weak demand signals cap expected value, missing concrete artifacts cap actionability, and every adjustment is logged into the rationale. Recent rejections are fed back into generation prompts as "avoid these patterns."

The design thesis across both: LLM judgment is a noisy sensor to be calibrated, bounded, and audited — never an oracle.

Status & limits: the honest label is research pipeline, not product. One database reconstruction is journaled, and one prompt variant is quarantined for measurably regressing quality (which is exactly why the measurement exists).

Stack