Markus Schirp's wroclove.rb 2026 conference talk, delivered with an assistant (Katie) physically throwing darts at a slide to illustrate the mental model. Not a Mutant talk: it refines a scribble-on-napkins argument Schirp typically uses in consulting conversations. Core thesis: discipline doesn't scale. Any organization, whether it has 3 or 5,000 developers, will screw up, and only automation helps.

The talk introduces three thresholds every software system has: (1) the ecosystem threshold, meaning what the base language gives you batteries-included (Haskell's type system sets it very high; C somewhat lower; assembly lowest; Ruby essentially 'it parsed and eventually booted'); (2) the automation threshold, meaning tests, linters, TDD, DDD, CI and all other quality gates layered on top; (3) the contribution threshold, meaning whether merging the change actually helps the company. Each contribution is a dart. Darts landing below the ecosystem threshold are automatically rejected (syntax errors, failed boot); darts between the ecosystem and automation thresholds are rejected by the quality gates; darts above the contribution threshold land green (helpful merges); darts in the red zone between the automation and contribution thresholds are the most dangerous: CI passes, the merge button is clicked, and things break in production (wrong DB schema, forgotten mobile membership schema update, etc.).

The gap between automation and contribution is where discipline lives, and discipline is a statistically losing bet at scale. Developers (Schirp includes himself) are overconfident about the shape of their personal dart distribution; integrated over time, everyone regresses to the same Gaussian-ish curve. LLMs simply throw many more darts, so even the same distribution produces far more red hits. The fix is to raise the automation threshold closer to the contribution threshold so that more darts are rejected automatically.
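The dart model can be sketched as a small simulation. This is a toy illustration, not something shown in the talk: the threshold values, the 0.0 to 1.0 score scale, the bell-shaped distribution and the names `classify` and `throw_darts` are all assumptions chosen for the example.

```ruby
# Toy model of the three thresholds. All numbers are illustrative
# assumptions, not figures from the talk.
ECOSYSTEM    = 0.2 # below this: syntax error or failed boot
CONTRIBUTION = 0.8 # at or above this: the merge actually helps

# Classify one dart (a contribution scored from 0.0 to 1.0) against a
# given automation threshold.
def classify(score, automation)
  return :rejected_by_ecosystem  if score < ECOSYSTEM
  return :rejected_by_automation if score < automation
  return :red                    if score < CONTRIBUTION

  :green
end

# Throw `count` darts from one fixed distribution and tally the zones.
def throw_darts(count, automation, seed: 42)
  rng = Random.new(seed)
  tally = Hash.new(0)
  count.times do
    # Average of three uniform draws: a rough bell shape on 0.0...1.0.
    score = (rng.rand + rng.rand + rng.rand) / 3.0
    tally[classify(score, automation)] += 1
  end
  tally
end

human        = throw_darts(100, 0.4)    # few darts, low automation threshold
llm          = throw_darts(10_000, 0.4) # same distribution, many more darts
llm_hardened = throw_darts(10_000, 0.7) # automation raised toward contribution

puts "human:        #{human[:red]} red of 100"
puts "llm:          #{llm[:red]} red of 10000"
puts "llm hardened: #{llm_hardened[:red]} red of 10000"
```

Because all three runs share one seed, they see the same sequence of darts. The larger run can only add red hits, and raising the automation threshold can only remove them. In this simplified model the green count is unaffected by the higher threshold, which assumes quality gates that never reject a helpful change.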
In Ruby this matters especially. The ecosystem threshold is very low; the Ruby ecosystem spearheaded TDD and similar tooling precisely because the language gives so little; and runtime hazards (monkey patches leaking from third-party gems, redefined division operators, method_missing, eval) keep dragging the threshold down. Schirp patches out eval and method_missing and freezes core classes in his production systems, but says a big VM-level 'harden' method should exist.

He recommends raising the automation threshold with Mutant (mutation testing), type systems such as Sorbet and RBS (he cites Shopify as proof that typing helps at scale, and has seen it retrofitted with good effects), and property-based testing. The last is an open field in Ruby: he would love someone to write and popularize a property-testing framework. Invariants like 'reverse doesn't change length' or 'sum of line items >= any single item' produce phenomenal business-logic tests, and the technique is god-tier in Haskell and finance.

The talk closes on the LLM point: LLMs are an amplifier of existing patterns, not specifically bad. Since the red area can never be eliminated (that would be AGI, and everyone would be out of a job), the goal must be to reduce its ratio.

Q&A covers: what counts as 'shit in production' (long-lasting damage like a wrong DB schema discovered five years later, not a trivial bug); how scale changes the problem (it is the same problem, three-person teams just don't see the distribution); product-quality 10× amplification (features nobody asked for); why the base language matters (bridging from a strong type system to a good contribution threshold is much less work than bridging from 'it parsed'); why defaults and the standard library matter (systems regress to defaults); why Schirp reaches for Rust as an '80% Haskell, easy sell' when he can't pick Haskell for economic reasons; and when to invest in culture versus automation (culture is a second-order encoding of discipline, so fix the deterministic automation first; culture is celebrated but doesn't move the needle).
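The two invariants quoted in the summary can be checked with a hand-rolled property test in plain Ruby. This is a minimal sketch, not a framework the talk names: `check_property`, `order_total` and `buggy_total` are hypothetical helpers, and modeling line items as non-negative integer cents is an assumption.

```ruby
# Hand-rolled property check: run the block against many random inputs
# and return the first counterexample, or nil if none was found.
# `check_property` is a hypothetical helper, not an existing library API.
def check_property(runs: 200, seed: 1)
  rng = Random.new(seed)
  runs.times do
    # Random list of 0 to 20 non-negative integers.
    input = Array.new(rng.rand(0..20)) { rng.rand(0..10_000) }
    return input unless yield(input)
  end
  nil
end

# Invariant 1: reversing a list never changes its length.
reverse_counterexample = check_property { |xs| xs.reverse.length == xs.length }

# Invariant 2: an order total is at least as large as any single line item.
# Line items are non-negative integer cents (an assumption of this sketch).
def order_total(line_items)
  line_items.sum
end

total_counterexample = check_property do |items|
  items.all? { |item| order_total(items) >= item }
end

# A deliberately broken total that silently drops the last line item.
def buggy_total(line_items)
  line_items[0...-1].sum
end

bug_counterexample = check_property do |items|
  items.all? { |item| buggy_total(items) >= item }
end

puts "reverse invariant counterexample: #{reverse_counterexample.inspect}"
puts "total invariant counterexample:   #{total_counterexample.inspect}"
puts "buggy total counterexample:       #{bug_counterexample.inspect}"
```

The first two checks find no counterexample, while the broken total is caught by a generated input. A real property-testing framework would add input shrinking and richer generators, which this sketch leaves out.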