Audience question: since an LLM generates one token at a time and sees the full context at every step, could you hook a fault-tolerant parser into the token stream so that the moment a token strays from a valid grammar (much like the "next valid word" completions a TypeScript editor offers), you rewind and retry just that token?

Hasiński confirms that llama.cpp already does something similar for structured output, and he is confident the proprietary providers do too. You can also build it yourself on the low-level APIs, because there you control the core loop: an array of input tokens goes in, and the number identifying the next token comes out. The feature moved from the client side to the server side specifically because checking and interrupting generation fast enough requires local latency. A round-trip to a remote service would be too slow, for example when wiring Ruby LSP to a remote LLM. llama.cpp exposes a minimum-token parameter and a token callback that can remove a token and regenerate it with different parameters, and he recommends downloading it to experiment.
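To make the rewind-and-retry idea concrete, here is a minimal Python sketch of that loop, assuming you control token selection yourself. It is not llama.cpp's API. `VOCAB`, `toy_logits` (a scripted stand-in for a real model that deliberately proposes a stray token), `grammar_state` (a hand-written prefix checker for a tiny list-of-digits grammar such as `[1,2]`) and `constrained_generate` are all hypothetical names chosen for illustration. At each step the model's top-scoring token is tried; if appending it would take the text outside the grammar, only that token is rejected and the next-best one is tried instead.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB: List[str] = ["[", "]", ",", "1", "2", "3", " hello", "<eos>"]
EOS = VOCAB.index("<eos>")
DIGITS = "0123456789"


def grammar_state(text: str) -> Optional[str]:
    """Parser state after consuming `text`, or None if `text` is not a valid
    prefix of the toy grammar:  "[" ( digit ("," digit)* )? "]"."""
    state = "start"
    for ch in text:
        if state == "start":
            if ch != "[":
                return None
            state = "open"
        elif state == "open":  # just saw "["
            if ch == "]":
                state = "done"
            elif ch in DIGITS:
                state = "digit"
            else:
                return None
        elif state == "digit":  # just saw a list element
            if ch == ",":
                state = "comma"
            elif ch == "]":
                state = "done"
            else:
                return None
        elif state == "comma":  # an element must follow a comma
            if ch in DIGITS:
                state = "digit"
            else:
                return None
        else:  # "done": nothing may follow the closing bracket
            return None
    return state


# Scripted preferences per step, standing in for a real model's scores.
# At step 2 the "model" strays: it prefers " hello" over the valid ",".
SCRIPT: List[Dict[str, float]] = [
    {"[": 3.0},
    {"1": 3.0},
    {" hello": 3.0, ",": 2.0},
    {"2": 3.0},
    {"]": 3.0},
    {"<eos>": 3.0},
]


def toy_logits(token_ids: List[int]) -> List[float]:
    """Array of token ids in, one score per vocabulary entry out."""
    step = len(token_ids)
    prefs = SCRIPT[step] if step < len(SCRIPT) else {"<eos>": 3.0}
    return [prefs.get(tok, 0.0) for tok in VOCAB]


def constrained_generate(
    model: Callable[[List[int]], List[float]], max_tokens: int = 16
) -> Tuple[str, int]:
    """Greedy decoding that rejects and retries any single grammar-breaking token."""
    token_ids: List[int] = []
    text = ""
    retries = 0
    for _ in range(max_tokens):
        logits = model(token_ids)
        banned: Set[int] = set()
        while True:
            candidates = [i for i in range(len(VOCAB)) if i not in banned]
            if not candidates:
                raise RuntimeError("no grammar-valid token available")
            tok = max(candidates, key=lambda i: logits[i])  # greedy pick
            if tok == EOS:
                ok = grammar_state(text) == "done"  # may only stop on a complete list
            else:
                ok = grammar_state(text + VOCAB[tok]) is not None
            if ok:
                break
            banned.add(tok)  # rewind just this token and retry without it
            retries += 1
        if tok == EOS:
            break
        token_ids.append(tok)
        text += VOCAB[tok]
    return text, retries


if __name__ == "__main__":
    print(constrained_generate(toy_logits))  # prints ('[1,2]', 1)
```

Rejecting a token after the fact mirrors the question as asked. An alternative design masks every grammar-invalid token before picking, which costs a scan of the vocabulary at every step but never needs a retry.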