Open-source C/C++ engine for running LLMs locally. Underlies llamafile. Exposes parameters such as a minimum token count (to guarantee that some output is generated) and a token callback that can remove the last generated token and resume generation with different parameters, which enables server-side enforcement of structured output. In effect it already does something resembling per-token fault-tolerant parsing for structured output; Hasiński notes that this feature moved from the client side to the server side precisely because quick, interruptive checks require local latency.
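
The remove-and-resume loop described above can be illustrated with a toy sketch. This is not the engine's actual API: `constrained_generate`, `toy_rank`, `digits_only`, and the `EOS` marker are all hypothetical names, and the "model" is a fixed ranking function rather than a real sampler. The sketch only assumes that a per-token check can reject the newest token, drop it, and retry the same position with that token excluded, and that end-of-sequence is refused until a minimum token count is reached.

```python
from typing import Callable, List, Set

EOS = "<eos>"  # hypothetical end-of-sequence marker for this toy example


def constrained_generate(
    rank: Callable[[List[str]], List[str]],
    is_valid_prefix: Callable[[List[str]], bool],
    max_tokens: int,
    min_tokens: int = 0,
) -> List[str]:
    """Generate tokens, rolling back any token that breaks the structure check."""
    out: List[str] = []
    banned: Set[str] = set()  # tokens already rejected at the current position
    while len(out) < max_tokens:
        candidates = [t for t in rank(out) if t not in banned]
        if not candidates:
            break  # every candidate was rejected at this position
        token = candidates[0]
        if token == EOS:
            if len(out) >= min_tokens:
                break  # minimum length reached, stopping is allowed
            banned.add(token)  # too early to stop: exclude EOS and retry
            continue
        out.append(token)
        if is_valid_prefix(out):
            banned.clear()  # token accepted, move on to the next position
        else:
            out.pop()  # remove the last generated token
            banned.add(token)  # resume with different parameters
    return out


def toy_rank(prefix: List[str]) -> List[str]:
    # Toy stand-in for a model: always prefers stopping, then an invalid token.
    return [EOS, "x", "7"]


def digits_only(tokens: List[str]) -> bool:
    # Toy structure check: every token so far must be a digit string.
    return all(t.isdigit() for t in tokens)


print(constrained_generate(toy_rank, digits_only, max_tokens=8, min_tokens=2))
# prints ['7', '7']
```

Because the check runs after every token, each rejection costs one extra round trip between the sampler and the checker, which is why the loop is only practical when both sit on the same machine.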