Project that packages llama.cpp and a model into a single portable binary that runs unmodified on macOS, Windows, and Linux: download it and run it. It ships a server with configuration for chat templates and (historically) stop tokens, plus protections against models misbehaving when a stop token is missing. Hasiński highly recommends downloading it and experimenting with its parameters, such as minimum token counts and token callbacks that can rewind generation and regenerate tokens with different parameters, to build intuition about how LLMs actually work.
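
To make the idea of a stop-token protection concrete, here is a minimal Python sketch of a generation loop with two guards: a list of stop strings and a hard token cap. It illustrates the general technique only and is not the project's actual code; `generate`, `scripted_model`, and `SCRIPT` are hypothetical names, and the "model" is a scripted stand-in that never stops on its own.

```python
from typing import Callable, List, Sequence


def generate(next_token: Callable[[List[str]], str],
             stop_strings: Sequence[str],
             max_tokens: int = 32) -> str:
    """Toy generation loop with two guards against runaway output."""
    tokens: List[str] = []
    while len(tokens) < max_tokens:      # hard cap: applies even if no stop string ever appears
        tokens.append(next_token(tokens))
        text = "".join(tokens)
        for stop in stop_strings:
            cut = text.find(stop)
            if cut != -1:
                return text[:cut]        # drop the stop string and anything after it
    return "".join(tokens)


# Hypothetical stand-in for a model: replays a fixed script, then emits "!" forever.
SCRIPT = ["Hello", ",", " world", "</s>", " junk"]


def scripted_model(history: List[str]) -> str:
    return SCRIPT[len(history)] if len(history) < len(SCRIPT) else "!"


# Stop string configured: output ends cleanly at the stop marker.
with_stop = generate(scripted_model, stop_strings=["</s>"])

# Stop string missing: only the token cap ends generation.
without_stop = generate(scripted_model, stop_strings=[], max_tokens=8)

print(repr(with_stop))
print(repr(without_stop))
```

With the stop string configured, the output is cut at the marker; without it, the scripted model runs past the marker until the cap is reached, which is the failure mode such protections are meant to contain.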