Next Token!

talk 29 connections

Krzysztof 'Chris' Hasiński's single-speaker talk at wroclove.rb 2025, built on slides he has had to update for every conference because AI moves so fast. The talk is framed as 'falsehoods programmers believe' (à la the falsehoods-about-time article and the broader falsehoods repository), applied to LLMs. Core thesis: LLMs can't chat, can't think, and can't use tools. They are a 'big token factory' that predicts the next token the trainer preferred (training data with reinforcement learning on top).

The talk walks through:

(1) Tokens are numbers under the hood. OpenAI's tokenizer shows that English words often get whole, clean tokens (with leading spaces), Polish burns many tokens, and Japanese or otherwise popular words like 'nihon' or 'go' get single dense tokens.

(2) Chat is fictional: it is implemented with stop tokens. A program feeds a system prompt like 'you are an assistant, prefix replies with assistant, user prefixes with user' and stops generation when the model emits 'user'. Miss the stop token and the model answers itself (OpenAI's voice model famously clones the user's voice and asks itself questions); llama.cpp guards against misfired stop tokens.

(3) Reasoning (o1/o3, DeepSeek) is also just token generation, in a hidden 'reason' role; the summary you see is another model summarizing.

(4) Agents and tool use abuse stop tokens, embeddings, and output formatting. The example is a fictional prompt with a 'tool' role, a JSON invocation, and 'commit' and 'user' as stop tokens; the program calls the real API and splices the response back into the context.

(5) MCP servers are a meta-tool that lists other tools on demand to save context, but they have no security model: any MCP server can hijack your LLM.

(6) Embeddings are high-dimensional vectors capturing abstract concepts (illustrated with a 2-D toy that places king, queen, and car on royalness and manliness axes). Paired with vector DBs they power RAG, either as a pre-query lookup or as tool calls during the response; they support multimodal inputs (images, video, audio via models like Google's SigLIP) and benefit from chunking (e.g. the Baron gem).

(7) Hybrid search (classic word lookup + vector + whatever works) is now common; also consider graph databases, especially LLM-generated ones.

(8) Structured output used to mean asking 'please format as this JSON schema', validating, and re-asking on failure (the same algorithm as yelling at junior developers; LangChain does exactly this with a YAML retry prompt). It is now handled server-side, so you get clean JSON back.

Ruby AI ecosystem survey: LangChain is basically dead now that LLM servers do the abstractions; Ruby LLM is growing fast (with a huge PR backlog); neighbor + pgvector/SQLite-vec (all by Andrew Kane, 'if we lose Andrew we lose Ruby's ecosystem'); Baron for chunking; multiple MCP server implementations in Ruby; vendor APIs have mostly standardized on the OpenAI format with minor differences (Bedrock and OpenRouter expose subsets).

Closing message: it's still the Wild West and everything you write today is outdated tomorrow, but we have a new magical token generator and a lot of software to write around it, which means job security for the audience.

Q&A: could you hook a fault-tolerant parser into the token stream to retry one token at a time, like a type-aware IDE suggesting the next three valid tokens? Chris confirms llama.cpp already does something similar for structured output (which is why it moved from client-side to server-side: the latency of a remote round trip would kill it); providers likely do it too. llama.cpp exposes parameters for minimum token counts and token callbacks that can rewind one token and regenerate with different parameters, and he recommends downloading it to play.
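
The stop-token mechanics behind points (2) and (4) can be illustrated with a minimal Python sketch. It is not the speaker's code and does not reproduce any real server's API: `fake_model`, `run_turn`, the `weather` tool, the `result:` marker, and the textual role markers are all hypothetical stand-ins (real servers typically use special token IDs rather than plain strings). The point is only that the "chat" and the "tool call" both live in the wrapper program, which halts generation on a stop token and splices text back into the context.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Set, Tuple

# Hypothetical stop markers, mirroring the talk's fictional prompt.
STOP_TOKENS: Set[str] = {"user:", "commit"}

SYSTEM_PROMPT = (
    "system: You are an assistant. Prefix replies with 'assistant:'; "
    "the user prefixes messages with 'user:'. To call a tool, write "
    "'tool:' followed by JSON and then 'commit'.\n"
)


def generate_until_stop(
    tokens: Iterable[str], stop_tokens: Set[str]
) -> Tuple[List[str], Optional[str]]:
    """Collect tokens until a stop token appears; return (tokens, stop)."""
    emitted: List[str] = []
    for token in tokens:
        if token in stop_tokens:
            return emitted, token
        emitted.append(token)
    return emitted, None  # the stream ended without hitting a stop token


def fake_model(context: str) -> Iterator[str]:
    """Stand-in for an LLM: replays canned tokens instead of predicting."""
    if "result:" not in context:
        # First pass: the "model" decides to invoke a tool.
        yield from [" tool:", '{"name": "weather", "args": {"city": "Wroclaw"}}', "commit"]
    else:
        # Second pass: it answers, then keeps going and plays the user too.
        yield from [" It", " is", " 21", " degrees", " in", " Wroclaw.", "\n",
                    "user:", " Thanks!", "\n", "assistant:", " You're welcome."]


def weather(city: str) -> Dict[str, object]:
    """Hypothetical tool; a real program would call an actual API here."""
    return {"city": city, "temp_c": 21}


TOOLS: Dict[str, Callable[..., Dict[str, object]]] = {"weather": weather}


def run_turn(
    model: Callable[[str], Iterator[str]],
    context: str,
    tools: Dict[str, Callable[..., Dict[str, object]]],
) -> Tuple[str, str]:
    """Run one assistant turn, executing tool calls until 'user:' stops it."""
    while True:
        emitted, stop = generate_until_stop(model(context), STOP_TOKENS)
        text = "".join(emitted)
        context += text
        if stop != "commit":
            return context, text  # stopped on 'user:' or the stream ended
        # The model emitted a tool call: parse the JSON, run the tool,
        # and splice the result back into the context before resuming.
        call = json.loads(text.split("tool:", 1)[1])
        result = tools[call["name"]](**call["args"])
        context += " commit\nresult: " + json.dumps(result) + "\nassistant:"


if __name__ == "__main__":
    start = SYSTEM_PROMPT + "user: What's the weather in Wroclaw?\nassistant:"
    final_context, reply = run_turn(fake_model, start, TOOLS)
    print(reply.strip())
```

Calling `generate_until_stop(fake_model(final_context), set())` with an empty stop set shows the failure mode described in the talk: nothing halts generation at `user:`, so the "model" goes on to write the user's next message and answer it.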

date
2025-03-14
type
talk
talk Next Token!
about
Core subject — debunking how LLMs actually work.
talk Next Token!
about
Talk is framed as a falsehoods list about LLMs.
talk Next Token!
about
LLM Tokens concept
Explains tokenization and per-language token density.
talk Next Token!
about
Stop Tokens concept
Explains chat, tool calling, and reasoning as stop-token tricks.
talk Next Token!
about
Debunks reasoning as hidden-role token generation.
talk Next Token!
about
Describes tool invocation as stop-token-delimited structured output.
talk Next Token!
about
MCP Server concept
Discusses MCP as a meta-tool and its security implications.
talk Next Token!
about
Explains embeddings with king/queen/car toy example.
talk Next Token!
about
Covers RAG via pre-query lookup and tool-lookup-during-response.
talk Next Token!
about
Walks through prompt-based JSON schema enforcement and its modern server-side replacement.
talk Next Token!
about
Hybrid Search concept
Recommends mixing keyword and vector search for LLM apps.
talk Next Token!
about
Notes that RLHF is why outputs reflect trainer preferences rather than the raw training distribution.
talk Next Token!
about
AI Agent concept
Argues agents aren't real — just stop-token/embeddings/output-formatting abuse.
talk Next Token!
about
Ruby LLM tool
Showcases Ruby LLM as the modern Ruby wrapper replacing LangChain.
talk Next Token!
about
Uses LangChain's JSON/YAML retry prompts as an example and argues LangChain is now obsolete.
talk Next Token!
about
neighbor tool
Recommended as the Ruby gem for vector search in Postgres/SQLite.
talk Next Token!
about
Baron tool
Mentioned as a Ruby chunking gem for better embeddings.
talk Next Token!
about
llamafile tool
Recommended as a single-binary way to run LLMs locally and explore their parameters.
talk Next Token!
about
llama.cpp tool
Discussed as the engine with token-callback and minimum-token controls.
talk Next Token!
about
Used to visualize tokenization across languages.
talk Next Token!
about
Framing device — LLM falsehoods as an analogue to the time-falsehoods list.
talk Next Token!
about
SigLIP tool
Mentioned as the multimodal embedding model used during the associated workshop.
talk Next Token!
about
Midjourney tool
Cited as an outdated illustration tool — image generators have moved on.
Audience Q&A following the talk.
authored
Next Token! talk
Hasiński delivered the 'Next Token!' talk at wroclove.rb 2025.
from_talk
Next Token! talk
Central takeaway of the 2025 talk.
from_talk
Next Token! talk
Warning issued in the MCP section of the talk.
from_talk
Next Token! talk
Update Hasiński gives on the Ruby AI ecosystem in 2025.
talk Next Token!
presented_at
Delivered on 2025-03-14 at wroclove.rb 2025.

Provenance

Created
2026-04-17 16:18 seed
Last updated in
Next Token! — Chris Hasiński on LLM falsehoods 2026-04-18 07:42
Read by
19 extractions