CSV Parsing Benchmarks

talk 8 connections

Lightning talk at wroclove.rb 2026 comparing four CSV-parsing options in Ruby: the stdlib CSV library, smarter_csv, Polars Ruby, and DuckDB. The benchmarks ran in Docker on 1M rows, with tasks such as regex-validating dates of birth, summing amounts, and scanning for values. Meta-point: the entire presentation (benchmarks plus slides, which broke entertainingly during the talk) was built in about an hour using only Codex CLI, the argument being that creating such demos has never been easier. Speed results relative to stdlib: smarter_csv ≈ 3×, DuckDB ≈ 15×, Polars Ruby ≈ 22× (fastest). Memory after loading a 30MB CSV: stdlib Ruby ≈ 400MB, smarter_csv ≈ 300MB, Polars Ruby ≈ 100MB, DuckDB ≈ 90MB (lowest). The speaker has used Polars Ruby in production to parse 300MB CSVs in real time within a single request, with no caching or pre-calculation. Only about two audience members had ever used Polars in Ruby, which motivated the talk. All slides, code, and benchmarks are in a GitHub repo.
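To make the benchmark tasks concrete, here is a minimal sketch of what the stdlib baseline might look like. It is not the code from the talk's repo: the column names (`dob`, `amount`, `city`), the date pattern, the sample rows, and the `run_tasks` helper are all assumptions made for illustration.

```ruby
require "csv"

# Hypothetical sample data; the real benchmark used 1M rows and its
# column layout is not given in the summary.
SAMPLE = <<~CSV
  id,dob,amount,city
  1,1990-04-12,10.50,Wroclaw
  2,12/04/1990,20.00,Berlin
  3,1985-11-30,5.25,Wroclaw
CSV

# Assumed date-of-birth format: ISO 8601 (YYYY-MM-DD).
DOB_PATTERN = /\A\d{4}-\d{2}-\d{2}\z/

# Runs the three task types in a single pass over the parsed rows:
# regex validation, summing a numeric column, and scanning for a value.
def run_tasks(csv_text, needle:)
  valid_dobs = 0
  total = 0.0
  hits = 0

  CSV.parse(csv_text, headers: true) do |row|
    valid_dobs += 1 if DOB_PATTERN.match?(row["dob"])
    total += row["amount"].to_f
    hits += 1 if row["city"] == needle
  end

  { valid_dobs: valid_dobs, total: total, hits: hits }
end

result = run_tasks(SAMPLE, needle: "Wroclaw")
puts result
```

Wrapping the `run_tasks` call in `Benchmark.realtime` from Ruby's `benchmark` library would give a wall-clock figure to compare against the other tools, though a fair comparison would also need the same data file and environment for each.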

talk CSV Parsing Benchmarks

about

Polars Ruby tool

Benchmarked option; fastest at ~22× over stdlib

talk CSV Parsing Benchmarks

about

smarter_csv tool

Benchmarked option; ~3× over stdlib

talk CSV Parsing Benchmarks

about

DuckDB tool

Benchmarked option; ~15× over stdlib, with the smallest memory footprint (≈ 90MB)

talk CSV Parsing Benchmarks

about

Ruby CSV tool

Baseline option in the benchmark

talk CSV Parsing Benchmarks

about

Codex CLI tool

Meta-point: the whole talk was built in about an hour with Codex CLI

takeaway Polars Ruby for Fast CSV Parsing

from_talk

CSV Parsing Benchmarks talk

Bottom-line recommendation

talk CSV Parsing Benchmarks

presented_at

wroclove.rb 2026 event

Lightning talk at wroclove.rb 2026

CSV Parsing Benchmarks

Provenance