CSV Parsing Benchmarks

talk 8 connections

Lightning talk at wroclove.rb 2026 comparing four CSV-parsing options in Ruby: the stdlib CSV library, smarter_csv, Polars Ruby, and DuckDB. The benchmarks ran in Docker on 1M rows, with tasks such as regex-validating dates of birth, summing amounts, and scanning for values.

Results, relative to stdlib CSV: smarter_csv ≈ 3× faster, DuckDB ≈ 15×, Polars Ruby ≈ 22× (fastest). Memory after loading a 30 MB CSV: stdlib CSV ≈ 400 MB, smarter_csv ≈ 300 MB, Polars Ruby ≈ 100 MB, DuckDB ≈ 90 MB (lowest).

The speaker has used Polars Ruby in production to parse 300 MB CSVs in real time within a single request, with no caching or pre-calculation. Only about two audience members had ever used Polars in Ruby, which motivated the talk. Meta-point: the entire presentation (benchmarks and slides, which broke entertainingly during the talk) was built in about an hour using only Codex CLI, as an argument that creating such demos has never been easier. All slides, code, and benchmarks are in a GitHub repo.
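For readers unfamiliar with the workload, here is a minimal sketch of the kind of task described above (regex-validating dates of birth and summing amounts) using only the stdlib CSV library, the baseline in the comparison. It is not the speaker's benchmark code: the column names (`id`, `dob`, `amount`), the `YYYY-MM-DD` date format, the helper names `sample_csv` and `scan`, and the small generated dataset are all assumptions made for illustration.

```ruby
require "csv"
require "benchmark"

# Assumed date-of-birth format (YYYY-MM-DD); the talk's real schema is not given.
DOB_PATTERN = /\A\d{4}-\d{2}-\d{2}\z/

# Build a small in-memory CSV so the sketch is self-contained.
# Hypothetical columns: id, dob, amount.
def sample_csv(rows)
  CSV.generate do |csv|
    csv << %w[id dob amount]
    rows.times do |i|
      # Every tenth row gets a malformed date of birth.
      dob = (i % 10).zero? ? "not-a-date" : format("19%02d-01-15", i % 100)
      csv << [i, dob, i % 7]
    end
  end
end

# One pass over the parsed rows: count invalid dates and sum the amounts.
def scan(data)
  invalid = 0
  total = 0
  CSV.parse(data, headers: true) do |row|
    invalid += 1 unless DOB_PATTERN.match?(row["dob"])
    total += row["amount"].to_i
  end
  { invalid_dobs: invalid, total_amount: total }
end

data = sample_csv(1_000)
result = nil
elapsed = Benchmark.realtime { result = scan(data) }
puts format("invalid=%d total=%d elapsed=%.4fs",
            result[:invalid_dobs], result[:total_amount], elapsed)
```

Timing the same `scan` logic against each library on an identical file is what would make a comparison like the one in the talk meaningful. The absolute numbers from a sketch this small say little about the 1M-row results quoted above.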

type
lightning-talk
talk CSV Parsing Benchmarks
about
Polars Ruby
Benchmarked option; fastest at ~22× over stdlib
talk CSV Parsing Benchmarks
about
smarter_csv
Benchmarked option; ~3× over stdlib
talk CSV Parsing Benchmarks
about
DuckDB tool
Benchmarked option; ~15× faster and smallest memory footprint
talk CSV Parsing Benchmarks
about
Ruby CSV tool
Baseline option in the benchmark
talk CSV Parsing Benchmarks
about
Codex CLI tool
Meta-point: the whole talk was built in about an hour with Codex CLI
from_talk
CSV Parsing Benchmarks talk
Bottom-line recommendation
talk CSV Parsing Benchmarks
presented_at
Lightning talk at wroclove.rb 2026
talk CSV Parsing Benchmarks
uses
Docker tool
All benchmarks ran in Docker

Provenance

Created in
wroclove.rb 2026 Lightning Talks 2026-04-22 08:41
Read by
1 extraction