Lesson from the 30-minute-502 incident: the working group pursued many hypotheses in parallel, worked in pairs (a developer with an infra person when load balancers were suspected), and was able to keep the investigation going across time zones only because every member published small observations, hypotheses, and dead ends. This way of working requires a blameless culture in which people focus on fixing the problem rather than on being the hero.