Audience asked how the team knew whether a manual tweak was for the better: customer reactions or benchmarking? Answer: the benchmark was the product and operations teams 'vibing' with the resulting boxes, i.e. looking at each suggested box given the input and asking 'would I feel good receiving this?' It was a matter of taste and judgment, not a metric.