Compiler/interpreter optimization (commonly called auto-vectorization) that detects a SIMD-capable CPU (e.g. MMX/SSE on x86) and unrolls a loop so that, for example, four multiplications execute simultaneously in a single hardware instruction instead of one after another. Halasz cites this as his favorite optimization; he implemented it in hardware at university.
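A minimal sketch of the transformation, written by hand in C for illustration. It is not Halasz's implementation; the function names mul_scalar and mul_unrolled4 are hypothetical, and the SIMD check here happens at compile time through the __SSE__ macro, which is an assumption made to keep the example portable. The first function is the loop as written; the second is the same loop unrolled by four, using one SSE multiply per four elements where SSE is available and falling back to four scalar multiplies otherwise.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_mul_ps, ... */
#endif

/* Baseline: one multiplication per loop iteration. */
void mul_scalar(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] * b[i];
    }
}

/* Hypothetical hand-vectorized version: the loop is unrolled by four so
 * that each iteration produces four independent products. */
void mul_unrolled4(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;

#if defined(__SSE__)
    /* With SSE, four single-precision products come from one instruction. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i); /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_mul_ps(va, vb));
    }
#else
    /* Without SSE, the unrolled body is still four scalar multiplies. */
    for (; i + 4 <= n; i += 4) {
        out[i]     = a[i]     * b[i];
        out[i + 1] = a[i + 1] * b[i + 1];
        out[i + 2] = a[i + 2] * b[i + 2];
        out[i + 3] = a[i + 3] * b[i + 3];
    }
#endif

    /* Remainder loop: handles the last n % 4 elements one at a time. */
    for (; i < n; i++) {
        out[i] = a[i] * b[i];
    }
}
```

Both functions compute the same result for any n; the remainder loop is what keeps the unrolled version correct when n is not a multiple of four. An optimizing compiler may generate code along these lines from mul_scalar on its own.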