We can easily imagine what a perfect tool would do even when it's hard to build. Agreement in the team: on manual reviews, flag everything a perfect tool would flag — if a tool should identify something but doesn't, the human reviewer must catch it.