The Columbo method
Jul 20, 2015
If you're of a certain age, you'll remember the well-known and much-loved TV show Columbo, starring Peter Falk as a bumbling and dishevelled detective who always got his man (or woman) in the end. As formulaic as the show was, it was a formula that worked: the murderer was revealed early on in the episode, and then the dramatic tension would come from the way in which Columbo managed to reveal the truth.
He would follow up anything that didn't look right, no matter how small — from a paperweight on the wrong side of the desk to a turn of phrase that was just slightly misplaced. He asked apparently stupid questions all the time too, much to the annoyance of the perpetrator of the crime, and on top of this he was relentlessly persistent: each killer was given no time to rest or relax as Columbo turned up the heat.
Columbo is sadly off the air now, but we can adapt some of the LA detective's methods to tracking down issues with software: querying small details and anything that looks out of place can expose major problems (in this case the killers we're trying to catch are bugs in the code).
Consider the $125 million Mars Climate Orbiter, which NASA lost in 1999. The mishap came down to a units mismatch: one piece of ground software reported thruster impulse in pound-force seconds, while the navigation software that consumed those figures expected newton-seconds, and nobody converted between the two. That may sound like a relatively insignificant detail but it wrecked the whole mission, and you can be sure it's not something Columbo would have let slide had he been on the mission team.
From the Wikipedia page on the mission: "The discrepancy between calculated and measured position, resulting in the discrepancy between desired and actual orbit insertion altitude, had been noticed earlier by at least two navigators, whose concerns were dismissed."
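A units mix-up like this one is the kind of small detail that code can be made to question on our behalf. The snippet below is a minimal Python sketch of that idea, not a description of how NASA's software worked: the Impulse class and the total_impulse function are hypothetical names invented for illustration. Every value has to say which unit it arrived in, and a bare number with no unit is refused rather than quietly accepted.

```python
from dataclasses import dataclass
from typing import Iterable

# One pound-force second in newton-seconds (1 lbf is defined as 4.4482216152605 N).
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """Hypothetical value type: an impulse always stored in newton-seconds."""

    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(float(value))

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        # Convert at the boundary so everything downstream sees one unit.
        return cls(float(value) * LBF_S_TO_N_S)


def total_impulse(burns: Iterable[Impulse]) -> Impulse:
    """Add up thruster burns, refusing any value that does not carry a unit."""
    total = 0.0
    for burn in burns:
        if not isinstance(burn, Impulse):
            # The "apparently stupid question": which unit is this number in?
            raise TypeError(f"expected an Impulse, got bare value {burn!r}")
        total += burn.newton_seconds
    return Impulse(total)
```

With this in place, total_impulse([Impulse.from_pound_force_seconds(2.0)]) converts the figure on the way in, while total_impulse([2.0]) raises a TypeError instead of treating an unlabelled number as newton-seconds.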
The 5 Whys method sums up the kind of iterative, Columbo-like questioning that's required to troubleshoot and resolve failures: keep asking 'why?' until you reach the root cause of the issue. Changing things around based on guesswork, 'fiddling with the knobs' to see what happens, getting something into a working state and then hoping it holds, or just buying a bigger server will all doom you to a life of perpetual firefighting without resolving the underlying issues.
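The "keep asking why" loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ask_why function is a hypothetical name, and the incident in KNOWN_CAUSES is invented for the example rather than taken from a real investigation. In practice each answer comes from digging through logs and metrics, not from a lookup table.

```python
from typing import Dict, List

# Invented example incident: each finding maps to the answer to "why?".
KNOWN_CAUSES: Dict[str, str] = {
    "the site is slow": "the database CPU is pegged at 100%",
    "the database CPU is pegged at 100%": "one query scans an entire table",
    "one query scans an entire table": "the column it filters on has no index",
}


def ask_why(symptom: str, causes: Dict[str, str], max_whys: int = 5) -> List[str]:
    """Follow the chain of causes from a symptom towards its root cause.

    Stops when no deeper cause is known or after max_whys questions,
    and returns every step so the reasoning can be reviewed.
    """
    chain = [symptom]
    current = symptom
    for _ in range(max_whys):
        if current not in causes:
            break  # nothing deeper is known: treat this as the root cause
        current = causes[current]
        chain.append(current)
    return chain


if __name__ == "__main__":
    for depth, finding in enumerate(ask_why("the site is slow", KNOWN_CAUSES)):
        print("  " * depth + finding)
```

Stopping after the first answer would suggest a bigger database server. Following the chain to the end points at a missing index, which is the cheaper and more durable fix.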
Fixing things properly requires the right culture and the right approach, whether you're solving crimes in California or getting code into the best state it can be. Columbo's focus on "one more thing" would often get to the heart of a case through the tiniest of inconsistencies in an alibi or a murder scene. While we don't have to catch murderers, we like to bring the same thoroughness to spotting bugs and fixing failures.