Sunday, April 29, 2007

Great Troubleshooting:

Early in my software career I was given the task of maintaining the home-grown operating system for a company that provided computer services to many companies. Our computers tended to crash every two to four hours. The most important part of my job was to analyze these crashes fast, try to see what caused them, and get the computers up again. Every minute was critical in terms of lost income and customer dissatisfaction.

Sixteen months later I left that assignment for another one at the same company. When I left, the computers stayed up at least eight days at a time without crashing. In those sixteen months I had done other assignments, but I had spent hours and hours reading the operating system code, looking for problems, verifying and fixing them. This was a rare, highly productive experience, and until a few days ago I thought it was primarily my cleverness and stick-to-itiveness that yielded such rich rewards.

But suddenly, I knew better.

I owed my great success to the company's willingness to give me months to find bugs they never imagined they had. They never worried that they might waste the next month, while I found nothing of value to fix. In the modern computer world, I think such an attitude is unthinkable. Imagine my manager telling the VP of operations she wants to let me keep looking for OS problems for another two months.
"To do what?"
"Maybe he'll find an important reliability problem."
"What problem? I bet the only thing is, our disk drives are lousy. Tell him to write a Ruby development environment instead."
"Let him look, you'll never know if there's another big bug in the OS unless he finds it."
"Tell me what he's going to find that's worth eight weeks of his salary."
"I don't know."

And that's how it would go. But the company I worked for in 1969 was different. They had no idea how to control money. They ran up debts at a startling rate. They had dozens of different ways to waste money; I was just static in the overall noise. So I was left alone to make their basic product reliable. Lucky me.