Back in February 2010, Bob Pease sent an email titled “Confidence in Software??” Pease was well-known for his distaste for computers.
It was no surprise that he didn’t think much of software. He wrote:
“I heard the story that, when the first Prius prototype was brought out for Mr. Toyoda to see a demo, they ‘turned the key,’ and... nothing happened. Apparently some idiot manager was so confident that they didn't even try out the car. It was a software problem. Are you surprised? Mr. Toyoda is sitting there and the driver turned the ‘key’ and nothing happened. Embarrassing.
“Now Toyota has got bad software in the brakes. First it won't go; then it won't stop. Are the Japanese inherently unsuited to do good software? I've never written any bad software, but I realize that many prototypes must be properly tested. I'm working on some systems now that will obviously need a lot of testing. Are you surprised? / Sigh. / rap”
Despite some bad experience with Japanese printer drivers, I don’t think bad software is the result of national origin. More bad software comes out of the USA than anywhere else. Think of those old HPGL plotter drivers. Indeed, a recent article in the Atlantic Monthly was titled “The Coming Software Apocalypse.” The URL base-name is equally amusing, “saving-the-world-from-code.”
The Atlantic article references the same Toyota unintended acceleration problem Pease alluded to in his email. It also acknowledges that software is wickedly complex, “...beyond our ability to intellectually manage.” Intellect cannot help us understand something as complex as a software program. If you take 20 issues of Electronic Design magazine and arrange them on a shelf in every possible order, that’s 20! (20 factorial) arrangements, about 2.4 quintillion. That’s more than the number of seconds the universe has existed since the Big Bang. Even simple embedded programs may have 20 modules with many more branches.
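For anyone who wants to check the shelf arithmetic, here is a minimal sketch. It assumes the universe is roughly 13.8 billion years old; the function names are just for illustration.

```python
import math

# Assumption: the universe is roughly 13.8 billion years old.
AGE_OF_UNIVERSE_YEARS = 13.8e9
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def shelf_orderings(n_items: int) -> int:
    """Number of distinct left-to-right orders for n_items magazines."""
    return math.factorial(n_items)


def universe_age_seconds() -> float:
    """Approximate seconds since the Big Bang, under the assumption above."""
    return AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR


if __name__ == "__main__":
    orderings = shelf_orderings(20)
    age = universe_age_seconds()
    print(f"{orderings:,} ways to order 20 issues")
    print(f"{age:.2e} seconds since the Big Bang (approx.)")
    print(f"orderings outnumber seconds: {orderings > age}")
```

Twenty issues clear the bar; nineteen would fall a little short. One more module on the shelf changes the count by a factor of 20, which is the whole point about combinatorial growth.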
We knew decades ago that it is impossible to generate all possible test vectors for large programs. My programmer buddy says, “Over 100,000 lines of code, and you just poke it with sticks, hoping it doesn’t break somewhere else.”
It Takes a Rocket Scientist
Even common apps and programs are as complex as the Space Shuttle. Don’t be surprised if they blow up every few years. A major problem is that the complexity is hidden from users, managers, and even other programmers. Some bad software comes about because the engineers are under time pressure, so they just take demo code from various chips and modules and try to string it together into a functioning program. Other bad software comes about from lazy or sloppy programming.
Another programmer buddy told me “If people could watch code run, they would die laughing.” When I pressed him for an example, he noted that the program might spend considerable time creating some giant data structure in memory. Then the program would pluck just one byte out of all that data. And I know for a fact that sloppy code makes those giant data structures into memory leaks that eventually crash the program.
I asked John Gilmore, a co-founder of the Electronic Frontier Foundation and an early employee of Sun Microsystems, about his thoughts on software. He wrote back:
“I've worked with lots of software engineers, and with lots of hardware engineers. I have managed a lot of software engineers too. One thing I've noticed is that hardware engineers are a lot better (as a class) at accurately estimating how long it will take them to get a project done.
“My own suspicion about this is that it's because of the inherent differences in complexity. Hardware designers have the benefit of the laws of physics, which mean that something happening over on the left-hand side of the board (or chip) probably won't affect something happening on the other side—unless there's an explicit wire between the two, etc. So you can get one subsystem working at a time, and it's unlikely that changes in a different subsystem are going to mess up the part that you just got working.
“But with software, it's all happening in one big address space, in random access memory (RAM). Every memory location is equidistant from every other memory location. Anything can touch anything else. And if one of those things contains some kinds of errors, such as stray pointers, then they can affect absolutely anything else in the same address space. So, in effect, the whole thing has to be working, or at least well-behaved, before a piece of software can be debugged in a predictable amount of time.”
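Gilmore's point about one big address space can be mimicked in a few lines. This is only a toy sketch: the 16-byte "memory", the two regions, and the function names are all made up for illustration, and the missing bounds check is deliberate.

```python
# Toy model: one shared 16-byte "address space" used by two subsystems.
# Bytes 0-7 belong to a serial buffer, bytes 8-15 to a timer (hypothetical layout).
SERIAL_START, SERIAL_SIZE = 0, 8
TIMER_START, TIMER_SIZE = 8, 8
TIMER_PATTERN = b"\x01" * TIMER_SIZE


def make_memory() -> bytearray:
    return bytearray(SERIAL_SIZE + TIMER_SIZE)


def init_timer(mem: bytearray) -> None:
    """The timer subsystem sets up its own state in its own region."""
    mem[TIMER_START:TIMER_START + TIMER_SIZE] = TIMER_PATTERN


def timer_ok(mem: bytearray) -> bool:
    """True while the timer's region still holds what the timer put there."""
    return mem[TIMER_START:TIMER_START + TIMER_SIZE] == TIMER_PATTERN


def serial_store(mem: bytearray, data: bytes) -> None:
    """Deliberately buggy: copies data in without checking SERIAL_SIZE."""
    for offset, value in enumerate(data):
        mem[SERIAL_START + offset] = value


if __name__ == "__main__":
    mem = make_memory()
    init_timer(mem)
    serial_store(mem, b"x" * 8)    # fits: the timer is untouched
    print("timer ok after 8 bytes: ", timer_ok(mem))
    serial_store(mem, b"x" * 10)   # two bytes too many: the timer gets clobbered
    print("timer ok after 10 bytes:", timer_ok(mem))
```

The timer code never changed and has no bug of its own, yet it is the part that appears broken. That is why the whole program has to be well-behaved before any one piece can be debugged on a schedule.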
This reminds me of IC design or RF engineering. In analog IC design, the location of devices on the die can interact and cause latch-up or failure. With RF engineering, the radiation from one component on your board will affect other components. It’s a large interactive system. Yet software is many orders of magnitude more complex and more interactive.
The Atlantic article maintains that “The FAA (Federal Aviation Administration) is fanatical about software safety.” No, the FAA is fanatical about software documentation. So was the Army, which required contractors to comment every line of code. So we would have silly comments like “Move register C to the accumulator.” Factually correct, but absolutely useless for understanding the design intent. A friend who works on products under FAA certification tells me it’s the worst code he has ever seen. One problem is that it never gets patched or upgraded, since that would trigger another expensive, time-consuming FAA review.
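To make the comment problem concrete, here is a small sketch of the same routine written both ways. The checksum and the "receiver" it mentions are hypothetical; the only point is the difference between comments that restate each line and a comment that records intent.

```python
def checksum_line_by_line(payload: bytes) -> int:
    acc = 0                  # set acc to zero
    for c in payload:        # loop over the payload
        acc = acc + c        # add c to acc
    acc = acc & 0xFF         # AND acc with 0xFF
    return acc               # return acc


def checksum_with_intent(payload: bytes) -> int:
    """Return the 8-bit additive checksum for a message.

    Assumed design intent (hypothetical): the receiving device sums bytes in
    an 8-bit register, so the total must wrap modulo 256 to match it.
    """
    return sum(payload) & 0xFF


if __name__ == "__main__":
    message = bytes([0x10, 0x20, 0xF0])
    print(checksum_line_by_line(message), checksum_with_intent(message))
```

Both functions compute the same thing. Only the second tells the next programmer why the mask is there and what would break if it were removed.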
The Software Experience
Unlike Pease, I have done some software. I once made a PC test fixture to exercise a wafer elevator in a semiconductor manufacturing machine. The elevator had a stepper motor and an optical “home” flag. I used Basic. The program was 200 lines long. I naively just tried to run it, and of course, nothing happened. I ended up going two or three lines at a time, adding semaphores and flags to figure out what was happening.
As Gilmore noted, it took me much longer than I anticipated. It did finally work. I was delighted when I compiled it in QuickBasic and the program ran much faster. That’s the beguiling thing about software writing and use. It frustrates you for hours, but the payoff is so satisfying, you keep doing it. It’s like 3-cushion billiards versus bumper pool.
I was working at a startup doing embedded design of point-of-sale systems. Four of us had been up all night trying to figure out a particularly insidious bug. We cracked it about 5:00 AM. That’s when my pal Wayne Yamaguchi gave an extemporaneous speech about building quality software. He noted that the lower-level modules like serial communication and timer loops had to be rock-solid, just like the foundation of a skyscraper. The code that called these modules could be a little less tested, but it still should be rigorously proven out. When I think of some modern software, it seems more like a user-interface mountaintop held up by some bamboo sticks, ready to collapse at any moment.
Team…Building…
To build a successful skyscraper you need architects, engineers, and interior decorators. The architect has to mediate between the fanciful interior decorators and the down-to-earth engineers. If the decorators want some giant unsupported glass wall, the architect has to check with the engineers to make sure this is possible. If the architect wants some cantilevered open-span structure, she knows to check with the engineer to make sure it won’t collapse.
When I have worked with website software teams, I noticed they were mostly interior decorators, with a few programmers. There were no website architects. The marketing or design people wanted snazzy rounded corners and pop-up light-boxes and all kinds of trendy features. The engineers just put their nose to the grindstone and tried to deliver. Then the marketing people were surprised to see the website slow, buggy, and not compatible with all browsers. Nothing was architected. It was non-technical people demanding things work a certain way, and powerless engineers in the trenches trying to make things work that way.
I saw one engineer spend a day trying to make nice radiused corners on a box. When his boss gave him guff for taking so much time, he explained that getting rounded corners on all browsers without using non-approved JavaScript was not easy. When they told the marketing types, the reply was, “Why didn’t you tell us? We could have lived with square corners.” No architect.
That same programmer used to work at eBay many years ago. He noted a vice president had come from Oracle and insisted that the customer-facing servers be 170 Windows computers. Being Windows, it was necessary to reboot the machines every few days so that the inevitable Windows memory leaks didn’t crash the computer.
Problem was, as long as any given customer had an eBay session open, you could not just reboot and cut him off. So the system administration was designed to stop any new page views on that one machine, while waiting for all existing sessions to end. Then the computer could be rebooted and start to accept new page requests. It might take days.
It’s unusual that a former Oracle guy would be so adamant about how a front-end system should work. Oracle is more of a database company, concerned with the “back end” part of website functionality. My pal said the VP finally left, and eBay installed middleware that could assign any disparate page request to whatever machine was available. Then it was trivial to do a reboot without destroying an ongoing user interaction.
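The difference between the two schemes can be sketched in a few lines. This is not eBay's middleware, which I never saw; it is a toy round-robin router with made-up names. It assumes session state lives somewhere shared, so any machine can serve any page request.

```python
class Router:
    """Toy per-request router: any available machine may serve any request."""

    def __init__(self, machines):
        self.machines = list(machines)
        self.out_of_service = set()
        self._next = 0  # index of the next machine to try

    def take_out(self, machine: str) -> None:
        """Stop sending requests to a machine, e.g. just before a reboot."""
        self.out_of_service.add(machine)

    def put_back(self, machine: str) -> None:
        """Resume sending requests to a machine after it comes back up."""
        self.out_of_service.discard(machine)

    def route(self) -> str:
        """Pick the next in-service machine, round-robin."""
        for _ in range(len(self.machines)):
            machine = self.machines[self._next]
            self._next = (self._next + 1) % len(self.machines)
            if machine not in self.out_of_service:
                return machine
        raise RuntimeError("no machines available")


if __name__ == "__main__":
    router = Router(["web1", "web2", "web3"])
    router.take_out("web2")                    # reboot web2 right now
    print([router.route() for _ in range(4)])  # traffic flows around it
    router.put_back("web2")
```

When a customer's session is not pinned to one box, there is nothing to wait for. The machine drops out of rotation, reboots, and rejoins, and the customer never notices.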
To me, that says one of the biggest websites on earth didn’t have an architect. We all know how this happens. Something is slapped together to get working fast. Things are added onto that. Nobody ever just tosses all of the code and starts over. I wish they would; I still find eBay infuriating for many reasons.
Another example of how hard it is to do software involves Linux versus a microkernel operating system. Before Linus Torvalds came up with Linux in a nine-month session of binge-work, the hot topic was making a UNIX-like operating system using a microkernel. There would not be one large body of code in a kernel. Instead, there would be a small kernel plus a slew of small server programs doing various OS tasks independently. Richard Stallman was championing this approach and working hard to come up with a working system. Torvalds abandoned the microkernel idea and just did a conventional kernel. He made it to market, whereas Stallman’s effort stalled. Microkernels are a great intellectual exercise for college professors, but they are just too hard to debug and get working. Stallman has admitted as much.
Some programmers like to emulate a computer, mentally doing one instruction after another. I had a boss who called this “linear brained.” They are what one of my software pals calls procedural programmers, who think like a flow chart. They don’t so much think as parse a linear stream of deterministic instructions.
My pal told me it’s much harder to do event-driven programming. Here the user might press a button on his Palm Pilot and the program has to drop everything, service that button, decide what to do based on where it was, and store or pop the previous event on the stack so it can do something new. A hardware guy like me would call it interrupt-driven programming. To realize how hard it is to do microkernels or event-driven programming, think about how you would implement an “undo” function in such a system.
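Here is a minimal sketch of what event-driven code with an "undo" might look like. It assumes the simplest possible approach, saving a snapshot of the state before every change; the class, the event names, and the text-entry example are all hypothetical.

```python
from collections import deque


class App:
    """Toy event-driven text entry with a snapshot-based undo stack."""

    def __init__(self):
        self.text = ""
        self.pending = deque()   # events waiting to be serviced
        self.undo_stack = []     # snapshots of self.text before each change

    def post(self, event: str, payload=None) -> None:
        """Called whenever something happens (a key press, a button, etc.)."""
        self.pending.append((event, payload))

    def run_pending(self) -> None:
        """Service queued events in arrival order; unknown events are ignored."""
        while self.pending:
            event, payload = self.pending.popleft()
            handler = getattr(self, f"on_{event}", None)
            if handler is not None:
                handler(payload)

    def on_key(self, ch: str) -> None:
        self.undo_stack.append(self.text)  # remember the state before the change
        self.text += ch

    def on_clear(self, _payload) -> None:
        self.undo_stack.append(self.text)
        self.text = ""

    def on_undo(self, _payload) -> None:
        if self.undo_stack:                # nothing to undo is not an error
            self.text = self.undo_stack.pop()


if __name__ == "__main__":
    app = App()
    for event, payload in [("key", "a"), ("key", "b"), ("clear", None),
                           ("undo", None), ("undo", None)]:
        app.post(event, payload)
    app.run_pending()
    print(repr(app.text))
```

Even in this toy, every handler that changes state has to remember to save a snapshot first. Forget it in one handler and undo quietly does the wrong thing, which gives a feel for why it gets hard in a real system with hundreds of handlers.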
Modes of Modeling
The Atlantic article talks about model-based design like it’s a new thing. Companies like ANSYS were talking about model-based design years ago. Simulink works in MATLAB to characterize multi-domain dynamic systems using model-based design.
If model-based design gets you a working system sooner, that’s great. Mentor Graphics sells software to help understand the interaction of multiple CAN (controller area network) buses. Modern autos often have multiple CAN buses as well as other buses such as LIN, MOST, and VAN. With 100 million lines of code in a modern car, the engineers need all of the tools they can get to help ensure the software runs as intended.
Companies like National Instruments gave us LabVIEW three decades ago. This allows for better architecture, since it’s so visual. You can finally see the software and how it works. There are pretty little boxes you hook up, that can be understood by other programmers at a glance. Then again, there’s also a place for a few lines of old-fashioned code, which LabVIEW will let you put inline. It’s the best of both worlds.
I see Arduino as a hopeful future for software. It took me less than an hour to get an Arduino board communicating over the USB port, while sensing switches and lighting LEDs. I had done assembly language programming, but not C or C++. Linear Technology, now part of Analog Devices, has demo systems it calls Linduino, since they build on the Arduino integrated development environment to take large amounts of data from an LT chip, such as an analog-to-digital converter.
The problem with any design, model-based or otherwise, is that you have to understand the model. It was the night before the show. I was working with my boss on a prototype wafer-scrubbing machine that had to be at the show the next day. He had designed the machine to run on an RS-485 bus. There was one cable that would send commands to various parts of the machine. It was a very neat idea.
It was working OK until the night before the show. Then it started breaking wafers. It took most of the night to realize that when we sent the “find home” instruction to the wafer handler, it would run the stepper motor until it found home. But the reply that it was at home got clobbered by other traffic on the bus. So the system would send another “find home” instruction. That one tended to get clobbered, too; there were all kinds of asynchronous traffic on the bus. After the third or fourth “find home” instruction, one of the “at home” replies got through to the operating system. Thing is, it was not at home; it was many steps past home. Hence the broken wafers. That was a hard lesson in how you have to acknowledge instructions and replies on a bus, or things go haywire. No amount of model-based design would have helped us. We hardware guys just didn’t understand good bus protocol.
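In hindsight, one common way to make retries safe is to number each command, so the device can recognize a repeat and resend its reply without moving again. The sketch below is not what our machine did, and it simulates the bus in memory rather than over RS-485; the class, function, and message names are all hypothetical.

```python
class WaferHandler:
    """Toy device that executes each numbered command at most once."""

    def __init__(self):
        self.homing_moves = 0   # how many times the motor actually ran
        self.last_seq = None
        self.last_reply = None

    def receive(self, seq: int, command: str):
        if seq == self.last_seq:
            # A retry of a command already executed: resend the reply, do not move.
            return self.last_reply
        self.last_seq = seq
        if command == "find home":
            self.homing_moves += 1
            self.last_reply = (seq, "at home")
        else:
            self.last_reply = (seq, "unknown command")
        return self.last_reply


def send_with_retry(handler, seq, command, reply_lost, max_tries=5):
    """Resend one numbered command until its reply gets through.

    reply_lost holds one boolean per attempt; True means the reply was
    clobbered on the (simulated) bus. Returns (status, attempts_used).
    """
    for attempt, lost in zip(range(1, max_tries + 1), reply_lost):
        reply = handler.receive(seq, command)
        if lost:
            continue                      # the controller never saw this reply
        reply_seq, status = reply
        if reply_seq == seq:              # accept only the reply to this command
            return status, attempt
    raise TimeoutError(f"no reply to command {seq}")


if __name__ == "__main__":
    handler = WaferHandler()
    # The first two replies are clobbered; the third gets through.
    status, attempts = send_with_retry(handler, 1, "find home", [True, True, False])
    print(status, "after", attempts, "attempts;",
          handler.homing_moves, "homing move(s) executed")
```

The controller still has to send the command three times, but the motor runs only once, and the reply it finally accepts is matched to the command that was actually carried out.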
The consultant who architected OrCAD told me that he doesn’t start writing code until he has three different ways to solve the problem. Then he evaluates all three for those unintended consequences we all suffer with. A wag has noted, “The best programming language is the language your best programmer likes best.” The thing is, software is a language, and it’s one we hardware folks don’t always get.
I think of the Matrix movie where the guy could look at the numbers and symbols dripping down the screen and see what was happening. Sometimes software can be the most abstract mathematics made real and beautiful. Other times, it can be a programmer punishing the world for teasing him in junior high school. We are going to get both, but it’s sure to be more of the beautiful as this new thing called software development gets improved and perfected.