Thursday, April 21, 2005
Breaking The Quality–Speed Compromise
In the book Hardball, George Stalk notes that when an industry imposes a compromise on its customers, the company that breaks the compromise stands to gain a significant competitive advantage. For example, the airline industry imposes a big compromise on travelers: if you want low-cost tickets, you have to make your plans early and pay a stiff penalty to change them. Southwest Airlines breaks this compromise: its customers can apply the cost of unused tickets to a future flight without a change fee.
In the software development industry, we impose many compromises on our customers. We tell them that high quality software takes a lot of time; we ask them to decide exactly what they want when they don’t really know; we make it clear that changes late in the development process will be very expensive. There’s a significant competitive advantage waiting for companies that can break these compromises. In particular, I’d like to focus on breaking the compromise between quality and speed, because many companies have achieved great leverage by competing on the basis of time.
When I teach classes on Lean Software Development, the first thing we do is draw value stream maps of existing software development processes. Starting with a customer request, the class draws each step that the request goes through as it is turned into deployed software which solves the customer’s problem. The average time for each step is noted, as well as the time between steps, giving a picture of the total time it takes to respond to a customer.
Next the class determines how much of the time between request and deployment is spent actually working on the problem. Typically, less than 20% of the total time is spent doing work on the request; for more than 80% of the time, the request is waiting in some queue. For starters, driving down this queue time will let us deliver software much faster without compromising quality.
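The arithmetic behind that percentage is simple enough to sketch. The following is a minimal illustration, not a tool used in the classes; the step names and day counts are invented for the example, and `Step`, `lead_time`, and `process_efficiency` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    work_days: float  # time spent actively working on the request
    wait_days: float  # time the request waits in a queue before this step


def lead_time(steps):
    """Total elapsed time from customer request to deployed software."""
    return sum(s.work_days + s.wait_days for s in steps)


def process_efficiency(steps):
    """Fraction of the lead time spent actually working on the request."""
    total = lead_time(steps)
    work = sum(s.work_days for s in steps)
    return work / total if total else 0.0


# Invented numbers for illustration only.
value_stream = [
    Step("requirements", work_days=2, wait_days=10),
    Step("coding", work_days=4, wait_days=8),
    Step("verification", work_days=2, wait_days=15),
    Step("deployment", work_days=1, wait_days=8),
]

print(f"lead time: {lead_time(value_stream)} days")
print(f"working:   {process_efficiency(value_stream):.0%}")
```

With these made-up figures the request spends 9 of 50 days being worked on, so cutting queue time shortens the lead time without touching the work itself.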
But reducing wait time is not the only opportunity for faster software development. Typically the value stream maps in my classes show a big delay just before deployment, at a step which is usually called ‘verification’. Now, I don’t have any problem with verification just before deployment, but when I ask, “Do you find any defects during verification?” the answer is always “Yes.” Therein lies the problem. When a computer hits the end of Dell’s assembly line, it is powered on and it is expected to work. The verification step is not the time to find defects; by the time software hits verification, it should work.
The way to get rid of the big delay at verification is to move testing closer to coding – much closer. In fact, testing should happen immediately upon coding; if possible the test should have been written before the code. New code should be integrated into the overall system several times a day, with a suite of automated unit tests run each time. Acceptance tests for a feature should pass as soon as the feature is complete, and regression testing should be run on the integrated code daily or perhaps weekly.
Of course, this testing regime is not feasible with manual testing; automated unit and acceptance tests are required. While this may have been impractical a few years ago, the tools exist today to make automated testing practical. Obviously not all tests can be automated, and not all automated test suites are fast enough to run frequently. But there are many ways to make automated testing more effective. For example, each layer is usually tested separately: the business rules are tested below the GUI, with most database calls mocked out.
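As a rough sketch of testing a business rule below the GUI with the database call mocked out, consider the example below. Everything in it is assumed for illustration: the discount rule, the `order_total` function, and the repository's `fetch_line_prices` method are hypothetical, and Python's standard `unittest` and `unittest.mock` stand in for whatever test tools a team actually uses.

```python
import unittest
from unittest.mock import Mock


def order_total(order_id, repository, discount_threshold=100.0, discount_rate=0.10):
    """Hypothetical business rule: orders at or above the threshold get a discount."""
    prices = repository.fetch_line_prices(order_id)  # would normally hit the database
    subtotal = sum(prices)
    if subtotal >= discount_threshold:
        subtotal *= 1 - discount_rate
    return round(subtotal, 2)


class OrderTotalTest(unittest.TestCase):
    def test_discount_applied_at_threshold(self):
        repository = Mock()  # stands in for the database layer
        repository.fetch_line_prices.return_value = [60.0, 40.0]
        self.assertEqual(order_total(42, repository), 90.0)
        repository.fetch_line_prices.assert_called_once_with(42)

    def test_no_discount_below_threshold(self):
        repository = Mock()
        repository.fetch_line_prices.return_value = [30.0, 20.0]
        self.assertEqual(order_total(7, repository), 50.0)
```

Saved in a file, these tests can be run with `python -m unittest`. Because no GUI or database is involved, a suite like this stays fast enough to run on every integration.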
In most of the value stream maps I see in my classes, there is a huge opportunity to move tests far forward in the process and catch defects at their source. Many companies spend a great deal of time tracking, prioritizing, and fixing a long queue of defects. Far better to never let a defect into the queue in the first place.
There is another area of my classes’ value stream maps that raises a flag. Toward the beginning of the map there is usually a step called ‘requirements’ which often interacts with a queue of change requests. Dealing with change requests takes a lot of time and approved changes create significant churn. There has been a feeling that if only we could get the requirements right, this ‘change churn’ would go away. But I generally find that the real problem is that the requirements were specified too early, when it was not really clear what was needed. The way to reduce requirements churn is to delay the detailed clarification of requirements, moving this step much closer to coding. This greatly reduces the change request queue, because you don’t need to change a decision that has not yet been made!
Toward the end of my classes, we draw a future value stream map, and invariably the new value stream maps show a dramatically shortened cycle time, the result of eliminating wait time, moving tests forward, and delaying detailed specification of requirements. We usually end up with a process in which cross-functional teams produce small, integrated, tested, deployment-ready packages of software at a regular cadence.
This kind of software development process exposes another compromise: conventional wisdom says that changes late in the development cycle are costly. If we are developing small bits of code without full knowledge of everything that the system will require, then we are going to have to be able to add new features late in the development process at about the same cost as incorporating them earlier.
The cost of adding or changing features depends on three things: the size of the change, the number of dependencies in the code, and whether or not the change is structural. Since we just agreed to keep development chunks small, let’s also agree to keep changes small. Then let’s agree that we are going to get the structural stuff right – including proper layering, modularization that fits the domain, appropriate scalability, etc.
We are left to conclude that the cost of non-structural change depends on the complexity of the code. There are several measurements of complexity, including the number of repetitions (the target is zero), the use of patterns (which reduce complexity), and McCabe scores (cyclomatic complexity, which rises with the number of decisions in a module). It has been shown that code with low complexity scores has the fewest defects, so the number of defects is a good proxy for complexity.
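To make the McCabe score concrete, here is a small approximation that counts decision points in Python source using the standard `ast` module. It is a sketch under simplifying assumptions, not a full cyclomatic complexity tool: it counts only the node types listed, and the `shipping_cost` sample is invented for the example.

```python
import ast

# Node types treated as one decision point each (a simplification).
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source):
    """Approximate McCabe score: one plus the number of decision points."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISION_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra branches
            score += len(node.values) - 1
    return score


SAMPLE = '''
def shipping_cost(weight, express):
    if weight <= 0:
        raise ValueError("weight must be positive")
    if express and weight > 10:
        return 40
    for limit, cost in ((1, 5), (5, 10)):
        if weight <= limit:
            return cost
    return 20
'''

print(cyclomatic_complexity(SAMPLE))
```

A score like this is easy to compute on every build, which makes it a convenient early warning alongside the defect count.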
Which brings us back to our testing regime. The most important thing we can do to break the compromises we impose on customers is to move testing forward and put it in-line with (or prior to) coding. Build suites of automated unit and acceptance tests, integrate code frequently, run the tests as often as possible. In other words, find and fix the defects before they even count as defects.
Companies that respond to customers a lot faster than their industry average can expect to grow three times faster and enjoy twice the profits of their competitors. So there is a lot of competitive advantage available for the software development organization that can break the speed–quality compromise, and compete on the basis of time.
Michael L. George and Stephen A. Wilson, Conquering Complexity in Your Business, McGraw-Hill, 2004, p. 48.
George Stalk and Rob Lachenauer, Hardball: Are You Playing to Play or Playing to Win?, Harvard Business School Press, 2004.
George Stalk, Competing Against Time: How Time-Based Competition Is Reshaping Global Markets, Free Press, 2003 (originally published in 1990), p. 4.