Friday, August 1, 2003

Concurrent Development

When sheet metal is formed into a car body, a massive machine called a stamping machine presses the metal into shape. The stamping machine has a huge metal tool called a die, which makes contact with the sheet metal and presses it into the shape of a fender or a door or a hood. Designing and cutting these dies to the proper shape accounts for half of the capital investment of a new car development program, and drives the critical path. If a mistake ruins a die, the entire development program suffers a huge setback. So if there is one thing that automakers want to do right, it is the die design and cutting.

The problem is, as the car development progresses, engineers keep making changes to the car, and these find their way into the die design. No matter how hard the engineers try to freeze the design, they are not able to do so. In Detroit in the 1980’s the cost of changes to the design was 30 – 50% of the total die cost, while in Japan it was 10 – 20% of the total die cost. These numbers seem to indicate that the Japanese companies must have been much better at preventing change after the die specs were released to the tool and die shop. But such was not the case.

The US strategy for making a die was to wait until the design specs were frozen, and then send the final design to the tool and die maker, which triggered the process of ordering the block of steel and cutting it. Any changes went through an arduous change approval process. It took about two years from ordering the steel to the time the die would be used in production. In Japan, however, the tool and die makers order up the steel blocks and start rough cutting at the same time the car design is starting. This is called concurrent development. How can it possibly work?

The die engineers in Japan are expected to know a lot about what a die for a front door panel will involve, and they are in constant communication with the body engineer. They anticipate the final solution and they are also skilled in techniques to make minor changes late in development, such as leaving more material where changes are likely. Most of the time die engineers are able to accommodate the engineering design as it evolves. In the rare case of a mistake, a new die can be cut much faster because the whole process is streamlined.

Japanese automakers do not freeze design points until late in the development process, allowing most changes to occur while the window for change is still open. When compared to the early design freeze practices in the US in the 1980’s, Japanese die makers spent perhaps a third as much money on changes, and they produced better die designs. Japanese dies tended to require fewer stamping cycles per part, creating significant production savings.

The significant difference in time-to-market and increasing market success of Japanese automakers prompted US automotive companies to adopt concurrent development practices in the 1990’s, and today the product development performance gap has narrowed significantly.

Concurrent Software Development
Programming is a lot like die cutting. The stakes are often high and mistakes can be costly, so sequential development – that is, establishing requirements before development begins – is commonly thought of as a way to protect against serious errors. The problem with sequential development is that it forces designers to take a depth-first rather than a breadth-first approach to design. Depth-first forces making low-level, dependent decisions before experiencing the consequences of the high-level decisions. The most costly mistakes are made by forgetting to consider something important at the beginning. The easiest way to make such a big mistake is to drill down to detail too fast. Once you start down the detailed path, you can’t back up, and you aren’t likely to realize that you should. When big mistakes can be made, it is best to survey the landscape and delay the detailed decisions.

Concurrent development of software usually takes the form of iterative development. It is the preferred approach when the stakes are high and the understanding of the problem is evolving. Concurrent development allows you to take a breadth-first approach and discover those big, costly problems before it’s too late. Moving from sequential development to concurrent development means starting to program the highest-value features as soon as a high-level conceptual design is determined, even while detailed requirements are still being investigated. This may sound counterintuitive, but think of it as an exploratory approach which permits you to learn by trying a variety of options before you lock in on a direction that constrains implementation of less important features.

In addition to providing insurance against costly mistakes, concurrent development is the best way to deal with changing requirements, because not only are the big decisions deferred while you consider all the options, but the little decisions are deferred as well. When change is inevitable, concurrent development reduces delivery time and overall cost, while improving the performance of the final product.

If this sounds like magic – or hacking – it would be if nothing else changed. Simply starting to program earlier, without the associated expertise and collaboration found in Japanese die cutting, is unlikely to lead to improved results. There are some critical skills that must be in place in order for concurrent development to work.

Under sequential development, US automakers considered die engineers to be quite remote from the automotive engineers; similarly, programmers in a sequential development process often have little contact with the customers and users who have requirements and the analysts who collect those requirements. Concurrent development in die cutting required US automakers to make two critical changes: the die engineer needed the expertise to anticipate what the emerging design would need in the cut steel, and the die engineer had to collaborate closely with the body engineer.

Similarly, concurrent software development requires developers with enough expertise in the domain to anticipate where the emerging design is likely to lead, and close collaboration with the customers and analysts who are designing how the system will solve the business problem at hand.

The Last Responsible Moment
Concurrent software development means starting development when only partial requirements are known and developing in short iterations which provide the feedback that causes the system to emerge. Concurrent development makes it possible to delay commitment until the Last Responsible Moment, that is, the moment at which failing to make a decision eliminates an important alternative. If commitments are delayed beyond the Last Responsible Moment, then decisions are made by default, which is generally not a good approach to making decisions.

Procrastinating is not the same as making decisions at the Last Responsible Moment; in fact, delaying decisions is hard work. Here are some tactics for making decisions at the Last Responsible Moment:

  • Share partially complete design information. The notion that a design must be complete before it is released is the biggest enemy of concurrent development. Requiring complete information before releasing a design increases the length of the feedback loop in the design process and causes irreversible decisions to be made far sooner than necessary. Good design is a discovery process, done through short, repeated exploratory cycles.
  • Organize for direct, worker-to-worker collaboration.  Early release of incomplete information means that the design will be refined as development proceeds. This requires that the upstream people who understand the details of what the system must do to provide value communicate directly with the downstream people who understand the details of how the code works.
  • Develop a sense of how to absorb changes.  In ‘Delaying Commitment,’ IEEE Software (1988), Harold Thimbleby observes that the difference between amateurs and experts is that experts know how to delay commitments and how to conceal their errors for as long as possible. Experts repair their errors before they cause problems. Amateurs try to get everything right the first time and so overload their problem solving capacity that they end up committing early to wrong decisions. Thimbleby recommends some tactics for delaying commitment in software development, which could be summarized as an endorsement of object-oriented design and component-based development:
  • Use Modules.  Information hiding, or more generally behavior hiding, is the foundation of object-oriented approaches. Delay commitment to the internal design of the module until the requirements of the clients on the interfaces stabilize. 
  • Use Interfaces. Separate interfaces from implementations. Clients should not depend on implementation decisions.
  • Use Parameters.  Make magic numbers – constants that have meaning – into parameters. Make magic capabilities like databases and third party middleware into parameters. By passing capabilities into modules wrapped in simple interfaces, your dependence on specific implementations is eliminated and testing becomes much easier. (A brief sketch below illustrates these two tactics.)
  • Use Abstractions.  Abstraction and commitment are inverse processes. Defer commitment to specific representations as long as the abstraction will serve immediate design needs.
  • Avoid Sequential Programming.  Use declarative programming rather than procedural programming, trading off performance for flexibility. Define algorithms in a way that does not depend on a particular order of execution.
  • Beware of custom tool building.  Investment in frameworks and other tooling frequently requires committing too early to implementation details that end up adding needless complexity and seldom pay back. Frameworks should be extracted from a collection of successful implementations, not built on speculation.
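For example, a minimal sketch of the interface and parameter tactics might look like the following (the names OrderStore and OrderReport are hypothetical, invented for illustration rather than taken from Thimbleby): the report module depends only on a narrow interface, and the concrete capability – a database, a file, or a test stub – is passed in as a parameter.

    // Sketch only: a client that depends on a narrow interface and
    // receives its capabilities as parameters instead of hard-wiring them.
    import java.util.List;

    interface OrderStore {                        // stable interface; implementations may vary
        List<String> ordersFor(String customerId);
    }

    final class OrderReport {
        private final OrderStore store;           // capability passed in as a parameter
        private final int maxRows;                // former magic number, now a parameter

        OrderReport(OrderStore store, int maxRows) {
            this.store = store;
            this.maxRows = maxRows;
        }

        String render(String customerId) {
            StringBuilder out = new StringBuilder("Orders for " + customerId + "\n");
            store.ordersFor(customerId).stream()
                 .limit(maxRows)
                 .forEach(order -> out.append("  ").append(order).append("\n"));
            return out.toString();
        }
    }

Because OrderReport commits only to the OrderStore interface, the choice of database (or whether to use one at all) can be deferred until the interface has stabilized, and an in-memory implementation can stand in during testing.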
Additional tactics for delaying commitment include:
  • Avoid Repetition. This is variously known as the Don’t Repeat Yourself (DRY) or Once And Only Once (OAOO) principle. If every capability is expressed in only one place in the code, there will be only one place to change when that capability needs to evolve and there will be no inconsistencies.
  • Separate Concerns.  Each module should have a single, well-defined responsibility. This means that a class will have only one reason to change.
  • Encapsulate Variation. What is likely to change should be hidden inside a module; the interfaces should be stable. Changes should not cascade to other modules. This strategy, of course, depends on a deep understanding of the domain to know which aspects will be stable and which will vary. By applying appropriate patterns, it should be possible to extend the encapsulated behavior without modifying the code itself (a sketch at the end of this list of tactics illustrates the idea).
  • Defer Implementation of Future Capabilities. Implement only the simplest code that will satisfy immediate needs rather than putting in capabilities you ‘know’ you will need in the future. You will know better in the future what you really need, and simple code will be easier to extend then if necessary.
  • Avoid Extra Features. If you defer adding features you ‘know’ you will need, then you certainly want to avoid adding extra features ‘just in case’ they are needed. Extra features add an extra burden of code that must be tested and maintained, and understood by programmers and users alike. Extra features add complexity, not flexibility.
Much has been written on these delaying tactics, so they will not be covered in detail in this book.
  • Develop a sense of what is critically important in the domain.  Forgetting some critical feature of the system until it is too late is the fear that drives sequential development. If security, response time, or fail-safe operation is critically important in the domain, these issues need to be considered from the start; if they are ignored until too late, the result will indeed be costly. However, the assumption that sequential development is the best way to discover these critical features is flawed. In practice, early commitments are more likely to overlook such critical elements than late commitments, because early commitments rapidly narrow the field of view.
  • Develop a sense of when decisions must be made.  You do not want to make decisions by default, or you have not delayed them. Certain architectural decisions, such as usability design, layering, and component packaging, are best made early, so as to facilitate emergence in the rest of the design. A bias toward late commitment must not degenerate into a bias toward no commitment. You need to develop a keen sense of timing and a mechanism to cause decisions to be made when their time has come.
  • Develop a quick response capability. The slower you respond, the earlier you have to make decisions. Dell, for instance, can assemble computers in less than a week, so they can decide what to make less than a week before shipping. Most other computer manufacturers take a lot longer to assemble computers, so they have to decide what to make much sooner. If you can change your software quickly, you can wait to make a change until customers know what they want.
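As a small sketch of the encapsulate-variation tactic above (the billing domain and the names DiscountPolicy and Invoice are invented for illustration, not taken from any particular system), suppose the discount rules are the part of the system most likely to change:

    // Sketch only: the aspect expected to vary is hidden behind a stable
    // interface, so new policies extend the system without modifying Invoice,
    // and each rule lives in exactly one place (DRY).
    interface DiscountPolicy {
        double discountFor(double orderTotal);
    }

    final class NoDiscount implements DiscountPolicy {
        public double discountFor(double orderTotal) { return 0.0; }
    }

    final class VolumeDiscount implements DiscountPolicy {
        public double discountFor(double orderTotal) {
            // the single place where this rule is expressed
            return orderTotal > 1000.0 ? orderTotal * 0.05 : 0.0;
        }
    }

    final class Invoice {
        private final DiscountPolicy policy;      // variation passed in, not hard-coded

        Invoice(DiscountPolicy policy) { this.policy = policy; }

        double totalDue(double orderTotal) {
            return orderTotal - policy.discountFor(orderTotal);
        }
    }

A new discount scheme then becomes one new class rather than a change that cascades through the billing code, and the decision about which scheme to ship can be made late.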
Cost Escalation
Software is different from most products in that software systems are expected to be upgraded on a regular basis. On average, more than half of the development work that occurs on a software system occurs after it is first sold or placed into production. In addition to internal changes, software systems are subject to a changing environment – a new operating system, a change in the underlying database, a change in the client used by the GUI, a new application using the same database, and so on. Most software is expected to change regularly over its lifetime, and in fact once upgrades are stopped, software is often nearing the end of its useful life. This presents us with a new category of waste: waste caused by software that is difficult to change.

In 1987 Barry Boehm wrote, “Finding and fixing a software problem after delivery costs 100 times more than finding and fixing the problem in early design phases.” This observation became the rationale behind thorough up-front requirements analysis and design, even though Boehm himself encouraged incremental development over “single-shot, full product development.” In 2001, Boehm noted that for small systems the escalation factor can be more like 5:1 than 100:1, and that even on large systems, good architectural practices can significantly reduce the cost of change by confining features that are likely to change to small, well-encapsulated areas.

There used to be a similar, but more dramatic, cost escalation factor for product development. It was once estimated that a change after production began could cost 1000 times more than if the change had been made in the original design. The belief that the cost of change escalates as development proceeds contributed greatly to standardizing the sequential development process in the US. No one seemed to recognize that the sequential process itself could be the cause of the high escalation ratio. However, as concurrent development replaced sequential development in the US in the 1990’s, the cost escalation discussion was forever altered. The discussion was no longer about how much a change might cost later in development; it centered on how to reduce the need for change through concurrent engineering.

Not all change is equal. There are a few basic architectural decisions that you need to get right at the beginning of development, because they fix the constraints of the system for its life. Examples of these may be choice of language, architectural layering decisions, or the choice to interact with an existing database also used by other applications. These kinds of decisions might have the 100:1 cost escalation ratio. Because these decisions are so crucial, you should focus on minimizing the number of these high stakes constraints. You also want to take a breadth-first approach to these high stakes decisions.

The bulk of the change in a system does not have to have a high cost escalation factor; it is the sequential approach that causes the cost of most changes to escalate exponentially as you move through development. Sequential development emphasizes getting all the decisions made as early as possible, so the cost of all changes is the same – very high. Concurrent design defers decisions as late as possible. This has four effects:
  • Reduces the number of high-stakes constraints.
  • Gives a breadth-first approach to high-stakes decisions, making it more likely that they will be made correctly.
  • Defers the bulk of the decisions, significantly reducing the need for change.
  • Dramatically decreases the cost escalation factor for most changes.
A single cost escalation factor or curve is misleading. Instead of a chart showing a single trend for all changes, a more appropriate graph has at least two cost escalation curves, as shown in Figure 3-1. The objective of agile development is to move as many changes as possible from the top curve to the bottom curve.

Figure 3-1. Two Cost Escalation Curves
Returning for a moment to the Toyota die cutting example, the die engineer sees the conceptual design of the car and knows roughly what size of door panel will be necessary. With that information, a big enough steel block can be ordered. If the concept of the car changes from a small, sporty car to a mid-size family car, the block of steel may be too small, and that would be a costly mistake. But the die engineer knows that once the overall concept is approved, it won’t change, so the steel can be safely ordered long before the details of the door emerge. Concurrent design is a robust design process because the die adapts to whatever design emerges.

Lean software development delays freezing all design decisions as long as possible, because it is easier to change a decision that hasn’t been made. Lean software development emphasizes developing a robust, change-tolerant design, one that accepts the inevitability of change and structures the system so that it can be readily adapted to the most likely kinds of changes.

The main reason why software changes throughout its lifecycle is that the business process in which it is used evolves over time. Some domains evolve faster than others, and some domains may be essentially stable. It is not possible to build in flexibility to accommodate arbitrary changes cheaply. The idea is to build tolerance for change into the system along domain dimensions that are likely to change. Observing where the changes occur during iterative development gives a good indication of where the system is likely to need flexibility in the future. If changes of certain types are frequent during development, you can expect that these types of changes will not end when the product is released. The secret is to know enough about the domain to maintain flexibility, yet avoid making things any more complex than they must be.

If a system is developed by allowing the design to emerge through iterations, the design will be robust, adapting more readily to the types of changes that occur during development. More importantly, the ability to adapt will be built into the system, so that as more changes occur after its release, they can be readily incorporated. On the other hand, if systems are built with a focus on getting everything right at the beginning in order to reduce the cost of later changes, their design is likely to be brittle and not accept changes readily. Worse, the chance of making a major mistake in the key structural decisions increases with a depth-first, rather than a breadth-first, approach.