Monday, March 18, 2002

Lean Design

For over a decade, a manufacturing metaphor has been used to bring about improvements in software development practices.  But even the originators of the metaphor recognize that it’s time for a change.  From SEI’s COTS-Based Systems (CBS) Initiative we hear:[1]
“Indeed, to many people software engineering and software process are one and the same thing. An entire industry has emerged to support the adoption of CMM or ISO-9000 models, and process improvement incentives have played a dominant role in defining roles and behavior within software development organizations. The resulting roles and behaviors constitute what we refer to as the process regime.
“The process regime was born of the software crisis at a time when even large software systems were built one line of code at a time. With some logic it established roles and behaviors rooted in a manufacturing metaphor, where software processes are analogous to manufacturing processes, programmers are analogous to assembly-line workers, and the ultimate product is lines of code. When viewed in terms of software manufacturing, improvements in software engineering practice are equated with process improvement, which itself is centered on improving programmer productivity and reducing product defects. Indeed, the manufacturing metaphor is so strong that the term software factory is still used to denote the ideal software development organization.

“The process regime might have proven adequate to meet the software crisis, or at least mitigate its worst effects, but for one thing: the unexpected emergence of the microprocessor and its first (but not last!) offspring, the personal computer (PC). The PC generated overwhelming new demand for software far beyond the  capacity of the conventional software factory to produce.

“The response to the growing gap between supply and demand spawned an impressive range of research efforts to find a technological “silver bullet.” The US government funded several large-scale software research efforts totaling hundreds of millions of dollars with the objective of building software systems “better, faster and cheaper.” While the focused genius of software researchers chipped away at the productivity gap, the chaotic genius of the free market found its own way to meet this demand—through commercial software components.

“The evidence of a burgeoning market in software components is irrefutable and overwhelming. Today it is inconceivable to contemplate building enterprise systems without a substantial amount of the functionality of the system provided by commercial software components such as operating systems, databases, message brokers, Web browsers and servers, spreadsheets, decision aids, transaction monitors, report writers, and system managers.

“As many organizations are discovering, the traditional software factory is ill equipped to build systems that are dominated by commercial software components. The stock-in-trade of the software factory—control over production variables to achieve predictability and then gradual improvement in quality and productivity—is no longer possible. The software engineer who deals with component-based systems no longer has complete control over how a system is partitioned, what the interfaces are between these partitions, or how threads of control are passed or shared among these partitions. Traditional software development processes espoused by the process regime and software factory that assume control over these variables are no longer valid. The process regime has been overthrown, but by what?

“Control has passed from the process regime to the market regime. The market regime consists of component producers and consumers, each behaving, in the aggregate, according to the laws of the marketplace.

“The organizations that have the most difficulty adapting to the component revolution are those that have failed to recognize the shift from the process to the market regime and the loss of control that is attendant in this shift. Or, having recognized the shift, they are at a loss for how to accommodate it.”

So it’s official: the manufacturing metaphor for software development improvement needs to be replaced, but with what?  Let’s look to Lean Thinking for a suggestion.

How Programmers Work
A fundamental principle of Lean Thinking is that the starting point for improvement is to understand, in detail, how people actually do their work.   If we look closely at how software developers spend their time, we see that they do these things in sequence:  {analyze–code–build–test}.   First they figure out how they are going to address a particular problem, then they write code, then they do a build and run the code to see if it indeed solves the problem, and finally, they repeat the cycle.  Many times.  This is how programmers work.

An interesting thing about software development is that this cycle:  {analyze–code–build–test}, occurs both in the large and in the small.  Every large section of software will pass through this cycle (many times), but so will every small section of code.  A developer may go through these steps several times a day, or even, many times per hour.  Generally there is no particular effort, nor any good reason, to get the code exactly right the first time.  Try it, test it, fix it is a far more efficient approach to programming than perfection in the first draft.  Just as writers go through several drafts to create a finished piece of work, so do software developers.
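
The cycle in the small can be made concrete with a toy example: write a bit of code, run a test against it, and loop back when the test fails.  (In an interpreted language the ‘build’ step is implicit.)  A minimal sketch; the function and its test cases are invented for illustration, not taken from any real project:

```python
# A toy pass through the {analyze–code–build–test} cycle:
# analyze (decide to split on ':'), code, then test.

def parse_minutes(s: str) -> int:
    hours, minutes = s.split(":")
    return int(hours) * 60 + int(minutes)

# 'test' step: run the code to see if it solves the problem;
# a failure here sends us back to 'analyze', and the cycle repeats.
assert parse_minutes("01:30") == 90
assert parse_minutes("00:05") == 5
```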

The Wrong Metaphor
Because the software development cycle occurs both in the large and in the small, there have been attempts to divide the software development cycle and give each piece of the cycle to a different person.  So for instance, someone does the analysis, another person does the design, someone else writes code, a clerk does an occasional build, and QC people run tests.  This ‘assembly line’ approach to software development comes from the manufacturing metaphor, and basically, it just doesn’t work.

The reason a manufacturing metaphor does not work for development is that development is not sequential; it is a cycle of discovery.  The {analyze–code–build–test} cycle is meant to be repeated, not to happen only once.  Further, as ideas and information move through this cycle, two things must be assured.  First, information must not be lost through handoffs, and second, the feedback loop of the cycle must be as short as possible.

The manufacturing metaphor violates both of these requirements.  First of all, handing off information in a written format will convey at best half of the information known to those who write the documents.  The tacit knowledge buried in the minds of the writers simply does not make it into written reports.  To make matters worse, writing down information to pass along to the next step in the cycle introduces enormous waste and dramatically delays feedback from one cycle to the next. 

This second point is critically important.  The cycle time for feedback from the test phase of the cycle should be in minutes or hours; a day or two at the outside.  Dividing the development cycle among different functions with written communication between them stretches the cycle out to the point of making feedback difficult, if not impossible.

Some may argue with the premise that software development is best done using a discovery cycle.  They feel that developers should be able to write code and ‘Get It Right the First Time’.  This might make sense in manufacturing, where people make the same thing repeatedly.  Software development, however, is a creative activity.  You would never want a developer to be writing the same code over and over again.  That’s what computers are for.

The Difference Between Designing and Making
Glenn Ballard of the Lean Construction Institute (LCI) sheds some light on this topic in a paper called “Positive vs Negative Iteration in Design”.  He draws a clear distinction between the two activities of designing and making.  He points out, “This is the ancient distinction between thinking and acting, planning and doing.  One operates in the world of thought; the other in the material world.”  Ballard summarizes the difference between designing and making in this manner:

The important thing to notice is that the goals of ‘designing’ and ‘making’ are quite different.  Designing an artifact involves understanding and interpreting the purpose of the artifact.  Making an artifact involves conforming to the requirements expressed in the design, on the assumption that the design accurately realizes the purpose.

A striking difference between designing and making is the fact that variability of outcomes is desirable during design, but not while making.  In fact, design is a process of finding and evaluating multiple solutions to a problem, and if there were no variability, the design process would not be adding much value.  As a corollary, Ballard suggests that iteration creates value in design, while it creates waste (rework) in making.  To put it another way, the slogan “Do it Right the First Time” applies to making something after the design is complete, but it should not be applied to the design process.

In the {analyze–code–build–test} cycle, notice that both analyzing and coding are design work.  There are many ways to create a line of code; individual developers are making decisions every minute they are writing code.  There is no recipe to tell them exactly how to do things.  They are writing the recipe for the computer to follow.  It is not until we get to the ‘build’ stage of the cycle that we find ‘making’ activity.  And indeed, all of the rules of ‘making’ apply to a software build:  no one should break the build, and every build with the same inputs had better produce the same outputs.
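
The ‘same inputs, same outputs’ rule can be checked mechanically by hashing build products and comparing runs.  A minimal sketch in Python; the ‘build’ below is a deterministic stand-in function, not a real compiler, and all names are illustrative:

```python
# Sketch: verifying the 'making' rule that a build run twice on the
# same inputs must produce identical outputs. In practice one would
# hash the real artifacts (object files, binaries) after each build.
import hashlib

def build(source: str) -> bytes:
    # deterministic stand-in for compile-and-link
    return b"OBJ:" + source.encode("utf-8")

def digest(artifact: bytes) -> str:
    return hashlib.sha256(artifact).hexdigest()

source = "print('hello')"
first = digest(build(source))
second = digest(build(source))

assert first == second  # same inputs, same outputs: the build is reproducible
```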

This brings us to the last step of the software development cycle:  test.  Is testing ‘designing’ or ‘making’, or yet a third element?  In fact, designing tests is a creative activity, often part of the design.  Further, the results of tests are continually fed back into the design to improve it.  So in a very real sense, the test step is ‘designing’, not ‘making’.  Moreover, the ‘test’ step is what causes the cycle to loop back and repeat; it is what makes development work a cycle in the first place.  In making, it is not desirable to test and rework; in development, however, repeating the cycle is the essence of doing the work.  Development is basically an experimental activity.

There are other well-known cycles that bear mentioning here, and all of them end with a step which causes the cycle to repeat.  Some examples are:
  1. The Scientific Method:  {Observe – Create a Theory – Predict from the Theory – Test the Predictions}  Graduate students know this well.
  2. The Development Approach:  {Discover – Assemble – Assess} This bears a striking (and not accidental) resemblance to the software development cycle.
  3. The Deming Cycle:  {Plan – Do – Check} – Act.  This is a three-step cycle {Plan – Do – Check}, followed by – Act once the cycle yields results.   A more complete definition of the Deming Cycle is: {Identify Root Causes of Problems – Develop and Try a Solution – Measure the Results}  Repeat Until a Solution is Proven, then – Standardize the Solution.  Deming taught that all manufacturing processes should be continually improved using this cycle.

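
The Deming cycle above can be sketched as a loop that repeats {Plan – Do – Check} until the check passes, and only then acts.  A minimal illustration in Python; the ‘process’ being improved, a toy defect count, is invented for the example:

```python
# A sketch of the Deming cycle as a loop: repeat {Plan – Do – Check}
# until the check passes, then Act (standardize the proven solution).
# The numbers and the remedy are purely illustrative.

defects = 8

while True:
    fix = "tighten inspection"   # Plan: identify a root cause, pick a remedy
    defects -= 2                 # Do: try the remedy on the process
    if defects <= 0:             # Check: measure the results
        break                    # the solution is proven
# Act: standardize the proven solution across the line
standard_practice = fix
```
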
In Search of Another Metaphor
If software developers spend their days in a continual {design–code–build–test} cycle, we might gain insight if we find other workers who use a similar cycle.  In this quest we might eliminate workers in manufacturing, who are not involved in designing the product they produce.  On the other hand, in Lean Manufacturing, workers are continually involved in redesigning their work processes.  Even so, software developers more closely resemble product designers than product makers, because a large portion of software development time involves designing the final product, both in the large and in the small.  But unlike product developers, software developers not only design, but also produce and test their product.

We might compare software developers to the skilled workers in construction, who often do a lot of on-site design before they actually produce work.  An electrician, for instance, must understand the use of the room to locate outlets, and must take framing, HVAC and plumbing into account when routing wires.   Software developers might also be thought of as artists and craftsmen, who routinely extend the design process right into the making process.

Learning Lessons from Metaphors
New Product Development, Skilled Construction Workers, Artists and Craftsmen – as we attempt to learn from these metaphors we must also take care not to go too far, as happened with the manufacturing metaphor.  The careful use of a metaphor involves abstracting to a common base between disciplines, and then applying the abstraction to the new discipline (software development) in a manner appropriate to the way work actually occurs in that discipline.   

Three useful abstractions come immediately to mind as we apply design and development metaphors to software development:

Abstraction 1:  Emphasize ‘Designing’ Values, not ‘Making’ Values
Code should not be expected to “Conform to Requirements” or be “Right the First Time”.  These are ‘making’ values.  Instead, software should be expected to be “Fit for Use” and “Realize the Purpose” of those who will be using it.   Disparaging software changes as ‘rework’ exemplifies the misuse of a ‘making’ value.  Since software development is mostly about designing, not making, the correct value for software development is precisely the opposite.  Iterations are good, not bad.  They lead to a better design.

Ballard states that:  “Designing can be likened to a good conversation, from which everyone leaves with a better understanding than anyone brought with them…  Design development makes successively better approaches on the whole design, like grinding a gem, until it gets to the desired point….”

Abstraction 2:  Compress the {Design–Code–Build–Test} Cycle Time
Once we recognize that the {design–code–build–test} cycle is the fundamental element of work in software development, then principles of lean thinking suggest that compressing this cycle will generate the best results.  Compressing cycle time makes feedback immediate, and thus allows a system to respond rapidly to both defects and change.

Based on this hypothesis, we may predict that the effectiveness of Extreme Programming comes from its dramatic compression of the {design–code–build–test} cycle.  Pair programming works to shorten design time because design reviews occur continuously, just as design occurs continuously.  Writing test code before production code radically reduces the time from coding to testing, since tests are run immediately after code is written.   The short feedback loop of the {design–code–build–test} cycle in all agile practices is a key reason why they produce working code very quickly.
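
As an illustration of how test-first compresses the coding-to-testing gap, here is a hypothetical example: the test exists before the code, so feedback arrives the moment the code is written.  The function `slug` and its cases are invented for this sketch:

```python
# Test-first in miniature: the test below was 'written' before the
# production code, and runs immediately after each coding pass.

def test_slug():
    assert slug("Lean Design") == "lean-design"
    assert slug("  Hello  World ") == "hello-world"

# Production code written to make the test above pass:
def slug(title: str) -> str:
    return "-".join(title.lower().split())

test_slug()  # feedback arrives seconds after the code is written
```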

Abstraction 3:  Use Lean Design Practices to Reduce Waste
Not all design iteration is good; iterations must add value and lead to convergence.  Many times a design will pass from one function to another, each adding comments and changes, causing more comments and changes, causing another round of comments and changes, in a never-ending cycle.  This kind of iteration does not produce value, and is thus ‘negative iteration’ or ‘waste’.

Ballard suggests the following ‘Lean Design’ techniques to reduce negative iteration, or in other words, obtain design convergence:
  1. Design Structure Matrix.   Steven Eppinger’s article “Innovation at the Speed of Information” in the January 2001 issue of Harvard Business Review suggests that design management should focus on information flows, not task completions, to achieve the most effective results. The Design Structure Matrix is a tool that answers the question: “What information do I need from other tasks before I can complete this one?”
  2. Cross Functional Teams.  Cross-functional teams which collaborate and solve problems are today’s standard approach for rapid and robust design with all interested parties contributing to decisions.  One thing to remember is to ‘let the team manage the team’.
  3. Concurrent Design / Shared Incomplete Information.    Sequential processing results in part from the assumption that only complete information should be shared.  Sharing incomplete information allows concurrent design to take place.  This both shortens the feedback loop and allows others to start earlier on their tasks.
  4. Reduced Batch Sizes.   Releasing work in small batches allows downstream work to begin earlier and provides for more level staffing.  It is also the best mechanism for finding and fixing problems early, while they are small, rather than after they have multiplied across a large batch.
  5. Pull Scheduling.  Ballard notes:  “The Lean Construction Institute recommends producing such a work sequence by having the team responsible for the work being planned to work backwards from a desired goal; i.e., by creating a 'pull schedule'. Doing so avoids incorporation of customary but unnecessary work, and yields tasks defined in terms of what releases work and thus contributes to project completion.”
  6. Design Redundancy. When it is necessary to make a design decision in order to proceed, but the task sequencing cannot be structured to avoid future changes, then the best strategy may be to choose a design to handle a range of options, rather than wait for precise quantification.  For example, when I was a young process control engineer, I used to specify all process control computers with maximum memory and disk space, on the theory that you could never have enough.  In construction, when structural loads are not known precisely, the most flexible approach is often to design for maximum load.
  7. Deferred Commitment / Least Commitment.  Ballard writes:  “Deferred commitment is a strategy for avoiding premature decisions and for generating greater value in design. It can reduce negative iteration by simply not initiating the iterative loop. A related but more extreme strategy is that of least commitment; i.e., to systematically defer decisions until the last responsible moment; i.e., until the point at which failing to make the decision eliminates an alternative. Knowledge of the lead times required for realizing design alternatives is necessary in order to determine last responsible moment.”
  8. Shared Range of Acceptable Solutions  / Set-Based Design.   The most rapid approach to arriving at a solution to a design problem is for all parties to share the range of acceptable solutions and look for an overlap.  This is also called set-based design, and is widely credited for speeding up development at Toyota, decreasing the need for communication, and increasing the quality of the final products.
These eight Lean Construction techniques, particularly set-based design, are being tested in construction and are expected to result in dramatic improvements in design time (~50%) and construction time (~30%).  In addition, work can be leveled throughout the project, and better, more objective decisions are expected.
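
Set-based design, in particular, lends itself to a concrete sketch: if each party publishes its range of acceptable values for a design parameter, the shared solution set is simply the overlap of those ranges.  A minimal Python illustration; the parties, numbers, and parameter (a wall thickness, say) are all hypothetical:

```python
# Sketch of set-based design as range intersection: each party shares
# its range of acceptable values, and the overlap is the solution set.

ranges = {
    "structural": (8.0, 20.0),   # strength requires at least 8 mm
    "cost":       (5.0, 12.0),   # budget caps thickness at 12 mm
    "thermal":    (10.0, 25.0),  # insulation wants at least 10 mm
}

low = max(lo for lo, _ in ranges.values())
high = min(hi for _, hi in ranges.values())

# overlap of all ranges, or None if the parties must negotiate
feasible = (low, high) if low <= high else None
```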

The following two additional Lean Design techniques are particularly applicable to software development:
  1. Frequent Synchronization.  It is widely recognized in software development that daily (or more frequent) builds with automated testing are the best way to build a robust system rapidly.
  2. The Simplest ‘Spanning Application’ Possible.  This is a software development technique particularly good for testing component ensembles and legacy system upgrades.  The idea is not to implement module-by-module, but to implement a single thread across the entire system, so as to test the interactions of all parts of the system along a narrow path.
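
As a rough sketch of the spanning-application idea, the code below drives a single request through stubbed-out layers of a hypothetical three-tier system; every name is illustrative, and each stub stands in for a module that is not yet fully built:

```python
# A 'spanning application' in miniature: one thin thread end-to-end
# through stub components, exercising their interactions early.

db = []                          # persistence stub's storage

def ui(request):                 # presentation-layer stub
    return service(request)

def service(request):            # business-logic stub
    return store(f"handled:{request}")

def store(record):               # persistence stub
    db.append(record)
    return record

result = ui("order-42")          # one narrow path across the whole system
assert result == "handled:order-42"
assert db == ["handled:order-42"]
```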

[1] From a draft version of Chapter 1 of Building Systems from Commercial Components (Addison-Wesley, 2001) by Kurt Wallnau, Scott Hissam, and Robert Seacord; downloaded from the SEI COTS-Based Systems Initiative website.
