Lean Essays: 2002

November 7, 2002

Principles of Lean Thinking

Abstract
In the 1980’s, a massive paradigm shift hit factories throughout the US and Europe. Mass production and scientific management techniques from the early 1900’s were questioned as Japanese manufacturing companies demonstrated that ‘Just-in-Time’ was a better paradigm. The widely adopted Japanese manufacturing concepts came to be known as ‘lean production’. In time, the abstractions behind lean production spread to logistics, and from there to the military, to construction, and to the service industry. As it turns out, principles of lean thinking are universal and have been applied successfully across many disciplines.

Lean principles have proven not only to be universal, but to be universally successful at improving results. When appropriately applied, lean thinking is a well-understood and well-tested platform upon which to build agile software development practices.

Introduction
Call a doctor for a routine appointment and chances are it will be scheduled a few weeks later. But one large HMO in Minnesota schedules almost all patients within a day or two of their call, for just about any kind of medical service. A while ago, this HMO decided to work off its scheduling backlog by extending its hours, and then to vary its hours slightly from week to week to keep the backlog to about a day. True, the doctors don’t have the comforting weeks-long list of scheduled patients, but in fact, they see just as many patients for the same reasons as they did before. The patients are much happier, and doctors detect medical problems far earlier than they used to.

The idea of delivering packages overnight was novel when Federal Express was started in 1971. In 1983, a new company called Lens Crafters changed the basis of competition in the eyeglasses industry by assembling prescription glasses in an hour. Shipping products the same day they were ordered was a breakthrough when LL Bean upgraded its distribution system in the late 1980’s. Southwest Airlines, one of the few profitable airlines these days, saves a lot of money with its unorthodox method of assigning seats as people arrive at the airport. Dell maintains profitability in a cutthroat market by manufacturing to order in less than a week. Another Austin company builds custom homes in 30 days.

The common denominator behind these and many other industry-rattling success stories is lean thinking. Lean thinking looks at the value chain and asks: How can things be structured so that the enterprise does nothing but add value, and does that as rapidly as possible? All the intermediate steps, all the intermediate time and all the intermediate people are eliminated. All that’s left are the time, the people and the activities that add value for the customer.

Origins of Lean Thinking
Lean thinking got its name from a 1990 best seller called The Machine That Changed the World: The Story of Lean Production[1]. This book chronicles the movement of automobile manufacturing from craft production to mass production to lean production. It tells the story of how Henry Ford standardized automobile parts and assembly techniques, so that low skilled workers and specialized machines could make cheap cars for the masses. The book goes on to describe how mass production provided cheaper cars than craft production, but resulted in an explosion of indirect labor: production planning, engineering, and management. Then the book explains how a small company set its sights on manufacturing cars for Japan, but it could not afford the enormous investment in single purpose machines that seemed to be required. Nor could it afford the inventory or large amount of indirect labor that seemed necessary for mass production. So it invented a better way to do things, using very low inventory and moving decision-making to production workers. Now this small company has grown into a large company, and the Toyota Production System has become known as ‘lean production’.

“The mass-producer uses narrowly skilled professionals to design products made by unskilled or semiskilled workers tending expensive, single-purpose machines. These churn out standardized products at high volume. Because the machinery costs so much and is so intolerant of disruption, the mass-producer adds many buffers – extra supplies, extra workers, and extra space – to assure smooth production…. The result: The customer gets lower costs but at the expense of variety and by means of work methods that most employees find boring and dispiriting.”[2]

Think of the centralized eyeglasses laboratory. Remember that Sears used to take two or three weeks to fill orders from its once-popular catalog. Recall the long distribution channel that used to be standard in the computer market. Think dinosaurs. Centralized equipment, huge distribution centers and lengthy distribution channels were created to realize economies of scale. They are the side effects of mass-production, passed on to other industries. What people tend to overlook is that mass-production creates a tremendous amount of work that does not directly add value. Shipping eyeglasses to a factory for one hour of processing adds far more handling time than the processing itself takes. Adding retail distribution to the cutthroat personal computer industry means that a manufacturer needs six weeks to respond to changing technology, instead of six days. Sears’ practice of building an inventory of mail orders to fill meant keeping track of stacks of orders, not to mention responding to innumerable order status queries and constant order changes.

“The lean producer, by contrast, combines the advantages of craft and mass production, while avoiding the high cost of the former and the rigidity of the latter… Lean production is ‘lean’ because it uses less of everything compared with mass production – half the human effort in the factory, half the manufacturing space, half the investment in tools, half the engineering hours to develop a new product in half the time. Also, it requires keeping far less than half the inventory on site, results in many fewer defects, and produces a greater and ever growing variety of products.”[3]

While on a tour of a large customer, Michael Dell saw technicians customizing new Dell computers with their company’s ‘standard’ hardware and software. “Do you think you guys could do this for me?” his host asked. Without missing a beat, Dell replied, “Absolutely, we’d love to do that.”[4] Within a couple of weeks, Dell was shipping computers with factory-installed, customer-specific hardware and software. What took the customer an hour could be done in the factory in minutes, and furthermore, computers could be shipped directly to end-users rather than making a stop in the corporate IT department. This shortening of the value chain is the essence of lean thinking.

Companies that re-think the value chain and find ways to provide what their customers value with significantly fewer resources than their competitors can develop an unassailable competitive advantage. Sometimes competitors are simply not able to deliver the new value proposition. (Many have tried to copy Dell; few have succeeded.) Sometimes competitors do not care to copy a new concept. (Southwest Airlines has not changed the industry’s approach to seat assignments.) Sometimes the industry follows the leader, but it takes time. (Almost all direct merchandise is shipped within a day or two of receiving an order these days, but the Sears catalog has been discontinued.)

Lean Thinking in Software Development
eBay is a company which pretty much invented ‘lean’ trading by eliminating all the unnecessary steps in the trading value chain. In the mid 1990’s, basic eBay software capabilities were developed by responding daily to customer requests for improvements.[5] Customers would send an e-mail to Pierre Omidyar with a suggestion and he would implement the idea on the site that night. The most popular features of eBay, those which create the highest competitive advantage, were created in this manner.

Digital River invented the software download market in the mid 1990’s by focusing on ‘lean’ software delivery. Today Digital River routinely designs and deploys sophisticated web sites for corporate customers in a matter of weeks, by tying the corporation’s legacy databases to standard front end components customized with a ‘look and feel’ specific to each customer.

In the mid 1990’s, Microsoft implemented corporate-wide financial, purchasing and human resource packages linked to data warehouses which can be accessed via web front-ends. Each was implemented by “a handful of seasoned IT and functional experts… (who got) the job done in the time it takes a … committee to decide on its goals.”[6]

In each of these examples, the focus of software development was on rapid response to an identified need. Mechanisms were put in place to dramatically shorten the time from problem recognition to software solution. You might call it ‘Just-in-Time’ software development.

The question is – why isn’t all software developed quickly? The answer is – rapid development must be considered important before it becomes a reality. Once speed becomes a value, a paradigm shift has to take place, changing software development practices from the mass production paradigm to lean thinking.
If your company writes reams of requirements documents (equivalent to inventory), spends hours upon hours tracking change control (equivalent to order tracking), and has an office which defines and monitors the software development process (equivalent to industrial engineering), you are operating with mass-production paradigms. Think ‘lean’ and you will find a better way.

Basic Principles of Lean Development
There are four basic principles of lean thinking which are most relevant to software development:

Add Nothing But Value (Eliminate Waste)
The first step in lean thinking is to understand what value is and what activities and resources are absolutely necessary to create that value. Once this is understood, everything else is waste. Since no one wants to consider what they do as waste, the job of determining what value is and what adds value is something that needs to be done at a fairly high level. Let’s say you are developing order tracking software. It seems like it would be very important for a customer to know the status of their order, so this would certainly add customer value. But actually, if the order is in house for less than 24 hours, the only order status that is necessary is to inform the customer that the order was received, and then that it has shipped, and let them know the shipping tracking number. Better yet, if the order can be fulfilled by downloading it on the Web, there really isn’t any order status necessary at all.

To develop breakthroughs with lean thinking, the first step is learning to see waste. If something does not directly add value, it is waste. If there is a way to do without it, it is waste. Taiichi Ohno, the mastermind of the Toyota Production System, identified seven types of manufacturing waste: overproduction, inventory, extra processing steps, motion, defects, waiting, and transportation.

Here is how I would translate the seven wastes of manufacturing to software development:

Extreme Programming (XP) is a set of practices which focuses on rapid software development. It is interesting to examine how XP works to eliminate the seven wastes of software development:

‘Do It Right The First Time’
XP advocates developing software for the current need, and as more ‘stories’ (requirements) are added, the design should be ‘refactored’ to accommodate the new stories. Is it waste to refactor software? Shouldn’t developers “Do It Right the First Time?”

It is instructive to explore the origins of the slogan “Do It Right the First Time.” In the 1980’s it was very difficult to change a mass-production plant to lean production, because in mass production, workers were not expected to take responsibility for the quality of the product. To change this, the management structure of the plant had to change. “Workers respond only when there exists some sense of reciprocal obligation, a sense that management actually values skilled workers, … and is willing to delegate responsibility to [them].” [7] The slogan “Do It Right the First Time” encouraged workers to feel responsible for the products moving down the line, and encouraged them to stop the line and troubleshoot problems when and where they occurred.

In the software industry, the same slogan “Do It Right the First Time” has been misused as an excuse to apply mass-production thinking, not lean thinking, to software development. Under this slogan, responsibility has been taken away from the developers who add value, which is exactly the opposite of its intended effect. “Do It Right the First Time” has been used as an excuse to insert reams of paperwork and armies of analysts and designers between the customer and the developer. In fact, the slogan is only properly applied if it gives developers more, not less, involvement in the results of their work.

A more appropriate translation of such slogans as “Zero Defects” and “Do It Right the First Time” would be “Test First”. In other words, don’t code unless you understand what the code is supposed to do and have a way to determine whether the code works. A good knowledge of the domain coupled with short build cycles and automated testing constitute the proper way for software developers to “Do It Right the First Time”.
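As a minimal sketch of “Test First”, here is the order-tracking example from earlier in this essay written test-first in Python. The function `order_status`, its parameters and its messages are hypothetical; what matters is the order of work: the test states what the code is supposed to do before any code exists, and the code is then written only until the test passes.

```python
from typing import Optional

# Written first: the test states the expected behavior.
# order_status and its messages are hypothetical, echoing the order-tracking example.
def test_order_status():
    # A received order reports only that it was received.
    assert order_status(shipped=False) == "Order received"
    # A shipped order reports the shipping tracking number.
    assert order_status(shipped=True, tracking_number="1Z999") == "Shipped, tracking number 1Z999"


# Written second: just enough code to make the test above pass.
def order_status(shipped: bool, tracking_number: Optional[str] = None) -> str:
    if not shipped:
        return "Order received"
    return f"Shipped, tracking number {tracking_number}"


if __name__ == "__main__":
    test_order_status()
    print("all tests pass")
```

Because the test runs automatically, it keeps checking the same behavior later, when the inevitable changes are made.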

Center On The People Who Add Value
Almost every organization claims its people are important, but if they truly centered on those who add value, they would be able to say:

In mass-production, tasks are structured so that low skilled or unskilled workers can easily do the repetitive work, but engineers and managers are responsible for production. Workers are not allowed to modify or stop the line, because the focus is to maintain volume. One of the results of mass-production is that unskilled workers have no incentive to volunteer information about problems with the manufacturing line or ways to improve the process. Maladjusted parts get fixed at the end of the line; a poor die or improperly maintained tool is management’s problem. Workers are neither trained nor encouraged to worry about such things.

“The truly lean plant has two key organizational features: It transfers the maximum number of tasks and responsibilities to those workers actually adding value to the car on the line, and it has in place a system for detecting defects that quickly traces every problem, once discovered, to its ultimate cause.” [8] Similarly in any lean enterprise, the focus is on the people who add value. In lean enterprises, traditional organizational structures give way to new team-oriented organizations which are centered on the flow of value, not on functional expertise.

The first experiment Taiichi Ohno undertook in developing lean production was to figure out a way to allow massive, single-purpose stamping machines to stamp out multiple parts. Formerly, it took skilled machinists hours, if not days, to change dies from one part to another. Therefore, mass production plants had many single purpose stamping machines in which the dies were almost never changed. Volume, space, and financing were not available in Japan to support such massive machines, so Ohno set about devising simple methods to change the stamping dies in minutes instead of hours. This would allow many parts of a car to be made on the same line with the same equipment. Since the workers had nothing else to do while the die was being changed, they also did the die changing, and in fact, the stamping room workers were involved in developing the methods of rapid die changeover.

Ohno transferred most of the work being done by engineers and managers in mass-production plants to the production workers. He grouped workers in small teams and trained the teams to do their own industrial engineering. Workers were encouraged to stop the line if anything went wrong (a management job in mass-production). Before the line was re-started, the workers were expected to search for the root cause of the problem and resolve it. At first the line was stopped often, which would have been a disaster at a mass-production plant. But eventually the line ran with very few problems, because the assembly workers felt responsible to find, expose, and resolve problems as they occurred.

It is sometimes thought that a benefit of good software engineering is to allow low skilled programmers to produce code while a few high skilled architects and designers do the critical thinking. With this in mind, a project is often divided into requirements gathering, analysis, design, coding, testing, and so on, with decreasing skill presumably required at each step. A ‘standard process’ is developed for each step, so that low-skilled programmers, for example, can translate design into code simply by following the process.

This kind of thinking comes from mass-production, where skilled industrial engineers are expected to design production work for unskilled laborers. It is the antithesis of lean thinking and devalues the skills of the developers who actually write the code as surely as industrial engineers telling laborers how to do their jobs devalues the skills of production workers.

Centering on the people who add value means upgrading the skills of developers through training and apprenticeships. It means forming teams that design their own processes and address complete problems. It means that staff groups and managers exist to support developers, not to tell them what to do.

Flow Value From Demand (Delay Commitment)
The idea of flow is fundamental to lean production. If you do nothing but add value, then you should add the value in as rapid a flow as possible. If this is not the case, then waste builds up in the form of inventory or transportation or extra steps or wasted motion. The idea that flow should be ‘pulled’ from demand is also fundamental to lean production. ‘Pull’ means that nothing is done unless and until a downstream process requires it. The effect of ‘pull’ is that production is not based on forecast; commitment is delayed until demand is present to indicate what the customer really wants.

Pulling from demand can be one of the easiest ways to implement lean principles, as LL Bean and Lens Crafters and Dell found out. The idea is to fill each customer order immediately. In mass-production days, filling orders immediately meant building up lots of inventory in anticipation of customer orders. Lean production changes that. The idea is to be able to make the product so fast that it can be made to order. True, Dell and Lens Crafters and LL Bean and Toyota have to have some inventory of sub-assemblies waiting to be turned into a finished product at a moment’s notice. But it’s amazing how little inventory is necessary, if the process to replenish the inventory is also lean. A truly lean distribution channel only works with a really lean supply chain coupled to very lean manufacturing.
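A toy sketch of pull in Python, under deliberately simplified assumptions: a small buffer of sub-assemblies sits in front of final assembly, each customer order withdraws one, and that withdrawal is the only signal that tells the upstream process to build another. The class `PullLine` and its methods are hypothetical, and replenishment is assumed to be instantaneous. However many orders arrive, inventory never grows beyond the buffer.

```python
class PullLine:
    """Hypothetical make-to-order line with a small buffer of sub-assemblies."""

    def __init__(self, buffer_size: int):
        self.buffer_size = buffer_size
        self.on_hand = buffer_size   # sub-assemblies waiting to become finished products
        self.built = 0               # upstream builds, each triggered by a withdrawal

    def fill_order(self, order_id: str) -> str:
        if self.on_hand == 0:
            raise RuntimeError("buffer empty: upstream replenishment is not keeping pace")
        self.on_hand -= 1            # downstream withdraws one sub-assembly for this order
        self._replenish()            # the withdrawal is the signal to build one more
        return f"{order_id} shipped"

    def _replenish(self) -> None:
        # Assumption: replenishment is instantaneous. Nothing is ever built on forecast,
        # only to replace what an actual order consumed.
        self.on_hand += 1
        self.built += 1


line = PullLine(buffer_size=2)
shipped = [line.fill_order(f"order-{n}") for n in range(5)]
print(shipped[-1], line.on_hand, line.built)  # order-4 shipped 2 5
```

In a real line replenishment takes time, so the buffer has to cover the time the upstream process needs to respond; that is why a lean distribution channel depends on a lean supply chain and lean manufacturing behind it.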

The “batch and queue” habit is very hard to break. It seems counterintuitive that doing a little bit at a time at the last possible moment will give faster, better, cheaper results. But anyone designing a control system knows that a short feedback loop is far more effective at maintaining control of a process than a long loop. The problem with batches and queues is that they hide problems. The idea of lean production is to expose problems as soon as they arise, so they can be corrected immediately. It may seem that lean systems are fragile, because they have no padding. But in fact, lean systems are quite robust, because they don’t hide unknown, lurking problems and they don’t pretend they can forecast the future.
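The control-system claim can be illustrated with a toy simulation (an assumption-laden sketch, not a model of any real process). A process starts 10 units away from its target, and each step a controller removes half of the deviation it observes; the only difference between the two runs is how old the observation is. The function `total_deviation`, the gain of 0.5 and the four-step delay are arbitrary choices.

```python
def total_deviation(delay: int, gain: float = 0.5, start: float = 10.0, steps: int = 40) -> float:
    """Sum of |distance from target 0| when the controller sees measurements `delay` steps late."""
    history = [start]
    for t in range(steps):
        observed = history[max(0, t - delay)]          # feedback arrives `delay` steps late
        history.append(history[-1] - gain * observed)  # correct based on what was observed
    return sum(abs(x) for x in history)


short_loop = total_deviation(delay=0)
long_loop = total_deviation(delay=4)
print(f"short loop: {short_loop:.1f}, long loop: {long_loop:.1f}")
```

With no delay the deviation halves every step and the total settles at about 20. With a four-step delay, the same corrections keep being applied after the situation has already changed, so the process overshoots the target and swings back and forth with growing amplitude.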

In Lean Software Development, the idea is to maximize the flow of information and delivered value. As in lean production, maximizing flow does not mean automation. Instead, it means limiting what has to be transferred, and transferring that as few times as possible over the shortest distance with the widest communication bandwidth as late as is possible. Handing off reams of frozen documentation from one function to the next is a mass-production mentality. In Lean Software Development, the idea is to eliminate as many documents and handoffs as possible. Documents which are not useful to the customer are replaced with automated tests. These tests assure that customer value is delivered both initially and in the future when the inevitable changes are needed.

In addition to rapid, Just-in-Time information flow, Lean Software Development means rapid, Just-in-Time delivery of value. In manufacturing, the key to achieving rapid delivery is to manufacture in small batches pulled by a customer order. Similarly in software development, the key to rapid delivery is to divide the problem into small batches (increments) pulled by a customer story and customer test. The single most effective mechanism for implementing lean production is adopting Just-in-Time, pull-from-demand flow. Similarly, the single most effective mechanism for implementing Lean Development is delivering increments of real business value in short time-boxes.
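A back-of-the-envelope sketch of why small increments deliver value sooner, assuming ten features of equal size, one week of work each, and no per-release overhead (all hypothetical numbers). The function `average_delivery_week` reports how long, on average, the customer waits for a feature.

```python
def average_delivery_week(features: int, weeks_per_feature: float, batch_size: int) -> float:
    """Average week in which a feature reaches the customer when work is released in batches."""
    total = 0.0
    for i in range(features):
        # Feature i ships only when the whole batch containing it is finished.
        batch_end = min((i // batch_size + 1) * batch_size, features)
        total += batch_end * weeks_per_feature
    return total / features


for size in (10, 5, 1):
    print(size, average_delivery_week(features=10, weeks_per_feature=1.0, batch_size=size))
# 10 10.0
# 5 7.5
# 1 5.5
```

Delivering each feature as it is finished cuts the average wait from 10 weeks to 5.5 without doing any work faster. The model ignores per-release cost; if each release carried significant overhead, the balance would shift toward larger batches.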

In Lean Software Development, the emphasis is on pairing a skilled development team with a skilled customer team and giving them the responsibility and authority to develop the system in small, rapid increments, driven by customer priority and feedback.

Optimize across Organizations
Quite often, the biggest barrier to adopting lean practices is organizational. As products move from one department to another, a big gap often develops, especially if each department has its own set of performance measurements that are unrelated to the performance measurements of neighboring departments.
For example, let’s say that the ultimate performance measurement of a stamping room is machine productivity. This measurement motivates the stamping room to build up mounds of inventory to keep the machines running at top productivity. It does not matter that the inventory has been shown to degrade the overall performance of the organization. As long as the stamping room is measured primarily on machine productivity, it will build inventory. This is what is known as a sub-optimizing measurement, because it drives behavior that optimizes one part of the organization at the expense of the whole.

Sub-optimizing measurements are very common, and overall optimization is virtually impossible when they are in place. One of the biggest sub-optimizing measurements in software development occurs when project managers are measured on earned value. Earned value is the cost initially estimated for the tasks which have been completed. The idea is that you had better not have spent any more than you estimated. The problem is, this requires a project manager to build up an inventory of task descriptions and estimates. Just as excess inventory in the stamping room slows down production and degrades over time, the inventory of tasks required for earned value calculations gets in the way of delivering true business value and also degrades over time. Nevertheless, if there is an earned value measurement in place, project tasks are specified and estimated, and earned value is measured. When it comes to a choice between delivering business value and earned value (and it often does), earned value usually wins out.
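To make the definition concrete, here is earned value computed for a hypothetical three-task plan (task names, estimates and costs are invented for illustration). Earned value is the sum of the original estimates for completed tasks; comparing it with actual cost gives the cost variance (earned value minus actual cost), where a negative number means more was spent than estimated.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    estimate: float       # cost estimated up front, e.g. in hours
    done: bool = False
    actual: float = 0.0   # cost actually spent so far


def earned_value(tasks: list[Task]) -> float:
    # Earned value: the originally estimated cost of the tasks completed so far.
    return sum(t.estimate for t in tasks if t.done)


def actual_cost(tasks: list[Task]) -> float:
    # Everything spent so far, including work on unfinished tasks.
    return sum(t.actual for t in tasks)


# Hypothetical plan; every estimate must exist before earned value can be measured.
plan = [
    Task("design screens", estimate=40.0, done=True, actual=50.0),
    Task("build schema", estimate=24.0, done=True, actual=20.0),
    Task("write reports", estimate=60.0, done=False, actual=10.0),
]

ev = earned_value(plan)
ac = actual_cost(plan)
cost_variance = ev - ac  # negative: more spent than estimated for the work completed
print(f"earned value {ev}, actual cost {ac}, cost variance {cost_variance}")
```

Note what the calculation needs: every task listed and estimated in advance, which is the inventory described above. Nothing in it measures whether any business value reached the customer.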

To avoid these problems, lean organizations are usually structured around teams that maintain responsibility for overall business value, rather than intermediate measurements such as their ability to speculate and pad estimates. Another approach is to foster a keen awareness that the downstream department is a customer, and satisfying this internal customer is the ultimate performance measurement.

The paradigm shift that is required with lean thinking is often hindered if the organization is not structured around the flow of value and focused on helping the customer pull value from the enterprise. For this reason, software development teams are best structured around delivering increments of business value, with all the necessary skills on the same team (e.g., customer understanding / domain knowledge, architecture / design, system development, database administration, testing, system administration, etc.).

Software Development Contracts
Flow along the value stream is particularly difficult when multiple companies are involved. Many times I have heard the lament: “Everything you say makes sense, but it is impossible to implement in our environment, because we work under contracts with other organizations.” Indeed, the typical software development contract can be the ultimate sub-optimizing mechanism. Standard software contracts and supplier management practices have a tendency to interfere with many lean principles.

Manufacturing organizations used to have the same problem. For example, US automotive companies once believed the best way to reduce the cost of parts in an automobile was with annual competitive bidding. If the only thing that is important is cheap parts, competitive bidding may seem like the best way to achieve this goal. However, if overall company performance is more important, then better parts which integrate more effectively with the overall vehicle are more valuable. In fact, there is a direct correlation between an automotive company’s profitability and its degree of collaboration with suppliers.[9] When Chrysler moved from opportunistic to collaborative relationships with its suppliers in the late 1990’s, its performance improved significantly.

The software industry has some lessons to learn in the area of contractual agreements between organizations. It needs to learn how to structure collaborative relationships which maximize the overall results of both parties. A key lesson the software industry needs to learn is how to structure contracts for incremental deliveries that are not pre-defined in the contract, yet assure the customer of prompt delivery of business value appropriate to their investment. Here again, we can learn from lean production.

Lean manufacturing organizations develop a limited number of relationships with ‘trusted’ suppliers, and in turn, gain the ‘trust’ of these suppliers. What does ‘trust’ mean? “Trust [is] one party’s confidence that the other party in the exchange relationship will fulfill its promises and commitments and will not exploit its vulnerabilities.”[10] “…trust…[is] not based on greater interpersonal trust, but rather greater trust in the fairness, stability, and predictability of [the company’s] routines and processes.”[11]

It has been the practice of legal departments writing software contracts to put into contractual language all of the protections necessary to keep the other side ‘honest.’ However, the transaction costs associated with creating and monitoring such contracts are enormous. Many contracts all but demand a waterfall process, even if both companies believe this is not the best approach. It’s time that the software development industry learned the lesson of Supply Chain Management – “Extraordinary productivity gains in the production network or value chain are possible when companies are willing to collaborate in unique ways, often achieving competitive advantage by sharing resources, knowledge, and assets…. Today competition occurs between value chains and not simply between companies.”[12]

Summary and Conclusion
The lean production metaphor is a good one for software development, if it is applied in keeping with the underlying spirit of lean thinking. In the past, the application of some manufacturing concepts to software development (‘Do It Right the First Time’ comes to mind) may have lacked a deep understanding of what makes lean principles work. The underlying principles of eliminating waste, empowering front line workers, responding immediately to customer requests, and optimizing across the value chain are fundamental to lean thinking. When applied to software development, these concepts provide a broad framework for improving software development.

References

[1] The Machine That Changed the World: The Story of Lean Production, by James P. Womack, Daniel T. Jones, and Daniel Roos, New York: Rawson and Associates, 1990.

[2] The Machine That Changed the World: The Story of Lean Production, by James P. Womack, Daniel T. Jones, and Daniel Roos, New York: Rawson and Associates, 1990.

[3] The Machine That Changed the World: The Story of Lean Production, by James P. Womack, Daniel T. Jones, and Daniel Roos, New York: Rawson and Associates, 1990.

[4] Direct from Dell, by Michael Dell with Catherine Fredman, Harper Business, 1999, p. 159.

[5] Q&A with eBay's Pierre Omidyar, Business Week Online, December 3, 2001.

[6] Inside Microsoft: Balancing Creativity and Discipline, by Robert J. Herbold, Harvard Business Review, January 2002.

[7] The Machine That Changed the World: The Story of Lean Production, by James P. Womack, Daniel T. Jones, and Daniel Roos, New York: Rawson and Associates, 1990.

[8] The Machine That Changed the World: The Story of Lean Production, by James P. Womack, Daniel T. Jones, and Daniel Roos, New York: Rawson and Associates, 1990.

[9] Collaborative Advantage, by Jeffrey H. Dyer, Oxford University Press, 2000.

[10] Collaborative Advantage, by Jeffrey H. Dyer, Oxford University Press, 2000.

[11] Collaborative Advantage, by Jeffrey H. Dyer, Oxford University Press, 2000.

[12] Collaborative Advantage, by Jeffrey H. Dyer, Oxford University Press, 2000.

Published in OOPSLA Onward! - November 2002.

Screen Beans Art, © A Bit Better Corporation

August 14, 2002

Why Predictability is Bad and Surprises are Good

“Too many managers think that the key problem with product development is the surprises. They try to eliminate all variability from the process. By now you should understand that it is the uncertainty that creates the information and the information that creates the value of product development. This means that it is foolish to try to drive out variability from the development process.

“We would propose an alternative solution. We need to create processes that continue to function in a world with variation. Fortunately, there are abundant tools to do this. The primary obstacle to using them is the belief that product development is or should be deterministic. It is time to discard this notion and use the right tools. It is time to recognize that the emperor has no clothes on, and that he never will. We need to treat development as a process with inherent variability [and] approach development process design with the objective of making the process tolerant of variability as a key design objective.”

These are the words of product development expert Donald G. Reinertsen in his excellent book, Managing the Design Factory, Free Press, 1997. A co-author of Developing Products in Half the Time, John Wiley and Sons, 1991, Second Edition 1997, Reinertsen developed simple economic models for trading off development cost, unit cost, product performance and development delay. In Managing the Design Factory, he sets out to show how to apply the principles of Lean Manufacturing to Product Development, and how not to apply them.

Try-it-fix-it gives better quality faster
“There are two schools of thought as to how we might get to [a] good design. One school holds that we should strive as developers to reduce the error rates. If we keep analyzing the design to minimize the number of errors, we will get a better design on the first try.... The other school of thought says: do it, try it, fix it. This school lacks the moral high ground of the other approach, but is well-grounded in the practical observation of what works for successful companies in the real world.” says Reinertsen in Managing the Design Factory. He goes on to show that a reduced cycle time for iterations produces fewer defects in less time. “On the surface, this seems too good to be true. The try-it-fix-it approach is faster and higher quality.” Reinertsen says. However, as long as iterations do not contain significant fixed costs, the try-it-fix-it approach does in fact produce better quality faster.

The reason for this is that design processes must create information if they are to add value. Creating information involves finding failures, especially unexpected (or low probability) failures. “The fallacy in thinking that high first-pass success optimizes the design process lies in underestimating the importance of information generation that occurs with failure.”

Do It Right The Second Time
Reinertsen notes that once we learn from a failure, it is a waste to learn the same lesson again. So we need to learn how to create failure and how to avoid the same failure a second time.


August 8, 2002

XP in a Safety-Critical Environment

Recently I chanced to meet a gentleman on a plane who audits the software used in medical and pharmaceutical instruments. During our long and interesting conversation, he cited several instances where defects in software had resulted in deaths. One that comes to mind is a machine which mixed a lethal dosage of radiation [1]. We discussed how such deaths could be prevented, and he was adamant: it is a well-known fact that when developing safety-critical software, all requirements must be documented up front and all code must be traced to the requirements. I asked how one could be sure that the requirements themselves would not cause a problem. He paused and admitted that indeed, the integrity of the requirements is a critical issue, but one which is difficult to regulate. The best hope is that if a shop is disciplined in other areas, it will not make mistakes in documenting requirements.

One of the people this auditor might be checking up on is Ron Morsicato. Ron is a veteran developer who writes software for computers that control how devices respond to people. The device might be a weapon or a medical instrument, but often if Ron’s software goes astray, it can kill people. Last year Ron started using Extreme Programming (XP) for a pharmaceutical instrument, and found it quite compatible with a highly regulated and safety-critical environment. In fact, a representative of a worldwide pharmaceutical customer audited his process. This seasoned auditor concluded that Ron’s development team had implemented practices sufficiently good to be on a par with the expected good practices in the field. This was a strong affirmation of the practices used by the only XP team in the company.

However, Ron’s team did not pass the audit. The auditor was disturbed that the team had been allowed to unilaterally implement XP. He noted that the organization did not have policies concerning which processes must be used, and that no process, even one which was quite acceptable, should be independently implemented by a development team.

The message that Ron’s team heard was that they had done an excellent job using XP when audited against a pharmaceutical standard. What their management heard was that the XP process had failed the audit. This company probably won’t be using XP again, which is too bad, because Ron thinks it is an important step forward in designing better safety-critical systems.

The Yin and Yang of Safety-Critical Software
Ron points out that there are two key issues with safety-critical systems. First, you have to understand all the situations in which a hazardous condition might occur. The way to discover all of the safety issues in a system is to get a lot of knowledgeable people in a room and have them imagine scenarios that could lead to a breach of safety. In weapons development programs, there is a Systems Safety Working Group that provides a useful forum for this process. Once a dangerous scenario is identified, it’s relatively easy to build into the system a control that will keep it from happening. The hard part is thinking of everything that could go wrong in the first place. Software rarely causes problems that were anticipated, but the literature is loaded with accounts of accidents whose root causes stem from a completely unexpected set of circumstances. Causes of accidents include not only the physical design of the object, but also its operational practices [2]. Therefore, the most important aspect of software safety is making sure that all operational possibilities are considered.

The second issue with safety is to be sure that once dangerous scenarios are identified and controls are designed to keep them from happening, future changes to the system take this prior knowledge into account. The lethal examples my friend on the airplane cited were cases in which a new programming team was suspected of making a change without realizing that the change defeated a safety control. The point is, once a hazard has been identified, it probably will be contained initially, but it may be forgotten in the future. For this reason, it is felt that all changes must be traced to the initial design and requirements.

Ron has noticed an inherent conflict in these two goals. He is convinced that the best way to identify all possible hazard scenarios is to continually refactor the design and re-evaluate the safety issues. Yet the traditional way to avoid forgetting about previously identified failure modes is to freeze the design and trace all code back to the original requirements.

Ron notes that up until now there were two approaches: the ‘ad hoc’ approach and the ‘freeze up front’ approach. The ‘ad hoc’ approach might identify more hazards, but it will not ensure that they will continue to be addressed through the product lifecycle. The ‘freeze up front’ approach ensures that identified failure modes have controls, but it is not good at finding all the failure modes. Theoretically, a good safety program employs both approaches, but when a new hazard is identified there is a strong impetus to pigeonhole a fix into the current design so as not to disturb the audit trails spawned by policing a static design. XP is a third option, one that is much better at finding all the failure modes, yet can include the discipline to protect existing controls.

Requirements Traceability
My encounter on the plane told me that those who inspected Ron’s XP process would be looking for traceability of code to requirements. Since his XP processes fared well under review, I wondered how he satisfied inspectors that his code was traceable to requirements. Did he trace code to requirements after it was written?

“Just because you’re doing XP doesn’t mean you abandon good software engineering practices,” Ron says. “It means that you don’t have to pretend that you know everything there is to know about the system in the beginning.” In fact, XP is quite explicit about not writing code until there is a user story calling for it to be written. And Ron points out that the user stories are the requirements.

The important thing about requirements, according to Ron, is that they must reflect the customer’s perspective of how the device will be used. In a DOD contract, requirements stem from a document aptly named the Operational Requirements Document, or ORD. In a medical device development, requirements would be customer scenarios about how the instrument will be used. Sometimes initial requirements are broken down into more detail, but that process results in derived requirements, which are actually the start of the design. When developing safety-critical systems, it is necessary to develop a good understanding of how the device will be used, so derived requirements are not the place to start. The ORD or customer scenarios, along with any derived requirements that have crept in, should be broken down into story cards.

In order to use XP in a safety environment, the customer’s representatives working on story cards should 1) be aware of the ORD and/or needs of system users and able to distinguish between originating and derived requirements, 2) have a firm understanding of system safety engineering, preferably as a member of the System Safety Working Group, and 3) have the ear and confidence of whatever change authority exists. Using XP practices with this kind of customer team puts in place the framework for a process that maintains a system’s fitness for use during its development, continually reassesses the risk inherent in the system, and facilitates the adaptation of risk reduction measures.

Refactoring
Ron finds that refactoring a design is extremely valuable for discovering failure scenarios in embedded software. It is especially important because you never know how the device will work at the beginning of software development. Ron notes that many new weapons systems will be built with bleeding edge technology, and any new pharmaceutical instrument will be subject to the whims of the marketplace. So things change, and there is no way to get a complete picture of all the failure modes of a device at the beginning of the project. There is a subtler but equally important advantage to refactoring. The quality of a safety control will be improved because of the opportunities to simplify its design and deal with the inevitable design flaws that will be discovered.

“It’s all about feedback. You try something, see how it works, refactor it, improve it.” In fact, the positive assessment of the auditor notwithstanding, if there is one thing that Ron’s team would do differently next time, it is to do more refactoring. Often they knew they should refactor, but forged ahead without doing it. “We did a root-cause analysis of our bugs, and concluded that when we think refactoring might be useful, we should go ahead and do it.”

It is dangerous to think that all the safety issues will be exposed during an initial design, according to Ron Morsicato. It is far better to review safety issues on a regular basis, taking into account what has been learned as development proceeds. Ron is convinced that when the team regularly thinks through failure scenarios, invariably new ones will be discovered as time goes on.

Refactoring activities must be made visible to the business side of the planning game, for it is from there that the impetus to reevaluate the new design from a systems safety aspect needs to occur. Ron believes that if the system safety designers feel that impetus and take on an “XP attitude,” then the benefits of both the “ad hoc” and “freeze” approaches can be realized. The customer-developer team will achieve a safer design by keeping the system as simple as possible, helping them to achieve a heightened focus on the safety of its users.

Testing
The most important XP discipline is unit testing, according to Ron Morsicato. He noted that too many managers ignore the discipline of thorough testing during development, which tends to create a ‘hacker’s’ environment. When presented with a ton of untested code, developers are presented with an impossible task. Random fixes are often applied, the overall design gets lost, and the code base becomes increasingly messy.

Ron feels that no code should be submitted to a developing system unless it is completely unit tested, so the systems debuggers need only look at the interfaces for causes of defects. Instead of emphasizing sequential steps in development and thorough documentation, emphasizing rigorous in-line testing will result in better code. When coupled with effective use of the planning game and regular refactoring, on-going testing is the best way to develop safe software.

The XP testing discipline provides a further benefit for safety-critical systems. By assuring that all safety controls have tests that run every time the system is changed, it is easier to be sure that safety controls cannot be broken as the software undergoes inevitable future changes.
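As an illustration of this practice, the following Python sketch pins a single safety control in place with tests. The dose limit, the `approve_dose` function and the `UnsafeDoseError` exception are all hypothetical; they are not drawn from Ron’s project or from any real instrument. The only point is that each identified hazard gets a test that runs with every change, so a later change that weakens the control fails immediately instead of being forgotten.

```python
# Hypothetical safety control. MAX_DOSE_CGY, approve_dose and UnsafeDoseError
# are illustrative assumptions, not taken from any real device or project.
MAX_DOSE_CGY = 200.0  # assumed upper bound for a single treatment, in centigray


class UnsafeDoseError(ValueError):
    """Raised when a requested dose would violate the safety control."""


def approve_dose(requested_cgy: float) -> float:
    """Safety control: reject any dose outside the range (0, MAX_DOSE_CGY]."""
    # Written as "not (in range)" so that NaN is also rejected.
    if not (0.0 < requested_cgy <= MAX_DOSE_CGY):
        raise UnsafeDoseError(
            f"dose {requested_cgy} cGy outside (0, {MAX_DOSE_CGY}]"
        )
    return requested_cgy


# Tests that pin the control in place; meant to run on every change.
def test_dose_at_limit_is_allowed():
    assert approve_dose(MAX_DOSE_CGY) == MAX_DOSE_CGY


def test_dose_above_limit_is_rejected():
    try:
        approve_dose(MAX_DOSE_CGY + 0.1)
    except UnsafeDoseError:
        return
    raise AssertionError("safety control failed to reject an overdose")


def test_nonpositive_dose_is_rejected():
    for bad in (0.0, -5.0):
        try:
            approve_dose(bad)
        except UnsafeDoseError:
            continue
        raise AssertionError(f"safety control accepted dose {bad}")


if __name__ == "__main__":
    test_dose_at_limit_is_allowed()
    test_dose_above_limit_is_rejected()
    test_nonpositive_dose_is_rejected()
    print("all safety-control tests passed")
```

The tests deliberately check the boundary and the values just outside it, since those are the cases a well-meant later change is most likely to break.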

Us vs. Them
I asked Ron what single thing was the most important trait of a good manager. He replied without hesitation, “Managers who give you the big picture of what you are supposed to achieve, rather than telling you what to do, are far and away the best managers. Developers really do not like managers telling them how to do their job, but they don’t appreciate a hacking environment either.” One thing Ron has observed in his experiences is that the increasing pressure on developers to conform to a specific process has created an “us” vs. “them” mentality. The word “process” has become tainted among developers; it means something imposed by people who have lost touch with the realities of code development. Developers accuse ‘them’ of imposing processes because they sound good in theory, and are a foolproof way of passing an auditor’s comparison of the practice to a particular standard. Developers find themselves overloaded with work that they feel they don’t have to do in order to produce good code. The unfortunate consequence of this is that anything said by the “process camp” tends to be disregarded by the “developer camp.” This leads to an unwillingness to adopt a good practice just because the process people support it.

According to Ron, XP is a process that doesn’t feel like a process. It’s presented as a set of practices that directly address the problems developers continually run into from their own perspective. When engaged in any of the XP practices, a developer has a sense of incrementally contributing to the quality of the product. This reinforces developers’ commitment to quality and strengthens their confidence that they are doing the right thing. If I were getting some medical treatment from a device that looked like it could kill me if someone made the wrong move, I’d certainly hope that the engineers who developed the gizmo had that confidence and commitment.

Software developers will deliver high quality code if they clearly understand what quality means to their customer, if they can constantly test their code against that understanding, and if they regularly refactor. Keeping up with change, whether the emerging insights into the system design, the inevitable improvements in device technology, or the evolving customer values, is critical to their success. With good management, they will look upon themselves as members of a safety team immersed in a culture of safety.

The Audit
Ron’s team implemented XP practices with confidence and dedication, met deadlines that would have been impossible with any other approach, and delivered a solid product, while adhering to practices that met pharmaceutical standards. And yet even though these practices were praised by a veteran auditor, the organization failed the audit at the policy level. What went wrong?

The theory of punctuated equilibrium holds that biological species are not likely to change over a long period of time because mutations are usually swamped by the genes of the existing population. If a mutation occurs in an isolated spot away from the main population, it has a greater chance of surviving. This is like saying that it is easier for a strange new fish to grow large in a small pond. Similarly, disruptive technologies [3] (new species of technologies) do not prosper in companies selling similar older technologies, nor are they initially aimed at the markets served by the older technologies. Disruptive technologies are strange little fish, so they only grow big in a small pond.

Ron’s project was being run under a military policy, even though it was a commercial project. If the company policy had segmented off a commercial area for software development and explicitly allowed the team to develop its own process in that segment, then the auditor would have been satisfied. He would not have seen a strange little fish swimming around in a big pond, looking different from all the other fish. Instead he would have seen a new little fish swimming in its own, ‘official’ small pond. There XP practices could have thrived and grown mature, at which time they might have invaded the larger pond of traditional practices.

But it was not to be. The project was canceled, a victim not of the audit but of the economy and a distant corporate merger. Today the thing managers remember about the project is that XP did not pass the audit. The little fish did not survive in the big pond.
__________________
Footnotes:

[1] What he was probably referring to was the Therac-25 series of accidents, where indeed they had suspect software practices, including after-the-fact requirements traceability.

[2] For a comprehensive account of accidents in software based systems, see Safeware: System Safety and Computers, Nancy G. Leveson, Addison-Wesley, 1995

[3] See The Innovator’s Dilemma, by Clayton M. Christensen, Harper-Business edition, 2000.

Published in Cutter IT Journal – September, 2002

April 16, 2002

Righteous Contracts

Right"eous a. Doing that which is right; yielding to all their due; just; equitable.
[Webster’s Revised Unabridged Dictionary, 1913]

Righteous contracts. A good name for contracts whose purpose is to assure that the parties act in a just and equitable manner and yield to the other their due. Righteous contracts are those governing investments in specialized assets – assets which are very important to a business, but have no value anywhere else. For example, software developed specifically for a single company is a specialized asset, since it is useful only to the company for which it was developed. Agreements to develop specialized assets create a bilateral monopoly; that is, once the parties start working together, they have little option but to continue working together. This bilateral monopoly provides an ideal environment for opportunistic behavior on the part of both supplier and customer.

Thus the purpose of righteous contracts is to prevent opportunistic behavior, to keep one party from taking advantage of another when market forces are not in a position to do so. In a free market where there are several competent competitors, market forces control opportunism. This works for standard or commodity components, but not for specialized assets.

The traditional way to develop specialized assets has been to keep the work inside a vertical organization, where opportunism is controlled by administration. Inside a company, local optimization would presumably be prevented by someone positioned to arbitrate between departments for the overall good of the enterprise. Vertical integration allows a company to deal with uncertainty and change in a rapid and adaptive manner.

Outsourcing
Recently, however, outsourcing has become common in many companies, for very good reasons. An outside company may have lower labor costs or more specialized experience in an area that is not one of the firm’s core competencies. The cost of producing a service or asset can be considerably lower in an outside company. Of course, there are transaction costs associated with outsourcing, and the total cost (production costs plus transaction costs) must be lower, or vertical integration would make more sense.

Transaction costs associated with outsourcing include the cost of selecting potential suppliers, negotiating and renegotiating agreements, monitoring and enforcing the agreement, billing and tracking payments. Transaction costs also include inventory and transportation above that needed for vertical integration. In addition, there are risks associated with outsourcing, which may result in additional costs. One cost would be that of diminished communication. For example, development of any kind usually requires intense communication between various technical specialties and target users. If distance or intellectual property issues reduce the communication, it will cost more to develop the asset and the results may suffer as well. In addition, moving a specialized skill outside the company may incur lost opportunity costs.

There are two types of contracts used for developing specialized assets: contracts executed before the supplier does the development, and contracts executed after the supplier does the work. A contract executed before work is done is known as a before-the-fact (or ex ante) contract. There are two types of before-the-fact contracts: fixed price contracts and flexible (time-and-materials) contracts. Once these contracts are executed, they set up a bilateral monopoly, fraught with opportunities for exploitation on one side or the other. Therefore, the purpose of these contracts is to set up control systems to prevent exploitation.

A contract executed after work is done is called an after-the-fact (or ex post) contract. Suppose a supplier develops a system that it thinks a customer will find valuable and then tries to sell the system. In this case, control comes after the fact; the supplier makes its own decisions, and its reward is based on the results. Of course this is a risky proposition, so the supplier has to hedge its bets. One way to do this is to sell the system to multiple customers, basically making it into a commodity product. But this doesn’t help a company that wants suppliers to develop proprietary components for them. In order to entice suppliers to develop specialized assets prior to a contract, a company usually sets up a sole source or favored source program. If a company treats its favored suppliers well, the suppliers develop confidence that their investments will be rewarded and continue to make investments.

On the surface, after-the-fact contracts may seem implausible for software development, but in fact, they are the best solution for contracting a development project. Moreover, the best kind of development processes to use inside a company are those that mimic after-the-fact contracts. How can this be? The explanation starts by understanding why before-the-fact contracts provide poor governance for development projects.

Fixed-Price Contracts
Let’s examine the most commonly used before-the-fact contract, the fixed price contract. A key motivator for fixed price contracts is the desire of a customer to transfer risk to the supplier. This may work for simple, well-defined problems, but it is inappropriate for wicked problems.[1] If the project is complex or uncertain, a fixed price contract transfers a very high risk to the supplier. If the supplier is not equipped to deal with this risk, it will come back to haunt the customer.

Risk should be borne by the party best able to manage it. If a problem is technically complex, then the supplier is most likely to be in a position to manage it. If a problem is uncertain or changing, then the customer is in the best position to manage it. Transferring the risk for such problems to the supplier is not only unfair, it is also unwise. There is no such thing as a win-lose contract. If a supplier is trapped on the wrong side of a win-lose contract, the bilateral monopoly which has been formed will trap the customer as well. Both sides lose in the end.

Fixed price contracts do not usually lower cost, because there is always at least some risk in estimating the cost. If the supplier is competent, it will include this risk in the bid. If the supplier does not understand the complexity of the problem, it is likely to underbid. The process of selecting a supplier for a fixed-price contract has a tendency to favor the most optimistic (or the most desperate) supplier. Consequently, the supplier least likely to understand the project’s complexity is most likely to be selected. Thus fixed price contracts tend to select the supplier most likely to get in trouble.
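The selection effect described above can be illustrated with a small Monte Carlo sketch in Python. It assumes, purely for illustration, that every supplier estimates the same true cost with normally distributed error and adds the same margin, so the lowest bid always comes from the most optimistic estimate. The constants (`TRUE_COST`, `ESTIMATE_SPREAD`, `MARGIN`) are arbitrary assumptions, not data from any real tender.

```python
import random
import statistics

TRUE_COST = 100.0      # assumed actual cost of the project (arbitrary units)
ESTIMATE_SPREAD = 0.2  # assumed std. dev. of estimation error, as a fraction of cost
MARGIN = 0.10          # assumed markup every supplier adds to its own estimate


def run_tender(n_suppliers: int, rng: random.Random) -> float:
    """Return the winning (lowest) bid in one fixed-price tender."""
    # Each supplier estimates the true cost with independent random error.
    estimates = [TRUE_COST * rng.gauss(1.0, ESTIMATE_SPREAD) for _ in range(n_suppliers)]
    # Everyone adds the same margin, so the lowest bid belongs to the lowest estimate.
    bids = [e * (1 + MARGIN) for e in estimates]
    return min(bids)


def average_outcome(n_suppliers: int, trials: int = 10_000, seed: int = 1) -> tuple[float, float]:
    """Return (mean winning bid, fraction of tenders where the winner bid below true cost)."""
    rng = random.Random(seed)
    winning_bids = [run_tender(n_suppliers, rng) for _ in range(trials)]
    mean_bid = statistics.fmean(winning_bids)
    loss_rate = sum(1 for b in winning_bids if b < TRUE_COST) / trials
    return mean_bid, loss_rate


if __name__ == "__main__":
    for n in (1, 3, 5, 10):
        mean_bid, loss_rate = average_outcome(n)
        print(f"{n:2d} bidders: mean winning bid {mean_bid:6.1f}, "
              f"winner bids below true cost {loss_rate:.0%} of the time")
```

Under these assumptions, adding bidders pushes the winning bid further below cost plus margin and raises the share of tenders won by a supplier who has underbid the true cost, which is the pattern the paragraph describes. Real bidders differ in margin and competence, so the sketch shows only the direction of the effect, not its size.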

Therefore it is quite common for the customer to find a supplier unable to deliver on a fixed price contract. Because the customer no longer has the option to choose another supplier, they must often come to the rescue of the supplier. Alternatively, the supplier might be able to cover its loss, but most likely it will attempt to make the loss up through change orders which add more revenue to the contract. This leads the customer to aggressively avoid any change to the contract. Faced with no other way to recoup the loss, a supplier will be motivated to find ways to deliver less than the customer really wants, either by lowering the quality or reducing the features.

The customer using fixed price contracts to transfer responsibility and risk will often find both back on their plates in the end, and if so, they will be worse off because of it.

Flexible Contracts
“Customers should prefer flexible-price contracts to fixed-price contracts where it is cheaper for the customer to deal with uncertainty than it is for the contractor to do so or where the customer is more concerned with the ability of the contractor to provide a product that works than with price,” writes Fred Thompson in the Handbook of Public Administration (Second Edition), Rabin, Hildreth, Miller, editors, New York: Marcel Dekker, Inc., 1998.

The flexible-price contract is designed to deal with uncertainty and complexity, but it does not do away with risk, it simply shifts it from the supplier to the customer. For example, after the DOD (U.S. Department of Defense) experienced some very high profile bailouts on fixed price contracts, it began to use more flexible-price contracts in situations where the government was better able to manage the risk. Of course, with the risk transferred to the customer, the supplier has little incentive to contain costs in a flexible-price contract, a point that did not escape contract negotiators at DOD. In order to protect the public interest, DOD perfected controls imposed on the supplier.

Controlling suppliers of flexible-price contracts evolved into a discipline called project management. The waterfall lifecycle grew out of military contracts, and an early focus of PMI (Project Management Institute) was DOD contracts. Companies with DOD contracts not only hire administrators to oversee compliance with contract requirements, they also add accountants to sort out allowable and unallowable costs. Flexible-price contracts invariably have high transaction costs, due to the high cost of control.

Controls Do Not Add Value
High transaction costs would be reasonable if they added value, but in fact, transaction costs are by definition non-value-adding costs. Fred Thompson (Ibid.) notes, “Controls contribute nothing of positive value; their singular purpose lies in helping us to avoid waste. To the extent that they do what they are supposed to do, they can generate substantial savings. But it must be recognized that controls are themselves very costly.”

One way to avoid the high cost of control in flexible-price contracts is not to use them. It may be better to do development internally, where it is easier to deal with uncertainty and respond to change. The question is, on what basis should an outsourcing decision be made? Thompson (Ibid.) counsels, “The choice of institutional design should depend upon minimizing the sum of production costs and transactions costs.” He also notes, “Vertical integration occurs because it permits transaction or control costs to be minimized.”

An interesting problem with this equation is that vertical integration does not always work to minimize control costs. In fact, many organizations find themselves using DOD-like project management controls internally. It seems incongruous that control mechanisms which add cost but not value, and which were invented to prevent opportunistic behavior, would come to dominate development in the very place where they should not be needed. If the reason to develop internally is to provide flexibility in the face of uncertainty, then costly, change-resistant control practices are inappropriate. Traditional project control practices (that freeze requirements, require approval for changes, and track tasks instead of features) have a tendency to create waste, not value, when used inside a company.

After-the-fact Contracts
Let’s assume for the sake of argument that the choice has been made to outsource a complex, specialized development effort. The next question is, how can transaction costs be reduced? In the manufacturing industry, this is done with after-the-fact contracts.

Despite the obvious risks, it is not uncommon for suppliers to develop specialized components for a manufacturer prior to obtaining a contract. For example, 3M Optical Systems Division used to develop optically precise lenses for specialized automotive taillights. The reward was a one year contract for a specific model. Unfortunately, after the first year, the automotive company would invariably find a cheaper way to make a similar lens, and Optical Systems would lose the business before it had recovered its investment. The division eventually decided that after-the-fact contracts with Detroit automakers were not profitable and left the business.

There are ways to make after-the-fact contracts work better. Toyota awards contracts for an entire run of a model, and uses target costing to manage costs. Thus a supplier knows that if it wins the business, it can recover its investment, while the customer is confident that the supplier will work to reduce costs in line with its economic requirements. In addition, the supplier understands that it will receive favored consideration for similar components in the future.

After-the-fact contracts require two elements to work: shared risk and trust. Toyota shares the risk with a component supplier by guaranteeing the business over the life of a model. Both parties agree to work together to try to meet a target cost profile over the life of the agreement. Note that meeting future target costs is neither guaranteed nor is it the sole responsibility of the supplier. In the best relationships, technical personnel from each company work freely together without worrying about proprietary information, both to meet target costs and to develop new components not yet subject to a contract.

If both parties are pleased with the results of the first contract, they develop trust and a good working relationship, and are more likely to continue to do business together. The supplier is inclined to risk more in developing new components when it has developed confidence that the investment will pay off. This kind of relationship can achieve all of the benefits of both outsourcing and vertical integration combined.

But Software is Different…
You might be saying to yourself, this is fine if there is something to be manufactured and sold many times over, like a taillight, but in software we develop a system only once, it is complex and expensive, it is subject to many changes, and if it is not properly designed and executed, huge waste might result. Where is the parallel to THIS in manufacturing?

Consider the large and expensive metal dies which stamp out vehicle body panels. The cost of developing dies can account for half of a new model’s capital investment. Consequently, a great deal of time is spent in all automotive companies working to minimize the cost of these dies. The approach in Japan is distinctly different from that in the U.S., and dramatically more effective. The best Japanese companies develop stamping dies for half the cost and in half the time of their counterparts in the West. The resulting Japanese dies will be able to stamp out a body panel in 70% of the time needed by U.S. stamping operations.

From the classic book Product Development Performance by Clark and Fujimoto, Harvard Business School Press, 1991:

Japanese firms use an ‘early design, early cut’ approach, while U.S. practice is essentially “wait to design, wait to cut.”

Because it entails making resource commitments while the body design is still subject to frequent changes, the Japanese early design, early cut approach entails significant risks of waste and duplication of resources…. Many engineering changes occur after final release of blueprints. At peak, hundreds of changes are ordered per month.

Behind the wait to design, wait to cut approach in U.S. projects is a desire to avoid expensive die rework and scrappage, which we would expect to be an inevitable consequence of the bold overlapping that characterizes the Japanese projects. However, our study revealed a quite different reality. U.S. firms, despite their conservative approach to overlapping, were spending more on engineering changes than Japanese firms. U.S. car makers reported spending as much as 30-50 percent of original die cost on rework due to engineering changes, compared to a 10-20 percent margin allowed for engineering changes by Japanese products.

The Japanese cost advantage comes not from lower wages or lower material prices, but from fundamental differences in the attitudes of designers and tool and die makers toward changes and the way changes are implemented…. In Japan, when a die is expected to exceed its cost target, die engineers and tool makers work to find ways to compensate in other areas…. Die shops in high-performing companies develop know-how techniques for absorbing engineering changes at minimum cost…. In the United States, by contrast, engineering changes have been viewed as profit opportunities by tool makers….

Suppose a body engineer decides to change the design of a panel to strengthen body-shell rigidity. The high performers tend to move quickly. The body designer immediately instructs the die shop to stop cutting the die on the milling machine. Without paperwork or formal approval, the body designer goes directly to the die shop, discusses modifications with the die engineers, checks production feasibility, and makes the agreed-upon changes on the spot. Unless the changes are major, decisions are made at the working level. Traditionally, the die shop simply resumes working on the same die. Paperwork is completed after the change has been made and submitted to supervisors for approval. The cost incurred by the change is also negotiated after the fact. The attitude is “change now, negotiate later.”

In companies in which die development takes a long time and changes are expensive, the engineering change process is quite different. Consider the context in which changes occur. In extreme versions of the traditional U.S. system, tool and die makers are selected in a competitive bidding process that treats ‘outside’ tool shops as providers of a commodity service. The relationship with the die maker is managed by the purchasing department, with communication taking place through intermediaries and drawings. The individuals who design the dies and body panels never interact directly with the people who make the dies.

You would think that tool and die makers in Japan must be a department inside the automotive company. How else could it be possible for a designer to walk into a tool and die shop, stop the milling, make changes, and start up the milling again, leaving approvals and cost negotiations for later? But this is not the case. Tool and die makers are supplier companies in Japan, just as they are in the U.S. The difference lies in the attitudes of the different countries toward supplier contracts.

For Toyota in particular, a supplier is a partner. The basis of this partnership is a target cost for each area of the car. This translates into target costs for all development activities, including dies. Of course, U.S. companies have target costs for each component also, but they tend to impose the cost on the supplier without regard to feasibility. This tends to create a win-lose relationship, leaving the supplier no option but to recoup costs through the change process.

In contrast, Toyota does not impose cost targets on suppliers that it does not know how to meet, and engineers from both companies work together to meet target costs. If something goes wrong and the targets cannot be met, Toyota shares the problem in an equitable manner. In this win-win environment, arm’s-length exchange of information through written documentation and an extensive change approval process are unnecessary.

The Toyota Production System is founded on the premise that superior results come from eliminating anything which does not add value. Since control systems do not add value, they must be minimized, just like inventory and set-up times. Therefore supplier partnerships based on shared risk and trust are the preferred relationship. The hallmarks of these partnerships are worker-level responsibility for meeting business goals, intense communication at the technical level, a stop-the-line and fix-it-immediately attitude, and an emphasis on speed. Even for large, one-of-a-kind development projects which require highly specialized design, this approach produces dramatically superior results.

Can this work for Software Development?
Developing specialized dies is not much different from developing specialized software. The key is to establish a partnership relationship which allows true development to take place. Development is done using a repeated cycle of design-build-test, allowing the solution to emerge. The question is, how can a contract be written to support the emergent nature of development?

Neither fixed-price nor flexible-price contracts support the nature of software development. Development always involves tradeoffs, and an organization which facilitates the best tradeoff decisions will produce the best result. Before-the-fact contracts do not support the give-and-take between developers and customers necessary to make the best tradeoffs. A developer should not have to worry about who will pay for dealing with problems as they arise, but with before-the-fact contracts, this work has to be paid for by one company or the other. Since every hour must be accounted for, the give-and-take necessary for tradeoff decisions is discouraged.

What is needed is a contract approach which allows developers and customers to work closely together to deliver business value for a target cost. Examples of how to do this in a vertical organization abound. There are many successful examples of using Scrum for product development. Microsoft’s approach to product development is documented by Michael Cusumano in Microsoft Secrets, Simon and Schuster, 1998. The general approach is to set a clear business goal, fix resources, prioritize features, deliver working software in short cycles, and stop working on features when time runs out. This approach has a track record of delivering systems, even large ones, in a predictable timeframe for a predictable cost.
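The general approach of fixing resources, prioritizing features, delivering in short cycles and stopping when time runs out can be sketched as a small planning loop. This is a minimal illustration under stated assumptions, not Microsoft’s or Scrum’s actual procedure: the `Feature` type, the effort "points," the fixed capacity per cycle and the function name `plan_timeboxed_release` are all hypothetical, and features are taken in strict priority order.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    priority: int  # hypothetical: lower number means more important
    effort: int    # hypothetical estimate in "points"


def plan_timeboxed_release(features, capacity_per_cycle, cycles):
    """Fix resources and time; let the feature set vary.

    Features are built in strict priority order. Each cycle is filled
    until the next feature no longer fits; that feature waits for the
    next cycle. When the cycles run out, work simply stops.
    Returns (delivered, not_built), where delivered is a list of
    (cycle_number, feature_name) pairs.
    """
    backlog = sorted(features, key=lambda f: f.priority)
    delivered = []
    for cycle in range(1, cycles + 1):
        remaining = capacity_per_cycle
        while backlog and backlog[0].effort <= remaining:
            feature = backlog.pop(0)
            remaining -= feature.effort
            delivered.append((cycle, feature.name))
    # The lowest-priority features are the ones left unbuilt when time runs out.
    return delivered, [f.name for f in backlog]


if __name__ == "__main__":
    wish_list = [
        Feature("search", priority=1, effort=3),
        Feature("checkout", priority=2, effort=4),
        Feature("receipts", priority=3, effort=2),
        Feature("reports", priority=4, effort=5),
        Feature("themes", priority=5, effort=1),
    ]
    delivered, not_built = plan_timeboxed_release(wish_list, capacity_per_cycle=6, cycles=2)
    print(delivered)   # [(1, 'search'), (2, 'checkout'), (2, 'receipts')]
    print(not_built)   # ['reports', 'themes']
```

Cost and schedule are the fixed inputs, and the feature set is the output. The sketch assumes every feature fits within a single cycle. A feature larger than one cycle's capacity would block everything behind it and would need to be split first.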

The question is, how can a contract be written to support the same approach? The answer is to move to after-the-fact contracts in which a supplier is paid for the value of the work they do. It works like this: A customer has a clearly defined business value and a target cost in mind for achieving that value. This target cost includes payments to a supplier for their contributions. The customer comes to an agreement with a supplying partner that the business value and the target cost are achievable, including the target cost for the supplier’s participation. Work proceeds without contractual guarantees that the value will be delivered or the target cost will be achieved, but both partners are committed to meet these goals.

Workers at each company use adaptive processes[2] to develop the system as a single team. They communicate intensely at the developer-customer level to make the necessary tradeoffs to achieve the value within the target cost. As working software is delivered, both supplier and customer work together using velocity charts to monitor development progress. If adjustments to the business value or the target cost structure are required, these become apparent early, when they can be addressed by limiting the feature list or extending the schedule. If this changes the business value or target cost, the parties negotiate an equitable way to share the burden or benefit.
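The velocity-chart monitoring described above comes down to simple arithmetic. A velocity chart plots the work completed in each iteration. In this sketch, work is measured in hypothetical story points, velocity is assumed to be the average of the last three iterations, and `forecast` is an illustrative helper rather than a standard tool. The `scope_gap` figure is the amount of remaining work that is not expected to fit in the planned schedule at the current velocity. This is the early signal that the partners can act on by limiting the feature list or extending the schedule.

```python
import math


def average_velocity(completed_per_iteration, window=3):
    """Mean work completed per iteration over the most recent `window` iterations."""
    if not completed_per_iteration:
        raise ValueError("no iterations completed yet")
    recent = completed_per_iteration[-window:]
    return sum(recent) / len(recent)


def forecast(total_work, completed_per_iteration, planned_iterations, window=3):
    """Project whether the remaining work fits in the remaining planned iterations."""
    done = sum(completed_per_iteration)
    remaining = max(0, total_work - done)
    velocity = average_velocity(completed_per_iteration, window)
    if velocity <= 0:
        raise ValueError("no progress recorded; cannot project")
    iterations_needed = math.ceil(remaining / velocity)
    iterations_left = max(0, planned_iterations - len(completed_per_iteration))
    # Work not expected to fit in the planned schedule at the current velocity.
    scope_gap = max(0, remaining - velocity * iterations_left)
    return {
        "done": done,
        "remaining": remaining,
        "velocity": velocity,
        "iterations_needed": iterations_needed,
        "iterations_left": iterations_left,
        "on_track": iterations_needed <= iterations_left,
        "scope_gap": scope_gap,
    }


if __name__ == "__main__":
    # Hypothetical project: 120 points planned over 6 iterations, 3 completed so far.
    f = forecast(120, [10, 12, 14], planned_iterations=6)
    print(f["on_track"], f["scope_gap"])  # False 48.0
```

In this made-up example, about 48 points of lower-priority features would have to be dropped, or roughly four more iterations added, for the project to finish. Whichever option is chosen, the shortfall shows up halfway through the project, not at the end.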

Conclusion
Trust-based partnerships are the first requirement to make after-the-fact contracts work. Partnerships are necessary to facilitate worker-level responsibility for meeting business goals, intense communication between developers and users to make optimal tradeoffs, daily builds and automated testing to facilitate a fix-it-immediately attitude, and a focus on early delivery of working software to create the feedback system critical to good development.

Companies that develop contracts allowing these values to flourish can expect to produce the same dramatically superior results in software development that these values produce in product development.

Lessons for Outsourcers
If your company outsources software development, consider the following:

1. Fixed Price Contracts
Fixed price contracts are risky. There is both a technical risk that the job can’t be done for the allotted cost and the very real risk that the selection process favors less knowledgeable suppliers. If you ensure that you get a competent supplier, then you can be sure the supplier will add a good margin to the cost to cover their risk. Remember that risk should be borne by the party most able to manage it, so if the project is complex and changes are likely, you should assume the risk. If the project is a wicked project[1], you should not even consider a fixed price contract.

If you are considering a fixed price contract, you are probably interested in transferring all responsibility to your supplier. But remember, if this results in a win-lose situation, you will not win. You are going to be committed to the supplier before the cracks begin to show, and if things go wrong, you will suffer as much as your supplier, if not more. You may have to bail them out. They will no doubt be looking to make up for losses through change orders, so you will have to control these aggressively. That means if your project is prone to uncertainty or change, you really don’t want a fixed price contract.

And finally, it is much more difficult to get what you really need under a fixed price contract. If you got the low bidder, you probably did not get the supplier most familiar with your domain. If the bid was too low, your supplier will want to cut corners. This may mean less testing, fewer features, a clumsy user interface, or out-of-date technology. You are going to need to carefully limit user feedback to control changes and keep the price in line, which will make it more difficult to get what your users really want.

2. Traditional Control Processes
Traditional project management processes tend to emphasize scope management using requirements traceability and an authorization-based change control system. Typically cost control is provided with some variation of an earned value measurement. The first thing to realize is that all of these techniques are expensive and do not add any value to the resulting software. These practices are meant to control opportunism, and if you are concerned that your supplier might take advantage of you, they might make sense. (But try partnerships first.)

You most likely do not want to be using these practices inside your own company; they get in the way of good software development. It’s pretty well known that an iterative approach to software development, with regular user communication and feedback, is far better than the waterfall approach. However, those pesky project management practices tend to favor waterfalls. It’s a good bet that your project will be subject to change (users change their preferences, technology changes, someone forgot a critical requirement), so you want to be using adaptive processes.

3. Trust-based Partnerships
For starters, all internal development should be based on trust-based partnerships – after all, that’s why you are doing inside development in the first place! If you can’t trust someone in your own company, who can you trust?

The fastest, cheapest way to develop software with a supplier is to let their technical people make decisions based on close interaction with and regular guidance from your users. You get the best results and the happiest users this way too. This kind of relationship requires risk sharing and excellent on-going communications. In exchange for this investment, trust-based partnerships adapt well to change and uncertainty and are most likely to yield faster, better, cheaper results.

Lessons for Contractors
If your company supplies software development, consider the following:

1. Fixed Price Contracts
You owe it to your customers to educate them on the pitfalls of fixed price contracts. Make sure they understand that this will make it more difficult for you to deliver the best business value.

2. Traditional Control Processes
Don’t accept traditional control mechanisms; there are better ways. Instead, use prioritized feature sets, rapid iterations and velocity charts to monitor projects.

Never allow the customer to fix cost, schedule and features simultaneously. Preferably, you want to agree to meet cost, schedule and overall business value targets, and provide a variable feature set. If the detailed feature set is not negotiable, then at least one of the other two must be flexible.

Find out what is REALLY important to your customer in terms of business value and deliver that.

3. Trust-based Partnerships
Your top priority when negotiating the relationship is to ensure that your development team will have constant user involvement and feedback. You can negotiate what this means and who represents the user, but if you don’t have access to users or a user proxy, you will have a difficult time delivering business value. And delivering business value must be your main objective.
____________________
Footnotes:

[1] A Wicked Problem is one in which each attempt at creating a solution changes the understanding of the problem. See “Wicked Projects” by Mary Poppendieck, Software Development Magazine, May, 2002, posted on this site under the title “Wicked Problems.”

[2] For a discussion of adaptive processes, see “Wicked Projects” by Mary Poppendieck, Software Development Magazine, May, 2002, posted on this site under the title “Wicked Problems.”


Lean Contracts

Tool and Die Contracts
The cost of developing dies that stamp out body panels for a new model car can account for half of the model’s capital investment. Consequently, all automotive companies spend a great deal of time working to minimize the cost of these dies. The approach in Japan is distinctly different from that in the U.S., and dramatically more effective. The best Japanese companies develop these dies for half the cost and in half the time of their counterparts in the West. The resulting Japanese dies average five shots per panel, while U.S. dies average seven shots per panel, significantly reducing manufacturing costs as well.

From the classic book Product Development Performance by Clark and Fujimoto, Harvard Business School Press, 1991:

Japan firms use an ‘early design, early cut’ approach, while U.S. practice is essentially ‘wait to design, wait to cut.’

Because it entails making resource commitments while the body design is still subject to frequent changes, the Japanese early design, early cut approach entails significant risks of waste and duplication of resources…. Many engineering changes occur after final release of blueprints. At peak, hundreds of changes are ordered per month.

Behind the wait to design, wait to cut approach in U.S. projects is a desire to avoid expensive die rework and scrappage, which we would expect to be an inevitable consequence of the bold overlapping that characterizes the Japanese projects. However, our study revealed a quite different reality. U.S. firms, despite their conservative approach to overlapping, were spending more on engineering changes than Japanese firms. U.S. car makers reported spending as much as 30-50 percent of original die cost on rework due to engineering changes, compared to a 10-20 percent margin allowed for engineering changes by Japanese products.

The Japanese cost advantage comes not from lower wages or lower material prices, but from fundamental differences in the attitudes of designers and tool and die makers toward changes and the way changes are implemented…. In Japan, when a die is expected to exceed its cost target, die engineers and tool makers work to find ways to compensate in other areas…. Die shops in high-performing companies develop know-how techniques for absorbing engineering changes at minimum cost…. In the United States, by contrast, engineering changes have been viewed as profit opportunities by tool makers….

Suppose a body engineer decides to change the design of a panel to strengthen body-shell rigidity. The high performers tend to move quickly. The body designer immediately instructs the die shop to stop cutting the die on the milling machine. Without paperwork or formal approval, the body designer goes directly to the die shop, discusses modifications with the die engineers, checks production feasibility, and makes the agreed-upon changes on the spot. Unless the changes are major, decisions are made at the working level. Traditionally, the die shop simply resumes working on the same die. Paperwork is completed after the change has been made and submitted to supervisors for approval. The cost incurred by the change is also negotiated after the fact. The attitude is “change now, negotiate later.”

In companies in which die development takes a long time and changes are expensive, the engineering change process is quite different. Consider the context in which changes occur. In extreme versions of the traditional U.S. system, tool and die makers are selected in a competitive bidding process that treats “outside” tool shops as providers of a commodity service. The relationship with the die maker is managed by the purchasing department, with communication taking place through intermediaries and drawings. The individuals who design the dies and body panels never interact directly with the people who make the dies.