A couple of years earlier, this rapidly growing young company (call it XYZ) saw its revenue flattening and regulators questioning its processes. In an attempt to impose discipline and cut costs, XYZ centralized software development. Executives felt that this would give them more visibility into the development portfolio and ensure that a standard process was used throughout the organization. They also hoped to even out the software development workload, increase utilization of resources, and eventually be able to outsource a good deal of development.
The problem was, XYZ’s products were largely based on software. When the software developers left the business units, the company's track record for fielding successful new products plummeted. Time-to-market stretched out for even the most important initiatives, and market share dropped. What went wrong?
People or Resources?
Company XYZ did very well when new products were designed and implemented inside a business unit. When the software developers, a key part of the product development team, were removed from the business units, many important features of product development were lost. For example, funding was no longer incremental, based on stages and gates. Complete funding for product development had to be justified and allocated in order to get software development resources assigned. In turn, a complete specification and spending estimate were required before work could begin. Software architects were no longer involved at the fuzzy front end of new product development, and the discovery loops that used to be part of product development were no longer acceptable.
Even as feedback loops were removed from product development, they became more important than ever, because the software development people on the product team were usually not familiar with the business, and worse, they were not even familiar with each other. In places where tacit knowledge and team cohesiveness are important, treating people as interchangeable resources just doesn’t work.
Scheduling
Prior to the reorganization, when software people were embedded in divisions, some people were regularly assigned to several projects at a time, while others appeared to be less than fully utilized. The company decided that it would be more effective to assign individuals to only one project at a time, and if possible, the projects should be small. Unfortunately, this created a scheduling nightmare, so XYZ invested in a computerized resource scheduling system to help sort out the complex resource assignments.
Computerized scheduling systems have a well-known problem: they do not accommodate variation. XYZ discovered that when projects didn’t end when they were scheduled to end, the system’s assumptions were invalidated, so its resource assignments were often out of touch with reality. The company tried to fix such problems by keeping some teams intact and by holding weekly management meetings to arbitrate the conflicts between the computer’s schedule and reality. In practice, the overhead of management intervention and of idle workers waiting for teams to assemble outweighed any efficiencies the system generated.
Company XYZ also tried to reduce the variability of project completion by urging teams to make reliable estimates and rewarding project managers who delivered on schedule. Unfortunately, such attempts to reduce variability generally don’t work. The reason for this becomes clear with a quick look at the theory of variation.
Variation
W. Edwards Deming[1] first popularized the theory of variation, which is now a cornerstone of Six Sigma programs. Deming taught that there are two kinds of variation: common variation and special variation. Common variation is inherent in the system, and special variation is something that can be discovered and corrected. Common variation can be measured, and control charts can be used to keep the system within the predicted tolerances. But it is not possible for even the most dedicated workers to reduce common variation; the only way to reduce common variation is to change the system. And here’s the important point: Deming felt that most variation (95 percent or more)[2] is common variation, especially in systems where people are involved.
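As an illustration of how common variation is quantified, here is a minimal sketch (my own, with made-up cycle-time data) of the kind of individuals control chart this approach uses; anything inside the computed limits is simply what the system produces.

```python
# A minimal sketch (not from the article) of an individuals control chart,
# using made-up project cycle times. Limits computed this way describe the
# common variation of the system; only points outside them hint at a special cause.
cycle_times_weeks = [11, 9, 14, 10, 12, 13, 8, 15, 11, 12]  # hypothetical history

mean = sum(cycle_times_weeks) / len(cycle_times_weeks)
moving_ranges = [abs(b - a) for a, b in zip(cycle_times_weeks, cycle_times_weeks[1:])]
avg_mr = sum(moving_ranges) / len(moving_ranges)

# Standard individuals-chart limits: center line +/- 2.66 times the average moving range
ucl = mean + 2.66 * avg_mr
lcl = max(0.0, mean - 2.66 * avg_mr)

print(f"center line: {mean:.1f} weeks, control limits: {lcl:.1f} to {ucl:.1f} weeks")
```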
The other kind of variation is special variation, which is variation that can be attributed to a cause. Once the cause is determined, action can be taken to remove it. But there is danger here: “tampering” is taking action to remove common variation based on the mistaken belief that it is special variation. Deming insisted that tampering creates more problems than it fixes.
In summary: The overwhelming majority of variation is inherent in a system, and trying to remove that variation without changing the system only makes things worse. We can assume that most of the variation in project completion dates is common variation, but since computerized scheduling systems are deterministic, they can’t really deal with any variation. The bottom line: a computerized scheduling system will almost never work at the level of detail that XYZ was trying to use it. Exhorting workers to estimate more carefully and project managers to be more diligent in meeting deadlines is not going to remove variation from projects. We need to change the rules of the game.
We know that estimates for large systems and for distant timeframes have a wide margin of uncertainty, made wider if the development team is an unknown. We should stop trying to change that; it is inherent in the system. If we want reliable estimates, we need to reduce the size of the work package being estimated and limit the estimate to the near future. Furthermore, estimates will be more accurate if the team implementing the system already exists, is familiar with the domain and technology, makes its own estimates covering a short period of time, and updates these estimates based on feedback. The good news is, once such a team establishes a track record, its variability can be measured and predicted.
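As a rough sketch of what such a track record looks like in practice (the velocities below are hypothetical), a team's recent iteration history gives both an expected velocity and a measured spread to plan against:

```python
# Hypothetical track record: iteration velocities for an established team.
import statistics

past_velocities = [21, 24, 19, 23, 22, 20, 25, 22]  # story points per iteration (made up)

mean_v = statistics.mean(past_velocities)
stdev_v = statistics.stdev(past_velocities)

# A simple planning range: mean +/- two standard deviations covers most iterations.
low, high = mean_v - 2 * stdev_v, mean_v + 2 * stdev_v
print(f"likely velocity next iteration: {low:.0f} to {high:.0f} points")

# Forecasting a 60-point chunk of work against both ends of the range.
print(f"60 points of work: roughly {60 / high:.1f} to {60 / low:.1f} iterations")
```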
Utilization
Unfortunately, Company XYZ believed that efficiency would be improved by increasing resource utilization. Trying to maximize utilization can have serious unexpected side effects, not the least of which is decreased efficiency and reduced utilization. If this seems odd, think about how efficient our highways are during rush hour. Most systems behave like traffic systems; as utilization of resources passes a critical point, non-linear effects take over, and everything slows to a crawl. Even the most brilliant scheduling system cannot prevent delays if you insist on 100% utilization.
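The shape of that curve can be illustrated with the textbook M/M/1 queueing formula. This is an idealization of real development work (my assumption for illustration, not a model from the article), but it shows how delay explodes as utilization approaches 100 percent:

```python
# Textbook M/M/1 queue: average time in system = 1 / (service rate - arrival rate).
service_rate = 10.0  # items the team can finish per week with no waiting

for utilization in (0.5, 0.8, 0.9, 0.95, 0.99):
    arrival_rate = utilization * service_rate
    time_in_system_weeks = 1.0 / (service_rate - arrival_rate)
    print(f"{utilization:4.0%} utilization -> {time_in_system_weeks:5.2f} weeks in system")

# Delay climbs from 0.2 weeks at 50% utilization to 10 weeks at 99%:
# past a critical point, each extra bit of utilization multiplies the wait.
```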
When a computer operations manager looks at the utilization history of her equipment, she will never say: “Look at that – we’re only using 80% of our server capacity and 85% of our SANs. Let’s use them more efficiently!” She knows that such high utilization is a warning that the systems are operating at the edge of their capacity, and that response times are probably already slowing down.
But when a development manager takes a look at the utilization history of his department, he will often say: “Look at that – we are only using 95% of our available hours. We have enough free time to add another project!” At the same time, he is probably asking himself, “I wonder why I’m getting all these complaints about our response time?” And all too often his solution is, “We’ll just have to set more aggressive deadlines.”
Response Time
Consider the release manager of a software product. Assume she has a service level agreement that calls for critical defects to be found and patched within four hours, serious defects within 48 hours, and normal defects to be fixed in the next monthly release. You can be sure that her primary measurement is response time, and that she adjusts staffing until the service level is achieved. Because of this, there will always be people available to attack defects, and occasionally people may have a bit of spare time.
In one of my classes, two teams did value stream maps for almost the same problem – deliver on a feature request which would take about 2 hours of coding. One team documented an average response time of 9 hours to deployment, the other team documented an average response time of 32 days. In the first case, the policy was: “When a request is approved, there will always be someone available to work on it.” In the second case, the request got stuck twice in two-week-long queues waiting for resources. The interesting thing is that the first organization actually did more work with fewer people, because they did not have to manage queues, customer queries, change requests and the like. They were more efficient despite, or perhaps because of, a focus on response time rather than resource utilization.
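A quick back-of-the-envelope way to compare the two teams is the lean process cycle efficiency ratio (value-adding time divided by total elapsed time). The arithmetic below is my own and assumes the 32 days are calendar days:

```python
# Back-of-the-envelope process cycle efficiency (value-adding time / elapsed time),
# assuming the 32 days are calendar days.
work_hours = 2.0                      # coding time for the feature in both cases
fast_team_elapsed_hours = 9.0         # request to deployment
slow_team_elapsed_hours = 32 * 24.0   # 32 calendar days

print(f"fast team: {work_hours / fast_team_elapsed_hours:.0%} of elapsed time was value-adding")
print(f"slow team: {work_hours / slow_team_elapsed_hours:.2%} of elapsed time was value-adding")
# Roughly 22% versus about 0.26%: nearly all of the slow team's lead time is queue time.
```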
The bottom line is that managing response time, or time-to-market, is more efficient and more profitable than managing utilization. You need some slack to keep development and innovation flowing. As any good operations manager already knows, when work flows rapidly and reliably through an organization, its efficiency and utilization will be higher than in an organization jammed up with too much work.
Rules of the Game
Queuing theory gives us six rules for reducing software development cycle time:
- Limit work to capacity
- Even out the arrival of work
- Minimize the number of Things-in-Process
- Minimize the size of the Things-in-Process
- Establish a regular cadence
- Use pull scheduling
1. Limit work to capacity
The biggest favor you can do for your organization is not to accept any more work than it can handle. Of course, to do this, you have to know the capacity of your organization. One way to estimate the capacity is to look at output. If you currently complete one large system a year, deliver about three services a quarter, and respond to about seven change requests per week, this is a rough approximation of your capacity, and a good limit on the amount of new work you should accept.
Next you might calculate how much work you have already accepted. In one of my classes, an executive did the math and discovered that he had seven years’ worth of work in a queue that was reviewed every week. He decided that he could toss out all but a few months of work; the rest would never get done, yet reviewing it every week was consuming a lot of time.
2. Even out the arrival of work
At Company XYZ, one of the scheduling headaches was caused by a huge workload during the first six months of the year and a relatively low demand for the second half of the year. At first this puzzled me, because the company’s business was not seasonal, so there seemed to be no reason for the uneven demand. I suspected that a sub-optimizing measurement somewhere might be the cause. When I asked whether the annual budgeting cycle or the executive performance measurement system might be driving the uneven demand, my suspicions were confirmed. I recommended that the organization work to change the measurement system, rather than accommodate it.
3. Minimize the number of Things-in-Process
One of the basic laws of queuing theory is Little’s Law[3]:

Average Response Time = Number of Things-in-Process / Average Completion Rate
According to Little’s Law there are two ways to improve response time: you can spend money to improve the Average Completion Rate, or you can apply intellectual fortitude to reduce the number of Things-in-Process. For example, assume you can respond to about six feature requests per month. If twelve requests are released for work, they will take an average of two months to complete. If, however, only three requests are released at a time, it will take an average of two weeks to respond to a feature request.
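Written as a few lines of code, the same arithmetic looks like this (the six-per-month completion rate is simply the number assumed above):

```python
# Little's Law, applied to the numbers above:
# average response time = things-in-process / average completion rate.
def average_response_time(things_in_process, completion_rate_per_month):
    return things_in_process / completion_rate_per_month

completion_rate = 6.0  # feature requests completed per month

print(average_response_time(12, completion_rate))  # 2.0 months with twelve requests in process
print(average_response_time(3, completion_rate))   # 0.5 months (about two weeks) with three
```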
4. Minimize the size of the Things-in-Process
We’ve already noted the effect of high utilization on cycle time; we should also note that the effect becomes much more pronounced as batch size increases.[4]
So if you want high utilization, you should develop in very small batches. For example, you will get much faster throughput and higher utilization if you develop ten services one at a time, rather than developing all ten at the same time.
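A toy calculation (my own illustration, not drawn from any real project) makes the point: one team, ten services of roughly a month each, developed one at a time versus all at once.

```python
# Toy comparison: one team, ten services of one month each,
# developed one at a time versus all ten simultaneously.
n_services, months_each = 10, 1.0

# One at a time: the k-th service finishes after k months.
sequential_finish = [months_each * (k + 1) for k in range(n_services)]

# All at once, with the team's attention spread evenly: nothing finishes until the end.
parallel_finish = [months_each * n_services] * n_services

print(f"one at a time:   average delivery at {sum(sequential_finish) / n_services:.1f} months")
print(f"all ten at once: average delivery at {sum(parallel_finish) / n_services:.1f} months")
# 5.5 months versus 10.0 months, before counting any task-switching overhead.
```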
5. Establish a regular cadence
In a lean factory, every process runs at a regular cadence called ‘takt time.’ If you want to produce 80 cars in 8 hours, you produce 10 cars per hour, so one car rolls off the line every 6 minutes. In software development the recommended practice is to establish an iteration cadence of perhaps two weeks or a month, and deliver a small batch of deployment-ready software every iteration.
A regular cadence, or ‘heartbeat,’ establishes the capability of a team to reliably deliver working software at a dependable velocity. An organization that delivers at a regular cadence has established its process capability and can easily measure its capacity.
A regular cadence also gives inter-dependent teams synchronization points that they can depend on. Synchronization points are good places to get customer feedback, they are useful for coordinating the work across multiple feature teams, and they can help decouple hardware development from software development.
6. Use pull scheduling
Once both batch and queue size have been reduced and a cadence has been established, pull scheduling is the best method to compensate for variation and limit work to capacity. At the beginning of an iteration, the team ‘pulls’ work from a prioritized queue. They pull only the amount of work that they have demonstrated they can complete in an iteration. When a team is first formed or the project is new, it may take a couple of iterations for the team to establish its ‘velocity’ (the amount of work it can complete in an iteration). But once the team hits its stride, it can reliably estimate how much work can be done in an iteration and that is the amount of work it pulls from the queue.
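A minimal sketch of that pull mechanism, with hypothetical work items and point sizes, might look like this:

```python
# Hypothetical pull scheduling: at the start of an iteration the team pulls
# from the front of a prioritized backlog, up to its demonstrated velocity.
from collections import deque

def pull_iteration(backlog, velocity):
    """Pull items from the front of the backlog until the next one would exceed velocity."""
    pulled, capacity_left = [], velocity
    while backlog and backlog[0][1] <= capacity_left:
        name, size = backlog.popleft()
        pulled.append(name)
        capacity_left -= size
    return pulled

backlog = deque([("login audit", 5), ("export to CSV", 8), ("search filters", 5),
                 ("rate limiting", 13), ("dark mode", 3)])
velocity = 20  # points per iteration, taken from the team's track record

print(pull_iteration(backlog, velocity))  # ['login audit', 'export to CSV', 'search filters']
print(list(backlog))                      # work left in the queue for a later iteration
```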
There are other points where queues might be established: there could be a queue of proposed work that needs a ROM (rough order of magnitude) estimate, or a queue of work awaiting a preliminary architecture assessment. Note that these queues should be short, and a team should not pull work from a queue until it has time available to do the work.
A pull system assures that everyone always has something to do (unless a queue is empty), but no one is overloaded. The development process is managed by managing the queues, and management intervention is accomplished by changing the priority or contents of the queues. The cadence should be fast enough that changes can wait until the next iteration; that way, changes are accommodated at the cadence of the process.
Conclusion
Development teams can do a lot to control their own destiny. They can make sure they have the right information, the necessary skills, and the appropriate processes to do a good job. But some things that affect the performance of a development team are outside its control, and managing the pipeline is one of them. If a development organization is swamped with work, no amount of good intentions or good process can overcome the laws of physics. If deterministic rules are applied to an inherently variable system, no amount of exhortation, reward, or punishment can make the system work. When the rules of the game have to change, the six rules for reducing cycle time are a good place to start.
References
[1] W. Edwards Deming (1900–1993), thought leader of the quality movement in Japan and the US.
[2] W. Edwards Deming, The New Economics, Second Edition, MIT Press, 2000, p. 38.
[3] Usually the numerator is WIP (Work-in-Process). The term Things-in-Process comes from Michael L. George and Stephen A. Wilson, Conquering Complexity in Your Business, McGraw-Hill, 2004, p. 37.
[4] Wallace Hopp and Mark Spearman, Factory Physics, McGraw-Hill, 2000.
Disclaimer: XYZ Company is not a real company; it is an amalgamation of companies I have worked with.