Getting measurements right can be devilishly difficult, but getting them wrong can be downright dangerous. If you look underneath most self-defeating behavior in organizations, you will often find a well-intentioned measurement gone wrong. Consider the rather innocent-sounding measurement of productivity, and its close cousin, utilization. One of the biggest impediments to adopting Just-in-Time manufacturing was the time-honored practice of trying to extract maximum productivity out of every machine. The inevitable result was that mounds of inventory collected to feed machines, and additional piles of inventory stacked up at the output side of the machines. The long queues of material slowed everything down, as queues always do. Quality problems often took days to surface, and customer orders often took weeks to fill. Eventually manufacturing people learned that running machines for maximum productivity was a sub-optimizing practice, but it was a difficult lesson.
As software development organizations search for productivity in today’s tight economy, we see the same lesson being learned again. Consider a testing department that is expected to run at 100% utilization. Mounds of code tend to accumulate at the input side of the testing department, and piles of completed tests stack up at the output side. Many defects lurk in the mountain of code, and more are being created by developers who do not have immediate feedback on their work. When a testing department is expected to run at full utilization, the likely result is an increased defect level, resulting in even more work for the testing department.
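Queueing theory explains why: as any single resource approaches full utilization, the time work spends waiting in front of it grows without bound. The sketch below is illustrative only (it assumes the textbook M/M/1 single-server model, which the article itself does not invoke); it prints the average time a job spends in the system as a multiple of its service time.

```python
# Illustrative sketch, assuming an M/M/1 single-server queue:
# average time in system W = service_time / (1 - utilization).
# Not from the article; it simply quantifies "queues slow everything down."

def average_wait(utilization: float, service_time: float = 1.0) -> float:
    """Average time a job spends in an M/M/1 system (waiting + service)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)

for u in (0.50, 0.80, 0.90, 0.95, 0.99):
    print(f"utilization {u:.0%}: time in system = "
          f"{average_wait(u):5.1f}x service time")
```

At 50% utilization a job takes twice its service time to get through; at 95% it takes twenty times as long. This is the dynamic that buries a fully utilized testing department in queued code.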
Nucor Steel grew from a startup in 1968 into a $4 billion giant, attributing much of its success to an incentive pay system based on productivity. Productivity? How did Nucor keep their productivity measurement robust and honest throughout all of that growth? How did they avoid the sub-optimization so common in most productivity measurements?
The secret is that Nucor measures productivity at a team level, not at an individual level. For example, a plant manager is not rewarded on the productivity of his or her plant, but on the productivity of all plants. The genius of Nucor’s productivity measurement is that it avoids sub-optimization by measuring results at one level higher than one would expect, thus encouraging knowledge sharing and system-wide optimization.
How can this be fair? How can plant managers be rewarded based on productivity of plants over which they have no control? The problem is, if we measure people solely on results over which they have full control, they have little incentive to collaborate beyond their own sphere of influence to optimize the overall business. While local measurements may seem fair to individuals, they are hardly fair to the organization as a whole.
Measure-UP, the practice of measuring results at the team rather than the individual level, keeps measurements honest and robust. The simple act of raising a measurement one level up from the level over which an individual has control changes its dynamic from a personal performance measurement to a system effectiveness indicator.
In the book “Measuring and Managing Performance in Organizations” (Dorset House, 1996), Robert Austin discusses the dangers of performance measurements. The beauty of performance measurements is that “you get what you measure.” The problem with performance measurements is that you get only what you measure, and nothing else. You tend to lose the things that you can’t measure: insight, collaboration, creativity, and dedication to customer satisfaction.
Austin recommends aggregating individual performance measurements into higher-level informational measures that hide individual results in favor of group results. As radical as this may sound, it is not unfamiliar. W. Edwards Deming, the noted quality expert, insisted that most quality defects are not caused by individuals, but by management systems that make error-free performance all but impossible. Attributing defects to individuals does little to address the systemic causes of defects, and placing blame on individuals when the problem is systemic only perpetuates it.
Software defects are frequently attributed to individual developers, but the development environment often conspires against individual developers and makes it all but impossible to write defect-free code. Instead of charting errors by developer, a systematic effort to provide developers with immediate testing feedback, along with root cause analysis of the defects that remain, is far more effective at reducing the overall defect rate.
By aggregating defect counts into an informational measurement and hiding individual performance measurements, it becomes easier to address the root causes of defects. If an entire development team, testers and developers alike, feels responsible for the defect count, then testers tend to get involved earlier and provide more timely and useful feedback to developers. Defects caused by code integration become everyone’s problem, not just a problem for the unlucky person who wrote the last bit of code.
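As a concrete illustration of this kind of aggregation, here is a minimal sketch, using an entirely hypothetical defect log, of rolling data up into the team-level informational measures Austin describes: defect counts per team and a tally of root causes, with no per-developer attribution.

```python
# Hypothetical sketch of "measure-UP" applied to defect data: individual
# attributions are dropped, and the report drives root cause analysis
# rather than blame. The data shape below is invented for illustration.
from collections import Counter

defects = [  # hypothetical defect log: (team, root_cause)
    ("checkout", "missing integration test"),
    ("checkout", "ambiguous requirement"),
    ("search",   "missing integration test"),
    ("checkout", "missing integration test"),
]

by_team = Counter(team for team, _ in defects)
by_cause = Counter(cause for _, cause in defects)

print("Defects per team (informational):", dict(by_team))
print("Most common root causes:", by_cause.most_common(2))
```

The point of the design is what the report leaves out: no developer names appear anywhere, so the only actionable lever the numbers offer is fixing the system that produces the defects.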
It flies in the face of conventional wisdom to suggest that the most effective way to avoid the pitfalls of measurements is to use measurements that are outside the personal control of the individual being measured. But conventional wisdom is misleading. Instead of making sure that people are measured within their span of control, it is more effective to measure people one level above their span of control. This is the best way to encourage teamwork, collaboration, and global, rather than local, optimization.