
June 11, 2022

When Demand Exceeds Capacity

We are getting solar panels on our roof.  Someday.  We signed a contract last November and the original estimate for installation was May. Since no one does outside work in the winter here in Minnesota, this was about two months after the earliest possible installation date. But May has come and gone, and we still do not have an installation date. Why? The electrical design of our 40-year-old house is complex and the young solar company has limited electrical design capability. So does our mature electrical utility company. Between the two of them, our project has been held up for a long time. 

Solar systems have become widely popular since the electrical grid meltdown in Texas and the sharp rise in energy prices. So, it is not a surprise that the nascent solar industry is dealing with far more demand than the young solar companies and staid utility companies are equipped to handle. It is difficult to find enough talented people to do the complex system design, project management, and installation of solar systems. Thus, solar customers experience long delays that cost us months of potential savings, while the companies suffer months of delayed revenue.  

Our solar company has an eager sales force and a handful of project managers. Once contracts are signed, sales people hand them off to project managers so they can focus on pursuing more sales. Since the rate of signed contracts greatly exceeds the rate of completed installations, the number of customers assigned to each project manager is constantly increasing. It is impossible for a project manager to manage dozens of complex solar projects at the same time, but that is what they are expected to do. 

Clearly, our solar company is unwilling to reduce its sales rate to match its project completion rate and unable to increase the completion rate to match the rate at which contracts are signed. Thus, the length of time a project takes continually increases, putting customers and project managers in a lose-lose situation. In fact, the solar industry is hardly the first industry in the world to experience far more demand than it has the capacity to supply; many other industries have navigated these waters. Typically, companies that do not learn how to manage excess demand end up failing because they generate annoyed customers faster than they generate revenue. Companies that became giants in rapidly growing industries – for example, Airbnb, AWS, Google, Netflix, and SpaceX – learned quickly to increase their delivery capacity to match the rate at which they generated sales. Successful solar companies will have to learn the same thing.

How do companies rapidly increase delivery capacity? Hiring more people might help, but it is not nearly enough. Young companies in a growth industry must figure out how to complete more jobs faster while doing less work or they will not be able to keep up with the rapid increase in demand. There are three essential strategies that successful industry giants have used to increase efficiency which other companies would do well to understand.

1. Focus on Customer Outcomes

The first strategy for increasing efficiency is to pay attention to the metrics that drive behavior. Startups tend to measure success by counting the number of sales, while more efficient companies tend to measure the number of satisfied customers. When success is measured by the number of satisfied customers rather than booked revenue, several good things happen. The company learns not to accept work at a faster rate than it can complete work, because doing so does little more than stretch out the time it takes for its people to be successful while lengthening the queue of increasingly impatient customers. Work teams also learn to focus on completing installations as quickly as possible, because more installations will be seen as better performance. Finally, teams are unlikely to develop products that customers are not interested in – a very common source of wasted effort in startup companies.

It is extraordinarily difficult for an ambitious young company to turn away people that are beating a path to its door. Everyone’s instinct is to hire more salespeople and let the work pour in.  But let’s face it, the most important thing a young company needs to learn is how to COMPLETE work quickly, not how to ACCEPT work quickly. If it opens the doors wide and lets everyone in, there will be so many customers milling about that workers will be distracted trying to deal with the chaos rather than focused on getting work done. The most important thing for that young company to do at this point is to stop spending so much time on managing excess work and start concentrating on how to get jobs done as fast as they arrive.

Every growth industry has gone through this learning curve. Once upon a time, mail order catalogs (Sears, for example) accepted orders as fast as they could and shipped them in about two weeks. Because of the delay, they had to deal with people changing their mind and a complex order change system was necessary. Then L.L.Bean had a novel idea – why not ship products within 24 hours of receiving the order? It worked – customers were delighted, and no order change system was needed. Today almost all fulfillment systems work on the same principle – focus on building the capacity to ship product at the same rate orders arrive. Once shipping capacity matches demand, the company can ship immediately after an order arrives, delighting customers and reducing congestion while giving customers little time to change their minds.  

2. Manage FLOW, not tasks

The second strategy for increasing efficiency is counterintuitive; it requires a shift in the company’s mental model of how work gets done. Consider our solar company. As a small startup, it thinks of every project as a separate entity and assigns a project manager to sequence the steps in an installation so all necessary work gets done. Unfortunately, this approach is not readily scalable, and even if it were, there are not enough experienced project managers available to hire. Industry leaders do things differently; they manage packages of items as they move through their system, not individual projects or tasks. 

What does it mean to manage groups of items?  Consider the shipping industry.  Until the 1950s, ships were loaded with individual items packed carefully to keep the ship stable. Each item had to be traced through every port as it moved off the ship and onto another ship or a train or a truck. Then the shipping container was invented, and it was no longer necessary to manage individual items moving through the supply chain. Instead, shippers managed containers as they moved from origin to destination.  

Some years ago, we bought a sofa that was heavily discounted because of a missing headrest. A few months later the store called to say it was filling a container of furniture from their supplier in Norway and they could add our missing headrest for a small fee. It would have been too complex (and expensive) to order and track the headrest individually; but it was simply assigned to the container in which many items moved as a package from Norway to Minnesota. The cost of shipping was negligible. It's easy to see how containers created a reduction in complexity that dramatically reduced the cost of shipping goods and led to substantial changes in the global economy. 

How do you group and manage workflow in a growing solar company?  First, you think of a contract as a number of steps that need to be completed. Certainly, some of the steps can be done in parallel – for example solar panel layout can be done in parallel with electrical design. Other steps have prerequisites that must be completed first. You organize work so that each step has its own responsible team, its own queue, and its own completion rate. As each new contract is signed, it is coded with the various steps it must move through, including any required sequencing. Then you release the contract to the starting step (or steps). When a step is done, work should automatically move to the next step(s) until all work is complete. Your job is to manage the FLOW of work through the system rather than the progress of individual items through their various steps.
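A minimal sketch of this idea in Python may make it concrete. The step names and the `Contract` structure are invented for illustration (parallel steps are omitted to keep the sketch linear); the point is that each step owns a queue and work flows forward automatically when a step completes:

```python
from collections import deque

# Each step has its own responsible team and its own FIFO queue of contracts.
STEPS = {name: deque() for name in
         ["site_survey", "panel_layout", "electrical_design", "permit", "install"]}

class Contract:
    """A signed contract, coded with the steps it must move through, in order."""
    def __init__(self, customer, steps):
        self.customer = customer
        self.remaining = list(steps)

def release(contract):
    """Send a new contract to the queue of its starting step."""
    STEPS[contract.remaining[0]].append(contract)

def complete_step(step_name):
    """A team finishes the oldest job in its queue; work flows to the next step."""
    contract = STEPS[step_name].popleft()
    contract.remaining.pop(0)
    if contract.remaining:                          # more steps left?
        STEPS[contract.remaining[0]].append(contract)
    return contract

release(Contract("Jones", ["site_survey", "electrical_design", "install"]))
complete_step("site_survey")
print(len(STEPS["electrical_design"]))              # prints 1: work moved downstream
```

Notice that no one shepherds the Jones contract: finishing a step is what moves the work, which is exactly the shift from managing individual projects to managing flow.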

How do you manage the flow of work through a system? Start by managing the amount of time a job spends waiting for attention in the queue of each step. Queueing theory is clear: if you limit the length of a queue and send work through in FIFO (first-in-first-out) order, then the time spent waiting in the queue is at most the number of items in the queue divided by the average completion rate. For example, if you limit a work queue to twenty items and the team completes two items per day, work will wait in the queue – if it is FIFO – for no more than ten days. (Note: If a FIFO queue is not acceptable, then the queue is too long! Make it shorter.) 

Once a job reaches the end of a queue, it should flow quickly through the work step. The best approach is for a responsible team to begin a job, focus on it until it is done, then go on to the next job. Focusing on one thing at a time improves the quality of the work, eliminates wasted time due to multitasking, and gets everything done faster. (See Single-Responsibility Teams, below.) 

If a step in the workflow is completed inside of your company, you have a lot of control over the rate at which work moves through the step (queue time + work time). Your goal is to create short, predictable completion rates for every internal step. If a step occurs outside your company (the utility company, for example, or the city permitting process) you do not have control of the time that takes, but you can measure it – you should track average time and variation. Note that it is important to separate internal steps from external steps – otherwise variations in the external completion rate will make it impossible to control internal rates of completion.
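Measuring an external step is straightforward: record how long each job spends there and track the average and the spread. A small sketch, using invented permit-office durations:

```python
from statistics import mean, stdev

# Observed days each permit application spent with the city (illustrative data).
permit_days = [14, 21, 18, 35, 16, 22]

print(f"average: {mean(permit_days):.1f} days, "
      f"variation (std dev): {stdev(permit_days):.1f} days")
# prints: average: 21.0 days, variation (std dev): 7.5 days
```

Keeping these numbers separate from internal-step metrics is what lets you quote a realistic installation date: internal steps contribute a controlled, predictable time, while external steps contribute a measured average plus a known variation.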

It is also important to ensure that downstream steps have capacity in their queues to handle work as it is completed upstream. If this sounds complicated, an analogy might be helpful: Consider an order coming into Amazon.  Once upon a time, orders were handled by a central database which directed each step in the processing of orders as they moved through the company.  But in the early 2000’s Amazon outgrew this process; they could not buy large enough database computers to keep up with surges in demand. Amazon needed a new approach, so it abandoned the central databases and thought of each order as moving through a series of independent services. When an order is placed, it is coded with an initial chain of necessary services (Do we need to do a credit check? Does the product require special shipping? ...) and sent on its way through the various processing steps. A service can divert the order or add additional steps (Oops, product is out of stock! ...) and downstream processes are expected to have the capacity to accept work from upstream processes. (If the product can be picked from a warehouse, it should be possible to package and ship it immediately.)
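A hypothetical sketch of that order-routing idea, with the order carrying its own chain of services (the service names and rules are invented, not Amazon's actual system):

```python
# Each service does its own work and may reroute the order or add steps.
def credit_check(order):
    order["checked"] = True
    if order.get("out_of_stock"):
        order["chain"].insert(0, "backorder")   # a service can add a step

def backorder(order):
    order["backordered"] = True

def ship(order):
    order["shipped"] = True

SERVICES = {"credit_check": credit_check, "backorder": backorder, "ship": ship}

def process(order):
    """Run the order through its chain of services; no central database needed."""
    while order["chain"]:
        step = order["chain"].pop(0)
        SERVICES[step](order)
    return order

o = process({"chain": ["credit_check", "ship"], "out_of_stock": True})
print(o["backordered"], o["shipped"])           # prints: True True
```

Because the routing lives in the order itself, each service only needs the capacity to accept whatever the upstream service hands it; no step has to consult a central coordinator.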

Armed with a set of steps each contract needs and the time each step will take, you, like Amazon, should be able to predict the expected installation date very early – ideally when the contract is signed. You should also have a handle on variation due to outside queues and dependencies such as parts availability and supply chain delays. You can send the contract into your workflow, track how well it is coming along and flag exceptions as soon as they occur. 

In addition, you can spot bottlenecks that need to be improved and generally manage the capacity of your organization. You can be alerted when you need to limit sales to avoid overcommitment: Do not accept work faster than the rate at which work moves through your bottleneck process. Focus on the bottleneck until it is improved, when of course a new bottleneck will appear. Find the next bottleneck and limit sales of projects that use it to the rate the bottleneck can handle. Repeat... 
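The rule "do not accept work faster than your bottleneck can process it" reduces to finding the step with the lowest throughput. A sketch with invented weekly numbers:

```python
# Weekly throughput of each step, in jobs per week (illustrative numbers).
throughput = {"design": 12, "permits": 8, "install": 10}

# The bottleneck is the step with the lowest throughput; it caps the whole system.
bottleneck = min(throughput, key=throughput.get)
max_sales_rate = throughput[bottleneck]

print(f"bottleneck: {bottleneck}; accept at most {max_sales_rate} contracts/week")
# prints: bottleneck: permits; accept at most 8 contracts/week
```

Improving the bottleneck step simply moves the minimum elsewhere, which is why the process repeats: raise permits above 10 per week, and install becomes the new constraint.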

By far the biggest benefit of managing the flow of groups of items is this: You will get a lot more done with a lot less effort. How can this be? When you no longer have half-done work clogging up your system, demanding attention, and distracting workers, everyone can spend much more of their time actually getting useful work done.   

3. Form Single-Responsibility Teams

The third big change you should make is to charter single-responsibility teams, organized and led by single-responsibility leaders. What is a single-responsibility team?  It is a team that has one and only one thing to do, a team that can focus all its efforts on accomplishing one thing at a time and doing that one thing well. Be careful to minimize dependencies among these teams; each team should be able to begin and complete its work independently; teams should not need to coordinate with or get permission from managers or other teams. Finally, work should not appear in the queue of a single-responsibility team until everything is in place for that work to be done, otherwise the team will have to finish someone else’s work before they can focus on their own. 

How is it possible to organize complex work into pieces that can be accomplished by single-responsibility teams? While there is no such thing as the “best” organizational structure for companies in diverse domains, it is easy to find examples that demonstrate the concept. Let’s start with AWS (Amazon Web Services). Over the past decade AWS has displaced many traditional enterprise software vendors with an array of independent services, each managed by a single-responsibility team and strung together through hardened interfaces. No one has ever seen an AWS roadmap; instead, teams identify (or are given) a customer problem and they start by writing a “press release” describing how their solution will solve the problem. Then they proceed to develop the solution, testing it with customers as often as possible. AWS has released multiple new enterprise-level services every year since 2012, each aimed at solving a specific customer problem. These services have gradually lured companies away from the complex, integrated packages of traditional vendors, often prompting AWS customers to organize their own work around autonomous, focused teams.

Now let’s look at SpaceX, which is organized very differently from AWS. SpaceX is fundamentally an engineering organization; its teams are organized around the components of booster rockets that launch things into space. There is a team for each stage of the booster (e.g., payload, stage 1, stage 2, engine cluster…) with sub-teams for each sub-component of the stage (Merlin engine, landing legs…).  

You might wonder: How do teams know what to do and when to do it? How do they coordinate with other teams? How do they know they have done a good job?  SpaceX operates on the principle of responsibility, which says that each team / sub-team is expected to understand the role its component must play in the next launch and have it ready by the scheduled launch date (perhaps a couple of months away). The team, led by a responsible engineer, has one job: Ensure that the component works as well as possible during the launch and does its job as part of the overall system. 

How does SpaceX manage FLOW? For a large, multi-team development effort involving both hardware and software, managing flow is usually accomplished by scheduling a series of frequent integration events. The purpose of each event is to take a well-defined step forward in the overall development effort to learn what works, locate unintended integration effects, and discover ways to improve the system. A series of integration events sets the cadence for all teams involved in the development effort. In the case of SpaceX a test launch is scheduled every few months throughout a development cycle; importantly, launch dates Do. Not. Change. Each launch is an integration event that moves all teams forward together, creating a steady flow of real progress toward a working system. 

Of necessity, based on the structure of a booster, SpaceX teams have more dependencies than AWS teams. They also have a clear responsibility to understand and account for those dependencies. Since SpaceX teams work to meet fixed launch deadlines, their systems sometimes fail during a launch. Failures are part of the process, but teams must carefully document launch behavior to be sure that the cause of any failure is identified and eliminated. Believe it or not, this approach to development is much faster and less costly (and usually safer!) than trying to think through every failure mode and eliminate any possibility of failure before attempting a test launch. 

Does SpaceX focus on customer outcomes? The company undertook the difficult challenge of developing a reusable booster because it was clear that sending things into space was too expensive for many potential customers. Reusing boosters was an essential step in dramatically reducing the cost of a launch, significantly increasing the number of customers with access to space. This need to reduce launch costs has driven most major booster development decisions at SpaceX.

SpaceX’s reusable booster has reduced the cost of launching a kilo of payload into space by a factor of 7, while spending an order of magnitude less money on development than previous booster rockets.

Summary

A lot of companies have more work than they can possibly handle – and that is not a good thing. It creates long delays, excess complexity, and eventually, dissatisfied customers. The secret is to limit demand to capacity and then increase capacity to meet additional demand by creating a steady flow of work while teams focus on a single responsibility. You also need to stop doing work that does not contribute to great customer outcomes: Stop multitasking, stop prioritizing, get rid of backlogs, stop shepherding individual tasks. If companies as large and successful as AWS and SpaceX can organize work around customer outcomes, steady flow, and focused, single-responsibility teams – then so can you. 

July 15, 2011

How Cadence Predicts Process

If you want to learn a lot about a software development organization very quickly, there are a few simple questions you might ask. You might find out if the organization focuses on projects or products. You might look into what development process it uses. But perhaps the most revealing question is this: How far apart are the software releases?

It is rare that new software is developed from scratch; typically existing software is expanded and modified, usually on a regular basis. As a result, most software development shops that we run into are focused on the next release, and very often releases are spaced out at regular intervals. We have discovered that a significant differentiator between development organizations is the length of that interval – the length of the software release cycle.

Organizations with release cycles of six months to a year (or more) tend to work like this: Before a release cycle begins, time is spent deciding what will be delivered in the next release. Estimates are made. Managers commit. Promises are made to customers. And then the development team is left to make good on all of those promises. As the code complete date gets closer, emergencies arise and changes have to be made, and yet, those initial promises are difficult to ignore. Pressure increases.

If all goes according to plan, about two-thirds of the way through the release cycle, code will be frozen for system integration testing (SIT) and user acceptance testing (UAT). Then the fun begins, because no one really knows what sort of unintended interactions will be exposed or how serious the consequences of those interactions will be. It goes without saying that there will be defects; the real question is, can all of the critical defects be found and fixed before the promised release date?

Releases are so time-consuming and risky that organizations tend to extend the length of their release cycle so as not to have to deal with this pain too often. Extending the release cycle invariably increases the pain, but at least the pain occurs less frequently. Counteracting the tendency to extend release cycles is the rapid pace of change in business environments that depend on software, because longer release cycles become a constraint on business flexibility. This clash of cadences results in an intense pressure to cram as many features as possible into each release. As lengthy release cycles progress, pressure mounts to add more features, and yet the development organization is expected to meet the release date at all costs.

Into this intensely difficult environment a new idea often emerges – why not shorten the release cycle, rather than lengthen it? This seems like an excellent way to break the death spiral, but it isn’t as simple as it seems. The problem, as Kent Beck points out in his talk “Software G Forces: the Effects of Acceleration,” is that shorter release cycles demand different processes, different sales strategies, different behavior on the part of customers, and different governance systems. These kinds of changes are notoriously difficult to implement.

Quick and Dirty Value Stream Map
I’m standing in front of a large audience. I ask the question: “Who here has a release cycle longer than three months?” Many hands go up. I ask someone whose hand is up, “How long is your release cycle?” She may answer, “Six months.” “Let me guess how much time you reserve for final integration, testing, hardening, and UAT,” I say. “Maybe two months?” If she had said a year, I would have guessed four months. If she had said 18 months, I would have guessed 6 months. And my guess would be very close, every time. It seems quite acceptable to spend two-thirds of a release cycle building buggy software and the last third of the cycle finding and fixing as many of those bugs as possible.

The next question I ask is: When do you decide what features should go into the release? Invariably when the release cycle is six months or longer, the answer is: “Just before we start the cycle.” Think about a six month release cycle: For the half year prior to the start of the cycle, demand for new or changed features has been accumulating – presumably at a steady pace. So the average wait of a feature to even be considered for development is three months – half of the six month cycle time. Thus – on the average – it will take a feature three months of waiting before the cycle begins, plus six months of being in development and test before that feature is released to customers; nine months in all.

Finally I ask, “About how many features might you develop during a six month release cycle?” Answers to this vary widely from one domain to another, but let’s say I am told that about 25 features are developed in a six month release, which averages out to about one feature per week.

This leaves us with a quick and dirty vision of the value stream: a feature takes a week to develop, yet in the best case it takes nine months (about 39 weeks) to make it through the system. So the process efficiency is 1÷39, or about 2.6%. A lot of this low efficiency can be attributed to batching up 25 features in a single release. A lot more can be attributed to the fact that only 4 months of the 9 total months are actually spent developing software – the rest of the time is spent waiting for a release cycle to start or waiting for integration testing to finish.
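The back-of-the-envelope arithmetic above is easy to check. Here it is in Python, using the same numbers as the example (one week of actual work per feature, a six-month cycle, and an average three-month wait before the cycle starts):

```python
# Rough value-stream numbers from the six-month release example.
touch_time_weeks = 1       # one week of actual development per feature
wait_before_cycle = 13     # ~3 months of average wait before the cycle begins
cycle_length = 26          # the 6-month release cycle itself, in weeks

lead_time = wait_before_cycle + cycle_length       # end-to-end, in weeks
efficiency = touch_time_weeks / lead_time          # touch time / total time

print(f"lead time: {lead_time} weeks, process efficiency: {efficiency:.1%}")
# prints: lead time: 39 weeks, process efficiency: 2.6%
```

Process efficiency defined this way (touch time divided by total lead time) is the standard value-stream-mapping metric; the striking thing is how little of the number depends on how fast developers work.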

Why not Quarterly Releases?
With such dismal process efficiency, let’s revisit the brilliant idea of shortening release cycles. The first problem we encounter is that at a six month cadence, integration testing generally takes about two months; if releases are annual, integration testing probably takes three or four months. This makes quarterly releases quite a challenge.

For starters, the bulk of the integration testing is going to have to be automated. However, most people rapidly discover that their code base is very difficult to test automatically, because it wasn’t designed or written to be tested automatically. If this sounds like your situation, I recommend that you read Gojko Adzic’s book “Specification by Example.” You will learn to think of automated tests as executable specifications that become living documentation. You will not be surprised to discover that automating integration tests is technically challenging, but the detailed case studies of successful teams will give you guidance on both the benefits and the pitfalls of creating a well architected integration test harness.
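In the Specification by Example style, an executable specification is simply an automated test that reads as a business rule and its examples. A minimal sketch (the function and the rule are invented for illustration, not taken from Adzic's book):

```python
def shipping_cost(order_total):
    """Business rule under test: orders of $50 or more ship free; otherwise $5 flat."""
    return 0 if order_total >= 50 else 5

# The specification, written as concrete examples that double as living documentation:
#          order total | expected shipping
examples = [(49.99, 5), (50.00, 0), (120.00, 0)]

for total, expected in examples:
    assert shipping_cost(total) == expected, f"rule broken for ${total}"
print("all examples pass")
```

The examples table is the documentation: when the rule changes, a failing example points at exactly the behavior that no longer matches the specification.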

Once you have the beginnings of an automated integration test harness in place, you may as well start using it frequently, because its real value is to expose problems as soon as possible. But you will find that code needs to be “done” in order to be tested in this harness, otherwise you will get a lot of false negatives. Thus all teams contributing to the release would do well to work in 2-4 week iterations and bring their code to a state that can be checked by the integration test harness at the end of every iteration. Once you can reasonably begin early, frequent integration testing, you will greatly reduce final integration time, making quarterly releases practical.

Be careful, however, not to move to quarterly releases without thinking through all of the implications. As Kent Beck noted in his Software G Forces talk, sales and support models at many companies are based on annual maintenance releases. If you move from an annual to a quarterly release, your support model will have to change for two reasons: 1) customers will not want to purchase a new release every quarter, and 2) you will not be able to support every single release over a long period of time. You might consider quarterly private releases with a single annual public release, or you might want to move to a subscription model for software support. In either case, you would be wise not to guarantee long term support for more than one release per year, or support will rapidly become very expensive.

From Quarterly to Monthly Releases
Organizations that have adjusted their processes and business models to deal with a quarterly release cycle begin to see the advantages of shorter release cycles. They see more stability, more predictability, less pressure, and they can be more responsive to their customers. The question then becomes, why not increase the pace and release monthly? They quickly discover that an additional level of process and business change will be necessary to achieve the faster cycle time because four weeks – twenty days – from design to deployment is not a whole lot of time.

At this cadence, as Kent Beck notes, there isn’t time for a lot of information to move back and forth between different departments; you need a development team that includes analysts and testers, developers and build specialists. This cross-functional team synchronizes via short daily meetings and visualization techniques such as cards and charts on the wall – because there simply isn’t time for paper-based communication. The team adopts processes to ensure that the code base always remains defect-free, because there isn’t time to insert defects and then remove them later. Both TDD (Test Driven Development) and SBE (Specification by Example) become essential disciplines.

From a business standpoint, monthly releases tend to work best with software-as-a-service (SaaS). First of all, pushing monthly releases to users for them to install creates huge support headaches and takes far too much time. Secondly, it is easy to instrument a service to see how useful any new feature might be, giving the development team immediate and valuable feedback.

Weekly / Daily Releases
There are many organizations that consider monthly releases a glacial pace, so they adopt weekly or even daily releases. At a weekly or daily cadence, iterations become largely irrelevant, as does estimating and commitment. Instead, a flow approach is used; features flow from design to done without pause, and at the end of the day or week, everything that is ready to be deployed is pushed to production. This rapid deployment is supported by a great deal of automation and requires a great deal of discipline, and it is usually limited to internal or SaaS environments.

There are a lot of companies doing daily releases; for example, one of our customers with a very large web-based business has been doing daily releases for five years. The developers at this company don’t really relate to the concept of iterations. They work on something, push it to systems test, and if it passes it is deployed at the end of the day. Features that are not complete are hidden from view until a keystone is put in place to expose the feature, but code is deployed daily, as it is written. Occasionally a roll-back is necessary, but this is becoming increasingly rare as the test suites improve. Managers at the company cannot imagine working at a slower cadence; they believe that daily deployment increases predictability, stability, and responsiveness – all at the same time.
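The "keystone" described above is, in modern terms, a feature flag: incomplete code ships dark and is exposed only when the final piece is put in place. A minimal sketch (the flag and page names are invented):

```python
# Deployed code can contain an unfinished feature, hidden behind a flag.
FLAGS = {"new_checkout": False}    # flipped to True when the feature is complete

def checkout_page():
    if FLAGS["new_checkout"]:      # the "keystone": the check that exposes the feature
        return "new streamlined checkout"
    return "classic checkout"

print(checkout_page())             # prints: classic checkout  (feature stays hidden)
FLAGS["new_checkout"] = True       # the keystone is put in place
print(checkout_page())             # prints: new streamlined checkout
```

This is what lets code deploy daily as it is written: deployment and release become separate decisions, so half-done features can ride to production without being visible to users.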

Continuous Delivery
Jez Humble and David Farley wrote “Continuous Delivery” to share with the software development community techniques they have developed to push code to the production environment as soon as it is developed and tested. But continuous delivery is not just a matter of automation. As noted above, sales models, pricing, organizational structure and the governance system all merit thoughtful consideration.

Every step of your software delivery process should operate at the same cadence. For example, with continuous delivery, portfolio management becomes a thing of the past; instead people make frequent decisions about what should be done next. Continuous design is necessary to keep pace with the downstream development, validation and verification flow. And finally, measurements of the “success” of software development have to be based on delivered value and improved business performance, because there is nothing else left to measure.