Lean Essays: When Demand Exceeds Capacity

We are getting solar panels on our roof. Someday. We signed a contract last November and the original estimate for installation was May. Since no one does outside work in the winter here in Minnesota, this was about two months after the earliest possible installation date. But May has come and gone, and we still do not have an installation date. Why? The electrical design of for our 40-year-old house is complex and the young solar company has limited electrical design capability. So does our mature electrical utility company. Between the two of them, our project has been held up for a long time.

Solar systems have become widely popular since the electrical grid meltdown in Texas and the sharp rise in energy prices. So, it is not a surprise that the nascent solar industry is dealing with far more demand than the young solar companies and staid utility companies are equipped to handle. It is difficult to find enough talented people to do the complex system design, project management, and installation of solar systems. Thus, solar customers experience long delays that cost us months of potential savings, while they suffer from months of delayed revenue.

Our solar company has an eager sales force and a handful of project managers. Once contracts are signed, sales people hand them off to project managers so they can focus on pursuing more sales. Since the rate of signed contracts greatly exceeds the rate of completed installations, the number of customers assigned to each project manager is constantly increasing. It is impossible for a project manager to manage dozens of complex solar projects at the same time, but that is what they are expected to do.

Clearly, our solar company is unwilling to reduce its sales rate to match its project completion rate and unable to increase the completion rate to match the rate at which contracts are signed. Thus, the length of time a project takes continually increases, putting customers and project managers in a lose-lose situation. In fact, the solar industry in hardly the first industry in the world to experience far more demand that it has the capacity to supply; many other industries have navigated these waters. Typically, companies that do not learn how to manage excess demand end up failing because they generate annoyed customers faster than they generate revenue. Companies that became giants in rapidly growing industries – for example, Airbnb, AWS, Google, Netflix, and SpaceX – learned quickly to increase their delivery capacity to match the rate at which they generated sales. Successful solar companies will have to learn the same thing.

How do companies rapidly increase delivery capacity? Hiring more people might help, but it is not nearly enough. Young companies in a growth industry must figure out how to complete more jobs faster while doing less work or they will not be able to keep up with the rapid increase in demand. There are three essential strategies that successful industry giants have used to increase efficiency which other companies would do well to understand.

1. Focus on Customers Outcomes

The first strategy for increasing efficiency is to pay attention to the metrics that drive behavior. Startups tend to measure success by counting the number sales, while more efficient companies tend to measure the number of satisfied customers. When success is measured by the number of satisfied customers rather than booked revenue, several good things happen. The company learns not to accept work at a faster rate than it can complete work, because doing so does little more than stretch out the time it takes for its people to be successful while lengthening the queue of increasingly impatient customers. Work teams also learn to focus on completing installations as quickly as possible, because more installations will be seen as better performance. Finally, teams are unlikely to develop products that customers are not interested in – a very common source of wasted effort in startup companies.

It is extraordinarily difficult for an ambitious young company to turn away people that are beating a path to its door. Everyone’s instinct is to hire more salespeople and let the work pour in. But let’s face it, the most important thing that young company needs to learn is how to COMPETE work quickly, not how to ACCEPT work quickly. If it opens the doors wide and lets everyone in, there will be so many customers milling about that workers will be distracted trying to deal with the chaos rather than focused on getting work done. The most important thing for that young company to do at this point is to stop spending so much time on managing excess work and start concentrating on how to get jobs done as fast as they arrive.

Every growth industry has gone through this learning curve. Once upon a time, mail order catalogs (Sears, for example) accepted orders as fast as they could and shipped them in about two weeks. Because of the delay, they had to deal with people changing their mind and a complex order change system was necessary. Then L.L.Bean had a novel idea – why not ship products within 24 hours of receiving the order? It worked – customers were delighted, and no order change system was needed. Today almost all fulfillment systems work on the same principle – focus on building the capacity to ship product at the same rate orders arrive. Once shipping capacity matches demand, the company can ship immediately after an order arrives, delighting customers and reducing congestion while giving customers little time to change their minds.

2. Manage FLOW, not tasks

The second strategy for increasing efficiency is counterintuitive; it requires a shift in the company’s mental model of how work gets done. Consider our solar company. As a small startup, it thinks of every project as a separate entity and assigns a project manager to sequence the steps in an installation so all necessary work gets done. Unfortunately, this approach is not readily scalable, and even if it were, there are not enough experienced project managers available to hire. Industry leaders do things differently; they manage packages of items as they move through their system, not individual projects or tasks.

What does it mean to manage groups of items? Consider the shipping industry. Until the 1950’s, ships were loaded with individual items packed carefully to keep the ship stable. Each item had to be traced through every port as it moved off the ship and onto another ship or a train or a truck. Then the shipping container was invented, and it was no longer necessary to manage individual items moving through the supply chain. Instead, shippers managed containers as they moved from origin to destination.

Some years ago, we bought a sofa that was heavily discounted because of a missing headrest. A few months later the store called to say it was filling a container of furniture from their supplier in Norway and they could add our missing headrest for a small fee. It would have been too complex (and expensive) to order and track the headrest individually; but it was simply assigned to the container in which many items moved as a package from Norway to Minnesota. The cost of shipping was negligible. It's easy to see how containers created a reduction in complexity that dramatically reduced the cost of shipping goods and led to substantial changes in the global economy.

How do you group and manage workflow in a growing solar company? First, you think of a contract as a number of steps that need to be completed. Certainly, some of the steps can be done in parallel – for example solar panel layout can be done in parallel with electrical design. Other steps have prerequisites that must be completed first. You organize work so that each step has its own responsible team, its own queue, and its own completion rate. As each new contract is signed, it is coded with the various steps it must move through, including any required sequencing. Then you release the contract to the starting step (or steps). When a step is done, work should automatically move to the next step(s) until all work is complete. Your job is to manage the FLOW of work through the system rather that the progress of individual items through their various steps.

How do you manage the flow of work through a system? Start by managing the amount of time a job spends waiting for attention in the queue of each step. Queueing theory is clear: if you limit the length of a queue and send work through in FIFO (first-in-first-out) order, then time in a queue is the number of items in the queue divided by the average completion rate. For example, if you limit the items in a work queue to twenty and each item takes two days, work will wait in the queue – if it is FIFO – for no more than ten days. (Note: If a FIFO queue is not acceptable, then the queue is too long! Make it shorter.)

Once a job reaches the end of a queue, it should flow quickly through the work step. The best approach is for a responsible team begin a job, focus on it until it is done, then go on to the next job. Focusing on one thing at a time improves the quality of the work, eliminates wasted time due to multitasking and gets everything done faster. (See Single Responsibility Teams, below.)

If a step in the workflow is completed inside of your company, you have a lot of control over the rate at which work moves through the step (queue time + work time). Your goal is to create short, predictable completion rates for every internal step. If a step occurs outside your company (the utility company, for example, or the city permitting process) you do not have control of the time that takes, but you can measure it – you should track average time and variation. Note that it is important to separate internal steps from external steps – otherwise variations in the external completion rate will make it impossible to control internal rates of completion.

It is also important to ensure that downstream steps have capacity in their queues to handle work as it is completed upstream. If this sounds complicated, an analogy might be helpful: Consider an order coming into Amazon. Once upon a time, orders were handled by a central database which directed each step in the processing of orders as they moved through the company. But in the early 2000’s Amazon outgrew this process; they could not buy large enough database computers to keep up with surges in demand. Amazon needed a new approach, so it abandoned the central databases and thought of each order as moving through a series of independent services. When an order is placed, it is coded with an initial chain of necessary services (Do we need to do a credit check? Does the product require special shipping? ...) and sent on its way through the various processing steps. A service can divert the order or add additional steps (Oops, product is out of stock! ...) and downstream processes are expected to have the capacity to accept work from upstream processes. (If the product can be picked from a warehouse, it should be possible to package and ship it immediately.)

Armed with a set of steps each contract needs and the time each step will take, you, like Amazon, should be able to predict the expected installation date very early – ideally when the contract is signed. You should also have a handle on variation due to outside queues and dependencies such as parts availability and supply chain delays. You can send the contract into your workflow, track how well it is coming along and flag exceptions as soon as they occur.

In addition, you can spot bottlenecks that need to be improved and generally manage the capacity of your organization. You can be alerted when you need to limit sales to avoid overcommitment: Do not accept work faster than the rate at which work moves through your bottleneck process. Focus on the bottleneck until it is improved, when of course a new bottleneck will appear. Find the next bottleneck and limit sales of projects that use it to the rate the bottleneck can handle. Repeat...

By far the biggest benefit of managing the flow of groups of items is this: You will get a lot more done with a lot less effort. How can this be? When you no longer have half-done work clogging up your system, demanding attention, and distracting workers, everyone can spend much more of their time actually getting useful work done.

3. Form Single-Responsibility Teams

The third big change you should make is to charter single-responsibility teams, organized and led by single-responsibility leaders. What is a single-responsibility team? It is a team that has one and only one thing to do, a team that can focus all its efforts on accomplishing one thing at a time and doing that one thing well. Be careful to minimize dependencies among these teams; each team should be able to begin and complete its work independently; teams should not need to coordinate with or get permission from managers or other teams. Finally, work should not appear in the queue of a single-responsibility team until everything is in place for that work to be done, otherwise the team will have to finish someone else’s work before they can focus on their own.

How is it possible to organize complex work into pieces that can be accomplished by single-responsibility teams? While there is no such thing as the “best” organizational structure for companies in diverse domains, it is easy to find examples that demonstrate the concept. Let’s start with AWS (Amazon Web Services). Over the past decade AWS has replaced most enterprise software companies with an array of independent services, each managed by a single-responsibility team and strung together through hardened interfaces. No one has ever seen an AWS roadmap; instead, teams identify (or are given) a customer problem and they start by writing a “press release” describing how their solution will solve the problem. Then they proceed to develop the solution, testing it with customers as often as possible. AWS has released multiple new enterprise-level services every year since 2012, each aimed at solving a specific customer problem. These services have gradually lured companies away from the complex, integrated packages of traditional vendors, often prompting AWS customers to organize their own work around autonomous, focused teams.

Now let’s look at SpaceX, which is organized very differently than AWS. SpaceX is fundamentally an engineering organization; its teams are organized around the components of booster rockets that launch things into space. There is a team for each stage of the booster (eg. payload, stage 1, stage 2, engine cluster…) with sub-teams for each sub-component of the stage (Merlin engine, landing legs…).

You might wonder: How do teams know what to do and when to do it? How do they coordinate with other teams? How do they know they have done a good job? SpaceX operates on the principle of responsibility, which says that each team / sub-team is expected to understand the role its component must play in the next launch and have it ready by the scheduled launch date (perhaps a couple of months away). The team, led by a responsible engineer, has one job: Ensure that the component works as well as possible during the launch and does its job as part of the overall system.

How does SpaceX manage FLOW? For a large, multi-team development effort involving both hardware and software, managing flow is usually accomplished by scheduling a series of frequent integration events. The purpose of each event is to take a well-defined step forward in the overall development effort to learn what works, locate unintended integration effects, and discover ways to improve the system. A series of integration events sets the cadence for all teams involved in the development effort. In the case of SpaceX a test launch is scheduled every few months throughout a development cycle; importantly, launch dates Do. Not. Change. Each launch is an integration event that moves all teams forward together, creating a steady flow of real progress toward a working system.

Of necessity, based on the structure of a booster, SpaceX teams have more dependencies than AWS teams. They also have a clear responsibility to understand and account for those dependencies. Since SpaceX teams work to meet fixed launch deadlines, their systems sometimes fail during a launch. Failures are part of the process, but teams must carefully document launch behavior to be sure that the cause of any failure is identified and eliminated. Believe it or not, this approach to development is much faster and less costly (and usually safer!) than trying to think through every failure mode and eliminate any possibility of failure before attempting a test launch.

Does SpaceX focus on customer outcomes? The company undertook the difficult challenge of developing a reusable booster because it was clear that sending things into space was too expensive for many potential customers. Reusing boosters was an essential step in dramatically reducing the cost of a launch, significantly increasing the number of customers with access to space. This need to reduce launch costs has driven most major booster development decisions at SpaceX.

SpaceX’s reusable booster has reduced the cost of launching a kilo of payload into space by a factor of 7, while spending an order of magnitude less money on development than previous booster rockets.

Summary

A lot of companies have more work than they can possibly handle – and that is not a good thing. It creates long delays, excess complexity, and eventually, dissatisfied customers. The secret is to limit demand to capacity and then increase capacity to meet additional demand by creating a steady flow of work while teams focus on a single responsibility. You also need to stop doing work that does not contribute to great customer outcomes: Stop multitasking, stop prioritizing, get rid of backlogs, stop shepherding individual tasks. If companies as large and successful as AWS and SpaceX can organize work around customer outcomes, steady flow, and focused, single-responsibility teams – then so can you.

June 11, 2022

When Demand Exceeds Capacity