Saturday, January 15, 2011

A Tale of Two Terminals

By any measure, Terminal 3 at Beijing Capital Airport is impressive. Built in less than four years and officially opened barely four months before the Olympics, the massive terminal has received numerous awards for both its stunning design and its comfortable atmosphere. And it escaped the start-up affliction of many new airport terminals when it commenced full operations on March 26, 2008 without any notable problems.

The next day, half way around the world, Heathrow Terminal 5 opened for business. At one-third the size of Beijing Terminal 3, the new London terminal had taken twice as long to build and cost twice as much. Proud executives at British Airlines and BAA (British Airports Authority) exuded confidence in a flawless opening, but that was not to be. Instead, hundreds of flights were canceled in the first few days of operation, and about 28,000 bags went missing the first weekend. The chaotic opening of Heathrow Terminal 5 was such an embarrassment that it triggered a government investigation.

The smooth opening of Beijing Terminal 3 was not an anomaly – Terminal 2 at Shanghai Pudong International Airport opened the same day, also without newsworthy incident. Given the timing just before the Beijing Olympic games, it was clear that China was keenly interested in projecting an image of competence to the traveling public. But of course, the UK was equally interested in showcasing its proficiency, and the British executives clearly expected that the opening of Heathrow Terminal 5 would go smoothly. So the question to ponder is this: How did the Chinese airports manage two uneventful terminal openings? Did they do something different, or were the problems in London just bad luck?

It’s not like testing was overlooked at Heathrow Terminal 5. In fact, a simulation of the terminal’s systems was developed and all of the technical systems were tested exhaustively, even before they were built. A special testing committee was formed and thousands of people were recruited to be mock passengers, culminating in a test with 2000 volunteer passengers a few weeks before the terminal opened. On the other hand, the planned testing regime was curtailed because the terminal construction was not completed as early as planned; in fact, hard-hats were required in the baggage handling area until shortly before opening day. In addition, a decision was made to move 70% of the flights targeted for Terminal 5 on the very first day of operations, because it was difficult to imagine how to move in smaller increments.[1]

Those of us in the software industry have heard this story before: the time runs out for system testing, but a big-bang cut-over to a mission critical new system proceeds anyway, because the planned date just can’t be delayed. The result is predictable: wishful thinking gives way to various degrees of disaster.

That kind of disaster wasn’t going to happen in Beijing. I was in Beijing a month before the Olympics, and every single person I met – from tour guide to restaurant worker – seemed to feel personally responsible for projecting a favorable image as the eyes of the world focused on their city. I imagine that for every worker in Terminal 3, a smooth startup was a matter of national pride. But the terminal didn’t open smoothly just because everyone wanted things to go well. It opened smoothly because the airport authorities understood how to orchestrate such a large, complex undertaking that involved hundreds of people. After all, they had just finished building the airport at amazing speed.[2]

The opening ceremony of the Beijing Olympics was also a large, complex undertaking that involved hundreds of people. It’s easy to imagine the many rehearsals that took place to make sure that everyone knew their part. When it comes to opening a new terminal, the idea of rehearsals doesn’t usually occur to the authorities, but at Beijing Capital Airport, rehearsals started in early February. First a couple of thousand mock passengers took part in a rehearsal, then five thousand, and finally, on February 23rd, 8000 mock passengers checked in luggage for 146 flights. This was the average daily load expected a week later, when six minor airlines moved into Terminal 3. During the month of March, Terminal 3 operated on a trial basis, ironing out any problems that arose. Meanwhile, staff from the large airlines about to move to the terminal rehearsed their jobs in the new terminal day after day, so that when the big moving day arrived, everyone knew what to do. On March 26, all the practice paid off when the Terminal was opened with very few problems.

This certainly wasn’t the approach taken at Heathrow Terminal 5. It’s pretty clear that the opening chaos was caused because people did not know what to do: they didn’t know where to park, couldn’t get through security, didn’t know how to sign on to the new PDA’s to get their work assignments, didn’t know where to get help, and didn’t know how to stop all the luggage from coming at them until their problems got sorted out. Even the worst of the technical problems was actually a people problem: the baggage handling software had been put in a ‘safe’ mode for testing, and apparently no one was responsible for removing the patch which cut off communication to other terminals in the airport. It took three days to realize that this very human error was the main cause of the software problems![3]

In testimony to the British House of Commons, union Shop Steward Iggy Vaid testified:[4]
We raised [worker concerns] with our senior management team especially in British Airways. … [Their response was to] involve what we call process engineers who came in and decided what type of process needed to be installed. They only wanted the union to implement that process and it was decided by somebody else, not the people who really worked it. The fact is that they paid lip service to, ignored or did not implement any suggestion we made.

… as early as January there was a meeting with the senior management team at which we highlighted our concerns about how the baggage system and everything else would fail, that the process introduced would not work and so on. We highlighted all these concerns, but there was no time to change the whole plan.

... [Workers] had two days of familiarization in a van or were shown slides; they were shown where their lockers were and so on, but there was no training for hands-on work.….
The opening of a new airport terminal is an exercise in dealing with complexity. At Heathrow Terminal 5, new technical systems and new work arrangements had to come together virtually overnight – and changing the date once it has been set would have been difficult and expensive. Hundreds of people were involved, and every glitch in the work system had a tendency to cascade into ever larger problems.

If this sounds familiar, it’s because this scenario has been played out several times in the lives of many of us in software development. Over time, we have learned a lot about handling unforgiving, complex systems, particularly systems that include people interacting with new technology. But every time we encounter messy transition like the one at Heathrow Terminal 5, we wonder if our hard-learned lessons for dealing with complexity couldn’t be spread a bit wider.

Socio-technical Systems
Not very far from Heathrow, the Tavistock Institute of London has spent some decades researching work designs that deal effectively with turbulence and complexity. In the 1950’s and 60’s, renowned scientists such as Eric Trist and Fred Emery documented novel working arrangements that were particularly effective in the coal mines and factories of Great Britain. They found that especially effective work systems were designed (and continually improved) by semi-autonomous work teams of between 10 and 100 people that accepted responsibility for meaningful (end-to-end) tasks. The teams used their knowledge of the work and of high-level objectives to design a system to accomplish the job in a manner that optimized the overall results. Moreover, these teams were much better at managing uncertainty and rapidly adapting to various problems as they were encountered. The researchers found that the most effective work design occurs when the social aspects of the work are balanced with its technical aspects, so they called these balanced work systems Socio-technical systems.

In 1981, Eric Trist published “The Evolution of Socio-technical Systems,” an engaging history of his work. He attributes the “old paradigm” of work design to Max Weber (bureaucracy) and Frederick Taylor (work fragmentation). He proposed that a “new paradigm” would be far more effective for organizations in turbulent, competitive, or rapidly changing situations:[5]
Old Paradigm New Paradigm
The technological imperative Joint optimization [of social & technical systems]
Man as an extension of the machine Man as complementary to the machine
Man as an expendable spare part Man as a resource to be developed
Maximum task breakdown, simple narrow skills Optimum task grouping, multiple broad skills
External controls (supervisors, specialist staffs, procedures) Internal controls (self-regulating subsystems)
Tall organization chart, autocratic style Flat organization chart, participative style
Competition, gamesmanship Collaboration, collegiality
Organization’s purposes only Members’ and society’s purposes also
Alienation Commitment
Low risk taking Innovation

In the 1980’s the socio-technical paradigm gained increased popularity when team work practices from Japan were widely copied in Europe and America. In the 1990’s socio-technical ideas merged with general systems theory, and the term “socio-technical systems” fell into disuse. But the ideas lived on. These days, it is generally accepted that the most effective way to deal with complex or fast-changing situations is by structuring work around semi-autonomous teams that have the leadership and training to respond effectively to any situation the groups are likely to encounter.

The clearest example we have of semi-autonomous work teams are emergency response teams – firefighters, paramedics, emergency room staff. Their job is to respond to challenging, complex, rapidly changing situations, frequently in dangerous surroundings and often with lives at stake. Emergency response teams prepare for these difficult situations by rehearsing their roles, so everyone knows what to do. During a real emergency, that training coupled with the experience of internal leaders enables the teams to respond dynamically and creatively to the emergency as events unfold.

Design Social Systems Along with Technical Systems
Developing a software system that automates a work system is fraught with just about as much danger as moving to a new airport terminal. There are many things we can do to mitigate that risk:
1. Cutover to any new system should be in small increments. Impossible? Don’t give up on increments too quickly – and don’t leave this to “customers” to decide! The technical risk of a big-bang cut-over is immense. And it’s almost always easier to divide the system in some way to facilitate incremental deployment than it is to deal with the virtually guaranteed chaos of a big-bang cutover.

2. Simplify before you automate. Never automate a work process until the work teams have devised as simple a work process as they possibly can. Automating the right thing is at least as important as automating it right.

3. Do not freeze work design into code! Leave as much work design as possible for work teams to determine and modify. If that is not possible, make sure that the people who will live with the new system are involved in the design of their work.

4. Rehearse! Don’t just test the technical part, include the people who will use the new system in end-to-end rehearsals. Be prepared to adapt the technical system to the social system and to refine the social system as well. Be sure everyone knows what to do; be sure that the new work design makes sense. Leave time to adjust and adapt. Don’t cut this part short.

5. Organize to manage complexity. Structure work around work teams that can adapt to changing situations, especially if the environment is complex, could change rapidly, or is mission critical. At minimum, have emergency response teams on hand when the new system goes live.
Much of the software we write ends up having an impact on the lives of other people; in other words, our work creates changes in social systems. We would do well to consider those social systems as we develop the technical systems. If we want to create systems that are truly successful, the technical and social aspects of our systems must be designed together and kept in balance.
[1] “The opening of Heathrow Terminal 5” report to the House of Commons Transportation Committee pages 13-14.

[2] Contrast this with the absence of the BAA management team that oversaw the on-time, on-budget construction of Heathrow Terminal 5; they were replaced after a 2006 takeover of BAA by the Spanish company Ferrovial.

[3] See “The opening of Heathrow Terminal 5” report to the House of Commons Transportation Committee.

[4] “The opening of Heathrow Terminal 5” report to the House of Commons Transportation Committee pages 22-25.

[5] From "The Evolution of Socio-technical Systems" by Eric Trist, 1981. p 42.