The most dangerous risks on a big project aren’t the ones inside your plan. They’re the ones that travel down the wires connecting your plan to everybody else’s.
In mid-June, the US aviation network had one of those weeks. Severe weather hit a handful of major hubs, air traffic control restrictions kicked in to manage the congestion, and the whole interconnected system began to fold. Thousands of flights were cancelled or delayed across dozens of airports over several days. And the cruel mechanic of it is this: a storm over New York grounds a plane that was supposed to fly out of Denver hours later, which strands a passenger in Las Vegas who never saw a cloud all day. The failure didn’t stay where it started. It rode the network outward.
That’s the thing about hub-and-spoke systems, and it’s the thing about most modern projects too. We build them efficient and tightly coupled – every part depending on every other part, slack squeezed out in the name of cost – and we forget that efficiency and fragility are often the same design viewed from two angles. The system that wastes nothing has nothing spare to absorb a shock. So when one node fails, the failure doesn’t stop. It propagates.
I’ve watched this happen on programmes that looked nothing like an airline. A delay in one workstream that everyone assumed was contained, quietly starving three other workstreams downstream that depended on its output. A single late supplier turning into a cascade because four teams had all built their plans on that supplier being on time. The original problem was small and local. The damage was large and everywhere, because the system had no give in it.
The lesson airlines keep relearning, and the rest of us should steal, is that in a tightly connected system you cannot just manage each part. You have to manage the connections. Where does this workstream hand off to that one? What happens to everyone downstream if this piece is late? Which single node, if it fails, takes half the programme with it? The map of dependencies is as important as the plan itself, and most projects barely draw it.
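If you want to see what drawing that map can look like, here is a minimal sketch in Python. The workstream names and the `feeds` map are invented for illustration, and `downstream` and `blast_radius` are hypothetical helpers, not anything from a real planning tool. All it does is walk the handoffs and count how many other workstreams a failure in each one could reach.

```python
from collections import deque

# Hypothetical dependency map (an assumption for illustration):
# each workstream lists the workstreams that consume its output.
feeds = {
    "supplier": ["build", "integration", "training"],
    "build": ["integration"],
    "integration": ["launch"],
    "training": ["launch"],
    "launch": [],
}

def downstream(start, feeds):
    """Return every workstream fed, directly or indirectly, by `start`."""
    seen = set()
    queue = deque(feeds.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(feeds.get(node, []))
    return seen

def blast_radius(feeds):
    """Rank workstreams by how many others a failure there could reach."""
    reach = {node: len(downstream(node, feeds)) for node in feeds}
    return sorted(reach.items(), key=lambda item: item[1], reverse=True)

for name, count in blast_radius(feeds):
    print(f"{name}: reaches {count} other workstream(s)")
```

In this made-up programme the supplier sits at the top of the list, because everything else eventually depends on it. A real map would be messier, but even a rough one tells you which node to worry about first.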
And the practical defence is deliberately un-lean: slack. Buffer. Redundancy in the places that matter most. The instinct, always, is to strip these out because they look like waste on a spreadsheet – an idle day, a spare supplier, a bit of float in the schedule that nobody is “using.” But that slack is not waste. It is the shock absorber. It’s the difference between a local problem staying local and a local problem becoming everyone’s problem. The leanest possible system is also the one most likely to cascade.
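The shock-absorber idea can be put in a few lines of Python. This is a deliberately crude sketch under assumptions of my own: a single chain of handoffs, delays measured in whole days, and a hypothetical function `delay_reaching_end` in which each buffer simply soaks up as much of the incoming delay as it can. Real schedules are not this tidy, but the arithmetic shows the point.

```python
def delay_reaching_end(initial_delay, buffers):
    """Pass a delay down a chain of handoffs.

    `buffers` holds the float (in days) available at each handoff;
    each one absorbs as much of the remaining delay as it can.
    """
    delay = initial_delay
    for buffer_days in buffers:
        delay = max(0, delay - buffer_days)
    return delay

# The same three-day slip, hitting two illustrative schedules.
lean = delay_reaching_end(3, [0, 0, 0])      # no float anywhere
buffered = delay_reaching_end(3, [1, 2, 0])  # a little float at two handoffs

print("Lean schedule, days late at the end:", lean)
print("Buffered schedule, days late at the end:", buffered)
```

With no float, the full three days arrive at the far end of the chain. With three days of float spread across two handoffs, the slip is absorbed before it gets there.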
There’s a recovery lesson in here too, and the airlines demonstrate both sides of it. The carriers that recover fastest from a meltdown aren’t the ones who avoided the storm – nobody avoids the storm. They’re the ones with a rehearsed plan for reordering the chaos: how to re-accommodate, what to prioritise, who decides. The meltdown is a given. The recovery is a choice you make in advance.
So when you next design a project, don’t only ask whether each part is sound. Ask where the system is tightly coupled, where one failure could travel, and whether you’ve left any slack in the places a shock would hit. Efficiency is wonderful, right up until the weather turns. Then the only thing that saves you is the spare capacity everyone told you to cut.