Imagine a team at full tilt—every consultant crafting AI models, every marketer drafting campaigns, every engineer designing solutions. The project backlog is packed, deadlines loom, and leadership pushes to maximize productivity. No one’s idle; every resource is fully utilized. It’s a familiar scene across knowledge work, from tech startups to consulting firms, and it feels like the pinnacle of efficiency.
We’ve been conditioned to see this as success. Busy schedules signal hard work, and idle time feels like waste. The goal? Squeeze every ounce of output from our teams and systems. But here’s the paradox: Running at 100% capacity often makes you slower, not faster.
This isn’t just an observation—it’s a principle rooted in decades of research, particularly queuing theory, the science of waiting lines. In this post, I’ll explain why chasing maximum utilization introduces delays that harm your business, and how managing work-in-progress can unlock speed, quality, and value instead.
The Science of Slowdowns: Queuing Theory Explained
At its core, queuing theory studies how work flows through a system, whether it’s calls at a help desk, tasks in a project pipeline, or customers at a checkout. Its key insight is stark: as utilization approaches 100%, lead times and wait times don’t just grow, they explode. This relationship is illustrated here:
[Figure: lead time vs. utilization. Wait times rise gently at moderate utilization, then shoot upward as utilization approaches 100%.]
What does this mean? At 50% utilization, work moves smoothly—tasks queue briefly, if at all. At 80%, delays become noticeable. Push to 95% or 100%, and the system clogs. Tasks pile up, waiting for someone to free up, and completion times stretch far beyond what you’d expect.
Why the exponential jump? The answer lies in the mathematics of accumulated variability.
Imagine a coffee shop with one barista who makes exactly one drink per minute. When the shop operates at 100% capacity (60 customers/hour), the barista has zero spare capacity. Now consider what happens with natural variability:
When 3 customers arrive simultaneously, the first gets served immediately, the second waits 1 minute, and the third waits 2 minutes.
What about the quiet periods? If no customers arrive for 3 minutes, the barista simply waits. This idle time cannot be “saved” or “banked” to serve future customers faster.
This asymmetry is crucial: The barista can fall behind but can never get ahead. They cannot make drinks any faster than one per minute, even during quiet periods. Consequently:
- Idle time during quiet periods doesn’t help clear existing backlogs faster
- Each new clump of arrivals adds to any existing backlog
- The backlog can only diminish during periods when arrivals temporarily dip below capacity
At 80% utilization, these recovery periods occur frequently enough to consistently clear backlogs. But as utilization approaches 100%, such recovery periods become increasingly rare. The system spends more time in a backlogged state, with each new random cluster of arrivals extending wait times further.
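To make the asymmetry concrete, here is a minimal simulation sketch of the barista example: one server, a fixed one-minute service time, and randomly timed arrivals. The helper name `simulate_queue` and its parameters are illustrative assumptions, not from any particular library, and the exact numbers depend on how variable you assume arrivals to be; the explosion near 100% shows up regardless.

```python
import random

def simulate_queue(utilization, service_time=1.0, n_customers=100_000, seed=42):
    """Single server, fixed service time, randomly spaced (exponential) arrivals.

    Returns the average time a customer waits before service starts, in minutes.
    """
    rng = random.Random(seed)
    arrival_rate = utilization / service_time  # customers per minute
    clock = 0.0            # arrival time of the current customer
    server_free_at = 0.0   # when the barista finishes the existing backlog
    total_wait = 0.0
    for _ in range(n_customers):
        clock += rng.expovariate(arrival_rate)    # next arrival, randomly spaced
        wait = max(0.0, server_free_at - clock)   # idle time can't be "banked"
        total_wait += wait
        server_free_at = max(server_free_at, clock) + service_time
    return total_wait / n_customers

for rho in (0.5, 0.8, 0.95, 0.99):
    print(f"utilization {rho:.0%}: average wait ≈ {simulate_queue(rho):.1f} minutes")
```

Even though the barista is never short of raw capacity, the average wait climbs from well under a minute at 50% utilization to tens of minutes as utilization nears 99%.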
The mathematics of queuing theory confirms this: as utilization (ρ) approaches 100%, the average waiting time is proportional to ρ/(1-ρ). This fraction explodes as ρ nears 1:
- At 50% utilization: 0.5/(1-0.5) = 1
- At 80% utilization: 0.8/(1-0.8) = 4
- At 95% utilization: 0.95/(1-0.95) = 19
- At 99% utilization: 0.99/(1-0.99) = 99
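The same fraction can be computed for any utilization level; a one-line helper (purely illustrative) reproduces the numbers above:

```python
def wait_multiplier(rho: float) -> float:
    """Relative waiting time implied by the rho / (1 - rho) relationship."""
    return rho / (1 - rho)

for rho in (0.5, 0.8, 0.95, 0.99):
    print(f"{rho:.0%} utilization -> waiting-time factor {wait_multiplier(rho):g}")
```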
This is why in knowledge work, whether designing an algorithm, drafting a marketing plan, or engineering a prototype, pushing teams to 100% utilization doesn’t just make things a little slower; it mathematically guarantees backlogs that keep growing and never reliably clear.
Optimal Utilization
But does this mean we should aim for the lowest utilization possible, where lead times are shortest? That doesn’t sound practical, does it? You’re right—it isn’t. Low utilization means idle capacity, and idle resources cost money without producing value. We need to balance two opposing forces: the cost of delay (longer lead times) and the cost of idle capacity (unused resources).
Let’s break this down visually:
[Figure: cost of idle capacity (falling) and cost of delay (rising steeply) plotted against utilization]
Here, the cost of idle capacity decreases as utilization rises: you’re “wasting” fewer resources. But the cost of delay climbs ever more steeply as utilization nears 100%, driven by those skyrocketing lead times. The most interesting part is the sum of these costs:
[Figure: total cost curve, the sum of idle-capacity and delay costs, forming a U shape with a flat bottom]
This U-shaped curve shows there’s an economically optimal utilization—neither 0% nor 100%, but typically somewhere between 70-85%. The good news? This curve has a flat bottom, meaning a few percentage points here or there don’t drastically change the total cost. But instinctively, we often lean toward higher utilization, underestimating delay costs.
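As a toy illustration of that U-shaped curve, assume idle cost falls linearly with utilization while delay cost follows the ρ/(1-ρ) shape from above. The weights below are invented for illustration only; in practice you would estimate them from your own cost of delay.

```python
def total_cost(rho, idle_weight=1.0, delay_weight=0.05):
    """Toy model: idle cost shrinks as utilization rises; delay cost explodes near 1."""
    return idle_weight * (1 - rho) + delay_weight * rho / (1 - rho)

def optimal_utilization(idle_weight=1.0, delay_weight=0.05):
    """Grid-search the utilization level that minimizes total cost."""
    candidates = [r / 1000 for r in range(1, 1000)]  # 0.1% .. 99.9%
    return min(candidates, key=lambda r: total_cost(r, idle_weight, delay_weight))

print(f"optimal utilization ≈ {optimal_utilization():.0%}")  # about 78% with these made-up weights
```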
So, how much is shorter lead time worth? The answer depends on your context. Consider a product owner at a startup: a competitor launches a buzzworthy feature, and you’re racing to match it. Would keeping a lightly utilized team ready for rapid response be justified?
Now, let’s look at high-risk environments where the cost of delay is extraordinarily high. Take a nuclear power plant, where my sister works—such facilities deliberately maintain utilization at 50-75% through strict regulations on shift lengths and built-in redundancy. Overloading personnel might seem efficient in the short term, but it leads to burnout, slower decision-making, and a greater chance of catastrophic failures, which are far more expensive than maintaining lower but sustainable workloads. Similarly, consider armies: we value their slack, keeping them ready for rare but vital action, not finding busywork during peacetime.
This high cost of delay shifts the optimal utilization lower, as shown here:
[Figure: a steeper delay cost curve shifts the minimum of the total cost curve toward lower utilization]
In these scenarios, the steeper delay cost curve—reflecting the severe consequences of delays—pushes the economically optimal utilization closer to 50-75%, ensuring staff remain alert and capable of responding to emergencies. For critical work—say, safety-critical engineering or urgent client deliverables—optimal utilization might drop to 60-70%. For less urgent tasks, like internal tools, it could stay higher, around 80%. In knowledge work, while we don’t face such extremes, we still underestimate delay costs, often overloading teams unnecessarily.
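Continuing the toy model sketched above, a larger delay weight (standing in for work where delays are very expensive) pulls the optimum down, in line with the 60-70% range for critical work:

```python
# Same toy model as before: tripling the delay weight shifts the optimum lower.
print(f"high cost of delay: optimum ≈ {optimal_utilization(delay_weight=0.15):.0%}")  # about 61%
```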
The Hidden Cost of 100% Team Utilization
These delays aren’t mere inconveniences—they ripple through your organization, eroding efficiency and value. Here’s how pushing for 100% utilization backfires:
- Delayed Delivery: In AI consulting, a model might sit untested while the team juggles other clients. In marketing, a campaign launch slips as revisions stack up. Across fields, cycle times—the time from start to finish—balloon, delaying value to customers or stakeholders. Context switching, a known productivity killer (studies peg it at 10-20 minutes per switch), compounds this, as teams juggle multiple tasks, losing focus and time with each shift.
- Quality Erosion: Overloaded teams cut corners. An engineer skips a design review; a marketer rushes copy without testing. Feedback loops stretch—think of an AI expert revisiting a model weeks later, its details foggy. Quality suffers, and rework mounts.
- Lost Opportunities: Slow delivery means missed market windows. A competitor beats you to a trend, or a client grows impatient. The financial hit of waiting—the “cost of delay”—often dwarfs the cost of idle time.
- Stifled Innovation: Knowledge work thrives on creativity—brainstorming a new strategy, experimenting with a prototype. At 100% capacity, there’s no room for this. Teams churn out predictable output instead of breakthrough ideas.
- Team Fatigue: Constant busyness burns people out. Morale dips, turnover rises, and productivity takes a long-term hit.
A Smarter Approach
So, how do you act on this without stumbling into a management trap? Imagine going to your boss and saying, “I think we should run at 75% capacity; 25% of our team should twiddle their thumbs and watch cartoons next year.” Their response might be, “Great! If you can do the same work with 75% of the people, let’s reassign the other 25% and be even more efficient.” That conversation rarely ends well.
Instead of tracking how busy people are, focus on queue length: how much work is in progress at once. The relationship shown in our graph isn’t just theoretical; it’s the key to understanding why WIP (work-in-progress) limits are so powerful:
[Figure: number of items in the system, L = ρ/(1-ρ), plotted against utilization ρ]
This graph illustrates a fundamental queuing theory relationship: L = ρ/(1-ρ). In practical terms, this means queue length (L) explodes as utilization (ρ) approaches 100%. The insight that makes WIP limits effective is that this relationship works in both directions. By capping queue length, you indirectly prevent utilization from creeping into the danger zone.
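One way to see the “works in both directions” point is to solve L = ρ/(1-ρ) for ρ, which gives ρ = L/(1+L): a cap on the number of items in the system implies a cap on utilization. This is a steady-state heuristic rather than an exact guarantee, but it makes the lever visible:

```python
def implied_utilization(wip_cap: float) -> float:
    """Invert L = rho / (1 - rho): capping items in the system caps utilization."""
    return wip_cap / (1 + wip_cap)

for cap in (1, 2, 4, 9, 19):
    print(f"WIP cap {cap:>2} -> utilization held near {implied_utilization(cap):.0%}")
```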
Here’s why controlling WIP works better than targeting utilization:
- Direct Visibility: You can see 27 tickets in Jira or 14 campaigns on your Kanban board. But do you really know if your team is at a 92% utilization rate? In knowledge work, utilization is notoriously difficult to measure accurately.
- Practical Control: You can decide tomorrow not to start any new projects until three current ones finish. But can you precisely adjust how utilized your team is? Between meetings, context switching, varying task complexity, and creative problem-solving, utilization remains elusive.
- Psychological Reality: Telling a team “don’t be too busy” while they face mounting pressure from stakeholders creates a contradiction. Limiting WIP gives teams permission to focus deeply rather than juggling multiple priorities.
- Mathematical Advantage: Little’s Law (Cycle Time = WIP ÷ Throughput) shows that WIP directly controls cycle time. If your team consistently completes 10 tasks per week, reducing WIP from 40 to 20 will mathematically cut delivery times in half, without needing to measure or control utilization directly, as the quick check below shows.
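Here is that last bullet written out as a quick check, using its own numbers (10 tasks finished per week, WIP cut from 40 to 20):

```python
def average_cycle_time(wip: float, throughput_per_week: float) -> float:
    """Little's Law: average cycle time (in weeks) = WIP / throughput."""
    return wip / throughput_per_week

print(average_cycle_time(wip=40, throughput_per_week=10))  # 4.0 weeks on average
print(average_cycle_time(wip=20, throughput_per_week=10))  # 2.0 weeks: half the WIP, half the wait
```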
Now, picture this pitch to your boss: “I’ve noticed we’ve got too many projects in flight—say, 40 active tasks across the team. If we cut that to 20 over the next six months, we could halve our delivery times. Faster delivery means happier clients and more bandwidth for new work, all with the same headcount.” That’s not about idleness—it’s about focus and speed, backed by math.
For example, if your marketing team juggles 20 campaigns, trim it to 10. An engineering crew with 30 active designs? Cap it at 15. Smaller queues mean quicker finishes, less waiting, and happier teams—all without ever needing to explain why 100% utilization is actually your worst-case scenario.
Putting It Into Practice: Steps to Break the 100% Habit
Ready to escape the utilization trap? Here’s how to implement queue-length control across knowledge work:
- Map Your Workflow: Visualize tasks—whether on a Kanban board or spreadsheet—from start to finish.
- Measure Queues: Count items in progress now. Track cycle times as a baseline (a small measurement sketch follows after this list).
- Set WIP Limits: Start conservative—say, 1-2 tasks per person—and adjust. Aim to halve your current queue over months.
- Prioritize Finishing: Focus on completing tasks before starting new ones. Blockers and urgent fixes take precedence.
- Break Down Work: Smaller tasks flow faster, shortening queues and feedback loops.
- Monitor Impact: Watch cycle times drop, quality rise, and stress ease. Tweak as data shows.
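For steps 2 and 6, here is a minimal measurement sketch; the task list and date format below are assumptions for illustration, and in practice the data would come from your board or ticketing tool:

```python
from datetime import date

# Assumed format: (task, started, finished), with finished=None for work still in progress.
tasks = [
    ("campaign brief",   date(2024, 5, 1), date(2024, 5, 9)),
    ("model evaluation", date(2024, 5, 3), date(2024, 5, 20)),
    ("landing page",     date(2024, 5, 6), None),
    ("prototype review", date(2024, 5, 8), None),
]

wip = sum(1 for _, _, finished in tasks if finished is None)
cycle_times = [(finished - started).days for _, started, finished in tasks if finished]

print(f"current WIP: {wip} items")
print(f"average cycle time so far: {sum(cycle_times) / len(cycle_times):.1f} days")
```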
Efficiency Reimagined
The 100% utilization paradox isn’t a fluke—it’s math. Pushing teams to their limit clogs the system, delays value, and costs more than it saves. True efficiency isn’t about constant busyness; it’s about finding the economically optimal utilization and ensuring steady, predictable flow through queue management.
Next time you’re tempted to pile on more work, glance at those queuing curves. Ask: Are we speeding up or slowing down? Managing queues over utilization isn’t just smart—it’s a competitive edge. Your team, your clients, and your bottom line will feel the difference.
Note: If this resonated with you, check out The Principles of Product Development Flow by Donald Reinertsen. It’s packed with insights on optimizing work systems—queue-length control is just the start!