A map for stuck organisations, mostly

Organisations can make it possible to just build cool shit with cool people, and in equal measure they can make it damn near impossible. Supposedly there's an art to developing them, but no amount of post-its can paper over the blank stares left after scheduling yet another alignment meeting.

However, thanks to a semi-successful history of not completely messing everything up (with reservations), I'm finding myself in roles where it falls on me to actually fix those things. I also believe that taking the right questions so seriously it feels silly is one way to start bringing a fuzzy solution into view, so I set out to collect some of those right questions into a reference document for myself.

But a note and a disclaimer: the reference below was written in several long conversations with Claude. I supplied the three broad categories while trying to suss out the ways in which the tools were rhyming with each other, but cross-cutting concerns are easy to find, so view the categories as lenses rather than truth. The tools themselves are the result of many years of hard thinking by their authors, and the summaries carry crude approximations of that thinking.

At this point the AI voice is still telling, jarring, and as verbose as ever (a revision through Claude Fable tightened things up surprisingly well, but much still needs tightening). Even so, I've referenced and read through the whole thing several times, and I have enough of the books mentioned at home to be fairly confident in those sections: mainly SPC, Rumelt, Team Topologies, the Trusted Advisor, and the coaching section. The parts that were more unfamiliar to me were Immunity to Change, Polarity Management, and Schein (which I came to while exploring whether Stan Slap's Under The Hood had enough depth for inclusion), so take those with a bigger grain of salt. Or perhaps I'm more willing to gloss over errors in the parts I know better, so take it all with the salt, ya' know.

I've been revising paragraphs roughly in order of how egregious they were. The overall flow reads fine (to me) now, but I'd like to go over the closing sections a bit more. Don't be surprised if the page changes, especially as I get more and more annoyed with the sloppy text. Unless I feel the need to share another reference, I don't imagine I'll post more non-PV words on the site overall. It has, however, become useful enough that I've wanted to share parts with other people, and sending a link is easier than a full clipboard.

If you're curious about how I've actually used it beyond a manual reference for my brain: I found it useful for getting past something like an "empty page" effect when starting to draft a diagnosis. My daily notes hold my own meeting notes, observations, workshop outcomes and the like, which act as raw material. Combining that context with the below as a somewhat extreme prompt for finding different ways of looking at the challenges was quite helpful, and it sharpened the questions I was asking. I'm pretty sure there are better ways of doing it, and the richness of what comes out still depends (a lot) on what you're able to put in yourself.

Now, to get back from doing the meta-work and actually start doing the work...

Introduction

This is a reference collection of toolkits for organisational problems.

You can read the chapters in any order, land in whichever one fits the problem in front of you, and ignore the rest. The chapters share a structure so that they are browsable: each tool gets the same treatment, in the same order, every time.

Frameworks are judgement

A word on what a collection like this can do, before the tools start.

The frameworks here are not the expertise. They are scaffolding for it. Knowing which mode a stuck situation calls for, reading early in a meeting that the blocker is trust rather than the incentive on the table, catching the sense that a diagnosis is about to fail before you could say why, is tacit expertise, and it does not come from frameworks.

It comes from running many real situations and getting fast, honest feedback on whether you read them right. What a framework adds is a structured way to make your reasoning explicit enough that the feedback, when it arrives, lands on something specific and corrects it.

So this is a reference for the thinking, not a manual for the doing. Each chapter can show you what a tool is, what it is for, and what it looks like applied well and badly. None of them can hand you the judgement about when to reach for it, how far to trust it, or when to set it down, because that judgement is the thing the doing builds and the reading cannot.

The closing chapter returns to this directly, but don't mistake fluency in the tools for the judgement that uses them.

The core argument

A stuck situation in an organisation is rarely a single problem. It is usually a structural problem, a human problem, and a problem of keeping the thing running, all braided together, and one mistake is to see only the strand you are most practised at seeing. This collection treats the three as lenses you move between, not bins that a problem sorts into.

The diagnostic mode asks what is actually true before acting. Seen this way, the product manager who will not commit to dates is often responding to a system that punishes wrong dates more than late ones, and fixing the system changes the behaviour.

The developmental mode asks what is going on between and inside the people. Two engineers who have burned trust with each other, a new tech lead who has not learned to delegate, a team carrying unprocessed strain from a layoff. Here the system is fine, or fine enough, and the work is with humans as humans. This mode draws on a lineage in psychology, therapy, adult development, and organisational culture.

The operating mode asks how to run the organisation while it changes. The true answer and the achievable answer diverge, the team has to ship next week regardless of what the diagnosis says, and the job is to integrate diagnosis and repair under time pressure without stopping the line.

The diagnostic mode is defined by a stance: make implicit reasoning explicit so it can be examined. The developmental mode is defined by its subject: humans as humans, whether the unit is a single person, a pair, or the culture of a whole group. The operating mode is defined by its context of use: you cannot stop, so you integrate under constraint.

Most stuck situations contain all three at once. The skill is identifying which mode the current blocker calls for. A perfect Current Reality Tree will not repair broken trust. A well-run difficult conversation will not fix a broken incentive. And both fail without the operating judgement about when to do which, with what energy, and at what cost to the organisation's ability to keep functioning.

The shared spine

Before the differences, a commonality. Strip away the notation and several of the tools reduce to a single loop: decide what you want, choose a method, act, and then check what actually happened against what you predicted.

The clearest instances are the genuine improvement loops. Deming's PDSA is plan-with-a-prediction, do, study the gap, act on what you learned. The Improvement Kata sets a target condition, runs an experiment with a prediction, and goes to see the result. The Transition Tree from the Thinking Processes names a current state, an action, and an expected new state. These are the same loop in different notation, and the load-bearing element in all of them is the same: the prediction, and the disciplined comparison of prediction to result.
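None of these methods prescribe code, but the load-bearing element is small enough to sketch. Below is a minimal illustration, using a hypothetical `Cycle` class and made-up numbers, of a prediction written down before acting and compared against the result afterwards. The loop only "closes" once both halves exist.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cycle:
    """One turn of a PDSA-style loop (hypothetical structure, for illustration)."""
    plan: str                          # what we intend to change
    prediction: float                  # written down BEFORE acting
    observed: Optional[float] = None   # filled in after the "do" step

    def study(self) -> float:
        """Return the gap between result and prediction; the gap is the learning."""
        if self.observed is None:
            raise ValueError("No result recorded yet: the loop has not closed")
        return self.observed - self.prediction


# Made-up example: predicted median lead time in days after a WIP cap.
cycle = Cycle(plan="Cap work in progress at 3 items per engineer", prediction=4.0)
cycle.observed = 6.5
print(f"Gap: {cycle.study():+.1f} days")  # Gap: +2.5 days
```

Refusing to compute a gap without an observed result is the point of the design: a plan with no written prediction, or a prediction never checked, cannot produce one.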

A looser cousin is three-part without closing the loop. Rumelt's kernel is diagnosis, guiding policy, and coherent action, a decision structure with no feedback term. It rhymes with the improvement loop and borrows some of its discipline, but it is not the same object: the loop is about learning by experiment, the kernel about committing to a direction. The distinction that matters is the written prediction, something you can be wrong about, which practitioners also highlight as a key factor in the success of these methods.

Figure: The shared spine. Closed loop (PDSA, Improvement Kata, Transition Tree): predict, act, check, learn, where the gap is the learning. Open structure (Rumelt's kernel): diagnosis, guiding policy, coherent action, revisited but with no prediction to be wrong about. Only the loop that closes on a written prediction learns; the rest commit and revisit.

This matters for two reasons. First, it is a fast sanity check on any plan. Can you say what you want, how you will get there, and how you would know whether it worked? Teams often answer the first two and never the third, which is where much of the value sits. Second, it tells you when you do not need a heavy tool at all. If three questions on a whiteboard would do, three questions on a whiteboard are the right tool, and reaching for a Current Reality Tree is ceremony. The elaborate tools earn their weight only when the questions are genuinely hard, when causation is tangled, the goal is contested, or the path is unclear.

Choosing a mode

A rough decision procedure when you are staring at a stuck situation.

The principle under the procedure is a split between analysis and action. Because the modes are sorted on different axes, a problem never sits in just one of them; as the core argument said, most stuck situations contain all three at once. The analysis therefore decomposes into all three. The action does not: you work on whichever mode holds the binding constraint right now and let the other two wait their turn. That is the move the steps below are really making.

Sort the problem before picking the tool. Cynefin, covered in the diagnostic chapter, is the explicit version of this, but even a rough mental sort helps. Is this Complicated and analysable, Complex and requiring you to probe, or Clear and just needing the obvious thing done? Most stuck situations in mid-sized engineering organisations are Complicated with Complex elements, and the trap is treating them as purely one or the other.

Then ask what is actually blocking you right now. "I do not understand why this keeps happening" is diagnostic work. "I understand it and we still cannot move" is developmental if the blocker sits between specific people, operating if it is about the organisation's ability to focus. "The team is exhausted or scared or grieving" is developmental and operating, and a Current Reality Tree there will feel violent. "The decision has to happen by Friday" is operating: a premortem, then the call. The closing chapter compresses this routing into a reference table, symptom by symptom, with the specific tool to reach for first.

Watch for the failure modes of your own preferred mode. Diagnostic-default people, often ICs and consultants, reach for diagrams when the actual blocker is trust or fear. The diagram looks rigorous, the room goes quiet, and nothing moves. Developmental-default people, often coaches and HR-adjacent practitioners, reach for difficult conversations when the actual blocker is a broken incentive. People talk earnestly and the system keeps producing the same behaviour. Operating-default people, often managers, reach for rhythm and structure when the actual blocker is that the strategy is wrong. The organisation executes crisply on the wrong thing.

Move between modes. Sort the problem, use a Cloud to surface the conflict and find an injection, discover the conflict is really about identity or trust and switch to developmental work for that piece of the puzzle, return to diagnostic work for the structural follow-through, then operate the change while the organisation keeps shipping.

The closing section tries to further develop this argument with some worked examples.

A note on evidence across the three modes

It is also tempting to read the modes and rank them in your head: diagnostic tools rigorous, developmental tools soft, operating tools somewhere between. That reading is dangerous and quietly distorts how much weight you put on each tool.

The truth is that evidentiary status varies within every mode at least as much as it varies between them. Deming's statistical core is about as well-established as anything in this collection; the management philosophy built on top of it is a reasoned position, not a proven one. Wardley Mapping rests on practitioner face-validity and a community of practice, genuinely useful, but no better tested than the adult-development theory behind Immunity to Change, which sits in the contested developmental mode. The Theory of Constraints has a solid operations core and a general method that works in skilled hands but has a thin formal evidence base. Coaching, often treated as the softest thing here, actually has meta-analytic support showing it improves goal attainment, with the twist that the effect comes from the quality of the relationship and the questions rather than from any particular framework.

So each tool carries its own epistemic-status note, and you should read those notes rather than the mode label to decide how much to trust a tool. The developmental chapter is the most openly contested, but openly contested is not the same as least grounded.

Diagnostic Tools

These make implicit reasoning explicit so it can be examined, contested, and corrected. The chapter covers six at depth: Cynefin, Goldratt's Thinking Processes, Statistical Process Control, Systems Dynamics and its archetypes, Rumelt's strategy kernel, and Wardley Mapping, plus a section on lightweight structuring tools.

The shape of question they answer: what is actually going on here, where is the leverage, what assumption is keeping us stuck, what kind of problem is this.

The shape of problem they are ill-suited to: anything where the blocker is emotional, relational, developmental, or political. Anything where causation is not stable enough to analyse. Anything where you have to act before you can analyse.

Cynefin

Dave Snowden's framework, developed at IBM in the late 1990s and refined since, is a sense-making tool and something of a sorting device (though its authors would disagree).

The name is Welsh, glossed by Snowden as "the place of your multiple belongings", and pronounced something like kuh-nev-in. As the name hints, it is about working out what kind of problem space your challenge belongs to (Clear, Complicated, Complex, or Chaotic) and using that to decide how to approach it. The main point is that applying the wrong tools for the situation can produce action that is confident, well executed, and harmful.

Epistemic status. Cynefin is a conceptual framework rather than an empirical theory, and Snowden revises it regularly (the domain names themselves have changed over time), so it is best treated as a useful sense-making lens rather than a validated model. The distinction between Complicated, where analysis works, and Complex, where you must probe, is widely found useful. The finer-grained claims and the more elaborate versions are more contested, and the framework is frequently misused as a tidy two-by-two, which Snowden himself rejects. The constraint mechanisms are drawn from Alicia Juarrero's work on how constraints create coherence in complex systems.

The core ideas

Cynefin distinguishes five domains, each calling for a different approach. Underneath them sits the idea of constraints: whatever limits how a situation can move, such as rules, habits, and physical or human laws. The domains describe how tightly those constraints bind the situation, and whether they stay put or shift as you act.

The domains are:

Clear, where constraints are rigid and fixed and cause and effect are obvious to everyone. It calls for sense, categorise, respond: apply best practice. Recognising a known incident and applying its runbook is the typical example.

Complicated, where constraints are firm but take expertise to work within. Cause and effect can be mapped and do not shift over time, but they are not obvious at the start. It calls for sense, analyse, respond: apply good practice, for example by piloting the intervention and then executing on it.

Complex, where the constraints are loose and shift as you act, and cause and effect can only be seen in retrospect because the system has responded to your interventions. It calls for probe, sense, respond: run small safe-to-fail experiments and amplify what works. Culture change and novel product development sit here.

Chaotic, where the constraints are gone and cause and effect with them. The situation is unstable and calls for act, sense, respond: stabilise first, analyse later. Without any stabilising constraints the situation has nothing to hold on to. An in-progress major outage is the classic example.

Confused, which is not knowing which domain you are in. Involuntary confusion is the dangerous default: you do not understand the situation and might not even realise it, while old habits or biases quietly miscategorise it for you. But there is also deliberate confusion, where you hold on to the confusion on purpose to provoke creative thinking before moving into the right domain.

The boundary between Complicated and Complex is important. A Complicated problem yields to expert analysis. A Complex one does not, because the system adapts to your analysis. This means that in the Complex domain the right move is probes rather than pilots. A pilot assumes you know the right intervention and are testing scale. A probe assumes you do not yet know, so you run several small ones at once and amplify what produces good signals.

There is also a boundary between Clear and Chaotic. Turning a problem from another domain into a Clear one, by adding constraints in the form of rules, standards, or processes, makes it easier to act on. But the better a runbook has worked, the more it can hide assumptions that have slowly shifted underneath it, until the problem space has changed completely and the solution becomes actively harmful.

How to use it

Make sense of the situation before reaching for tools. The first move on any non-trivial problem is working out which domain it is actually in, from how the system is behaving. Resist the pull to decide the domain and then recruit the evidence to confirm it.

For Clear situations, just apply best practice without over-thinking it. For Complicated situations, bring in the diagnostic tools. For Complex situations, design multiple small safe-to-fail probes and accept that what works may be partly opaque. For Chaotic situations, stabilise first, then move toward Complex or Complicated once the crisis passes. When you genuinely cannot tell, do not pretend you can: name the confusion, and if it will not resolve on its own, use it deliberately, breaking the problem down until its parts fall into domains you can act in.

Watch for category errors. Analysing while the building burns. Designing an elaborate eleven-step solution to organisational culture. Over-thinking what should be a runbook. And the subtler one, leaving a problem in Clear long after the ground under it has moved.

What well-applied Cynefin looks like

The response visibly matched the domain rather than defaulting to one move: a checklist where it was Clear, probes where it was Complex, stabilise-first where it was Chaotic, instead of the same approach applied to all of them.

Anti-patterns

Drawing Cynefin as a two-by-two with axes such as knowable and ordered. The framework does treat ordered versus unordered as a real distinction, but the domains have qualitatively different structures rather than positions on a continuum, and this remains the most common misuse.

Categorising rather than sense-making, deciding the domain from habit and forcing the evidence to fit. This is how a confident team walks itself off the Clear-to-Chaotic cliff.

Sorting without acting, identifying the domain and then doing the same thing regardless.

Permanent domain assignment, treating a situation as forever Complicated or Complex when domains shift.

Complexity-claiming to avoid hard analysis, or its opposite, grinding through analysis on a problem whose adaptive nature makes it pointless.

Remaining in involuntary confusion, treating not-knowing as a stable resting place rather than either resolving it or making it deliberate.

When Cynefin is the wrong tool

When the problem is high-stakes and the audience does not know the vocabulary, introducing it adds learning overhead to an already hard situation. Use the underlying Complicated-versus-Complex distinction in plain language instead.

When the question is strategic rather than diagnostic, since Cynefin tells you what kind of problem you have but not what to do strategically. Rumelt or Wardley fit better there.

The Thinking Processes

Eli Goldratt developed the Theory of Constraints in manufacturing in the 1980s, then generalised its reasoning into a set of logic tools, the Thinking Processes, for problems beyond the factory floor. Those tools are what this section covers.

The operational core of the Theory of Constraints, the constraint itself and the Five Focusing Steps, is still worth knowing, and parts of it (bottlenecks, for example) show up again under Grove.
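The bottleneck idea behind that operational core can be shown in a few lines. This is a minimal sketch under simplifying assumptions that real flows rarely meet: a strictly serial flow, stable capacities, and no variability. The `throughput` helper, the stage names, and the numbers are hypothetical.

```python
def throughput(capacities: dict[str, float]) -> tuple[str, float]:
    """In a strictly serial flow, the slowest stage (the constraint) sets throughput."""
    stage = min(capacities, key=capacities.get)
    return stage, capacities[stage]


# Hypothetical weekly capacities (items per week) for a delivery flow.
flow = {"design": 12, "build": 10, "review": 4, "deploy": 15}
print(throughput(flow))  # ('review', 4)

# Doubling a non-constraint stage changes nothing for the system.
print(throughput({**flow, "build": 20}))  # ('review', 4)

# Elevating the constraint moves the whole system, until another stage binds.
print(throughput({**flow, "review": 11}))  # ('build', 10)
```

The last line is the reason the focusing steps loop: once the constraint is lifted, a different stage becomes the one to focus on.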

Epistemic status. The underlying logic, that systems are limited by a small number of constraints and that surfacing hidden assumptions dissolves apparent conflicts, is sound and widely validated in operations contexts. The manufacturing results are real and replicated. The generalisation to organisational and strategic problems is more practitioner craft than established finding: it works well in skilled hands but has a thin formal evidence base outside operations, and the diagrams can manufacture false confidence when the underlying causation is not as stable as the notation implies. Treat the operational core as well-grounded and the general Thinking Processes as a strong but craft-dependent method.

The core ideas

The Thinking Processes are causal diagrams with strict semantics, and they sort onto the three questions in our shared spine. The Current Reality Tree and the Evaporating Cloud answer what to change. An injection and the Future Reality Tree answer what to change to. The Prerequisite Tree and the Transition Tree answer how to cause the change. You rarely need all six.

The Current Reality Tree is a bottom-up causal diagram. You start by listing the Undesirable Effects you observe, symptoms rather than causes, things like releases slipping or incidents recurring or engineers avoiding review. You connect them with if-then arrows, asking why each exists, and where two causes are jointly required you bind them with an ellipse marking an AND relationship. You push downward until the arrows converge on one or two root causes, usually a policy, a measurement, or an assumption rather than a person. The payoff is that fixing eight symptoms often means fixing one core problem, and that core problem is usually a conflict, which is why the Cloud comes next.
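The convergence claim can be made concrete with a toy encoding. This is not Goldratt's notation. It is a hypothetical sketch in which each effect maps to groups of causes, a group of several causes stands for an AND, and a root cause is any cause with nothing beneath it. Counting how many Undesirable Effects each root sits under shows whether the tree really converges.

```python
# Hypothetical Current Reality Tree. Each effect maps to its cause groups;
# a tuple with several causes is an AND (all jointly required).
tree = {
    "releases slip": [("review queue is long",)],
    "incidents recur": [("fixes skip review", "no time for follow-up")],
    "engineers avoid review": [("review queue is long",)],
    "review queue is long": [("work is judged by tickets closed",)],
    "fixes skip review": [("review queue is long",)],
    "no time for follow-up": [("work is judged by tickets closed",)],
}
undesirable = ["releases slip", "incidents recur", "engineers avoid review"]


def causes_of(effect):
    for group in tree.get(effect, []):
        yield from group


def root_causes():
    """Causes that have no causes of their own in the tree."""
    all_causes = {c for groups in tree.values() for g in groups for c in g}
    return sorted(c for c in all_causes if c not in tree)


def reaches(cause, effect):
    """True if `cause` sits somewhere below `effect` (ignores AND partners)."""
    return any(c == cause or reaches(cause, c) for c in causes_of(effect))


for root in root_causes():
    covered = [u for u in undesirable if reaches(root, u)]
    print(f"{root!r} underlies {len(covered)}/{len(undesirable)} UDEs")
# 'work is judged by tickets closed' underlies 3/3 UDEs
```

Note that the single root here is a measurement, not a person, which is the shape the text says a well-built tree usually ends in.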

The Evaporating Cloud is for that conflict, two options that seem mutually exclusive. It has five boxes: a shared Objective, two Requirements both needed for it, and two Prerequisites, one feeding each Requirement, that cannot both hold. The conflicting pair is often a single lever pulled opposite ways, hold inventory against hold none, centralise against decentralise. The move is to write the assumptions on every arrow, especially the conflict edge between the Prerequisites. Once you find one that is false or merely context-dependent, you have an injection that dissolves the conflict instead of splitting the difference.
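The five-box structure is small enough to hold as a data structure, which makes the "assumption on every arrow" discipline easy to check. A minimal sketch, assuming the conventional A to D′ lettering; the `Cloud` class, edge names, and the centralise-versus-decentralise content are hypothetical.

```python
from dataclasses import dataclass, field

# A is the objective, B and C the requirements, D and D' the prerequisites.
# "B-D" reads "B requires D"; "D-D'" is the conflict edge.
EDGES = ["A-B", "A-C", "B-D", "C-D'", "D-D'"]


@dataclass
class Cloud:
    objective: str        # A
    requirement_b: str    # B, needed for A
    requirement_c: str    # C, also needed for A
    prerequisite_d: str   # D, needed for B
    prerequisite_d2: str  # D', needed for C; cannot hold together with D
    assumptions: dict[str, list[str]] = field(default_factory=dict)

    def unexamined(self) -> list[str]:
        """Arrows that nobody has written an assumption on yet."""
        return [edge for edge in EDGES if not self.assumptions.get(edge)]


cloud = Cloud(
    objective="Ship reliable features quickly",
    requirement_b="Teams move without waiting on each other",
    requirement_c="The architecture stays coherent",
    prerequisite_d="Decentralise design decisions",
    prerequisite_d2="Centralise design decisions",
    assumptions={"D-D'": ["All design decisions carry the same risk"]},
)
print(cloud.unexamined())  # ['A-B', 'A-C', 'B-D', "C-D'"]
```

If the conflict-edge assumption turns out false, say because some decisions are cheap to reverse, it points at an injection rather than a compromise; the remaining unexamined arrows are where the next surprise may hide.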

Figure: The Evaporating Cloud. A is the Objective; B and C are the Requirements; D and D′ are the Prerequisites, joined by the conflict edge, where you name the assumption. Five boxes, one conflict edge: break the assumption on that edge and the dilemma evaporates.

The Future Reality Tree is the mirror of the Current Reality Tree, built top-down from a proposed injection. You trace its effects forward, checking that the Undesirable Effects are replaced by Desirable Effects, and you watch for two things: missing entities, where the injection alone does not produce the effect, and Negative Branches, where it produces a new problem.

The Negative Branch Reservation is a small Future Reality Tree focused on one worry of the form "if we do X, then Y will happen." You draw the chain from X to Y fully, then look for an assumption to break or an injection to add that severs it. It is the formal version of "yes, but," and it makes objections constructive rather than veto-shaped.

The Prerequisite Tree lists every obstacle between you and a chosen change, defines an Intermediate Objective that overcomes each, and sequences those objectives by dependency. The result is a directed graph of milestones rather than a Gantt chart.
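"A directed graph of milestones rather than a Gantt chart" has a direct computational reading: sort the Intermediate Objectives by dependency and see which can proceed in parallel. A minimal sketch using Python's standard `graphlib`; the objectives and their dependencies are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical Intermediate Objectives, each mapped to the objectives that
# must already be in place before it can be achieved.
prerequisites = {
    "teams own their deploys": {"deploy pipeline is self-service", "on-call rota agreed"},
    "deploy pipeline is self-service": {"pipeline config is in the repo"},
    "on-call rota agreed": {"service ownership is written down"},
    "pipeline config is in the repo": set(),
    "service ownership is written down": set(),
}


def waves(graph):
    """Group objectives into waves; everything in a wave can proceed in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all prerequisites already met
        result.append(ready)
        sorter.done(*ready)
    return result


for i, wave in enumerate(waves(prerequisites), start=1):
    print(f"Wave {i}: {wave}")
```

Unlike a Gantt chart, nothing here carries a date; the order falls out of the dependencies alone, and adding an obstacle only means adding an edge.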

The Transition Tree is the most tactical. For each Intermediate Objective it specifies current state, action, expected new state, and rationale. You usually build it only for the next objective or two, since building them all is over-planning. It is also one of the clearest instances of the shared spine from the introduction (current state, action, expected new state is the same loop as PDSA and the Improvement Kata), and it pairs directly with the Kata when you want experimental discipline for walking the sequence.

How to cause the change is not only sequencing. Goldratt treated agreement as a series of layers of resistance: that there is a problem, that this is the problem, that the injection solves it, that it will not cause something worse, that the obstacles can be cleared. The trees double as the instruments for working through those layers. A Negative Branch drawn with the person who raised it, rather than at them, is how a "yes, but" becomes shared ground rather than a veto.

How to use it

Run the Current Reality Tree as a workshop with sticky notes, Undesirable Effects at the top, working downward. The discipline is resisting the urge to jump to solutions before the tree is built.

Use the Cloud whenever a conflict recurs. It is cheap, so build it even when you think you already know the conflict; the value is in the assumptions you did not know you were making. It is also a tool that earns its keep standalone, without the rest of the kit.

When you have an injection, test it forward with a Future Reality Tree, and draw the Negative Branches explicitly rather than waving them away. Sequence the rollout with a Prerequisite Tree, and detail only the next step or two with a Transition Tree.

Keep Undesirable Effects as observable present-tense symptoms rather than causes in disguise, bind jointly-required causes with an AND since the missing AND is the usual structural error, and validate a finished tree by reading it aloud from a root cause upward, because logic that sounds forced is wrong.
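Reading a tree aloud is a human check, but generating the sentences can be mechanical. A minimal sketch, assuming a hypothetical encoding where each effect lists the causes jointly required for it (so this toy allows only one AND group per effect). The code only produces the "if ... then" sentences; judging whether they sound forced is still your job.

```python
# Hypothetical fragment of a Current Reality Tree: each effect lists the
# causes that are jointly required for it (an AND).
tree = {
    "incidents recur": ["fixes skip review", "there is no time for follow-up"],
    "fixes skip review": ["the review queue is long"],
}


def read_aloud(effect):
    """Yield if-then sentences from the bottom of the tree upward."""
    causes = tree.get(effect, [])
    for cause in causes:
        yield from read_aloud(cause)
    if causes:
        yield f"If {' and '.join(causes)}, then {effect}."


for sentence in read_aloud("incidents recur"):
    print(sentence)
# If the review queue is long, then fixes skip review.
# If fixes skip review and there is no time for follow-up, then incidents recur.
```

Dropping either cause from the AND makes the second sentence noticeably weaker, which is how the missing AND tends to reveal itself when read out loud.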

The tools chain in practice: the Current Reality Tree finds the core problem, the Cloud surfaces the conflict holding it in place, the injection seeds the Future Reality Tree, Negative Branch Reservations stress-test it, and the Prerequisite and Transition Trees plan the rollout. A stuck retro often needs only a CRT and a Cloud. A strategic shift usually wants the CRT, the Cloud, and the FRT.

What well-applied Thinking Processes looks like

The tree ended in a decision or an injection that changed something, rather than a finished diagram presented and filed.

A Cloud surfaced an assumption no one had written down, and attacking it dissolved a conflict that had kept recurring.

A single change at a root cause improved several of the top-level symptoms at once, which is the convergence proving real rather than asserted.

Anti-patterns

Treating the diagram as the deliverable. The tree is a thinking aid. Don't spend three days perfecting one, present it, and change nothing. The output should be a decision or an injection.

Skipping the Cloud and jumping to a solution because you think you already know the conflict. This is a reliable way to miss the real assumption.

Undesirable Effects that are already causes. "We do not have enough staffing" is a hypothesised cause, not a symptom. If your symptoms already encode a theory, the tree just confirms what you walked in believing.

Person-shaped root causes. If the tree bottoms out at "the PM does not communicate well," you stopped too early. There is almost always a policy or incentive underneath.

Dismissing Negative Branches as objections. The correct response to a raised branch is to draw it fully, then either sever it with an injection or accept it as a real cost.

Using the tools to win arguments. The notation looks rigorous, so it is tempting to build a tree that proves your preferred answer. People can sense this, and it makes the tools feel like manipulation.

Methodology over toolkit. Insisting on the full sequence and full notation every time. For most problems a quick Cloud on a whiteboard with three assumptions listed is enough, and ceremony kills adoption.

When the Thinking Processes are the wrong tool

When causation is not stable. The tools assume the causal arrows hold over the time horizon of analysis. In complex situations where the system adapts to your interventions, the tree you built last month is fiction now.

When feedback loops dominate. The notation is fundamentally a directed acyclic graph. The moment a meaningful loop appears, such as burnout reducing output which increases pressure which increases burnout, you are contorting the diagram. Systems dynamics handles this natively.
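The acyclic limitation is literal. A minimal sketch: encode the burnout loop from the text as a graph and ask Python's standard `graphlib` for a dependency order, the same operation a tree-shaped notation implicitly assumes is possible. The `is_dag` helper is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# The burnout loop from the text. TopologicalSorter takes node -> predecessors,
# so each effect maps to the cause that drives it.
loop = {
    "output drops": {"burnout rises"},
    "pressure rises": {"output drops"},
    "burnout rises": {"pressure rises"},
}


def is_dag(graph):
    """True if the graph has a dependency order; False (and report) if it loops."""
    try:
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError as err:
        # The second element of err.args lists the nodes in the detected cycle.
        print("Nodes in a cycle:", ", ".join(err.args[1]))
        return False


print(is_dag(loop))  # prints the cycle, then False
```

There is no valid ordering to draw, which is why the loop has to be contorted into a tree and why systems dynamics, which models the loop directly, fits better.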

When the goal is contested. The Cloud requires a shared Objective. If half the room wants growth and half wants sustainability and that is the actual disagreement, the Cloud will paper over it with a fake objective nobody shares.

When the situation is political rather than analytical. The tools assume good-faith truth-seeking. Where the real constraint is that some truth is unsafe to say, the analytical surface looks rigorous while the substance is missing.

Statistical Process Control

Statistical Process Control (SPC) is the original data-driven approach, and its lineage traces back to three founders. Walter Shewhart developed the underlying theory in the 1920s to control the quality of mass-produced telephone equipment at Bell Labs. W. Edwards Deming later carried it into management, teaching it to American war industries in the 1940s and then, from 1950, to Japanese industry; that work would influence the Toyota Production System and, later, Six Sigma. On top of the statistics he built a management philosophy, the 14 Points, the System of Profound Knowledge, and PDSA, set out in books such as Out of the Crisis.

Finally, from the 1980s onward Donald Wheeler refined Shewhart's original framework, working against the academic drift that wanted normality tests and distribution-fitting before charting, and clarified its use through the process behaviour chart and the Voice of the Process.

Epistemic status. The statistical core, common cause versus special cause variation and the control chart, is mathematically rigorous and about as well-established as anything in this collection. The management philosophy built on top of it, the 14 Points and the System of Profound Knowledge, is a reasoned position rather than a proven one, though it has held up well in practice and is broadly consistent with later evidence on motivation and systems. The often-quoted claim that the great majority of problems are systemic rather than individual is well-supported in direction, but the famous proportions (94/6 or 85/15; Deming gave different figures at different times, with no published derivation) are rhetorical estimates, not measured constants. Lean on the direction of the claim, not the decimal.

The core ideas

In any system, from manufacturing lines to the processes that drive modern software companies, there are two kinds of variation. Common cause variation is the expected noise of a stable system. Special cause variation comes from an identifiable, exceptional event. Each needs a different response. Treating common cause as special cause, by reacting to every blip or every SLA alert that fires, makes the system worse. Treating special cause as common cause, by ignoring real signals, misses actionable information.

The main tool for telling the two apart is the control chart, and it is deceptively simple. You plot measurements over time with limits set three sigma either side of the mean. Sigma is estimated from the average movement between consecutive points (the moving range), not from the overall standard deviation. That choice is deliberate, and computing limits from the standard deviation is an error Wheeler has spent decades fighting. If the series already contains signal, that signal inflates the standard deviation, the limits widen to swallow it, and the chart hides what you built it to find.

Nor was three sigma chosen because of normal distribution theory. Shewhart chose it empirically. Across a very broad class of distributions, three-sigma limits keep the false-alarm rate to a few percent at most, which is good enough for practical decision-making. Do not read the limits as a precise probability statement. In short, the chart tests whether the variation comes from one underlying process or more than one, whatever the shape of the distribution. Business data is almost never normal, but the chart works anyway.
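The exact tail shares below make the point without simulation. For a few textbook distributions, the sketch computes the share of values that fall beyond the mean plus or minus three standard deviations. These distributions were picked for illustration and are not the ones Shewhart studied. The sketch also uses the true standard deviation, whereas a real chart estimates sigma from the moving range.

```python
import math

def normal_tail():
    """Share of a normal distribution beyond mean +/- 3 sd."""
    return math.erfc(3 / math.sqrt(2))

def exponential_tail():
    """Exponential with rate 1 (mean 1, sd 1). The lower limit, -2, cannot be
    reached, so only the upper tail P(X > 4) = e^-4 counts."""
    return math.exp(-4)

def uniform_tail():
    """Uniform on [0, 1]: mean +/- 3 sd lies outside [0, 1], so nothing falls beyond it."""
    sd = 1 / math.sqrt(12)
    lower, upper = 0.5 - 3 * sd, 0.5 + 3 * sd
    return max(0.0, lower) + max(0.0, 1.0 - upper)

# Chebyshev's inequality: the worst case beyond 3 sd for any distribution with finite variance.
CHEBYSHEV_BOUND = 1 / 9

for name, share in [("normal", normal_tail()), ("exponential", exponential_tail()),
                    ("uniform", uniform_tail()), ("Chebyshev worst case", CHEBYSHEV_BOUND)]:
    print(f"{name:>22}: {share:.2%} beyond three sigma")
```

The normal case gives about 0.27%, the heavily skewed exponential about 1.8%, and the uniform nothing at all. Chebyshev's inequality caps any distribution at about 11%. That spread is why the limits work as a practical filter across shapes without being an exact false-alarm probability.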

In practice the chart is a pair: an individuals chart for the level, and a moving-range chart beneath it. Wheeler calls it a process behaviour chart rather than a control chart, partly to stop people reading the limits as targets. The limits describe what the process does, not what you want it to do.
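Computing both halves of the pair takes a few lines. This minimal sketch uses the conventional XmR constants. Sigma is the average moving range divided by 1.128, and the moving-range chart's upper limit is 3.268 times the average moving range. The series is invented, and it shows what happens when limits come from the overall standard deviation instead.

```python
from statistics import mean, stdev

def xmr_limits(values):
    """Limits for an XmR chart, with sigma estimated from the average moving range."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = mean(moving_ranges)
    sigma = mr_bar / 1.128              # bias constant for moving ranges of two points
    centre = mean(values)
    return {
        "centre": centre,
        "sigma": sigma,
        "lower": centre - 3 * sigma,    # individuals chart: the level
        "upper": centre + 3 * sigma,
        "mr_upper": 3.268 * mr_bar,     # moving-range chart: the consistency (lower limit is zero)
    }

def global_sd_limits(values):
    """The error Wheeler warns against: limits from the overall standard deviation."""
    centre, sd = mean(values), stdev(values)
    return {"centre": centre, "sigma": sd, "lower": centre - 3 * sd, "upper": centre + 3 * sd}

def outside(values, limits):
    """Points beyond either limit of the individuals chart."""
    return [v for v in values if v < limits["lower"] or v > limits["upper"]]

# Invented data: a process that steps from about 10 to about 20 halfway through.
series = [10, 11, 10, 11, 10, 11, 20, 21, 20, 21, 20, 21]

moving = xmr_limits(series)
naive = global_sd_limits(series)
print(f"moving-range sigma {moving['sigma']:.2f}, flags {outside(series, moving)}")
print(f"global-sd sigma    {naive['sigma']:.2f}, flags {outside(series, naive)}")
```

The step inflates the global standard deviation to more than three times the moving-range estimate. The resulting limits, roughly −0.2 to 31.2, swallow the shift entirely. The moving-range limits flag points from both levels, which is the right answer because this series is not one process.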

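Figure: The control chart. The top panel is an individuals chart (X) with the mean and limits at +3σ (UCL) and −3σ (LCL). Points inside the limits are common cause, the system being itself, and a point past a limit is special cause to investigate. The bottom panel is a moving-range chart (mR) with its average (mR̄) and upper range limit (URL), where the same event also shows up. The individuals chart tracks the level; the moving range tracks the consistency. A point past the limits, or a sustained run, is a signal worth chasing.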

Points inside the limits are common cause. Points outside, or runs that show a clear trend, are special cause and warrant investigation. A spike on the moving-range chart means the process has become less consistent, even when the level looks unremarkable.

Telling the two apart is not a judgement call but a set of rules. The strongest is a single point past a limit line, on either chart. The subtlest is a long run on one side of the mean, eight points in a row, which catches a process that has quietly shifted without any one point ever looking unusual. Between them sits a moderate rule: three of four consecutive points more than halfway from the mean towards the same limit. One caveat: only the single-point rule applies to the moving-range chart. Successive moving ranges are correlated, so the run rules belong to the individuals chart alone.
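The three rules are mechanical enough to write down. This minimal sketch assumes the centre line, sigma, and the moving-range limit were already computed from a stable baseline and are passed in. The function name and the `(rule, index)` output format are made up for illustration. "More than halfway to a limit" is taken to mean beyond 1.5 sigma from the mean.

```python
def detect_signals(values, centre, sigma, mr_upper):
    """Apply the three detection rules to an XmR series.

    Returns (rule, index) pairs, where index is the point at which the rule fires.
    """
    signals = []
    upper, lower = centre + 3 * sigma, centre - 3 * sigma

    # Rule 1 on the individuals chart: a single point beyond either limit.
    for i, v in enumerate(values):
        if v > upper or v < lower:
            signals.append(("point beyond limit", i))

    # Rule 1 on the moving-range chart: the only rule used there.
    for i in range(1, len(values)):
        if abs(values[i] - values[i - 1]) > mr_upper:
            signals.append(("moving range beyond limit", i))

    # Rule 2: three of four consecutive points more than halfway towards the same limit.
    for i in range(len(values) - 3):
        window = values[i:i + 4]
        if (sum(v > centre + 1.5 * sigma for v in window) >= 3
                or sum(v < centre - 1.5 * sigma for v in window) >= 3):
            signals.append(("3 of 4 in outer half", i + 3))

    # Rule 3: eight consecutive points on the same side of the centre line.
    for i in range(len(values) - 7):
        window = values[i:i + 8]
        if all(v > centre for v in window) or all(v < centre for v in window):
            signals.append(("run of 8", i + 7))

    return signals

# Invented data: a process that quietly moves up about one sigma.
quiet_shift = [9.6, 10.4, 9.7, 9.9, 11.0, 11.2, 10.8, 11.1, 10.9, 11.3, 11.0, 10.7]
print(detect_signals(quiet_shift, centre=10, sigma=1, mr_upper=3.7))
```

No single point in that series looks unusual, yet the eighth consecutive point above the centre line fires the run rule. The `mr_upper` of 3.7 is arbitrary here. On a real chart it would be 3.268 times the baseline's average moving range.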

The rules then decide the response. If the process is unpredictable (special cause present), investigate and remove the special causes first. You cannot improve a process that is not yet predictable, because an outside factor you have not identified will interfere with any change you make. If the process is predictable (only common cause), it is already running as well as its current design allows. There is nothing to investigate in individual points, and the only way to change the output is to redesign the process.

Underpinning all of this is Wheeler's framing for translating SPC to business thinking: the Voice of the Process versus the Voice of the Customer. The Voice of the Process is what the process is actually delivering. Control limits are the Voice of the Process. The Voice of the Customer is what you want the process to deliver: targets, specifications, SLAs, error budgets. They are different questions. As Wheeler says:

Comparing numbers to specifications will not lead to the improvement of the process. Specifications are the Voice of the Customer, not the Voice of the Process. The specification approach does not reveal any insights into how the process works. So if you only compare the data to the specifications, then you will be unable to improve the system, and will therefore be left with only the last two ways of meeting your goal (i.e. distorting the system, or distorting the data). When a current value is compared to an arbitrary numerical target, it will always create a temptation to make the data look favourable. And distortion is always easier than working to improve the system.

This is the SPC answer to Goodhart's Law. Goodhart observes that a measure stops being a good measure once it becomes a target. SPC provides the discipline of separating the measure from the target and reading them as different things.

The core ideas also include PDSA and the System of Profound Knowledge, Deming's later synthesis of four lenses applied together: appreciation for a system, knowledge of variation, theory of knowledge (we operate on hypotheses that need testing), and psychology (people are not interchangeable parts and motivation is mostly intrinsic).

PDSA, Plan-Do-Study-Act, is the improvement loop, a clear example of the shared spine in this collection. Plan an experiment with a prediction, do it, study the gap between prediction and result, act on what you learned.

Many teams fail at the study step: they do, then move on without examining the result against the prediction. The written prediction is what converts the loop into knowledge, meaning a model good enough to predict the system's behaviour next time. That is the load-bearing element the shared spine names: a prediction you can be wrong about is worth more than a plan you merely execute.
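A written prediction can be as simple as a record created before the change. This minimal sketch uses a hypothetical `PDSACycle` class and an illustrative tolerance for what counts as "as predicted". The point is that `study()` has nothing to compare against unless the prediction was written down first.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PDSACycle:
    """One Plan-Do-Study-Act cycle; the prediction is fixed at Plan time."""
    change: str          # what we will do differently
    metric: str          # what we expect to move
    predicted: float     # the written prediction, made before Do
    tolerance: float     # how far off still counts as "as predicted"
    observed: Optional[float] = None
    lessons: List[str] = field(default_factory=list)

    def study(self, observed: float) -> str:
        """Compare the result with the prediction; the gap is what the cycle teaches."""
        self.observed = observed
        gap = observed - self.predicted
        if abs(gap) <= self.tolerance:
            verdict = "as predicted: model holds"
        else:
            verdict = f"off by {gap:+.1f}: model needs revising"
        self.lessons.append(verdict)  # Act starts from this, not from the plan
        return verdict

cycle = PDSACycle(change="cap WIP at three items", metric="median cycle time (days)",
                  predicted=6.0, tolerance=1.0)
print(cycle.study(observed=9.0))  # off by +3.0: model needs revising
```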

Overall, the repeated claim of the SPC practitioners that the great majority of problems are systemic rather than individual is the same stance the Thinking Processes take, reached through statistics rather than causal logic.

How to use SPC

Define your metrics operationally before charting them. Write down the criterion, test procedure, and decision rule for each metric, and record every change to a definition with a date.
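An operational definition with a dated change log might look like the sketch below. The class, the lead-time metric, and the dates are all hypothetical. The log lets you tell whether a shift on the chart lines up with a change in the definition rather than the process.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple

EDITABLE = ("criterion", "test_procedure", "decision_rule")

@dataclass
class OperationalDefinition:
    """A metric written down as criterion, test procedure and decision rule."""
    name: str
    criterion: str
    test_procedure: str
    decision_rule: str
    history: List[Tuple[date, str, str, str]] = field(default_factory=list)

    def revise(self, when: date, part: str, new_value: str) -> None:
        """Change one part of the definition and log date, part, old and new value."""
        if part not in EDITABLE:
            raise ValueError(f"unknown part of the definition: {part}")
        old_value = getattr(self, part)
        setattr(self, part, new_value)
        self.history.append((when, part, old_value, new_value))

lead_time = OperationalDefinition(
    name="lead time",
    criterion="time from merge to running in production",
    test_procedure="deploy-log timestamp minus merge-commit timestamp",
    decision_rule="count a deploy only once it serves all traffic",
)
lead_time.revise(date(2024, 3, 1), "decision_rule", "count a deploy once it serves half of traffic")
```

If the chart later shows a shift around 1 March 2024, suspect the definition before you suspect the process.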

Put each of your two or three leading indicators, the ones that predict the outcomes you care about in Grove's sense, on a chart with a mean and limits rather than a target line. Compute the limits from a stable baseline and lock them, extending them forward rather than recomputing every period, since limits that move with each new point absorb the signals they exist to catch.
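The sketch below uses invented data to show why the limits are locked. It compares limits computed once from an eight-point baseline with "rolling" limits recomputed from the last eight points before each new one. Both use the usual XmR estimate (average moving range divided by 1.128). The window size and data are assumptions for illustration.

```python
from statistics import mean

def upper_limit(values):
    """Upper natural process limit: mean + 3 * (average moving range / 1.128)."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(values) + 3 * mean(moving_ranges) / 1.128

baseline = [20, 22, 21, 19, 20, 21, 22, 20]
new_points = [25, 26, 25, 26, 25, 26, 25, 26]   # a sustained shift upward

# Locked: computed once from the baseline and extended forward unchanged.
locked = upper_limit(baseline)
locked_flags = [x for x in new_points if x > locked]

# Rolling: recomputed from the last eight points before each new one.
history = list(baseline)
rolling_flags = []
for x in new_points:
    if x > upper_limit(history[-8:]):
        rolling_flags.append(x)
    history.append(x)

print(len(locked_flags), len(rolling_flags))
```

The locked limits flag all eight shifted points. The rolling limits flag only the first one and then absorb the new level as if it had always been normal.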

When the chart and the detection rules say noise, do nothing, and be able to say why. When they say signal, ask what changed, whichever direction the metric moved. A surprisingly good week teaches as much as a bad one, and a positive signal left uninvestigated is knowledge left on the floor.

Run the changes you do make as PDSA cycles with a written prediction, so each one adds to your picture of how the inputs move the outputs. That picture is the goal: you are not managing a number but building a model of the system good enough to predict it. Amazon's Weekly Business Review runs this as a rhythm: a deck of charts on the controllable input metrics, read every week.

What well-applied SPC looks like

The team makes fewer reactive changes than it used to, and can point to a recent case where the chart led it to leave something alone rather than intervene. People can say "that is just noise" or "that looks like a signal" and mean something specific.

A real signal crossed a limit and got investigated within the week, instead of surfacing later in a retro as something everyone had half-noticed.

A bad metric produced a documented change to the process and a named system cause, with no one's name attached to the number.

Anti-patterns

Reacting to common cause, which produces management whiplash and demoralised teams. The control chart exists to prevent exactly this.

Ignoring special cause, dismissing real signals as noise because they are uncomfortable.

Targets as control limits. A target states what you want; the chart's limits state what the system can currently deliver. Wheeler's names for the two are the Voice of the Customer and the Voice of the Process. The anti-pattern is reading one as the other: setting a round-number goal, then reacting to every point that falls short of it as though the goal described the process. It does not. Under that kind of target pressure people do one of three things, and only one of them is what you want: they improve the system, they distort the system, or they distort the data (Joiner's Rule). The chart makes improving the system the visible option, because it shows whether the number moved for a reason or merely moved.

PDSA without prediction, running the loop as ritual so it produces no learning.

Statistical thinking applied to individuals. Deming was clear that statistical methods work on systems, not people. The classic case is ranking people on a distribution and treating the bottom slice as underperformers, when most of that variation is the system expressed through them.

When SPC is the wrong tool

When the system is not yet stable enough to have control limits, as with brand-new processes or too few data points. You cannot characterise common cause without a baseline.

When the thing that matters is genuinely qualitative. Control-charting everything produces a culture that mistakes measurability for importance.

When the measurement itself is wrong, dirty, or gamed. The statistical machinery assumes the metric is meaningful, and when it is not, the analysis produces precise nonsense.

System Dynamics and the Archetypes

System dynamics, developed by Jay Forrester at MIT in the 1950s and popularised by Donella Meadows and Peter Senge, is a tradition for understanding feedback loops, stocks, flows, and delays. Its insight is that organisational dysfunction lives in feedback. Rather than one-way chains of cause and effect, there are loops, accumulations, and delays that make a behaviour persist long after anyone meant it to. However, the accessible doorway is not Forrester's mathematics but Senge's archetypes, a pattern language for recurring feedback structures.

Epistemic status. Formal system dynamics modelling is a rigorous quantitative discipline with a solid track record where the variables are measurable. The archetypes, by contrast, are a qualitative pattern language. They are genuinely useful for recognition and communication but are not predictive in any rigorous sense, and it is easy to over-fit a situation to an archetype that does not quite hold. Treat the modelling as solid and the archetypes as a good vocabulary that aids thinking without proving anything.

The core ideas

Stocks are accumulations such as headcount, technical debt, trust, or burnout. Flows are the rates that change them, such as hiring rate or recovery rate. Conflating the two obscures what is happening. "We have a hiring problem" is ambiguous until you separate the stock of people from the inflow and outflow rates.
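A stock-and-flow model can be stepped forward with nothing more than a loop. In this minimal sketch with invented numbers, headcount is the stock, hiring is a constant inflow, and attrition is an outflow proportional to the stock. Monthly time steps are an assumption, and real models often use finer ones.

```python
def simulate_headcount(start, hires_per_month, monthly_attrition, months):
    """Step one stock (headcount) forward a month at a time.

    Inflow: a constant hiring rate. Outflow: a fixed fraction of current headcount.
    """
    headcount = start
    trajectory = [headcount]
    for _ in range(months):
        inflow = hires_per_month
        outflow = monthly_attrition * headcount
        headcount += inflow - outflow
        trajectory.append(round(headcount, 1))
    return trajectory

# Same hiring rate, different outflow.
low_attrition = simulate_headcount(100, hires_per_month=4, monthly_attrition=0.02, months=12)
high_attrition = simulate_headcount(100, hires_per_month=4, monthly_attrition=0.05, months=12)
print(low_attrition[-1], high_attrition[-1])
```

The stock settles where inflow equals outflow, at hires divided by the attrition rate: 200 people in one case and 80 in the other. With the same four hires a month, one team ends the year at about 121.5 and the other shrinks to about 90.8. A team in the second situation has an outflow problem, whatever it is being called.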

A reinforcing loop amplifies change in either direction, so success breeds success and decline accelerates decline. A balancing loop acts to close a gap, like a thermostat correcting towards a setpoint or a market approaching saturation. Real systems contain both, and the dynamic depends on which dominates at a given moment.
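The two loop types differ in what the change is proportional to. In this sketch, the reinforcing loop grows in proportion to the stock, and the balancing loop closes a fixed fraction of the gap to a goal each step, like a thermostat heating a room from 15 to 21 degrees. The rates are invented.

```python
def reinforcing(stock, growth_rate, steps):
    """Reinforcing loop: each step's change is proportional to the stock itself."""
    path = [stock]
    for _ in range(steps):
        stock += growth_rate * stock
        path.append(stock)
    return path

def balancing(stock, goal, adjustment_rate, steps):
    """Balancing loop: each step's change is proportional to the gap from a goal."""
    path = [stock]
    for _ in range(steps):
        stock += adjustment_rate * (goal - stock)
        path.append(stock)
    return path

print(reinforcing(100, 0.1, 5))   # each increment larger than the last
print(balancing(15, 21, 0.5, 5))  # [15, 18.0, 19.5, 20.25, 20.625, 20.8125]
```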

Delays are the often-invisible third element. An intervention's effect frequently lags the action by weeks or months, and the lag shapes behaviour: people see no result, push harder, then overshoot when the original action finally lands.
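Adding a delay to the same balancing loop is enough to produce the overshoot. In this sketch, each correction lands a fixed number of steps after it is ordered. The decision-maker reacts to the current gap and ignores corrections already in the pipeline. The gain and delay values are assumptions chosen to make the effect visible.

```python
from collections import deque

def adjust_with_delay(stock, goal, adjustment_rate, delay, steps):
    """Balancing loop whose corrections land `delay` steps after they are ordered.

    Each step the current gap is seen and a correction ordered, ignoring
    corrections that are ordered but have not yet landed.
    """
    pipeline = deque([0.0] * delay)
    path = [stock]
    for _ in range(steps):
        pipeline.append(adjustment_rate * (goal - stock))
        stock += pipeline.popleft()  # the correction ordered `delay` steps ago lands now
        path.append(stock)
    return path

prompt = adjust_with_delay(0, 100, 0.5, delay=0, steps=20)
lagged = adjust_with_delay(0, 100, 0.5, delay=3, steps=20)
print(max(prompt), max(lagged))
```

Without delay the stock approaches 100 and never passes it. With a three-step delay, nothing seems to happen, corrections pile up, and the stock overshoots well past 200 before swinging back below the goal.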

Figure: Stocks, flows and feedback. A stock, a level that accumulates, sits between an inflow and an outflow, which are rates. “We have a hiring problem” hides whether the level, the inflow, or the outflow is the issue. Reinforcing loops (R) amplify change, balancing loops (B) seek equilibrium, and delays make effects lag the action.

The archetypes are recurring patterns that show up across very different organisations. The ones most useful for engineering work:

Shifting the Burden. A problem has a symptomatic fix and a fundamental fix. The symptomatic fix is faster and relieves the symptom but atrophies the capacity for the fundamental fix, so the organisation becomes dependent on it. Hiring contractors to handle overflow rather than having the staffing conversation is the classic case.

Limits to Growth. Growth produces success which produces more growth until it hits a constraint that was invisible during the growth phase. Pushing harder on what worked makes things worse. Scaling a team by hiring until communication overhead becomes the limit is the engineering version.

Growth and Underinvestment. Demand grows, but the investment in capacity that would meet it lags, because everyone is too busy serving demand to build for it. Service quality slips, which suppresses demand or quietly lowers what counts as acceptable, which makes the investment case look weaker just as it becomes most necessary. The platform team drowning in support tickets, unable to build the self-service that would empty the queue because the queue never empties, is the engineering version, and it is the structure underneath most arguments about technical debt that never get won quarter to quarter.

Tragedy of the Commons. Multiple actors share a finite resource, and each individually rational decision to use more produces a collectively irrational outcome. Every team skipping shared maintenance to protect their own velocity, until the shared infrastructure rots, is the pattern.

Fixes That Fail. A solution to the immediate problem creates consequences that make the original problem worse, often after a delay. Adding process to prevent incidents, which slows shipping, which produces rushed work, which produces more incidents.

Success to the Successful. Two parties with similar capability get different initial investment, and the small early advantage compounds until the gap is enormous. The team that got the prestige project becomes the "strong team" in a self-fulfilling way.

Drifting Goals, also called Eroding Goals. Pressure on a hard-to-meet goal produces gradual relaxation of the goal rather than effort to meet it. SLO targets quietly lowered each time they are missed, until they mean nothing. The damage is not the lower number but that the lower number becomes the new definition of normal.

Escalation. Two parties measure themselves relative to each other and respond to the other's action with more of their own. Two teams in conflict where each defensive move provokes a harder one.

How to use it

Sketch the behaviour before naming the pattern. Draw the thing you care about over time, whether that is incidents, lead time, attrition, or backlog, and ask what shape it makes: growing, oscillating, plateauing, drifting down. The shape is the evidence. An archetype that cannot account for the curve you actually see is decoration, and starting from the curve is the discipline that stops you fitting a situation to a pattern it does not hold.

Then use archetypes as candidate explanations, and model formally only when the stakes are high enough or the feedback structure disputed enough to justify it. For most organisational work, recognising the pattern tells you what to look for and roughly what interventions tend to work.

Name stocks and flows explicitly. "We have a quality problem" is ambiguous. Is the stock of bugs growing, is the inflow high, or is the outflow low? Each has a different intervention.

Look for delays, since they are usually where overshoot and oscillation come from. Naming them prevents the "we did nothing, then we did too much" pattern.

Treat the archetype's usual leverage points as hypotheses, not prescriptions. For Shifting the Burden, protect the fundamental fix and watch for atrophy in the capacity it depends on. For Limits to Growth, stop pushing the growth driver and relieve the constraint instead. For Tragedy of the Commons, change the structure so individual incentives align with the collective good.

Watch for stacked archetypes, since real dysfunction often contains several interacting.

What well-applied systems thinking looks like

An intervention aimed at the loop structure changed the behaviour pattern rather than the latest incident, and the team held off re-intervening long enough to let it land.

Naming the archetype shortened the argument: people stuck debating the latest event recognised the pattern and shifted to talking about the structure producing it.

Anti-patterns

Archetype-spotting as a substitute for diagnosis. The pattern tells you what to look for, not what is specifically going on in your situation.

Stock-flow conflation, talking about rates as if they were quantities.

Ignoring delays and amplifying an action because its effect has not landed yet.

Modelling for its own sake, building elaborate diagrams where archetype recognition would have sufficed.

Treating archetypes as universal, forcing a situation into one that does not fit.

Systems thinking as fatalism, using "it is the system" as an excuse not to act. Senge's whole point is that systems can be redesigned.

When system dynamics is the wrong tool

When the problem really is event-driven, when a specific decision was wrong and the response is specific. Looking for the system behind every event over-explains and misses the actionable layer.

When the timescale is too short. Systems thinking operates over months and years, so "let me think about the loop structure" is unhelpful for a decision due today.

When the data or history is too thin to be confident an apparent pattern is real. In new situations the archetypes are hypotheses to test, not diagnoses.

Coda: when the work is the stock

Everything above treats stocks, flows, and delays as a lens on organisational behaviour at large. There is one stock where they stop being a lens and start setting the pace of everything else, close enough to touch, and that is the work itself as it moves through a team.

The connecting idea is Little's law: average cycle time equals work in progress divided by throughput. The law itself is only an identity, locking the three quantities together so that none can move without one of the others; it does not on its own promise that cutting work in progress speeds anything up. What makes the cut work is queueing. A team with twelve things half-finished and two shipping a week is not slow because anyone is idle; it is slow because the work is idle, sitting in queues between people while each tends the next thing they started, and queues lengthen sharply as a system fills.
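Little's law itself is plain arithmetic. The queueing effect can be sketched with the textbook single-server M/M/1 queue, which assumes random arrivals and random service times. No real team behaves like an M/M/1 queue, so treat the second function as showing the shape of the curve, not as a forecast.

```python
def cycle_time(wip, throughput):
    """Little's law: average cycle time = average WIP / average throughput."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput

def time_in_system_multiple(utilisation):
    """M/M/1 queue: average time in system as a multiple of the bare service time."""
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1)")
    return 1 / (1 - utilisation)

print(cycle_time(12, 2), cycle_time(3, 2))   # weeks
for rho in (0.5, 0.8, 0.9, 0.95):
    print(f"{rho:.0%} busy -> {time_in_system_multiple(rho):.0f}x the work itself")
```

Twelve items at two a week gives six weeks. Three items at two a week gives a week and a half, but only if throughput holds, and that is the part queueing explains. In this model, work spends about 2, 5, 10 and 20 times its bare duration in the system at 50%, 80%, 90% and 95% utilisation.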

Starting less is how you finish more, and a team that feels productive because everyone is busy is often the clearest instance of the problem, the busyness being inventory accumulating rather than output leaving. Cap the team at three in-flight items, force the fourth to wait, and the first three usually finish faster than anything did before, not because the law commands it but because the queues between them drain.

Two further ideas both deal with delay.

Batch size: a large batch is a long delay built deliberately into the loop. Small changes ship sooner, fail smaller, and return feedback faster. That is the whole mechanical case for shipping daily rather than saving a quarter's work into one release, where the failures arrive together and the cause is buried under everything that shipped beside it.

Cost of delay, from Donald Reinertsen, is the discipline of putting a number on the waiting itself: what it costs, in revenue or in risk, for each week this is not done. It is a way to order a backlog honestly. Take a feature that loses a hundred thousand a month while it waits, and another that costs nothing to delay until a fixed deadline. These are different decisions, and the label "high priority" flattens them into a single word.
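The two features above can be put on the same scale with a little arithmetic. This sketch uses invented numbers: each item takes two months, the steady item loses 100k per month of waiting, and the deadline item costs nothing until month three and then a one-off 500k. It is a simplification for illustration, not Reinertsen's full method.

```python
from itertools import permutations

def total_delay_cost(order, durations, cost_of_waiting):
    """Ship items one after another and add up what each one's wait cost.

    cost_of_waiting[item] maps "months until shipped" to money lost by then.
    """
    elapsed, total = 0, 0
    for item in order:
        elapsed += durations[item]
        total += cost_of_waiting[item](elapsed)
    return total

durations = {"steady": 2, "deadline": 2}  # months of work each (hypothetical)
cost_of_waiting = {
    "steady": lambda months: 100_000 * months,                # 100k lost per month of waiting
    "deadline": lambda months: 500_000 if months > 3 else 0,  # free until month 3, then 500k
}

for order in permutations(durations):
    print(order, total_delay_cost(order, durations, cost_of_waiting))
```

Both items would read as "high priority", yet doing the deadline item first costs 400k against 700k the other way round. Move the deadline out to month five and the better order flips.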

Rumelt's Good Strategy / Bad Strategy

Richard Rumelt's Good Strategy / Bad Strategy is the diagnostic mode's tool for strategic-altitude work. Its first claim is a test: most strategy documents are goals and slogans in strategy clothing, not strategy at all. That half is widely quoted and easy to agree with. The half that still defeats capable teams is harder, and Rumelt develops it into a method in his later book, The Crux: even an honest diagnosis usually names the challenge at the wrong altitude. A strategy to improve reliability does not address a reliability problem. What addresses it is discovering that the crux is, say, that you cannot roll back fast enough, and then pointing everything there. The crux is the part of the challenge that is both decisive and addressable, hard enough to matter but open enough to move. Finding it is most of the strategic work, and it is rarely the same as the general statement of the problem.

Epistemic status. This is a structured articulation of expert judgement rather than an empirical theory, drawn from a distinguished strategy scholar's experience. Its diagnostic value, telling real strategy from fluff, is widely found useful and hard to argue with once seen. Its prescriptive power is weaker, since the framework tells you what good strategy contains but not how to generate the insight at its core. Treat it as an excellent quality filter and a clarifying discipline, not as a generator of strategy.

The core ideas

The kernel of good strategy has three parts. Diagnosis names the challenge and identifies which aspects actually matter, simplifying overwhelming complexity by naming the one or two decisive things. Guiding policy is the overall approach to the diagnosed challenge, an approach rather than a goal, ruling things out as much as in. Coherent action is the set of concrete coordinated steps that implement the policy and reinforce each other rather than scatter. (The kernel is a decision structure, not an improvement loop, the introduction's spine distinction.)

The three are interdependent. A guiding policy without diagnosis is arbitrary. A diagnosis without policy is academic. Coherent action without the first two is execution without direction.

Bad strategy has four signatures. Fluff, which is gauzy abstraction that sounds strategic and says nothing. Failure to face the challenge, where the document does not name what is actually hard. Mistaking goals for strategy, where "be the market leader" stands in for how you will get there. And bad strategic objectives, either disconnected from the challenge or a long unprioritised list.

Good strategy concentrates resources and action on the crux. Bad strategy spreads them across a pyramid of sub-strategies and workstreams, so nothing receives enough force to break through. Usually there is one crux, sometimes a few that reinforce each other. In the hard cases, in Rumelt's framing, it is the single most important thing to get right. A candidate is disqualified by failing either test: a point that is decisive but immovable is not a crux, and neither is one you can act on that would not change the situation if you did.

How to use it

Audit existing strategy against the kernel. Find the diagnosis, the guiding policy, the coherent action. Usually at least one is missing, and the missing piece is the work.

Write the diagnosis before anything else, and spend disproportionate time on it. A right diagnosis narrows the plausible policies sharply, but a wrong one cannot be saved by any execution downstream.

Use the four signatures as filters when reading or writing strategy.

Identify the crux explicitly, since it is rarely the same as the general statement of the problem.

Test coherence by reading the actions together and asking whether they reinforce each other or merely run in parallel.

What well-applied Rumelt looks like

Someone who was not in the room can read the diagnosis and state the crux, the single hard thing the strategy turns on.

The strategy named what it would not do, and that no held when an attractive off-strategy opportunity appeared, rather than dissolving on contact.

The actions reinforce each other closely enough that dropping one would weaken the rest, and the whole thing bets on a specific diagnosis clearly enough that it could be wrong, which is what separates it from a vision statement a competitor could reuse.

Anti-patterns

Goal-as-strategy, dressing an aspiration in an implementation timeline.

Diagnosis-skipping, jumping to policy or action and addressing the wrong problem confidently.

Fluff-as-vision, the inspirational document a competitor could use verbatim.

No concentration of force, whether by trying to address every issue at once or by leaving nothing out of scope, so resources spread thin and nothing gets enough.

Strategy as document rather than decision, treating the artefact as the work.

When Rumelt is the wrong tool

When the problem is operational rather than strategic, where the framework over-engineers a situation that needs operational diagnosis.

When the strategic situation has not stabilised, as in a genuinely new venture where the diagnosis itself changes weekly, so locking in a strategy is premature commitment.

When political dynamics make acting on the diagnosis impossible. The framework will name the crux correctly even here, and the crux may simply be that a powerful person will not accept the answer. What it cannot do is resolve that, so the honest kernel and the publishable document come apart, and the gap is organisational rather than analytical. The Larson coda below is one low-authority way to work the same wall.

Coda: Larson, and writing the strategy you already have

Rumelt tells you whether what you have is a strategy and what a good one must contain. The kernel does not tell you how to produce the insight at its centre. The Crux offers Rumelt's own answer, the Strategy Foundry, but it is a facilitated workshop that assumes the authority to convene the executives and the licence to name hard things in the room. Will Larson's writing, across An Elegant Puzzle, Staff Engineer, The Engineering Executive's Primer, and Crafting Engineering Strategy, along with a large body of essays on his site, is the bottom-up alternative for people who have neither: a way to generate the strategy from the organisation's own decisions rather than from the room.

It starts from how the organisation already decides. Most have a strategy already, implicit and unwritten, visible only in the pattern of their real decisions, so make them visible: write five honest design documents about decisions the organisation faced, read across them for how those decisions were really made, and that pattern, written down, is the strategy. A vision is the same move once more, five strategies forecast a couple of years out.

The failure it guards against is Rumelt's failure to face the challenge in its most seductive dress, and Larson names it in his own past work: an elegant strategy describing how his organisation wished it made tradeoffs rather than how it did, conceptually pure and useless. Good engineering strategy is therefore boring. It addresses the problems teams already have rather than the brilliant idea you arrived wanting to install, the discipline being to get those ideas out of your head first, write them down somewhere, and set them aside before writing from what the organisation does. The signal that it is time to write one is mundane: the same decision keeps getting relitigated.

The encouraging claim underneath it is that you can be a top-ten-percent engineering strategist simply by documenting your implicit strategy, because so few organisations do, and once it is on paper it can be argued with and improved.

Wardley Mapping

Simon Wardley's technique, consolidated in his book Wardley Maps, is the diagnostic mode's tool for positional and evolutionary analysis. Where Rumelt asks what the strategy is, Wardley asks what the landscape is and where you sit in it, which has to be answered first. Lay out what your users need, place each component your delivery depends on along an evolution axis from genesis through custom-built and product to commodity, and look for the mismatches, the things you are custom-building that the market already treats as a utility, or trying to productise while they are still genesis.

That placement exercise is most of the value. The rest, the Pioneers-Settlers-Town-Planners split, the climate-doctrine-gameplay layering, and the named inertia at evolution boundaries, is depth that sharpens the reading once the map is honest, worth knowing but not worth front-loading. The technique is unusual in being explicitly visual: the map is the artefact, and the discipline is producing one accurate enough to reveal what prose would hide.

Epistemic status. Wardley Mapping is a structured practitioner technique, shared openly and refined by a community, rather than an academically validated method. Its central observations, that components evolve in a predictable direction and that different evolution stages need different methods and people, have strong face validity and are widely found to ring true, but the evolution axis is judgement-based rather than measured, placement is subjective, and the one-directional evolution claim rests on accumulated practitioner conviction rather than tested evidence. Treat it as a useful thinking discipline whose value comes from honest mapping and the conversations it forces, not from any formal proof that its model is correct.

The core ideas

A real map has an anchor, position, and movement. For Wardley Maps the anchor is user need, position is the relationship between components, and movement is evolution. Most business "maps" are actually diagrams, with position but no anchor and no movement, which is why they are less useful.

The two axes are the value chain, vertical, with visible user-facing components at the top and infrastructure at the bottom, and evolution, horizontal, running from genesis through custom-built and product to commodity.

Figure: A Wardley map of a delivery app. The value chain runs down from the Customer, the anchor, which floats off the evolution axis, through Track my delivery and Dispatch to Map data, Routing and Compute. Each component is placed along the evolution axis from genesis through custom-built and product to commodity. Routing sits at custom-built, alone on the left, even though there is a routing API you could buy. The map poses the question: why are we building this?

Components evolve in a predictable direction even if the timing is unpredictable. Genesis becomes custom-built becomes product becomes commodity, one-directionally. This drives most strategic dynamics, because the methods that suit genesis, exploration and experimentation, are wrong for commodity, which wants efficiency and scale, and the reverse.

Climate, doctrine, and gameplay are three layers of strategic thinking. Climate is what you cannot change, such as the evolution of components, which you map and respond to rather than argue with. Doctrine is universal good practice, such as focusing on user needs or using appropriate methods for the evolution stage. Gameplay is context-specific moves that depend on the particular map.

Pioneers, Settlers, and Town Planners are the structural insight about people. Different parts of the value chain need different kinds of people and methods. Pioneers handle the uncertain genesis end, Town Planners handle the commodity end with efficiency and scale, and Settlers handle the middle, productising the pioneers' work for town planners to scale. The relationship is a cycle rather than a ladder: settlers take work off the pioneers and force them on to the next novel thing, town planners take work off the settlers and force them forward in turn, which is what keeps a static role-matching from being the point. Much dysfunction comes from mismatched assignment.

Inertia is what happens when a component evolves but the organisation does not. It is predictable, showing up at evolution boundaries, most visibly at the move from product to commodity where incumbents resist losing margin, but also wherever an existing practice, budget, or governance model was built for the stage the component is leaving.

How to use it

Start with user needs and work downward, since the map is anchored to what the user actually needs rather than to what the organisation wants to build.

Place each component on the evolution axis by its ubiquity and how well it is understood, not by how strategic it feels. This is where the work is, and where the common error lives, since organisations flatter themselves by placing commodity components closer to genesis than they really are.

Look for evolution mismatches. Custom-building what should be commodity produces cost overruns and no differentiation. Trying to productise what is still genesis produces premature standardisation.

Use the Pioneer, Settler, Town Planner distinction in organisational design, matching the kind of work to the kind of person and allowing handoffs between them.

Update the map when the landscape changes, since a map built once and filed becomes wrong over time.

What well-applied Wardley Mapping looks like

A specific decision changed because of the map: custom-building stopped on something the map showed commoditising, or an investment moved ahead of a component's shift, and the decision and the placement behind it can both be named.

Across two versions of the map, components have actually moved, and at least one earlier placement was corrected once reality contradicted it.

An evolution mismatch the map surfaced, commodity being custom-built or genesis being productised, got acted on rather than just noted.

Anti-patterns

Diagrams without movement, the first failure to watch for: a static picture of the present that hides the dynamics that matter.

Misplacement on the evolution axis, especially placing components closer to genesis to flatter a sense of innovation.

Mapping as performance, producing impressive maps that change no decision.

Treating Pioneers, Settlers, and Town Planners as a career ladder rather than as types of work and people.

Apparatus over thinking, reaching for Wardley doctrine and vocabulary in place of the actual placement-and-movement work, which both ignores the specific gameplay the map suggests and produces fluent-sounding nonsense.

When Wardley Mapping is the wrong tool

When the question is operational rather than strategic, where mapping is overkill.

When the team cannot name a user, a need, or a dependency even provisionally. A genuinely new market is not the obstacle here; everything simply sits in genesis, and the map is honest about that. The tool only fails when there is nothing to anchor to, and when that happens the missing anchor is usually the first thing worth arguing about rather than a reason to put the map away.

When identifying the user is itself contested, as with some internal infrastructure or organisational-change work. Internal infrastructure has internal users, the on-call engineer or the team consuming a platform, and the map can be built once you decide whose need it serves. The exercise loses sharpness only where the organisation will not make that decision, which is a finding, not a limitation of the tool.

When the map is the wrong artefact for the audience. The map is where the analysis happens; it is not always what you present. An audience that needs an executive summary or a project plan should get one, drawn from the map rather than instead of it.

Coda: drawing your first map

The core ideas describe what a map is, but not how to actually build one. Wardley's own teaching example, a tea shop, is one way to start.

Start from the user and one need: a customer who wants a cup of tea. Then ask what that need depends on, and what those things depend on in turn, drawing a link each time. The cup of tea needs hot water, the hot water needs a kettle, the kettle needs power. That descending chain of dependencies, anchored to the user at the top, is the vertical axis; you have not placed anything on evolution yet.

Then place each component left to right by how widespread and how well understood it is. Power is a utility, so it sits hard right at commodity. A kettle is a standard product. Hot water is closer to commodity than it looks. The cup of tea, as the thing this particular shop competes on, might be a product, or might be edging toward something more bespoke if the shop is differentiating on it. The placement is where the argument starts, and the argument is the point.

If a component is hard to place, that is usually the signal that it is really several components, and the move is to break it apart rather than guess. Map only the part of the landscape that decisions actually turn on; mapping the whole business produces a tangle no one reads.

After making the placements, it's time to read the map. Look for the mismatches first: anything custom-built that the right-hand side of the map says is a commodity, anything being productised while it is still genesis. Then look at movement: which components are sliding rightward, and what becomes possible above them once they do, since a component commoditising underneath you is also a component you no longer have to build. Then look for inertia: where a budget, a team, or a governance process is built for the stage a component is leaving, because that is where the resistance will come from. A map that surfaces none of these and changes no decision was decoration.
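The first reading pass, looking for mismatches, is mechanical enough to sketch in code. What follows is a minimal, hypothetical encoding in Python: the `Stage` and `Component` names, the three approach labels ("build", "productise", "use"), and the shop that runs its own generator are all assumptions for illustration, not Wardley's notation. It checks only the two mismatches; movement and inertia still need a person reading the map.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    """The evolution axis, left to right."""
    GENESIS = 0
    CUSTOM_BUILT = 1
    PRODUCT = 2
    COMMODITY = 3


@dataclass
class Component:
    name: str
    stage: Stage   # placement on the evolution axis
    approach: str  # hypothetical labels: "build", "productise", or "use"
    needs: list[str] = field(default_factory=list)  # links down the value chain


def find_mismatches(components: list[Component]) -> list[str]:
    """Flag the two evolution mismatches the map should be read for first."""
    findings = []
    for c in components:
        if c.approach == "build" and c.stage == Stage.COMMODITY:
            findings.append(f"{c.name}: custom-building a commodity")
        if c.approach == "productise" and c.stage == Stage.GENESIS:
            findings.append(f"{c.name}: productising something still in genesis")
    return findings


# Hypothetical placements for the tea shop; arguing over them is the real work.
tea_shop = [
    Component("cup of tea", Stage.PRODUCT, "productise", needs=["hot water"]),
    Component("hot water", Stage.COMMODITY, "use", needs=["kettle"]),
    Component("kettle", Stage.PRODUCT, "use", needs=["power"]),
    Component("power", Stage.COMMODITY, "build"),  # e.g. a shop running its own generator
]

for finding in find_mismatches(tea_shop):
    print(finding)
```

The placements are inputs rather than outputs: the code can only flag a mismatch once someone has argued a component into a column.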

The tea shop is legible because everyone already knows the chain, but the same five questions, user, need, dependencies, placement, reading, produce the map above: a user need resting on a web application, a platform, a data store, and compute, with compute already a commodity and the platform commoditising under it. The domain changes; the moves do not.

Lightweight tools

The six tools above are substantial frameworks. Alongside them sits a family of small structuring devices, mostly quadrants and checklists, that do not need a chapter each but earn their place because they are fast, legible, and good at preventing specific oversights. Their epistemic status is simply that they are useful formats, widely used because they work.

RAID is a tracking checklist for Risks, Assumptions, Issues, and Dependencies. Its value is that the four categories are easy to confuse and each needs different handling: a risk has not happened yet and wants mitigation, an issue is already real and wants resolution, an assumption is something you are taking on faith and wants validation, and a dependency is something outside your control that wants tracking. Keeping the four columns separate stops a team from treating a live issue as a hypothetical risk, or from burying a load-bearing assumption in a list of tasks. It is most useful on projects with enough moving parts that things fall through the cracks, and it adds pure overhead on small, legible work.

Adjacent to RAID sit the 4Ts of risk response, sometimes shown as a column on a risk register: Treat (mitigate, reduce likelihood or impact), Tolerate (accept, live with it and document why), Transfer (push to another party better placed to carry it, such as insurance, a vendor SLA, or a partner team), and Terminate (avoid, change the scope or the system so the risk no longer exists). The value of the column is that it forces a verb against every row, so a register becomes something other than a list of worries. Two caveats: Investigate is a hold-state while you scope, not a fifth T, and rows in it need a decide-by date; and Tolerate is a real choice rather than a non-answer, but only if the rationale is written down.
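The rules in that column are simple enough to check mechanically. Here is a minimal Python sketch, assuming a register where each row carries a response, a rationale, and an optional decide-by date. `RiskRow`, `row_problems`, the example risks, and the dates are all hypothetical, and the overdue check on Investigate rows goes slightly beyond what the text prescribes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

FOUR_TS = {"Treat", "Tolerate", "Transfer", "Terminate"}


@dataclass
class RiskRow:
    risk: str
    response: Optional[str] = None    # one of the 4Ts, or "Investigate" while scoping
    rationale: str = ""               # must be written down when the response is Tolerate
    decide_by: Optional[date] = None  # must be set while the row sits in Investigate


def row_problems(row: RiskRow, today: date) -> list[str]:
    """Return what stops this row from being a decision rather than a worry."""
    if row.response == "Investigate":
        if row.decide_by is None:
            return [f"{row.risk}: Investigate needs a decide-by date"]
        if row.decide_by < today:  # assumption: an expired hold-state is also a problem
            return [f"{row.risk}: decide-by date has passed"]
        return []
    if row.response not in FOUR_TS:
        return [f"{row.risk}: no verb chosen"]
    if row.response == "Tolerate" and not row.rationale.strip():
        return [f"{row.risk}: Tolerate without a written rationale"]
    return []


# Hypothetical register rows.
register = [
    RiskRow("payment provider outage", "Transfer"),
    RiskRow("single on-call engineer", "Tolerate"),
    RiskRow("data migration overruns", "Investigate", decide_by=date(2024, 3, 1)),
    RiskRow("key hire falls through"),
]
today = date(2024, 3, 15)
for row in register:
    for problem in row_problems(row, today):
        print(problem)
```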

Bow-tie analysis is a process-safety cousin that draws causes to the left of a top event and consequences to the right, with responses as barriers across the diagram. It asks one question the register alone does not, whether you have controls on both sides of the event, but it is heavyweight for most engineering-org work. The operating chapter's lightweight section has the natural companions: the decision log as the home for the Tolerate rationale, and the pre-mortem as the source of register rows in the first place.

RACI clarifies decision rights by naming, for each task or decision, who is Responsible (does the work), Accountable (owns the outcome and is the single point of decision), Consulted (gives input before), and Informed (told after). Its central rule is exactly one Accountable per item, since diffused accountability is a common cause of decisions that never quite get made. The failure mode is the R and the A: people who have not internalised the difference end up marking the same person both, or marking everyone Responsible, until the chart records activity rather than decision rights and clarifies nothing. It is worth reaching for when "who actually decides this" is genuinely unclear, and it becomes bureaucratic theatre when applied to work where everyone already knows. RACI is at least as much an operating tool as a diagnostic one; the operating chapter returns to it and to DACI, a leaner variant for one-off decisions that drops the standing-role framing.
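The central rule, exactly one Accountable per item, along with the everyone-Responsible failure mode, can be checked against a chart. This Python sketch is a minimal illustration that assumes one role letter per person per item, so someone who both does the work and owns the outcome is simply marked "A". `check_raci` and the example chart are hypothetical.

```python
def check_raci(chart: dict[str, dict[str, str]]) -> list[str]:
    """chart maps each item to {person: role}, role being one of "R", "A", "C", "I"."""
    issues = []
    for item, roles in chart.items():
        accountable = [person for person, role in roles.items() if role == "A"]
        if len(accountable) != 1:
            issues.append(f"{item}: {len(accountable)} Accountable, needs exactly one")
        if len(roles) > 1 and all(role == "R" for role in roles.values()):
            issues.append(f"{item}: everyone marked Responsible")
    return issues


# Hypothetical chart.
chart = {
    "choose the database": {"Ana": "A", "Ben": "R", "Cy": "C", "Dee": "I"},
    "set the release date": {"Ana": "A", "Dee": "A"},
    "own the on-call rota": {"Ben": "R", "Cy": "R", "Dee": "R"},
}
for issue in check_raci(chart):
    print(issue)
```

A clean result only means the chart is well-formed. It cannot tell whether the named Accountable person actually decides.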

The Eisenhower matrix sorts work on two axes, urgent versus not urgent and important versus not important. The point it forces is the distinction between urgent and important, which people conflate under pressure, so that the not-urgent-but-important quadrant, the strategic and developmental work, stops being crowded out by the urgent-but-unimportant. It is a personal prioritisation aid more than an organisational tool, and its weakness is that urgency announces itself while importance does not, so under real pressure the matrix tends to collapse into an urgent-or-not sort, with the importance axis quietly back-filled to justify whatever someone already meant to do. The grid is only as honest as the judgement of importance you bring to it.

A few others in the same family are worth knowing by name. A SWOT grid scans strengths, weaknesses, opportunities, and threats, useful as a conversation-starter and weak as analysis since it rarely survives a hard question. A two-by-two of impact against effort triages a backlog quickly, with the high-impact low-effort quadrant being the obvious first place to look. A risk matrix plots likelihood against severity to sort what to worry about, with the caveat that both axes are usually guesses dressed as numbers; FMEA, Failure Mode and Effects Analysis from reliability engineering, is the heavier sibling that adds a Detection axis to ask whether you would notice before damage was done, and earns its place when "we wouldn't catch it" is part of what makes a risk frightening. MoSCoW sorts requirements into Must, Should, Could, and Won't-this-time, and its real value is the explicit Won't, which forces the deprioritisation people otherwise avoid.
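The difference between a risk matrix and FMEA's extra axis can be seen in a couple of numbers. This minimal Python sketch assumes the conventional FMEA risk priority number, severity × occurrence × detection, with each factor rated 1 to 10 and a high detection rating meaning a failure is *unlikely* to be caught. The two risks and their ratings are hypothetical, and multiplying ordinal guesses inherits every guess, as the caveat above warns.

```python
def check_rating(value: int) -> int:
    """Ratings are ordinal judgements on a 1..10 scale."""
    if not 1 <= value <= 10:
        raise ValueError(f"rating {value} is outside 1..10")
    return value


def matrix_score(likelihood: int, severity: int) -> int:
    """Risk matrix: likelihood times severity."""
    return check_rating(likelihood) * check_rating(severity)


def rpn(severity: int, occurrence: int, detection: int) -> int:
    """FMEA risk priority number; high detection means the failure is hard to catch."""
    return check_rating(severity) * check_rating(occurrence) * check_rating(detection)


# Hypothetical risks: identical on the matrix, far apart once detection is scored.
risks = {
    "deploy breaks login": (7, 4, 2),    # (severity, occurrence, detection)
    "backups silently stop": (7, 4, 9),
}
for name, (s, o, d) in risks.items():
    print(name, matrix_score(o, s), rpn(s, o, d))
```

Both risks score 28 on the matrix. Only the Detection axis separates the loud failure from the silent one, which is the case the text says FMEA earns its place on.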

The Ladder of Inference is the closest of the lightweight tools to the developmental mode, and the one that earns the most from a worked instance. Two people are arguing about a conclusion and getting nowhere, because each is treating their own conclusion as the plain reading of the facts. The ladder, Chris Argyris's account of reasoning rendered as the now-standard climb by Peter Senge, names why: from the same pool of observable data, each person selects a different few facts, reads a meaning into them, adds assumptions, and arrives at a conclusion that feels obvious from where they stand. By the time they speak, they are several rungs up and arguing about the top rung. The move is to climb back down, not to restate your conclusion harder but to ask what the other person actually saw, the data and the selection that put them on a different ladder. The disagreement usually turns out to live near the bottom, in which facts each treated as relevant, not at the top where the shouting is. It pairs naturally with the assumptions work in an Evaporating Cloud, and the force-field sketch, listing forces pushing for a change against those resisting it, is a lighter cousin of the Cloud in the same spirit.

The shared discipline with all of these is to treat the format as a prompt, not as the analysis. The grid asks the question. You still have to answer it, and the neat quadrants can give a false sense that filling them in was the thinking.

Operating Tools

These are for running an organisation while it is also being diagnosed and repaired. They assume you cannot stop, that the work has to keep shipping, and that the job is to integrate analysis and people-shaped concerns under real time pressure. The chapter covers four: Grove's framework from High Output Management, Team Topologies, the Improvement Kata, and Barry Johnson's Polarity Management, plus a section on lightweight coordination tools.

This mode is defined by context of use rather than by a shared method, which is why several of its tools are diagnostic ideas applied under pressure. Grove's limiting step is Goldratt's constraint, seen from the manager's chair. The Improvement Kata is Deming's PDSA, drilled until it is a reflex. Team Topologies takes Conway's law as climate, in Wardley's exact sense of a force you map and respond to rather than argue with.

The shape of problem they are ill-suited to: deep root-cause work, which they tend to assume has been done already. Personal development work, since they are about role and rhythm rather than psychology. Genuinely novel situations with no operating template.

Grove's framework (High Output Management)

Andy Grove's 1983 book starts from a redefinition. A manager's output is not the manager's activity. It is the output of the organisation they manage, plus the output of the neighbouring organisations they materially influence. Everything else in the book, leverage, indicators, inspection, maturity, meetings, and feedback, is an instrument for raising that number. Read this way it is a production manual pointed at managerial attention, and its first discipline is refusing to count activity as output.

What follows is not a summary of the whole book. It extracts the operating spine: define output, allocate managerial time by leverage, inspect early, watch predictive indicators, manage and develop people by task-relevant maturity, and run the recurring meetings that turn information into changed behaviour. Grove's material on motivation, formal performance review, and planning sits outside this spine and is left to the book.

Epistemic status. This is experience codified rather than experimentally tested: the distilled practice of running Intel, written down as transferable principle. It has held up across four decades and underpins much of modern technology management, which is strong evidence of practical value but not the same as measured fact. The output and leverage ideas generalise cleanly. Task-relevant maturity is a useful heuristic, not a measured construct, so apply it as a lens rather than a scoring system.

The Core Ideas

Output. A leader's output is the output of their organisation plus the output of the neighbouring organisations they influence. Long messages, meetings attended, and being well informed don't count, because none of them is what the organisation produced; they are all inputs. Most of the feeling of being busy lives in the gap between the two. The point about neighbouring organisations matters too: a leader is accountable for output they do not directly control. Teams you depend on and negotiate with are part of your output by definition, so time spent raising theirs counts.

Leverage. If output is the goal, leverage is how you reach it, because the scarce resource is your time. A high-leverage activity moves a great deal of organisational output for a little of your time; a low-leverage one moves little; a negative-leverage one, such as reversing a decision without context or becoming the bottleneck others wait on, moves it backwards. Hiring, training someone on a recurring task, and a decision that shapes many people's work for a long time are high-leverage. Reading every message and attending meetings where you hold no decision are low.

The point is that leverage is a measurement, not a virtue. You get no credit for wanting to do high-leverage things.

Observing output

Leaders cannot personally observe all the work without becoming part of the process, so they instead need indicators or inspection.

Indicators. The first distinction is between lagging indicators, which tell you a result has happened, and leading indicators, which predict it. A rise in unresolved design questions may predict delivery risk before the milestone slips; escaped defects reveal a quality problem only after the customer has felt it. Which of two indicators leads and which lags depends on the outcome you are trying to predict, not on the metric in the abstract. The second distinction is pairing: you compare an output indicator against a quality one, so that pushing throughput cannot quietly degrade the work. A small set of paired, predictive indicators, reviewed on a rhythm frequent enough to act before the lagging result moves, beats a dashboard assembled because the data happened to exist.

Inspection at the cheapest stage. Grove builds much of the book on a single worked image: a waiter serving a breakfast of a boiled egg, toast, and coffee, treated as a small production line. The three steps take different times and depend on each other, which makes the breakfast a place to see both where to inspect and where the line is paced. Inspection first. You reject the bad egg as it arrives from the supplier, before you have spent any cooking on it, because a defect caught late has already consumed all the work done after it. In knowledge work this is reviewing the one-page outline before the full document, and the document before the code is written. The earlier the stage, the cheaper the rejection. This is the same move as an indicator at a smaller scale: the work between inspection points is a process you cannot watch directly, so you check it at the boundaries and treat the result as a surrogate for the state of everything in between.

The limiting step. Output has a bottleneck, and finding it is half the job. In the same breakfast, the slowest step, the boiling egg, sets the pace of the whole line, so the rest is scheduled backwards from it: the toast and coffee start late enough that everything arrives together rather than going cold while it waits. The bottleneck observation is the part that resembles a constraint. The scheduling around it, the staggering of dependent steps so that nothing waits, is the more useful and more often neglected half, and it is closer to the stocks, flows, and delays of systems dynamics than to a static cap on capacity. Applied to a leader, the question is which step actually caps what the team produces, and whether your attention is on it or on something more comfortable.
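The scheduling half of the breakfast can be written down directly: find the slowest step, then start every other step late enough to finish with it. This Python sketch assumes the steps can run in parallel and ignores queues. The durations are illustrative assumptions in minutes, and only their ordering matters. `schedule_backwards` is a hypothetical name.

```python
def schedule_backwards(durations: dict[str, float]) -> tuple[str, dict[str, float]]:
    """Return the limiting step and a start time for every step so all finish together."""
    limiting_step = max(durations, key=durations.get)
    finish = durations[limiting_step]  # the slowest step sets the pace of the line
    starts = {step: finish - minutes for step, minutes in durations.items()}
    return limiting_step, starts


# Hypothetical durations in minutes.
breakfast = {"egg": 3, "toast": 2, "coffee": 1}
limiting_step, starts = schedule_backwards(breakfast)
print(limiting_step, starts)
```

The egg starts at minute 0, the toast at 1, and the coffee at 2, so nothing goes cold while it waits. Shortening the toast changes only its start time, while shortening the egg brings the whole breakfast forward.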

Meetings in general as production machinery. Grove divides them by purpose. Process meetings, the one-to-one and the staff meeting, are recurring mechanisms for information flow and coordination, scheduled because the need is continuous. Mission meetings exist to produce one specific decision or solve one specific problem, and should be rare, urgent, and short. A one-hour meeting of fifteen people that produces no decision is a negative-leverage event. The error is not having meetings, it is using the wrong type for the job.

Raising the output of people

The largest single store of future output is the people doing the work, and three instruments raise it: matching how you manage to where they are, developing them past it, and using one-to-ones to do both.

Task-relevant maturity. The largest lever on a person's output is not their experience in general but their maturity for the specific task in front of them. The same engineer can be high maturity on a refactor in code they have owned for years and low maturity negotiating with a vendor for the first time. Maturity for the task, not seniority of the person, sets how you manage it: structured and directive when it is low, two-way reasoning and support in the middle, and little beyond agreed objectives when it is high. It also sets how often you meet and how closely you monitor: tight when maturity is low, daily check-ins on the risky area, and light when it is high, a periodic conversation with written updates in between.

The point of reading maturity is to raise it. A person whose maturity you have raised produces more output for less of your time, so developing maturity, and adapting your approach as it grows, is itself the high-leverage move. Delegation without the follow-through that builds maturity is not delegation. It is abdication.

Training. The mechanism by which maturity rises is training, and Grove is clear that it belongs to the direct leaders rather than to a separate function. The reason is leverage: a few hours spent teaching a recurring task changes the quality of every future instance of that task, which is one of the highest ratios of output-moved to time-spent available. The practical move is to assign a task slightly beyond current competence with enough scaffolding that the attempt succeeds but not so much that nothing is learned, then debrief the gap between what was expected and what happened. Treating training as someone else's responsibility forfeits the cheapest compounding output you have.

The one-to-one. Grove's argument is arithmetic: ninety minutes of your time can shape two weeks of a person's work, some eighty hours, a ratio of roughly fifty to one. It is primarily the report's meeting: their agenda should lead, their problems should take most of the time, and the leader's job is to listen, probe, teach, and surface what has not been said. The one-to-one is for ambiguity, judgement, feedback, development, and the things that only come out when someone is asked. Used for status it becomes the most expensive way to read an update, which is the surest way to waste the instrument.

Feedback closes the loop. A system that measures without feeding back observes its own decline without altering it, and no indicator and no inspection is worth running unless someone acts on what it shows. Grove also treats performance feedback as among the highest-leverage things you can do, because it is the point where measurement becomes learning.

How to use it

Run Grove as a calendar-and-output audit.

Name the output first, in observable terms: shipped customer value, reliable service, engineers hired and ramped, the output of the neighbouring teams you depend on and negotiate scope with. Then take last week's calendar and message load and mark each activity high, medium, low, or negative leverage against that output. Mark an activity high only if it changed many people's future work, improved a recurring process, raised someone's task maturity, or removed a real bottleneck, not because it felt important. Most of the "I am so busy" energy turns out to sit in activities that produce neither your output nor a neighbour's.
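The judgement happens in the marking, but totalling the marks makes the distribution hard to argue with. This minimal Python sketch assumes each calendar entry has already been marked with one of the four bands. `hours_by_leverage` and the week's activities are hypothetical.

```python
BANDS = ("high", "medium", "low", "negative")


def hours_by_leverage(week: list[tuple[str, float, str]]) -> dict[str, float]:
    """Total the hours spent in each leverage band from (activity, hours, band) rows."""
    totals = {band: 0.0 for band in BANDS}
    for activity, hours, band in week:
        if band not in totals:
            raise ValueError(f"{activity}: unknown band {band!r}")
        totals[band] += hours
    return totals


# Hypothetical week, each row already marked against the named output.
last_week = [
    ("hiring loop for two engineers", 4, "high"),
    ("one-to-ones", 3, "high"),
    ("quarterly planning draft", 3, "medium"),
    ("status meetings with no decision", 6, "low"),
    ("reading every channel", 5, "low"),
    ("reversing a team decision without context", 1, "negative"),
]
print(hours_by_leverage(last_week))
```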

Find the limiting step. Ask what currently caps output: unclear priorities, slow review, a missing skill, a dependency on another team, decision latency, rework. Move attention there before optimising easier work elsewhere, and place inspection just before whichever stage makes defects expensive, the outline before the document, the design before the implementation.

For each report, list their two or three actual areas of work and put a maturity band on each, not on the person. The senior engineer is high on the refactor and low on the vendor negotiation, which means you stay out of the first and meet often on the second; the verdict you must never write is a single band next to their name. Manage to raise the bands, not merely to route around them.
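One way to make "a band per task, never per person" hard to violate is to give the data nowhere to hold a per-person band. This Python sketch maps each task's band to the management style and monitoring described under task-relevant maturity. The enum, the dictionary names, and the cadence wording for the medium band, which the text leaves open, are assumptions.

```python
from enum import Enum


class Maturity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# (how to manage, how closely to monitor); the MEDIUM cadence is an assumption.
APPROACH = {
    Maturity.LOW: ("structured and directive", "daily check-ins on the risky area"),
    Maturity.MEDIUM: ("two-way reasoning and support", "regular check-ins"),
    Maturity.HIGH: ("agree objectives, then stay out", "periodic conversation, written updates between"),
}

# Bands live on (person, task) pairs; there is deliberately no per-person field.
reports = {
    "senior engineer": {
        "refactor of long-owned code": Maturity.HIGH,
        "first vendor negotiation": Maturity.LOW,
    },
}


def management_plan(tasks: dict[str, Maturity]) -> dict[str, tuple[str, str]]:
    """Map each task to how to manage and monitor it, from that task's maturity band."""
    return {task: APPROACH[band] for task, band in tasks.items()}


for task, (style, monitoring) in management_plan(reports["senior engineer"]).items():
    print(f"{task}: {style}; {monitoring}")
```

Raising a band means editing one task's entry. Everything else about that person stays where it was.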

Then protect the routines. Pick one or two predictive indicators per outcome, each paired with a quality guardrail, on a rhythm fast enough to act before the lagging result lands. Keep status out of the one-to-one and put it in writing. Make sure every recurring meeting has a type and every mission meeting has a decision it exists to produce.

What well-applied Grove looks like

The calendar has visibly moved toward high-leverage work, and you can name the low-leverage thing you stopped doing to make room. The same person is managed closely in one area of their work and left alone in another, and the difference tracks their maturity for each task rather than a single label. A leading indicator moved before the lagging one did, and you acted on the early reading instead of waiting for the result to confirm itself. A recurring meeting was cancelled because nobody could name its output, and a mission meeting that had been drifting for weeks was given an owner and closed in one sitting.

Anti-patterns

Maturity read as a fact about the person. Treating someone as low or high maturity in general collapses the per-task judgement that makes the idea work, and produces both the senior person micromanaged on what they know and the junior person abandoned on what they have never done.

Leverage as aspiration rather than measurement. "I should do more high-leverage work" is a slogan until you have named the activities, estimated their effect on output, and stopped doing something lower-leverage to make room. The negative-leverage case is the warning: reversing a decision without context, or becoming the bottleneck others wait on, is not doing low-leverage work but moving output backwards, and the ranking should treat that as worse than zero.

Late inspection. Reviewing only after the expensive work has happened: commenting on the finished document instead of the outline, discovering an architectural disagreement after implementation, or finding quality problems only once customers have felt them.

Monitoring mistaken for meddling, or the reverse. The maturity dial cranked into surveillance on a high-maturity task is meddling and reads as distrust; the same dial left loose on a low-maturity task is abdication. The skill is matching monitoring intensity to maturity, not picking one setting and applying it everywhere.

Indicator inflation. Twenty metrics watched because the pipeline produces them. Two that predict the outcome, each paired with the quality it must not erode, are worth more than twenty that decorate a dashboard.

The one-to-one as a status meeting, which takes the highest-leverage routine and puts it to the lowest use.

Comfort work. Spending managerial time where you are fluent rather than where the system is constrained: rewriting a document yourself while the team's real bottleneck is unclear ownership or a decision nobody will force.

When Grove is the wrong tool

Grove assumes you know what you are producing. When the strategy is contested or the goal is wrong, the production function is well-defined but its output is the wrong thing: you optimise efficiently toward the wrong end, and the leverage ranking faithfully directs your time at the wrong activities. The work to do first is diagnostic, not operational.

Grove also assumes a team in working order. When a team is burned out or has lost trust, the leverage calculus inverts, because the activities that normally produce output now produce little: a one-to-one with a burned-out report is a high-cost conversation that moves almost nothing, so the highest-leverage move is repair rather than execution. That repair is developmental work, and it comes first.

Team Topologies

Matthew Skelton and Manuel Pais's 2019 book gave engineering organisations a shared vocabulary for team design. The load-bearing idea is underneath it: you do not fight Conway's law, that systems come to mirror the communication structures that build them, you run it backwards, starting from the flows of change the organisation needs and then drawing the team and software boundaries whose mirror produces the architecture those flows require. Everything else follows from that. Most teams should be stream-aligned, owning one flow of change end to end, and the other three types exist only to keep them that way, each justified by a specific cognitive load it lifts off a stream-aligned team.

Epistemic status. Practitioner synthesis with one empirically-supported claim holding it up. The Conway's law mechanism beneath it, that systems come to mirror the communication structures of the organisations that build them, has evidence: studies matching codebases of loosely coupled open-source communities to commercial firms building comparable software (MacCormack, Baldwin, and colleagues) found that the more loosely coupled organisations produced more modular code in every pair compared. However, Conway's law is descriptive while the inverse manoeuvre is normative, so the manoeuvre is a pragmatic bet that the mirroring is strong enough to steer by, and it assumes you already know the target architecture.

The cognitive-load framing borrows John Sweller's term from instructional psychology, where it describes individual working memory, and transplants it to whole teams in delivery settings, a jump the primary literature doesn't make. Use it as a question to ask, not a number to compute. And one caveat the model's own flow promise depends on: it holds for a team that owns a vertical slice end to end, and degrades into handoff chains when teams called stream-aligned are secretly sliced by component or layer, so the live question for any application is whether your slices are actually vertical.

In essence, trust the Conway core, treat the manoeuvre as a heuristic rather than a law, use the types and modes as a shared language, and check every boundary against real flow.

The core ideas

Team Topologies is a small kit: one law to design around, four kinds of team, three ways for them to interact, and a method for cutting the lines between them.

Conway's law is the thing you design around. Systems come to mirror the communication structures that build them, so the architecture you ship ends up shaped like your teams whether you meant it or not. Don't fight this and lose; run it backwards: decide the architecture you want, then draw the teams whose mirror produces it. This is the inverse Conway manoeuvre, a bet rather than a guarantee, so watch whether the architecture you wanted is actually emerging and adjust when it is not. Conway's own paper concluded that flexibility of organisation is the point, not mirroring as a fixed state.

The four team types. A stream-aligned team owns one flow of change end to end, a product or a journey or a segment, and is the default that the other three exist to protect. A platform team, which may be a single team or a grouping of several, provides internal services that take load off the stream-aligned teams, run as a product whose customers are those teams. An enabling team grows a capability in another team and then moves on; the team itself is usually long-lived, but its engagement with any one stream-aligned team is temporary, which makes it the Coaching Kata wearing an org chart, succeeding when it has made itself unnecessary. A complicated-subsystem team owns a part that needs genuine specialist depth, a codec or a pricing engine or a regulatory calculation, that would drown a stream-aligned team if smeared across it; it should be rare, justified by inherent difficulty rather than by the part merely being old and unloved, shared, or politically sensitive. Most teams should be stream-aligned, and the other three are there to keep them that way.

Cognitive load is what sizes a team's domain scope. A team holds only so much in its head, and once it owns more domains than it can reason about it slows, decides worse, and starts to behave like a bag of individuals rather than a team. The useful distinction is three-way: intrinsic load, the irreducible difficulty of the domain; extraneous load, the friction that should not be there, such as flaky tooling, unclear ownership, or manual toil; and germane load, the valuable domain thinking you want to preserve. Cut extraneous load and contain intrinsic load inside sensible boundaries, instead of asking for cleverer teams. And ask: is this team being asked to understand more than a team can reasonably hold? When the answer is yes, the other three types are the relief valves: the specialist part to a complicated-subsystem team, the undifferentiated heavy lifting to a platform, the missing skill to an enabling team.

The three interaction modes are how the teams talk. Collaboration is two teams working closely and noisily for a bounded stretch, right for discovery and too expensive to leave running. X-as-a-service is one team consuming another through a clean interface with almost no talking, right for predictable scale. Facilitating is one team helping another over a hump. The discipline is to name which mode each important pair is in and to put a clock on collaboration, so that it resolves into a service boundary instead of curdling into permanent coupling nobody chose.

Finding the boundaries is where domain-driven design can come in. DDD's bounded contexts, together with Team Topologies' fracture-plane heuristics, tell you where to cut: along domain seams, along change cadence so that what changes together stays together, along data ownership, along regulatory lines, and along cognitive load. The practical tools that surface the seams in the first place are Event Storming, for finding where the domain naturally divides, and Context Mapping, for characterising the relationships between contexts, which rhymes with the three interaction modes without mapping onto them one to one. A good boundary has a narrow interface and a coherent reason its insides change. The rest of DDD is its own subject and stays out of frame; bounded contexts are the slice that serves team design.

How to use it

[Figure: Team Topologies, overlap is interaction. Stream-aligned (owns one flow of change, end to end), platform (the foundation underneath, run thin as a product), enabling (long-lived team, here briefly then leaves), complicated-subsystem (rare; deep specialist). Heavy overlap reads as collaboration, a clean interface as x-as-a-service, light overlap as facilitating. Caption: how much two teams overlap is the cost of the interaction between them; the stream-aligned bar carries the flow, and the rest take load off it.]

Run it as an iterative design loop, not a one-off reorg, and start by mapping rather than redrawing.

Start with current-state Team APIs, not a target org chart. Each team drafts a short, living document declaring its current purpose, its type if that is already clear, the software it owns, the services it provides and the service levels for them, and a table of the teams it currently interacts with, naming the mode and the expected duration for each pair. You redraw nothing yet, and you do not force a team to claim a type it does not yet fit.
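If the Team APIs are kept as structured data rather than free prose, the mismatches that reading the map should surface become queryable. The sketch below is a minimal illustration under that assumption: the class names, field names, and the two helper functions are hypothetical, not part of Team Topologies itself. It mirrors the fields listed above, leaves the team type empty until it genuinely fits, and flags collaborations that have no clock on them or whose clock has run out.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class Mode(Enum):
    COLLABORATION = "collaboration"
    X_AS_A_SERVICE = "x-as-a-service"
    FACILITATING = "facilitating"


@dataclass
class Interaction:
    other_team: str
    mode: Mode
    # None means nobody has put a clock on this interaction yet.
    expected_until: Optional[date] = None


@dataclass
class TeamAPI:
    name: str
    purpose: str
    # Left as None until the team genuinely fits a type; never forced.
    team_type: Optional[str] = None
    software_owned: list[str] = field(default_factory=list)
    # Service name -> stated service level.
    services: dict[str, str] = field(default_factory=dict)
    interactions: list[Interaction] = field(default_factory=list)


def open_ended_collaborations(apis: list[TeamAPI]) -> list[tuple[str, str]]:
    """(team, other team) pairs in collaboration mode with no expected end date."""
    return [
        (api.name, i.other_team)
        for api in apis
        for i in api.interactions
        if i.mode is Mode.COLLABORATION and i.expected_until is None
    ]


def overdue_collaborations(apis: list[TeamAPI], today: date) -> list[tuple[str, str]]:
    """Collaborations whose clock has run out and should have become a service."""
    return [
        (api.name, i.other_team)
        for api in apis
        for i in api.interactions
        if i.mode is Mode.COLLABORATION
        and i.expected_until is not None
        and i.expected_until < today
    ]


# Hypothetical teams, for illustration only.
checkout = TeamAPI(
    name="checkout",
    purpose="Own the checkout journey end to end",
    team_type="stream-aligned",
    software_owned=["checkout-web", "checkout-api"],
    interactions=[
        Interaction("payments-platform", Mode.COLLABORATION),
        Interaction("observability", Mode.X_AS_A_SERVICE),
    ],
)
search = TeamAPI(
    name="search",
    purpose="Own product search",
    interactions=[
        Interaction("payments-platform", Mode.COLLABORATION, date(2025, 3, 31)),
    ],
)

print(open_ended_collaborations([checkout, search]))
# [('checkout', 'payments-platform')]
print(overdue_collaborations([checkout, search], today=date(2025, 4, 1)))
# [('search', 'payments-platform')]
```

The queries only point at where to look; deciding whether an open-ended collaboration hides a missing boundary is still the reading step.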

Then sit with the map and read it. The mismatches surface themselves: the open-ended collaborations, the teams owning more domains than they can hold, the platform teams nobody would choose, the stream-aligned teams that are really sliced by component. Treat this as its own step rather than going straight from drafting to redrawing.

Cut one boundary at a time. Find the most painful fracture plane, use Event Storming or Context Mapping to locate the domain seam, and propose a single move rather than a whole-org redraw. Run the inverse Conway manoeuvre on that one boundary: shape the team so that its mirror is the architecture you want for that slice, then watch whether that architecture actually emerges.

Make most teams stream-aligned, and make every other team earn its name. A platform, complicated-subsystem, or enabling team justifies itself by the load it lifts off a stream-aligned team. If you cannot say whose load, it is overhead in a costume. And check that your stream-aligned teams own a user-facing stream end to end rather than a horizontal component.

Run the platform as the thinnest viable thing that lifts load. "Run it as a product" means asking whether the teams it serves would choose it if they could, and, where it must be mandatory for security, compliance, or cost, whether it is good enough not to become a hidden coordination tax. The goal is the Thinnest Viable Platform, which is sometimes just a wiki page listing approved cloud services rather than a built system.

Name the mode for each pair, default to service, and clock every collaboration. "We collaborate with X until the end of the quarter, then it becomes a service" is the shape of a healthy interaction; open-ended collaboration is the shape of a missing boundary.

Align the rest of the design, or expect reversion. Moving the boundary is a quarter of the job. Change the rewards, the decision rights, the funding, and the process to match the new shape, or the structure reverts to fit the incentives within a couple of quarters. Then review the Team APIs quarterly, because the architecture moves and the team shape has to track it.

What well-applied Team Topologies looks like

A collaboration that had run open-ended for months got converted into a clean service interface with a date on it, and the two teams stopped needing constant sync.

A boundary moved onto a fracture plane and the symptom went with it: what used to require three teams to coordinate on every change now changes inside one, and a whole customer feature lands inside that one team rather than crossing several.

A platform team can point to the load it lifted, and the stream-aligned teams would choose it if they could, rather than being routed onto it.

The architecture shifted and the team shape moved with it, rather than the structure freezing after one reorg.

Anti-patterns

Renaming, not redesigning. Relabelling the existing teams with the four type names and changing nothing about boundaries or interactions. The most common failure, and it buys a vocabulary with no substance under it.

Stream-aligned in name, component-sliced in fact. The subtler version of the rename. Team Topologies promises fast flow, but if each team owns a layer, component, or narrow technical service that most customer changes must cross, the result is faster component work and slower feature work. The stream-aligned team is supposed to own a vertical slice of change end to end, usually around a user, customer, product, journey, or business capability, so that most routine changes land inside one team. Owning a single service is not automatically wrong, but if your stream-aligned teams are scoped to services rather than slices, you have optimised the wrong flow.

Platform as a dumping ground. Calling a team "platform" when it is really where unglamorous work goes to die, with no product discipline and no test of whether anyone would choose it. If stream-aligned teams experience it as a mandatory ticket queue or a compliance gate rather than something they would adopt willingly, it is a shared-services function wearing platform language.

Permanent collaboration. Two teams locked in open-ended high-bandwidth collaboration, which is almost always a missing or wrong boundary wearing the costume of teamwork.

Boundaries without the rest of the design. Moving the team boxes while leaving rewards, processes, and decision rights untouched, so the structure reverts.

Inverse Conway as a one-off. Running the manoeuvre once and freezing, when the architecture you want moves over time and the team shape has to track it.

When Team Topologies is the wrong tool

Small organisations. Below roughly four teams the kit is ceremony; you have a handful of people and the real question is who works on what this quarter, not which of four types each team is. Below this abstraction floor you are designing for people, not teams, though the ideas can still serve as hats and boundary checks rather than as an org chart.

When the binding constraint is what to build, not how work flows. Team Topologies optimises the flow of change and assumes coordination, ownership, or cognitive load is the binding constraint. If the real problem is strategy, customer discovery, or clarity about what should exist at all, redrawing boundaries optimises the wrong variable and can lock in a local optimum that becomes a global bottleneck. The tool answers how work should flow, not what should be built or why it matters.

When the binding constraint is not structural at all. If the teams are reasonably shaped and the blocker is trust, a wrong incentive, or an unclear strategy, redrawing boundaries is an expensive way to dodge the actual work.

When the architecture is too rigid to move. The inverse Conway manoeuvre works on new or flexible systems and produces friction on existing ones where the design has ossified. If the codebase is a tangled monolith with deep coupling, reorganising teams adds friction between developers and code without producing the target architecture; a reorganisation will not fix a broken design, and the move there is design work on the code, not team-shape work on the org.

When you cannot move the boundaries. The tool assumes the authority and the slack to reshape teams. Where reporting lines are politically fixed it diagnoses a problem you cannot act on, which is dispiriting rather than useful, and the honest move is to name the constraint, work inside it and choose your battles.

Coda: the design axis, beside and behind

Team Topologies is a tool for organisational design, the generative work of shaping the container rather than acting on the problem inside it. At the same practitioner altitude sit a handful of Will Larson's observations about team shape, and behind the whole axis sit two deeper bodies of knowledge.

Larson's distinction between weak and strong team concepts asks whether an organisation assigns work to teams or drives it through individuals and their relationships: a strong concept shows up as team-level sprints, tickets, and goals, a weak one as work that moves along personal lines. Small organisations run weak whether they choose to or not, and most drift to strong only past a couple of dozen engineers. Beneath sits another floor: a team of fewer than four engineers behaves like a set of individuals, where you track each person's on-call and leave and a single departure tips it from building to merely maintaining. Draw four crisp boxes over twelve people and you have a diagram, not a structure.

Two further patterns need managing. Snowflakes are the accumulation of reasonable-sounding "this team needs special rules" exceptions, each fine in isolation but together producing an organisation that cannot move, because every change must be negotiated against the pile; this is why part of the senior job is refusing reasonable exceptions to preserve the capacity to act. The second is splitting innovation from maintenance, the urge to spin up a fresh team for the new work while existing teams keep the lights on, which buys you a two-tier system of innovators and maintainers whose morale cost usually outweighs the focus it bought. The harder, better move is to innovate inside the existing teams.

Galbraith's Star Model supplies the overarching frame. Structure is one of five points that have to align, alongside strategy, processes, rewards, and people. Whenever you reach for Team Topologies, the Star Model is the reminder that the boundaries are only one point of five: if the reward system still pays people for the behaviour the new structure is meant to end, the structure loses and behaviour reverts. This rhymes with the collection's running line that behaviour is produced by context, not character.

Mintzberg's configurations hold that organisations cohere into a few types, each a matched bundle of a coordinating mechanism (mutual adjustment, direct supervision, or the standardisation of work, outputs, or skills) and a dominant part, and that most structural dysfunction is a mismatch: controls built for one mechanism applied to work that needs another. When coordination is failing, ask which mechanism the work actually needs before adding more of the one you already have.

The Improvement Kata

Mike Rother's Toyota Kata, from 2010, describes a behavioural routine behind lean continuous improvement: the repeated practice that Rother argues organisations miss when they copy the visible tools, the kanban boards and value-stream maps, without the managerial habit that produced them. The Improvement Kata is a routine for moving a team from where it is toward a hard goal it cannot reach in one leap, by taking experimental steps through the unclear territory in between. The word kata means a movement pattern practised until it becomes second nature, not a methodology you consult. Rother's image for it is a compass rather than a map: there is no map for the place you are trying to reach, so you take a bearing and navigate the next stretch of unknown ground.

Epistemic status. The Kata is Rother's model of how continuous improvement happens at Toyota, drawn from observation and generalised into a teachable practice. The underlying experimental loop is the scientific method applied to operations. Whether the specific four-step and five-question forms are the best way to instil it is a design choice rather than a proven optimum, and the practice depends on the quality of coaching, so results vary with how well it is taught.

The core ideas

Four steps. The first three are planning: they locate where you are going and where you stand. The fourth is the daily execution. The challenge and the target condition hold steady while you work toward them; the current condition and the obstacles list are updated as you go.

What separates the Kata from ordinary project planning is the threshold of knowledge: the line between what you know from facts and data and what is still guesswork. The current condition sits on the near side of it; the target condition sits beyond it, which is why you cannot plan a route there and have to experiment instead. Every experiment moves the threshold forward.

First, decide the challenge. This is the far goal, six months to three years out and hard. "Cut the time from code-merged to deployed-in-production from four hours to under thirty minutes." It is set above the team, at the level of the value stream or the organisation, far enough out that the path to it is unknown, which is the condition the Kata is built for. A challenge you can already see how to reach does not need one.

Second, grasp the current condition. Before planning anything, study how the work operates, in observable terms. Not "deploys feel slow" but "over the last twenty deploys, median time from merge to production was four hours and ten minutes, and the single biggest block of time, about ninety minutes, is the manual QA sign-off step". The starter routine is a block diagram of the process with cycle times, run charts of the key metrics, and a worksheet that turns the two into a description of the current pattern. For a simple process this is an afternoon; for a tangled one it is longer.

Third, set the next target condition. The target condition is not the challenge and not a task. It describes how you want the process to be operating at a specific near date, one week to three months out, and it has three parts: an achieve-by date, a measurable outcome, and the pattern of work that produces it. The date is fixed: when it arrives and you have not reached the condition, you do not slip the date, you reflect on what you learned and set a new target from where you now stand. What the target must not contain is the countermeasure. "Low-risk changes reach production with no human reading a ticket on the critical path, median under ninety minutes, by the fifteenth" is a state; "automate the QA sign-off" is a solution, and putting it in the target leaves the experiments nothing to discover. The target condition lies beyond your threshold of knowledge: you should not yet be able to see how to reach it, and if you can, it is too close.

Fourth, experiment toward the target condition. As soon as the target is set, some obstacles become visible; more appear only as you move, so you keep them in an obstacles parking lot, a list that grows as the work teaches you what is in the way, and you work on one at a time. An obstacle is a condition in the work, not a solution in disguise: "risk cannot be judged without a person reading the diff" is an obstacle, while "build a risk classifier" has already chosen the answer.

Against the one obstacle in front of you, you run the cycle. Name what you are working toward, state where you are now relative to it, then design your next step as a prediction: write down, before acting, what you will do, what you expect to happen, and when you will look. Then run it and go and see the actual result. The learning comes from the gap between prediction and result. If the result matches, your model of the work held, and you extend it; if it misses, the miss is information, because your model was wrong and now you know where to investigate. Either way the next step is shaped by what the last one revealed, and the threshold has moved. You are not executing a plan; you are moving a knowledge threshold toward a target through a chain of small predicted-and-checked steps. Design those steps so a wrong prediction stays cheap, because the misses are where the learning is and you want them to land on you rather than on a customer.
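The predict-then-check discipline can be made concrete as a small experiment record. This is a minimal sketch, not Rother's own format: the class names, the metric names, and the tolerance that decides whether a result counts as "as predicted" are all hypothetical, and the numbers are borrowed from the storyboard illustration's first row. It enforces two rules from the text: a step needs a written prediction before it runs, and the next step cannot be designed until the last one has an observed result.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    action: str
    # Written down before acting: metric -> expected value.
    expected: dict[str, float]
    # How far a result may drift and still count as "as predicted".
    # An assumption: the Kata leaves this judgement to learner and coach.
    tolerance: dict[str, float] = field(default_factory=dict)
    # Filled in only after going to see the actual result.
    observed: dict[str, float] = field(default_factory=dict)
    learned: str = ""

    def prediction_held(self) -> bool:
        if not self.observed:
            raise ValueError("go and see the result before judging the prediction")
        return all(
            abs(self.observed[metric] - value) <= self.tolerance.get(metric, 0.0)
            for metric, value in self.expected.items()
        )


@dataclass
class ExperimentRecord:
    obstacle: str  # the one obstacle currently being worked on
    steps: list[Step] = field(default_factory=list)

    def next_step(self, step: Step) -> None:
        # One predicted-and-checked step at a time: the last row needs a result first.
        if self.steps and not self.steps[-1].observed:
            raise RuntimeError("check the last step's result before designing the next")
        if not step.expected:
            raise ValueError("write the prediction before running the step")
        self.steps.append(step)


record = ExperimentRecord("risk can't be judged without a person reading the diff")

# Hypothetical reading of the storyboard row: share of changes classed low-risk.
shadow = Step(
    action="shadow rule: FE-only changes classed as low-risk",
    expected={"share_low_risk": 0.33, "regressions": 0},
    tolerance={"share_low_risk": 0.10},
)
record.next_step(shadow)

# Go and see.
shadow.observed = {"share_low_risk": 0.28, "regressions": 1}
shadow.learned = "too broad"
print(shadow.prediction_held())  # False: the regression is the miss

# The miss shapes the next step.
record.next_step(
    Step(
        action="exclude checkout flow",
        expected={"share_low_risk": 0.25, "regressions": 0},
        tolerance={"share_low_risk": 0.10},
    )
)
```

The share landed within tolerance, but the single regression broke the prediction, and that miss is what produced the narrower second step.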

How to use it

Start with one team and one challenge, since the Kata is learned by doing. It spreads by coaching, one real relationship and one real challenge at a time; rolling it out as an organisation-wide programme is a failure Rother warns against.

Grasp the current condition before setting a target. If you cannot state it in facts you are not ready to move, but do not turn the study into a research project either.

The Kata is run at a storyboard, not in someone's head or in a backlog. Its panels are the four steps made visible, with the obstacles parking lot and the experiment record below them.

[Figure: the Kata storyboard. 1 Challenge: merge to production under 30 min (far goal, 6 months to 3 years; the path is not yet known). 2 Current condition: median 4h 10m, QA sign-off ~90m on the critical path for all changes (facts, run chart). 3 Target condition: low-risk changes ship with no ticket-read, median under 90m, by the 15th (date, outcome, pattern). The threshold of knowledge lies between them. 4a Obstacles parking lot, worked one at a time, growing as you learn: risk can't be judged without a person reading the diff; staging takes 20m to spin up; flaky integration suite. 4b Experiment record, one row per predicted-and-checked step (Step, Expect, Result, Learned): "shadow rule: FE-only", expected ~1/3 and 0 issues, got 28% and 1 regression, learned "too broad"; "exclude checkout flow", expected ~25% and 0 regressions, result pending. Caption: one obstacle at a time; each row is a prediction checked against what happened, and the miss feeds the next.]

Work one obstacle at a time, and run the inner loop in small fast cycles of one change each, so you can see which change moved the result.

What well-applied Kata looks like

A wrong prediction visibly changed the next experiment: the team can point to a cycle where the result missed and show how the miss reshaped what they tried next.

The target condition got reached through a chain of cycles on the board, each narrowing the gap, rather than a plan executed in one move. Where a date arrived unreached, the team set a fresh target rather than quietly moving the date.

The obstacles parking lot changed over time, picking up obstacles that were invisible at the start, the sign the team was learning from the work rather than executing a list drawn up on day one.

A learner who could not run the loop at the start now runs it without the coach, which is the coaching having worked.

Anti-patterns

Skipping the current condition, jumping from challenge straight to action.

Target conditions that are really goals or tasks, "improve reliability" or "automate QA," rather than a specific near-term process state with a date, an outcome, and a pattern.

Naming obstacles as missing solutions, which pre-commits you to the countermeasure before the experimenting starts.

Breaking the predict-and-check loop, either acting without a written prediction or declaring a change done without going to see the result against it, so the gap that produces the learning never forms.

Batching changes, running several at once so you cannot tell which one moved the outcome.

The coach solving the problem, answering the questions for the learner instead of asking them.

Rolling the Kata out as an organisation-wide programme rather than letting it spread by being coached on one team first.

When the Kata is the wrong tool

When the direction is wrong. The Kata navigates toward a target but says nothing about whether the target is correct, so it will efficiently take you in the wrong direction if the challenge is wrong. Strategic work, including Rumelt, happens before the Kata.

When there is no slack for experiments. The Kata requires capacity to try small things and observe, and a team in pure firefighting mode cannot run it until it has created breathing room first.

When experiments cannot be made short, safe, and observable. The Kata is built for uncertain territory, so novelty alone is not the limit; the limit is the cycle. If the only meaningful test takes months to return, or carries a blast radius you cannot accept, the scientific thinking still applies but the tight daily cadence does not, and you need a looser research design.

Coda: the Coaching Kata

Rother pairs the Improvement Kata with a coaching routine. The learner runs the improvement; the coach develops the learner's ability to run it. While the learner is learning, this means a short coaching cycle, around fifteen minutes, daily, at the storyboard. The daily cadence is doing the work: a manager's untrained default is to give answers and jump to advice, and frequent practice is what overrides it. Mature pairs can loosen the cadence once the pattern holds without it.

The five questions, asked in the same order every time:

  1. What is the target condition?
  2. What is the actual condition now?
  3. What obstacles are in your way, and which one are you addressing now?
  4. What is your next step or experiment, and what do you expect?
  5. How quickly can we go and see what we have learned from taking that step?

Between the second question and the third sits the reflection on the last step: what was your last step, what did you expect, what actually happened, and what did you learn?

The reflection is what makes the gap between prediction and result accumulate as learning on the board; without it the storyboard becomes forward-looking planning and stops carrying anything from one cycle to the next. The coach does not supply answers, and the fixed sequence is what stops the drift toward advice and diagnosis. It is the same ask-don't-tell discipline as GROW, with the question set fixed and, for learners, the cadence daily.

Polarity Management

Barry Johnson's framework, developed through the 1980s and consolidated in his 1992 book, turns on a distinction that is easy to state and easy to half-learn. Some situations are problems, with a right answer that ends the issue; some are polarities, pairs of values both necessary and interdependent, where committing permanently to one is what makes things worse.

The distinction is the famous part. The discipline is the part people get wrong even once they hold it: a polarity is not managed by splitting the difference and settling in the middle, which reaches neither pole's upside, but by oscillating deliberately, riding one pole's upside until its downside starts to accumulate and then moving to the other before that downside hardens into a crisis that forces a clumsy overcorrection.

Managing that oscillation over months and years is a running-the-organisation discipline, which is what places the tool in the operating mode: holding a deliberate balance against the constant pull to pick a side and be done. It is the slow, sustaining work that the faster operating tools assume someone is already doing.

Epistemic status. The central distinction, between problems with solutions and polarities that require ongoing management, is conceptually clean and widely found illuminating, and it is the durable contribution. The fuller apparatus, the four-quadrant map and the infinity loop, is a useful structuring device rather than a measured model, and the framework rests on accumulated consulting experience rather than controlled evidence. The main risk is over-application, since labelling a genuine problem a polarity becomes an excuse to avoid deciding.

The core ideas

Problems versus polarities. A problem has a solution and stays solved. A polarity is a pair of values or approaches that are both necessary, both valuable, and interdependent, where the work is not to pick one but to manage the ongoing oscillation between them. Examples include stability and change, individual and team, centralised and decentralised, short-term results and long-term capability. The diagnostic question: if I picked this option and stayed there forever, would problems eventually appear? If yes, it is a polarity.

The polarity map is a four-quadrant tool. Put the two poles side by side along the horizontal axis, with upsides above and downsides below. Each quadrant gets filled in, so for centralised versus decentralised decision-making you capture the upside and downside of each. The map makes visible what advocates of each pole already know about their own upside and the other's downside, and what they typically do not acknowledge about their own downside and the other's upside.

The infinity loop. Organisations move from one pole's upside, accumulate its downside, swing to the other pole's upside, accumulate its downside, and swing back. This is normal and unavoidable, and the work is not to stop the oscillation but to recognise it and move deliberately, catching the downside before it becomes a crisis that forces an overcorrection.

[Figure: the polarity map. Pole A and Pole B side by side; the upside of each sits above, under the greater purpose (+), and the downside of each below, over the deeper fear (−). Caption: both poles are legitimate; catch each downside early, before it forces a lurch to the other side.]

Early warning signs. For each downside, identify the signals that show it is accumulating, which lets you shift toward the other pole before the downside forces the shift.

Polarity Management's subject is human tension even though its discipline is operating, which makes it the collection's cleanest example of the introduction's point that the modes are sorted on different axes: here the subject pulls one way and the context of use the other.

How to use it

Diagnose first, asking whether this is actually a polarity. Genuine polarities have both sides advocating legitimate values, a history of oscillation, and problems that appear whenever one side is fully committed to. If the answer is "we just need to decide and stick with it forever," it is probably a problem.

Map both poles fully with both camps in the room. The value is in the mixed group, because each camp fills in its own pole's upside and the other's downside easily and struggles with the remaining quadrants, and doing the map together forces each side to acknowledge what the other sees.

Identify observable early warning signs for each downside.

Design action steps for each upside, concrete actions rather than "be more decentralised."

Set a cadence to revisit the map, since polarity maps die when built once and filed.

What well-applied Polarity Management looks like

The oscillation got named in real time and corrected before it became a crisis: someone could say "we have swung too far toward centralisation" and the group adjusted, rather than discovering it at the next breakdown.

An advocate of one pole defended the other pole's upside accurately, which is the sign the map stopped being a weapon and both sides found it fair.

Anti-patterns

Polarity Management as compromise, splitting the difference permanently and ending up in the middle where neither upside is accessed. This is the most common misunderstanding, since the work is managing the oscillation to access both upsides over time, not living in the middle.

Treating problems as polarities, making everything contestable when some things have right answers.

Treating polarities as problems, the more common error in diagnostic-default cultures, trying to solve stability versus change by picking one.

Filling the map with strawmen, where advocates of the favoured pole exaggerate the other's downside; the map only works if both sides find the content fair.

No early warnings and no cadence, building the map once and being surprised when the oscillation hits a crisis.

When Polarity Management is the wrong tool

When the situation genuinely is a problem with a right answer, where polarity thinking produces unnecessary equivocation. The diagnostic question is the test.

When power is too asymmetric, so one side can permanently impose its pole regardless of consequences, in which case the intervention is at the power layer.

When the timescale is too short. Polarity Management works over months and years, so for an urgent decision it is unhelpful, and you should decide now and revisit when there is space for the longer frame.

Lightweight tools

As with the diagnostic chapter, a family of small structuring devices sits alongside the substantial frameworks. They carry no theory, and their epistemic status is simply that they are useful formats. The operating versions are mostly about coordinating people and work rather than diagnosing problems.

A3, named for the paper size, fits one problem to one sheet: problem statement, current condition, goal, and root-cause analysis on the left, countermeasures, plan, and follow-up on the right. The page is the whole trick. It forces the analysis to fit and to come before the solution, and it puts one name on the problem. Think of it as a forcing function rather than a framework, with PDCA underneath it, the same loop as the Kata. Its best use is as the artefact a diagnostic workshop distils into: the Current Reality Tree or Cloud does the thinking, and the A3 carries the decision and the check to the people who have to act. It is meant to be discussed, not filed; an A3 nobody walks through is a status report. It adds ceremony when the problem is too small to need a page or too tangled to fit one.

[Figure: the A3, headed with Problem and Owner. Left half, current state and root cause: 1 problem statement, 2 background, 3 current condition, 4 goal / target, 5 root-cause analysis. Right half, countermeasures and action: 6 countermeasures, 7 implementation plan, 8 follow-up / check. Caption: one page, one problem, one owner; the layout forces the analysis to come before the solution.]

RACI, covered in the diagnostic chapter's lightweight section, is at least as much an operating tool. Naming a single Accountable person per decision is often the fastest fix for work that stalls because nobody owns the call.

A stakeholder map, listing who is affected by a change and sorting them by how much they care and how much power they have, is a quick way to anticipate where resistance and support will come from before a rollout rather than during it.

A pre-mortem is the lightest operating risk tool there is: imagine the initiative has already failed, ask why, and act on the answers. It takes minutes and reliably surfaces the risks a confident team was talking itself out of. It is also one of the few lightweight tools with real experimental support behind it. Gary Klein's work on the technique draws on research into prospective hindsight, the finding that imagining an outcome as already having happened, rather than as a possibility, measurably improves people's ability to generate reasons for it. The format works because of how it reframes the question, not just because it is a tidy prompt.

A simple decision log, recording what was decided, by whom, when, and why, is unglamorous and disproportionately valuable, since most organisations relitigate settled decisions because nobody wrote down the reasoning. It is the operating counterpart to a strategy that exists on paper.
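A decision log needs almost no structure, which is part of its appeal. The sketch below assumes the four fields named above and nothing more; the class names, the `search` helper, and the example entry are hypothetical. Entries are frozen once written, so the original reasoning survives, and a plain text search is usually enough to check whether something was already decided before reopening it.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Decision:
    what: str
    who: str
    when: date
    why: str


class DecisionLog:
    """Append-only: entries are never edited, so the original reasoning survives."""

    def __init__(self) -> None:
        self._entries: list[Decision] = []

    def record(self, decision: Decision) -> None:
        self._entries.append(decision)

    def search(self, term: str) -> list[Decision]:
        """Earlier decisions mentioning a term, to check before relitigating one."""
        needle = term.lower()
        return [
            d for d in self._entries
            if needle in d.what.lower() or needle in d.why.lower()
        ]


# Hypothetical entry, for illustration only.
log = DecisionLog()
log.record(Decision(
    what="Keep a single shared staging environment",
    who="platform lead",
    when=date(2025, 1, 14),
    why="Per-team staging doubled cloud cost for little isolation benefit",
))

for d in log.search("staging"):
    print(d.when, d.who, "-", d.what, "because", d.why)
```

A shared document or spreadsheet with the same four columns does the same job; the structure matters more than the tooling.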

A decision framework names who drives a decision and who approves it, so that decisions get made rather than circling. RACI is the general version; DACI is a sharper variant for one-off decisions, naming the Driver who runs the process, the single Approver who decides, the Contributors who inform it, and the Informed who hear the outcome. The lighter touch, when even DACI is too much, is simply to declare for a given decision whether it is one person's call after input, a consensus, or a vote, and to say so out loud before the discussion rather than discovering it during the argument.
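The one hard rule in DACI, a single Approver, is mechanical enough to check. This is a minimal sketch assuming roles are recorded as plain names; the class, the `problems` method, and the example roles are hypothetical. It catches the structural failures, a missing Driver or more or fewer than one Approver, and nothing else.

```python
from dataclasses import dataclass, field


@dataclass
class DACI:
    decision: str
    driver: str            # runs the process
    approvers: list[str]   # should be exactly one person
    contributors: list[str] = field(default_factory=list)
    informed: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Structural checks only; they cannot tell whether the Approver is the right person."""
        issues = []
        if not self.driver:
            issues.append("no Driver to run the process")
        if len(self.approvers) != 1:
            issues.append(f"expected exactly one Approver, found {len(self.approvers)}")
        return issues


# Hypothetical decisions and roles, for illustration only.
ok = DACI(
    "Adopt the new deploy pipeline",
    driver="tech lead",
    approvers=["head of engineering"],
    contributors=["platform team", "QA"],
    informed=["support"],
)
stalled = DACI("Choose the on-call tool", driver="", approvers=["tech lead", "SRE lead"])

print(ok.problems())       # []
print(stalled.problems())  # ['no Driver to run the process', 'expected exactly one Approver, found 2']
```

A clean result only means the shape is right; a check like this cannot tell you whether the named Approver is the right person.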

A working-agreement or team-charter sketch, written once when a team forms or reforms, records how the team has agreed to operate: what "done" means, how decisions get made, when meetings happen, what response times to expect. It is lightweight and prevents a surprising amount of recurring friction by making the implicit explicit, and it connects directly to Larson's weak-versus-strong team concept, since writing the charter is part of how a team moves toward a stronger concept.

The same discipline applies as before: the format is a prompt, not the work. A filled-in RACI with the wrong person Accountable is worse than none, because it looks settled.

Developmental Tools

These work with humans as humans, with feelings, histories, identities, capabilities, competing internal commitments, and the shared assumptions a group forms over time. You reach for them when the diagnostic mode has done its work and the blocker turns out to be that two people do not trust each other, or that someone cannot yet do the thing they are being asked to do, or that the conflict is about identity rather than incentive, or that a whole culture has learned to behave in a way no one in it would choose. The chapter covers five: Difficult Conversations, Immunity to Change, coaching frameworks built around GROW, the Trusted Advisor, and Schein's model of organisational culture, plus a section on lightweight relational tools.

This is the mode with the contested lineage the introduction described: psychology, therapy, conflict resolution, adult development, and organisational culture. Contested is not the same as least grounded. The tools also run across a wide range of scale, from a single conversation up to the culture of an entire organisation.

The shape of problem they are ill-suited to: structural problems that have nothing to do with people, individually or collectively. Situations where there is a wrong incentive and no amount of conversation, and no amount of culture work, will fix it. Problems where the right answer is to change the system rather than to work with the people in it.

Difficult Conversations

The 1999 book by Douglas Stone, Bruce Patton, and Sheila Heen, from the Harvard Negotiation Project, is the most useful tool here for the conversation where you know you must say something hard and do not know how. The move people already know is that a difficult conversation is really three conversations layered together. The move that decides whether it goes well is noticing the third one. The "what happened" layer is where everyone fights, the feelings layer is where the energy actually sits, but the identity layer, what the situation says about whether each person is competent or good or worthy, is the deepest and the one almost always invisible to the people in it. A conversation that keeps escalating past what the surface warrants has hit an identity layer nobody has named, and naming it is usually what lets the other two move.

Epistemic status. This is a structured distillation of negotiation and communication practice rather than an experimentally validated model, but it rests on decades of work at the Harvard Negotiation Project and is consistent with what is known about how people handle conflict and threat. The three-conversations decomposition has strong face validity and is widely reported as useful by practitioners across many fields. Treat it as a reliable framework for structuring your own preparation and listening, not as a predictive theory of what the other person will do.

The core ideas

Every difficult conversation contains three. The "what happened" conversation is about facts, fault, and intent. It is where most people focus and where the least useful work happens, because people who hold different information and interpretations will keep disagreeing about the facts. The feelings conversation is about the emotions present on both sides, acknowledged or not. The claim that holds up is that feelings are always present, and that refusing to acknowledge them does not remove them but lets them drive the conversation from underneath. The identity conversation is about what the situation means for who each person is, whether they are competent or good or worthy. It is the deepest layer and the one most often invisible to the participants.

Impact is not intent. People argue about intent, "I did not mean to dismiss you," when the issue is impact, "the effect was that I felt dismissed." Conflating them produces defensive spirals that are about neither.

Contribution, not blame. Move from whose fault this is to what each party contributed to the situation. Contribution need not be symmetric, since sometimes one party contributed most of it, but the framing opens space for each side to see its own role without being on trial.

Stories, not facts. Each party brings a story built from selectively attended facts plus interpretation plus prior pattern-matching. The work is not to decide whose story is right but to surface both and find what each contains that the other lacks.

How to use it

Before the conversation, write out your three conversations: your version of what happened; the feelings you are bringing, named specifically; and the identity threat for you, meaning what it would say about you if you turn out to be wrong or raise this badly.

Anticipate the other person's three conversations too. You will be wrong about specifics, but the exercise prevents being blindsided by a layer you had not considered.

Open by acknowledging that the situation is hard rather than by stating your position. The recommended opening is a third story, the version a neutral observer would tell about the disagreement itself. "I think we see this really differently and I want to talk about it" is a third-story opening. "You did X and that was not okay" puts the other party straight into defence.

Separate impact and intent explicitly when they tangle. Saying "I am not claiming you meant to do this, I am telling you the impact, and those are different" resolves a large share of conversations stalled in arguments about intent.

Listen for what is not being said. Tone changes, topic shifts, over-explanation, sudden withdrawal. When someone reacts far more strongly than the surface warrants, you have hit an unnamed identity layer.

What a well-conducted difficult conversation looks like

Both parties leave understanding the other's story better than they did, even if they still disagree.

The identity layer surfaced for at least one party, even briefly, and feelings were named rather than driving from underneath.

Concrete next steps exist, but the conversation did not rush to them at the expense of understanding, and neither party feels they won.

Anti-patterns

Difficult conversations as performance management, using the framework as a script for delivering a predetermined decision while pretending to be open. People can tell, and it makes the harm worse.

Skipping the feelings conversation because it feels unprofessional, the analytical-thinker default. Suppressed feelings drive everything from underneath. Naming "I am feeling defensive right now" is more professional than letting defensiveness shape what you say unacknowledged.

Identity work as therapy, going deeper into identity than a workplace conversation should. The aim is to surface identity threat enough that it stops driving the conversation invisibly, not to resolve it.

Third-story openings that are not actually third-story, framed so it is clear which side is correct, which the other party sees through immediately.

Stopping at understanding, reaching mutual understanding and then dissolving without producing change. Understanding is necessary but not sufficient.

When Difficult Conversations is the wrong tool

When the issue is structural rather than interpersonal. If two people are in conflict because of genuinely incompatible objectives imposed by the structure, the conversation will not fix it, and the intervention is structural.

When power asymmetry makes openness unsafe. The framework assumes both parties can speak openly, and where there is significant imbalance, a history of retaliation, or ongoing harm, the right move may be to escalate or leave rather than to have a difficult conversation.

When the other party is acting in bad faith. The framework assumes good faith on both sides, and with someone manipulating or harming deliberately it hands them additional tools.

Immunity to Change

Robert Kegan and Lisa Lahey's 2009 book reaches further down than the other developmental tools, and is the one most structurally adjacent to the Thinking Processes. The core insight: when someone genuinely wants to change a behaviour and consistently fails, the failure is not weakness or insufficient motivation. The behaviour is protecting something else they also care about but have not named, and the system of competing commitments is functioning exactly as designed.

Epistemic status. The immunity-mapping method is a well-developed practical technique with a substantial body of case experience behind it, and the core mechanism, that a hidden competing commitment can hold a behaviour in place, is a genuinely useful and often revelatory frame. The broader constructive-developmental theory it sits within, Kegan's stages of adult development, is influential but contested in academic psychology: the evidence for the specific stage structure is mixed, most of it was generated by the theory's own developers, studies tend to be short relative to the timescale of development, and different assessment instruments give inconsistent results on the same people. Use the four-column method confidently as a practical tool and hold the developmental-stage theory more lightly.

The core ideas

The immunity map is a four-column exercise. The first column is the commitment, what you want to change, stated as a positive goal rather than a complaint, such as wanting to delegate more effectively. The second is what you are actually doing or not doing that works against it, in concrete behaviours, such as rewriting the team's work or taking back tasks you delegated. The third is the hidden competing commitment, found by imagining doing the opposite of the second column and noticing what feels threatening, then stating the positive form of that worry, such as being committed to never being seen as a manager whose team produces low-quality work. The fourth is the big assumption, what would have to be true for the competing commitment to be valid, such as believing that any quality problem in your team's work means you are a failing manager.

The four columns are a thinking process, not a worksheet, and the fourth column is the work, since the big assumptions are usually unexamined and often wrong.

The immune-system metaphor. The system of competing commitments is like a biological immune response, protecting the person from a perceived threat and working exactly as designed. The competing commitment is a feature, not a flaw, though one that no longer serves its purpose well. The work is not to overpower it but to understand what it protects and update the assumptions that keep it active.

Big assumptions are testable. They are hypotheses, not truths, and they can be tested through small designed experiments. They are usually exaggerations of real but bounded concerns, and testing them safely is what lets them update.

How to use it

Pick a real commitment that has been stuck for a while, something specific to your work where you have genuinely tried and failed. Without a real history there is no contrast between stated commitment and actual behaviour.

Be specific in the second column. "I avoid difficult conversations" is too abstract. "When this person sends me a message that makes me defensive, I draft a reply, decide it is too harsh, and send nothing for three days" is the right grain.

Work the third column by asking what you would be worried about if you did the opposite of the second column. Keep asking until you find something with emotional weight. The surface answer, "low-quality output," usually hides a heavier one, "I will be seen as the manager who did not catch the problem."

Find the big assumption, what would have to be true for the competing commitment to drive behaviour the way it does, stated specifically enough to be tested.

Design small safe-to-fail tests rather than big leaps. The point is not to declare the assumption wrong and act, but to design an experiment that genuinely tests it, safe enough to run and meaningful enough to teach you something.

What well-applied Immunity to Change looks like

The person runs an experiment they would previously have avoided, and comes back with what it actually showed rather than what they expected.

A column gets rewritten as assumptions get tested, so the map visibly changes between sessions rather than standing as a one-time diagnosis.

Behaviour shifts on the original commitment, not just insight about why it was stuck.

Anti-patterns

Treating the third column as a confession of weakness, which makes people refuse to look at it. Competing commitments usually protect real values in outdated ways.

Skipping the fourth column and trying to overcome the competing commitment by willpower, which leaves nothing to test and the immune system intact.

Big assumptions that are too abstract to test, such as "I assume I have to be perfect," rather than a specific testable claim about a specific consequence.

Treating it as a one-shot exercise, achieving insight and changing nothing, when the framework requires the experimental loop.

Doing it alone for sticky cases. The big assumptions are invisible to the person holding them precisely because they feel like reality, so a coach or thinking partner who can ask the surfacing questions helps enormously.

When Immunity to Change is the wrong tool

When the situation is genuinely external rather than internal. Sometimes you cannot delegate because the team is too junior, not because of a competing commitment, and the method will manufacture an internal explanation for an external constraint. Check the structural and capability layers first.

When the person is not yet ready to examine themselves. The method requires real willingness to look at one's own contribution, and for people in crisis or genuinely unmotivated to change it is counterproductive.

For surface-level habits, where it is overkill and ordinary habit-formation tools fit better. The method is for patterns that have resisted years of attempted change.

Coaching frameworks (GROW and its relatives)

Coaching frameworks are structured ways to help someone think through a problem and arrive at their own next step, rather than being told what to do. GROW, developed by John Whitmore and colleagues in the late 1980s and set out in Coaching for Performance, is the most widely used and the easiest to learn, which makes it the natural anchor for a family of related approaches. The reason coaching belongs in the developmental mode is that its central bet, that people commit to actions they reach themselves far more than to actions imposed on them, is a claim about human motivation, not about analysis.

Epistemic status. This is the tool in the chapter with the strongest empirical backing, which is worth stating plainly given the chapter's reputation. GROW itself is a practical structure distilled from coaching practice rather than a validated model. Workplace coaching as a whole, however, has meta-analytic support: it improves goal attainment, performance, and well-being across studies. The crucial finding for how you use it is that the coaching method does not significantly moderate the effect. GROW, cognitive-behavioural, and solutions-focused approaches produce broadly equivalent results, which suggests the effect comes mainly from the quality of the relationship and the questions rather than from the particular framework. Treat GROW as a reliable scaffold for a useful conversation. Treat the underlying skill, asking good questions and genuinely not supplying the answer, as the thing that actually matters.

The core ideas

GROW is four stages, usually walked in order, though good coaches move between them fluidly.

Goal is what the person wants from the conversation and from the situation, stated as what they would have instead rather than as a problem to avoid.

Reality is the current situation in honest detail: what has actually been tried and what is actually true. It covers the whole situation, not just the story the person walked in with.

Options are the possible courses of action, generated by the person rather than the coach. Hidden constraints get surfaced and set aside so nothing is pre-filtered, and ideally several options are laid out before any is evaluated.

Will, sometimes called Way Forward, is the specific commitment: what the person will actually do, by when, and how committed they genuinely are, on a scale they name themselves. Treat anything below an eight as a sign it probably will not happen, and dig in to find the real obstacles, possibly looping back to Options.

The discipline that makes it work is that the coach asks rather than tells. The whole structure is a container for questions whose answers the coach does not supply. The moment the coach starts steering toward their own preferred option, it stops being coaching and becomes advice with extra steps, and the person's commitment drops accordingly.

Question quality is the real engine. "What have you already tried?" surfaces more than "have you tried X?" "What would you do if you knew you could not fail?" opens the Options stage in a way that "what are your options?" often does not. The framework gives you the stages; the skill is in the questions inside them.

The relatives

Everything that matters about coaching is in GROW and in the discipline of asking rather than telling; the rest of this section is range, not foundation. The relatives below are variations to borrow from when GROW runs out of reach. Hold them lightly, for the reason the entry's epistemic note gives. Read the table for when to reach past GROW, and the notes after it only if you want the texture.

| Approach | The question it takes most seriously | Reach for it over GROW when |
|---|---|---|
| GROW (Whitmore) | "What do you want, and what will you do toward it?" | The default, when there is a specific goal to tackle |
| Co-Active (Whitworth and colleagues) | "Who are you trying to become, beyond solving this?" | The issue is the person, not a discrete problem |
| Solutions-Focused (de Shazer, Berg) | "When is the problem already not happening?" | The Reality stage keeps spiralling into the problem and stalls |
| Humble Inquiry (Schein) | "Am I asking from real curiosity, or do I already know the answer?" | You catch yourself asking questions you already know the answer to |
| Coaching Kata (Rother) | "What do you expect to happen, and when will you check?" | The work is improvement toward a defined target condition |

Co-Active (Whitworth and colleagues) starts from the stance that the person is already creative, resourceful, and whole, and attends to the person rather than the task. Its useful additions over GROW are three questions:

- whether the goal is even theirs (Fulfilment);
- whether the single perspective they are treating as reality is the only one available (Balance);
- whether the work is to stay with a hard experience long enough to learn from it rather than rush past it (Process).

It also names the inner critic as the Saboteur, a part to be examined rather than believed, whose claims are in effect the testable big assumptions of Immunity to Change.

Solutions-Focused (de Shazer, Berg) refuses to study the problem at all, on the bet that you do not need to understand a problem to build a solution. That puts it in direct tension with GROW's insistence on an honest Reality stage. It hunts for what is already working through three questions:

- the exception question: when is the problem already not happening?
- the miracle question: if the problem were solved overnight while you slept, so you did not know it had happened, what is the first small thing you would notice in the morning?
- the scaling question: on a scale of zero to ten, what makes it a four and not a two?

Its risk is sharp. Aimed at a real structural problem, such as a broken incentive or an untenable role, its relentless "when is it already better" reads as gaslighting, and it never reaches the mechanism, which is Immunity to Change's territory. The two sit at the chapter's poles, and knowing which a person needs is a large part of the developmental skill.

Two of the five are not quite peers. Schein's Humble Inquiry is less a framework than a stance underneath all of them, the discipline of asking from genuine curiosity rather than asking questions whose answer you already have in mind. The Coaching Kata's question is the predict-and-check loop from the introduction's shared spine; it lives in the operating chapter as the engine of the Improvement Kata, and from here it is simply coaching with the question set clamped to one domain.

All this divergence then runs straight into the entry's own epistemic note: method does not moderate the effect. The schools spent decades differentiating, and the evidence says the differentiation matters far less than relationship and question quality. So the practical use of the relatives is to widen your range: more depth when GROW stays too shallow, exceptions when Reality spirals, the Saboteur when the block is an inner critic.

How to use it

Notice first whether the situation is even a coaching situation. Coaching fits when the person has the capability to find a good answer themselves and the value is in helping them reach and own it. It does not fit when they genuinely lack the knowledge, in which case teaching is honest and coaching is a frustrating game of guess-what-I-am-thinking.

With a new relationship, contract before you coach. Name that you will mostly ask, why, and that you will say so when you switch to giving your own view.

Hold the Reality stage longer than feels comfortable. The most common failure is rushing from Goal to Options without an honest account of the current situation, which produces plans built on the story the person told themselves rather than on what is actually happening.

Make the person generate the Options. If you supply them, you have switched from coaching to advising, and the commitment that coaching exists to produce evaporates. Several weak options the person generated are more useful than one strong option you handed over.

End on a concrete Will, and check the genuine level of commitment rather than the polite one. "On a scale of one to ten, how likely are you to actually do this?" and then, if the answer is below eight, "what would make it a nine?" surfaces obstacles the rest of the conversation missed.

What well-applied coaching looks like

The person does most of the talking, and the coach's turns are mostly questions rather than suggestions wearing question marks.

The Reality stage produced something the person had not articulated before, rather than confirming what they already believed.

The person leaves owning a next step they set themselves and acts on it, which is the commitment coaching exists to produce rather than the compliance that follows advice.

Anti-patterns

Advice in question form, the leading question that steers toward the answer the coach already has in mind. People recognise it immediately and it produces compliance rather than commitment.

Skipping Reality, rushing to options and action before the current situation is honestly mapped, which builds the plan on a fiction.

Coaching when teaching is needed, withholding knowledge the person genuinely lacks in the name of letting them find it themselves, which is just frustrating.

Framework as ritual, walking the four letters mechanically while asking weak questions, which mistakes the scaffold for the skill.

Coaching upward without consent, deploying coaching questions on a peer or manager who has not asked to be coached, which reads as manipulative or condescending.

Reaching for depth that was not invited, running Co-Active-style values or Saboteur work on someone who came with a discrete problem and a contract for help with it, which is how the developmental tools earn their reputation for overreach.

When coaching is the wrong tool

When the person lacks the capability or knowledge to reach a good answer, where honest teaching or direction serves them better.

When the situation is an emergency, where there is no time for the person to work it through and someone needs to make a call.

When there is a genuine power asymmetry that makes the questions feel like a test rather than an offer of help.

When the real issue is structural or interpersonal rather than within the person. Coaching an individual to cope better with a broken incentive or a damaged relationship can quietly shift the burden onto them for a problem that is not theirs to solve, which is the developmental mode's version of treating a system problem as a personal one.

When there is no felt goal or no felt gap yet. A person who cannot say what they want, or who sees no problem to work because the feedback that would reveal one is missing, is in the orienting register that precedes coachable work, treated in this chapter's coda. Coaching has nothing to take hold of until that groundwork is done.

Coda: Before there is a problem to work

Every tool in this chapter assumes a problem to be worked. The difficult conversation needs something hard both parties can name; Immunity to Change needs a commitment the person has tried and failed to keep; coaching needs a goal, or at least a gap between where the person is and where they want to be. Without that, or at least a rough sense of which way is up and a willingness to look, the frameworks can be applied correctly and still nothing will move. There are two common ways this happens.

The first is the person who says, in effect, I do not know what I want. The direction is missing, not the willingness, and the tools stall because they all need a target. Asking harder what the person wants will not work, because there is no signal in that question yet. Instead, lower the cost of small moves and build the habit of noticing which ones pull. This is a personal-scale version of Cynefin's act-sense-respond for the chaotic domain. The direction is not found by introspection but generated by action and then noticed.

The second is the person for whom everything is fine. Here there is no felt gap at all, because the feedback that would reveal one is missing, and from inside their model there is simply no problem to work. The Immunity to Change entry touches this when it notes the method needs someone willing to look, but here they are not yet willing, because they cannot yet see.

Why the feedback is missing decides the move. There are three common cases.

Sometimes nobody has ever told them, and one clear, specific, kind observation (here is how this lands, which you cannot see from where you sit) creates the workable gap.

Sometimes the feedback has been there and has not landed. That is closer to Immunity to Change territory, because an "everything is fine" held against contrary signals is usually protecting something. That method still needs someone willing to look, though, so build readiness rather than mapping their immunity on day one.

And sometimes they are right from where they stand, and the misalignment is that others hold a standard nobody has stated. Then the missing piece is the organisation's feedback loop, and the intervention is structural. Coaching them to "be more self-aware" only shifts the burden onto them without closing the gap.

Both versions lack the same thing. The person who cannot find a direction and the person who cannot see a gap are missing the signal that comes only from contact with the world and honest feedback on the result. In the first it is present but untuned; in the second it never receives input. The orienting work here is the work of supplying that feedback, or the contact that produces it, before any framework has something to bite on.

The Trusted Advisor

The 2000 book by David Maister, Charles Green, and Robert Galford is the developmental mode's account of a precondition the other tools assume: that the person in front of you trusts you enough to let you work. Green later carried the work forward in Trust-Based Selling and, with Andrea Howe, in The Trusted Advisor Fieldbook. A difficult conversation, a coaching contract, and an immunity map all need standing, and this is the one well-developed treatment of how that standing is earned rather than assigned. It belongs in the developmental mode because its subject is the relationship itself, a claim about trust between people, not about analysis or execution. Its native setting is the external advisor and the client. The reading that matters here is the transfer of that machinery to the internal case: how anyone working sideways or upward, with real expertise and no automatic seat, becomes part of the room.

Epistemic status. This is practitioner lore, seasoned consulting experience turned into three clean heuristics rather than tested theory. The models have strong face validity and are widely reported as useful, and Green's firm has gathered self-assessment data on the trust equation over many years, but that is a description of how people rate themselves, not evidence that the equation predicts who gets trusted. It earns a little more than its provenance because the decomposition rhymes with the empirically grounded model of organisational trust, where trustworthiness resolves into ability, benevolence, and integrity, with credibility tracking ability and the reliability-intimacy-low-self-orientation cluster tracking the other two. Treat the equation as a reliable diagnostic vocabulary, treat the self-orientation insight as the durable contribution, and treat the staged process and the principles as useful structuring devices rather than measured findings.

The core ideas

The trust equation. Trustworthiness breaks into four parts: credibility (your words, what you know), reliability (your actions, whether you do what you say), intimacy (the safety someone feels confiding in you), and self-orientation (how much your attention is on yourself rather than on them). The first three sit in the numerator and self-orientation sits in the denominator, which is the whole point, because it governs the rest. High self-orientation, the visible interest in your own gain, your own cleverness, your own discomfort, divides down everything the numerator earns. The practical upshot is that the largest lever on being trusted is lowering the denominator, and that most people pull the wrong one.
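Here is the equation written out, with T for trustworthiness, C for credibility, R for reliability, I for intimacy, and S for self-orientation, together with a worked illustration. The 1 to 10 scale and the specific scores are assumptions made purely for the arithmetic. Green's own self-assessment may score differently.

```latex
\begin{aligned}
T &= \frac{C + R + I}{S} \\[4pt]
\text{baseline, } C = R = I = S = 5:\quad T &= \tfrac{15}{5} = 3 \\
\text{raise } C \text{ to } 8:\quad T &= \tfrac{18}{5} = 3.6 \\
\text{lower } S \text{ to } 3:\quad T &= \tfrac{15}{3} = 5 \\[4pt]
\frac{\partial T / \partial S}{\partial T / \partial C}
  &= \frac{-(C + R + I)/S^{2}}{1/S} = -T
\end{aligned}
```

With these illustrative numbers, three extra points of credibility buy less than two points off self-orientation. The last line gives the general version. At the margin, one point off S is worth T points on any numerator term, so whenever T is above 1 the denominator is the stronger lever. That is a property of the formula as written, not evidence that real trust behaves this way, which is the epistemic note's caveat.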

The denominator is the lever. People trying to earn a seat over-invest in credibility, in proving they are clever enough to be there, and under-invest in lowering self-orientation, in making the other person's problem genuinely more salient to them than their own standing. You get into the room by caring less about being in it. This is the coaching entry's discipline, that the moment the coach steers toward their own answer the commitment drops, raised from the scale of one conversation to the scale of a relationship.

Trust is built in conversations, in an order. Green's staged process (engage, listen, frame, envision, commit) is a claim that trust accumulates through a sequence whose early steps are the ones people skip. The neglected, high-leverage moves are listening past the presenting problem and framing the real issue, including the awkward one nobody has named, before any solution is floated. Rushing to the answer, what Green elsewhere calls the answer trap, is the characteristic failure of the capable.

Trust is reciprocal, and someone goes first. Trust requires accepting a risk, which cuts against most people's risk aversion. It is also mutual: there is the trustworthiness you carry and the trust the other party has to extend, and the relationship only forms when someone takes the first risk. The advisor usually has to be the one, through a candid observation, a piece of genuine disclosure, or the naming of the thing in the room. This is the actionable core of becoming part of the room: standing is granted in response to a risk taken, not accumulated through credentials and waited out.

The principles behind the moves. At the organisational scale the same idea resolves into four habits: other-focus, collaboration as a default rather than a tactic, a relationship-over-transaction horizon, and transparency. They are worth noting because they feed straight back into the equation: transparency in particular both raises credibility and lowers self-orientation, because there is no hidden agenda left to suspect.

How to use it

Score your own equation for a specific relationship. Take one stakeholder and rate, as they would, your credibility, reliability, intimacy, and self-orientation. The exercise almost always locates the gap in the denominator or in intimacy rather than in credibility, which is where the instinct says to work.

Work the denominator first. Before adding evidence of your competence, remove the signals of self-interest. The fastest way to lose a room is visible concern with your own standing in it.

Slow down at listen and frame. Resist moving to a solution until the real problem, including the part no one has said aloud, has been named and recognised. The naming is often the moment trust forms.

Take the first risk deliberately. Offer the candid read, or the honest "here is what I am unsure of," that invites the other party to reciprocate, rather than waiting until you feel trusted to speak freely.

Spend the relationship slowly. Treat standing as a stock built over many interactions, not a balance to draw down to win a single point, which is the relationship-over-transaction principle applied in the moment.

What well-applied trust-building looks like

You are sought out rather than tolerated, brought the real problem instead of the sanitised version, and asked before decisions rather than informed after them.

You can name an uncomfortable truth and have it land, because the relationship can carry weight the credentials alone could not.

Your self-orientation is low enough that you will, visibly, tell someone something against your own short-term interest, which does more for trust than anything you can claim about yourself, and is the hardest thing to fake.

Anti-patterns

Trust as technique. Running the moves instrumentally, to manufacture standing you then mean to use, is high self-orientation wearing intimacy's clothes, and people feel the calculation even when they cannot name it. The safeguard is built into the model: the one term you cannot fake into existence is the denominator, because faking it is self-orientation.

Credibility-stacking. Trying to earn the room by piling up evidence of expertise, working the numerator while the denominator quietly sinks the score.

Intimacy as oversharing. Mistaking the safety someone feels confiding in you for a licence to disclose, which forces a closeness the relationship has not reached and reads as a different kind of self-orientation.

The answer trap. Skipping listen and frame to deliver the solution, which is how capable people signal that their own competence matters more than the other person's actual problem.

When the Trusted Advisor is the wrong tool

When you already have standing. Among people who trust you, running trust-building is noise, and the chapter's actual tools, the difficult conversation and the coaching stance, are what the situation needs.

When the deficit is competence, not trust. The numerator is in the equation for a reason, and low self-orientation with nothing to offer is merely pleasant. Sometimes the honest answer is that you have not yet earned the seat on the merits.

When the context is far from its origin, or the power is too steep. The machinery is tuned to advisory and selling relationships, and the transfer to internal, lateral, and upward standing is real but needs translation. In steep hierarchies or bad-faith settings (the closing chapter's gaps), the candour that trust-building runs on is exactly what is unsafe, and no amount of it overcomes a structure that punishes the first risk.

When the problem is structural. Earning trust will not fix a broken incentive or an untenable role, and trying to relate your way through a system problem is the developmental mode's version of treating a structural fault as a personal one.

Schein's Model of Organisational Culture

Edgar Schein, a social psychologist at MIT Sloan, gave organisational culture its most-used academic model, set out first in a 1984 article and then in Organizational Culture and Leadership across five editions before his death in 2023. It has two halves worth separating from the start: a way to read a culture, and an account of how leaders, mostly without meaning to, build one.

Epistemic status. A conceptual lens rather than a validated theory, in the same family as Cynefin: widely taught, durable over decades, and convincing to anyone who has watched stated values diverge from how a place actually behaves, but not predictive and not measured. Its sharpest weakness is internal. The deepest level, the basic assumptions, is defined as unconscious and can only be inferred from the visible levels, which makes it hard to falsify and easy to read in a circle, explaining a behaviour by an assumption you read off that same behaviour. The model also treats a culture as more single and unified than most large organisations are; subcultures are the standing caveat. Trust the three levels and the embedding mechanisms as a vocabulary and a structure for looking, and hold any confidence that you have read the assumptions correctly more loosely.

The core ideas

Culture, for Schein, is the set of shared assumptions a group learned while solving its problems, assumptions that worked well enough to be passed to newcomers as the correct way to perceive, think, and feel. The definition does two useful things. It makes culture a product of a group's history rather than of its current personalities, and it explains why culture is invisible to the people inside it: once an assumption is taught as simply correct, it stops looking like a choice, the way water does not look like anything to a fish.

The spine of the model is three levels, sorted by how visible each is to an outsider.

Artifacts are everything you can see, hear, and feel: layout, tooling, dress, rituals, language, how meetings run. Easy to observe, treacherous to interpret, because the same visible thing can sit on top of opposite assumptions.

Espoused values are what the organisation says about itself: stated strategy, goals, the values on the wall. They are real and worth knowing, but they are claims, sometimes describing how the group works and sometimes only how it hopes to.

Basic underlying assumptions are what Schein calls the essence of culture: the taken-for-granted beliefs about people, time, hierarchy, truth, and what counts as good work that actually drive behaviour. They are the hardest to see and the place durable change has to reach.

The point of the three levels is the relationship between them. You cannot observe assumptions directly, so you read upward from what you can see, and the gap between the espoused values and the artifacts is the cheapest route down to the assumption underneath. A place that says it prizes innovation while reliably punishing the people whose bets fail is telling you, in that gap, what it actually believes about risk.

The second half of the model is how a culture forms, and how leaders embed one whether they intend to or not. Schein divides the levers in two. The primary mechanisms carry most of the signal: what leaders pay attention to and measure, how they behave in a crisis, how they spend money, what they reward, and whom they promote or remove. The secondary mechanisms reinforce whatever the primary ones establish: structure, procedures, rituals, physical space, the stories a place tells about itself, formal statements of values. When the two disagree (say, a stated value of teamwork against a bonus paid for individual heroics), the group believes the reward and ignores the statement. A culture is built far more by what leaders do under pressure than by what they declare.

Finally, a large organisation is never one culture. It splits into subcultures, and Schein names three that recur, roughly the people who run the work, the people who design the systems, and the executives who answer to finance and the outside. A great deal of what gets blamed on personalities or bureaucracy is really two subcultures holding different assumptions about risk or quality, and failing to align.

How to use it

Read a culture from the outside in, in order, and resist interpreting until you have looked.

First, collect artifacts as an observer rather than a judge. Sit in meetings, read the onboarding docs, notice the language and what the space is arranged around, and write down what you see before deciding what it means.

Second, gather the espoused values from the official sources: the strategy deck, the values page, what leaders say at the all-hands.

Third, work the gaps. List the places where the artifacts contradict the espoused values. Each contradiction points at an assumption, and the contradictions are the actual data, not an embarrassment to explain away.

Fourth, test a candidate assumption against a critical incident rather than by asking. People cannot report their own assumptions, so "what do we assume here" just returns the espoused values, while "walk me through the last time a release broke in production, who did what" surfaces the real assumptions about blame and ownership inside the story.

To change a culture, name the assumption you want to move, then change the primary mechanisms, what gets measured, what gets rewarded, how the next crisis is handled, who gets promoted. Restating values does almost nothing on its own, because the group is reading the mechanisms, not the poster. Changing visible artifacts can work upward over time, but slowly, and only when the mechanisms back it.

And before treating a recurring collision between two groups as a personality problem, check whether you are looking at two subcultures with different assumptions, which is a more tractable thing to work on than two difficult people.

What well-applied Schein looks like

A change effort moved measurement, rewards, or who got promoted, and the assumption it targeted visibly shifted, rather than a values poster going up while nothing changed.

A recurring collision between two groups got re-read as two subcultures with different assumptions, and working the assumptions defused it where treating it as a personality clash had not.

A real assumption surfaced from a specific incident, walking through the last time a release broke rather than reading the values page, and it explained behaviour the espoused values could not.

Anti-patterns

Mistaking artifacts for culture, redesigning the office or rewriting the values and calling the culture changed, while the assumptions underneath go untouched and quietly win.

Reading the assumptions off too fast, especially from outside, and projecting your own culture's meaning onto another's artifacts.

Treating the culture as one thing, designing a single intervention for an organisation that is really several subcultures, so it lands in one and bounces off the rest.

Using the depth as an excuse, treating assumptions as too deep and unconscious to touch. They are reachable through the visible levels and movable through the mechanisms; the depth means work carefully, not give up.

When Schein is the wrong tool

When the problem is not actually cultural. A repeated behaviour can come from a broken incentive, a missing skill, or one bad structure, and excavating assumptions when the answer is a misaligned bonus is slow and expensive.

When you need a quick decision. The model works on the timescale of months and years over which assumptions form and shift, so for anything urgent it points at a layer you cannot move in time.

When you lack the access or standing to reach the deeper levels. Surfacing assumptions takes time inside a group and enough legitimacy to ask about real incidents honestly. From outside, or early, you will mostly collect artifacts and espoused values, and should not pretend otherwise.

Coda: Schein, Westrum and the stage-model writers

Schein is the descriptive anchor of a larger body of writing that takes the group, not the individual, as its unit, and most of the rest of it differs from him in one specific way: it adds a direction. Schein tells you how to read a culture and how it forms, and stays neutral about which culture is better. Writers downstream of him say which way is better, Westrum through a mechanism, the stage-model writers through a ladder of stages.

Ron Westrum studied how organisations in high-consequence fields (e.g. aviation, medicine) caught problems before they became disasters. His definition of a good culture: one that gets the right information to the right person in the right form at the right time. On this reading, culture is not values or vibe; it is a property of how information moves through a group, and what matters most is how the organisation treats whoever carries bad news.

Shoot a messenger once and you have not disciplined one person, you have taught everyone watching to stop reporting, which strips the organisation of its ability to see the next problem coming. That is a feedback loop of exactly the kind this part of the collection describes: suppressed information produces undetected problems, which produce failures, which produce more blame, which produces more suppression. Westrum's term for what a healthy culture preserves is requisite imagination, the capacity to anticipate what might go wrong, which a generative culture cultivates and a pathological one destroys, because the people with that imagination are the ones it punishes for voicing it.

His model is usually reduced to three culture types along six dimensions of how information is handled:

| Dimension | Pathological (power) | Bureaucratic (rules) | Generative (performance) |
|---|---|---|---|
| Information | hoarded | may be ignored | actively sought |
| Messengers | shot | tolerated | trained and welcomed |
| Responsibility | shirked | narrow, "not my department" | shared |
| Cross-team cooperation | discouraged | allowed but neglected | rewarded |
| Failure | scapegoating | finding who broke which rule | inquiry |
| Novelty | crushed | creates problems | implemented |

You do not assess these by asking, since people report their espoused values rather than their assumptions. You read them off how the last serious incident was actually handled, which is the same critical-incident move Schein's method uses. The caveats are the ones every typology carries: real organisations are mixtures, subcultures sit at different points, and the measure is survey-based, so the three types are a strong diagnostic vocabulary rather than boxes an organisation sits cleanly inside.

The stage-model writers take the same group-level unit and add an axis of maturity climbing from worse to better.

Three come up often. Patrick Lencioni's Five Dysfunctions of a Team stacks trust, conflict, commitment, accountability, and results into a pyramid in which each layer fails without the one below it. It is fable-based and has essentially no independent validation, but is useful as a checklist of the order in which those capacities have to be built. Tribal Leadership, by Logan, King, and Fischer-Wright, keys five cultural stages to the language a group uses about itself, from "life sucks" through "I am great and you are not" and "we are great" to "life is great," and casts leadership as nudging a tribe up a stage; it rests on one proprietary study and sits in the Spiral Dynamics family. Frederic Laloux's Reinventing Organizations is the far edge, a colour-coded developmental ladder climbing to a self-managing end-state, built on integral theory, with exemplar companies chosen to fit the thesis and a self-management track record that is mixed once you look past the showcases.

The portable parts are the diagnostics, not the ladders. They are fine prompts for what to check in a struggling team, and how a group talks about itself tells you something about its assumptions, which loops straight back to reading artifacts. What travels less well is the developmental ladder itself. Ranking whole organisations on a maturity scale imports a stage-theory problem: the stages are intuitive, but the evidence for them is thin and largely produced by the people selling the model, and the claim is harder to defend at organisational scale than at the individual one. Read the culture with Schein, use Westrum when the question is whether information is flowing, borrow the stage writers' checklists as prompts, and treat their ladders as suggestive rather than as maps you can navigate by.

Lightweight tools

As in the other two chapters, a family of small devices sits alongside the substantial frameworks. The developmental versions are mostly about the fine grain of working with people: giving feedback that lands, naming what is happening in a conversation, and noticing what is making someone defend rather than engage. Several are the heavier tools above distilled to something useful for the moment.

The clearest of these is the set of moves underneath Nonviolent Communication, which are worth far more than the method's stilted phrasing and work without it. There are four. Separate observation from evaluation: "in the last three meetings you arrived after the start" is an observation, "you are unreliable" is an evaluation, and the first invites engagement where the second invites defence. Name the actual feeling rather than a thought dressed as one, so "I feel hurt" rather than "I feel that you do not respect me." Find the need underneath the position, since a demand for an apology is usually a strategy for meeting a need such as acknowledgement, and the need is the thing to work with. And make a request that can be refused, which is what separates it from a demand. The discipline that makes these work is to practise them on the listening side first, hearing the observation, feeling, and need in what someone else says, and to carry them in your own plain language rather than the formula, which in many engineering and European workplaces reads as alien and costs credibility.

SBI, Situation-Behaviour-Impact, is the lightest feedback structure there is: name the situation, describe the specific behaviour without interpretation, then state its impact. Its whole value is keeping feedback concrete and off the person's character, "in yesterday's review you cut Maria off twice and the room went quiet" rather than "you were dominating." It pairs directly with the observation-versus-evaluation move above.

SCARF, from David Rock, is a checklist of five things people feel threatened over when a change lands badly: Status, Certainty, Autonomy, Relatedness, and Fairness. It rests on no neuroscience worth leaning on, but as a prompt for "why is this reasonable change provoking so much resistance" it is fast and often right, and it points at which reassurance to offer. Treat it as a set of questions to ask, not a model of the brain.

The Johari window sorts what is known and unknown about a person, split between what they see and what others see, into four quadrants, and its use is to locate blind spots, the things others see that the person does not, which is exactly the territory feedback has to cross. It is a way of seeing why a piece of feedback is hard to hear, not a measurement of anything.

Two relational-adjacent tools live in the diagnostic chapter's lightweight section and are worth reaching across for when the trouble is between people: the Ladder of Inference, for climbing back down to the data each person is actually working from when a conversation has gone strange, and the force-field sketch, for laying out what is pushing for a change against what is resisting it. Both are diagnostic in form and developmental in use.

The same discipline applies as in the other chapters: the format is a prompt, not the work. A well-run SBI delivered with no willingness to be wrong about the impact is still a verdict in a friendlier font.

Comparing the Modes

One problem, three modes

Take a concrete situation. An engineering team that was effective two years ago now ships slowly, misses commitments, and has lost two senior people in six months. The remaining engineers are disengaged. Leadership wants it fixed.

Each mode sees a different problem.

The diagnostic mode asks what is actually going on. A Current Reality Tree would gather the symptoms and push downward to what causes them jointly, and might land on a measurement system that rewards individual ticket throughput, which quietly discourages the collaboration and maintenance work that keeps the team fast. A systems lens would likely name this as Tragedy of the Commons stacked with a reinforcing decline loop: attrition raises the load on those remaining, which deepens disengagement, which drives more attrition. The diagnostic verdict is structural. The incentive system is producing the behaviour, and changing what gets measured would change the behaviour over time.

The developmental mode asks what is going on between and inside the people. The two departures may have followed a specific breach of trust. The disengagement the diagnostic frame reads as a measurement artefact may be grief and resentment. An Immunity to Change lens on the manager might find that their inability to delegate, which starves the team of ownership, protects a competing commitment to never being blamed for a failure. A Schein reading sits between the structural and the personal. The team's culture has learned an assumption, that here you survive by protecting your own throughput, taught by what leadership rewarded when it was under pressure, and the disengagement is that culture protecting itself.

The operating mode asks how to run the team while it is being fixed. It does not wait for the diagnosis. A Grove lens asks what the team is actually producing and whether the manager is spending time on leverage or on inputs. A Team Topologies lens asks whether the senior departures have left the team carrying more cognitive load than it can hold, including a complicated subsystem no one now owns. The operating verdict is about sequence. You cannot analyse or repair a team that is actively bleeding, so first stabilise, then create the conditions for the other two modes to work.

Where they agree, and where they conflict

All three reject the naive reading, that the team is simply underperforming and needs pressure or replacement, and all three locate the cause in the situation rather than in the character of the individuals. This matters, because the instinct under pressure is the opposite: to find the underperformers and act on them.

The genuine conflict is about what is fundamental and therefore what comes first.

| Mode | Treats as fundamental | Would act first by | Predicts this failure if ignored |
|---|---|---|---|
| Diagnostic | The structure and incentives | Changing what gets measured and rewarded | Conversations produce warmth the system then grinds back down |
| Developmental | The trust damage and human strain | Repairing trust, surfacing what the departures meant | Structural change is received as one more thing done to them, and fails |
| Operating | Neither, until the team can sustain an intervention | Stabilising and protecting the remaining people | You buy time without fixing anything |

Change the incentive while the team is in acute distress and the developmental mode predicts rejection. Run repair conversations while the incentive still rewards the destructive behaviour and the diagnostic mode predicts the system wins. Stabilise and stop there and both other modes predict you have only delayed the collapse.

The resolution is not to pick one but to recognise that the conflict is about ordering under constraints, which is itself an operating judgement. In most versions of this situation the workable sequence is operating first, then developmental, then diagnostic: stabilise and protect, then repair the trust and surface what the departures really meant, then change the structure once people can receive the change. But the ordering depends on specifics. If the structural incentive is so toxic that it actively prevents stabilisation, you may have to change it first and accept that the change lands badly. The judgement about sequence is where the real skill lives, and no mode supplies it from outside. Hold that thought; the last section is about exactly that judgement and where it actually comes from.

Choosing a mode: a reference

The introduction gave the reasoning behind the choice. This is the compressed lookup, organised by the symptom you notice first. Each row names the mode that holds the binding constraint, the one to act on first, not the only one in play.

| Symptom you notice | Mode | Reach for |
|---|---|---|
| "This keeps happening and I do not understand why" | Diagnostic | A Cloud for a recurring two-sided conflict; a systems archetype if it survives fixes; SPC first if it shows up as a metric moving |
| "I understand it and we still cannot move" | Developmental or operating | A difficult conversation if it sits between specific people; Immunity to Change if it is one person's stuck pattern; leverage and saying no if it is the organisation's inability to focus |
| "The team is exhausted, scared, or grieving" | Operating, then developmental | Stabilise first, then address the human layer. A diagnostic workshop here reads as cruelty |
| "We are busy and producing nothing" | Operating | Grove. Write down the production function and find the real leverage |
| "We are not even asking the right question" | Diagnostic (strategy) | Rumelt if the question is whether a coherent strategy exists; Wardley if it is positional |
| "We cannot tell whether to analyse or just try things" | Pre-sort | Sort with Cynefin before anything else |
| "This decision has to happen today" | Operating | A premortem, then make the call and revisit when there is space |

Two rows are not extra modes. "Diagnostic (strategy)" is the diagnostic mode aimed at a strategy question, and the pre-sort row is the step before any mode applies rather than a mode of its own. With those set aside, the consistent meta-move sits above the table: name which mode the current blocker calls for before reaching for a tool, and watch for the bias of your own preferred mode, since the tool you reach for first is usually the one you are most comfortable with rather than the one the situation needs.

What all three modes miss

Though they cover a broad range of situations, the three modes share some blind spots:

They assume good faith. Every tool assumes the people involved want, at some level, for things to improve. The Cloud assumes a shared objective; the difficult conversation assumes honest engagement; Grove assumes the organisation is trying to produce something real. None of this holds in organisations captured by bad actors, locked in zero-sum politics, or in decline where the rational individual move is to extract and leave. Applying good-faith tools to a bad-faith situation does not just fail, it hands the bad actors a more sophisticated vocabulary.

They under-serve power. The tools mostly assume participants can speak freely and that the analysis is not constrained by what is safe to say. In steep hierarchies the shared objective is whatever the powerful person says it is, and the difficult conversation is only as honest as the junior party can afford. None of the modes has a real account of working within power structures that distort the analysis itself.

They are weak on the genuinely novel. All three are strongest where there is precedent, stable causation, or a known pattern to match. For a truly unprecedented situation the tools offer a starting vocabulary but not answers, and the right move is closer to research than to diagnosis.

They cannot supply judgement. Every tool tells you how to think about a problem once you have decided which kind it is and which mode to bring. None tells you, from outside, how to make that decision. The collection can sharpen judgement by giving it more options and clearer names, but it cannot replace it.

And then a fifth, which the worked example above ran straight into when it reached the question of sequence: the tools cannot build the expertise that uses them well.

Knowing which mode a blocker calls for, sensing in the first thirty seconds of a stuck meeting that this is identity and not incentive, feeling that a diagnosis is about to fail before you can say why, that is tacit expertise, and the literature on how it actually forms is fairly clear that it does not come from frameworks. It comes from reps in an environment that gives fast, honest feedback. Research on naturalistic decision-making describes how experts mostly do not compare options at all; they recognise a situation as typical of a pattern they have met many times and the workable action comes with the recognition. The same body of work on accelerating expertise, and the deliberate- and purposeful-practice tradition behind it, is consistent on the conditions: repeated exposure to varied real cases, predictions made before outcomes are known, and feedback quick and clear enough to correct the pattern-matcher rather than confirm it.

This frames what this collection can and cannot do. The frameworks are not the expertise. They are scaffolding for the reps, a way to make your reasoning explicit enough that the feedback, when it comes, lands on something specific and corrects it. A practitioner who runs a hundred stuck situations while naming the mode, predicting which intervention will move things, and checking honestly whether it did, will build the judgement the frameworks cannot contain. A practitioner who collects frameworks and never closes that loop will have a richer vocabulary for explaining, after the fact, why the thing that did not work was always going to. The tools earn their place only inside the loop that turns reps into judgement.

Where this collection sits

Beyond naming what the modes miss, it helps to see the shape of the overall box they are in.

One useful framing comes from research on business expertise, which holds that what expert operators actually carry is not a single skill but a model with three legs: operations, market, and capital. Operations is how you run and improve the thing. Market is the demand you sell into and your position against competitors. Capital is how the whole enterprise is financed and how it sits in the wider economic cycle. The interesting empirical claim attached to this is that the best operators understand all three and the way a change in one moves the other two, and that people who predict business outcomes poorly tend to have a persistent blind spot in one leg.

The entire collection lives in the first leg. Everything in these four chapters is about operations: running and improving an organisation. The market leg (what customers actually want, how competitive position shifts) and the capital leg (cash flow, unit economics, how financing constrains everything else) are real parts of what expert operators reason about and interact constantly with the operations work these tools address, but they are out of scope for this collection. Rumelt and Wardley brush against the market leg, but as diagnostic methods, not as the domain knowledge itself. The point of naming the other two legs is so that when an operations diagnosis keeps failing, you remember to check whether the real constraint was in the operations leg at all.

A closing stance

The argument across the collection has been that organisational problems call for at least three modes of work, and that each mode has tools suited to it.

It is also a simplification, drawn from one vantage and scoped to one leg of a larger picture, and it will serve until it does not. The signs that you have reached its edge are the gaps above: bad faith, steep power, genuine novelty, and underneath them the irreducible matter of judgement, the expertise that only experience in a high-feedback environment will build. When you reach those edges, the move is not to force the situation back into one of the three modes, but to recognise that you have walked off the edge of this particular map.

Knowing when the tool in your hand has stopped fitting the problem in front of you is the one skill the collection most wants to leave you with, because it is the skill every individual tool depends on and that no individual tool can teach. The collection can hand you the modes, the tools, and the vocabulary. Whether they turn into judgement is a matter of how many real problems you take through the loop, and how honestly you check the result.