Risk Management
Every project has risks. The question isn't whether things will go wrong; it's whether you saw them coming. Here's how to identify, assess, and plan for the unexpected.
$327 million, lost to a unit conversion
September 23, 1999. NASA's Mars Climate Orbiter approaches Mars after a 286-day journey. Mission control sends the burn command. The spacecraft fires its thrusters — and drops 170 kilometers too low. It skips into the atmosphere like a stone hitting water at the wrong angle. In seconds, it disintegrates.
Nine months of flight. Years of engineering. $327 million. Gone.
The cause? One team at Lockheed Martin reported thruster impulse in pound-force seconds (imperial). NASA's navigation team expected newton-seconds (metric). Nobody caught it. Nobody checked. There was no process to verify that the two teams were speaking the same language.
This wasn't a freak accident. It was a risk that nobody identified. Someone could have asked: "Are we sure both teams use the same units?" That question would have cost nothing. Not asking it cost $327 million.
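One cheap way to ask the "same units?" question in software is to never pass a bare number between teams. The sketch below is only an illustration, not how the mission software worked: the `Impulse` type and `accept_impulse` function are hypothetical names. The conversion factor is the standard definition of pound-force in newtons.

```python
from dataclasses import dataclass

# 1 lbf is defined as exactly 4.4482216152605 N, so the same factor
# converts pound-force seconds to newton-seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse reading that always carries its unit (hypothetical type)."""
    value: float
    unit: str  # "N*s" or "lbf*s"

    def to_newton_seconds(self) -> float:
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        # Refuse to guess: an unknown unit is an error, not a default.
        raise ValueError(f"unknown impulse unit: {self.unit!r}")


def accept_impulse(reading: Impulse) -> float:
    """Receiving side: convert explicitly instead of assuming metric."""
    return reading.to_newton_seconds()


print(accept_impulse(Impulse(10.0, "N*s")))
print(accept_impulse(Impulse(10.0, "lbf*s")))
```

The point of the design is that the receiving code cannot read a value without also reading its unit, so a mismatch becomes a conversion or an error instead of a silent factor-of-4.45 mistake.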
Every project carries risks like this. The difference between projects that survive and projects that crash isn't the absence of risk. It's whether someone was paying attention.
So what exactly is a risk?
Here's the formal definition: A risk is an uncertain event or condition that, if it occurs, has a positive or negative effect on a project's objectives.
Read that again. Positive or negative.
Most people hear "risk" and think: disaster, fire, failure. But risks come in two flavours:
| Type | Also called | Example |
|---|---|---|
| Negative risk | Threat | Your lead developer might quit mid-project |
| Positive risk | Opportunity | A competitor might exit the market, giving you their customers |
A threat is something bad that might happen. An opportunity is something good that might happen. Both are uncertain. Both require a plan. Most of this module focuses on threats — because that's where projects fail — but remember: good risk management also means spotting and exploiting opportunities.
Think of risk management like a weather forecast. You can't control whether it rains. But you can check the forecast, bring an umbrella, and move the outdoor wedding under the tent. The rain still happens. You just don't get soaked.
There Are No Dumb Questions
"If a risk is uncertain, why bother planning for it? Most risks never happen."
True — most individual risks don't materialise. But across a project with 20 identified risks, some will. It's like wearing a seatbelt: you probably won't crash today, but over a lifetime of driving, the odds catch up. Planning is cheap. Reacting to surprises is expensive.
"What's a 'positive risk'? That sounds like an oxymoron."
Imagine a major competitor suddenly goes bankrupt. That's an opportunity — you could capture their market share, but only if you're ready. Good risk management means having a plan to exploit lucky breaks, not just survive bad ones.
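To put a number on "some will": the sketch below works out the chance that at least one of 20 risks occurs. The 10% likelihood per risk is a made-up figure for illustration, and the calculation assumes the risks are independent, which real project risks often are not.

```python
def prob_at_least_one(probabilities):
    """Chance that at least one risk occurs, assuming independent risks."""
    none_occur = 1.0
    for p in probabilities:
        none_occur *= 1.0 - p  # probability that this risk does NOT occur
    return 1.0 - none_occur


# Hypothetical figures: 20 identified risks, each 10% likely.
risks = [0.10] * 20
print(f"{prob_at_least_one(risks):.0%}")  # 88%
```

Each risk on its own is a long shot, but under these assumptions the odds of getting through the project with none of them occurring are only about 12%.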
Finding the risks: how to identify what could go wrong
You can't manage risks you haven't identified. The first step is always the same: get everyone in a room (or on a call) and systematically ask, "What could go wrong?"
Here are the four most common techniques:
1. Brainstorming sessions. Get the team together. Set a timer for 30 minutes. No idea is too paranoid. "What if the API changes its pricing?" "What if our designer leaves?" "What if the client changes their mind about the core feature?" Write everything down. No filtering — that comes later.
2. SWOT analysis. Look at the project through four lenses:
| | Helpful | Harmful |
|---|---|---|
| Internal | Strengths | Weaknesses |
| External | Opportunities | Threats |
Your internal weaknesses and external threats are your risk goldmine. A team with no mobile development experience building a mobile app? That's a weakness that's also a risk.
3. Checklist reviews. Use a list of risks from previous projects. Walk through standard categories — budget, schedule, technical, people, vendor, regulatory — and ask: "Does this apply here?"
4. Expert interviews. Talk to people who've done similar work. The developer who built the last integration knows where the API broke. The designer who worked with this client knows about their approval process. Experience is the best risk radar.
**Brainstorm** — Cast a wide net. Quantity over quality. Filter later.
**SWOT** — Structured lens. Forces you to look at internal AND external factors.
**Checklists** — Learn from the past. Don't discover the same risk twice.
**Expert interviews** — Tap institutional memory. Veterans see risks newcomers miss.
The Risk Matrix: scoring what you've found
You've brainstormed 30 risks. Now what? You can't give equal attention to all of them. Some are likely and catastrophic. Others are unlikely and trivial. You need a way to prioritise.
Enter the Risk Matrix. It's beautifully simple:
Risk Score = Likelihood x Impact
Both are scored 1–5:
| Score | Likelihood | Impact |
|---|---|---|
| 1 | Very unlikely | Negligible |
| 2 | Unlikely | Minor — workaround exists |
| 3 | Possible | Moderate — plan adjustment needed |
| 4 | Likely | Major — significant damage |
| 5 | Almost certain | Catastrophic — project failure |
A "Likely" (4) x "Major" (4) risk scores 16. An "Unlikely" (2) x "Minor" (2) risk scores 4. You deal with the 16 first.
| | Impact 1 | Impact 2 | Impact 3 | Impact 4 | Impact 5 |
|---|---|---|---|---|---|
| Likelihood 5 | 5 | 10 | 15 | 20 | 25 |
| Likelihood 4 | 4 | 8 | 12 | 16 | 20 |
| Likelihood 3 | 3 | 6 | 9 | 12 | 15 |
| Likelihood 2 | 2 | 4 | 6 | 8 | 10 |
| Likelihood 1 | 1 | 2 | 3 | 4 | 5 |
Red zone (15–25): Mitigation plans now. Amber (8–14): Monitor closely, contingency ready. Green (1–7): Accept and watch.
Back to the Mars Orbiter: "Teams using different units" — Likelihood 3, Impact 5. Score: 15. Red zone. Someone would have verified unit consistency. One check. $327 million saved.
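If you keep your risks in a spreadsheet or script, the scoring rule is a few lines of code. This is a minimal sketch of the 1–5 scales and the zone boundaries described above; the function names `risk_score` and `risk_zone` are illustrative, not from any standard library.

```python
def risk_score(likelihood: int, impact: int) -> int:
    """Risk Score = Likelihood x Impact, both on a 1-5 scale."""
    for name, value in (("likelihood", likelihood), ("impact", impact)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return likelihood * impact


def risk_zone(score: int) -> str:
    """Map a score to the zones used here: red 15-25, amber 8-14, green 1-7."""
    if score >= 15:
        return "red"
    if score >= 8:
        return "amber"
    return "green"


print(risk_score(4, 4), risk_zone(risk_score(4, 4)))  # 16 red
print(risk_score(2, 2), risk_zone(risk_score(2, 2)))  # 4 green
print(risk_score(3, 5), risk_zone(risk_score(3, 5)))  # 15 red (the unit mismatch)
```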
Risk response strategies: what do you actually DO about it?
You've identified the risks. You've scored them. Now you need a plan. There are four standard responses — think of them as four tools in your toolkit.
For threats (negative risks):
- **Avoid**: Change the plan to eliminate the risk entirely
- **Mitigate**: Reduce the likelihood or impact
- **Transfer**: Shift the risk to someone else (insurance, outsourcing)
- **Accept**: It's low enough to live with
For opportunities (positive risks):
- **Exploit**: Make sure it happens
- **Enhance**: Increase the likelihood or impact
- **Share**: Partner with someone who can capitalise
- **Accept**: If it happens, great; don't chase it
Let's make each one concrete with the same project — building a mobile app:
Avoid. "Our team has never built for iOS." Avoidance: Build a web app instead — eliminate the risk by changing the plan. You're not reducing the risk; you're removing the road it lives on.
Mitigate. Same risk, different response: Hire an iOS consultant to review code every sprint. You can't eliminate inexperience, but you catch mistakes earlier and cheaper.
Transfer. Same risk again: Outsource iOS development to a specialist agency. The risk still exists — but now it's the agency's problem. (You've traded a technical risk for a vendor risk.)
Accept. "Small chance Apple rejects the app on first submission." Low likelihood, minor impact (resubmit, lose a week). Don't spend money mitigating a risk that's unlikely and manageable.
There Are No Dumb Questions
"How do I decide between the four strategies?"
Start with the risk score. Red zone → Avoid or Mitigate. Amber → Transfer or Mitigate. Green → Accept. Also consider cost: if mitigation costs more than the risk itself, acceptance is smarter.
"Can I use more than one strategy?"
Absolutely. "We'll hire extra QA (mitigate), but if we still find critical bugs at launch, we'll delay one sprint (accept)." Layering strategies is smart risk management.
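The rule of thumb for choosing a strategy can be sketched as a small lookup. Treat this as a starting point for discussion, not a decision engine: the function name and the `response_cost` and `risk_cost` parameters are hypothetical, and the cost comparison is a deliberately crude reading of "if mitigation costs more than the risk itself".

```python
ZONE_STRATEGIES = {
    "red": ["Avoid", "Mitigate"],
    "amber": ["Transfer", "Mitigate"],
    "green": ["Accept"],
}


def zone_for(score: int) -> str:
    """Zones as defined for the Risk Matrix: red 15-25, amber 8-14, green 1-7."""
    if score >= 15:
        return "red"
    if score >= 8:
        return "amber"
    return "green"


def suggest_strategies(score, response_cost=None, risk_cost=None):
    """Suggest threat responses from the score, with a simple cost override."""
    # If responding costs more than the risk would, acceptance is smarter.
    if response_cost is not None and risk_cost is not None:
        if response_cost > risk_cost:
            return ["Accept"]
    return list(ZONE_STRATEGIES[zone_for(score)])


print(suggest_strategies(16))  # ['Avoid', 'Mitigate']
print(suggest_strategies(10))  # ['Transfer', 'Mitigate']
print(suggest_strategies(16, response_cost=50_000, risk_cost=20_000))  # ['Accept']
```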
The Risk Register: your living document
All those risks, scores, and strategies need to live somewhere. That somewhere is the Risk Register — the single most important risk management artifact.
A risk register is a table (spreadsheet, Notion doc, Jira board — the tool doesn't matter) tracking: ID, risk description, category, likelihood, impact, score, response strategy, mitigation plan, owner, status, and trigger (the early warning sign).
Here's what a few rows look like:
| ID | Risk | L | I | Score | Strategy | Owner |
|---|---|---|---|---|---|---|
| R-001 | Lead dev leaves mid-project | 2 | 5 | 10 | Mitigate — cross-train a backup | CTO |
| R-002 | Client changes scope after sprint 3 | 4 | 4 | 16 | Mitigate — change request process | PM |
| R-003 | Third-party API goes down during launch | 2 | 4 | 8 | Transfer — SLA with vendor | Tech Lead |
| R-004 | Office internet outage | 3 | 2 | 6 | Accept — team can work from home | PM |
The register is a living document. It gets updated every sprint, every week, every risk review meeting. New risks get added. Old risks get closed. Scores change as the project evolves. A risk that was unlikely in month one might become almost certain by month three.
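If your register lives in code rather than a spreadsheet, it might look something like this. The sketch uses the four example rows from the table; the `Risk` class and its field names are illustrative, and it covers only a subset of the columns listed earlier (category, trigger, and mitigation plan are left out to keep it short).

```python
from dataclasses import dataclass


@dataclass
class Risk:
    risk_id: str
    description: str
    likelihood: int  # 1-5
    impact: int      # 1-5
    strategy: str
    owner: str
    status: str = "open"

    @property
    def score(self) -> int:
        # Computed on demand, so it stays correct when likelihood or impact change.
        return self.likelihood * self.impact


register = [
    Risk("R-001", "Lead dev leaves mid-project", 2, 5, "Mitigate: cross-train a backup", "CTO"),
    Risk("R-002", "Client changes scope after sprint 3", 4, 4, "Mitigate: change request process", "PM"),
    Risk("R-003", "Third-party API goes down during launch", 2, 4, "Transfer: SLA with vendor", "Tech Lead"),
    Risk("R-004", "Office internet outage", 3, 2, "Accept: team can work from home", "PM"),
]


def open_risks_by_priority(risks):
    """Open risks, highest score first: the agenda for a risk review."""
    return sorted((r for r in risks if r.status == "open"),
                  key=lambda r: r.score, reverse=True)


for risk in open_risks_by_priority(register):
    print(risk.risk_id, risk.score, risk.owner)
```

Deriving the score from likelihood and impact, instead of storing it, means a re-scored risk moves up or down the list automatically at the next review.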
Monitoring risks: watching for trouble
Identifying risks at the start is not enough. Risks evolve. New ones appear. You need ongoing monitoring.
Triggers. Every risk should have a trigger — a specific, observable event that tells you it's materialising. "Lead dev seems unhappy" is vague. "Lead dev updated their LinkedIn and asked about the notice period" is a trigger.
Early warning signs — the canary in the coal mine:
- Sprint velocity dropping two sprints in a row → schedule risk rising
- Client responding to emails slower → scope change incoming
- Vendor missing minor deadlines → major miss likely
- Team working overtime three weeks straight → burnout risk rising
Regular risk reviews. Every two weeks (or every sprint), spend 15 minutes on the register: new risks to add? Old risks to close? Scores changed? Triggers fired? Mitigation plans on track?
This isn't bureaucracy. It's the difference between seeing the iceberg a mile away and hearing the scraping sound against the hull.
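A good trigger is specific enough that a script could check it. As an illustration, here is a sketch of the "sprint velocity dropping two sprints in a row" warning sign. The function name and the sample velocity numbers are made up for the example.

```python
def velocity_trigger_fired(velocities, drops_required=2):
    """True if each of the last `drops_required` sprints came in lower than the one before."""
    if len(velocities) < drops_required + 1:
        return False  # not enough history to judge
    recent = velocities[-(drops_required + 1):]
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))


print(velocity_trigger_fired([30, 28, 25]))  # True: two drops in a row
print(velocity_trigger_fired([30, 31, 29]))  # False: only one drop
print(velocity_trigger_fired([30, 28]))      # False: too little history
```

When the trigger fires, that is the cue to re-score the schedule risk in the register, not proof that the schedule has already slipped.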
The six risks that kill projects (and what to do about each)
Across industries and project types, the same risks show up again and again. Here's the hit list:
1. Scope creep. The client keeps adding "just one more thing." Mitigation: A formal change request process — every addition gets documented, estimated, and approved with its impact on timeline and budget.
2. Key person departure. Your best developer gets a better offer. Mitigation: Cross-training, documentation, no single point of failure. If only one person understands the codebase, that's a red-zone risk right now.
3. Vendor failure. The API you depend on goes down. The outsourced firm delivers late. Mitigation: SLAs with penalties. Backup vendor identified.
4. Technology doesn't work. The framework can't handle the load. The integration is more complex than estimated. Mitigation: Proof of concept early. Technical feasibility testing before committing to the full build.
5. Budget cuts. Finance announces a 20% budget reduction mid-project. Mitigation: Prioritised backlog — cut scope, not quality.
6. Timeline pressure. The launch date moves up by a month. Mitigation: Scope negotiation. "We can hit that date if we defer these three features to v2."
Risk tolerance: why startups and banks see risk differently
Not every organisation responds to risk the same way. A startup and a bank look at the same risk and make opposite decisions.
Startup:
- Move fast, break things
- Failure is learning
- Accept high risk for high reward
- Minimal process, maximum speed
- 'We'll figure it out as we go'
Bank:
- Move carefully, break nothing
- Failure is unacceptable
- Minimise risk even at the cost of speed
- Heavy process, maximum safety
- 'We'll plan every contingency before we start'
This is called risk tolerance (or risk appetite) — the amount of risk an organisation will accept in pursuit of its objectives.
A startup building a social app might accept the risk of server crashes at 100K signups — because the real risk is that nobody signs up at all. A bank processing $2 billion daily will never accept that.
As a PM, understand your organisation's tolerance and manage accordingly. Don't bring a bank's process to a startup, or a startup's appetite to a bank.
- Risk-averse: Avoid or mitigate almost everything. Heavy documentation. (Banks, healthcare, aerospace)
- Risk-neutral: Mitigate the big stuff, accept the small stuff. (Most mid-size companies)
- Risk-seeking: Accept most risks, move fast, fix problems as they appear. (Startups, R&D labs)
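One way to make tolerance concrete is to write down the highest risk score each profile will accept without a response plan. The thresholds below are purely hypothetical: the risk-neutral value reuses the green zone from the Risk Matrix, and the other two are invented to show the contrast. Your organisation would need to agree its own numbers.

```python
# Hypothetical thresholds: the highest score each profile accepts as-is.
MAX_ACCEPTED_SCORE = {
    "risk-averse": 3,    # assumption: accepts only near-negligible risks
    "risk-neutral": 7,   # matches the green zone (1-7)
    "risk-seeking": 14,  # assumption: accepts everything below the red zone
}


def should_accept(score: int, profile: str) -> bool:
    """True if a risk with this score can simply be accepted under the profile."""
    if profile not in MAX_ACCEPTED_SCORE:
        raise ValueError(f"unknown risk profile: {profile!r}")
    return score <= MAX_ACCEPTED_SCORE[profile]


# The same score-10 risk, seen by three different organisations.
for profile in MAX_ACCEPTED_SCORE:
    print(profile, should_accept(10, profile))
```

With these made-up thresholds, a score of 10 needs a response plan at a risk-averse or risk-neutral organisation and is simply accepted at a risk-seeking one.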
There Are No Dumb Questions
"What if my boss has a different risk tolerance than the company?"
This happens constantly. In practice, you manage to the person who signs off on your project. Document the risk approach and get explicit buy-in: "Here's our register. Here are the ones we're accepting. Are you comfortable with that?" Get it in writing.
"Is it possible to be too cautious?"
Yes. Over-managing risk is its own risk. A two-week project doesn't need a 50-row register and weekly reviews. Scale the process to the project.
Key takeaways
- A risk is an uncertain event that could help or hurt your project. Most are threats; some are opportunities. Both need plans.
- The Risk Matrix (Likelihood x Impact) prioritises your attention. Red zone first. Green zone last.
- Four response strategies: Avoid, Mitigate, Transfer, Accept. Choose based on the risk score and the cost of the response.
- The Risk Register is a living document. Update it regularly. A stale register is worse than no register — it gives false confidence.
- Monitor with triggers and early warning signs. Regular risk reviews catch changing conditions before they become crises.
- Match your risk approach to your organisation's tolerance. A startup and a bank need fundamentally different levels of risk management.
Knowledge Check
1. The Mars Climate Orbiter was lost because one team used imperial units and another used metric. Which risk response strategy would have been most appropriate to prevent this?
2. A risk has a Likelihood of 4 (Likely) and an Impact of 3 (Moderate). What is its Risk Score, and which zone does it fall into?
3. Your project depends on a third-party payment API. You've signed an SLA with financial penalties if the vendor has downtime during your launch window. Which risk response strategy have you used?
4. Which of the following best describes the difference between a risk and an issue?