
Your Incidents Are Product Research. You’re Just Not Reading Them.
James Mitchell
Three months ago, we had an outage. Nothing catastrophic — a slow memory leak in our job queue that finally caught up with us at 2 AM on a Tuesday. The on-call engineer patched it in forty minutes. We wrote the post-mortem, filed the ticket, closed the ticket.
What we didn’t do was ask the obvious question: why were 4,000 jobs queued at 2 AM in the first place?
That question, sitting unanswered in a resolved ticket, turned out to be more valuable than anything we’d learned from three rounds of user interviews that quarter. Those 4,000 jobs were batch exports — a workflow we’d assumed nobody ran anymore because our new real-time sync had “replaced” it. Except it hadn’t. Users had just stopped telling us they were still doing it the old way.
The Feedback Loop You Already Have
Product teams are obsessed with closing feedback loops. NPS surveys, session recordings, user interviews, beta programs. The machinery of listening. And it’s all genuinely useful — but it captures what users say and what they do when they know they’re being watched.
Incidents capture something different: what actually breaks when real people use your product in ways you didn’t anticipate.
That’s not a bug report. That’s ethnographic research delivered to your PagerDuty at scale.
The problem is that most teams treat incidents as engineering problems with engineering solutions. Root cause, fix, close. Occasionally a “process improvement” gets logged. But the product implications — the questions about why users were doing that thing — almost never make it out of the incident doc.
What Incidents Are Actually Telling You
Let’s be specific about the signal buried in your post-mortems.
1. Usage patterns you’ve never seen in analytics
Your analytics track the happy path. Incidents reveal the detours. When something breaks at volume, it’s because a lot of people were taking a path your product implicitly discouraged but never actually blocked.
A team building a B2B SaaS for procurement found this out when their export endpoint started timing out for a handful of enterprise accounts. Investigation revealed those accounts were running exports every fifteen minutes as a workaround for a missing webhook feature. Analytics showed “export” as an occasional action. Incidents revealed it as a mission-critical polling mechanism for a segment they’d completely misread.
2. The features you thought were deprecated
Every product has them. The legacy API endpoint. The “old” workflow. The thing you stopped promoting two versions ago. Incidents are often where you discover these aren’t legacy to your users — they’re load-bearing.
The job queue story above is one version of this. Another common variant: you add a new way to do X, metrics show adoption of the new flow, you quietly stop maintaining the old one. Then an incident reveals 30% of your revenue is still running through code you considered deprecated.
3. Integration patterns that exist outside your product
Your users have built things with your product that you’ve never documented, never intended, and probably can’t see in your own data. Zapier flows. Custom scripts. Spreadsheet macros that hit your API. These are invisible until they break.
When they break, the incident tells you something important: users valued this workflow enough to build infrastructure around it. That’s the highest signal of product value you’ll ever see, and it’s sitting in your incident logs.
4. Where your mental model of the product diverges from theirs
This is the subtle one. Sometimes incidents happen because users have a fundamentally different mental model of how something works than your team does. They expect strong consistency where you give them eventual consistency. They expect state to persist where you expire it. They expect an operation to be idempotent when it isn’t.
The incident is the gap made visible. Fix the bug, sure. But the more important fix is updating either your product to match their model or your documentation to close the gap. Most teams only do the former.
A System for Actually Using This Signal
The reason this doesn’t happen isn’t that PMs don’t care. It’s that the information lives in the wrong places, in the wrong format, reviewed by the wrong people.
Post-mortems live in engineering docs. PMs aren’t in the incident channel. Support tickets get triaged by support, not product. The signal exists but it doesn’t travel.
Here’s what actually works:
The product question appendix
Add a single required section to your post-mortem template: Product Implications. Three prompts:
- What were users trying to accomplish when this broke?
- Does this reveal a usage pattern we didn’t know about or didn’t expect at this volume?
- Does the cause of this incident suggest a gap between how we think the product works and how users actually use it?
These questions don’t need deep answers. A sentence each is fine. The point is to make product thinking a mandatory step in incident review, not an optional follow-up that never happens.
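One way to make the section genuinely mandatory, if your post-mortems live as Markdown files in a repo, is a small CI check. Everything below (the docs/postmortems/ path, the heading text, the prompt wording) is an assumption to swap for your own template, not a convention:

```python
#!/usr/bin/env python3
"""Fail CI when a post-mortem is missing its Product Implications appendix.

Assumes post-mortems are Markdown files under docs/postmortems/ and that the
appendix uses a "## Product Implications" heading; adjust both to your own
template.
"""
import pathlib
import sys

REQUIRED_HEADING = "## Product Implications"
REQUIRED_PROMPTS = [
    "What were users trying to accomplish",
    "usage pattern we didn't know about",
    "gap between how we think the product works",
]


def missing_sections(root: pathlib.Path) -> list[str]:
    problems = []
    for doc in sorted(root.glob("*.md")):
        text = doc.read_text(encoding="utf-8")
        if REQUIRED_HEADING not in text:
            problems.append(f"{doc.name}: no '{REQUIRED_HEADING}' section")
            continue
        # The prompts should survive into the filled-in doc; a missing prompt
        # usually means someone deleted the section body.
        for prompt in REQUIRED_PROMPTS:
            if prompt not in text:
                problems.append(f"{doc.name}: missing prompt starting '{prompt}'")
    return problems


if __name__ == "__main__":
    issues = missing_sections(pathlib.Path("docs/postmortems"))
    for issue in issues:
        print(issue)
    sys.exit(1 if issues else 0)
```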
Incident office hours for PMs
Pick one incident per week — doesn’t have to be the biggest one, just an interesting one — and have a PM sit with the on-call engineer for fifteen minutes to walk through it. Not to assign blame or second-guess engineering decisions. Just to ask: what were the users doing?
This sounds low-value but it compounds fast. PMs develop intuition for the parts of the system that are under stress. Engineers get better at flagging product-relevant incidents. The information starts flowing.
Tagging incidents with product themes
Your incident tracking system (PagerDuty, OpsGenie, Linear, whatever) almost certainly supports labels or custom fields. Create a taxonomy of product themes — integrations, data volume, legacy workflows, permission model, etc. — and tag incidents accordingly.
After ninety days, run the report. The themes that cluster are your product roadmap inputs. If you have fifteen incidents tagged “data export” and three tagged “permission model,” that’s not ambiguous.
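The report itself can be small. As a sketch, assuming your tracker can export incidents to CSV with a created_at timestamp and a comma-separated tags column (both column names are made up; use whatever your tool actually emits):

```python
import csv
from collections import Counter
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(days=90)  # the ninety-day review window


def theme_counts(path: str) -> Counter:
    """Count product-theme tags on incidents created inside the window."""
    cutoff = datetime.now(timezone.utc) - WINDOW
    counts: Counter = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):  # assumed columns: created_at, tags
            created = datetime.fromisoformat(row["created_at"].replace("Z", "+00:00"))
            if created.tzinfo is None:
                created = created.replace(tzinfo=timezone.utc)
            if created < cutoff:
                continue
            for tag in row["tags"].split(","):
                if tag.strip():
                    counts[tag.strip()] += 1
    return counts


if __name__ == "__main__":
    for theme, n in theme_counts("incidents.csv").most_common():
        print(f"{n:3d}  {theme}")
```

The output is just a ranked list of themes, which is usually all the roadmap conversation needs.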
The support-to-product pipeline
Support tickets are pre-incident signals. Most incidents were first reported as support tickets by confused users who couldn’t articulate what was wrong. Build a lightweight path from support triage to product backlog — not every ticket, but a systematic weekly review of tickets that suggest unexpected usage or model confusion.
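A minimal version of that weekly review, assuming support agents can apply a product-signal tag during triage (the unexpected-usage and model-confusion tags here are hypothetical, as are the CSV column names), might look like this:

```python
import csv
from datetime import datetime, timedelta, timezone

# Hypothetical tags support applies when a ticket smells like product signal
# rather than a defect; adjust to your own taxonomy.
PRODUCT_SIGNAL_TAGS = {"unexpected-usage", "model-confusion"}


def weekly_digest(path: str, days: int = 7) -> list[dict]:
    """Return recent tickets carrying a product-signal tag."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    picked = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):  # assumed columns: created_at, tags, subject, url
            created = datetime.fromisoformat(row["created_at"].replace("Z", "+00:00"))
            if created.tzinfo is None:
                created = created.replace(tzinfo=timezone.utc)
            tags = {t.strip() for t in row["tags"].split(",")}
            if created >= cutoff and tags & PRODUCT_SIGNAL_TAGS:
                picked.append(row)
    return picked


if __name__ == "__main__":
    for ticket in weekly_digest("support_tickets.csv"):
        print(f"- {ticket['subject']}  ({ticket['url']})")
```

Review the list in the same weekly slot as the incident office hours; the point is cadence, not tooling.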
Reframing the Post-Mortem
The engineering post-mortem has a specific job: prevent the same technical failure from recurring. It does that job well. What I’m not suggesting is turning post-mortems into product reviews or bloating them with speculation.
What I am suggesting is that the post-mortem is one moment where the full reality of how users interact with your system is briefly, unusually visible. The normal filters — instrumentation choices, survey bias, observation effects — are absent. You’re seeing what actually happened.
That’s rare. It’s worth a few more minutes of attention.
The best product teams I’ve watched aren’t necessarily the ones who do the most user research. They’re the ones who extract signal from everything — interviews, analytics, support, sales calls, and incidents. They treat the product as a continuous experiment and they’re promiscuous about sources of evidence.
Incidents are one of the richest sources most teams are systematically ignoring.
The Harder Implication
If you take this seriously, you’ll eventually hit a more uncomfortable truth: some of your incidents aren’t just engineering failures. They’re product failures — places where the product didn’t make the right thing easy enough, so users built workarounds that broke under load.
The memory leak that revealed the batch export problem? The real fix wasn’t in the job queue code. It was in shipping the webhook feature that users had been waiting for, quietly, without telling us, for eighteen months.
That’s a different kind of post-mortem. Harder to write. More important to have.
Your incidents are talking. The question is whether anyone on the product side is in the room to hear them.
James Mitchell is a product engineer at ProductOS. He writes about the places where engineering decisions and product strategy are secretly the same decision.