Every vulnerability management program has SLAs. Most of them are decorative.
Walk into a security review. There's a slide: "Critical findings remediated within 1 day. High within 7. Medium within 30. Low within 90." Everyone nods. Everyone moves on. Nobody asks what happens when a Critical sits open for 8 days.
That missing answer is the difference between an SLA and a wish.
The two pieces a real SLA needs
A remediation SLA has exactly two components:
- A deadline. Per-severity, measured from when the finding was first observed (not assigned, not triaged — observed).
- An escalation trigger. Something specific that happens when the deadline is missed.
Programs that have only the deadline aren't running SLAs. They're running suggestions.
A default deadline table you can adopt
Feel free to argue with these numbers, but start here and adjust:
| Severity | Deadline | Escalation trigger (time since first observed) |
|---|---|---|
| Critical | 24 hours | At +24h: email + webhook + Jira reopen. At +48h: page on-call manager. |
| High | 7 days | At +7d: email to assignee's manager + tenant owner. |
| Medium | 30 days | At +30d: email to assignee + a weekly digest. |
| Low | 90 days | At +90d: appears on a quarterly "grooming" report. No page. |
Two things to note. First: Low still has an SLA. It's a long one, but it exists, because the alternative is an open-forever queue. Second: escalation is not optional. If nothing happens when the deadline passes, the deadline isn't real.
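As a minimal sketch, the deadline column can live in code as a lookup keyed by severity. The names `SLA_DEADLINES`, `sla_due_at`, and `is_breached` are illustrative, not part of any particular platform, and the clock deliberately starts at first observation.

```python
from datetime import datetime, timedelta

# Deadlines from the table above, keyed by severity.
SLA_DEADLINES = {
    "critical": timedelta(hours=24),
    "high": timedelta(days=7),
    "medium": timedelta(days=30),
    "low": timedelta(days=90),
}


def sla_due_at(severity: str, first_observed_at: datetime) -> datetime:
    """Deadline measured from first observation, not assignment or triage."""
    return first_observed_at + SLA_DEADLINES[severity.lower()]


def is_breached(severity: str, first_observed_at: datetime, now: datetime) -> bool:
    """True once an open finding has reached or passed its deadline."""
    return now >= sla_due_at(severity, first_observed_at)
```

Because Low is in the lookup like every other severity, no finding can end up without a due date.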
What the escalation should actually do
A missed SLA should create observable pressure on a human. In order of increasing intensity:
- Email the assignee. Friendly reminder with a link to the finding.
- Email the assignee's manager + tenant owner. Now it's visible.
- Fire a webhook. Most useful when wired to Slack or a PagerDuty service.
- Reopen the linked Jira issue (if it was closed without remediation) and add a comment with the breach timestamp.
- Page on-call. Reserved for unremediated Criticals past a second threshold (e.g., +48h).
The goal is not to be punitive. The goal is to make sure no finding can quietly rot.
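One way to encode the ladder is a per-severity list of thresholds and the actions each one unlocks. This is a sketch under assumptions: the action names are placeholder labels, the thresholds mirror the default table, and `already_fired` stands in for whatever record you keep of past escalations so nothing fires twice.

```python
from datetime import timedelta

# Thresholds are time since first observation, mirroring the default table.
# Action names are illustrative labels, not real integrations.
ESCALATION_STEPS = {
    "critical": [
        (timedelta(hours=24), ["email", "webhook", "reopen_jira"]),
        (timedelta(hours=48), ["page_oncall_manager"]),
    ],
    "high": [(timedelta(days=7), ["email_manager_and_tenant_owner"])],
    "medium": [(timedelta(days=30), ["email_assignee", "weekly_digest"])],
    "low": [(timedelta(days=90), ["quarterly_report"])],
}


def due_actions(severity, age, already_fired):
    """Return actions whose threshold has passed and that have not fired yet."""
    actions = []
    for threshold, step_actions in ESCALATION_STEPS[severity]:
        if age >= threshold:
            actions.extend(a for a in step_actions if a not in already_fired)
    return actions
```

A scheduler would call `due_actions` for each open finding on every tick, dispatch the result, and record what it sent.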
A fallback for tenants that didn't configure escalation
In a multi-tenant platform, many customers will configure SLA policies and forget to set an escalation email list. That's a trap: the feature silently does nothing, and they discover it three breaches later.
The right behavior: when a tenant's SLA policy has no EscalationEmails, the platform falls back to the tenant's Owners (or whatever your equivalent role is). This way the tenant at least gets notified at the role boundary, and they find out about the gap quickly, usually by asking "why am I getting these?" The answer, "because you didn't set EscalationEmails", is exactly the nudge that causes them to fix it.
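The fallback is small enough to sketch in a few lines. The function name and the plain-list inputs are assumptions for illustration; the returned flag lets the notification say why the Owners are receiving it.

```python
def escalation_recipients(escalation_emails, owner_emails):
    """Return (recipients, used_fallback) for a breached finding.

    escalation_emails: the policy's EscalationEmails, possibly empty or None.
    owner_emails: addresses of the tenant's Owners (or equivalent role).
    """
    configured = [e.strip() for e in (escalation_emails or []) if e and e.strip()]
    if configured:
        return configured, False
    # Nothing configured: notify the Owners so the gap is visible.
    return list(owner_emails), True
```

When `used_fallback` is true, the email body can include a line such as "You are receiving this because no EscalationEmails are set on this policy."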
Backfill on policy changes
Here's a trap most implementations fall into: editing an SLA policy applies only to new findings. Old open findings keep their old deadlines, or worse, no deadline at all.
The fix is mechanical: when SlaPolicy.RemediationDays changes, recompute SlaDueAt on every matching open vulnerability. The same applies when a policy is first created: existing findings that match the policy should get retroactive deadlines.
Without backfill, changing an SLA is a permanent half-measure. The new numbers apply to the future. The existing queue slides forever.
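A backfill pass might look like the sketch below. The dataclasses are simplified stand-ins for the real records, with `remediation_days` and `sla_due_at` playing the roles of RemediationDays and SlaDueAt. Matching a policy to findings by severity alone is an assumption; a real policy may also match on tenant, asset, or tag.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Vulnerability:
    severity: str
    first_observed_at: datetime
    status: str = "open"
    sla_due_at: Optional[datetime] = None


@dataclass
class SlaPolicy:
    severity: str
    remediation_days: int


def backfill_due_dates(policy, vulnerabilities):
    """Recompute sla_due_at on matching open findings; return how many changed."""
    updated = 0
    for vuln in vulnerabilities:
        if vuln.status != "open" or vuln.severity != policy.severity:
            continue
        new_due = vuln.first_observed_at + timedelta(days=policy.remediation_days)
        if vuln.sla_due_at != new_due:
            vuln.sla_due_at = new_due
            updated += 1
    return updated
```

Running the same function on policy creation and on every edit covers both cases, and a second run with an unchanged policy updates nothing.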
What to actually measure
Once SLAs escalate, the metric that matters is per-severity MTTR (mean time to remediate). Not total MTTR. A program can have a great total MTTR while every Critical still takes two weeks, because 500 quickly closed Lows pull the average down.
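The arithmetic is easy to check with made-up numbers. In this hypothetical data set, ten Criticals each take 14 days and 500 Lows each close in 2 days:

```python
from statistics import mean

# Hypothetical closed findings: days from first observation to remediation.
days_to_remediate = {
    "critical": [14] * 10,  # every Critical took two weeks
    "low": [2] * 500,       # 500 Lows closed in two days each
}

# Blended average across all severities: (10*14 + 500*2) / 510.
total_mttr = mean(d for days in days_to_remediate.values() for d in days)

# Per-severity averages keep the Critical problem visible.
per_severity_mttr = {sev: mean(days) for sev, days in days_to_remediate.items()}
```

The blended figure comes out to about 2.2 days, which looks healthy, while `per_severity_mttr["critical"]` is still 14.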
Report:
- MTTR(Critical), MTTR(High), MTTR(Medium), MTTR(Low), all as rolling 30-day averages
- SLA compliance percentage per severity — "what fraction of findings closed within their deadline?"
- Number of open breaches today, per severity
That's three numbers per severity, 12 total. Those 12 numbers tell you more about your program than any heatmap ever will.
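Compliance and open breaches can be computed in one pass over the findings. This sketch assumes each finding is a dict with `severity`, `sla_due_at`, and `closed_at` (None while open). These field names are illustrative, and the rolling 30-day window is left out for brevity.

```python
from collections import defaultdict


def severity_report(findings, now):
    """Per-severity SLA compliance and count of currently open breaches."""
    report = defaultdict(
        lambda: {"closed": 0, "closed_on_time": 0, "open_breaches": 0}
    )
    for finding in findings:
        row = report[finding["severity"]]
        if finding["closed_at"] is not None:
            row["closed"] += 1
            if finding["closed_at"] <= finding["sla_due_at"]:
                row["closed_on_time"] += 1
        elif finding["sla_due_at"] < now:
            # Still open and past its deadline.
            row["open_breaches"] += 1
    for row in report.values():
        # Compliance is undefined until at least one finding has closed.
        row["compliance"] = (
            row["closed_on_time"] / row["closed"] if row["closed"] else None
        )
    return dict(report)
```

Returning None instead of 0 or 100 percent for a severity with no closed findings avoids reporting a compliance number that nothing supports.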
One last rule
If an SLA never breaches, it's set too loose. If every finding breaches its SLA, the deadline is fantasy. You want a program where most findings close within the deadline, a small but meaningful fraction breach, and the breach count trends downward quarter over quarter.
That is what a working remediation program looks like. Everything else is theater.