What is Incident Severity?

Incident severity is a classification system that ranks incidents by their impact on users and business operations. It determines how urgently your team responds, who gets involved, and how you communicate with stakeholders.

Definition

Incident Severity is a standardized scale (typically P1 through P4 or SEV1 through SEV4) used to classify the impact and urgency of service disruptions. Lower numbers mean higher severity: P1/SEV1 is the most severe level. Higher severity indicates greater user impact and demands faster response times, more resources, and broader communication.

For example, a complete production outage affecting all users is a P1/SEV1 (critical), while a cosmetic UI bug affecting a rarely-used feature might be a P4/SEV4 (low).

Severity Levels Explained: P1 Through P4

The most widely adopted framework uses four levels. Here is what each level means and when to apply it:

P1 / SEV1 — Critical

Complete service outage or severe degradation affecting the majority of users. Core functionality is unavailable. Revenue-impacting.

Examples:

  • Website completely unreachable (HTTP 500 or timeout for all users)
  • Payment processing system down (transactions failing)
  • Database corruption causing data loss
  • Security breach with active data exfiltration

P2 / SEV2 — High

Major feature degradation or outage affecting a significant portion of users. Core functionality is impaired but a workaround may exist.

Examples:

  • Login system failing for users in one region
  • API response times degraded to 10x normal (service usable but slow)
  • Email notifications not being delivered
  • Dashboard showing stale data (monitoring still works, display delayed)

P3 / SEV3 — Medium

Minor feature degradation or issue affecting a small number of users. Core functionality works. A workaround is readily available.

Examples:

  • CSV export producing incorrect column ordering
  • Non-critical background job failing intermittently
  • Slow page load on a secondary settings page
  • Single monitoring region producing delayed results

P4 / SEV4 — Low

Cosmetic issue, minor inconvenience, or improvement request. No impact on core functionality or user workflows.

Examples:

  • Typo in error message text
  • Minor alignment issue on mobile viewport
  • Tooltip showing outdated information
  • Log entries missing non-critical metadata fields

How to Classify Incidents

Severity classification should be based on objective criteria, not gut feeling. Use these dimensions to determine the correct level:

Dimension | P1 | P2 | P3 | P4
User Impact | All or most users | Large subset | Small subset | Minimal / none
Core Function | Unavailable | Degraded | Minor impact | Unaffected
Revenue Impact | Direct loss | Potential loss | Unlikely | None
Workaround | None | Difficult | Available | Not needed

Tip: When in doubt, classify higher and downgrade later. It is much better to over-respond to a P2 that turns out to be P3 than to under-respond to a P1 classified as P3. You can always de-escalate, but lost time from under-classification cannot be recovered.
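
The classification dimensions can be encoded as a lookup so that responders apply them consistently. The following is a minimal sketch, not a prescribed implementation: the dimension keys and value labels (such as "most" or "direct_loss") are hypothetical names chosen for this example. It takes the most severe level indicated by any dimension, which follows the "classify higher when in doubt" advice.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Lower number means more severe, matching the P1-P4 scale."""
    P1 = 1
    P2 = 2
    P3 = 3
    P4 = 4


# Each dimension maps an observed value to a level from the criteria matrix.
# The keys and labels are hypothetical and exist only for this sketch.
MATRIX = {
    "user_impact": {"most": Severity.P1, "large": Severity.P2,
                    "small": Severity.P3, "minimal": Severity.P4},
    "core_function": {"unavailable": Severity.P1, "degraded": Severity.P2,
                      "minor": Severity.P3, "unaffected": Severity.P4},
    "revenue_impact": {"direct_loss": Severity.P1, "potential_loss": Severity.P2,
                       "unlikely": Severity.P3, "none": Severity.P4},
    "workaround": {"none": Severity.P1, "difficult": Severity.P2,
                   "available": Severity.P3, "not_needed": Severity.P4},
}


def classify(**observations: str) -> Severity:
    """Return the most severe level indicated by any supplied dimension."""
    if not observations:
        raise ValueError("at least one dimension is required")
    levels = [MATRIX[dim][value] for dim, value in observations.items()]
    return min(levels)  # min() because P1 has the smallest number


if __name__ == "__main__":
    # A small subset of users is affected, but core functionality is degraded.
    print(classify(user_impact="small", core_function="degraded").name)
```

Taking the most severe dimension is one possible policy. A team could instead require several dimensions to agree before assigning P1, which trades faster escalation for less severity inflation.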

Response Expectations by Severity

Each severity level should have clear, documented expectations for response time, staffing, and communication:

P1 — Immediate Response

  • Acknowledge: Within 5 minutes
  • Staffing: All available engineers, incident commander assigned
  • Communication: Status page updated immediately, stakeholder notification within 15 minutes
  • Updates: Every 15-30 minutes until resolved

P2 — Urgent Response

  • Acknowledge: Within 15 minutes
  • Staffing: Dedicated on-call engineer, escalate if needed
  • Communication: Status page update if user-visible, internal notification
  • Updates: Every 30-60 minutes until resolved

P3 — Standard Response

  • Acknowledge: Within 1 hour
  • Staffing: Next available engineer during business hours
  • Communication: Internal ticket, no public communication required
  • Resolution: Within 1-3 business days

P4 — Scheduled Response

  • Acknowledge: Within 1 business day
  • Staffing: Backlog prioritization, normal sprint work
  • Communication: Internal tracking only
  • Resolution: Next sprint or scheduled maintenance window
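
Documented expectations are easier to enforce when they also exist as data. As a rough sketch, the acknowledgement targets listed above can be stored in a table and checked against an incident's age. The names ACK_WITHIN and ack_overdue are hypothetical, and the P4 target is simplified to 24 calendar hours because the sketch does not model business days.

```python
from datetime import datetime, timedelta

# Acknowledgement targets from the expectations above.
# Assumption: P4's "1 business day" is approximated as 24 calendar hours.
ACK_WITHIN = {
    "P1": timedelta(minutes=5),
    "P2": timedelta(minutes=15),
    "P3": timedelta(hours=1),
    "P4": timedelta(days=1),
}


def ack_overdue(severity: str, opened_at: datetime, now: datetime) -> bool:
    """Return True if an unacknowledged incident has passed its target."""
    return now - opened_at > ACK_WITHIN[severity]


if __name__ == "__main__":
    opened = datetime(2024, 1, 1, 12, 0)
    six_minutes_later = opened + timedelta(minutes=6)
    print(ack_overdue("P1", opened, six_minutes_later))  # past the 5-minute target
    print(ack_overdue("P2", opened, six_minutes_later))  # still inside 15 minutes
```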

Escalation Policies by Severity

Escalation ensures incidents get the right attention when initial responders are unavailable or the situation worsens:

P1 Escalation Path

  • 0 min: Primary on-call
  • 5 min: Secondary on-call
  • 15 min: Engineering manager
  • 30 min: VP Engineering

P2 Escalation Path

  • 0 min: Primary on-call
  • 15 min: Secondary on-call
  • 60 min: Engineering manager

P3/P4 Escalation

Typically no time-based escalation. Handled through normal backlog prioritization. Escalate manually if impact increases.
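
The escalation paths above amount to a time-ordered list of contacts per severity. A minimal sketch of how a paging tool might evaluate them is shown below. The ESCALATION table and the paged_so_far function are hypothetical names for this example, and the sketch assumes the timer measures minutes since the incident opened without an acknowledgement.

```python
# (minutes without acknowledgement, who gets paged), taken from the paths above.
ESCALATION = {
    "P1": [(0, "Primary on-call"), (5, "Secondary on-call"),
           (15, "Engineering manager"), (30, "VP Engineering")],
    "P2": [(0, "Primary on-call"), (15, "Secondary on-call"),
           (60, "Engineering manager")],
    # P3 and P4 have no time-based escalation, so they have no entry here.
}


def paged_so_far(severity: str, minutes_unacknowledged: int) -> list[str]:
    """Return everyone who should have been paged by this point."""
    steps = ESCALATION.get(severity, [])
    return [who for at, who in steps if minutes_unacknowledged >= at]


if __name__ == "__main__":
    print(paged_so_far("P1", 20))
    print(paged_so_far("P3", 120))
```

Returning an empty list for P3 and P4 reflects the manual escalation described above: nobody is paged automatically, and a responder raises the severity if impact increases.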

Severity and SLA Implications

Incident severity directly affects your SLA compliance. Understanding this relationship helps you allocate resources appropriately:

Error Budget Consumption

A 99.9% SLA allows approximately 43 minutes of downtime in a 30-day month. A single P1 incident lasting 30 minutes consumes roughly 70% of that monthly error budget. This is why P1 response must be immediate: every minute of a critical outage has an outsized impact on your SLA compliance.
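
The arithmetic behind these figures is short enough to sketch. The function names below are hypothetical, and the calculation assumes a 30-day month, which gives 43.2 minutes of allowed downtime at 99.9% and about 69% of the budget consumed by a 30-minute outage.

```python
def downtime_budget_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime allowed in the period at the given SLA target."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


def budget_consumed(outage_minutes: float, sla_percent: float, days: int = 30) -> float:
    """Fraction of the period's error budget used by a single outage."""
    return outage_minutes / downtime_budget_minutes(sla_percent, days)


if __name__ == "__main__":
    print(round(downtime_budget_minutes(99.9), 1))   # 43.2
    print(round(budget_consumed(30, 99.9) * 100))    # 69
```

Running the same functions with a stricter target such as 99.99% shows how quickly the budget shrinks: the allowance drops to about 4.3 minutes per 30-day month.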

P3 and P4 incidents typically do not affect SLA metrics because they do not cause user-facing downtime. However, accumulated P3 issues can signal systemic problems that may lead to P1 incidents if left unaddressed.

SLA Credits and Severity

Many SLAs tie credits to severity. For example, a P1 outage exceeding the SLA window might trigger a 10% credit, while a P2 might trigger 5%. Proper severity classification at the start of an incident helps your customer success team communicate accurately with affected customers and manage credit expectations.

Frequently Asked Questions

What is the difference between SEV and P levels?
SEV (Severity) and P (Priority) are often used interchangeably, but they have distinct meanings. Severity describes the technical impact of an incident (how broken is it?). Priority describes the business urgency (how quickly must we fix it?). A SEV1 incident with a workaround might be P2 priority if the workaround is acceptable short-term. Most teams simplify by using a single scale (P1-P4 or SEV1-SEV4) that combines both dimensions.

How many severity levels should we have?
Most organizations use 3 to 5 levels. Four levels (P1 through P4) is the most common approach. Fewer than 3 levels makes it hard to differentiate between a total outage and a minor bug. More than 5 levels creates confusion about which level to assign. Start with 4 and adjust based on your team's needs.

Who decides the severity level of an incident?
Typically, the first responder assigns an initial severity based on predefined criteria (impact scope, user count, revenue impact). This can be adjusted during the incident as more information becomes available. For example, an incident initially classified as P3 might be upgraded to P1 if the scope of impact turns out to be larger than first assessed.

Should severity levels map to response time SLAs?
Yes. Each severity level should have defined response expectations. For example: P1, acknowledge within 5 minutes with an all-hands response; P2, acknowledge within 15 minutes with a dedicated responder; P3, acknowledge within 1 hour with the next available engineer; P4, acknowledge within 1 business day. These expectations should be documented and agreed upon by the team.

How do severity levels affect status page communication?
Higher severity incidents typically warrant public status page updates. P1 incidents should trigger an immediate status page update notifying users of the issue. P2 may warrant an update depending on user visibility. P3 and P4 are usually handled internally without public communication unless they affect a specific subset of users who should be informed.

Can severity change during an incident?
Absolutely. Severity should be re-evaluated as new information emerges. A P3 incident might escalate to P1 if you discover it affects more users than initially thought. Conversely, a P1 might be downgraded to P2 if a workaround is found that restores partial service. Document severity changes in your incident timeline.

What happens if every incident is classified as P1?
This is a common anti-pattern called severity inflation. When everything is P1, nothing is truly urgent. Teams experience alert fatigue and response times degrade across the board. Combat this by establishing clear, objective criteria for each level (e.g., P1 requires >50% of users impacted AND revenue-impacting) and reviewing classifications in post-incident reviews.

