
SLA vs SLO vs SLI: What's the Difference?

Clear explanation of SLA, SLO, and SLI with practical examples. Learn how to set service level objectives, measure indicators, and define agreements that actually work.

2026-03-25 · 8 min · Guide

"Our SLA is 99.9%". Anyone working with cloud services hears this phrase. But ask "How is your SLA different from your SLO?" and confusion sets in. SLA, SLO, SLI — three acronyms that sound similar but mean fundamentally different things. Confusing them means either promising customers what you can't deliver, or not measuring what you should.

Below: each term explained with real examples, how the three relate to each other, and how to use them in practice — from a startup with a single API to an enterprise platform with dozens of microservices.

SLI — Service Level Indicator: what you measure

SLI is a concrete metric you collect. Not an abstraction, not a target, but a number on your dashboard. SLI answers: "How do we objectively measure the quality of this service?"

Availability SLI: Percentage of successful HTTP requests. If out of 1,000,000 requests, 999,100 return 2xx/3xx and 900 return 5xx — availability SLI = 99.91%. HTTP monitoring measures this continuously.

Latency SLI: Response time at a specific percentile. p50 = 120ms means 50% of requests complete faster than 120ms. p99 = 800ms means 99% complete faster than 800ms. Response time tracking provides this data.

Freshness SLI: How current is the data. If a dashboard updates every second but data lags 30 seconds — freshness SLI = 30s. Critical for real-time systems.

Correctness SLI: Percentage of requests that return the correct answer. If a search API returns relevant results 98% of the time — correctness SLI = 98%.

Key rule: SLI should be measured from the user's perspective, not the server's. Internal CPU usage is not an SLI. "Percentage of requests completed faster than 500ms, measured from multiple regions" — that's an SLI.
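The SLIs above can be sketched in a few lines. This is a minimal illustration with a hypothetical `Request` record, not a production implementation — real SLIs would be computed continuously from externally collected probe data:

```python
# Sketch: computing availability and latency SLIs from a batch of
# request records. The Request shape is a hypothetical example.
from dataclasses import dataclass

@dataclass
class Request:
    status: int        # HTTP status code
    latency_ms: float  # measured from the client side, not the server

def availability_sli(requests: list[Request]) -> float:
    """Fraction of requests that returned a non-5xx response."""
    ok = sum(1 for r in requests if r.status < 500)
    return ok / len(requests)

def latency_sli(requests: list[Request], percentile: float) -> float:
    """Latency at the given percentile (e.g. 0.99 for p99)."""
    latencies = sorted(r.latency_ms for r in requests)
    idx = min(int(percentile * len(latencies)), len(latencies) - 1)
    return latencies[idx]

requests = [Request(200, 120.0)] * 999 + [Request(500, 50.0)]
print(f"availability = {availability_sli(requests):.2%}")  # 99.90%
print(f"p99 latency  = {latency_sli(requests, 0.99)} ms")  # 120.0 ms
```

Note that both functions take client-side measurements as input, matching the key rule: the SLI is defined from the user's perspective.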

SLO — Service Level Objective: what you aim for

SLO is a target value for your SLI. An internal team commitment: "We aim for 99.95% availability over a rolling 30 days". SLO has no legal force and implies no penalties — it's your internal quality standard.

Example SLO for API: "99.9% of HTTP requests to /api/v1/* must return 2xx in less than 500ms, measured over a rolling 30 days".

Example SLO for status page: "The status page must be available 99.99% of the time with response time under 200ms".

Example SLO for cron job: "The nightly backup job must complete successfully in 99.5% of runs, execution time no more than 2 hours". Cron job monitoring via heartbeat checks tracks execution.
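The cron-job SLO above has two conditions — success and duration — and a run either satisfies both or consumes the budget. A minimal sketch with illustrative run records:

```python
# Sketch: evaluating the cron-job SLO above against a run history.
# The run records are illustrative, not real data.
runs = [{"success": True, "duration_h": 1.5}] * 199 + \
       [{"success": False, "duration_h": 0.2}]

# A run counts toward the SLO only if it succeeded AND finished in time.
within_slo = sum(1 for r in runs if r["success"] and r["duration_h"] <= 2.0)
success_rate = within_slo / len(runs)

print(f"success rate = {success_rate:.1%}")  # 99.5%
print("SLO met" if success_rate >= 0.995 else "SLO breached")
```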

How to choose the right SLO

A common mistake is setting SLO "as high as possible": 99.999%. This is a trap. Each additional nine exponentially increases cost and complexity. The difference between 99.9% and 99.99% is the difference between "you can deploy once a day" and "you need blue-green deployment with zero-downtime migrations".

SLO    | Downtime / month | Downtime / year | Good for
99.0%  | 7.3 hours        | 3.65 days       | Staging, internal tools
99.5%  | 3.65 hours       | 1.83 days       | Non-critical B2B SaaS
99.9%  | 43.8 min         | 8.76 hours      | Production APIs, SaaS
99.95% | 21.9 min         | 4.38 hours      | Payment systems, auth
99.99% | 4.38 min         | 52.6 min        | Infrastructure, DNS, CDN

Use the uptime calculator for precise calculations of allowed downtime at different SLOs.
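The arithmetic behind the table is simple: allowed downtime is the complement of the SLO multiplied by the window length. A sketch using an average month of 730 hours (365 days / 12):

```python
# Sketch: allowed downtime per month for a given SLO.
# Uses the average month (365 * 24 / 12 = 730 hours), matching the table above.
def allowed_downtime_minutes(slo: float,
                             month_minutes: float = 365 * 24 * 60 / 12) -> float:
    """Minutes of downtime permitted per month at the given SLO."""
    return (1 - slo) * month_minutes

for slo in (0.99, 0.995, 0.999, 0.9995, 0.9999):
    print(f"{slo:.2%}: {allowed_downtime_minutes(slo):.1f} min/month")
```

For example, 99.9% yields 43.8 minutes per month — the same figure as in the table.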

SLA — Service Level Agreement: what you promise

SLA is a contract. A legally binding document that tells customers: "We guarantee level X. If we don't deliver, here's the compensation." SLA transforms your SLO from an internal goal into an external commitment with financial consequences.

Typical SLA structure:

1. Service definition — exactly what's covered (API, dashboard, all endpoints?).

2. Metric and threshold — "99.9% monthly uptime" (this is SLI + SLO within the SLA).

3. Measurement method — how uptime is calculated (by minutes? by requests? is maintenance excluded?).

4. Compensation — service credits for violations (typically 10-30% of monthly bill).

5. Exceptions — what doesn't count as a violation (planned maintenance, force majeure, customer-caused issues).

Golden rule: SLA < SLO < SLI target

Your SLA should be lower than your SLO, and your SLO lower than what you actually achieve. If your measured uptime is 99.95%, your SLO should be 99.9%, and your SLA should be 99.5%. This gives you a buffer:

SLI (actual): 99.95% — what you actually achieve.

SLO (target): 99.9% — your internal goal. SLO breach = error budget consumed, freeze risky changes.

SLA (promise): 99.5% — what you promise customers. SLA breach = financial compensation, reputational damage.

Error budget: turning reliability into a resource

Error budget is a concept from Google SRE that changes how we talk about reliability. Instead of "we must be more reliable" — "we have a specific budget for unreliability, and we decide how to spend it".

If SLO = 99.9% monthly, error budget = 0.1% = 43 minutes of downtime per month. Those 43 minutes are a resource you allocate:

Deployments: Each deployment carries risk. If the average deployment causes 2 minutes of degradation, you can deploy ~20 times per month while staying in budget.

Experiments: An A/B test that could break 1% of requests for 10 minutes = 6 seconds from error budget. Acceptable.

Migrations: A database migration with 5 minutes of planned downtime = 11.6% of error budget. Plan ahead.

When budget runs out: Freeze all non-critical changes. Focus only on stability. Automate this with burn rate monitoring.
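The allocation above can be tracked programmatically. A minimal sketch using the numbers from this section — the incident list and the 10-days-elapsed assumption are illustrative:

```python
# Sketch: tracking error-budget consumption against a 99.9% monthly SLO.
SLO = 0.999
MONTH_MINUTES = 365 * 24 * 60 / 12           # ~43,800 min in an average month
budget_minutes = (1 - SLO) * MONTH_MINUTES   # ~43.8 min of allowed downtime

# Illustrative incidents, each costing some equivalent full-downtime minutes.
incidents = [
    ("deploy degradation", 2.0),
    ("A/B test blast radius", 0.1),   # 1% of traffic for 10 min ~= 6 s
    ("db migration window", 5.0),
]

spent = sum(minutes for _, minutes in incidents)
remaining = budget_minutes - spent
print(f"budget: {budget_minutes:.1f} min, "
      f"spent: {spent:.1f}, remaining: {remaining:.1f}")

# Burn rate: consumption relative to the steady pace that would exactly
# exhaust the budget by month's end. Assume 10 days into the month.
elapsed_fraction = 10 / 30.42
burn_rate = (spent / budget_minutes) / elapsed_fraction
if burn_rate > 1.0:
    print("Burning budget faster than sustainable: freeze risky changes.")
```

A burn rate above 1.0 means the budget will be exhausted before the window ends, which is the trigger for freezing non-critical changes.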

How to monitor SLI/SLO in practice

Theory without tools is useless. Here's a concrete setup for monitoring your SLIs and tracking SLOs:

Availability SLI: Configure HTTP monitoring for all critical endpoints. AtomPing calculates uptime automatically and generates SLA reports with daily breakdown.

Latency SLI: Response time monitoring from multiple regions. Check not just the average but p95/p99 — the tail of the distribution is what impacts UX.

SSL SLI: SSL monitoring with alerts for approaching expiry. An expired certificate = 100% failure for HTTPS, instant SLO breach.

DNS SLI: DNS monitoring to verify domains resolve to the correct IPs. DNS failure = complete outage for all users.
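External checks like the ones above produce a stream of pass/fail results, which roll up into an uptime SLI. A minimal sketch assuming one check per minute over a 30-day month, with an illustrative outage:

```python
# Sketch: turning per-minute external check results into an uptime SLI
# and comparing it against the SLO. The outage figure is illustrative.
SLO = 0.999
checks_per_month = 30 * 24 * 60   # one external check per minute
failed_checks = 40                # e.g. one 40-minute outage

uptime_sli = 1 - failed_checks / checks_per_month
print(f"uptime SLI = {uptime_sli:.4%}")
print("SLO met" if uptime_sli >= SLO else "SLO breached")
```

One caveat: check frequency bounds your measurement resolution — with one check per minute, an outage shorter than a minute may not register at all.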

SLA, SLO, SLI — not bureaucracy. This is the language that engineers, product managers, and business can use to discuss reliability concretely: with numbers, budgets, and measurable consequences. Start small — pick 1-2 SLIs for your primary service, set a realistic SLO, and configure monitoring to measure them continuously.

FAQ

What is the difference between SLA, SLO, and SLI?

SLI (Service Level Indicator) is the actual metric you measure — for example, uptime percentage or response time. SLO (Service Level Objective) is your internal target for that metric — 'we aim for 99.9% uptime.' SLA (Service Level Agreement) is a contractual commitment to your customers with penalties if you fail — 'we guarantee 99.9% uptime, or we refund 10% of your monthly bill.' SLI is what you measure, SLO is what you target, SLA is what you promise.

Do I need all three: SLA, SLO, and SLI?

You always need SLIs — without measuring, you're flying blind. SLOs are strongly recommended for any production service, even internal ones — they create shared expectations across teams. SLAs are optional and typically used for external customers, enterprise contracts, or public-facing services where downtime has financial impact on customers.

What is an error budget?

An error budget is the amount of downtime your SLO allows. If your SLO is 99.9% monthly uptime, your error budget is 0.1% of the month — about 43 minutes. You can 'spend' this budget on deployments, experiments, or migrations. When the budget runs out, you freeze risky changes and focus on stability. Error budgets turn reliability from a vague goal into a concrete, trackable resource.

What SLO should I set for my service?

Start with what you can actually achieve based on historical data. If your measured uptime over the past 3 months is 99.5%, don't set an SLO of 99.99%. A good starting point: 99.9% for production APIs, 99.5% for internal tools, 99.0% for staging/dev environments. Set your SLO slightly below your actual performance to create a realistic error budget.

How do I monitor SLI metrics?

Uptime monitoring tools like AtomPing measure availability SLIs automatically — uptime percentage, response time, and error rates from multiple regions. For latency SLIs, track p50, p95, and p99 response times. For throughput SLIs, monitor requests per second. The key is measuring from the user's perspective (external monitoring), not just from your servers (internal metrics).

What happens when we violate our SLA?

Typical SLA penalties include service credits (5-30% of monthly bill), extended free service, or in extreme cases, contract termination rights for the customer. The real cost is often reputational — customers lose trust, and competitors use your outage in their marketing. This is why SLOs should be stricter than SLAs — you want to catch problems before they become SLA violations.
