What is an SLA (Service Level Agreement)?

An SLA is a legal contract between a service provider and a customer that defines performance commitments: the promise you make about how reliable and fast your service will be.

Definition

A Service Level Agreement (SLA) is a contractual commitment that defines the expected performance levels, availability, and support response times for a service. It specifies metrics like uptime percentage and response time thresholds, plus the remedies owed if targets aren't met.

SLAs are binding agreements: if a provider breaches an SLA, customers are entitled to the remedies it specifies, typically service credits, refunds, or other compensation.

Why SLAs Matter

SLAs serve three critical purposes:

Trust Building

Customers need confidence that your service is reliable. An SLA backed by compensation creates accountability.

Risk Mitigation

Credits compensate customers for downtime impact. This protects customer business and reduces disputes.

Engineering Focus

SLA commitments drive reliability investments. Teams prioritize uptime when they have public targets to meet.

Common SLA Metrics

Most SLAs include these key metrics:

1. Uptime Percentage

The percentage of time your service is available. Example: "99.9% uptime" means the service can be down for about 43 minutes in a 30-day month. This is usually the headline metric.
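Checking a month against an uptime target only needs the total downtime and the length of the period. A minimal sketch, assuming a 30-day month, outage durations you have already measured, and non-overlapping outage windows; `uptime_percentage` is a hypothetical helper, not part of any monitoring product's API:

```python
from datetime import timedelta


def uptime_percentage(outages: list[timedelta], period: timedelta) -> float:
    """Percentage of `period` the service was available.

    Assumes outage windows do not overlap; overlapping records would be
    double-counted.
    """
    downtime = sum(outages, timedelta())
    # Clamp so oversized outage records cannot push uptime below 0%.
    downtime = min(downtime, period)
    return 100.0 * (1 - downtime / period)


# Two outages (20 and 15 minutes) in a 30-day month.
month = timedelta(days=30)
print(f"{uptime_percentage([timedelta(minutes=20), timedelta(minutes=15)], month):.3f}%")
```

35 minutes of downtime in a 30-day month works out to about 99.919%, just inside a 99.9% target.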

2. Response Time (Latency)

How long requests take to complete, usually stated as a percentile threshold rather than an absolute maximum. Example: "95% of requests respond within 500ms". This ensures users get consistently fast responses.
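A percentile target like "95% of requests within 500ms" can be checked by sorting the measured latencies and reading off the 95th percentile. This sketch assumes the nearest-rank percentile method; real monitoring tools may interpolate between samples instead, and the function names are illustrative:

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_latency_target(latencies_ms: list[float], pct: float = 95.0,
                         threshold_ms: float = 500.0) -> bool:
    """True if the pct-th percentile latency is within the threshold."""
    return percentile(latencies_ms, pct) <= threshold_ms


# 100 requests: 95 fast, 5 slow. The 95th percentile is still 120 ms.
latencies = [120.0] * 95 + [900.0] * 5
print(meets_latency_target(latencies))  # True
```

Note that the five slow requests do not break the target; a sixth one would.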

3. Mean Time to Recovery (MTTR)

The average time to restore service after an outage is detected. Contracts often state it as a per-incident commitment instead of an average. Example: "Restore service within 30 minutes of incident detection". This defines how quickly you commit to recovering.
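Because MTTR is an average while many contracts promise a per-incident limit, it helps to compute both. A minimal sketch, assuming each incident is recorded as a hypothetical (detected, restored) timestamp pair:

```python
from datetime import datetime, timedelta

Incident = tuple[datetime, datetime]  # (detected, restored)


def mean_time_to_recovery(incidents: list[Incident]) -> timedelta:
    """Average of (restored - detected) across incidents."""
    if not incidents:
        return timedelta(0)
    total = sum((restored - detected for detected, restored in incidents), timedelta())
    return total / len(incidents)


def incidents_over_target(incidents: list[Incident],
                          target: timedelta = timedelta(minutes=30)) -> list[Incident]:
    """Incidents whose individual recovery time exceeded the per-incident target."""
    return [(d, r) for d, r in incidents if r - d > target]


incidents = [
    (datetime(2024, 5, 2, 10, 0), datetime(2024, 5, 2, 10, 20)),  # 20 min
    (datetime(2024, 5, 9, 14, 0), datetime(2024, 5, 9, 14, 40)),  # 40 min
]
print(mean_time_to_recovery(incidents))       # 0:30:00
print(len(incidents_over_target(incidents)))  # 1
```

An MTTR of exactly 30 minutes can still hide an incident that broke a 30-minute per-incident commitment, which is why the two checks are kept separate.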

4. Support Response Time

How fast your support team responds to issues. Example: "Critical issues acknowledged within 15 minutes". This is separate from technical MTTR.
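Support response targets are usually checked ticket by ticket against a per-severity limit. A small sketch, assuming two hypothetical severity labels whose limits mirror this article's examples (15 minutes for critical issues, 24 hours for non-critical ones):

```python
from datetime import datetime, timedelta

# Hypothetical acknowledgement targets per severity; the values mirror the
# examples in this article (15 minutes for critical, 24 hours otherwise).
ACK_TARGETS = {
    "critical": timedelta(minutes=15),
    "non-critical": timedelta(hours=24),
}


def acknowledged_in_time(severity: str, opened: datetime, acknowledged: datetime) -> bool:
    """True if the ticket was acknowledged within its severity's target."""
    return acknowledged - opened <= ACK_TARGETS[severity]


opened = datetime(2024, 5, 2, 9, 0)
print(acknowledged_in_time("critical", opened, datetime(2024, 5, 2, 9, 12)))  # True
print(acknowledged_in_time("critical", opened, datetime(2024, 5, 2, 9, 20)))  # False
```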

5. Error Rate

Percentage of requests that fail. Example: "Error rate shall not exceed 0.1%". This ensures quality, not just availability.
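The error rate is simply failed requests over total requests. What counts as "failed" (for example, server errors but not client errors) must be defined in the SLA itself; this sketch assumes the counts are already classified, and the function names are illustrative:

```python
def error_rate_pct(total_requests: int, failed_requests: int) -> float:
    """Failed requests as a percentage of all requests (0.0 if there was no traffic)."""
    if total_requests == 0:
        return 0.0
    return 100.0 * failed_requests / total_requests


def meets_error_rate_target(total_requests: int, failed_requests: int,
                            max_pct: float = 0.1) -> bool:
    """True if the error rate does not exceed max_pct percent."""
    return error_rate_pct(total_requests, failed_requests) <= max_pct


print(meets_error_rate_target(1_000_000, 1_000))  # True: exactly 0.1%
print(meets_error_rate_target(1_000_000, 1_500))  # False: 0.15%
```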

6. Data Backup & Recovery

Frequency of backups and time to restore data. Example: "Daily backups, restore within 1 hour of request". Critical for data-sensitive services.
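A "daily backups, restore within 1 hour" term can be checked by looking for gaps longer than a day between backups and for slow restores. A sketch under simplifying assumptions: the reporting period's start and end are treated as boundaries (a backup taken just before the period is ignored), and restore durations are already measured; all names are hypothetical:

```python
from datetime import datetime, timedelta


def longest_backup_gap(backups: list[datetime], start: datetime, end: datetime) -> timedelta:
    """Longest interval in [start, end] between consecutive backups,
    treating the period's start and end as boundaries."""
    points = [start] + sorted(b for b in backups if start <= b <= end) + [end]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def backup_terms_met(backups: list[datetime], start: datetime, end: datetime,
                     restore_durations: list[timedelta],
                     max_gap: timedelta = timedelta(days=1),
                     max_restore: timedelta = timedelta(hours=1)) -> bool:
    """No gap longer than max_gap, and every restore finished within max_restore."""
    daily_ok = longest_backup_gap(backups, start, end) <= max_gap
    restores_ok = all(d <= max_restore for d in restore_durations)
    return daily_ok and restores_ok


start, end = datetime(2024, 5, 1), datetime(2024, 5, 4)
backups = [datetime(2024, 5, d, 2, 0) for d in (1, 2, 3)]
print(backup_terms_met(backups, start, end, [timedelta(minutes=40)]))  # True
```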

Common SLA Tiers & Downtime Allowances

Here are the most common SLA tiers you'll see from cloud providers and SaaS companies:

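| SLA Tier | Uptime % | Downtime/Month (30 days) | Downtime/Year (365 days) | Typical Credit |
| --- | --- | --- | --- | --- |
| Basic | 99% | 7 hours 12 min | 3.65 days | 5-10% |
| Standard | 99.9% | 43 minutes | 8 hours 45 min | 10-25% |
| Premium | 99.95% | 21 minutes 36 sec | 4 hours 22 min | 25-50% |
| Enterprise | 99.99% | 4 minutes 19 sec | 52 minutes | 50-100% |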
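The downtime columns follow directly from the uptime percentage. This sketch assumes a 30-day month and a 365-day year, which is how the table's figures work out; `downtime_budget` is an illustrative name:

```python
from datetime import timedelta

MONTH = timedelta(days=30)
YEAR = timedelta(days=365)


def downtime_budget(uptime_pct: float, period: timedelta) -> timedelta:
    """Maximum downtime allowed in `period` at the given uptime percentage."""
    return period * (1 - uptime_pct / 100)


if __name__ == "__main__":
    for tier in (99.0, 99.9, 99.95, 99.99):
        monthly = downtime_budget(tier, MONTH)
        yearly = downtime_budget(tier, YEAR)
        print(f"{tier}%: {monthly} per month, {yearly} per year")
```

Each extra nine divides the budget by ten: 99.9% allows about 43 minutes a month, 99.99% only about 4.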

Key insight: Many mature SaaS companies commit to 99.9%, as the Stripe and GitHub examples below do. 99.99% is typically reserved for enterprise services with redundancy across regions. Each additional nine cuts the allowed downtime by a factor of ten, so the cost of meeting the target rises steeply.

Components of a Good SLA

A comprehensive SLA should include:

1. Clear Service Description

What is covered? What isn't? Example: "Uptime for API endpoints only; web dashboard excluded from uptime guarantee".

2. Specific Metrics

Use measurable numbers: "99.9% uptime" not "best-effort uptime". Include measurement methodology: "measured from 5+ external locations".

3. Exclusions

Define which incidents don't count as breaches: scheduled maintenance, force majeure, customer misuse, third-party failures, and so on.

4. Credit Terms

How much credit or refund applies at each level of breach? Example: "99%-99.9% uptime = 5% credit; 98%-99% = 10% credit; <98% = 25% credit".
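A tiered credit schedule like the example above maps measured uptime to a credit percentage. The example leaves the band boundaries ambiguous (does exactly 99% earn 5% or 10%?), so this sketch assumes each band's lower bound is inclusive; the schedule table and function names are hypothetical:

```python
# Hypothetical credit schedule from the example above, as
# (minimum uptime %, credit %) pairs checked from the top down.
# Assumption: each band's lower bound is inclusive, so exactly 99.0% earns 5%.
CREDIT_SCHEDULE = [
    (99.9, 0),   # target met: no credit
    (99.0, 5),
    (98.0, 10),
]
FLOOR_CREDIT_PCT = 25  # anything below 98%


def service_credit_pct(uptime_pct: float) -> int:
    """Credit percentage owed for a month with the given uptime."""
    for min_uptime, credit in CREDIT_SCHEDULE:
        if uptime_pct >= min_uptime:
            return credit
    return FLOOR_CREDIT_PCT


def credit_amount(monthly_fee: float, uptime_pct: float) -> float:
    """Credit in currency units for one month's fee."""
    return monthly_fee * service_credit_pct(uptime_pct) / 100


print(credit_amount(1000.0, 98.7))  # 100.0
```

Whatever the boundary rule, writing it down explicitly in the SLA avoids disputes at the edges.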

5. Support Terms

Separate support SLA from technical SLA. Example: "Critical issues supported 24/7 with 15-min response; non-critical within 24 hours".

6. Claim Process

How do customers request credits? Within what timeframe? Example: "Customers must request credits within 30 days of breach with documentation".

Real-World SLA Examples

AWS (Amazon Web Services)

99.99% region-level uptime SLA for EC2; other services such as S3 and RDS publish their own SLAs with their own targets. These cover AWS infrastructure only, not your code.

Credit: 10% (99%-99.99%), 30% (95%-99%), 100% (<95%). Measured per region.

Stripe (Payment Processing)

99.9% uptime guarantee with transparent uptime tracking. Reported actual uptime is around 99.97%.

Credits are available but rarely needed, since Stripe's actual uptime has exceeded its commitment.

GitHub (Version Control)

99.9% uptime for platform-critical features. Excludes scheduled maintenance windows.

Measured monthly. Support SLA: 1-hour response for critical issues.

Google Workspace (Enterprise)

99.95% uptime for Gmail, Drive, Docs. Different tiers for enterprise vs. consumer.

Credit structure: 5% (99%-99.95%), 10% (95%-99%), 25% (<95%).

How to Monitor Your SLA Compliance

Self-hosted monitoring has a blind spot: when your infrastructure goes down, your monitoring system may go down with it. Here's how to verify SLA compliance independently:

Use Third-Party External Monitoring

Services like AtomPing check your service from multiple regions. This provides independent, credible proof of your uptime for customer disputes.
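Checking from several locations also lets a monitor tell a real outage apart from one region's network problem. One common approach, sketched here as an assumption rather than a description of how AtomPing decides, is to count a check round as down only when a majority of regions fail. The region names, the 0.5 quorum, and the 30-second round length are illustrative:

```python
def is_down(region_results: dict[str, bool], quorum: float = 0.5) -> bool:
    """A check round counts as down only if more than `quorum` of regions failed.

    region_results maps a region name to True (check passed) or False (failed).
    """
    if not region_results:
        raise ValueError("need results from at least one region")
    failures = sum(1 for ok in region_results.values() if not ok)
    return failures / len(region_results) > quorum


def downtime_seconds(rounds: list[dict[str, bool]], interval_s: int = 30) -> int:
    """Total downtime, assuming each check round covers `interval_s` seconds."""
    return interval_s * sum(1 for r in rounds if is_down(r))


rounds = [
    {"us-east": True, "eu-west": True, "ap-south": True},
    {"us-east": False, "eu-west": True, "ap-south": True},   # one region's blip
    {"us-east": False, "eu-west": False, "ap-south": True},  # majority failing
]
print(downtime_seconds(rounds))  # 30
```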

Benefits:

  • Credibility: If customers dispute uptime, third-party data is more defensible
  • Visibility: See outages from customer perspective, not just internal metrics
  • Automation: Auto-generate SLA reports for customers and finance
  • Historical data: Track trends and improve SLA targets

AtomPing's SLA Monitoring Features

  • Check service from 25+ global locations every 30 seconds
  • Generate automated SLA compliance reports
  • Create public status pages showing real-time uptime
  • Export uptime data for audits and customer disputes
  • Track response time and error rates, not just uptime

Frequently Asked Questions

What happens if a company fails to meet its SLA?
If a company breaches its SLA, customers are typically eligible for service credits (discounts on future bills or refunds). For example, under the EC2 terms above, if AWS commits to 99.99% uptime and achieves only 99.95% in a month, customers can claim a 10% credit on that month's bill, usually by submitting a claim. This gives providers a financial incentive to maintain reliability.
Is an SLA legally binding?
Generally yes, when the SLA is part of the signed contract or terms of service. If a provider breaches it, customers can claim the remedies it specifies, usually service credits. However, many SLAs state that credits are the sole remedy, and exclusions (like force majeure or scheduled maintenance) further limit liability.
Who should have an SLA?
Any service-based business should have an SLA. This includes SaaS companies (Slack, Stripe, GitHub), cloud providers (AWS, Azure, Google Cloud), hosting providers, and managed services. Even startups should publish SLAs to build customer trust.
What's the difference between SLA and SLO?
SLA (Service Level Agreement) is a contract promising specific metrics to customers. SLO (Service Level Objective) is an internal target your team commits to. For example, the SLA promises customers 99.9% uptime while the internal SLO is 99.95%, leaving a safety margin before a miss turns into a contract breach.
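The margin between an SLO and an SLA can be tracked as what SRE practice calls an error budget: the downtime each target allows, and how much of it remains. A minimal sketch using the 99.9% SLA and 99.95% SLO from this example and a hypothetical month at 99.93% observed uptime; `budget_remaining` is an illustrative name:

```python
def budget_remaining(target_pct: float, observed_uptime_pct: float) -> float:
    """Fraction of the target's downtime allowance still unused.

    1.0 means no downtime yet; 0.0 means the allowance is exactly used up;
    negative means the target was missed.
    """
    allowed = 100.0 - target_pct
    if allowed <= 0:
        raise ValueError("a 100% target leaves no downtime allowance")
    used = 100.0 - observed_uptime_pct
    return 1.0 - used / allowed


observed = 99.93
print(f"SLO (99.95%): {budget_remaining(99.95, observed):+.2f}")  # -0.40: internal goal missed
print(f"SLA (99.9%):  {budget_remaining(99.9, observed):+.2f}")   # +0.30: contract still met
```

A month like this one trips the internal alarm while the customer-facing promise still holds, which is the point of keeping the SLO stricter.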
Can I change my SLA after signing?
Generally only by mutual agreement with your customers. Some contracts let the provider update SLA terms after a notice period (e.g., 30 days); otherwise, changing the SLA mid-contract requires customer consent or goes through the contract's dispute resolution process.
How do I monitor my SLA compliance?
Use third-party monitoring services like AtomPing to track uptime independently. Your own monitoring is biased; external monitoring is more credible. AtomPing checks your service from 25+ regions and provides SLA compliance reports you can share with customers.

Deliver on Your SLA Promises

AtomPing provides the external monitoring and status page tools you need to monitor, track, and communicate your SLA compliance to customers.

