A health check endpoint is the monitoring entry point. If it returns 200, your service is considered alive. If 503 — dead. The design of this endpoint determines whether you catch a real outage in 30 seconds or miss a degradation that users notice first.
Most teams implement a minimal GET /health → 200 OK. Better than nothing, but only catches complete process death. Real health checks verify dependencies, distinguish liveness from readiness, and provide diagnostic information for fast troubleshooting.
Two Levels of Health Checks
Liveness: "Is the process alive?"
The liveness probe answers one question: is the application running and responding to requests? It doesn't check dependencies — only the process itself. If a liveness probe fails, Kubernetes restarts the container.
Endpoint: GET /health/live
What it checks: Process is running, HTTP server responds
Success response: 200 {"status": "ok"}
Fails when: Deadlock, out of memory, infinite loop, event loop blocked
Liveness should be as lightweight as possible. No database calls, cache hits, or external API requests. If your liveness probe depends on the database, its failure causes cascading container restarts — making things worse.
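As a sketch, a liveness handler can be little more than a constant response. The plain function below stands in for whatever framework wiring routes GET /health/live (the route name and the status/body tuple shape are assumptions for illustration):

```python
import json

def health_live():
    """Liveness handler: HTTP status and body for GET /health/live.

    Deliberately touches no dependencies: if this code runs at all,
    the process is alive and the server loop is handling requests.
    """
    return 200, json.dumps({"status": "ok"})
```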
Readiness: "Ready to handle traffic?"
The readiness probe answers: can this instance process a user request? It checks dependencies: database, cache, queues, external APIs. If readiness fails, the instance is removed from load balancing but not restarted.
Endpoint: GET /health/ready
What it checks: Database is accessible, Redis/cache responds, critical external APIs are reachable
Success response:
{
  "status": "healthy",
  "timestamp": "2026-03-26T10:30:00Z",
  "checks": {
    "database": {"status": "healthy", "latency_ms": 3},
    "redis": {"status": "healthy", "latency_ms": 1},
    "stripe_api": {"status": "healthy", "latency_ms": 45}
  }
}

Degradation response:
{
  "status": "degraded",
  "timestamp": "2026-03-26T10:30:00Z",
  "checks": {
    "database": {"status": "healthy", "latency_ms": 3},
    "redis": {"status": "unhealthy", "error": "connection timeout"},
    "stripe_api": {"status": "healthy", "latency_ms": 45}
  }
}

Designing Dependency Checks
Each dependency in a readiness probe should be checked independently with its own timeout. One slow check shouldn't block the entire endpoint.
Database check
# Python / Django
import time
from django.db import connection  # Django's default database connection

def check_database():
    try:
        start = time.monotonic()
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
        latency = (time.monotonic() - start) * 1000
        return {"status": "healthy", "latency_ms": round(latency)}
    except Exception as e:
        return {"status": "unhealthy", "error": str(e)}

Timeout: 3 seconds. SELECT 1 is the minimal query that exercises the connection pool, the network path to the database, and basic PostgreSQL/MySQL functionality.
Cache check (Redis/Memcached)
def check_redis():
    try:
        start = time.monotonic()
        redis_client.ping()  # redis_client: a configured redis.Redis instance
        latency = (time.monotonic() - start) * 1000
        return {"status": "healthy", "latency_ms": round(latency)}
    except Exception as e:
        return {"status": "unhealthy", "error": str(e)}

Timeout: 2 seconds. PING verifies the connection and authentication. If the cache is not critical (you can fall back to the database), its failure can be reported as degraded rather than unhealthy.
External API check
import requests

def check_stripe():
    try:
        start = time.monotonic()
        response = requests.get(
            "https://api.stripe.com/healthcheck",
            timeout=5
        )
        latency = (time.monotonic() - start) * 1000
        if response.status_code == 200:
            return {"status": "healthy", "latency_ms": round(latency)}
        return {"status": "degraded", "http_status": response.status_code}
    except requests.Timeout:
        return {"status": "unhealthy", "error": "timeout"}
    except Exception as e:
        return {"status": "unhealthy", "error": str(e)}

Caution: external API checks increase health endpoint latency and add a third-party dependency. Include only critical dependencies (payments, auth provider). Check non-critical ones asynchronously and cache the results for 30-60 seconds.
Classifying Dependencies: Critical vs Degraded
Not all dependencies matter equally. Database is critical: nothing works without it. Email service is degraded: app works, but emails don't send. Correct classification prevents false alarms from non-critical failures.
Critical (→ unhealthy, HTTP 503): Primary database, authentication service, core business logic dependencies
Degraded (→ degraded, HTTP 200): Cache (Redis/Memcached), email service, analytics, non-essential third-party APIs
Unchecked: CDN, logging service, metrics collection — their failure doesn't affect request processing ability
CRITICAL_DEPS = {"database"}  # example set; add auth and other must-have services

def get_overall_status(checks):
    if any(c["status"] == "unhealthy" for name, c in checks.items()
           if name in CRITICAL_DEPS):
        return "unhealthy", 503
    if any(c["status"] != "healthy" for c in checks.values()):
        return "degraded", 200
    return "healthy", 200

Response Format
Standardized response format simplifies integration with monitoring systems and enables JSON path assertions to verify specific fields.
{
  "status": "healthy",
  "version": "2.4.1",
  "uptime_seconds": 86420,
  "timestamp": "2026-03-26T10:30:00Z",
  "checks": {
    "database": {
      "status": "healthy",
      "latency_ms": 3,
      "type": "postgresql"
    },
    "redis": {
      "status": "healthy",
      "latency_ms": 1,
      "type": "redis"
    },
    "queue": {
      "status": "healthy",
      "latency_ms": 2,
      "pending_jobs": 142,
      "type": "celery"
    }
  }
}
In AtomPing you can configure an HTTP check with JSON path assertion $.status = healthy. This verifies not only that the endpoint responds 200, but that all dependencies are healthy. If the database fails, status changes to unhealthy, assertion fails, and monitoring creates an incident.
Kubernetes Probes: Configuration
In Kubernetes, liveness and readiness probes are configured in the pod manifest. Correct configuration balances detection speed with resilience to brief failures.
spec:
  containers:
  - name: api
    livenessProbe:
      httpGet:
        path: /health/live
        port: 8000
      initialDelaySeconds: 15
      periodSeconds: 10
      timeoutSeconds: 3
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 8000
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 2
    startupProbe:
      httpGet:
        path: /health/live
        port: 8000
      initialDelaySeconds: 0
      periodSeconds: 5
      failureThreshold: 30

startupProbe is a third probe type that gives the application time to initialize (up to 150 seconds in the example above: 5s period × 30 allowed failures). Until the startup probe passes, liveness and readiness probes don't run. Useful for applications with heavy startup (migrations, cache warming, ML model loading).
Common Mistakes
1. Liveness probe with dependency checks
Most common mistake: checking the database in liveness probe. Database fails → liveness fails → Kubernetes restarts all pods → pods start simultaneously → thundering herd on database → database never recovers. Liveness = process only. Dependencies = readiness only.
2. Missing timeout on individual checks
If one dependency check hangs for 30 seconds (e.g., DNS resolution timeout to external API), the entire health endpoint hangs. Kubernetes interprets this as failure. Solution: run checks in parallel with individual timeouts (2-5 seconds each).
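The solution can be sketched with concurrent.futures, which lets each check get its own deadline without blocking the others. The check functions and the per-check timeout values are the assumptions from the sections above:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Per-check deadlines in seconds, matching the timeouts suggested earlier.
CHECK_TIMEOUTS = {"database": 3, "redis": 2, "stripe_api": 5}

def run_checks(checks, timeouts=CHECK_TIMEOUTS, default_timeout=3):
    """Run all check functions in parallel, enforcing a deadline per check."""
    pool = ThreadPoolExecutor(max_workers=max(len(checks), 1))
    futures = {name: pool.submit(fn) for name, fn in checks.items()}
    results = {}
    for name, future in futures.items():
        try:
            results[name] = future.result(timeout=timeouts.get(name, default_timeout))
        except FutureTimeout:
            # The hung check thread keeps running; we just stop waiting for it.
            results[name] = {"status": "unhealthy", "error": "check timed out"}
    pool.shutdown(wait=False)  # don't block the endpoint on hung checks
    return results
```

Note that future.result(timeout=...) stops waiting but doesn't cancel the running check, which is why shutdown(wait=False) matters: the endpoint returns on time even if one thread is still stuck.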
3. Heavy health checks
Health endpoint is called every 10-30 seconds by dozens of consumers (Kubernetes, load balancer, external monitoring). If checks execute complex SQL queries or call 5 external APIs, they create noticeable load. Rule: health check endpoint should respond in 50-200ms, period.
4. Exposing secrets in health response
Never include connection strings, API keys, internal IP addresses, or table names in health check responses. Even if the endpoint is "internal only" — leaking one URL to logs exposes your infrastructure.
5. Single /health without liveness/readiness split
One GET /health endpoint forces a choice: check dependencies (risking cascading restarts) or skip them (missing degradation). Separating into /health/live and /health/ready solves this dilemma.
Advanced Patterns
Cached readiness
Instead of checking dependencies on every request, run a background task checking them every 10-15 seconds and caching results in memory. Health endpoint returns cached results instantly. Reduces load and prevents timeouts during high request frequency.
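One possible sketch: a daemon thread refreshes the results on a fixed interval while the endpoint serves the latest snapshot. The check functions and the 10-second interval are the assumptions from the paragraph above:

```python
import threading
import time

_cache = {"checks": {}, "updated_at": None}
_lock = threading.Lock()

def _refresh_loop(checks, interval):
    while True:
        results = {name: fn() for name, fn in checks.items()}
        with _lock:
            _cache["checks"] = results
            _cache["updated_at"] = time.time()
        time.sleep(interval)

def start_background_checks(checks, interval=10):
    """Start the refresh loop; daemon=True so it never blocks process shutdown."""
    threading.Thread(
        target=_refresh_loop, args=(checks, interval), daemon=True
    ).start()

def cached_readiness():
    """Health endpoint body: returns the latest snapshot, no checks run inline."""
    with _lock:
        return dict(_cache)
```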
Graceful degradation signaling
Instead of binary healthy/unhealthy, use three states: healthy (everything works), degraded (non-critical dependencies unavailable but service works), unhealthy (critical dependency unavailable). Monitoring can react differently to each: degraded = warning, unhealthy = critical alert.
Deep health vs shallow health
Two endpoints: /health (shallow — fast process check for load balancer, 1-5ms) and /health/deep (full dependency check for monitoring, 50-200ms). Load balancer uses shallow, external monitoring uses deep. This separates consumer needs.
Integration with External Monitoring
Health check endpoint is half the solution. The other half is external monitoring regularly polling this endpoint from different regions and alerting on problems.
Setup in AtomPing:
1. Create HTTP monitor with URL https://api.yourapp.com/health/ready
2. Add JSON path assertion: $.status equals healthy
3. Set response time threshold: 5000ms (health endpoint shouldn't be slow)
4. Interval: 30 seconds
5. Enable quorum confirmation to prevent false alarms
External monitoring checks what Kubernetes probes can't: internet reachability (DNS, routing, TLS), performance from user perspective, and full chain health (CDN → load balancer → app → database).
Checklist: Designing Health Check Endpoints
Architecture: Separate /health/live (liveness) and /health/ready (readiness) endpoints
Dependency checks: Each dependency checked with individual timeout (2-5s)
Classification: Dependencies separated into critical and non-critical
Response format: JSON with overall status, per-dependency status, latency, timestamp
Performance: Health endpoint responds in 50-200ms under normal conditions
Security: No secrets in response, liveness doesn't require auth
Monitoring: Endpoint polled by external monitoring with JSON path assertions
Related Articles
API Monitoring: Complete Guide — How to monitor REST API endpoints
Monitoring Microservices — Health checks in distributed systems
Internal vs External Monitoring — Why you need both approaches
How to Reduce False Alarms — Quorum confirmation and batch anomaly detection
FAQ
What is a health check endpoint?
A health check endpoint is a dedicated API route (typically /health or /healthz) that returns the current operational status of your application. It verifies that the app is running, its dependencies (database, cache, external APIs) are reachable, and critical subsystems function correctly. Monitoring tools poll this endpoint to detect outages.
Should I use /health or /healthz?
/health is more readable and widely understood. /healthz originated in Kubernetes (from Google's convention of appending 'z' to internal endpoints). Both work — pick one and be consistent. Kubernetes specifically supports both. If you're building a public API, /health is the more standard choice.
What should a health check endpoint return?
At minimum: HTTP 200 with a JSON body containing overall status and individual dependency checks (database, cache, queue). Include response time for each dependency. Return HTTP 503 when any critical dependency is unhealthy. Always include a timestamp. Optionally: version number, uptime duration, and region identifier.
Should health checks be authenticated?
The basic liveness endpoint (/health/live) should not require authentication — monitoring tools and load balancers need unauthenticated access. The detailed readiness endpoint (/health/ready) can optionally require authentication if it exposes internal architecture details. Never expose sensitive data (connection strings, credentials) in health check responses.
How often should monitoring tools poll health endpoints?
Every 30 seconds for production services with SLA commitments. Every 1-3 minutes for internal tools and staging environments. Every 5 minutes for non-critical services. The endpoint itself should respond within 5 seconds — if dependency checks take longer, implement timeouts and return partial status.
What's the difference between liveness and readiness probes?
A liveness probe checks 'is the process alive?' — if it fails, the container should be restarted. A readiness probe checks 'can this instance handle traffic?' — if it fails, the instance is removed from the load balancer but not restarted. Your app can be alive (liveness pass) but not ready (readiness fail) during startup or when a dependency is down.