
What is the Circuit Breaker Pattern?

The circuit breaker pattern is a design pattern that prevents cascading failures in distributed systems. It acts like an electrical circuit breaker: when failures exceed a threshold, it "trips" and stops sending requests to the failing service. This protects the rest of the system from being overwhelmed by requests to a service that can't handle them.

Definition

The circuit breaker pattern is a resilience pattern that wraps potentially failing operations (API calls, database queries, external service calls) with a state machine that monitors failures and stops calling the service when it becomes unhealthy. Instead of letting requests queue up and time out, the circuit breaker fails fast, immediately rejecting calls while the circuit is open.

The pattern gets its name from electrical circuit breakers, which automatically switch off power when detecting electrical overload, protecting the rest of the system from damage.

The Three States: Closed, Open, Half-Open

A circuit breaker operates as a state machine with three states:

Closed (Normal Operation)

The circuit is closed and operating normally. Calls flow through to the downstream service. The circuit breaker monitors for failures (exceptions, timeouts, 5xx responses).

Transition to Open: When failure count > threshold (e.g., 5 failures) OR failure rate > threshold (e.g., 50%) — immediately switch to Open.

Open (Failing, Calls Rejected)

The circuit is open. All calls immediately fail without being sent to the downstream service. Clients receive an error (exception or fallback response) instantly instead of waiting for timeout.

Why fail fast? A downstream service timing out at 30 seconds can exhaust thread pools. Failing immediately in milliseconds frees resources and prevents cascading overload.
Transition to Half-Open: After timeout expires (e.g., 60 seconds), switch to Half-Open to test if the service recovered.

Half-Open (Testing Recovery)

The circuit is half-open. A limited number of test requests (usually 1-3) are allowed through to check if the downstream service has recovered. This is the recovery path.

Transition to Closed: If the test requests succeed → the service has recovered → switch back to Closed.
Transition to Open: If the test requests fail → the service is still unhealthy → switch back to Open and wait longer.

State Transition Diagram

Here's how the circuit breaker moves between states:

CLOSED (normal)
   │
   │  failures exceed threshold
   ▼
OPEN (failing)
   │
   │  timeout expires
   ▼
HALF-OPEN (testing)
   ├── test requests succeed → back to CLOSED
   └── test requests fail    → back to OPEN

How Circuit Breaker Prevents Cascading Failures

Consider a system without circuit breakers:

Without Circuit Breaker (Cascading Failure)

  1. Payment service goes down
  2. Web server tries to call payment service, request times out (30 seconds)
  3. 1000 concurrent requests all timeout, blocking 1000 threads
  4. Web server runs out of threads, can't serve ANY requests (including health checks)
  5. Load balancer detects web server is unhealthy, removes it
  6. All traffic concentrated on remaining web servers, same problem cascades
  7. Entire system down despite only payment service failing

With Circuit Breaker (Isolated Failure)

  1. Payment service goes down
  2. Web server calls payment service, gets error
  3. After 5 errors, circuit breaker opens
  4. Future payment calls immediately rejected (fail-fast) in milliseconds
  5. Threads freed up, web server can serve other requests
  6. Web server shows degraded state (payment unavailable) but stays online
  7. Rest of system continues functioning
  8. When payment service recovers, circuit breaker closes and normal flow resumes

Key insight: Circuit breaker trades immediate failure of one service for protecting the entire system. Better to show "Payment unavailable, try again later" than to take down the entire website.

Configuration Parameters

Most circuit breaker implementations expose these configurable parameters:

Failure Threshold

Number of failures or failure rate before opening the circuit. Examples: "5 consecutive failures" or "50% of requests failing in last 100 requests".

failureThreshold: 5 errors or 50% error rate

Timeout Duration

How long to stay in Open state before attempting Half-Open. Too short: hammers recovering service. Too long: users see errors longer than necessary.

timeout: 30-60 seconds (configurable per service)

Half-Open Request Count

Number of test requests allowed in Half-Open state. 1 = conservative, 3-5 = more aggressive testing.

half_open_requests: 1-3 test requests

Success Threshold for Half-Open

How many test requests must succeed before closing the circuit. 1 = close on first success, 3 = require all 3 to succeed.

success_threshold: 1-3 successes

Fallback Response (Optional)

When circuit is open, return a fallback response instead of error. Example: return cached result, default value, or retry queue.

onOpen: return cached data or queued result
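Taken together, these knobs might appear in a configuration like the following. This is a hypothetical sketch, not the schema of any particular library; the parameter names simply mirror the ones described above.

```python
# Hypothetical circuit breaker configuration for a payment service.
# Names mirror the parameters above, not any specific library's schema.
payment_breaker_config = {
    "failure_threshold": 5,         # open after 5 consecutive failures...
    "failure_rate_threshold": 0.5,  # ...or when 50% of recent calls fail
    "open_timeout_seconds": 60,     # stay Open for 60s before Half-Open
    "half_open_requests": 2,        # allow 2 test requests in Half-Open
    "success_threshold": 2,         # both must succeed to close the circuit
    "fallback": "cached_result",    # what to return while the circuit is Open
}

print(payment_breaker_config["failure_threshold"])
```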

Example Implementation (Pseudocode)

Here's a simplified circuit breaker implementation:

import time

class CircuitBreakerOpenException(Exception):
    pass

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.state = "CLOSED"
        self.failure_threshold = failure_threshold
        self.failure_count = 0
        self.last_open_time = None
        self.timeout = timeout

    def call(self, func, *args):
        if self.state == "OPEN":
            if time.time() - self.last_open_time > self.timeout:
                self.state = "HALF-OPEN"  # timeout expired: allow a test request
            else:
                raise CircuitBreakerOpenException()
        try:
            result = func(*args)
            if self.state == "HALF-OPEN":
                self.state = "CLOSED"  # recovery succeeded
            self.failure_count = 0
            return result
        except Exception:
            self.failure_count += 1
            # A failed test request in HALF-OPEN reopens the circuit immediately
            if self.state == "HALF-OPEN" or self.failure_count >= self.failure_threshold:
                self.state = "OPEN"
                self.last_open_time = time.time()
            raise
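To make the state transitions concrete, the sketch below drives a minimal breaker through a full Closed → Open → Half-Open → Closed cycle. The logic is restated here so the example is self-contained, with the clock made injectable so the demo doesn't have to wait for a real timeout; the class and function names are illustrative.

```python
import time

class CircuitBreakerOpenException(Exception):
    pass

class CircuitBreaker:
    # Minimal breaker; clock is injectable so tests can fake the passage of time.
    def __init__(self, failure_threshold=5, timeout=60, clock=time.time):
        self.state = "CLOSED"
        self.failure_threshold = failure_threshold
        self.failure_count = 0
        self.last_open_time = None
        self.timeout = timeout
        self.clock = clock

    def call(self, func, *args):
        if self.state == "OPEN":
            if self.clock() - self.last_open_time > self.timeout:
                self.state = "HALF-OPEN"  # allow a test request through
            else:
                raise CircuitBreakerOpenException()
        try:
            result = func(*args)
            if self.state == "HALF-OPEN":
                self.state = "CLOSED"  # recovery succeeded
            self.failure_count = 0
            return result
        except Exception:
            self.failure_count += 1
            if self.state == "HALF-OPEN" or self.failure_count >= self.failure_threshold:
                self.state = "OPEN"
                self.last_open_time = self.clock()
            raise

# Drive the breaker through a full cycle with a fake clock.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=3, timeout=60, clock=lambda: now[0])

def failing():
    raise RuntimeError("payment service down")

def healthy():
    return "ok"

for _ in range(3):                 # three failures trip the breaker
    try:
        breaker.call(failing)
    except RuntimeError:
        pass
print(breaker.state)               # OPEN

try:
    breaker.call(healthy)          # rejected instantly while Open
except CircuitBreakerOpenException:
    print("rejected")

now[0] += 61                       # advance past the open timeout
print(breaker.call(healthy), breaker.state)  # ok CLOSED
```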

Circuit Breaker Libraries

Modern frameworks provide ready-made circuit breaker implementations:

Hystrix (Java) — Netflix's original implementation: monitoring, bulkheads, fallbacks. In maintenance mode but still widely deployed.
Resilience4j (Java) — Lightweight successor to Hystrix with a modern, functional API. Recommended for new Java projects.
Polly (.NET/C#) — Resilience library with circuit breaker, retry, timeout, and bulkhead policies. Fluent API.
pybreaker (Python) — Simple, decorator-based Python circuit breaker.
gobreaker and hystrix-go (Go) — Popular Go circuit breaker libraries; pair them with context timeouts.

Recommendation: Use a mature library instead of rolling your own. Circuit breaker logic is subtle (race conditions, state transitions, fallbacks) and battle-tested libraries handle edge cases better.

Related Resilience Patterns

Circuit breaker is one part of a resilience toolkit. Use together with:

Timeout

Set maximum wait time per request. Prevents indefinite hangs. Example: 5-second timeout on API calls.
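One way to enforce a per-request timeout in Python is to run the call on a worker thread and bound the wait. This is a sketch; the function name is illustrative, and the delays are shortened so the example runs quickly.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

def slow_api_call():
    time.sleep(0.2)   # stands in for a hung downstream call
    return "response"

executor = ThreadPoolExecutor(max_workers=4)
future = executor.submit(slow_api_call)
try:
    result = future.result(timeout=0.05)  # give up after 50 ms
except TimeoutError:
    result = None                          # treat as a failure, don't hang
executor.shutdown(wait=True)
print(result)  # None: the call exceeded its deadline
```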

Retry with Exponential Backoff

Automatically retry failed requests with increasing delays (1s, 2s, 4s, 8s) to allow transient failures to recover.
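A minimal retry loop with exponential backoff might look like this (delays are shortened so the sketch runs fast; in production you would also add jitter so many clients don't retry in lockstep):

```python
import time

def retry_with_backoff(func, max_attempts=4, base_delay=0.01):
    """Retry func, doubling the delay after each failure (0.01s, 0.02s, 0.04s...)."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise                              # out of attempts: propagate
            time.sleep(base_delay * (2 ** attempt))

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(retry_with_backoff(flaky))  # ok (succeeded on the 3rd attempt)
```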

Bulkhead (Thread Pool Isolation)

Dedicate separate thread pools for different services. If payment API is slow, it doesn't starve other operations.
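In a thread-based server, a bulkhead can be as simple as a dedicated, bounded executor per dependency — a sketch with illustrative pool sizes and stand-in functions:

```python
from concurrent.futures import ThreadPoolExecutor

# Separate, bounded pools: a slow payment API can exhaust at most
# its own 4 threads, never the search pool's 8.
payment_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="payment")
search_pool = ThreadPoolExecutor(max_workers=8, thread_name_prefix="search")

def call_payment(order_id):
    return f"charged {order_id}"    # stand-in for the real payment API call

def call_search(query):
    return f"results for {query}"   # stand-in for the real search API call

pay = payment_pool.submit(call_payment, 42).result()
hits = search_pool.submit(call_search, "shoes").result()
print(pay, "|", hits)
payment_pool.shutdown()
search_pool.shutdown()
```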

Fallback (Graceful Degradation)

When circuit is open, return a fallback response: cached data, default value, or queue for later processing.

Rate Limiting / Adaptive Throttling

Limit traffic to downstream services to prevent overwhelming them. Dynamically adjust based on service health.

Best practice: Use circuit breaker + timeout + retry together. Timeout prevents hanging. Circuit breaker prevents cascading. Retry recovers from transient failures.

Monitoring Circuit Breaker State

Circuit breaker state is an operational signal. Monitor it to detect problems early:

Alert: Circuit Breaker Opened

If a circuit breaker opens, a downstream service is having problems. Alert the on-call team and start investigating.

Alert: Circuit Breaker Stuck in Open

If a circuit breaker stays open for > 5 minutes, something is seriously wrong. The downstream service isn't recovering.

Metric: Rejection Rate

Track percentage of requests rejected due to open circuit. High rejection rate = cascading failure prevention kicking in.

Metric: Recovery Time

Measure time from circuit opening to closing. Fast recovery = service recovered quickly. Slow recovery = service stability issue.
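Many circuit breaker libraries expose state-change hooks and counters that you can wire into a metrics pipeline. The sketch below shows the idea with a hypothetical on_state_change callback and a rejection counter; the names are illustrative, not any real library's API.

```python
metrics = {"rejected": 0, "transitions": []}

def on_state_change(old, new):
    # Stand-in for emitting a metric or paging the on-call team.
    metrics["transitions"].append((old, new))

class MonitoredBreaker:
    """Hypothetical breaker with observability hooks."""
    def __init__(self, failure_threshold=3):
        self.state = "CLOSED"
        self.failure_count = 0
        self.failure_threshold = failure_threshold

    def _set_state(self, new_state):
        on_state_change(self.state, new_state)  # alert: breaker changed state
        self.state = new_state

    def call(self, func):
        if self.state == "OPEN":
            metrics["rejected"] += 1            # feeds the rejection-rate metric
            raise RuntimeError("circuit open")
        try:
            result = func()
            self.failure_count = 0
            return result
        except Exception:
            self.failure_count += 1
            if self.failure_count >= self.failure_threshold:
                self._set_state("OPEN")
            raise

def failing():
    raise ConnectionError("downstream service down")

breaker = MonitoredBreaker()
for _ in range(3):
    try:
        breaker.call(failing)
    except ConnectionError:
        pass
try:
    breaker.call(lambda: "ok")                  # rejected: circuit is open
except RuntimeError:
    pass
print(metrics)  # {'rejected': 1, 'transitions': [('CLOSED', 'OPEN')]}
```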

Frequently Asked Questions

What is the circuit breaker pattern?

The circuit breaker pattern is a design pattern for handling failures in distributed systems. It wraps a potentially failing operation with a state machine that monitors failures. When failures exceed a threshold, the circuit 'trips' and future calls immediately fail fast instead of attempting the operation. This prevents cascading failures where one slow service takes down the entire system.

What are the three states of a circuit breaker?

Closed (normal, calls pass through), Open (failing, calls rejected immediately), and Half-Open (testing recovery, a few calls allowed to test if the service recovered). The circuit transitions: Closed → Open when failure threshold reached, Open → Half-Open after timeout, Half-Open → Closed if test calls succeed, or back to Open if they fail.

Why is fail-fast better than waiting for timeout?

When a service is down, waiting for a 30-second timeout per request wastes resources and thread pools. If 1000 requests queue up waiting for timeout, you can have 1000 blocked threads consuming memory and CPU. Fail-fast (circuit open) rejects requests immediately, freeing resources for healthy operations. Users get error feedback in milliseconds instead of 30 seconds.

When does a circuit breaker transition to half-open?

After a configurable timeout in the Open state (typically 30-60 seconds). In half-open state, the circuit breaker allows a limited number of test requests through (often just 1) to check if the downstream service has recovered. If test requests succeed, the circuit closes and normal operation resumes. If they fail, it opens again.

How is circuit breaker different from timeout?

Timeout is a per-request setting (fail if > 5 seconds). Circuit breaker is a per-service setting that tracks failure patterns across many requests. Timeout stops a single slow request. Circuit breaker stops calling a sick service entirely, protecting the whole system. Use both: timeouts prevent hanging requests, circuit breakers prevent cascading failures.

Can I use circuit breaker for everything?

No. Circuit breakers are best for calls to external services and databases that might fail or be temporarily unavailable. Don't use them for internal function calls or operations that shouldn't fail. The overhead isn't justified if the service is always healthy.
