What is the Circuit Breaker Pattern?
The circuit breaker pattern is a design pattern that prevents cascading failures in distributed systems. It acts like an electrical circuit breaker: when failures exceed a threshold, it "trips" and stops sending requests to the failing service. This protects the rest of the system from being overwhelmed by requests to a service that can't handle them.
Definition
The circuit breaker pattern is a resilience design pattern that wraps potentially failing operations (API calls, database queries, external service calls) in a state machine that monitors failures and stops calling the service when it becomes unhealthy. Instead of letting requests queue up and time out, the circuit breaker fails fast by immediately rejecting calls while the circuit is open.
The pattern gets its name from electrical circuit breakers, which automatically switch off power when detecting electrical overload, protecting the rest of the system from damage.
The Three States: Closed, Open, Half-Open
A circuit breaker operates as a state machine with three states:
Closed (Normal Operation)
The circuit is closed and operating normally. Calls flow through to the downstream service. The circuit breaker monitors for failures (exceptions, timeouts, 5xx responses).
Open (Failing, Calls Rejected)
The circuit is open. All calls fail immediately without being sent to the downstream service. Clients receive an error (an exception or a fallback response) instantly instead of waiting for a timeout.
Half-Open (Testing Recovery)
The circuit is half-open. A limited number of test requests (usually 1-3) are allowed through to check if the downstream service has recovered. This is the recovery path.
State Transition Diagram
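Here's how the circuit breaker moves between states: Closed → Open when the failure threshold is reached, Open → Half-Open after the timeout elapses, Half-Open → Closed if the test calls succeed, and Half-Open → Open if they fail.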
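The transitions between the three states can be sketched as a small lookup table. This is a minimal Python illustration, not any library's API; the event names (`failure_threshold_reached`, `open_timeout_elapsed`, `probe_succeeded`, `probe_failed`) are hypothetical labels chosen for this example.

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.CLOSED, "failure_threshold_reached"): State.OPEN,
    (State.OPEN, "open_timeout_elapsed"): State.HALF_OPEN,
    (State.HALF_OPEN, "probe_succeeded"): State.CLOSED,
    (State.HALF_OPEN, "probe_failed"): State.OPEN,
}


def next_state(state: State, event: str) -> State:
    """Return the next state, or stay in place if the event does not apply."""
    return TRANSITIONS.get((state, event), state)
```

Any pair not listed in the table leaves the state unchanged, so there is no direct path from Open back to Closed: recovery always passes through Half-Open.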
How Circuit Breaker Prevents Cascading Failures
Consider a system without circuit breakers:
Without Circuit Breaker (Cascading Failure)
- Payment service goes down
- Web server tries to call payment service, request times out (30 seconds)
- 1000 concurrent requests all time out, blocking 1000 threads
- Web server runs out of threads, can't serve ANY requests (including health checks)
- Load balancer detects web server is unhealthy, removes it
- All traffic shifts to the remaining web servers, where the same problem repeats
- Entire system down despite only payment service failing
With Circuit Breaker (Isolated Failure)
- Payment service goes down
- Web server calls payment service, gets error
- After 5 errors, circuit breaker opens
- Future payment calls immediately rejected (fail-fast) in milliseconds
- Threads freed up, web server can serve other requests
- Web server shows degraded state (payment unavailable) but stays online
- Rest of system continues functioning
- When payment service recovers, test requests in the Half-Open state succeed, the circuit breaker closes, and normal flow resumes
Key insight: A circuit breaker accepts immediate failure of calls to one service in exchange for protecting the rest of the system. Better to show "Payment unavailable, try again later" than to take down the entire website.
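A rough back-of-the-envelope comparison makes the difference concrete. The sketch below reuses the numbers from the two scenarios (1000 requests, 30-second timeout, 5 errors before opening) and adds two assumptions of its own: a rejected call costs about 1 ms, and the five calls made before the circuit opens each wait for the full timeout. It also treats the requests as arriving one after another, which is a simplification.

```python
REQUESTS = 1000          # requests hitting the payment path (from the scenario)
TIMEOUT_S = 30.0         # per-request timeout (from the scenario)
FAILURE_THRESHOLD = 5    # errors before the circuit opens (from the scenario)
FAIL_FAST_S = 0.001      # ASSUMED cost of a rejected call: 1 ms

# Without a breaker, every request holds a thread for the full timeout.
without_breaker = REQUESTS * TIMEOUT_S

# With a breaker, only the first few calls wait (worst case: a full timeout
# each); every later call is rejected almost immediately.
with_breaker = (FAILURE_THRESHOLD * TIMEOUT_S
                + (REQUESTS - FAILURE_THRESHOLD) * FAIL_FAST_S)

print(f"without breaker: {without_breaker:.0f} thread-seconds")
print(f"with breaker:    {with_breaker:.1f} thread-seconds")
```

Under these assumptions the blocked thread time drops from 30,000 thread-seconds to about 151.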
Configuration Parameters
Most circuit breaker implementations expose these configurable parameters:
Failure Threshold
Number of failures, or failure rate, that causes the circuit to open. Examples: "5 consecutive failures" or "50% of the last 100 requests failed".
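A rate-based threshold such as "50% of the last 100 requests" needs a sliding window of recent outcomes. The following is a minimal sketch of one way to track it; the class name and the `min_calls` guard (which avoids tripping on a very small sample) are assumptions added for this example.

```python
from collections import deque


class FailureRateWindow:
    """Tracks the outcome of the most recent `size` calls."""

    def __init__(self, size: int = 100, threshold: float = 0.5,
                 min_calls: int = 10):
        # True = failure, False = success; old entries drop off automatically.
        self.outcomes = deque(maxlen=size)
        self.threshold = threshold
        self.min_calls = min_calls  # assumed guard against tiny samples

    def record(self, failed: bool) -> None:
        self.outcomes.append(failed)

    def should_open(self) -> bool:
        if len(self.outcomes) < self.min_calls:
            return False
        failure_rate = sum(self.outcomes) / len(self.outcomes)
        return failure_rate >= self.threshold
```

A consecutive-failure threshold is simpler (a single counter reset on success), but a rate over a window tends to react better to services that fail intermittently.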
Timeout Duration
How long to stay in Open state before attempting Half-Open. Too short: hammers recovering service. Too long: users see errors longer than necessary.
Half-Open Request Count
Number of test requests allowed in Half-Open state. 1 = conservative, 3-5 = more aggressive testing.
Success Threshold for Half-Open
How many test requests must succeed before closing the circuit. 1 = close on the first success, 3 = require three successful test requests.
Fallback Response (Optional)
When the circuit is open, return a fallback response instead of an error. Example: return a cached result, return a default value, or queue the request for a later retry.
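The parameters above can be grouped into a single configuration object. This is an illustrative sketch: the field names and default values are assumptions for this example rather than the settings of any particular library, and the validation rule assumes the decision to close is made on one batch of Half-Open test requests.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class BreakerConfig:
    failure_threshold: int = 5        # failures before opening
    open_timeout_s: float = 30.0      # time in Open before trying Half-Open
    half_open_max_calls: int = 1      # test requests allowed in Half-Open
    success_threshold: int = 1        # successes needed to close again
    fallback: Optional[Callable[[], object]] = None  # optional fallback

    def __post_init__(self):
        # Assumption: closing is decided on a single batch of test requests,
        # so more successes than permitted test calls can never be reached.
        if self.success_threshold > self.half_open_max_calls:
            raise ValueError(
                "success_threshold cannot exceed half_open_max_calls")
```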
Example Implementation (Pseudocode)
Here's a simplified circuit breaker implementation:
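The sketch below is written in Python rather than pseudocode so that it can be run. It counts consecutive failures, is not thread-safe, and does not cap the number of concurrent Half-Open requests, so treat it as an illustration of the state machine rather than production code. The class names, parameter names, and the injectable `clock` are choices made for this example.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, open_timeout=30.0,
                 success_threshold=1, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.open_timeout = open_timeout
        self.success_threshold = success_threshold
        self.clock = clock            # injectable so tests can control time
        self.state = State.CLOSED
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.open_timeout:
                raise CircuitOpenError("circuit is open")  # fail fast
            # Timeout elapsed: let a test request through.
            self.state = State.HALF_OPEN
            self.successes = 0
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # Any failure while Half-Open reopens the circuit immediately.
        if (self.state is State.HALF_OPEN
                or self.failures >= self.failure_threshold):
            self.state = State.OPEN
            self.opened_at = self.clock()

    def _on_success(self):
        if self.state is State.HALF_OPEN:
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = State.CLOSED
                self.failures = 0
        else:
            self.failures = 0  # a success resets the consecutive count
```

Usage is a matter of routing the risky call through the breaker, for example `breaker.call(charge_card, order)` where `charge_card` is a hypothetical function, and handling `CircuitOpenError` where a fallback makes sense.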
Circuit Breaker Libraries
Modern frameworks provide ready-made circuit breaker implementations:
| Library | Language | Features |
|---|---|---|
| Hystrix | Java | Netflix's original. Monitoring, bulkheads, fallbacks. In maintenance mode since 2018 (no longer actively developed) but still widely used. |
| Resilience4j | Java | Lightweight alternative to Hystrix. Modern API, functional style. Recommended for new Java projects. |
| Polly | .NET/C# | Resilience library with circuit breaker, retry, timeout, bulkhead. Fluent API. |
| pybreaker | Python | Simple Python circuit breaker. Decorator-based, easy to use. |
| gobreaker, hystrix-go | Go | Go has several circuit breaker libraries rather than one standard choice. Use with context timeouts. |
Recommendation: Use a mature library instead of rolling your own. Circuit breaker logic is subtle (race conditions, state transitions, fallbacks) and battle-tested libraries handle edge cases better.
Related Resilience Patterns
Circuit breaker is one part of a resilience toolkit. Use together with:
Timeout
Set maximum wait time per request. Prevents indefinite hangs. Example: 5-second timeout on API calls.
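One way to put an upper bound on how long a caller waits is to run the call on a worker thread and stop waiting after a deadline. The sketch below uses Python's standard `concurrent.futures`; the helper name `call_with_timeout` is made up for this example. When the client library offers its own timeout setting, that is usually the better choice, because this approach only stops waiting and does not stop the underlying work.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_executor = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout_s, *args):
    """Wait at most `timeout_s` seconds for func(*args) to finish."""
    future = _executor.submit(func, *args)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # The worker thread keeps running; we only stop waiting for it.
        future.cancel()
        raise TimeoutError(f"call exceeded {timeout_s}s")


def slow_call():
    time.sleep(0.5)  # stand-in for a slow downstream service
    return "late"
```

Calling `call_with_timeout(slow_call, 0.05)` raises `TimeoutError` after roughly 50 ms instead of blocking for the full half second.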
Retry with Exponential Backoff
Automatically retry failed requests with increasing delays (1s, 2s, 4s, 8s) to allow transient failures to recover.
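The doubling delay schedule can be sketched in a few lines. This is a minimal illustration with a made-up helper name; it retries on any exception, whereas real code would normally retry only on errors known to be transient. The `sleep` parameter is injectable so the delays can be observed without actually waiting.

```python
import time


def retry_with_backoff(func, max_attempts=5, base_delay=1.0,
                       sleep=time.sleep):
    """Call func(), retrying on failure with delays of 1s, 2s, 4s, 8s."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each time: base_delay * 2**attempt.
            sleep(base_delay * (2 ** attempt))
```

With the defaults, five attempts produce at most four waits of 1, 2, 4, and 8 seconds before the final error is raised.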
Bulkhead (Thread Pool Isolation)
Dedicate separate thread pools for different services. If payment API is slow, it doesn't starve other operations.
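In Python, a simple form of bulkhead is one bounded executor per downstream dependency. The pool sizes and the `charge_card` / `load_product` functions below are hypothetical stand-ins chosen for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

# One small, dedicated pool per downstream dependency (sizes are illustrative).
payment_pool = ThreadPoolExecutor(max_workers=10, thread_name_prefix="payment")
catalog_pool = ThreadPoolExecutor(max_workers=20, thread_name_prefix="catalog")


def charge_card(order_id):
    return f"charged {order_id}"     # stand-in for a payment API call


def load_product(product_id):
    return f"product {product_id}"   # stand-in for a catalog lookup


# A slow payment API can tie up at most 10 threads. Catalog lookups run on
# their own pool, so they keep working even if every payment thread is stuck.
payment_future = payment_pool.submit(charge_card, "order-1")
catalog_future = catalog_pool.submit(load_product, "sku-42")
```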
Fallback (Graceful Degradation)
When circuit is open, return a fallback response: cached data, default value, or queue for later processing.
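A fallback can be as small as a wrapper that catches the breaker's rejection and returns something else. In this sketch `CircuitOpenError`, `call_with_fallback`, and the cached profile are all hypothetical names and data invented for illustration.

```python
class CircuitOpenError(Exception):
    """Hypothetical error a breaker raises when it rejects a call."""


def call_with_fallback(primary, fallback):
    """Run primary(); if the circuit is open, return fallback() instead."""
    try:
        return primary()
    except CircuitOpenError:
        return fallback()


# Illustrative usage: serve the last cached value while the circuit is open.
cached_profile = {"name": "cached user"}


def fetch_profile():
    # Stand-in for a call that the breaker is currently rejecting.
    raise CircuitOpenError("profile service circuit is open")


profile = call_with_fallback(fetch_profile, lambda: cached_profile)
```

Only the breaker's own rejection triggers the fallback here; other exceptions still propagate, which keeps genuine bugs visible.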
Rate Limiting / Adaptive Throttling
Limit traffic to downstream services to prevent overwhelming them. Dynamically adjust based on service health.
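One common way to limit traffic is a token bucket, shown here as a minimal sketch. It covers only the fixed-rate case; adjusting `rate` dynamically based on service health is left out. The class and parameter names are choices made for this example.

```python
import time


class TokenBucket:
    """Allows about `rate` calls per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock                 # injectable so tests control time
        self.tokens = float(capacity)      # start with a full bucket
        self.last_refill = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A caller checks `allow()` before each downstream request and rejects or delays the request when it returns `False`.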
Best practice: Use circuit breaker + timeout + retry together. Timeout prevents hanging. Circuit breaker prevents cascading. Retry recovers from transient failures.
Monitoring Circuit Breaker State
Circuit breaker state is an operational signal. Monitor it to detect problems early:
Alert: Circuit Breaker Opened
If a circuit breaker opens, a downstream service is having problems. Alert the on-call team and start investigating.
Alert: Circuit Breaker Stuck in Open
If a circuit breaker stays open for > 5 minutes, something is seriously wrong. The downstream service isn't recovering.
Metric: Rejection Rate
Track percentage of requests rejected due to open circuit. High rejection rate = cascading failure prevention kicking in.
Metric: Recovery Time
Measure time from circuit opening to closing. Fast recovery = service recovered quickly. Slow recovery = service stability issue.
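The alerts and metrics above can be derived from a handful of counters and timestamps. The following sketch shows one possible shape; the class and method names are hypothetical, and the 300-second default mirrors the 5-minute "stuck open" rule. Timestamps are passed in explicitly so the logic is easy to test.

```python
class BreakerMetrics:
    def __init__(self):
        self.total_calls = 0
        self.rejected_calls = 0
        self.opened_at = None        # time of the first open in this outage
        self.recovery_times = []     # seconds from opening to closing

    def record_call(self, rejected: bool) -> None:
        self.total_calls += 1
        if rejected:
            self.rejected_calls += 1

    def record_opened(self, now: float) -> None:
        # Keep the first open time if a Half-Open probe fails and re-opens.
        if self.opened_at is None:
            self.opened_at = now

    def record_closed(self, now: float) -> None:
        if self.opened_at is not None:
            self.recovery_times.append(now - self.opened_at)
            self.opened_at = None

    def rejection_rate(self) -> float:
        if self.total_calls == 0:
            return 0.0
        return self.rejected_calls / self.total_calls

    def stuck_open(self, now: float, limit_s: float = 300.0) -> bool:
        return self.opened_at is not None and now - self.opened_at > limit_s
```

In practice these values would be exported to whatever metrics system is already in place rather than held in memory.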
Frequently Asked Questions
What is the circuit breaker pattern?
The circuit breaker pattern is a design pattern for handling failures in distributed systems. It wraps a potentially failing operation with a state machine that monitors failures. When failures exceed a threshold, the circuit "trips" and future calls immediately fail fast instead of attempting the operation. This prevents cascading failures where one slow service takes down the entire system.
What are the three states of a circuit breaker?
Closed (normal, calls pass through), Open (failing, calls rejected immediately), and Half-Open (testing recovery, a few calls allowed to test if the service recovered). The circuit transitions: Closed → Open when failure threshold reached, Open → Half-Open after timeout, Half-Open → Closed if test calls succeed, or back to Open if they fail.
Why is fail-fast better than waiting for timeout?
When a service is down, waiting for a 30-second timeout on every request ties up threads and other resources. If 1000 requests queue up waiting to time out, you can have 1000 blocked threads consuming memory and CPU. Fail-fast (circuit open) rejects requests immediately, freeing resources for healthy operations. Users get error feedback in milliseconds instead of 30 seconds.
When does a circuit breaker transition to half-open?
After a configurable timeout in the Open state (typically 30-60 seconds). In half-open state, the circuit breaker allows a limited number of test requests through (often just 1) to check if the downstream service has recovered. If test requests succeed, the circuit closes and normal operation resumes. If they fail, it opens again.
How is circuit breaker different from timeout?
Timeout is a per-request setting (fail if > 5 seconds). Circuit breaker is a per-service setting that tracks failure patterns across many requests. Timeout stops a single slow request. Circuit breaker stops calling a sick service entirely, protecting the whole system. Use both: timeouts prevent hanging requests, circuit breakers prevent cascading failures.
Can I use circuit breaker for everything?
No. Circuit breakers are best for calls to external services and databases that might fail or be temporarily unavailable. Don't use them for internal function calls or operations that shouldn't fail. The overhead isn't justified if the service is always healthy.