What is Load Balancing?
Load balancing is the process of distributing incoming network traffic across multiple servers to ensure no single server is overwhelmed. It is a fundamental building block for scalable, highly available web applications, improving both performance and reliability.
Definition
Load balancing distributes incoming requests across a pool of backend servers (also called targets, backends, or upstream servers) according to an algorithm. A load balancer sits between clients and servers, acting as a reverse proxy that routes each request to the most appropriate backend.
For example, if you have 3 web servers and 300 concurrent users, a load balancer distributes traffic so each server handles approximately 100 users. If one server fails, the load balancer routes around it and the remaining two each handle about 150 users, so new requests continue to be served while the failed server is out of the pool.
How Load Balancing Works
A load balancer operates as an intermediary between clients and your backend servers:
1. Client Sends Request
A user's browser sends an HTTP request to your domain. DNS resolves to the load balancer's IP address, not to any individual server. The client connects to the load balancer as if it were the application itself.
2. Load Balancer Selects Backend
The load balancer applies its configured algorithm (round-robin, least connections, etc.) to select a healthy backend server. It only considers servers that are currently passing health checks.
3. Request Forwarded and Response Returned
The load balancer forwards the request to the selected backend, which processes it and returns a response. The load balancer relays the response back to the client. The client is unaware that multiple servers exist behind the load balancer.
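The three steps above can be sketched in Python. This is a minimal illustration, not a real proxy: the `forward` stub and the function names are hypothetical stand-ins for the forwarding work a real load balancer does.

```python
def forward(target, request):
    # Stand-in for step 3: a real load balancer would open a connection
    # to `target`, relay the request, and stream the response back.
    return {"status": 200, "served_by": target}

def handle_request(request, backends, is_healthy, choose):
    """Step 2: consider only backends currently passing health checks,
    apply the configured algorithm (`choose`), then forward (step 3)."""
    candidates = [b for b in backends if is_healthy(b)]
    if not candidates:
        return {"status": 503, "body": "no healthy backends"}
    return forward(choose(candidates), request)

# Example: server "a" is failing health checks, so traffic goes to "b".
resp = handle_request({}, ["a", "b"], lambda b: b != "a", lambda c: c[0])
```

Note how the algorithm is just a pluggable `choose` function over the healthy subset, which is why the sections below can vary the algorithm independently of everything else.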
Load Balancing Algorithms
The algorithm determines how the load balancer chooses which backend receives each request:
Round Robin
Requests are distributed sequentially across servers: Server 1, Server 2, Server 3, Server 1, Server 2, and so on. Simple, predictable, and works well when all servers have equal capacity and requests have similar processing costs. The most widely used default algorithm.
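Round robin is just an endless rotation over the pool. A minimal sketch (server names are placeholders):

```python
from itertools import cycle

# Hypothetical pool of three equally sized servers.
pool = ["server1", "server2", "server3"]
rr = cycle(pool)  # rotate endlessly: 1, 2, 3, 1, 2, 3, ...

picks = [next(rr) for _ in range(6)]
```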
Least Connections
Each new request goes to the server with the fewest active connections. This naturally adapts to servers that process requests at different speeds: a slow server accumulates connections while a fast server clears them and receives more traffic. Excellent for workloads with variable request duration.
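The selection itself is a one-liner over a snapshot of active connection counts (the counts and names here are illustrative):

```python
# Active connection counts per backend (hypothetical snapshot).
active = {"server1": 4, "server2": 2, "server3": 7}

def least_connections(active):
    # Each new request goes to the backend with the fewest open connections.
    return min(active, key=active.get)

choice = least_connections(active)
```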
Weighted Round Robin / Weighted Least Connections
Like their unweighted counterparts but with configurable weights per server. A server with weight 3 receives 3x more traffic than a server with weight 1. Useful when servers have different hardware specs. Also useful during rolling deployments (weight new servers lower initially).
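Naively repeating a weight-3 server three times in a row sends bursts; nginx's upstream module instead uses a "smooth" weighted round robin that interleaves picks. A minimal sketch of that smoothing scheme (weights and names hypothetical):

```python
def smooth_wrr(weights, n):
    """Smooth weighted round robin: each pick, every server's score grows
    by its weight; the highest scorer is chosen and pays back the total."""
    current = {s: 0 for s in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        for s, w in weights.items():
            current[s] += w
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks

# Weight 3 vs weight 1: "a" gets 3 of every 4 picks, interleaved.
order = smooth_wrr({"a": 3, "b": 1}, 4)
```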
IP Hash
A hash of the client's IP address determines which server receives the request. The same client IP always reaches the same server (as long as the server pool does not change). Provides session persistence without cookies but can create uneven distribution if traffic comes from a few large networks (corporate NATs, mobile carriers).
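A minimal sketch of the hash-to-index mapping, assuming hypothetical backend addresses (note that changing the pool size remaps most clients):

```python
import hashlib

pool = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical backend IPs

def pick_by_ip(client_ip, pool):
    # Hash the client IP to an index; the same IP maps to the same
    # backend for as long as the pool is unchanged.
    digest = hashlib.md5(client_ip.encode()).digest()
    return pool[int.from_bytes(digest[:4], "big") % len(pool)]

first = pick_by_ip("203.0.113.9", pool)
second = pick_by_ip("203.0.113.9", pool)  # same client, same backend
```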
Least Response Time
Routes requests to the server with the lowest average response time and fewest active connections. This adapts to real-time server performance, sending more traffic to faster servers. Requires the load balancer to actively measure response times from each backend.
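One way to combine the two signals is to sort by measured response time first and break ties on active connections; the exact weighting varies by implementation, so treat this as a sketch with made-up numbers:

```python
# Hypothetical per-backend stats the load balancer has measured.
stats = {
    "s1": {"rt_ms": 120.0, "conns": 5},
    "s2": {"rt_ms": 80.0, "conns": 9},
    "s3": {"rt_ms": 80.0, "conns": 3},
}

def least_response_time(stats):
    # Prefer the lowest average response time; break ties on connections.
    return min(stats, key=lambda s: (stats[s]["rt_ms"], stats[s]["conns"]))

choice = least_response_time(stats)
```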
Types of Load Balancers
Load balancers are categorized by the network layer they operate at and their deployment model:
| Type | How It Works | Strengths | Examples |
|---|---|---|---|
| L4 (Transport) | Routes by IP/port, no request inspection | Very fast, protocol-agnostic, low overhead | HAProxy (TCP mode), AWS NLB, Linux IPVS |
| L7 (Application) | Inspects HTTP headers, URLs, cookies | Path routing, SSL termination, content-based rules | Nginx, HAProxy (HTTP mode), AWS ALB, Caddy |
| Hardware | Dedicated appliance with custom ASICs | Extreme throughput, dedicated support | F5 BIG-IP, Citrix ADC |
| Software | Runs on commodity servers or VMs | Flexible, cost-effective, easily automated | Nginx, HAProxy, Envoy, Traefik |
Modern trend: Software load balancers running on commodity hardware have largely replaced expensive hardware appliances. Cloud-managed load balancers (AWS ALB/NLB, Google Cloud LB) provide the benefits of both: high performance with no hardware to manage.
Health Checks and Load Balancers
Health checks are the mechanism that turns a load balancer into a failover system. Without health checks, a load balancer would send traffic to crashed servers:
Active Health Checks
The load balancer periodically sends probe requests to each backend (e.g., HTTP GET /health every 10 seconds). If a backend fails a configured number of consecutive checks (e.g., 3), it is removed from the pool. When it starts passing again, it is re-added. This is the most common and reliable approach.
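The fail/rise counting described above can be sketched as a small state machine. Thresholds and field names here are illustrative; real load balancers expose them as settings (often called "fall" and "rise" counts):

```python
def update_health(state, passed, fall=3, rise=2):
    """Track consecutive probe results; mark the backend down after
    `fall` straight failures and back up after `rise` straight passes."""
    if passed:
        state["fails"] = 0
        state["passes"] += 1
        if not state["healthy"] and state["passes"] >= rise:
            state["healthy"] = True
    else:
        state["passes"] = 0
        state["fails"] += 1
        if state["healthy"] and state["fails"] >= fall:
            state["healthy"] = False
    return state["healthy"]

state = {"healthy": True, "fails": 0, "passes": 0}
# Three consecutive failed probes take it down; two passes bring it back.
history = [update_health(state, ok) for ok in [False, False, False, True, True]]
```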
Passive Health Checks
The load balancer monitors actual request traffic. If a backend starts returning errors (5xx responses, connection timeouts), it is marked unhealthy based on the error rate. No separate probe traffic is needed. Works well as a supplement to active checks, catching issues that health endpoints may not reveal.
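Error-rate tracking over a sliding window of recent responses might look like this (window size and threshold are arbitrary for illustration):

```python
from collections import deque

class PassiveCheck:
    """Mark a backend unhealthy when the error rate over the most
    recent responses crosses a threshold; no probe traffic needed."""
    def __init__(self, window=10, max_error_rate=0.5):
        self.recent = deque(maxlen=window)
        self.max_error_rate = max_error_rate

    def record(self, status_code):
        # Count 5xx responses as errors, as described above.
        self.recent.append(status_code >= 500)

    def healthy(self):
        if not self.recent:
            return True
        return sum(self.recent) / len(self.recent) <= self.max_error_rate

pc = PassiveCheck(window=4)
for code in [200, 500, 502, 503]:  # 3 errors out of the last 4 responses
    pc.record(code)
```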
Important: Load balancer health checks only validate internal connectivity. They cannot tell you if your service is accessible from the user's perspective. Network issues, DNS problems, or SSL certificate errors between the user and the load balancer are invisible to internal checks. This is why external monitoring from multiple regions is essential.
Monitoring Load-Balanced Services
Monitoring a load-balanced service requires testing from the outside in:
- Monitor the service endpoint (through the load balancer): This is what users experience. Monitor the public URL/domain that resolves to the load balancer. AtomPing does this from 10 European locations, catching issues that affect specific regions.
- Monitor individual backends separately: Set up separate monitors for each backend server's direct IP or internal hostname. This reveals which specific servers are having issues, even when the load balancer masks the problem by routing around them.
- Monitor the load balancer itself: Check the load balancer's health and management interface. A degraded load balancer can cause intermittent issues that are difficult to diagnose from endpoint monitoring alone.
- Track response time consistency: If response times vary significantly between requests, it may indicate uneven load distribution or one backend performing worse than others. Response time monitoring helps identify these patterns.
- Verify SSL certificate health: Ensure SSL certificates are valid on the load balancer and, if using end-to-end encryption, on each backend. Use AtomPing's TLS monitoring to track certificate expiry dates.
Pro tip: When all backends are degraded but still passing health checks (e.g., responding slowly but within timeout), the load balancer will not remove any of them. External monitoring catches this by alerting on elevated response times, even when internal health checks report all healthy.
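The "slow but passing" case from the pro tip reduces to comparing observed latency against a baseline. A minimal sketch of that alerting logic (sample values and the 2x factor are illustrative):

```python
import statistics

def degraded(samples_ms, baseline_ms, factor=2.0):
    """Flag elevated latency: median response time over recent external
    checks exceeds `factor` times the normal baseline, even though every
    request still succeeds within the health-check timeout."""
    return statistics.median(samples_ms) > factor * baseline_ms

slow = degraded([210, 230, 250], baseline_ms=100)   # all slow -> alert
normal = degraded([90, 110, 100], baseline_ms=100)  # normal -> no alert
```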
Frequently Asked Questions
What is the difference between L4 and L7 load balancing?
Which load balancing algorithm should I use?
Do I need a load balancer for two servers?
Can a load balancer be a single point of failure?
What are sticky sessions / session affinity?
How do health checks work in load balancers?
How does monitoring help with load-balanced services?
Monitor Load-Balanced Services
AtomPing monitors your services from the user's perspective across 10 European locations. Detect when load balancer health checks miss degraded backends. Free plan includes 50 monitors with email, Slack, Discord, and Telegram alerts.
Start Monitoring Free