What is Load Balancing?
Load balancing is the process of distributing incoming network traffic across multiple servers to ensure no single server is overwhelmed. It is a fundamental building block for scalable, highly available web applications, improving both performance and reliability.
Definition
Load balancing distributes incoming requests across a pool of backend servers (also called targets, backends, or upstream servers) according to an algorithm. A load balancer sits between clients and servers, acting as a reverse proxy that routes each request to the most appropriate backend.
For example, if you have 3 web servers and 300 concurrent users, a load balancer distributes traffic so each server handles approximately 100 users. If one server fails, the load balancer routes around it and the remaining two each handle about 150 users, so new requests continue to be served while the failed server is out of the pool.
How Load Balancing Works
A load balancer operates as an intermediary between clients and your backend servers:
1. Client Sends Request
A user's browser sends an HTTP request to your domain. DNS resolves to the load balancer's IP address, not to any individual server. The client connects to the load balancer as if it were the application itself.
2. Load Balancer Selects Backend
The load balancer applies its configured algorithm (round-robin, least connections, etc.) to select a healthy backend server. It only considers servers that are currently passing health checks.
3. Request Forwarded and Response Returned
The load balancer forwards the request to the selected backend, which processes it and returns a response. The load balancer relays the response back to the client. The client is unaware that multiple servers exist behind the load balancer.
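The three steps above can be sketched in Python. This is a minimal illustration, not a real proxy: the `forward` stub and the function names are hypothetical stand-ins for the forwarding work a real load balancer does.

```python
def forward(target, request):
    # Stand-in for step 3: a real load balancer would open a connection
    # to `target`, relay the request, and stream the response back.
    return {"status": 200, "served_by": target}

def handle_request(request, backends, is_healthy, choose):
    """Step 2: consider only backends currently passing health checks,
    apply the configured algorithm (`choose`), then forward (step 3)."""
    candidates = [b for b in backends if is_healthy(b)]
    if not candidates:
        return {"status": 503, "body": "no healthy backends"}
    return forward(choose(candidates), request)

# Example: server "a" is failing health checks, so traffic goes to "b".
resp = handle_request({}, ["a", "b"], lambda b: b != "a", lambda c: c[0])
```

Note how the algorithm is just a pluggable `choose` function over the healthy subset, which is why the sections below can vary the algorithm independently of everything else.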
Load Balancing Algorithms
The algorithm determines how the load balancer chooses which backend receives each request:
Round Robin
Requests are distributed sequentially across servers: Server 1, Server 2, Server 3, Server 1, Server 2, and so on. Simple, predictable, and works well when all servers have equal capacity and requests have similar processing costs. The most widely used default algorithm.
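Round robin is just an endless rotation over the pool. A minimal sketch (server names are placeholders):

```python
from itertools import cycle

# Hypothetical pool of three equally sized servers.
pool = ["server1", "server2", "server3"]
rr = cycle(pool)  # rotate endlessly: 1, 2, 3, 1, 2, 3, ...

picks = [next(rr) for _ in range(6)]
```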
Least Connections
Each new request goes to the server with the fewest active connections. This naturally adapts to servers that process requests at different speeds: a slow server accumulates connections while a fast server clears them and receives more traffic. Excellent for workloads with variable request duration.
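The selection itself is a one-liner over a snapshot of active connection counts (the counts and names here are illustrative):

```python
# Active connection counts per backend (hypothetical snapshot).
active = {"server1": 4, "server2": 2, "server3": 7}

def least_connections(active):
    # Each new request goes to the backend with the fewest open connections.
    return min(active, key=active.get)

choice = least_connections(active)
```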
Weighted Round Robin / Weighted Least Connections
Like their unweighted counterparts but with configurable weights per server. A server with weight 3 receives 3x more traffic than a server with weight 1. Useful when servers have different hardware specs. Also useful during rolling deployments (weight new servers lower initially).
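Naively repeating a weight-3 server three times in a row sends bursts; nginx's upstream module instead uses a "smooth" weighted round robin that interleaves picks. A minimal sketch of that smoothing scheme (weights and names hypothetical):

```python
def smooth_wrr(weights, n):
    """Smooth weighted round robin: each pick, every server's score grows
    by its weight; the highest scorer is chosen and pays back the total."""
    current = {s: 0 for s in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        for s, w in weights.items():
            current[s] += w
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks

# Weight 3 vs weight 1: "a" gets 3 of every 4 picks, interleaved.
order = smooth_wrr({"a": 3, "b": 1}, 4)
```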
IP Hash
A hash of the client's IP address determines which server receives the request. The same client IP always reaches the same server (as long as the server pool does not change). Provides session persistence without cookies but can create uneven distribution if traffic comes from a few large networks (corporate NATs, mobile carriers).
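A minimal sketch of the hash-to-index mapping, assuming hypothetical backend addresses (note that changing the pool size remaps most clients):

```python
import hashlib

pool = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical backend IPs

def pick_by_ip(client_ip, pool):
    # Hash the client IP to an index; the same IP maps to the same
    # backend for as long as the pool is unchanged.
    digest = hashlib.md5(client_ip.encode()).digest()
    return pool[int.from_bytes(digest[:4], "big") % len(pool)]

first = pick_by_ip("203.0.113.9", pool)
second = pick_by_ip("203.0.113.9", pool)  # same client, same backend
```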
Least Response Time
Routes requests to the server with the lowest average response time and fewest active connections. This adapts to real-time server performance, sending more traffic to faster servers. Requires the load balancer to actively measure response times from each backend.
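One way to combine the two signals is to sort by measured response time first and break ties on active connections; the exact weighting varies by implementation, so treat this as a sketch with made-up numbers:

```python
# Hypothetical per-backend stats the load balancer has measured.
stats = {
    "s1": {"rt_ms": 120.0, "conns": 5},
    "s2": {"rt_ms": 80.0, "conns": 9},
    "s3": {"rt_ms": 80.0, "conns": 3},
}

def least_response_time(stats):
    # Prefer the lowest average response time; break ties on connections.
    return min(stats, key=lambda s: (stats[s]["rt_ms"], stats[s]["conns"]))

choice = least_response_time(stats)
```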
Types of Load Balancers
Load balancers are categorized by the network layer they operate at and their deployment model:
| Type | How It Works | Strengths | Examples |
|---|---|---|---|
| L4 (Transport) | Routes by IP/port, no request inspection | Very fast, protocol-agnostic, low overhead | HAProxy (TCP mode), AWS NLB, Linux IPVS |
| L7 (Application) | Inspects HTTP headers, URLs, cookies | Path routing, SSL termination, content-based rules | Nginx, HAProxy (HTTP mode), AWS ALB, Caddy |
| Hardware | Dedicated appliance with custom ASICs | Extreme throughput, dedicated support | F5 BIG-IP, Citrix ADC |
| Software | Runs on commodity servers or VMs | Flexible, cost-effective, easily automated | Nginx, HAProxy, Envoy, Traefik |
Modern trend: Software load balancers running on commodity hardware have largely replaced expensive hardware appliances. Cloud-managed load balancers (AWS ALB/NLB, Google Cloud LB) provide the benefits of both: high performance with no hardware to manage.
Health Checks and Load Balancers
Health checks are the mechanism that turns a load balancer into a failover system. Without health checks, a load balancer would send traffic to crashed servers:
Active Health Checks
The load balancer periodically sends probe requests to each backend (e.g., HTTP GET /health every 10 seconds). If a backend fails a configured number of consecutive checks (e.g., 3), it is removed from the pool. When it starts passing again, it is re-added. This is the most common and reliable approach.
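The fail/rise counting described above can be sketched as a small state machine. Thresholds and field names here are illustrative; real load balancers expose them as settings (often called "fall" and "rise" counts):

```python
def update_health(state, passed, fall=3, rise=2):
    """Track consecutive probe results; mark the backend down after
    `fall` straight failures and back up after `rise` straight passes."""
    if passed:
        state["fails"] = 0
        state["passes"] += 1
        if not state["healthy"] and state["passes"] >= rise:
            state["healthy"] = True
    else:
        state["passes"] = 0
        state["fails"] += 1
        if state["healthy"] and state["fails"] >= fall:
            state["healthy"] = False
    return state["healthy"]

state = {"healthy": True, "fails": 0, "passes": 0}
# Three consecutive failed probes take it down; two passes bring it back.
history = [update_health(state, ok) for ok in [False, False, False, True, True]]
```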
Passive Health Checks
The load balancer monitors actual request traffic. If a backend starts returning errors (5xx responses, connection timeouts), it is marked unhealthy based on the error rate. No separate probe traffic is needed. Works well as a supplement to active checks, catching issues that health endpoints may not reveal.
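Error-rate tracking over a sliding window of recent responses might look like this (window size and threshold are arbitrary for illustration):

```python
from collections import deque

class PassiveCheck:
    """Mark a backend unhealthy when the error rate over the most
    recent responses crosses a threshold; no probe traffic needed."""
    def __init__(self, window=10, max_error_rate=0.5):
        self.recent = deque(maxlen=window)
        self.max_error_rate = max_error_rate

    def record(self, status_code):
        # Count 5xx responses as errors, as described above.
        self.recent.append(status_code >= 500)

    def healthy(self):
        if not self.recent:
            return True
        return sum(self.recent) / len(self.recent) <= self.max_error_rate

pc = PassiveCheck(window=4)
for code in [200, 500, 502, 503]:  # 3 errors out of the last 4 responses
    pc.record(code)
```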
Important: Load balancer health checks only validate internal connectivity. They cannot tell you if your service is accessible from the user's perspective. Network issues, DNS problems, or SSL certificate errors between the user and the load balancer are invisible to internal checks. This is why external monitoring from multiple regions is essential.
Monitoring Load-Balanced Services
Monitoring a load-balanced service requires testing from the outside in:
- Monitor the service endpoint (through the load balancer): This is what users experience. Monitor the public URL/domain that resolves to the load balancer. AtomPing does this from 10 European locations, catching issues that affect specific regions.
- Monitor individual backends separately: Set up separate monitors for each backend server's direct IP or internal hostname. This reveals which specific servers are having issues, even when the load balancer masks the problem by routing around them.
- Monitor the load balancer itself: Check the load balancer's health and management interface. A degraded load balancer can cause intermittent issues that are difficult to diagnose from endpoint monitoring alone.
- Track response time consistency: If response times vary significantly between requests, it may indicate uneven load distribution or one backend performing worse than others. Response time monitoring helps identify these patterns.
- Verify SSL certificate health: Ensure SSL certificates are valid on the load balancer and, if using end-to-end encryption, on each backend. Use AtomPing's TLS monitoring to track certificate expiry dates.
Pro tip: When all backends are degraded but still passing health checks (e.g., responding slowly but within timeout), the load balancer will not remove any of them. External monitoring catches this by alerting on elevated response times, even when internal health checks report all healthy.
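The "slow but passing" case from the pro tip reduces to comparing observed latency against a baseline. A minimal sketch of that alerting logic (sample values and the 2x factor are illustrative):

```python
import statistics

def degraded(samples_ms, baseline_ms, factor=2.0):
    """Flag elevated latency: median response time over recent external
    checks exceeds `factor` times the normal baseline, even though every
    request still succeeds within the health-check timeout."""
    return statistics.median(samples_ms) > factor * baseline_ms

slow = degraded([210, 230, 250], baseline_ms=100)   # all slow -> alert
normal = degraded([90, 110, 100], baseline_ms=100)  # normal -> no alert
```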
Frequently Asked Questions
What is the difference between L4 and L7 load balancing?
Which load balancing algorithm should I use?
Do I need a load balancer for two servers?
Can a load balancer be a single point of failure?
What are sticky sessions / session affinity?
How do health checks work in load balancers?
How does monitoring help with load-balanced services?
Monitor Load-Balanced Services
AtomPing monitors your services from the user's perspective across 10 European locations. Detect when load balancer health checks miss degraded backends. Free plan includes 50 monitors with email, Slack, Discord, and Telegram alerts.
Start Monitoring Free