
What is Network Latency?

Latency is the time delay between a request being sent and a response being received. It is one of the most impactful factors in perceived application performance, directly affecting user experience, conversion rates, and search engine rankings.

Definition

Network latency is the time it takes for data to travel from one point to another across a network, typically measured in milliseconds (ms). In web applications, it usually refers to the round-trip time (RTT) for a request to travel from the client to the server and back.

For example, if you send an HTTP request from London to a server in New York and receive a response 120ms later, the round-trip latency is 120ms. This includes network transit time, server processing time, and any queuing delays along the path.

Types of Latency

Latency is not a single measurement but a combination of multiple delays at different layers of the stack:

Propagation Latency

The time for a signal to physically travel through a medium (fiber optic cable, copper wire, or air). This is bounded by the speed of light and is proportional to distance. Light travels through fiber at roughly two-thirds the speed of light in a vacuum, giving approximately 5 microseconds per kilometer of fiber.
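The 5 µs/km figure makes propagation delay easy to estimate. A quick back-of-the-envelope calculation, using the approximate 200,000 km/s signal speed in fiber and a rough 5,500 km London-to-New York distance:

```python
# Estimate one-way and round-trip propagation delay through fiber.
# Assumes ~200,000 km/s signal speed (about two-thirds of c in a vacuum),
# i.e. roughly 5 microseconds per kilometer.

FIBER_SPEED_KM_PER_S = 200_000

def propagation_delay_ms(distance_km: float) -> float:
    """One-way propagation delay in milliseconds."""
    return distance_km / FIBER_SPEED_KM_PER_S * 1000

# London -> New York is roughly 5,500 km.
one_way = propagation_delay_ms(5_500)
print(f"one-way: {one_way:.1f} ms, round-trip: {2 * one_way:.1f} ms")
# -> one-way: 27.5 ms, round-trip: 55.0 ms
```

Note this is a physical floor: real routes are longer than the great-circle distance, and routing, processing, and queuing delays are added on top.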

Transmission Latency

The time to push all bits of a packet onto the wire. It depends on the packet size and the link bandwidth. Larger payloads on slower links experience higher transmission latency. This is why minimizing payload size (compression, efficient serialization) reduces latency.
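Transmission delay is simply payload size divided by link bandwidth, which is why compression pays off. A small illustration:

```python
# Transmission delay = bits to send / link bandwidth.
def transmission_delay_ms(payload_bytes: int, bandwidth_mbps: float) -> float:
    """Time in ms to push a payload onto a link of the given bandwidth."""
    bits = payload_bytes * 8
    return bits / (bandwidth_mbps * 1_000_000) * 1000

# The same 1 MB response on a slow vs a fast link:
print(f"1 MB over 10 Mbit/s:  {transmission_delay_ms(1_000_000, 10):.0f} ms")   # 800 ms
print(f"1 MB over 100 Mbit/s: {transmission_delay_ms(1_000_000, 100):.0f} ms")  # 80 ms
```

Compressing that payload to 200 KB cuts the transmission delay by the same factor of five on either link.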

Processing Latency

The time a router, switch, or server takes to examine a packet header, check routing tables, and determine the next hop. Modern network equipment processes packets in microseconds, but application-level processing (database queries, computation, API calls to other services) can add significant milliseconds.

Queuing Latency

The time a packet waits in a buffer before being processed. During congestion, packets queue up at routers and servers. This is the most variable component of latency and the primary cause of latency spikes during peak traffic periods.

How to Measure Latency

Accurate latency measurement requires the right tools and methodology. Here are the most common approaches:

Ping (ICMP Echo)

The simplest latency measurement. Sends an ICMP echo request and measures the round-trip time for the reply. Useful for basic network connectivity checks but does not reflect application-level latency since it bypasses TCP/HTTP processing entirely.

$ ping -c 1 example.com
64 bytes from example.com: icmp_seq=1 ttl=56 time=24.3 ms

Traceroute

Maps the network path between you and a destination, showing the latency at each hop. Invaluable for identifying where latency is introduced along the route. AtomPing offers a traceroute tool for quick network path analysis.

HTTP Response Time & TTFB

Time to First Byte (TTFB) measures the time from sending an HTTP request to receiving the first byte of the response. This captures DNS resolution, TCP handshake, TLS negotiation, and server processing. AtomPing's response time monitoring tracks TTFB across all your endpoints from multiple regions.
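A rough TTFB measurement can be scripted with the standard library alone. This is a minimal sketch (the `measure_ttfb` helper is illustrative, not an AtomPing API); `getresponse()` returns once the status line and headers arrive, which is effectively the first byte of the response:

```python
import http.client
import time

def measure_ttfb(host: str, path: str = "/", port: int = 443,
                 use_tls: bool = True) -> float:
    """Approximate TTFB in ms: DNS + TCP (+ TLS) + server time to first byte."""
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    start = time.perf_counter()
    conn = conn_cls(host, port, timeout=10)
    conn.request("GET", path)           # http.client adds the Host header
    response = conn.getresponse()       # returns at the first response bytes
    ttfb = (time.perf_counter() - start) * 1000
    response.read()                     # drain the body
    conn.close()
    return ttfb
```

For example, `measure_ttfb("example.com")` measures one full DNS + TCP + TLS + server round trip; averaging several calls (and discarding the first, which pays the OS DNS cache miss) gives a steadier number.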

Percentile Measurements (p50, p95, p99)

Average latency can be misleading because a few very slow requests skew the mean. Percentile measurements give a more accurate picture: p50 (median) shows typical latency, p95 shows the latency that 95% of requests stay under, and p99 captures the worst-case scenarios that still affect a meaningful portion of traffic.
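The skew is easy to demonstrate. In this sketch (using a simple nearest-rank percentile), one slow outlier drags the mean far above what a typical request experiences:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample >= p percent of the data."""
    ordered = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

# Nine fast requests and one slow outlier (latencies in ms):
latencies = [41, 43, 44, 45, 46, 47, 48, 49, 51, 950]
mean = sum(latencies) / len(latencies)
print(f"mean={mean:.0f} ms  p50={percentile(latencies, 50)} ms  "
      f"p99={percentile(latencies, 99)} ms")
```

The mean (~136 ms) suggests every request is slow; the median shows a typical request finishes in ~46 ms, and only the tail (p99 = 950 ms) reveals the outlier.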

What Causes High Latency?

High latency can originate from many points in the request path. Understanding common causes helps you diagnose and resolve issues faster:

Cause                 | Layer       | Impact                        | Diagnosis
----------------------|-------------|-------------------------------|----------------------------------
Geographic distance   | Network     | Fixed baseline latency        | Traceroute, multi-region tests
Network congestion    | Network     | Variable spikes, packet loss  | Traceroute hop-by-hop analysis
DNS resolution        | Application | Added delay per request       | DNS lookup timing
TLS handshake         | Transport   | 1-2 extra round trips         | Connection timing breakdown
Slow database queries | Application | High server processing time   | TTFB monitoring, query logs
Resource contention   | Server      | Increased queuing             | CPU/memory metrics, load testing

How to Reduce Latency

Reducing latency requires a systematic approach targeting each layer of the stack:

1. Reduce Geographic Distance

Deploy servers in regions close to your users. Use CDNs to cache static content at edge locations. For dynamic content, consider multi-region application deployment with load balancing to route users to the nearest region.

2. Optimize Application Code

Reduce server processing time by optimizing database queries (add indexes, avoid N+1 queries), implementing caching layers (Redis, Memcached), and minimizing synchronous external API calls. Even small per-request improvements compound significantly at scale.
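The payoff of a caching layer is easy to see in miniature. This sketch uses in-process memoization as a stand-in for a shared cache such as Redis or Memcached (the `get_user_profile` function and its 50 ms query cost are hypothetical):

```python
import functools
import time

# In-process memoization, the simplest caching layer. A shared cache such
# as Redis or Memcached follows the same idea: key -> cached result.
@functools.lru_cache(maxsize=1024)
def get_user_profile(user_id: int) -> dict:
    # Hypothetical stand-in for a slow database query.
    time.sleep(0.05)  # simulate ~50 ms of query latency
    return {"id": user_id, "name": f"user-{user_id}"}

start = time.perf_counter()
get_user_profile(42)  # cache miss: pays the full query cost
miss_ms = (time.perf_counter() - start) * 1000

start = time.perf_counter()
get_user_profile(42)  # cache hit: served from memory
hit_ms = (time.perf_counter() - start) * 1000

print(f"miss: {miss_ms:.1f} ms, hit: {hit_ms:.3f} ms")
```

A real cache also needs an invalidation strategy (TTLs, explicit eviction on writes), which is the hard part in practice.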

3. Minimize Payload Size

Enable compression (gzip, Brotli) for HTTP responses. Optimize images and use modern formats (WebP, AVIF). Minimize JavaScript bundles. Smaller payloads mean less transmission latency and faster page loads. Use AtomPing's website speed test to measure your page payload.

4. Reduce Connection Overhead

Use HTTP/2 or HTTP/3 to multiplex requests over a single connection. Enable TLS 1.3 for faster handshakes (1-RTT vs 2-RTT). Implement connection pooling for database and upstream connections. Use DNS prefetching and preconnect hints in HTML.
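The benefit of connection reuse shows up even with plain HTTP/1.1 keep-alive. A minimal sketch (the `fetch_many` helper is illustrative): issuing several requests over one persistent connection pays the TCP and TLS handshake cost once instead of once per request, which is the same idea behind connection pooling:

```python
import http.client
import time

def fetch_many(host: str, paths: list[str], port: int = 443,
               use_tls: bool = True) -> float:
    """Fetch several paths over ONE persistent connection; returns total ms."""
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = conn_cls(host, port, timeout=10)
    start = time.perf_counter()
    for path in paths:
        conn.request("GET", path)
        conn.getresponse().read()  # must drain the body before reusing
    elapsed_ms = (time.perf_counter() - start) * 1000
    conn.close()
    return elapsed_ms
```

Opening a fresh connection per request instead would add one TCP round trip (plus a TLS handshake) to every fetch, which dominates total time for small responses.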

5. Implement Caching at Every Layer

Browser caching (Cache-Control headers), CDN caching, application-level caching (in-memory stores), and database query caching each eliminate round trips. The fastest request is one that never reaches your origin server.

Latency Monitoring and Alerting

Continuous latency monitoring from multiple geographic locations is essential for maintaining a fast, reliable service. Without monitoring, you are blind to latency degradation until users complain.

What to Monitor

  • TTFB (Time to First Byte): Captures DNS + TCP + TLS + server processing time. The single most important latency metric for HTTP services.
  • TCP connection time: Reveals network-level latency independent of application performance.
  • DNS resolution time: Slow DNS adds latency to every new connection. Monitor with AtomPing's DNS lookup tool.
  • Regional differences: Multi-region monitoring reveals geographic performance gaps that single-location testing misses.
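The metrics above can be separated by timing each connection phase individually. A sketch using only the standard library (the `timing_breakdown` helper is illustrative, not an AtomPing API):

```python
import socket
import ssl
import time

def timing_breakdown(host: str, port: int = 443) -> dict[str, float]:
    """Per-phase latency in ms: DNS lookup, TCP connect, TLS handshake, TTFB."""
    timings: dict[str, float] = {}

    t0 = time.perf_counter()
    ip = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4][0]
    timings["dns_ms"] = (time.perf_counter() - t0) * 1000

    t0 = time.perf_counter()
    sock = socket.create_connection((ip, port), timeout=10)
    timings["tcp_ms"] = (time.perf_counter() - t0) * 1000

    t0 = time.perf_counter()
    tls = ssl.create_default_context().wrap_socket(sock, server_hostname=host)
    timings["tls_ms"] = (time.perf_counter() - t0) * 1000

    t0 = time.perf_counter()
    request = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    tls.sendall(request.encode())
    tls.recv(1)  # first byte of the response
    timings["ttfb_ms"] = (time.perf_counter() - t0) * 1000

    tls.close()
    return timings
```

A breakdown like this tells you where to look: a large `dns_ms` points at the resolver, a large `tcp_ms` at network distance or congestion, and a large `ttfb_ms` at server-side processing.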

Pro tip: Set up latency alerts with thresholds that match your SLA commitments. AtomPing monitors from 10 European locations and alerts via email, Slack, Discord, or Telegram when response times exceed your configured thresholds.

Frequently Asked Questions

What is a good latency for a website?
For web applications, API response latency under 100ms is considered good, and full page loads under 200ms are ideal. Users start noticing delays above 300ms, and anything over 1 second significantly impacts user experience and conversion rates. The acceptable threshold depends on the type of interaction and user expectations.
What's the difference between latency and response time?
Latency is the time it takes for a data packet to travel from source to destination (one-way or round-trip). Response time includes latency plus the time the server spends processing the request and generating a response. Response time = network latency + server processing time. Both are measured in milliseconds.
Can latency be zero?
No. Even on the same machine, there is some processing latency. Network latency has a physical lower bound determined by the speed of light through fiber optic cables (approximately 5 microseconds per kilometer). A request traveling from New York to London (5,500 km) has a minimum round-trip latency of about 55ms due to physics alone.
How does CDN reduce latency?
A CDN (Content Delivery Network) caches content at edge servers geographically close to users. Instead of every request traveling to your origin server (potentially thousands of kilometers away), users are served from the nearest edge location. This reduces the physical distance data must travel, directly lowering network latency.
Why does my latency spike at certain times?
Latency spikes typically occur during peak traffic periods when servers or network links become congested. Common causes include: increased user traffic, background batch jobs competing for resources, network congestion at ISP peering points, garbage collection pauses in application servers, or database lock contention under high write loads.
How does multi-region monitoring help with latency?
Multi-region monitoring measures latency from different geographic locations simultaneously, giving you a complete picture of user experience worldwide. AtomPing monitors from 10 European locations, helping you identify if latency issues are regional (affecting one area) or global (affecting all users). This helps you pinpoint whether the problem is in your infrastructure or the network path.
What is tail latency and why does it matter?
Tail latency refers to the slowest requests, typically measured at the 95th or 99th percentile (p95/p99). While your median latency might be 50ms, your p99 could be 500ms, meaning 1% of users experience 10x slower responses. Tail latency matters because at scale, a significant number of users hit these slow paths, and a single slow microservice can cascade delays across the entire request chain.

Monitor Latency Across Regions

AtomPing monitors response time and TTFB from 10 European locations, detecting latency degradation before your users notice. Free forever plan includes 50 monitors with email alerts.

Start Monitoring Free
