
What is Heartbeat Monitoring?

Heartbeat monitoring is a push-based approach where your services actively send periodic signals to a monitoring system to prove they are alive and working. If the signal stops arriving, something has gone wrong, and you are alerted as soon as the configured timeout expires.

Definition

A heartbeat is a regular signal sent by a service, job, or process to a monitoring endpoint to confirm it is running and functioning. The monitoring system expects these signals at a defined interval. If a heartbeat is not received within the timeout window, the system assumes the service has failed and triggers an alert.

Think of it like a "dead man's switch" — as long as the signal keeps coming, everything is fine. The moment it stops, the monitoring system takes action. This is also called push-based monitoring, in contrast to pull-based (polling) approaches like traditional health checks.

How Heartbeat Monitoring Works

The heartbeat pattern is simple but effective. Here is the typical flow:

1. Create a Heartbeat Endpoint

You create a unique heartbeat URL in your monitoring system. This URL serves as the destination for your service's heartbeat signals. Each monitored service or job gets its own endpoint.

2. Service Sends Regular Pings

Your service sends an HTTP request to the heartbeat URL at regular intervals — typically at the end of each successful run. For a cron job, this might be a curl command at the end of the script. For a continuous worker, it might be a periodic ping inside the main loop.

3. Monitoring System Tracks Arrivals

The monitoring system records each heartbeat and resets a countdown timer. As long as heartbeats arrive within the expected interval, the service is considered healthy.

4. Alert on Missing Heartbeats

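If the countdown timer expires without receiving a heartbeat, the monitoring system creates an incident and sends alerts. A missed heartbeat usually means the service has crashed, hung, or failed to complete its work, although a network problem between the service and the monitor produces the same symptom.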
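
The monitor's side of steps 3 and 4 can be sketched in a few lines of Python. This is only an illustration of the countdown idea, not how any particular monitoring product is implemented; the class name `HeartbeatCheck` and its fields are hypothetical, and timestamps are plain seconds so the logic is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatCheck:
    """Tracks one heartbeat endpoint (hypothetical names, not a real product API)."""

    timeout_seconds: float             # countdown length, restarted by every heartbeat
    last_seen: Optional[float] = None  # timestamp of the most recent heartbeat

    def record(self, now: float) -> None:
        # Step 3: a heartbeat arrived, so restart the countdown from `now`.
        self.last_seen = now

    def is_overdue(self, now: float) -> bool:
        # Step 4: the countdown expired without a new heartbeat.
        if self.last_seen is None:
            return False  # assumption: do not alert before the first heartbeat
        return now - self.last_seen > self.timeout_seconds


check = HeartbeatCheck(timeout_seconds=300)
check.record(now=0)
print(check.is_overdue(now=200))  # False
print(check.is_overdue(now=301))  # True
```

A real monitor would evaluate `is_overdue` on a schedule for every endpoint and open an incident the first time it returns true.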

Heartbeat vs Polling-Based Monitoring

Both approaches detect failures, but they work in opposite directions. Understanding when to use each is key:

| Aspect | Heartbeat (Push) | Polling (Pull) |
| --- | --- | --- |
| Direction | Service sends signal to monitor | Monitor sends request to service |
| Best For | Cron jobs, workers, firewalled services | Websites, APIs, public endpoints |
| Firewall Friendly | Yes: outbound only, no inbound ports needed | No: requires inbound access to the service |
| Detects | Silent failures, hangs, crashes | Unreachable services, slow responses, errors |
| Limitation | Requires service instrumentation | Cannot reach services behind NAT/firewalls |

Best practice: Use both approaches together: polling-based uptime monitoring for public endpoints, and heartbeat monitoring for internal jobs and background processes. Together they cover failures that either approach alone would miss.

Common Use Cases for Heartbeat Monitoring

Heartbeat monitoring excels in scenarios where polling-based monitoring falls short:

Cron Jobs and Scheduled Tasks

Cron jobs can silently fail for many reasons: the cron daemon crashes, the script errors out, or the server reboots and cron is not re-enabled. Adding a heartbeat at the end of each job run means a job that fails to complete is reported as soon as its timeout expires.

# At the end of your cron script:
curl -fsS --retry 3 https://monitor.example.com/heartbeat/abc123
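
For jobs written in Python rather than shell, the same idea could look roughly like the sketch below. It uses only the standard library; the URL is the placeholder from the curl example, and `send_heartbeat` and `run_with_heartbeat` are hypothetical helper names. The key property is that the ping is reached only when the job returns without raising.

```python
import urllib.request
from typing import Callable

# Placeholder URL taken from the curl example; replace it with your own endpoint.
HEARTBEAT_URL = "https://monitor.example.com/heartbeat/abc123"


def send_heartbeat(url: str = HEARTBEAT_URL, timeout: float = 10.0) -> None:
    # A plain GET; urlopen raises on network errors and on HTTP 4xx/5xx responses.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def run_with_heartbeat(
    job: Callable[[], None],
    ping: Callable[[], None] = send_heartbeat,
) -> None:
    """Run `job` and ping only if it finished without raising."""
    job()  # if this raises, the ping below is never reached
    try:
        ping()
    except OSError:
        # Assumption: a failed ping should not make a successful job look failed.
        # The monitor will still alert if pings keep failing.
        pass
```

Calling `run_with_heartbeat(my_job)` with your own job function as the last step of the script mirrors the curl line: no successful run, no heartbeat.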

Background Workers and Queue Processors

Message queue consumers, Celery workers, Sidekiq processes — these can hang, leak memory, or stop processing without crashing. A heartbeat inside the processing loop confirms the worker is actively handling jobs, not just sitting idle.
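
A heartbeat inside a processing loop is usually throttled so the worker does not ping after every single job. The sketch below shows one possible shape, assuming a hypothetical `process_with_heartbeat` helper, a caller-supplied `ping` function, and a 30-second minimum gap between pings. The clock is injectable so the throttling can be checked without waiting.

```python
import time
from typing import Callable, Iterable, Optional


def process_with_heartbeat(
    jobs: Iterable[Callable[[], None]],
    ping: Callable[[], None],
    min_interval: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run each job, pinging at most once per `min_interval` seconds.

    Returns the number of pings sent.
    """
    last_ping: Optional[float] = None
    pings = 0
    for job in jobs:
        job()  # a job that hangs here stops the pings, which is the point
        now = clock()
        if last_ping is None or now - last_ping >= min_interval:
            ping()
            last_ping = now
            pings += 1
    return pings
```

Because the ping sits after `job()`, a worker that is alive but stuck on one job stops sending heartbeats, which a process-level liveness check would not notice.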

Batch Processing and ETL Pipelines

Data pipelines that run on schedules — ETL jobs, report generation, data synchronization — need assurance that they complete successfully. A heartbeat at the end of each run catches failed transformations, connection timeouts, and partial completions.

Backup Verification

Database backups and file backups are critical but often fail silently. Adding a heartbeat after each successful backup ensures you are alerted immediately when a backup does not complete — before you need to restore from one.

Heartbeat Intervals and Timeout Configuration

Proper timeout configuration is crucial. Too tight and you get false alarms; too loose and you detect failures late.

Expected Interval

How often the heartbeat should arrive. Set this to match your job's schedule or your worker's ping frequency. For a job running every 5 minutes, the expected interval is 5 minutes.

Grace Period

Extra time before marking the service as down. This accounts for job execution time, network delays, and clock drift. A common practice is to set the grace period to 20-50% of the expected interval.

Alert Threshold

How many missed heartbeats before alerting. A threshold of 1 means alert on the first missed heartbeat. A threshold of 2-3 reduces false positives but delays detection.

Example configuration: A cron job runs every 10 minutes. Set the expected interval to 10 minutes, the grace period to 3 minutes (30%), and the alert threshold to 1 missed heartbeat. Total: the alert fires 13 minutes after the last successful heartbeat, so a failure is reported at most 13 minutes after it occurs.
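
The arithmetic behind that example can be written down as a small helper. This is a sketch with hypothetical names (`HeartbeatConfig`, `minutes_until_alert`), and it assumes each additional missed heartbeat required by the threshold adds one more full interval; real monitoring systems may count misses differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeartbeatConfig:
    interval_minutes: float   # expected interval
    grace_minutes: float      # grace period
    alert_threshold: int = 1  # missed heartbeats before alerting

    def minutes_until_alert(self) -> float:
        """Minutes from the last received heartbeat until the alert fires."""
        # The first miss is confirmed after interval + grace. Each additional
        # required miss adds one more interval (an assumption, see above).
        return self.interval_minutes * self.alert_threshold + self.grace_minutes


config = HeartbeatConfig(interval_minutes=10, grace_minutes=3, alert_threshold=1)
print(config.minutes_until_alert())  # 13
```

Raising the threshold to 2 with the same interval and grace period gives 23 minutes, which makes the trade-off between false positives and detection delay concrete.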

What Happens When a Heartbeat Is Missed

When the monitoring system detects a missing heartbeat, it follows a defined escalation path:

Grace Period Begins

The expected heartbeat time passes without a signal. The monitoring system enters the grace period, waiting for a delayed heartbeat. No alert is sent yet.

Grace Period Expires

The grace period expires with no heartbeat received. The service is marked as "down" or "late." If the alert threshold is 1, an incident is created immediately. Otherwise, the system waits for the next expected heartbeat.

Incident Created and Alerts Sent

Once the threshold is met, the monitoring system creates an incident and dispatches alerts through configured channels — email, Slack, Discord, Telegram, or webhooks. Your team is notified to investigate.

Recovery on Next Heartbeat

When the service resumes sending heartbeats, the incident is automatically resolved and a recovery notification is sent. This confirms the service is operational again.
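
The escalation path above can be modelled as a small state machine. The following sketch is illustrative only: the `Check` class, its state names ("up", "grace", "late", "down"), and the event strings are hypothetical, and times are in minutes measured from an arbitrary start.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Check:
    interval: float              # expected interval, in minutes
    grace: float                 # grace period, in minutes
    threshold: int = 1           # missed heartbeats before an incident
    last_seen: float = 0.0       # time of the last heartbeat
    incident_open: bool = False

    def status(self, now: float) -> str:
        elapsed = now - self.last_seen
        if elapsed <= self.interval:
            return "up"
        if elapsed <= self.interval + self.grace:
            return "grace"  # heartbeat is late, but no alert yet
        # Count expected heartbeats whose grace period has fully expired.
        missed = math.ceil((elapsed - self.grace) / self.interval) - 1
        return "down" if missed >= self.threshold else "late"

    def tick(self, now: float) -> Optional[str]:
        # Called periodically by the monitor; opens at most one incident.
        if self.status(now) == "down" and not self.incident_open:
            self.incident_open = True
            return "incident_created"
        return None

    def heartbeat(self, now: float) -> Optional[str]:
        # Any heartbeat restarts the countdown and resolves an open incident.
        self.last_seen = now
        if self.incident_open:
            self.incident_open = False
            return "incident_resolved"
        return None
```

With `Check(interval=10, grace=3)`, the status is "grace" at minute 12 and "down" at minute 14; a heartbeat at minute 16 returns "incident_resolved" and the check goes back to "up".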

Frequently Asked Questions

What is heartbeat monitoring?

Heartbeat monitoring is a technique where a service sends a regular signal ("heartbeat") to a monitoring system to prove it is alive and running. If the monitoring system stops receiving heartbeats within a configured timeout, it assumes the service has failed and triggers an alert. It is the inverse of polling-based monitoring: the service reports in, rather than being checked from outside.

What is the difference between a heartbeat and a health check?

A health check is a pull-based model: the monitoring system actively sends requests to your service. A heartbeat is a push-based model: your service actively sends signals to the monitoring system. Health checks work well for services with stable endpoints, while heartbeats are better for jobs, workers, and services behind firewalls.

What happens when a heartbeat is missed?

When a heartbeat is not received within the expected timeout window, the monitoring system marks the service as potentially down. Most systems wait for multiple missed heartbeats (or a configurable grace period) before creating an incident, to avoid false alarms from brief network issues or momentary delays.

How do I choose the right heartbeat interval?

The interval should match how frequently your job runs and how quickly you need to detect failures. For cron jobs running every 5 minutes, a heartbeat after each run with a timeout slightly longer than the interval (e.g., 7 minutes) works well. For continuous workers, heartbeats every 30-60 seconds are common.

Can I use heartbeat monitoring for cron jobs?

Yes, this is one of the most common use cases. Add a heartbeat ping (an HTTP request to your monitoring endpoint) at the end of your cron job script. If the job fails, hangs, or doesn't run, no heartbeat is sent, and you get alerted. This catches silent failures that log-based monitoring might miss.

What services benefit most from heartbeat monitoring?

Heartbeat monitoring is ideal for: cron jobs and scheduled tasks, background workers and queue processors, batch processing pipelines, backup scripts, services behind firewalls or NAT (where external checks can't reach them), and any process that should run continuously but might silently fail.

Is heartbeat monitoring the same as a keepalive?

They are related but different. A keepalive is a low-level network mechanism (like TCP keepalive) that maintains a connection between two endpoints. A heartbeat is an application-level signal that confirms a service or job is actively running and completing its work. Heartbeats carry more semantic meaning than keepalives.

