What is Heartbeat Monitoring?
Heartbeat monitoring is a push-based approach in which your services send periodic signals to a monitoring system to prove they are alive and working. If the signal stops arriving, something has gone wrong, and you are alerted as soon as the expected window has passed.
Definition
A heartbeat is a regular signal sent by a service, job, or process to a monitoring endpoint to confirm it is running and functioning. The monitoring system expects these signals at a defined interval. If a heartbeat is not received within the timeout window, the system assumes the service has failed and triggers an alert.
Think of it like a "dead man's switch" — as long as the signal keeps coming, everything is fine. The moment it stops, the monitoring system takes action. This is also called push-based monitoring, in contrast to pull-based (polling) approaches like traditional health checks.
How Heartbeat Monitoring Works
The heartbeat pattern is simple but effective. Here is the typical flow:
1. Create a Heartbeat Endpoint
You create a unique heartbeat URL in your monitoring system. This URL serves as the destination for your service's heartbeat signals. Each monitored service or job gets its own endpoint.
2. Service Sends Regular Pings
Your service sends an HTTP request to the heartbeat URL at regular intervals — typically at the end of each successful run. For a cron job, this might be a curl command at the end of the script. For a continuous worker, it might be a periodic ping inside the main loop.
3. Monitoring System Tracks Arrivals
The monitoring system records each heartbeat and resets a countdown timer. As long as heartbeats arrive within the expected interval, the service is considered healthy.
4. Alert on Missing Heartbeats
If the countdown timer expires without receiving a heartbeat, the monitoring system creates an incident and sends alerts. A missing heartbeat usually means the service has crashed, hung, or failed to complete its work, although it can also mean the service could not reach the monitoring endpoint.
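The monitor-side half of this flow (steps 3 and 4) can be sketched as a timestamp plus a timeout. This is a minimal illustration, not how any particular monitoring product is implemented; the class name `HeartbeatCheck` and its fields are hypothetical, and timestamps are passed in explicitly so the logic is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatCheck:
    """Monitor-side record for one heartbeat endpoint (hypothetical names)."""
    timeout_seconds: float             # longest allowed gap between heartbeats
    last_seen: Optional[float] = None  # timestamp of the most recent heartbeat

    def record(self, now: float) -> None:
        # A heartbeat arrived: restart the countdown from this moment.
        self.last_seen = now

    def is_overdue(self, now: float) -> bool:
        # Before the first heartbeat there is nothing to time out against.
        if self.last_seen is None:
            return False
        return now - self.last_seen > self.timeout_seconds


check = HeartbeatCheck(timeout_seconds=300)
check.record(now=0)
print(check.is_overdue(now=200))  # False: still inside the window
print(check.is_overdue(now=301))  # True: the countdown has expired
```

A real monitor would run the `is_overdue` check on a schedule for every endpoint and open an incident when it first returns true.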
Heartbeat vs Polling-Based Monitoring
Both approaches detect failures, but they work in opposite directions. Understanding when to use each is key:
| Aspect | Heartbeat (Push) | Polling (Pull) |
|---|---|---|
| Direction | Service sends signal to monitor | Monitor sends request to service |
| Best For | Cron jobs, workers, firewalled services | Websites, APIs, public endpoints |
| Firewall Friendly | Yes — outbound only, no inbound ports needed | No — requires inbound access to the service |
| Detects | Silent failures, hangs, crashes | Unreachable services, slow responses, errors |
| Limitation | Requires service instrumentation | Cannot reach services behind NAT/firewalls |
Best practice: Use both approaches together. Polling-based uptime monitoring for public endpoints, and heartbeat monitoring for internal jobs and background processes. This gives you complete visibility across your infrastructure.
Common Use Cases for Heartbeat Monitoring
Heartbeat monitoring excels in scenarios where polling-based monitoring falls short:
Cron Jobs and Scheduled Tasks
Cron jobs can fail silently for many reasons: the cron daemon crashes, the script errors out, or the server reboots and cron is not re-enabled. Sending a heartbeat at the end of each successful run means a job that fails to complete is flagged as soon as its heartbeat is overdue.
```shell
# At the end of your cron script:
curl -fsS --retry 3 https://monitor.example.com/heartbeat/abc123
```
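For jobs written in Python rather than shell, the same idea might look like the sketch below. It assumes the heartbeat endpoint accepts a plain GET request, as the curl example does; the URL is the placeholder from that example, and `send_heartbeat` and `run_with_heartbeat` are hypothetical helper names.

```python
import urllib.request
from typing import Callable

# Placeholder URL copied from the curl example; substitute your own endpoint.
HEARTBEAT_URL = "https://monitor.example.com/heartbeat/abc123"


def send_heartbeat(url: str = HEARTBEAT_URL, timeout: float = 10.0) -> None:
    # urlopen raises on connection errors and on HTTP 4xx/5xx responses.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def run_with_heartbeat(job: Callable[[], None],
                       ping: Callable[[], None] = send_heartbeat) -> None:
    # If job() raises, the ping is never reached, so the monitor sees a
    # missing heartbeat and alerts once the timeout passes.
    job()
    try:
        ping()
    except OSError:
        # A failed ping should not fail the job itself; a real script
        # would probably log this instead of ignoring it.
        pass
```

Passing the ping function as a parameter keeps the wrapper testable without network access. The key design point is that the heartbeat is sent only after the job returns normally.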
Background Workers and Queue Processors
Message queue consumers, Celery workers, Sidekiq processes — these can hang, leak memory, or stop processing without crashing. A heartbeat inside the processing loop confirms the worker is actively handling jobs, not just sitting idle.
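A worker that handles many jobs per second should not ping on every iteration. One possible approach, sketched here with hypothetical names (`ThrottledHeartbeat`, `process_queue`), is to ping after a job is handled but no more often than a minimum interval. The `ping` callable stands in for whatever sends the actual HTTP request.

```python
import time
from typing import Callable, Iterable, Optional


class ThrottledHeartbeat:
    """Sends a ping at most once per min_interval_seconds (illustrative)."""

    def __init__(self, ping: Callable[[], None], min_interval_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ping = ping
        self._min_interval = min_interval_seconds
        self._clock = clock
        self._last_sent: Optional[float] = None

    def beat(self) -> bool:
        # Returns True if a ping was sent, False if it was throttled.
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._min_interval:
            return False
        self._ping()
        self._last_sent = now
        return True


def process_queue(jobs: Iterable, handle: Callable, heartbeat: ThrottledHeartbeat) -> None:
    for job in jobs:
        handle(job)
        # Only reached after a job was actually handled, so a hung
        # handler stops the heartbeats.
        heartbeat.beat()
```

With this placement, an empty queue also stops the heartbeats. Whether that should count as a failure depends on the workload, so the monitor's timeout needs to allow for expected quiet periods.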
Batch Processing and ETL Pipelines
Data pipelines that run on schedules — ETL jobs, report generation, data synchronization — need assurance that they complete successfully. A heartbeat at the end of each run catches failed transformations, connection timeouts, and partial completions.
Backup Verification
Database backups and file backups are critical but often fail silently. Sending a heartbeat after each successful backup means you are alerted when a backup does not complete, rather than discovering the gap when you need to restore from one.
Heartbeat Intervals and Timeout Configuration
Proper timeout configuration is crucial. Too tight and you get false alarms; too loose and you detect failures late.
Expected Interval
How often the heartbeat should arrive. Set this to match your job's schedule or your worker's ping frequency. For a job running every 5 minutes, the expected interval is 5 minutes.
Grace Period
Extra time before marking the service as down. This accounts for job execution time, network delays, and clock drift. A common practice is to set the grace period to 20-50% of the expected interval.
Alert Threshold
How many missed heartbeats before alerting. A threshold of 1 means alert on the first missed heartbeat. A threshold of 2-3 reduces false positives but delays detection.
Example configuration: A cron job runs every 10 minutes. Set the expected interval to 10 minutes, the grace period to 3 minutes (30%), and the alert threshold to 1 missed heartbeat. In total, the alert fires 13 minutes after the last successful heartbeat, so a failure is detected within 13 minutes at most.
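The arithmetic in this example can be written as a small helper. The sketch assumes that, for thresholds above 1, the monitor waits one full interval per missed heartbeat and applies the grace period once at the end; actual monitoring systems may count this differently. The function name is hypothetical.

```python
def alert_deadline_seconds(interval: float, grace: float, threshold: int = 1) -> float:
    """Seconds after the last received heartbeat at which the alert fires.

    Assumption: each missed heartbeat costs one interval, and the grace
    period is added once after the final expected arrival time.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    return threshold * interval + grace


# The example configuration: 10-minute interval, 3-minute grace, threshold 1.
print(alert_deadline_seconds(600, 180) / 60)                # 13.0 minutes
# Raising the threshold to 3 trades false positives for slower detection.
print(alert_deadline_seconds(600, 180, threshold=3) / 60)   # 33.0 minutes
```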
What Happens When a Heartbeat Is Missed
When the monitoring system detects a missing heartbeat, it follows a defined escalation path:
Grace Period Begins
The expected heartbeat time passes without a signal. The monitoring system enters the grace period, waiting for a delayed heartbeat. No alert is sent yet.
Grace Period Expires
The grace period expires with no heartbeat received. The service is marked as "down" or "late." If the alert threshold is 1, an incident is created immediately. Otherwise, the system waits for the next expected heartbeat.
Incident Created and Alerts Sent
Once the threshold is met, the monitoring system creates an incident and dispatches alerts through configured channels — email, Slack, Discord, Telegram, or webhooks. Your team is notified to investigate.
Recovery on Next Heartbeat
When the service resumes sending heartbeats, the incident is automatically resolved and a recovery notification is sent. This confirms the service is operational again.
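The escalation path above can be sketched as a status classification plus a tracker that emits an alert once and a recovery once. This is an illustration under simplifying assumptions: the alert threshold is 1, and the names `Status`, `classify`, and `IncidentTracker` are hypothetical rather than taken from any specific product.

```python
from enum import Enum
from typing import List


class Status(Enum):
    UP = "up"
    GRACE = "grace"   # heartbeat is late, but no alert yet
    DOWN = "down"


def classify(seconds_since_last: float, interval: float, grace: float) -> Status:
    if seconds_since_last <= interval:
        return Status.UP
    if seconds_since_last <= interval + grace:
        return Status.GRACE
    return Status.DOWN


class IncidentTracker:
    """Turns a stream of statuses into alert and recovery events."""

    def __init__(self) -> None:
        self.incident_open = False
        self.events: List[str] = []

    def observe(self, status: Status) -> None:
        if status is Status.DOWN and not self.incident_open:
            # First time the grace period has expired: open an incident.
            self.incident_open = True
            self.events.append("alert")
        elif status is Status.UP and self.incident_open:
            # A heartbeat arrived again: resolve the incident.
            self.incident_open = False
            self.events.append("recovery")
```

Tracking whether an incident is already open is what prevents repeated alerts while the service stays down.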