Internal vs External Monitoring: Why You Need Both

Internal monitoring sees your infrastructure. External monitoring sees what users see. Learn when to use each, how they complement each other, and how to build a complete monitoring strategy.

2026-03-26 · 10 min · Technical Guide

Prometheus reports CPU at 20%, memory at 40%, and every pod in Running. The Grafana dashboards are green. Yet users are writing to support: "the site won't open."

The cause: the DNS provider went down. Or Cloudflare is serving an outdated certificate. Or an ISP in Germany has a routing problem. Internal monitoring cannot see any of this, because it looks from the inside. You need a view from the outside.

Internal monitoring: the view from inside

Internal monitoring consists of agents and exporters running inside your infrastructure. They collect metrics about how your systems are behaving.

What it sees:

- CPU, memory, disk, and network I/O for every server and container

- Application metrics: request rate, error rate, latency (the RED metrics)

- Database performance: query time, connection pool, replication lag

- Queue depths, worker throughput, job failure rates

- Application logs, traces, profiling data
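To make the RED metrics from the list above concrete, here is a minimal Python sketch that derives request rate, error ratio, and a 95th-percentile latency from one window of request records. The `Request` record, the `red_metrics` function, and the choice to count only 5xx responses as errors are assumptions made for illustration; in practice a Prometheus client library or an OpenTelemetry SDK would normally collect these for you.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int         # HTTP status code returned to the caller
    duration_ms: float  # time spent handling the request


def red_metrics(requests, window_seconds):
    """Return rate (req/s), error ratio, and p95 latency for one time window."""
    total = len(requests)
    if total == 0:
        return {"rate": 0.0, "error_ratio": 0.0, "p95_ms": None}
    # Assumption: only server-side failures (5xx) count as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    durations = sorted(r.duration_ms for r in requests)
    # Nearest-rank percentile, computed with integer math to avoid float rounding.
    rank = (95 * total + 99) // 100
    return {
        "rate": total / window_seconds,
        "error_ratio": errors / total,
        "p95_ms": durations[rank - 1],
    }
```

A single slow, failing request in a window of twenty raises the error ratio to 5% while leaving the p95 untouched, which is one reason to watch both numbers rather than latency alone.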

What it does NOT see:

- DNS resolution failures on the user's side

- SSL/TLS problems (expired certificate, chain issues, mixed content)

- CDN outages and cache poisoning

- Load balancer misconfigurations

- ISP routing problems and BGP hijacks

- Firewall rules blocking legitimate traffic

Tools: Prometheus + Grafana, Datadog, New Relic APM, CloudWatch, node_exporter, cAdvisor, OpenTelemetry.

External monitoring: the view from outside

External monitoring consists of agents in different geographic locations that do what a user does: open a URL, send a request, and check the response.

What it sees:

- Service availability from different regions (synthetic checks)

- Response time from the user's point of view (including DNS, TLS, and network latency)

- Response correctness (keyword checks, JSON path assertions)

- DNS resolution, SSL certificate validity, TCP connectivity

- The entire delivery chain: DNS → CDN → Load Balancer → App → DB → Response
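The keyword checks and JSON path assertions mentioned in the list above can be sketched in a few lines of Python. This is an illustrative sketch, not how any particular monitoring product implements them: `evaluate_response`, `json_path`, and the dotted-path syntax are hypothetical names and conventions chosen for the example, and the HTTP request itself is left out so the logic stays easy to test.

```python
import json


def json_path(document, path):
    """Follow a dotted path such as 'items.0.id' through dicts and lists."""
    current = document
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]
        else:
            current = current[part]
    return current


def evaluate_response(status, body, expected_status=200, keyword=None, json_assertions=None):
    """Return a list of failure messages; an empty list means the check passed."""
    failures = []
    if status != expected_status:
        failures.append(f"status {status}, expected {expected_status}")
    if keyword is not None and keyword not in body:
        failures.append(f"keyword {keyword!r} not found")
    if json_assertions:
        try:
            document = json.loads(body)
        except ValueError:
            failures.append("body is not valid JSON")
        else:
            for path, expected in json_assertions.items():
                try:
                    actual = json_path(document, path)
                except (KeyError, IndexError, ValueError, TypeError):
                    failures.append(f"path {path!r} missing")
                    continue
                if actual != expected:
                    failures.append(f"path {path!r} is {actual!r}, expected {expected!r}")
    return failures
```

Checking content as well as status matters because a misconfigured proxy or CDN can return HTTP 200 with an error page, which a status-only check would report as healthy.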

What it does NOT see:

- Internal metrics (CPU, memory, disk: only indirectly, through response time)

- The root cause: is a "503" a database that is down? A container crash? A deployment error?

- Internal services that are not exposed to the outside

Tools: AtomPing (9 check types, quorum confirmation), Pingdom, UptimeRobot, Better Stack, Checkly.

Blind spots of each approach

Internal monitoring only: "everything is green, but the site is down"

Scenario 1: The DNS provider updated the NS records incorrectly. Your server is running, but the domain does not resolve. Prometheus sees CPU usage close to idle (because no requests reach the server). Everything looks "fine."
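A DNS failure like the one in Scenario 1 is straightforward to probe for. The sketch below wraps the standard library's `socket.getaddrinfo`; the `check_dns` name and the injectable `resolver` parameter are assumptions added for illustration and testability. Note that it uses whatever resolver the machine is configured with, so it reflects the user's view only when it runs outside your own network.

```python
import socket


def check_dns(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname and return (ok, addresses_or_error_message)."""
    try:
        results = resolver(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return False, str(exc)
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = sorted({info[4][0] for info in results})
    if not addresses:
        return False, "no addresses returned"
    return True, addresses
```

Calling `check_dns("example.com")` from an external probe returns the resolved addresses on success, or the resolver's error text when the name cannot be resolved.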

Scenario 2: The Let's Encrypt certificate was not renewed (for example, the renewal hook failed). The application is running, but browsers show a "Not Secure" certificate warning. Internal monitoring does not check the TLS chain.
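An expiry check that would catch Scenario 2 before users do comes down to date arithmetic. The sketch below assumes the certificate's `notAfter` string has already been obtained in the format that `ssl.SSLSocket.getpeercert()` returns; fetching the certificate over the network is left out. The function names and the 14-day warning threshold are illustrative assumptions.

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days left before a notAfter date such as 'Jun  1 12:00:00 2030 GMT'."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def certificate_status(not_after, warn_days=14, now=None):
    """Classify a certificate as 'expired', 'expiring', or 'ok'."""
    remaining = days_until_expiry(not_after, now)
    if remaining <= 0:
        return "expired"
    if remaining <= warn_days:
        return "expiring"
    return "ok"
```

Alerting on "expiring" rather than "expired" leaves time to fix a broken renewal job before browsers start rejecting the certificate.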

Scenario 3: The CDN (Cloudflare/CloudFront) went down in the EU region. US users are unaffected, while EU users see 502. Your server is fine.
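A regional failure like Scenario 3 only becomes visible when results from several vantage points are compared. The sketch below shows one possible way to do that; it is an assumption for illustration and does not describe how AtomPing or any other product implements quorum confirmation. The `classify_outage` name, the region labels, and the 50% threshold are all hypothetical.

```python
def classify_outage(results, quorum=0.5):
    """Classify check results, given a map of region name -> True (passed) or False (failed)."""
    failed = sorted(region for region, ok in results.items() if not ok)
    if not failed:
        return {"state": "up", "failed_regions": []}
    if len(failed) / len(results) > quorum:
        state = "down"      # most vantage points agree: likely a global outage
    else:
        state = "degraded"  # a minority fails: possibly regional (CDN, ISP, routing)
    return {"state": state, "failed_regions": failed}
```

With results such as `{"us-east": True, "us-west": True, "eu-west": False}` the function reports a degraded state and names `eu-west`, which points the investigation toward the CDN or network path rather than the origin server.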

External monitoring only: "the site is down, but nobody knows why"

Scenario 1: AtomPing sees HTTP 503 on /api/checkout. But what does that 503 mean? An exhausted database connection pool? An OOMKilled container? Rate limiting by Stripe? Without internal metrics, you are diagnosing blind.

Scenario 2: Response time grew from 200ms to 3s. The external monitor raises an alert. But is the cause a memory leak with growing GC pressure? A slow query after a migration? A noisy neighbor on a shared host? You need a Grafana dashboard.

How they work together

Detection → External. AtomPing detects that the checkout API returned 503. An alert goes out to Slack/Telegram/PagerDuty. The on-call engineer is notified.

Diagnosis → Internal. The engineer opens Grafana and sees that the payment-service pod was OOMKilled 2 minutes ago. Memory usage had been growing for 30 minutes (a memory leak). Kubernetes restarted the pod, but the new one will soon crash as well.

Fix → Based on diagnosis. The engineer finds the memory leak in the new release and rolls back the deployment. Grafana shows that memory has stabilized. AtomPing confirms that the checkout API is returning 200 again.

Communication → External-driven. The status page updates automatically from AtomPing. Users see: "Payments: Resolved."

What to monitor with each approach

Metric	External	Internal
Is the site available?	Primary	-
Response time for the user	Primary	Complementary
SSL/TLS validity	Primary	-
DNS resolution	Primary	-
CPU / Memory / Disk	-	Primary
Database performance	Indirect	Primary
Application errors & logs	-	Primary
Queue depths	-	Primary
Content correctness	Primary	-

A practical plan

Day 1 (5 minutes): AtomPing with HTTP checks on key endpoints, a DNS monitor, and an SSL monitor. Free, with 50 monitors.

Day 2 (30 minutes): Health check endpoints in every service, with an AtomPing monitor on each /health/ready.

Week 1: Prometheus + Grafana for infrastructure metrics. node_exporter on the servers.

Week 2: Application metrics (request rate, error rate, latency) via OpenTelemetry or a Prometheus client.

Month 1: Distributed tracing (Jaeger/Tempo), custom Grafana dashboards, per-service alerts.
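The health check endpoints from Day 2 can be sketched as a small framework-independent function that a /health/ready handler would call. This is a minimal sketch under stated assumptions: the `readiness` name, the shape of the JSON body, and the idea of passing dependency checks as callables are all illustrative choices, not a prescribed design.

```python
import json


def readiness(checks):
    """Run dependency checks and return (http_status, json_body).

    checks maps a dependency name to a zero-argument callable that returns
    True when healthy; a raised exception counts as a failure.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "fail"
        except Exception as exc:
            results[name] = f"fail: {exc}"
    healthy = all(value == "ok" for value in results.values())
    status = 200 if healthy else 503
    body = json.dumps({"status": "ready" if healthy else "not ready", "checks": results})
    return status, body
```

Returning 503 together with per-dependency results lets an external monitor detect the failure from the status code while the body already hints at which dependency to look at in the internal dashboards.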

External monitoring is the first line of defense. Internal monitoring is the second. Together they give the full picture: what broke (external), why (internal), and how quickly it was fixed (both).

Related articles

The Complete Guide to Uptime Monitoring: external monitoring from A to Z

Microservices Monitoring: 4 layers of monitoring for distributed systems

Health Check Endpoint Design: the bridge between internal and external monitoring

Incident Management Guide: from detection (external) to diagnosis (internal) and resolution

FAQ

What is internal monitoring?

Internal monitoring collects metrics from inside your infrastructure — CPU usage, memory, disk I/O, application logs, database query performance, queue depths. Tools like Prometheus, Grafana, Datadog agents, and CloudWatch run inside your network and observe system internals.

What is external monitoring?

External monitoring checks your services from outside your infrastructure — the same perspective as your users. It sends HTTP requests, DNS queries, TCP connections, and ICMP pings from distributed locations to verify your service is reachable, fast, and returning correct content. AtomPing is an external monitoring tool.

Can I use only internal monitoring?

No. Internal monitoring has a blind spot: it can't detect problems between your infrastructure and your users. DNS resolution failures, CDN outages, TLS certificate issues, ISP routing problems, and load balancer misconfigurations are invisible to internal tools but immediately caught by external monitoring.

Can I use only external monitoring?

For basic needs — yes, external monitoring covers the most critical question: can users reach my service? But when something breaks, external monitoring tells you WHAT failed, not WHY. Internal monitoring provides the diagnostic detail: which server is overloaded, which query is slow, which container is leaking memory.

How do internal and external monitoring work together?

External monitoring detects the problem (checkout API returns 503). Internal monitoring diagnoses the cause (payment-service pod OOMKilled, database connection pool exhausted). Use external monitoring for alerting (wake someone up) and internal monitoring for troubleshooting (find the root cause).

Which should I set up first?

External monitoring. It answers the most important question — are my users affected? — and takes 5 minutes to set up. Internal monitoring requires agents, dashboards, and configuration. Start with AtomPing (external) for immediate coverage, add Prometheus/Grafana (internal) as your team and infrastructure grow.
