Prometheus in Practice: A Complete Guide to Metric Collection and Alerting Configuration - 瀚煜云服

Last updated: 2026-04-14 12:58:05

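Prometheus is one of the most widely used monitoring systems in the cloud-native world, but it comes with many concepts and a lot of scattered configuration. This article walks through Prometheus from a hands-on perspective to help you set up a working monitoring stack quickly.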

I. Prometheus Core Concepts

Why Prometheus?

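Open source and free: active community, rich feature set
Multi-dimensional data model: metric name + labels
Powerful query language: PromQL
Rich ecosystem: exporters are available for a wide range of services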
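In this data model, a time series is identified by its metric name plus its complete set of label key/value pairs. PromQL selectors pick series by matching on those labels. The Python sketch below only illustrates the idea; it is not how Prometheus actually stores or indexes data.

- The metric name is kept under the reserved `__name__` label.
- Equality matchers (`=`) and fully anchored regex matchers (`=~`) are applied.
- The series themselves are made-up examples.

```python
import re
from typing import Dict, List

# A series is identified by its full label set; the metric name lives
# under the reserved "__name__" label, mirroring Prometheus' data model.
Series = Dict[str, str]


def matches(series: Series, equal: Dict[str, str], regex: Dict[str, str]) -> bool:
    """Return True if the series satisfies every = and =~ matcher."""
    for name, value in equal.items():
        # A label that is absent is treated as an empty string.
        if series.get(name, "") != value:
            return False
    for name, pattern in regex.items():
        # PromQL regex matchers must match the whole label value.
        if re.fullmatch(pattern, series.get(name, "")) is None:
            return False
    return True


def select(all_series: List[Series], equal: Dict[str, str], regex: Dict[str, str]) -> List[Series]:
    """Return the series selected by the given matchers, in input order."""
    return [s for s in all_series if matches(s, equal, regex)]


# Hypothetical series for illustration only.
SERIES: List[Series] = [
    {"__name__": "http_requests_total", "method": "GET", "status": "200"},
    {"__name__": "http_requests_total", "method": "GET", "status": "500"},
    {"__name__": "http_requests_total", "method": "POST", "status": "503"},
]

if __name__ == "__main__":
    # Roughly: http_requests_total{method="GET", status=~"5.."}
    print(select(SERIES, {"__name__": "http_requests_total", "method": "GET"}, {"status": "5.."}))
```

Every distinct label combination is a separate series. A label with many possible values (user IDs, raw request paths) therefore multiplies the number of series Prometheus has to store.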

Core Components

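Prometheus Server: scrapes (pulls) and stores metric data
Exporters: expose metrics from various services in a format Prometheus can scrape
Alertmanager: deduplicates, groups, and routes alerts to notification channels
Grafana: visualization and dashboards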
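Prometheus Server collects data by pulling it. On every scrape_interval it sends an HTTP GET to each target's /metrics endpoint and parses the plain-text exposition format that exporters serve.

Below is a rough, standard-library-only Python sketch of a tiny exporter. The metric `demo_jobs_processed_total` and port 8000 are hypothetical. A real exporter would normally use an official client library rather than formatting the text by hand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict

# Hypothetical in-process counter values, keyed by the "result" label.
JOBS_PROCESSED: Dict[str, int] = {"ok": 0, "failed": 0}


def render_metrics(counts: Dict[str, int]) -> str:
    """Render the counters in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_jobs_processed_total Jobs processed, by result.",
        "# TYPE demo_jobs_processed_total counter",
    ]
    for result, value in sorted(counts.items()):
        lines.append(f'demo_jobs_processed_total{{result="{result}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(JOBS_PROCESSED).encode("utf-8")
        self.send_response(200)
        # Content type used by the text exposition format, version 0.0.4.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    JOBS_PROCESSED["ok"] = 3
    # Prometheus would scrape http://<host>:8000/metrics
    HTTPServer(("0.0.0.0", 8000), MetricsHandler).serve_forever()
```

Prometheus would scrape this through a job whose target is `<host>:8000`. The HELP and TYPE lines are optional, but they let tools display descriptions and metric types.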

II. Deployment

Docker Deployment

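version: '3'
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'

  # Lets the 'node' job in prometheus.yml reach node-exporter:9100 over the compose network
  node-exporter:
    image: prom/node-exporter:latest

# Named volumes must be declared at the top level, otherwise compose rejects the file
volumes:
  prometheus-data: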

Configuration File

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']
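After Prometheus starts, you can check that every configured target is actually being scraped. Use the Targets page (http://localhost:9090/targets) or the HTTP API endpoint /api/v1/targets.

A minimal Python sketch that lists targets not reporting `up`, assuming Prometheus is reachable at http://localhost:9090. Targets that have not been scraped yet report `unknown` and are listed too.

```python
import json
from typing import Any, Dict, List
from urllib.request import urlopen

PROMETHEUS_URL = "http://localhost:9090"  # assumption: adjust to your deployment


def unhealthy_targets(payload: Dict[str, Any]) -> List[str]:
    """Return 'job scrapeUrl: lastError' for every active target whose health is not 'up'."""
    problems = []
    for target in payload["data"]["activeTargets"]:
        if target.get("health") != "up":
            job = target.get("labels", {}).get("job", "?")
            problems.append(f"{job} {target.get('scrapeUrl')}: {target.get('lastError', '')}")
    return problems


def fetch_targets(base_url: str = PROMETHEUS_URL) -> Dict[str, Any]:
    """Fetch and decode the JSON response of /api/v1/targets."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for line in unhealthy_targets(fetch_targets()):
        print("DOWN:", line)
```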

III. Common Exporters

Node Exporter (system metrics)

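# Run with Docker
docker run -d \
  --name node-exporter \
  -p 9100:9100 \
  prom/node-exporter
# Note: without host namespaces and mounts, filesystem and network metrics describe the container, not the host;
# the node_exporter README suggests --net="host" --pid="host" -v "/:/host:ro,rslave" and --path.rootfs=/host

# Key metrics
node_cpu_seconds_total   # CPU time spent in each mode, per CPU (counter)
node_memory_MemAvailable_bytes  # available memory
node_filesystem_avail_bytes  # filesystem space available to non-root users
node_network_receive_bytes_total  # bytes received per network interface (counter)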

MySQL Exporter (MySQL metrics)

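# Run with Docker. mysqld_exporter v0.15.0+ no longer reads DATA_SOURCE_NAME: pass address/user as flags and
# the password via MYSQLD_EXPORTER_PASSWORD (mysql-host must be reachable from the container, not localhost)
docker run -d \
  --name mysql-exporter \
  -p 9104:9104 \
  -e MYSQLD_EXPORTER_PASSWORD='password' \
  prom/mysqld-exporter \
  --mysqld.address=mysql-host:3306 \
  --mysqld.username=user

# Key metrics
mysql_global_status_threads_connected  # currently open connections
mysql_global_status_questions  # statements executed (counter)
mysql_perf_schema_events_statements_total  # statement counts by digest (needs --collect.perf_schema.eventsstatements)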

Redis Exporter (Redis metrics)

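# Run with Docker (inside the container, localhost is the container itself; use the Redis host's address or --network host)
docker run -d \
  --name redis-exporter \
  -p 9121:9121 \
  -e REDIS_ADDR="redis://localhost:6379" \
  oliver006/redis_exporter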

Nginx Exporter (Nginx metrics)

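# Enable the nginx status page (needs nginx built with ngx_http_stub_status_module)
location /status {
  stub_status on;
  access_log off;
}

# Start the exporter (nginx-prometheus-exporter 1.x uses double-dash flags;
# inside the container, localhost is the container itself, so use the nginx host's address or --network host)
docker run -d \
  --name nginx-exporter \
  -p 9113:9113 \
  nginx/nginx-prometheus-exporter \
  --nginx.scrape-uri=http://localhost/status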
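stub_status returns a small plain-text page, and the exporter's job is essentially to turn those numbers into metrics. Below is a minimal Python sketch of that parsing step, assuming the standard four-line stub_status layout. The sample text is illustrative, not real traffic.

```python
import re
from typing import Dict


def parse_stub_status(text: str) -> Dict[str, int]:
    """Parse nginx stub_status output into a flat dict of integers."""
    lines = [line.strip() for line in text.strip().splitlines()]
    m = re.match(r"Active connections:\s*(\d+)", lines[0])
    if m is None:
        raise ValueError("unexpected stub_status output")
    # Line 3 holds the accepts / handled / requests counters.
    accepts, handled, requests = (int(x) for x in lines[2].split())
    rw = dict(re.findall(r"(Reading|Writing|Waiting):\s*(\d+)", lines[3]))
    return {
        "active": int(m.group(1)),
        "accepted": accepts,
        "handled": handled,
        "requests": requests,
        "reading": int(rw["Reading"]),
        "writing": int(rw["Writing"]),
        "waiting": int(rw["Waiting"]),
    }


SAMPLE = """Active connections: 3
server accepts handled requests
 120 120 450
Reading: 0 Writing: 1 Waiting: 2
"""

if __name__ == "__main__":
    print(parse_stub_status(SAMPLE))
```

According to the nginx documentation, "handled" normally equals "accepts" unless a resource limit such as worker_connections has been reached. A growing gap between the two is therefore worth alerting on.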

IV. PromQL Queries

Basic Queries

# CPU usage (%)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage (%)
100 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

# Disk usage (%)
100 - (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100
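All three expressions share one idea: usage = 100 minus the free share. For CPU, node_cpu_seconds_total is a counter per CPU and per mode, so the idle share has to come from its increase over a time window rather than its raw value. `avg by (instance)` then averages that per-CPU figure across all CPUs of the instance.

A minimal Python sketch of the arithmetic for a single CPU, using two hypothetical snapshots taken 60 seconds apart. It ignores counter resets and the extrapolation that PromQL's rate() applies.

```python
from typing import Dict


def cpu_busy_percent(before: Dict[str, float], after: Dict[str, float], seconds: float) -> float:
    """100 - idle% for one CPU, from two snapshots of node_cpu_seconds_total by mode."""
    idle_per_second = (after["idle"] - before["idle"]) / seconds  # idle CPU-seconds per wall-clock second
    return 100 - idle_per_second * 100


def used_percent(available: float, total: float) -> float:
    """Same shape as the memory and disk queries: 100 - available / total * 100."""
    return 100 - available / total * 100


# Hypothetical counter snapshots (seconds spent in each mode), taken 60 s apart.
T0 = {"idle": 1000.0, "user": 300.0, "system": 100.0}
T1 = {"idle": 1045.0, "user": 312.0, "system": 103.0}

if __name__ == "__main__":
    print(cpu_busy_percent(T0, T1, 60))         # 25.0: idle for 45 of 60 seconds
    print(used_percent(4 * 2**30, 16 * 2**30))  # 75.0: 4 GiB available out of 16 GiB
```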

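Aggregation Operators

# Note: http_requests_total is a counter, so in practice you usually aggregate its rate,
# e.g. sum by (instance) (rate(http_requests_total[5m]))

# Average
avg(http_requests_total)

# Maximum
max(http_requests_total)

# Minimum
min(http_requests_total)

# Sum
sum(http_requests_total)

# Count (number of series)
count(http_requests_total)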
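Aggregation operators collapse many series into fewer. `sum by (instance) (...)` keeps only the instance label and adds up the values of all series that share it. avg, max, min and count differ only in how each group is combined.

A small Python sketch of that grouping step, run on hypothetical per-series request rates:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Labels = Dict[str, str]


def sum_by(samples: List[Tuple[Labels, float]], by: List[str]) -> Dict[Tuple[str, ...], float]:
    """Emulate `sum by (<labels>) (...)`: group on the `by` label values and add them up."""
    groups: Dict[Tuple[str, ...], float] = defaultdict(float)
    for labels, value in samples:
        # Labels not listed in `by` are dropped; a missing label groups as "".
        key = tuple(labels.get(name, "") for name in by)
        groups[key] += value
    return dict(groups)


# Hypothetical per-second request rates, one entry per series.
SAMPLES = [
    ({"instance": "a:8080", "method": "GET"}, 10.0),
    ({"instance": "a:8080", "method": "POST"}, 2.5),
    ({"instance": "b:8080", "method": "GET"}, 4.0),
]

if __name__ == "__main__":
    print(sum_by(SAMPLES, ["instance"]))  # {('a:8080',): 12.5, ('b:8080',): 4.0}
    print(sum_by(SAMPLES, []))            # {(): 16.5}, like a plain sum(...)
```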

Functions over Time Ranges

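# Per-second rate of increase over the last 5 minutes (for counters)
rate(http_requests_total[5m])

# Average value over the last 5 minutes (for gauges)
avg_over_time(node_memory_MemAvailable_bytes[5m])

# Predict the value 4 hours ahead from the last hour of data (for gauges only)
predict_linear(node_filesystem_avail_bytes[1h], 4*3600)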
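rate() computes the per-second increase of a counter over the window. It treats any decrease as a counter reset, for example after a process restart.

predict_linear() fits a least-squares line through the samples and evaluates it t seconds after the evaluation time, which is why it is meant for gauges.

The Python sketch below shows both ideas on hypothetical samples. It deliberately leaves out the extrapolation to the window boundaries that the real rate() performs, so its results will not match Prometheus exactly.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second counter increase over the window, treating any drop as a reset.

    Simplified: PromQL's rate() additionally extrapolates to the window edges.
    """
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from 0, so the new value is the increase.
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])


def predict_linear(samples: List[Sample], now: float, seconds_ahead: float) -> float:
    """Least-squares line through the samples, evaluated at now + seconds_ahead."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    return intercept + slope * (now + seconds_ahead)


if __name__ == "__main__":
    # Counter that resets between t=30 and t=60: increase = 30 + 30 over 60 s.
    print(simple_rate([(0, 100), (30, 130), (60, 30)]))  # 1.0
    # Free bytes shrinking by 1 per second; predicted 600 s after t=120.
    print(predict_linear([(0, 1000.0), (60, 940.0), (120, 880.0)], 120, 600))  # 280.0
```

A query such as `predict_linear(node_filesystem_avail_bytes[1h], 4*3600) < 0` is a common way to flag a filesystem that is expected to fill up within four hours.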

V. Alerting Configuration

Alerting Rules

# alerts.yml
groups:
- name: node_alerts
  rules:
  - alert: HighCPU
    expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage on {{ $labels.instance }}"
      description: "CPU usage is above 80% for more than 5 minutes"

  - alert: HighMemory
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 90
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High Memory usage on {{ $labels.instance }}"
      description: "Memory usage is above 90% for more than 5 minutes"
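`for: 5m` means the expression must keep returning a result for at least five minutes before the alert fires. Until then the alert is "pending", and a single evaluation in which the condition is false resets it.

Rules are evaluated every evaluation_interval. The default is 1 minute when, as in the configuration above, only scrape_interval is set.

A simplified Python model of that lifecycle, assuming evaluations happen at exact, evenly spaced times:

```python
from typing import List, Optional


def alert_states(results: List[bool], interval: float, for_seconds: float) -> List[str]:
    """Alert state after each rule evaluation (simplified inactive -> pending -> firing model).

    results[i] is True when the alert expression returned a result at evaluation i.
    """
    states = []
    active_since: Optional[float] = None
    for i, condition_true in enumerate(results):
        now = i * interval
        if not condition_true:
            active_since = None  # condition cleared: back to inactive, the timer restarts
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_seconds else "pending")
    return states


if __name__ == "__main__":
    # Condition true for 7 consecutive 1-minute evaluations with for: 5m.
    print(alert_states([True] * 7, 60, 300))
```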

Connecting Prometheus to Alertmanager

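# Add to prometheus.yml (rule file paths are relative to prometheus.yml, so alerts.yml must also be mounted into /etc/prometheus)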
rule_files:
  - "alerts.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

Alertmanager Configuration

# alertmanager.yml
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email'

receivers:
- name: 'email'
  email_configs:
    - to: 'admin@example.com'
      from: 'alert@example.com'
      # Alertmanager expects host:port in "smarthost"; smtp_host/smtp_port are not valid keys here
      smarthost: 'smtp.example.com:587'
      auth_username: 'alert'
      auth_password: 'password'
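With `group_by: ['alertname']`, all firing alerts that share an alertname are batched into one notification. The timing settings work as follows:

- group_wait: how long Alertmanager waits to collect alerts for a new group before it sends the first notification.
- group_interval: how long it waits before notifying about new alerts added to an existing group.
- repeat_interval: how long it waits before re-sending a notification that has not changed.

A minimal Python sketch of the grouping step only. It ignores timing, routing trees and deduplication, and the alerts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Alert = Dict[str, str]  # label set of a firing alert


def group_alerts(alerts: List[Alert], group_by: List[str]) -> Dict[Tuple[str, ...], List[Alert]]:
    """Bucket alerts like route.group_by: one bucket (one notification) per distinct value tuple."""
    groups: Dict[Tuple[str, ...], List[Alert]] = defaultdict(list)
    for alert in alerts:
        groups[tuple(alert.get(label, "") for label in group_by)].append(alert)
    return dict(groups)


ALERTS = [
    {"alertname": "HighCPU", "instance": "a:9100"},
    {"alertname": "HighCPU", "instance": "b:9100"},
    {"alertname": "HighMemory", "instance": "a:9100"},
]

if __name__ == "__main__":
    for key, members in group_alerts(ALERTS, ["alertname"]).items():
        print(key, "->", len(members), "alert(s) in one notification")
```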

VI. Grafana Integration

Adding the Data Source

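1. Open Grafana (http://localhost:3000)
2. Configuration > Data Sources > Add data source (newer Grafana versions: Connections > Data sources)
3. Select Prometheus
4. URL: http://prometheus:9090 (this hostname resolves only if Grafana runs on the same Docker network as Prometheus)
5. Save & Test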

Common Dashboards

Node Exporter Full: system monitoring dashboard
MySQL Overview: MySQL monitoring dashboard
Redis Dashboard: Redis monitoring dashboard
NGINX dashboard: Nginx monitoring dashboard

VII. FAQ

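Q: How much storage does Prometheus need?
A: It depends on the number of active time series, the scrape interval, and the retention period (15 days by default, adjustable with --storage.tsdb.retention.time). Compressed samples take roughly 1-2 bytes each, so a single series scraped every 15s needs only a few hundred KB per month. Total disk usage is dominated by how many series (metric and label combinations) you collect.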
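The Prometheus storage documentation gives a rough sizing rule:

needed disk space ≈ retention time in seconds × ingested samples per second × bytes per sample

Compressed samples average about 1-2 bytes each. A Python sketch of that arithmetic, with these assumptions:

- Every series is scraped at the same interval.
- 2 bytes per sample is used as a conservative upper bound.
- The write-ahead log and other overhead are ignored.

```python
def estimate_disk_bytes(series: int, scrape_interval_s: float, retention_days: float,
                        bytes_per_sample: float = 2.0) -> float:
    """Rough TSDB size: retention_seconds * samples_per_second * bytes_per_sample."""
    retention_seconds = retention_days * 86400
    # samples/second = series / scrape_interval; multiply by retention first to keep numbers exact
    total_samples = retention_seconds * series / scrape_interval_s
    return total_samples * bytes_per_sample


if __name__ == "__main__":
    # One series, 15 s interval, 30 days: 172,800 samples -> about 0.35 MB
    print(estimate_disk_bytes(1, 15, 30))
    # 100,000 series, 15 s interval, default 15-day retention
    print(f"{estimate_disk_bytes(100_000, 15, 15) / 1e9:.1f} GB")  # 17.3 GB
```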

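Q: Alerts fire too often. What can I do?
A: Increase the for duration so a condition must persist longer before firing, and use Alertmanager inhibition rules or silences to suppress redundant notifications.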

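Q: How do I configure monitoring for hundreds of servers?
A: Use service discovery (DNS, Consul, Kubernetes, etc.) so targets are discovered automatically instead of being listed by hand.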
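Besides DNS, Consul and Kubernetes, Prometheus supports file-based service discovery (file_sd_configs). It re-reads JSON or YAML target files whenever they change, which makes it a convenient bridge from a CMDB or inventory script.

A minimal Python sketch that writes such a file. The inventory, the `env` label and the path `targets/node.json` are hypothetical. The file would be referenced from a job's `file_sd_configs` with `files: ['targets/*.json']`.

```python
import json
import os
import tempfile
from typing import Dict, List


def build_file_sd(hosts_by_env: Dict[str, List[str]], port: int) -> List[dict]:
    """One target group per environment, in the file_sd JSON layout."""
    return [
        {"targets": [f"{host}:{port}" for host in hosts], "labels": {"env": env}}
        for env, hosts in sorted(hosts_by_env.items())
    ]


def write_atomically(path: str, groups: List[dict]) -> None:
    """Write to a temp file, then rename, so Prometheus never reads a half-written file.

    The ".tmp" suffix keeps the temp file from matching a '*.json' file_sd pattern.
    """
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(groups, f, indent=2)
    os.replace(tmp, path)


if __name__ == "__main__":
    inventory = {"prod": ["10.0.0.11", "10.0.0.12"], "staging": ["10.0.1.5"]}
    write_atomically("targets/node.json", build_file_sd(inventory, 9100))
```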

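Q: Can Prometheus monitor Docker containers?
A: Yes. Use cAdvisor, which exposes per-container resource metrics (CPU, memory, network, filesystem) at its own /metrics endpoint for Prometheus to scrape.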

Summary

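Prometheus has become the de facto standard for cloud-native monitoring. Core components: Server, Exporters, Alertmanager, and Grafana for visualization. Common exporters: Node, MySQL, Redis, Nginx. PromQL: basic queries, aggregation operators, and functions over time ranges. Alerting: define rules in Prometheus and deliver notifications through Alertmanager. Once you have these pieces in place, building a monitoring stack is no longer difficult.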

瀚煜云 (Hanyu Cloud) provides Prometheus monitoring solutions and operations services.

Copyright belongs to the author. Please do not republish without permission.
