Docker监控平台prometheus和grafana,监控redis,mysql,docker,服务器信息
Docker监控平台prometheus和grafana,监控redis,mysql,docker,服务器信息
- 一、通过redis_exporter监控redis
- 1.1 下载镜像
- 1.2 运行服务
- 1.3 配置 Prometheus 添加redis监控目标主机
- 1.4 监控Redis集群,配置Prometheus.yml
- 1.5 重启Prometheus
- 1.6 接入Grafana监控展示模板
- 1.7 告警规则
- 二、通过CAdvisor监控Docker
- 2.1 启动CAdvisor
- 2.2 配置Prometheus
- 2.3 重启prometheus容器
- 2.4 Prometheus监控端查看是否介入
- 2.5 访问被监控端cAdvisor捕获的数据
- 2.6 配置Grafana监控图表
- 三、通过mysql_exporter监控Mysql
- 四、通过node_exporter监控服务器信息
- 五、Prometheus监控JVM
相关博文原文地址:
博客园:小哥boy:Prometheus入门到放弃(7)之redis_exporter部署
豌豆代理:Prometheus 监控Docker服务器及Granfanna可视化
简书:baiyongjie:prometheus 监控Docker
博客园:海口-熟练工:参考Redis集群监控方案:Prometheus 监控 Redis 集群的正确姿势
一、通过redis_exporter监控redis
1.1 下载镜像
docker pull oliver006/redis_exporter
1.2 运行服务
docker run -d --name redis_exporter -p 9121:9121 oliver006/redis_exporter --redis.addr redis://172.16.11.51:6379 --redis.password '123456'
参数解释:
- –redis.addr 指定redis地址,由于这里使用docker起的服务,所以不能使用127.0.0.1地址。
- –redis.password redis认证密码,如果没有密码,该参数不需要
Docker方式启动Prometheus:
docker run -d -p 9090:9090 --name=prometheus -v /root/software/prometheus/prometheus-config.yml:/etc/prometheus/prometheus.yml prom/prometheus
1.3 配置 Prometheus 添加redis监控目标主机
- job_name: "redis-sit"
static_configs:
- targets: ['172.16.11.51:9121']
1.4 监控Redis集群,配置Prometheus.yml
切记注意yml的格式缩进问题!!!
- job_name: 'redis_exporter_targets'
static_configs:
- targets:
- redis://172.18.11.139:7000
- redis://172.18.11.139:7001
- redis://172.18.11.140:7002
- redis://172.18.11.140:7003
- redis://172.18.11.141:7004
- redis://172.18.11.141:7005
metrics_path: /scrape
relabel_configs:
- source_labels: [__address__]
target_label: __param_target
- source_labels: [__param_target]
target_label: instance
- target_label: __address__
replacement: 172.18.11.139:9121
启动Redis_exporter的命令如上:
只要能连接到一个集群的一个节点,自然就能查询其他节点的指标了。
docker run -d --name redis_exporter -p 9121:9121 oliver006/redis_exporter --redis.addr redis://172.16.11.51:6379 --redis.password '123456'
1.5 重启Prometheus
Docker方式启动Prometheus:
docker run -d -p 9090:9090 --name=prometheus -v /root/software/prometheus/prometheus-config.yml:/etc/prometheus/prometheus.yml prom/prometheus
重启prometheus后,可以看到redis主机已经添加到prometheus监控列表:
1.6 接入Grafana监控展示模板
redis_exporter 监控模板,业界普遍推荐使用Grafana模板库中编号为763的,https://grafana.com/dashboards/763
如下图所示,填写,模板编号,点击加载即可。
1.7 告警规则
groups:
- name: Redis
rules:
- alert: RedisDown
expr: redis_up == 0
for: 5m
labels:
severity: error
annotations:
summary: "Redis down (instance {{ $labels.instance }})"
description: "Redis 挂了啊,mmp\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: MissingBackup
expr: time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24
for: 5m
labels:
severity: error
annotations:
summary: "Missing backup (instance {{ $labels.instance }})"
description: "Redis has not been backuped for 24 hours\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: OutOfMemory
expr: redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90
for: 5m
labels:
severity: warning
annotations:
summary: "Out of memory (instance {{ $labels.instance }})"
description: "Redis is running out of memory (> 90%)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ReplicationBroken
expr: delta(redis_connected_slaves[1m]) < 0
for: 5m
labels:
severity: error
annotations:
summary: "Replication broken (instance {{ $labels.instance }})"
description: "Redis instance lost a slave\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: TooManyConnections
expr: redis_connected_clients > 1000
for: 5m
labels:
severity: warning
annotations:
summary: "Too many connections (instance {{ $labels.instance }})"
description: "Redis instance has too many connections\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: NotEnoughConnections
expr: redis_connected_clients < 5
for: 5m
labels:
severity: warning
annotations:
summary: "Not enough connections (instance {{ $labels.instance }})"
description: "Redis instance should have more connections (> 5)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: RejectedConnections
expr: increase(redis_rejected_connections_total[1m]) > 0
for: 5m
labels:
severity: error
annotations:
summary: "Rejected connections (instance {{ $labels.instance }})"
description: "Some connections to Redis has been rejected\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
二、通过CAdvisor监控Docker
2.1 启动CAdvisor
docker run --volume=/:/rootfs:ro --volume=/var/run:/var/run:rw \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--volume=/dev/disk/:/dev/disk:ro \
--publish=8081:8080 \
--detach=true \
--name=cadvisor \
google/cadvisor:latest
2.2 配置Prometheus
scrape_configs:
# 添加作业,自定义名称
- job_name: 'docker'
# 静态添加
static_configs:
# 指定监控实例
- targets: ['192.168.21.55:8081']
2.3 重启prometheus容器
docker restart prometheus
2.4 Prometheus监控端查看是否介入
浏览器输入ip:9090查看Prometheus的targets状态,看看数据是否接入正常。
2.5 访问被监控端cAdvisor捕获的数据
http://192.168.0.50:8089/metrics
2.6 配置Grafana监控图表
推荐模板ID:893
三、通过mysql_exporter监控Mysql
使用Docker部署监控系统,Prometheus,Grafana,监控服务器信息及Mysql
51CTO:wfwf1990:使用prometheus的mysql exporter监控mysql
3.1 mysqld exporter方式监控Mysql
mysqld exporter的功能是收集mysql服务器的数据, 并向外提供api接口, 用于prometheus主要获取数据。
在被监控端mysql服务器上创建账号用于mysql exporter收集使用
GRANT REPLICATION CLIENT, PROCESS ON *.* to 'exporter'@'%' identified by '123456';
GRANT SELECT ON performance_schema.* TO 'exporter'@'%';
flush privileges;
在被监控端mysql服务器上安装mysql exporter。
docker run -d --restart=always --name mysqld-exporter -p 9104:9104 -e DATA_SOURCE_NAME="user:password@(hostname:port)/" prom/mysqld-exporter
docker run -d --restart=always --name mysqld-exporter -p 9104:9104 -e DATA_SOURCE_NAME="exporter:123456@(192.168.1.82:3306)/" prom/mysqld-exporter
要查看容器是否报错, 主要是验证exporter与mysql服务端之间正常连接和获取数据;
docker logs -f mysqld-exporter 看有没有报错
3.2 配置alertmanager报警,添加prometheus配置
alerting:
alertmanagers:
- scheme: http
static_configs:
- targets:
- "10.100.110.171:9093"
rule_files:
- /opt/prometheus/rules/mysql*.rules
制定mysql报警规则:
groups:
- name: MySQLStatsAlert
rules:
- alert: MySQL is down
expr: mysql_up == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Instance {{ $labels.instance }} MySQL is down"
description: "MySQL database is down. This requires immediate action!"
- alert: open files high
expr: mysql_global_status_innodb_num_open_files > (mysql_global_variables_open_files_limit) * 0.75
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} open files high"
description: "Open files is high. Please consider increasing open_files_limit."
- alert: Read buffer size is bigger than max. allowed packet size
expr: mysql_global_variables_read_buffer_size > mysql_global_variables_slave_max_allowed_packet
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} Read buffer size is bigger than max. allowed packet size"
description: "Read buffer size (read_buffer_size) is bigger than max. allowed packet size (max_allowed_packet).This can break your replication."
- alert: Sort buffer possibly missconfigured
expr: mysql_global_variables_innodb_sort_buffer_size <256*1024 or mysql_global_variables_read_buffer_size > 4*1024*1024
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} Sort buffer possibly missconfigured"
description: "Sort buffer size is either too big or too small. A good value for sort_buffer_size is between 256k and 4M."
- alert: Thread stack size is too small
expr: mysql_global_variables_thread_stack <196608
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} Thread stack size is too small"
description: "Thread stack size is too small. This can cause problems when you use Stored Language constructs for example. A typical is 256k for thread_stack_size."
- alert: Used more than 80% of max connections limited
expr: mysql_global_status_max_used_connections > mysql_global_variables_max_connections * 0.8
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} Used more than 80% of max connections limited"
description: "Used more than 80% of max connections limited"
- alert: InnoDB Force Recovery is enabled
expr: mysql_global_variables_innodb_force_recovery != 0
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} InnoDB Force Recovery is enabled"
description: "InnoDB Force Recovery is enabled. This mode should be used for data recovery purposes only. It prohibits writing to the data."
- alert: InnoDB Log File size is too small
expr: mysql_global_variables_innodb_log_file_size < 16777216
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} InnoDB Log File size is too small"
description: "The InnoDB Log File size is possibly too small. Choosing a small InnoDB Log File size can have significant performance impacts."
- alert: InnoDB Flush Log at Transaction Commit
expr: mysql_global_variables_innodb_flush_log_at_trx_commit != 1
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} InnoDB Flush Log at Transaction Commit"
description: "InnoDB Flush Log at Transaction Commit is set to a values != 1. This can lead to a loss of commited transactions in case of a power failure."
- alert: Table definition cache too small
expr: mysql_global_status_open_table_definitions > mysql_global_variables_table_definition_cache
for: 1m
labels:
severity: page
annotations:
summary: "Instance {{ $labels.instance }} Table definition cache too small"
description: "Your Table Definition Cache is possibly too small. If it is much too small this can have significant performance impacts!"
- alert: Table open cache too small
expr: mysql_global_status_open_tables >mysql_global_variables_table_open_cache * 99/100
for: 1m
labels:
severity: page
annotations:
summary: "Instance {{ $labels.instance }} Table open cache too small"
description: "Your Table Open Cache is possibly too small (old name Table Cache). If it is much too small this can have significant performance impacts!"
- alert: Thread stack size is possibly too small
expr: mysql_global_variables_thread_stack < 262144
for: 1m
labels:
severity: page
annotations:
summary: "Instance {{ $labels.instance }} Thread stack size is possibly too small"
description: "Thread stack size is possibly too small. This can cause problems when you use Stored Language constructs for example. A typical is 256k for thread_stack_size."
- alert: InnoDB Buffer Pool Instances is too small
expr: mysql_global_variables_innodb_buffer_pool_instances == 1
for: 1m
labels:
severity: page
annotations:
summary: "Instance {{ $labels.instance }} InnoDB Buffer Pool Instances is too small"
description: "If you are using MySQL 5.5 and higher you should use several InnoDB Buffer Pool Instances for performance reasons. Some rules are: InnoDB Buffer Pool Instance should be at least 1 Gbyte in size. InnoDB Buffer Pool Instances you can set equal to the number of cores of your machine."
- alert: InnoDB Plugin is enabled
expr: mysql_global_variables_ignore_builtin_innodb == 1
for: 1m
labels:
severity: page
annotations:
summary: "Instance {{ $labels.instance }} InnoDB Plugin is enabled"
description: "InnoDB Plugin is enabled"
- alert: Binary Log is disabled
expr: mysql_global_variables_log_bin != 1
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} Binary Log is disabled"
description: "Binary Log is disabled. This prohibits you to do Point in Time Recovery (PiTR)."
- alert: Binlog Cache size too small
expr: mysql_global_variables_binlog_cache_size < 1048576
for: 1m
labels:
severity: page
annotations:
summary: "Instance {{ $labels.instance }} Binlog Cache size too small"
description: "Binlog Cache size is possibly to small. A value of 1 Mbyte or higher is OK."
- alert: Binlog Statement Cache size too small
expr: mysql_global_variables_binlog_stmt_cache_size <1048576 and mysql_global_variables_binlog_stmt_cache_size > 0
for: 1m
labels:
severity: page
annotations:
summary: "Instance {{ $labels.instance }} Binlog Statement Cache size too small"
description: "Binlog Statement Cache size is possibly to small. A value of 1 Mbyte or higher is typically OK."
- alert: Binlog Transaction Cache size too small
expr: mysql_global_variables_binlog_cache_size <1048576
for: 1m
labels:
severity: page
annotations:
summary: "Instance {{ $labels.instance }} Binlog Transaction Cache size too small"
description: "Binlog Transaction Cache size is possibly to small. A value of 1 Mbyte or higher is typically OK."
- alert: Sync Binlog is enabled
expr: mysql_global_variables_sync_binlog == 1
for: 1m
labels:
severity: page
annotations:
summary: "Instance {{ $labels.instance }} Sync Binlog is enabled"
description: "Sync Binlog is enabled. This leads to higher data security but on the cost of write performance."
- alert: IO thread stopped
expr: mysql_slave_status_slave_io_running != 1
for: 1m
labels:
severity: critical
annotations:
summary: "Instance {{ $labels.instance }} IO thread stopped"
description: "IO thread has stopped. This is usually because it cannot connect to the Master any more."
- alert: SQL thread stopped
expr: mysql_slave_status_slave_sql_running == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Instance {{ $labels.instance }} SQL thread stopped"
description: "SQL thread has stopped. This is usually because it cannot apply a SQL statement received from the master."
- alert: SQL thread stopped
expr: mysql_slave_status_slave_sql_running != 1
for: 1m
labels:
severity: critical
annotations:
summary: "Instance {{ $labels.instance }} Sync Binlog is enabled"
description: "SQL thread has stopped. This is usually because it cannot apply a SQL statement received from the master."
- alert: Slave lagging behind Master
expr: rate(mysql_slave_status_seconds_behind_master[1m]) >30
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} Slave lagging behind Master"
description: "Slave is lagging behind Master. Please check if Slave threads are running and if there are some performance issues!"
- alert: Slave is NOT read only(Please ignore this warning indicator.)
expr: mysql_global_variables_read_only != 0
for: 1m
labels:
severity: page
annotations:
summary: "Instance {{ $labels.instance }} Slave is NOT read only"
description: "Slave is NOT set to read only. You can accidentally manipulate data on the slave and get inconsistencies..."
四、通过node_exporter监控服务器信息
Grafana+Prometheus通过node_exporter监控Linux服务器信息
五、Prometheus监控JVM
简书:袁先生的教程:Prometheus监控JVM
使用Prometheus+Grafana监控JVM
Docker监控平台prometheus和grafana,监控redis,mysql,docker,服务器信息的更多相关文章
- Docker 监控平台Prometheus
Prometheus 是一个强大的监控平台,提供了监控数据搜集.存储.处理.可视化和告警一套完整的解决方案. 官方网站:https://prometheus.io
- Prometheus Alertmanager Grafana 监控警报
Prometheus Alertmanager Grafana 监控警报 #node-exporter, Linux系统信息采集组件 #prometheus , 抓取.储存监控数据,供查询指标 #al ...
- 轻量级监控平台之java进程监控脚本
轻量级监控平台之java进程监控脚本 #!/bin/bash #进程监控脚本 #功能需求: 上报机器Java进程的进程ID,对应的端口号service tcp端口号,tomcat http 端口号,以 ...
- 【集群监控】Docker上部署Prometheus+Alertmanager+Grafana实现集群监控
Docker部署 下载 sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.re ...
- 性能测试监控平台:InfluxDB+Grafana+Jmeter
前面的博客介绍了InfluxDB.Telegraf.Grafana的安装和使用方法,这篇博客,介绍下如何利用这些开源工具搭建性能测试监控平台... 前言 性能测试工具jmeter自带的监视器对性能测试 ...
- Linux监控平台介绍、zabbix监控介绍、安装zabbix、忘记Admin密码如何做
7月6日任务 19.1 Linux监控平台介绍19.2 zabbix监控介绍19.3/19.4/19.5 安装zabbix19.6 忘记Admin密码如何做 19.1 Linux监控平台介绍 一般大公 ...
- Linux centosVMware Linux监控平台介绍、zabbix监控介绍、安装zabbix、忘记Admin密码如何做
一.Linux监控平台介绍 cacti.nagios.zabbix.smokeping.open-falcon等等 cacti.smokeping偏向于基础监控,成图非常漂亮 cacti.nagios ...
- Spark应用监控解决方案--使用Prometheus和Grafana监控Spark应用
Spark任务启动后,我们通常都是通过跳板机去Spark UI界面查看对应任务的信息,一旦任务多了之后,这将会是让人头疼的问题.如果能将所有任务信息集中起来监控,那将会是很完美的事情. 通过Spark ...
- 使用Prometheus和Grafana监控emqx集群
以 Prometheus为例: emqx_prometheus 支持将数据推送至 Pushgateway 中,然后再由 Promethues Server 拉取进行存储. 注意:emqx_promet ...
随机推荐
- python实现贴吧顶贴机器人
前言------百度贴吧流量如何?全球最大的中文社区,虽然比不上阿里,腾讯! 此文章仅供交流学习.建议机器人用小号操作,切勿用作商业用途! 测试版本:python 3.7 64位火狐浏览器firefo ...
- 什么是urlencode编码
今天看文章中看到了urlencode,不理解 ,故上网查了查,看到了如下的答案,在此记录下,以加深印象 urlencode编码:就是将字符串以URL编码,一种编码方式,主要为了解决url中中文乱码问题 ...
- hive 将hive表数据查询出来转为json对象和json数组输出
一.将hive表数据查询出来转为json对象输出 1.将查询出来的数据转为一行一行,并指定分割符的数据 2.使用UDF函数,将每一行数据作为string传入UDF函数中转换为json再返回 1.准备数 ...
- 九、kafka伪分布式和集群搭建
伪分布式: 1.先将zk启动,如果是在伪分布式下,kafka已经集成了zk nohup /kafka_2.11-0.10.0.1/bin/zookeeper-server-start.sh /kafk ...
- flume伪分布式安装
flume伪分布式安装: 1.导包:apache-flume-1.7.0-bin.tar.gz 2.配置环境变量:/etc/profile export FLUME_HOME=/yang/apache ...
- 【函数分享】每日PHP函数分享(2021-1-9)
implode() 将一个一维数组的值转化为字符串. string implode ( string $glue , array $pieces ) 参数描述 glue 默认为空的字符串. p ...
- new ArrayList(0) 和 new ArrayList() 和一样吗?
第一感觉是一样的,盲猜后者调用了前者,并传入参数 0.然而,无论是 JDK 7 还是 JDK 8,这两个方法构造的结果都是不一样的.JDK 开发人员在这方面作了优化. JDK 7 在 Java 7 中 ...
- db_install.rsp dbca.rsp netca.rsp 详解【转】
db_install.rsp详解 #################################################################### ## Copyright(c ...
- Java开发手册之编程规约
时隔一年多,再次开始更新博客,各位粉丝们久等了.大家是不是以为我像大多数开发者一样三分钟热度,坚持了一年半载就放弃了,其实不是.在过去的一年时间我学习了<Java编程思想>这本书,因为都是 ...
- 【Oracle】增量备份和全库备份怎么恢复数据库
1差异增量实验示例 1.1差异增量备份 为了演示增量备份的效果,我们在执行一次0级别的备份后,对数据库进行一些改变. 再执行一次1级别的差异增量备份: 执行完1级别的备份后再次对数据库进行更改: 再执 ...