Troubleshooting why logrotate did not rotate logs

Preface

Background

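"xxx, come over and check the squid logs to see whether they are being rotated." And so began my journey of figuring out why logrotate was not rotating the logs, hmm... All I can say is that the process was satisfying. Most of the time I am worn out by tedious chores, so it is rare to have the time to follow one thread and slowly get to the bottom of something. Right now, all I want is to straighten out my technical knowledge bit by bit, fill in the gaps, and then go deeper.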

Troubleshooting process

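I knew logrotate existed, but that was about all I knew, so I had to learn as I went. The key point is that logrotate does not run on its own: it depends on a scheduler to invoke it, and scheduling is a whole other topic. In fact, much of what we use is built from basic pieces like this. While checking the scheduling, I got stuck between two kinds of scheduled jobs, Crontab and Systemd timer. logrotate can actually be driven by either one, but at first my head was too muddled to see that. Below I summarize both approaches.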

Method 1: Crontab mode

At first I investigated the Crontab side, working from the outside in:

1) Check the status of the cron service

systemctl status cron.service

2) Check the scheduled jobs that run logrotate

# cat /etc/crontab

17 *    * * *   root    cd / && run-parts --report /etc/cron.hourly
25 6   * * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
47 6    * * 7   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )
52 6    1 * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly )
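Each line in /etc/crontab has five time fields (minute, hour, day of month, month, day of week), followed by the user the job runs as and then the command. The minimal shell sketch below splits one such line into those parts to make them easier to read. The function name describe_crontab_line is hypothetical. The sketch assumes fields are separated by spaces or tabs and ignores special strings such as @daily.

```shell
#!/bin/sh
# Hypothetical helper: split one /etc/crontab line into its parts.
# Assumes the system-crontab layout: 5 time fields, then the user, then the command.
describe_crontab_line() {
    local minute hour dom month dow user cmd
    # read splits on blanks; the last variable (cmd) receives the rest of the line
    read -r minute hour dom month dow user cmd <<EOF
$1
EOF
    echo "minute=$minute hour=$hour day_of_month=$dom month=$month day_of_week=$dow"
    echo "user=$user"
    echo "command=$cmd"
}

describe_crontab_line '25 6   * * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )'
# prints:
# minute=25 hour=6 day_of_month=* month=* day_of_week=*
# user=root
# command=test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
```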

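When a scheduled job does not run, the problem is usually in the command part of the line. So check it by running "test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )" by hand and seeing whether it reports an error. Note the logic. test -x succeeds only when anacron is installed, and in that case cron skips run-parts and leaves the daily jobs to anacron. Otherwise cron runs the scripts in /etc/cron.daily itself. In my case anacron was not installed, so cron does run /etc/cron.daily directly.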
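The test -x ... || ( ... ) guard relies on the shell's || short-circuit: the part in parentheses runs only when test -x fails. To make that visible, the sketch below takes the anacron path as a parameter, so you can try it with a path that does or does not exist. The function name and the echo standing in for run-parts are illustrative assumptions, not the real cron setup.

```shell
#!/bin/sh
# Hypothetical stand-in for the cron.daily line in /etc/crontab.
# $1: path to check for an anacron binary (a parameter so it can be tried safely).
cron_daily_line() {
    # If test -x succeeds (anacron present), the subshell is skipped.
    # If it fails, the subshell runs, just as run-parts would.
    test -x "$1" || ( echo "cron runs run-parts /etc/cron.daily" )
}

# No executable at this path, so test -x fails and the subshell runs:
cron_daily_line /nonexistent/anacron
```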

3) Check the system log to see whether cron ran the job at the scheduled time

Running cat /var/log/syslog shows that the job did run at the scheduled time:

Dec  8 06:25:01 (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ))
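You can filter out just the cron entries for the daily job instead of reading all of /var/log/syslog. The sketch below assumes the Debian/Ubuntu layout, where cron logs to /var/log/syslog with a CMD (...) field. The function name is hypothetical. It reads standard input, so you can feed it either the real file or a sample line.

```shell
#!/bin/sh
# Hypothetical filter: keep only cron lines that ran the cron.daily entry.
# Usage: cron_daily_runs < /var/log/syslog
cron_daily_runs() {
    # -F: fixed-string match; both strings must appear on the same line
    grep -F 'CMD (' | grep -F '/etc/cron.daily'
}
```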

4) Check the logrotate script run by the scheduled job

cat /etc/cron.daily/logrotate

#!/bin/sh

# skip in favour of systemd timer
if [ -d /run/systemd/system ]; then
    exit 0
fi

# this cronjob persists removals (but not purges)
if [ ! -x /usr/sbin/logrotate ]; then
    exit 0
fi

/usr/sbin/logrotate /etc/logrotate.conf
EXITVALUE=$?
if [ $EXITVALUE != 0 ]; then
    /usr/bin/logger -t logrotate "ALERT exited abnormally with [$EXITVALUE]"
fi
exit $EXITVALUE

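My squid logs are rotated daily, so the relevant directory is /etc/cron.daily. Again, you can run this script once by hand (bash -x /etc/cron.daily/logrotate) to see whether it succeeds. I found that my script exited at the very first check, and I then spent a lot of time working out why that check exists. It turns out that /run/systemd/system exists only when the system was booted with systemd as its init system; sd_booted(3) uses the same test. The comment "skip in favour of systemd timer" explains the intent. On a systemd machine, the script assumes rotation is handled by Method 2, the Systemd timer (logrotate.timer). So if the directory exists, the script exits and the Crontab job does nothing.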
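The guard at the top of the script comes down to one directory test. The sketch below reproduces it, so you can see which scheduler would do the rotation on a given machine. The directory is a parameter; on a real system it would be /run/systemd/system. The function name and its output labels are assumptions made for this sketch.

```shell
#!/bin/sh
# Hypothetical re-creation of the first check in /etc/cron.daily/logrotate.
# $1: directory to test; on a real host pass /run/systemd/system.
which_scheduler_rotates() {
    if [ -d "$1" ]; then
        # systemd is PID 1: the cron script exits early, logrotate.timer is expected to rotate
        echo "systemd-timer"
    else
        # no systemd: the cron script goes on to run /usr/sbin/logrotate
        echo "cron"
    fi
}

which_scheduler_rotates /run/systemd/system
```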

# bash -x /etc/cron.daily/logrotate

if [ -d /run/systemd/system ]; then
    exit 0
fi

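This is where I realized that logrotate can be scheduled in two ways. To use Crontab mode, comment out the 3 lines in /etc/cron.daily/logrotate that check for /run/systemd/system, then stop logrotate.timer (systemctl stop logrotate.timer). Note that stop only lasts until the next reboot. To keep the timer from starting again at boot, also disable it (systemctl disable logrotate.timer); otherwise logrotate would be run by both schedulers.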
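You can also comment out the guard with sed instead of an editor. This sketch assumes the script looks exactly like the one shown above. It prints a patched copy rather than editing in place, so you can inspect the result first. The helper name is hypothetical. The sed range starts at the line testing /run/systemd/system and ends at the first following line that begins with fi, which covers the three lines in question.

```shell
#!/bin/sh
# Hypothetical helper: print the cron script with the systemd guard commented out.
# $1: path to the script, e.g. /etc/cron.daily/logrotate (the file itself is not modified).
comment_out_systemd_guard() {
    # From the "if [ -d /run/systemd/system ]" line through the next line starting
    # with "fi", prefix every line with "#".
    sed '/-d \/run\/systemd\/system/,/^fi/ s/^/#/' "$1"
}
```

For example, redirect the output of comment_out_systemd_guard /etc/cron.daily/logrotate to a temporary file, compare it with the original, and copy it back. After that, systemctl disable --now logrotate.timer stops the timer and also keeps it from starting at boot.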

5) Check the status of the logrotate service

systemctl status logrotate.service

● logrotate.service - Rotate log files
     Loaded: loaded (/lib/systemd/system/logrotate.service; static; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2021-12-08 00:00:03 +08; 16h ago
TriggeredBy: ● logrotate.timer
       Docs: man:logrotate(8)
             man:logrotate.conf(5)
   Main PID: 2811680 (code=exited, status=1/FAILURE)

Dec 08 00:00:01 systemd[1]: Starting Rotate log files...
Dec 08 00:00:03 logrotate[2811680]: error: failed to rename /usr/squid/logs/access.log to /usr/squid/logs/access.log-20211208: Read-only file system

The line TriggeredBy: logrotate.timer confirms that logrotate is indeed scheduled through a Systemd timer.
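The status output only shows the last few journal lines. The minimal sketch below pulls out just logrotate's error messages, assuming they look like the error: line above. The function name is hypothetical. It reads standard input, so you can pipe in journalctl -u logrotate.service --no-pager or saved text.

```shell
#!/bin/sh
# Hypothetical filter: print only the message after "error: " in logrotate's journal lines.
# Usage: journalctl -u logrotate.service --no-pager | logrotate_errors
logrotate_errors() {
    # -n with the p flag: print only lines where the substitution matched
    sed -n 's/.*logrotate\[[0-9]*]: error: //p'
}
```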

Method 2: Systemd mode

Systemd timer mode mainly involves two units: logrotate.service and logrotate.timer.

1) Check the status of logrotate.service

systemctl status logrotate.service

● logrotate.service - Rotate log files
     Loaded: loaded (/lib/systemd/system/logrotate.service; static; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2021-12-08 00:00:03 +08; 16h ago
TriggeredBy: ● logrotate.timer
       Docs: man:logrotate(8)
             man:logrotate.conf(5)
   Main PID: 2811680 (code=exited, status=1/FAILURE)

Dec 08 00:00:01 systemd[1]: Starting Rotate log files...
Dec 08 00:00:03 logrotate[2811680]: error: failed to rename /usr/squid/logs/access.log to /usr/squid/logs/access.log-20211208: Read-only file system

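TriggeredBy: logrotate.timer tells us that this service is started by logrotate.timer. The Read-only file system error comes from the hardening options in the systemd unit file logrotate.service, which restrict where logrotate may write. With ProtectSystem=full, /usr/, /boot, /efi and /etc are mounted read-only for the service. With ProtectSystem=strict, the entire file system is mounted read-only, except /dev, /proc and /sys. My log directory happens to be under /usr/, so the fix is to add one more option, ReadWritePaths=/usr/squid/logs, and then run systemctl daemon-reload && systemctl restart logrotate.service. A question worth thinking about here: is it sensible to put the log directory under /usr/ at all?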
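The simplified model below makes the ProtectSystem= rules above concrete by showing which paths end up read-only for the service. It encodes only the modes described in the systemd.exec documentation (yes/true, full, strict), plus a single ReadWritePaths= entry matched by path prefix. It illustrates the rules under those assumptions and is not how systemd actually sets up its mount namespace. The function name is hypothetical.

```shell
#!/bin/sh
# Simplified model (assumption): covers only ProtectSystem=no/yes/full/strict and
# one ReadWritePaths= entry matched by prefix. Prints "yes" if path $2 would be read-only.
is_read_only_for_service() {
    local mode=$1 p=$2 rw=$3
    # A matching ReadWritePaths= entry makes the path writable again
    if [ -n "$rw" ]; then
        case "$p" in
            "$rw" | "$rw"/*) echo no; return 0 ;;
        esac
    fi
    case "$mode" in
        no | false) echo no ;;
        yes | true)
            case "$p" in
                /usr | /usr/* | /boot | /boot/* | /efi | /efi/*) echo yes ;;
                *) echo no ;;
            esac ;;
        full)   # same as yes/true, plus /etc
            case "$p" in
                /usr | /usr/* | /boot | /boot/* | /efi | /efi/* | /etc | /etc/*) echo yes ;;
                *) echo no ;;
            esac ;;
        strict) # everything except the API file systems /dev, /proc and /sys
            case "$p" in
                /dev | /dev/* | /proc | /proc/* | /sys | /sys/*) echo no ;;
                *) echo yes ;;
            esac ;;
        *) echo "unsupported mode: $mode" >&2; return 1 ;;
    esac
}

is_read_only_for_service full /usr/squid/logs/access.log                   # yes
is_read_only_for_service full /usr/squid/logs/access.log /usr/squid/logs   # no
```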

[Unit]
Description=Rotate log files
Documentation=man:logrotate(8) man:logrotate.conf(5)
ConditionACPower=true

[Service]
Type=oneshot
ExecStart=/usr/sbin/logrotate /etc/logrotate.conf

# performance options
Nice=19
IOSchedulingClass=best-effort
IOSchedulingPriority=7

# hardening options
#  details: https://www.freedesktop.org/software/systemd/man/systemd.exec.html
#  no ProtectHome for userdir logs
#  no PrivateNetwork for mail deliviery
#  no ProtectKernelTunables for working SELinux with systemd older than 235
#  no MemoryDenyWriteExecute for gzip on i686
PrivateDevices=true
PrivateTmp=true
ProtectControlGroups=true
ProtectKernelModules=true
ProtectSystem=full
RestrictRealtime=true
# added: let logrotate write to the squid log directory under /usr
ReadWritePaths=/usr/squid/logs
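Editing /lib/systemd/system/logrotate.service directly works, but that file belongs to the package and may be replaced on the next logrotate upgrade. A drop-in override under /etc/systemd/system/logrotate.service.d/ keeps the change separate; systemctl edit logrotate.service creates one interactively. The sketch below writes such a drop-in. The config root is a parameter so you can try it in a scratch directory, and the helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that adds ReadWritePaths= to logrotate.service.
# $1: unit config root (normally /etc/systemd/system), $2: directory logrotate must write to.
write_logrotate_dropin() {
    local dir="$1/logrotate.service.d"
    mkdir -p "$dir" || return 1
    # Drop-in settings are merged with the vendor unit; list options such as
    # ReadWritePaths= are added to, not replaced.
    cat > "$dir/override.conf" <<EOF
[Service]
ReadWritePaths=$2
EOF
}
```

On a real host, run write_logrotate_dropin /etc/systemd/system /usr/squid/logs and then systemctl daemon-reload. Afterwards, systemctl cat logrotate.service shows the vendor unit together with the drop-in.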

2) Check the status of logrotate.timer

systemctl status logrotate.timer

● logrotate.timer - Daily rotation of log files
     Loaded: loaded (/lib/systemd/system/logrotate.timer; enabled; vendor preset: enabled)
     Active: active (waiting) since Thu 2021-08-05 11:08:00 +08; 4 months 3 days ago
    Trigger: Thu 2021-12-09 00:00:00 +08; 7h left
   Triggers: ● logrotate.service
       Docs: man:logrotate(8)
             man:logrotate.conf(5)

Trigger: shows when the timer will fire next

Triggers: shows the service it will start

The logrotate.timer configuration:

[Unit]
Description=Daily rotation of log files
Documentation=man:logrotate(8) man:logrotate.conf(5)

[Timer]
OnCalendar=daily
AccuracySec=12h
Persistent=true
#Unit: the unit that actually does the work; defaults to the unit with the same name and a .service suffix

[Install]
WantedBy=timers.target
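OnCalendar=daily means every day at 00:00:00 local time. AccuracySec=12h lets systemd fire the timer anywhere in the 12 hours after that, so it can batch wakeups. Persistent=true means a run missed while the machine was off happens at the next boot. The sketch below does the arithmetic for the first two settings only. Given the current time of day, it prints how many seconds remain until the earliest and the latest moment the next daily run may start. The function name is hypothetical, and the sketch ignores DST changes and the stored last-run time used by Persistent=true.

```shell
#!/bin/sh
# Hypothetical helper: seconds from a time of day (HH:MM:SS) until the next
# OnCalendar=daily elapse (00:00:00), and until the end of a 12h AccuracySec window.
daily_trigger_window() {
    local h m s
    IFS=: read -r h m s <<EOF
$1
EOF
    # Strip one leading zero so "08" and "09" are not read as invalid octal numbers
    h=${h#0} m=${m#0} s=${s#0}
    local earliest=$(( 86400 - (h * 3600 + m * 60 + s) ))
    local latest=$(( earliest + 12 * 3600 ))   # AccuracySec=12h
    echo "$earliest $latest"
}

daily_trigger_window 17:00:00   # prints "25200 68400" (7h and 19h)
```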

3) Check the logrotate configuration for squid

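Per-application logrotate configuration files usually live in /etc/logrotate.d/, and my squid configuration is there too. You can check it with logrotate -d /etc/logrotate.d/squid, where -d turns on debug mode: logrotate prints what it would do but changes neither the logs nor its state file. While debugging, I found that logrotate keeps a state file, /var/lib/logrotate/status, which records when each log was last rotated. If a log has already been rotated today, logrotate will not rotate it again. To test again, edit the timestamp of the log you want to test in this file. For example, the entry "/usr/squid/logs/access.log" 2021-12-9-14:33:28 means the log was rotated today. To test once more, change it to "/usr/squid/logs/access.log" 2021-12-7-14:33:28, and logrotate will rotate it again. Note: only changing the timestamp works; simply deleting the entry has no effect.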
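Editing the state file by hand is easy to get wrong, so here is a minimal sketch that rewrites the timestamp of a single entry. It assumes the entry format shown above (a quoted path, a space, then the date) and a log path without spaces. The helper name is hypothetical. Back up /var/lib/logrotate/status before trying it on the real file.

```shell
#!/bin/sh
# Hypothetical helper: set the last-rotation time of one log in a logrotate state file.
# $1: state file, $2: log path (no spaces), $3: new timestamp, e.g. 2021-12-7-14:33:28
backdate_state_entry() {
    # Field 1 is the quoted path, field 2 the timestamp; all other lines pass through unchanged
    awk -v p="\"$2\"" -v d="$3" '$1 == p { $2 = d } { print }' "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}
```

Alternatively, logrotate -f (--force) rotates regardless of what the state file says. That avoids editing the file at all, but it forces every log in the given configuration.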

4) Verify

After running systemctl restart logrotate.service again, the rotated logs appear, e.g. access.log-20211209.
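A suffix like -20211209 matches logrotate's dateext option, whose default dateformat is -%Y%m%d. To confirm that rotation really happened, you can list the files with such a suffix next to the live log. This sketch assumes dateext is in use and that rotated copies may also be gzip-compressed (.gz). The helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: list rotated copies of a log that carry a -YYYYMMDD suffix
# (logrotate's default dateext format), optionally gzip-compressed.
# $1: path of the live log, e.g. /usr/squid/logs/access.log
list_rotated() {
    for f in "$1"-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] \
             "$1"-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].gz; do
        # An unmatched glob stays literal, so print only files that exist
        [ -e "$f" ] && echo "$f"
    done
    return 0
}
```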

Knowledge to strengthen

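1) What is the difference between anacron and cron?

2) How do scheduled jobs created with systemd differ from those created with cron, and what are the pros and cons of each?

3) Build a deeper understanding of systemd configuration and of systemd as a whole, and study it systematically

4) Study crontab scheduled jobs systematically

5) Study logrotate systematically, including its configuration

6) Read the logrotate source code and imitate it in Python or Go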

Many fundamentals I only know by name; I need to put in some real effort.

Finally

You are welcome to follow my WeChat official account so we can exchange ideas and learn together.
