Original post by 扣钉日记 (WeChat official account ID: codelogs). Feel free to share; please keep the attribution when reposting.

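Introduction

Linux has plenty of diagnostic tools, such as perf and bcc. They are powerful, but they need high privileges to run.

0x.tools takes a different approach: it diagnoses problems by sampling the /proc directory. This has almost no performance impact on the program being measured, and it works as long as you have the same level of privilege as the target process.

Don't underestimate this difference in privileges. At large internet companies, developers usually only get a shell confined to a container, and getting root on the host is almost impossible.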

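Installation

# Download the source
$ git clone https://github.com/tanelpoder/0xtools.git
$ cd 0xtools
# Install the build tools
$ yum install -y make gcc
# Build and install
$ make && make install

In fact, most of the tools in 0x.tools are scripts; psn, for example, is a Python script. So you can also just clone the code and run bin/psn directly.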

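The psn tool

psn shows what the currently active threads on the system are doing: which system call a thread is in, which file it is reading or writing, and which kernel function it is blocked in.

Viewing active threads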

[tanel@linux01 ~]$ psn

Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/stat for 5 seconds... finished.

=== Active Threads ================================================

samples | avg_threads | comm | state
-------------------------------------------------------------------
10628 | 3542.67 | (kworker/*:*) | Disk (Uninterruptible)
37 | 12.33 | (oracle_*_l) | Running (ON CPU)
17 | 5.67 | (oracle_*_l) | Disk (Uninterruptible)
2 | 0.67 | (xcapture) | Running (ON CPU)
1 | 0.33 | (ora_lg*_xe) | Disk (Uninterruptible)
1 | 0.33 | (ora_lgwr_lin*) | Disk (Uninterruptible)
1 | 0.33 | (ora_lgwr_lin*c) | Disk (Uninterruptible)

samples: 3 (expected: 100)
total processes: 10470, threads: 11530
runtime: 6.13, measure time: 6.03

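As shown above, by default psn samples each thread's /proc/$pid/stat file for 5 seconds, records the threads that are in state R (running) or D (uninterruptible sleep), and summarizes the results.

Threads in R or D state count as active, so the more often a thread shows up in the samples, the more time it spends active, either because its work is slow or because it runs frequently.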
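
The sampling idea described above can be sketched in a few lines of Python. This is only a minimal illustration, not psn's actual code: the function names `parse_stat_line` and `sample_active_threads`, the 1-second duration, and the 50 ms interval are all assumptions chosen for the example.

```python
import collections
import glob
import time


def parse_stat_line(line):
    """Return (comm, state) from the content of a /proc/<pid>/stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ")",
    # so use the first "(" and the last ")" as delimiters.
    start = line.index("(")
    end = line.rindex(")")
    comm = line[start + 1:end]
    state = line[end + 1:].split()[0]
    return comm, state


def sample_active_threads(duration=1.0, interval=0.05):
    """Count how often each (comm, state) pair is seen in R or D state."""
    counts = collections.Counter()
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
            try:
                with open(path, errors="replace") as f:
                    comm, state = parse_stat_line(f.read())
            except (OSError, ValueError, IndexError):
                continue  # the thread exited or the file is unreadable
            if state in ("R", "D"):
                counts[(comm, state)] += 1
        time.sleep(interval)
    return counts


if __name__ == "__main__":
    for (comm, state), n in sample_active_threads().most_common(10):
        print(f"{n:8d} | {comm:<20} | {state}")
```

Unlike psn, this sketch does not collapse similar thread names into patterns such as (kworker/*:*); it only shows where the sample counts come from.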

Viewing the files threads read and write

[tanel@linux01 ~]$ sudo psn -G syscall,filenamesum

Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/syscall, stat for 5 seconds... finished.

=== Active Threads =======================================================================================================

samples | avg_threads | comm | state | syscall | filenamesum
--------------------------------------------------------------------------------------------------------------------------
2027 | 506.75 | (kworker/*:*) | Disk (Uninterruptible) | [kernel_thread] |
1963 | 490.75 | (oracle_*_l) | Disk (Uninterruptible) | pread64 | /data/oracle/LIN*C/soe_bigfile.dbf
87 | 21.75 | (oracle_*_l) | Running (ON CPU) | [running] |
13 | 3.25 | (kworker/*:*) | Running (ON CPU) | [running] |
4 | 1.00 | (oracle_*_l) | Running (ON CPU) | read | socket:[*]
2 | 0.50 | (collectl) | Running (ON CPU) | [running] |
1 | 0.25 | (java) | Running (ON CPU) | futex |
1 | 0.25 | (ora_ckpt_xe) | Disk (Uninterruptible) | pread64 | /data/oracle/XE/control*.ctl
1 | 0.25 | (ora_m*_linprd) | Running (ON CPU) | [running] |
1 | 0.25 | (ora_m*_lintes) | Running (ON CPU) | [running] |

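The -G option specifies extra columns to show: syscall is the system call the thread is executing, and filenamesum is the file being read or written. Generally speaking, a thread in D state is doing file I/O, so when D-state threads show up frequently, we naturally want to know which file they are reading or writing.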
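
As a rough sketch of how a syscall and its file can be resolved from /proc, the Python below parses a thread's syscall file and, for a few file-descriptor syscalls, follows the /proc/<pid>/fd/<fd> symlink. The names `parse_syscall_line`, `current_file`, and `FD_SYSCALLS` are made up for this example, and the syscall numbers assume x86_64.

```python
import os

# x86_64 syscall numbers whose first argument is a file descriptor
# (0 read, 1 write, 17 pread64, 18 pwrite64); an assumption for this sketch.
FD_SYSCALLS = frozenset({0, 1, 17, 18})


def parse_syscall_line(line):
    """Return (syscall_number, first_arg), or None if not inside a syscall."""
    fields = line.split()
    # "running": the thread is on CPU; "-1": blocked, but not in a syscall.
    if not fields or fields[0] in ("running", "-1"):
        return None
    number = int(fields[0])
    first_arg = int(fields[1], 16) if len(fields) > 1 else None
    return number, first_arg


def current_file(pid, tid):
    """Return the file a thread is reading or writing, or None if unknown."""
    try:
        with open(f"/proc/{pid}/task/{tid}/syscall") as f:
            parsed = parse_syscall_line(f.read())
    except OSError:
        return None  # thread exited or permission denied
    if parsed is None or parsed[0] not in FD_SYSCALLS:
        return None
    try:
        return os.readlink(f"/proc/{pid}/fd/{parsed[1]}")
    except OSError:
        return None


if __name__ == "__main__":
    print(current_file(os.getpid(), os.getpid()))
```

The file name is read slightly after the syscall number, so on a busy thread the two may occasionally not belong together; for sampling purposes this is usually acceptable.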

Viewing a thread's kernel stack

[tanel@linux01 ~]$ sudo psn -p -G syscall,wchan,kstack

Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/wchan, stack, syscall, stat for 5 seconds... finished.

=== Active Threads =======================================================================================================

samples | avg_threads | comm | state | syscall | wchan | kstack
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
281 | 140.50 | (kworker/*:*) | Disk (Uninterruptible) | [kernel_thread] | blkdev_issue_flush | ret_from_fork_nospec_begin()->kthread()->worker_thread()->process_one_work()->dio_aio_complete_work()->dio_complete()->generic_write_sync()->xfs_file_fsync()->xfs_blkdev_issue_flush()->blkdev_issue_flush()
211 | 105.50 | (kworker/*:*) | Disk (Uninterruptible) | [kernel_thread] | call_rwsem_down_read_failed | ret_from_fork_nospec_begin()->kthread()->worker_thread()->process_one_work()->dio_aio_complete_work()->dio_complete()->generic_write_sync()->xfs_file_fsync()->xfs_ilock()->call_rwsem_down_read_failed()
169 | 84.50 | (oracle_*_li) | Disk (Uninterruptible) | pread64 | call_rwsem_down_write_failed | system_call_fastpath()->SyS_pread64()->vfs_read()->do_sync_read()->xfs_file_aio_read()->xfs_file_dio_aio_read()->touch_atime()->update_time()->xfs_vn_update_time()->xfs_ilock()->call_rwsem_down_write_failed()
64 | 32.00 | (kworker/*:*) | Disk (Uninterruptible) | [kernel_thread] | xfs_log_force_lsn | ret_from_fork_nospec_begin()->kthread()->worker_thread()->process_one_work()->dio_aio_complete_work()->dio_complete()->generic_write_sync()->xfs_file_fsync()->xfs_log_force_lsn()
24 | 12.00 | (oracle_*_li) | Disk (Uninterruptible) | pread64 | call_rwsem_down_read_failed | system_call_fastpath()->SyS_pread64()->vfs_read()->do_sync_read()->xfs_file_aio_read()->xfs_file_dio_aio_read()->__blockdev_direct_IO()->do_blockdev_direct_IO()->xfs_get_blocks_direct()->__xfs_get_blocks()->xfs_ilock_data_map_shared()->xfs_ilock()->call_rwsem_down_read_failed()
5 | 2.50 | (oracle_*_li) | Disk (Uninterruptible) | pread64 | do_blockdev_direct_IO | system_call_fastpath()->SyS_pread64()->vfs_read()->do_sync_read()->xfs_file_aio_read()->xfs_file_dio_aio_read()->__blockdev_direct_IO()->do_blockdev_direct_IO()
3 | 1.50 | (oracle_*_li) | Running (ON CPU) | [running] | 0 | system_call_fastpath()->SyS_pread64()->vfs_read()->do_sync_read()->xfs_file_aio_read()->xfs_file_dio_aio_read()->__blockdev_direct_IO()->do_blockdev_direct_IO()
2 | 1.00 | (kworker/*:*) | Disk (Uninterruptible) | [kernel_thread] | call_rwsem_down_write_failed | ret_from_fork_nospec_begin()->kthread()->worker_thread()->process_one_work()->dio_aio_complete_work()->dio_complete()->xfs_end_io_direct_write()->xfs_iomap_write_unwritten()->xfs_ilock()->call_rwsem_down_write_failed()
2 | 1.00 | (kworker/*:*) | Running (ON CPU) | [running] | 0 | ret_from_fork_nospec_begin()->kthread()->worker_thread()->process_one_work()->dio_aio_complete_work()->dio_complete()->generic_write_sync()->xfs_file_fsync()->xfs_blkdev_issue_flush()->blkdev_issue_flush()
2 | 1.00 | (oracle_*_li) | Disk (Uninterruptible) | io_submit | call_rwsem_down_write_failed | system_call_fastpath()->SyS_io_submit()->do_io_submit()->xfs_file_aio_read()->xfs_file_dio_aio_read()->touch_atime()->update_time()->xfs_vn_update_time()->xfs_ilock()->call_rwsem_down_write_failed()
1 | 0.50 | (java) | Running (ON CPU) | futex | futex_wait_queue_me | system_call_fastpath()->SyS_futex()->do_futex()->futex_wait()->futex_wait_queue_me()
1 | 0.50 | (ksoftirqd/*) | Running (ON CPU) | [running] | 0 | ret_from_fork_nospec_begin()->kthread()->smpboot_thread_fn()
1 | 0.50 | (kworker/*:*) | Disk (Uninterruptible) | [kernel_thread] | worker_thread | ret_from_fork_nospec_begin()->kthread()->worker_thread()
1 | 0.50 | (kworker/*:*) | Disk (Uninterruptible) | [kernel_thread] | worker_thread | ret_from_fork_nospec_begin()->kthread()->worker_thread()->process_one_work()->dio_aio_complete_work()->dio_complete()->generic_write_sync()->xfs_file_fsync()->xfs_blkdev_issue_flush()->blkdev_issue_flush()
1 | 0.50 | (ora_lg*_xe) | Disk (Uninterruptible) | io_submit | inode_dio_wait | system_call_fastpath()->SyS_io_submit()->do_io_submit()->xfs_file_aio_write()->xfs_file_dio_aio_write()->inode_dio_wait()
1 | 0.50 | (oracle_*_li) | Disk (Uninterruptible) | [running] | 0 | -

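Similarly, the wchan column shows which kernel function a thread is blocked in, and the kstack column shows the kernel call stack at the moment it was blocked.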
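
To make the kstack column less mysterious, here is a small Python sketch that turns the raw content of a stack file into the a()->b() chain seen above. It assumes each frame line looks like "[<addr>] func+0xoff/0xsize" with the innermost frame first; `format_kstack` is a name invented for this example, not a psn function.

```python
import re

# Capture the function name that follows "] " and precedes "+0x".
_FRAME_RE = re.compile(r"\]\s+([A-Za-z_][\w.]*)\+0x")


def format_kstack(raw):
    """Render raw /proc/<pid>/stack text as an outermost-first call chain."""
    funcs = []
    for line in raw.splitlines():
        match = _FRAME_RE.search(line)
        if match:  # skip lines that carry only an address
            funcs.append(match.group(1))
    # The file lists the innermost frame first, so reverse it for display.
    return "->".join(f"{name}()" for name in reversed(funcs)) or "-"


if __name__ == "__main__":
    sample = (
        "[<0>] futex_wait_queue_me+0xc4/0x120\n"
        "[<0>] futex_wait+0x10a/0x250\n"
        "[<0>] do_futex+0x325/0x500\n"
    )
    print(format_kstack(sample))
```

Grouping samples by this string is what lets identical stacks collapse into one row with a sample count.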

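How psn works

Like the ps command, psn obtains thread information by walking the /proc directory, as follows:

state: taken from the /proc/$pid/stat file.

syscall: taken from the /proc/$pid/syscall file.

wchan: taken from the /proc/$pid/wchan file.

kstack: taken from the /proc/$pid/stack file.

The difference from tools such as perf and bcc is that reading these files generally only requires the same level of privilege as the target process, not the root account. One caveat: on newer kernels the stack file may be readable by root only, which is why the kstack example above is run with sudo.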
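
Since what is readable depends on the kernel and the container setup, it can be worth probing before relying on a column. The Python sketch below simply tries to read each of the four files for a given pid; `readable_proc_files` and `PROC_FILES` are hypothetical names for this example.

```python
import os

# The four per-process files listed above.
PROC_FILES = ("stat", "syscall", "wchan", "stack")


def readable_proc_files(pid):
    """Return {file name: True/False} for whether each file can be read."""
    result = {}
    for name in PROC_FILES:
        try:
            # Opening may succeed while read() is still denied,
            # so perform an actual read.
            with open(f"/proc/{pid}/{name}", "rb") as f:
                f.read()
            result[name] = True
        except OSError:
            result[name] = False
    return result


if __name__ == "__main__":
    for name, ok in readable_proc_files(os.getpid()).items():
        print(f"{name:8s} {'readable' if ok else 'not readable'}")
```

Running it against your own process inside the container gives a quick picture of which psn columns are likely to work without extra privileges.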

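Other tools

Besides psn, 0x.tools includes other tools such as xcapture and schedlat. I won't cover them one by one here; if you're interested, see https://0x.tools/.

Also, since psn works by walking the /proc directory, we can write our own script to get the same functionality, as follows: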

active_thread_kstack(){
    # Print the kernel stacks of the currently active (R or D state) java threads.
    # Note: gensub(), strtonum() and the regex RS used below are gawk extensions.
    ps h -Lo pid,tid,s,pcpu,comm,wchan:32,min_flt,maj_flt -C java|grep '[RD] '| awk '
    BEGIN{
        # Build the syscall number -> name map from the first header file found
        syscall_files["/usr/include/asm/unistd_64.h"]=1;
        syscall_files["/usr/include/x86_64-linux-gnu/asm/unistd_64.h"]=1;
        syscall_files["/usr/include/asm-x86_64/unistd.h"]=1;
        for(tfile in syscall_files){
            cmd="test -f "tfile
            if(system(cmd)==0){
                hfile=tfile;
                break;
            }
        }
        if(hfile){
            while ((getline <hfile) > 0){
                if($0 ~ /__NR_/){
                    syscall_map[$3]=gensub(/__NR_/,"","g",$2)
                }
            }
            close(hfile)
        }else{
            # Fallback: built-in x86_64 syscall table
            syscall_str="0:read 1:write 2:open 3:close 4:stat 5:fstat 6:lstat 7:poll 8:lseek 9:mmap 10:mprotect 11:munmap 12:brk 13:rt_sigaction 14:rt_sigprocmask 15:rt_sigreturn 16:ioctl 17:pread64 18:pwrite64 19:readv 20:writev 21:access 22:pipe 23:select 24:sched_yield 25:mremap 26:msync 27:mincore 28:madvise 29:shmget 30:shmat 31:shmctl 32:dup 33:dup2 34:pause 35:nanosleep 36:getitimer 37:alarm 38:setitimer 39:getpid 40:sendfile 41:socket 42:connect 43:accept 44:sendto 45:recvfrom 46:sendmsg 47:recvmsg 48:shutdown 49:bind 50:listen 51:getsockname 52:getpeername 53:socketpair 54:setsockopt 55:getsockopt 56:clone 57:fork 58:vfork 59:execve 60:exit 61:wait4 62:kill 63:uname 64:semget 65:semop 66:semctl 67:shmdt 68:msgget 69:msgsnd 70:msgrcv 71:msgctl 72:fcntl 73:flock 74:fsync 75:fdatasync 76:truncate 77:ftruncate 78:getdents 79:getcwd 80:chdir 81:fchdir 82:rename 83:mkdir 84:rmdir 85:creat 86:link 87:unlink 88:symlink 89:readlink 90:chmod 91:fchmod 92:chown 93:fchown 94:lchown 95:umask 96:gettimeofday 97:getrlimit 98:getrusage 99:sysinfo 100:times 101:ptrace 102:getuid 103:syslog 104:getgid 105:setuid 106:setgid 107:geteuid 108:getegid 109:setpgid 110:getppid 111:getpgrp 112:setsid 113:setreuid 114:setregid 115:getgroups 116:setgroups 117:setresuid 118:getresuid 119:setresgid 120:getresgid 121:getpgid 122:setfsuid 123:setfsgid 124:getsid 125:capget 126:capset 127:rt_sigpending 128:rt_sigtimedwait 129:rt_sigqueueinfo 130:rt_sigsuspend 131:sigaltstack 132:utime 133:mknod 134:uselib 135:personality 136:ustat 137:statfs 138:fstatfs 139:sysfs 140:getpriority 141:setpriority 142:sched_setparam 143:sched_getparam 144:sched_setscheduler 145:sched_getscheduler 146:sched_get_priority_max 147:sched_get_priority_min 148:sched_rr_get_interval 149:mlock 150:munlock 151:mlockall 152:munlockall 153:vhangup 154:modify_ldt 155:pivot_root 156:_sysctl 157:prctl 158:arch_prctl 159:adjtimex 160:setrlimit 161:chroot 162:sync 163:acct 164:settimeofday 165:mount 166:umount2 167:swapon 168:swapoff " \
            "169:reboot 170:sethostname 171:setdomainname 172:iopl 173:ioperm 174:create_module 175:init_module 176:delete_module 177:get_kernel_syms 178:query_module 179:quotactl 180:nfsservctl 181:getpmsg 182:putpmsg 183:afs_syscall 184:tuxcall 185:security 186:gettid 187:readahead 188:setxattr 189:lsetxattr 190:fsetxattr 191:getxattr 192:lgetxattr 193:fgetxattr 194:listxattr 195:llistxattr 196:flistxattr 197:removexattr 198:lremovexattr 199:fremovexattr 200:tkill 201:time 202:futex 203:sched_setaffinity 204:sched_getaffinity 205:set_thread_area 206:io_setup 207:io_destroy 208:io_getevents 209:io_submit 210:io_cancel 211:get_thread_area 212:lookup_dcookie 213:epoll_create 214:epoll_ctl_old 215:epoll_wait_old 216:remap_file_pages 217:getdents64 218:set_tid_address 219:restart_syscall 220:semtimedop 221:fadvise64 222:timer_create 223:timer_settime 224:timer_gettime 225:timer_getoverrun 226:timer_delete 227:clock_settime 228:clock_gettime 229:clock_getres 230:clock_nanosleep 231:exit_group 232:epoll_wait 233:epoll_ctl 234:tgkill 235:utimes 236:vserver 237:mbind 238:set_mempolicy 239:get_mempolicy 240:mq_open 241:mq_unlink 242:mq_timedsend 243:mq_timedreceive 244:mq_notify 245:mq_getsetattr 246:kexec_load 247:waitid 248:add_key 249:request_key 250:keyctl 251:ioprio_set 252:ioprio_get 253:inotify_init 254:inotify_add_watch 255:inotify_rm_watch 256:migrate_pages 257:openat 258:mkdirat 259:mknodat 260:fchownat 261:futimesat 262:newfstatat 263:unlinkat 264:renameat 265:linkat 266:symlinkat 267:readlinkat 268:fchmodat 269:faccessat 270:pselect6 271:ppoll 272:unshare 273:set_robust_list 274:get_robust_list 275:splice 276:tee 277:sync_file_range 278:vmsplice 279:move_pages 280:utimensat 281:epoll_pwait 282:signalfd 283:timerfd_create 284:eventfd 285:fallocate 286:timerfd_settime 287:timerfd_gettime 288:accept4 289:signalfd4 290:eventfd2 291:epoll_create1 292:dup3 293:pipe2 294:inotify_init1 295:preadv 296:pwritev 297:rt_tgsigqueueinfo 298:perf_event_open 299:recvmmsg 300:fanotify_init " \
            "301:fanotify_mark 302:prlimit64 303:name_to_handle_at 304:open_by_handle_at 305:clock_adjtime 306:syncfs 307:sendmmsg 308:setns 309:getcpu 310:process_vm_readv 311:process_vm_writev 312:kcmp 313:finit_module 314:sched_setattr 315:sched_getattr 316:renameat2 317:seccomp 318:getrandom 319:memfd_create 320:kexec_file_load 321:bpf 322:execveat 323:userfaultfd 324:membarrier 325:mlock2 326:copy_file_range 327:preadv2 328:pwritev2 329:pkey_mprotect 330:pkey_alloc 331:pkey_free 332:statx 333:io_pgetevents 334:rseq 424:pidfd_send_signal 425:io_uring_setup 426:io_uring_enter 427:io_uring_register 428:open_tree 429:move_mount 430:fsopen 431:fsconfig 432:fsmount 433:fspick 434:pidfd_open 435:clone3"
            split(syscall_str, syscall_arr, " ")
            for(i in syscall_arr){
                split(syscall_arr[i], idname_arr, ":")
                syscall_map[idname_arr[1]]=idname_arr[2]
            }
        }
        # Syscalls whose first argument is a file descriptor
        syscall_with_fd_map["read"]=1;
        syscall_with_fd_map["write"]=1;
        syscall_with_fd_map["pread64"]=1;
        syscall_with_fd_map["pwrite64"]=1;
        syscall_with_fd_map["fsync"]=1;
        syscall_with_fd_map["fdatasync"]=1;
        syscall_with_fd_map["recvfrom"]=1;
        syscall_with_fd_map["sendto"]=1;
        syscall_with_fd_map["recvmsg"]=1;
        syscall_with_fd_map["sendmsg"]=1;
        syscall_with_fd_map["epoll_wait"]=1;
        syscall_with_fd_map["ioctl"]=1;
        syscall_with_fd_map["accept"]=1;
        syscall_with_fd_map["accept4"]=1;
        special_fd_map["0"]="(stdin)";
        special_fd_map["1"]="(stdout)";
        special_fd_map["2"]="(stderr)";
    }
    {
        # Reset per-thread values so that a failed read does not reuse stale data
        wchan=""; stack=""; syscall=""; filename="";
        # Read each /proc file whole with a single getline
        RS="^$";
        getline wchan <("/proc/"$1"/task/"$2"/wchan");
        close("/proc/"$1"/task/"$2"/wchan");
        getline stack <("/proc/"$1"/task/"$2"/stack");
        close("/proc/"$1"/task/"$2"/stack");
        getline syscall <("/proc/"$1"/task/"$2"/syscall");
        close("/proc/"$1"/task/"$2"/syscall");
        split(syscall, syscall_arr, /\s+/);
        syscall_id=syscall_arr[1]
        syscall_name=syscall_map[syscall_id];
        if(syscall_name in syscall_with_fd_map){
            # The first syscall argument is the fd, printed in hex
            fd=strtonum(syscall_arr[2])
            cmd="readlink /proc/"$1"/fd/"fd;
            cmd|getline filename;
            close(cmd);
            sub(/\n$/,"",filename);
            if(fd in special_fd_map){
                filename=filename special_fd_map[fd]
            }
        }
        printf "pid:%s,tid:%s,stat:%s,pcpu:%s,comm:%s,wchan:%s,min_flt:%s,maj_flt:%s,syscall:%s,filename:%s\n",$1,$2,$3,$4,$5,wchan,$7,$8,syscall_name,filename;
        print stack;
        RS="\n"
    }'
}

This way, we get functionality similar to the psn command without installing 0x.tools!
