Advanced Awk for Sysadmins
Reposted from: http://www.linuxforu.com/2011/06/advanced-awk-for-sysadmins/
By Vishal Bhatia on June 1, 2011 in How-Tos, Sysadmins, Tools / Apps

To begin with, let’s take a look at how some of the advanced string-manipulation functions are used. One of the most useful functions available in Awk is split. It takes three parameters: a source string, an array in which to store the split elements (starting from index 1) and an optional field separator, fs; if fs is omitted, the current value of FS is used. The separator can also be a regular expression, and split returns the number of elements it created. The function is very useful whenever a record has to be broken up on more than one delimiter. For example, Squid log entries like the following are split into fields on whitespace, but the URL field itself has to be split again on /:
1300063856.476 199 127.0.0.1 TCP_MISS/204 396 GET http://www.google.co.in/csi? - DIRECT/74.125.71.104 text/html
1300063865.415 28710 127.0.0.1 TCP_MISS/200 18303 CONNECT www.google.com:443 - DIRECT/74.125.71.104 -
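To fetch the FQDN or IP address from each log entry, keep the default field separator (a single space, which splits on runs of blanks) and check whether the 6th field contains GET. If it does, split the 7th field into the array url using / as the delimiter, and print url[3]. For http://www.google.co.in/csi?, url[1] is "http:", url[2] is the empty string between the two slashes and url[3] is www.google.co.in: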
awk '$6~/GET/{split($7,url,"/"); print url[3]}' /var/log/squid/access.log
www.gmail.com
mail.google.com
google.co.in
www.google.co.in
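The split call above needs only one delimiter. As a sketch of the regular-expression separator mentioned earlier, the snippet below first strips any scheme:// prefix with sub() and then splits the URL on either / or :, so the same code returns the host for GET entries and for CONNECT entries, whose 7th field has the form host:port. The function name squid_hosts is hypothetical, the two sample entries are the ones shown above (fed from a here-document instead of the real log), and only POSIX awk features are used.

```shell
#!/bin/sh
# squid_hosts: hypothetical helper that prints the method and host of each entry.
squid_hosts() {
    awk '{
        u = $7                               # URL field
        sub(/^[a-z]+:\/\//, "", u)           # drop a leading "scheme://", if present
        n = split(u, part, "[/:]")           # regex separator: split on "/" or ":"
        if (n > 0) print $6, part[1]         # method and host name (or IP)
    }'
}

squid_hosts <<'EOF'
1300063856.476 199 127.0.0.1 TCP_MISS/204 396 GET http://www.google.co.in/csi? - DIRECT/74.125.71.104 text/html
1300063865.415 28710 127.0.0.1 TCP_MISS/200 18303 CONNECT www.google.com:443 - DIRECT/74.125.71.104 -
EOF
```

For the two sample entries this prints GET www.google.co.in and CONNECT www.google.com.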
Another very important string-manipulation function is gsub(r, t, s). As the man page states, it substitutes t for every occurrence of the regular expression r in the string s, and returns the number of substitutions made. If s is not given, $0 is used.
Awk also provides sub(), which works the same way as gsub() but replaces only the first occurrence. For example, to replace every run of one or more spaces in a file with a single tab, use gsub():
awk '{gsub(/ +/,"\t"); print $0}' <file_name>
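A small self-contained comparison may make the difference clearer. The sketch below applies sub() and gsub() to the same string, prints how many replacements each made, and also shows that & in the replacement text stands for the matched text. The function name subgsub_demo and the sample strings are made up, and commas are used instead of tabs so the result is visible.

```shell
#!/bin/sh
# subgsub_demo: hypothetical name; compares sub() and gsub() on sample strings.
subgsub_demo() {
    awk 'BEGIN {
        s = "a  b   c"; t = s
        n1 = sub(/ +/, ",", s)           # only the FIRST run of spaces in s
        n2 = gsub(/ +/, ",", t)          # EVERY run of spaces in t
        u = "port 22 and 80"
        n3 = gsub(/[0-9]+/, "[&]", u)    # & in the replacement is the matched text
        printf "sub:  %d -> %s\n", n1, s
        printf "gsub: %d -> %s\n", n2, t
        printf "amp:  %d -> %s\n", n3, u
    }'
}

subgsub_demo
```

This prints "sub:  1 -> a,b   c", "gsub: 2 -> a,b,c" and "amp:  2 -> port [22] and [80]".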
Besides these, there are several other string-manipulation functions, such as index, length, match, substr, toupper and tolower, as well as GNU Awk extensions such as strtonum.
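As a sketch of the POSIX functions in that list, the following takes apart the TCP_MISS/204 result-code field from the Squid entries shown earlier. The function name strfuncs_demo is hypothetical; strtonum is left out because it exists only in GNU Awk.

```shell
#!/bin/sh
# strfuncs_demo: hypothetical name; takes apart the Squid result-code field.
strfuncs_demo() {
    awk 'BEGIN {
        f = "TCP_MISS/204"
        i = index(f, "/")                    # position of "/", 1-based; 0 if absent
        print substr(f, 1, i - 1)            # text before the "/"
        print substr(f, i + 1)               # text after it
        print length(f)                      # number of characters
        if (match(f, /[0-9]+/))              # match() sets RSTART and RLENGTH
            print RSTART, RLENGTH
        print tolower(substr(f, 1, 3)), toupper("miss")
    }'
}

strfuncs_demo
```

The output is TCP_MISS, 204, 12, "10 3" and "tcp MISS", one per line.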
Time conversion
Now, let’s have a look at time conversion. GNU Awk (gawk) provides three very useful functions to fetch or manipulate time: systime(), strftime() and mktime(). They are gawk extensions rather than part of POSIX awk, so other Awk implementations may not have them.
systime(): Returns the current timestamp (the number of seconds since the epoch), for example, awk 'BEGIN{print systime()}'.
strftime(): Formats a timestamp (the current time if none is given) using a format string similar to the one used by the date command. For example, "%d-%m-%Y %H:%M:%S" returns the time in the “DD-MM-YYYY HH24:MI:SS” format.
mktime(): Takes a datespec of the form "YYYY MM DD HH MM SS [DST]" and returns a timestamp of the same form as returned by systime().
The following snippet converts the Unix timestamp in the first field of each Squid log entry into a date in the DD-MM-YYYY HH24:MI:SS format:
awk '{$1=strftime("%d-%m-%Y %H:%M:%S",$1); print $0}' /var/log/squid/access.log
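To check the conversion without a Squid log and without depending on the local time zone, the sketch below formats the timestamp of the first sample entry (with the fractional part dropped) and converts the same date back with mktime(). It assumes an awk that has these gawk extensions (GNU Awk, or a recent mawk release) and sets TZ=UTC so the result is the same on every machine; squid_time_demo is a hypothetical name.

```shell
#!/bin/sh
# squid_time_demo: hypothetical name. Needs strftime() and mktime(), which are
# gawk extensions (also available in recent mawk releases).
squid_time_demo() {
    TZ=UTC awk 'BEGIN {
        ts = 1300063856                              # first sample Squid entry, fraction dropped
        print strftime("%d-%m-%Y %H:%M:%S", ts)      # timestamp -> DD-MM-YYYY HH24:MI:SS
        print mktime("2011 03 14 00 50 56")          # datespec -> timestamp
    }'
}

squid_time_demo
```

In UTC this prints 14-03-2011 00:50:56 followed by 1300063856, showing that the two functions are inverses of each other.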
System administrators often need to work out a date a few days before or after the current date, for example when archiving or purging log files older than a particular date. A little arithmetic with the strftime and systime functions makes this trivial.
Unix-like systems, Linux included, store time as the number of seconds since the epoch (midnight, January 1, 1970 UTC), and a day has 86,400 seconds, so adding days*86400 to the current timestamp gives a time the required number of days away (a negative value of days goes back in time). The snippets below return the dates 5 days after and 8 days before the current date:
awk -v days="5" 'BEGIN{print strftime("%Y-%m-%d",systime()+(days*86400))}'
awk -v days="-8" 'BEGIN{print strftime("%Y-%m-%d",systime()+(days*86400))}'
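The same arithmetic can be wrapped in a small shell function. In the sketch below, days_from is a hypothetical name, and a fixed base timestamp replaces systime() so the output is reproducible; in real use you would pass systime() instead, as above. TZ=UTC avoids daylight-saving shifts: in a local zone with DST a day can be 23 or 25 hours long, so adding multiples of 86,400 seconds to a time close to midnight can land on the wrong date.

```shell
#!/bin/sh
# days_from BASE DAYS: print the date DAYS days from the Unix timestamp BASE.
# Needs an awk with strftime() (GNU Awk or a recent mawk).
days_from() {
    TZ=UTC awk -v base="$1" -v days="$2" 'BEGIN {
        print strftime("%Y-%m-%d", base + days * 86400)
    }'
}

days_from 1300063856 5     # 5 days after 2011-03-14
days_from 1300063856 -8    # 8 days before 2011-03-14
```

The two calls print 2011-03-19 and 2011-03-06.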
Arrays in Awk
Awk’s support for associative arrays, combined with its ability to treat individual columns as fields, provides a very powerful mechanism for analysing and processing data. It allows SQL-like grouping operations (a COUNT or SUM per GROUP BY key) to be run on columns of text. Before looking at practical examples, let’s briefly discuss arrays in Awk.
Just like other variables in Awk, arrays need not be declared, and neither does the type of data they hold. An array index can be a number or a string, or both within a single array; every index is stored as a string, so a[10] and a["10"] refer to the same element. Thus, an array could contain both the number “10” and the string “error” as indexes. Awk does not support true multidimensional arrays, but it emulates them with the a[i, j] syntax, which joins the indexes with the SUBSEP character, and we will shortly see how combining fields into one index gives the same effect.
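A short sketch of these rules: the array below holds a numeric index, a string index and an emulated two-dimensional index, and the combined key is split back into its parts using SUBSEP. The function name arrays_demo and the key values are made up for illustration.

```shell
#!/bin/sh
# arrays_demo: hypothetical illustration of Awk array indexes.
arrays_demo() {
    awk 'BEGIN {
        a[10] = "ten"                      # numeric index, stored as the string "10"
        a["error"] = 3                     # string index in the same array
        if ("10" in a) print "found 10 as a string key"
        a["web01", "ESTABLISHED"] = 7      # emulated 2-D index: "web01" SUBSEP "ESTABLISHED"
        if (("web01", "ESTABLISHED") in a) print "2-D membership test works"
        for (k in a)
            if (index(k, SUBSEP)) {        # only the combined key contains SUBSEP
                split(k, part, SUBSEP)     # recover the original two indexes
                print part[1], part[2], a[k]
            }
    }'
}

arrays_demo
```

Only the combined key contains SUBSEP, so the loop prints a single line, web01 ESTABLISHED 7, after the two membership messages.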
Example uses of arrays
To count the connections each client has opened to the SSH service on your server, you can use the output of the netstat -antp command, which contains the following seven fields:
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
tcp        0      0 ::ffff:192.168.56.101:22    ::ffff:192.168.56.1:53063   ESTABLISHED 2655/2
tcp        0      0 ::ffff:192.168.56.101:22    ::ffff:192.168.56.1:53064   ESTABLISHED 2696/4
To process this, let’s use:
netstat -antp | awk -v prt=22 '{ split($4,laddr,":"); split($5,faddr,":"); if(laddr[5]==prt) sum[faddr[4]]+=1 } END{ for(ip in sum) print ip"\t"sum[ip] }'
192.168.56.101	1
192.168.56.1	2
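Here, the 4th field (the local, or destination, address) and the 5th field (the foreign, or source, address) are split using : as the separator. Because these are IPv4-mapped IPv6 addresses, ::ffff:192.168.56.101:22 splits into "", "", "ffff", "192.168.56.101" and "22", so element 5 of laddr is the destination port and element 4 of faddr is the source IP. Connections whose destination port equals prt are then summed per source IP. For plain IPv4 addresses such as 192.168.56.101:22, the port would be element 2 and the IP element 1.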
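The command above depends on the exact address layout, since element 5 holds the port only for ::ffff:-style addresses. A slightly more defensive sketch, assuming netstat prints every address as address:port, takes the last :-separated element as the port, strips the port and any ::ffff: prefix from the remote address, and skips the header line and LISTEN sockets. The helper names are hypothetical and the sample rows (apart from the two shown earlier) are made up; on a live system you would pipe netstat -antp into clients_by_port instead.

```shell
#!/bin/sh
# clients_by_port PORT: count connections per remote IP to local port PORT,
# reading netstat -antp style output on stdin (hypothetical helper).
clients_by_port() {
    awk -v prt="$1" '
    $1 ~ /^tcp/ && $6 != "LISTEN" {         # skip headers and listening sockets
        n = split($4, laddr, ":")           # last element of the local address is the port
        if (laddr[n] != prt) next
        remote = $5
        sub(/:[0-9]+$/, "", remote)         # drop the remote port
        sub(/^::ffff:/, "", remote)         # IPv4-mapped IPv6 -> plain IPv4
        sum[remote]++
    }
    END { for (ip in sum) print ip "\t" sum[ip] }'
}

sample_netstat() {
    cat <<'EOF'
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      -
tcp        0      0 ::ffff:192.168.56.101:22    ::ffff:192.168.56.1:53063   ESTABLISHED 2655/2
tcp        0      0 ::ffff:192.168.56.101:22    ::ffff:192.168.56.1:53064   ESTABLISHED 2696/4
tcp        0      0 192.168.56.101:22           192.168.56.7:41000          ESTABLISHED -
tcp        0      0 192.168.56.101:80           192.168.56.1:53070          TIME_WAIT   -
EOF
}

sample_netstat | clients_by_port 22
```

For port 22 this reports two connections from 192.168.56.1 and one from 192.168.56.7, in no particular order, because for (ip in sum) does not guarantee an ordering; pipe the result through sort if the order matters.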
We have not considered the state of the connection: whether it is LISTEN, ESTABLISHED or TIME_WAIT. If the state matters too, we can emulate a multidimensional array by combining several fields into one array index.
For example, by changing the port to 80 and adding the 6th field (the state) to the array index when calculating the sum, we get a count per source IP and connection state for the connections to the web server:
netstat -antp | awk -v prt=80 '{ split($4,laddr,":"); split($5,faddr,":"); if(laddr[5]==prt) sum[faddr[4]":"$6]+=1 } END{ for(ip in sum) print ip"\t"sum[ip] }'
192.168.56.1:TIME_WAIT	1
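Joining the IP and the state with ":" works as long as neither part contains a colon, which IPv6 addresses do. A sketch using Awk's built-in comma subscript instead keeps the two parts separable: count[remote, $6] joins them with SUBSEP, and the END block splits the key back into columns. As before, the helper names and the sample rows are made up for illustration.

```shell
#!/bin/sh
# conn_states PORT: count connections to local port PORT per remote IP and state
# (hypothetical helper; reads netstat -antp style output on stdin).
conn_states() {
    awk -v prt="$1" '
    $1 ~ /^tcp/ && $6 != "LISTEN" {
        n = split($4, laddr, ":")
        if (laddr[n] != prt) next
        remote = $5
        sub(/:[0-9]+$/, "", remote)
        sub(/^::ffff:/, "", remote)
        count[remote, $6]++                 # key is remote SUBSEP state
    }
    END {
        for (k in count) {
            split(k, key, SUBSEP)           # back to (remote, state)
            print key[1] "\t" key[2] "\t" count[k]
        }
    }'
}

sample_netstat() {
    cat <<'EOF'
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
tcp        0      0 0.0.0.0:80                  0.0.0.0:*                   LISTEN      -
tcp        0      0 192.168.56.101:80           192.168.56.1:53070          TIME_WAIT   -
tcp        0      0 192.168.56.101:80           192.168.56.1:53071          ESTABLISHED -
tcp        0      0 ::ffff:192.168.56.101:80    ::ffff:192.168.56.1:53072   ESTABLISHED -
tcp        0      0 192.168.56.101:22           192.168.56.1:53063          ESTABLISHED -
EOF
}

sample_netstat | conn_states 80 | sort
```

For port 80 the sorted output has two tab-separated rows: 192.168.56.1 ESTABLISHED 2 and 192.168.56.1 TIME_WAIT 1.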
User-defined functions
Awk also allows you to create user-defined functions for use within a script. A function can be defined anywhere in the program outside the pattern-action blocks, even after the point where it is called, because Awk reads the entire program before executing it.
#!/bin/sh
# User-defined functions example
awk '{ nodecount() }   # Call the user-defined nodecount() for each row to update the node count
$0 ~ /<clusternode /, /<\/clusternode>/ { print }   # Print the rows between "<clusternode " and "</clusternode>"
function nodecount() {   # Function definition for nodecount
    if ($0 ~ /clusternode name/) { count += 1 }
}
END { print "Node Count=" count }   # Print the count in the END block, which is executed once
' "$1"

sh test.sh cluster.conf
  <clusternode name="node-01.example.com" nodeid="1">
    <fence>
    </fence>
  </clusternode>
  <clusternode name="node-02.example.com" nodeid="2">
    <fence>
    </fence>
  </clusternode>
  <clusternode name="node-03.example.com" nodeid="3">
    <fence>
    </fence>
  </clusternode>
Node Count=3
In the example above, we created a script, test.sh, and passed the cluster.conf file to it as input. The script calls the nodecount() function before defining it, and runs it for every row; the function increments a counter for each row containing “clusternode name”. The script also prints the block for each node, using the range pattern /<clusternode /,/<\/clusternode>/, and finally displays the count in the END block. This example also shows how Awk range patterns can pull blocks out of simple, line-oriented XML files such as cluster.conf, although they are no substitute for a real XML parser.
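nodecount() takes no parameters and works on the global variable count. Awk functions can also take arguments, and since Awk has no declaration for local variables, the usual idiom is to list extra parameters, separated by extra spaces, that callers never pass; these behave as locals. The sketch below uses that idiom in a hypothetical attr() function that pulls an attribute value out of a line, run against a made-up two-node cluster.conf fragment.

```shell
#!/bin/sh
# node_list: print the nodeid and name of each <clusternode> line, then a count.
node_list() {
    awk '
    # attr(line, name): value of name="..." in line, or "" if absent.
    # re and v are extra parameters used as local variables.
    function attr(line, name,    re, v) {
        re = name "=\"[^\"]*\""
        if (!match(line, re)) return ""
        v = substr(line, RSTART, RLENGTH)              # e.g. name="node-01.example.com"
        return substr(v, length(name) + 3, RLENGTH - length(name) - 3)
    }
    /<clusternode / { print attr($0, "nodeid"), attr($0, "name"); n++ }
    END { print "Node Count=" n + 0 }'
}

node_list <<'EOF'
<cluster name="demo" config_version="1">
  <clusternodes>
    <clusternode name="node-01.example.com" nodeid="1">
      <fence>
      </fence>
    </clusternode>
    <clusternode name="node-02.example.com" nodeid="2">
      <fence>
      </fence>
    </clusternode>
  </clusternodes>
</cluster>
EOF
```

This prints "1 node-01.example.com", "2 node-02.example.com" and "Node Count=2". The n + 0 makes the count print as 0 rather than an empty string when no nodes are found.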
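Range patterns: a range pattern of the form /start/,/end/ matches every line from one that matches start through the next line that matches end, inclusive, which makes it a very useful tool for extracting blocks of text from files. For example, to get the table definitions from a MySQL dump, the range pattern /CREATE TABLE `/,/;/ can be used, since each table definition ends at the first line containing a ; (IGNORECASE is a gawk variable that makes the match case-insensitive):
awk -v IGNORECASE=1 '/CREATE TABLE `/,/;/' dump_file_name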
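IGNORECASE only exists in GNU Awk. A portable sketch of the same range pattern lowercases the line with tolower() before matching, so both CREATE TABLE and create table are found. The function names and the dump fragment are made up for illustration. Note that the range ends at the first line containing a ;, so a column default or comment containing a semicolon would cut the definition short.

```shell
#!/bin/sh
# table_defs: print every CREATE TABLE ... ; block from a MySQL dump on stdin.
table_defs() {
    awk 'tolower($0) ~ /^create table `/, /;/'    # range: start line through first line with ";"
}

sample_dump() {
    cat <<'EOF'
DROP TABLE IF EXISTS `host`;
CREATE TABLE `host` (
  `Host` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`Host`)
) ENGINE=MyISAM;
INSERT INTO `host` VALUES ('localhost');
DROP TABLE IF EXISTS `user`;
create table `user` (
  `User` char(16) NOT NULL
);
EOF
}

sample_dump | table_defs
```

Both definitions (seven lines in total) are printed, while the DROP TABLE and INSERT lines are skipped.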
Similarly, if we know the text that marks the beginning (create table `<tablename>) and the end (drop table `<next table>) of a table's data in a MySQL dump file, we can use them as a range pattern to extract the data for that particular table, passing the table name to the script in the variable tname:
awk -v IGNORECASE=1 -v tname=host 'BEGIN{str="^CREATE TABLE `"} $0~str tname, /drop table/ { if($0!~/drop table/) print }' dump_file_name
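Because the start pattern is just ^CREATE TABLE ` followed by the name, tname=host would also match a table called hosts. A sketch that closes the name with its backtick avoids that; it also lowercases both the line and the name instead of relying on IGNORECASE, so it runs in any POSIX awk. It assumes the table name contains no regular-expression metacharacters, and the function names and sample dump are made up.

```shell
#!/bin/sh
# table_block NAME: print the CREATE TABLE statement and data of table NAME,
# up to (not including) the next DROP TABLE line. Reads the dump on stdin.
table_block() {
    awk -v tname="$1" '
    BEGIN { start = "^create table `" tolower(tname) "`" }   # closing backtick: host does not match hosts
    tolower($0) ~ start, tolower($0) ~ /^drop table/ {
        if (tolower($0) !~ /^drop table/) print
    }'
}

sample_dump() {
    cat <<'EOF'
DROP TABLE IF EXISTS `host`;
CREATE TABLE `host` (
  `Host` char(60) NOT NULL
);
INSERT INTO `host` VALUES ('localhost');
DROP TABLE IF EXISTS `hosts`;
CREATE TABLE `hosts` (
  `Name` char(60) NOT NULL
);
INSERT INTO `hosts` VALUES ('web01');
EOF
}

sample_dump | table_block host
```

This prints the four lines belonging to host and nothing from hosts; table_block hosts prints the other four lines, and the range simply runs to the end of the input when no further DROP TABLE follows.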
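However, when a block starts and ends with the same kind of marker (for example, one table's data lies between its CREATE TABLE statement and the next CREATE TABLE statement in a dump file), a range pattern does not work: the line that starts the range also matches the end pattern, so the range closes on that same line. In that case, the next statement together with a marker variable can be used to extract the data. The script below demonstrates this: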
#!/bin/sh
awk -v IGNORECASE=1 -v tname="$1" '   # Variable tname provides the table name
BEGIN { str = "^CREATE TABLE `"; defstart = 0 }   # Specify the pattern to match and set the marker variable to its default
$0 ~ str tname { defstart = !defstart; print; next }   # If the row matches the table name, toggle the marker, print the row and move to the next row without further processing
defstart == 1 { if ($0 ~ str || $0 ~ /drop table/) { exit } else { print } }   # While the marker is set, print rows until the next CREATE TABLE or DROP TABLE statement
' "$2"
The script takes two arguments: the first is the table name and the second is the dump file.
In the BEGIN block, we specify the pattern (a line beginning with CREATE TABLE `) and initialise the marker variable, defstart, to 0. When a row matches the pattern followed by the table name, we toggle the variable, print the row, and use the next statement to jump to the next row of the dump file without processing the remaining rules. Since the marker variable is now set, the last rule prints every following row until it reaches a row containing the next CREATE TABLE or a DROP TABLE statement, at which point exit stops processing. Thus, we get all the data contained within the markers.
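A portable variant of the same technique, sketched below, assumes a dump written without DROP TABLE statements, so one table's data really does sit between two CREATE TABLE lines. It sets the marker to 1 rather than toggling it, matches the name together with its closing backtick, and lowercases lines instead of using IGNORECASE. The helper names and the sample dump are invented for illustration.

```shell
#!/bin/sh
# table_section NAME: print table NAME from a dump on stdin, stopping at the next
# CREATE TABLE or DROP TABLE line.
table_section() {
    awk -v tname="$1" '
    BEGIN { start = "^create table `" tolower(tname) "`" }
    tolower($0) ~ start { inside = 1; print; next }            # found the table: print, skip the rules below
    inside && tolower($0) ~ /^(create|drop) table / { exit }    # next table begins: stop reading
    inside { print }'
}

sample_dump() {
    cat <<'EOF'
CREATE TABLE `host` (
  `Host` char(60) NOT NULL
);
INSERT INTO `host` VALUES ('localhost');
CREATE TABLE `user` (
  `User` char(16) NOT NULL
);
INSERT INTO `user` VALUES ('root');
EOF
}

sample_dump | table_section host
```

The next statement is what keeps the CREATE TABLE line for host from being tested against the stop rule; table_section host prints four lines and stops at CREATE TABLE `user`.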
In this article, we have seen how the functionality provided by Awk can quickly come to the aid of system administrators whenever they need to parse, summarise or extract data from text files. Awk is fairly easy to learn if you already know C, since much of its syntax is borrowed from C. For those who are interested in learning more about Awk, a very detailed user guide is available.