[Tool Development] Perl Crawler Script: Fetching Real-Time Information from the US National Vulnerability Database

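1. Introduction

The US National Vulnerability Database (NVD) collects a large amount of vulnerability information for operating systems and application software, and it publishes new vulnerabilities promptly as they appear.

Because the volume of information is huge, users have to search its website every time, which is tedious. It would be convenient to have a tool that analyzes the published vulnerability data automatically every day and, whenever it finds new vulnerabilities of interest, emails them to the company's system or security administrators.

The tool I wrote below does exactly that. The screenshot shows an email sent automatically by the tool:

Every day it fetches NVD data for the keywords the user has configured and compares the result with the data fetched the previous day. If there is new data today, it emails the user; otherwise it sends nothing.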
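
The daily comparison can be sketched as a set difference between two lists of CVE IDs. This is only a minimal illustration and not the script's own logic (the script in section 3 compares each ID against the largest ID recorded the previous day). The helper name `new_ids` and the sample IDs are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): return the IDs
# that appear in today's list but not in yesterday's, in sorted order.
sub new_ids {
    my ($yesterday, $today) = @_;
    my %seen = map { $_ => 1 } @$yesterday;      # yesterday's IDs as a lookup set
    my @new  = sort grep { !$seen{$_} } @$today; # keep only unseen IDs
    return @new;
}

# Sample data, invented for illustration only
my @yesterday = qw(CVE-2014-0001 CVE-2014-0002);
my @today     = qw(CVE-2014-0001 CVE-2014-0002 CVE-2014-0003);

my @fresh = new_ids(\@yesterday, \@today);
print "new: @fresh\n" if @fresh;   # report only when something is new
```

A set difference does not depend on how the IDs sort, so it would also catch a record that appears today with an ID smaller than yesterday's largest one.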

2. Screenshot

3. Source Code

#!/usr/bin/perl -w

# hahp@qq.com

use 5.10.0;
use strict;
use LWP::Simple;
use Net::SMTP;
use MIME::Base64;
use Encode qw/ decode encode /;

# Directory where each day's list of CVE IDs is stored
my $REC_DIR = '/home/hupeng/nvd';

# Keywords to search for on NVD
my @query_keywords = qw/ kernel tomcat apache spring /;

# Dates come from the external `date` command (the -d option requires GNU date)
my $TO_DAY = `date +%Y-%m-%d`;
my $LAST_DAY = `date +%Y-%m-%d -d '-1 days'`;
# The search URL is given the previous month's number (in January this yields 12)
my $THIS_MONTH = `date +%m -d '-1 months'`;
my $NEXT_MONTH = `date +%m`;
my $THIS_YEAR = `date +%Y`;
my $NEXT_YEAR = `date +%Y -d '+1 months'`;

chomp($TO_DAY);
chomp($LAST_DAY);
chomp($THIS_MONTH);
chomp($THIS_YEAR);
chomp($NEXT_MONTH);
chomp($NEXT_YEAR);

# Strip the leading zero from the month (e.g. "03" becomes "3")
$THIS_MONTH =~ s/^0+//g;

my $nvdfile_lastday = "$REC_DIR/nvd_$LAST_DAY.txt";
my $nvdfile_today = "$REC_DIR/nvd_$TO_DAY.txt";
my $nvd_url_pre = 'http://web.nvd.nist.gov/view/vuln/detail?vulnId=';
#my $sev_base = 'MEDIUM_HIGH';

# SMTP settings (replace the placeholders with real values)
my $theSmtpServer = 'XXXX';
my $theSmtpUser = 'XXXX';
my $theSmtpPasswd = 'XXXXX';
my $theSmtpSend = 'XXXXX';
my @theSmtpTo = ('hupeng@test2.com','hupeng@test.com');
my $theSmtpSubject = 'New NVD records '.$TO_DAY;
my $query_keywords_str = &arr2str0(@query_keywords);
my $theSmtpBody = '<p>New NVD records</p><br><p>Keywords: '.$query_keywords_str.'</p><br>';

# Split a newline-separated string into a sorted list of lines
sub str2arr {
    my ($str) = @_;
    $str =~ s/^\n|\n$//g;
    my @arr = split /\n/,$str;
    @arr = sort(@arr);
    #@arr = keys %{{ map { $_ => 1 } @arr }};
    return @arr;
}

# Join a list into a sorted string with one item per line
sub arr2str {
    my @arr = @_;
    my $str = '';
    @arr = sort(@arr);
    foreach(@arr){
        $str = $str.$_."\n";
    }
    return $str;
}

# Join a list into a sorted, comma-separated string
sub arr2str0 {
    my @arr = @_;
    my $str = '';
    @arr = sort(@arr);
    foreach(@arr){
        $str = $str.$_.', ';
    }
    # Drop the trailing ", "
    $str =~ s/,\ $//g;
    return $str;
}

# Query NVD once per keyword and collect the result lines that contain
# the vulnerability anchors, each prefixed with its keyword
sub getContent {
    my @keywords = @_;
    my @content = ();
    foreach my $query_keyword (@keywords){
        #my $url = "http://web.nvd.nist.gov/view/vuln/search-results?adv_search=true\&cves=on\&query=$query_keyword\&pub_date_start_month=$start_month\&pub_date_start_year=$start_year\&cvss_sev_base=$sev_base\&cve_id=";
        #my $url = "http://web.nvd.nist.gov/view/vuln/search-results?adv_search=true\&cves=on\&query=$query_keyword";
        my $url = "http://web.nvd.nist.gov/view/vuln/search-results?adv_search=true\&cves=on\&query=$query_keyword\&pub_date_start_month=$THIS_MONTH\&pub_date_start_year=$THIS_YEAR\&cve_id=";
        my $tmpStr = get($url);
        # get() returns undef when the request fails; skip this keyword
        next unless defined $tmpStr;
        my @tmpArr = &str2arr($tmpStr);
        foreach(@tmpArr){
            my $str = $_;
            chomp($str);
            $str =~ s/\s+//g;
            if( $str =~ m/BodyPlaceHolder_cplPageContent_plcZones_lt_zoneCenter_VulnerabilitySearchResults_VulnResultsRepeater_[\w]+(Anchor_.*$)/ ){
                push(@content,$query_keyword.$1."\n");
            }
        }
        # Remove duplicates, then sort
        @content = keys %{{ map { $_ => 1 } @content }};
        @content = sort(@content);
    }
    return @content;
}

# Read a file of CVE IDs; return the list and the largest ID (string comparison)
sub getNvd {
    my ($nvd_file) = @_;
    my $maxnvd = '';
    my @nvds = ();
    my %result = ('maxnvd'=>'','nvds'=>[]);
    # A file that cannot be opened (e.g. on the first run) yields an empty result
    if( open(my $fh, '<', $nvd_file) ){
        while(<$fh>){
            push(@nvds, $_);
        }
        close $fh;
        foreach(@nvds){
            if( $_ gt $maxnvd ){
                $maxnvd = $_;
            }
        }
    }
    $result{'maxnvd'} = $maxnvd;
    $result{'nvds'} = [@nvds];
    return %result;
}

# Extract the CVE ID from each content line and write one ID per line
sub putNvd {
    my ($content,$nvd_file) = @_;
    if ( open(my $fh, '>', $nvd_file) ){
        foreach (@$content){
            if ($_ =~ m/[\w-]+Anchor_[\d]+">([\w-]+)<\/a>/){
                print {$fh} $1."\n";
            }
        }
        close $fh;
    }
}

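# For every ID recorded today that sorts after yesterday's largest ID,
# build an HTML line with a link to the NVD detail page and the CVSS severity
sub getNewNvdRds {
    my ($maxNvd_lastday,$nvdsToday,$content) = @_;
    my @newNvds = ();
    foreach (@{$nvdsToday}){
        my $nvd = '';
        if( $_ gt $maxNvd_lastday){
            my $str = $_;
            chomp($str);
            foreach my $ln1 (@{$content}){
                # Find the anchor line for this ID (\Q...\E quotes regex metacharacters)
                if( $ln1 =~ m/^([\w-]+Anchor_[\d]+\">)\Q$str\E<\/a>$/ ){
                    my $nvdID = $1;
                    foreach my $ln2 (@{$content}){
                        # The line with the same anchor prefix carries the score and severity
                        if( $ln2 =~ m/^\Q$nvdID\E([\d.]+)<\/a>([\w]+)$/ ){
                            $nvd = '<a href="'.$nvd_url_pre.$str.'">'.$str.'</a>  CVSS Severity:  '.encode('UTF-8',$1).'  '.encode('UTF-8',$2).'<br>';
                        }
                    }
                }
            }
            push(@newNvds,$nvd);
        }
    }
    return @newNvds;
}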

# get max value of last day
my %tmpHsh = ();
%tmpHsh = &getNvd($nvdfile_lastday);
my $maxNvd_lastday = $tmpHsh{'maxnvd'};

# get content of today
# detailed information for the NVD records
my @content = &getContent(@query_keywords);

# put values of today
&putNvd([@content],$nvdfile_today);

# get max value of today
%tmpHsh = &getNvd($nvdfile_today);
my $maxNvd_today = $tmpHsh{'maxnvd'};

# get all values of today
my @nvdsToday = @{$tmpHsh{'nvds'}};
%tmpHsh = ();

# find new values
# detailed information for the new records, formatted for the email
my @newNvdRds = &getNewNvdRds($maxNvd_lastday,[@nvdsToday],[@content]);

# send email
my $count = @newNvdRds;
if( $count ){
    $theSmtpBody .= &arr2str(@newNvdRds);
    $theSmtpBody .= '<br><br>'.$TO_DAY.'<br><br>';
    # new() returns undef when the connection fails
    my $theSmtp = Net::SMTP->new($theSmtpServer,Timeout=>10)
        or die "Cannot connect to SMTP server $theSmtpServer\n";
    $theSmtp->auth($theSmtpUser,$theSmtpPasswd);
    $theSmtp->mail($theSmtpSend);
    $theSmtp->to(@theSmtpTo);
    $theSmtp->data();
    # Recipients in the To header are separated by commas
    $theSmtp->datasend("To: ".join(', ',@theSmtpTo)."\n");
    $theSmtp->datasend("Content-Type: text/html; charset=UTF-8\n");
    $theSmtp->datasend("Subject: =?UTF-8?B?".encode_base64($theSmtpSubject, '')."?=\n");
    # A blank line separates the headers from the body
    $theSmtp->datasend("\n");
    $theSmtp->datasend($theSmtpBody);
    $theSmtp->dataend();
    $theSmtp->quit;
}
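
The script gets its dates by shelling out to `date`. As an alternative sketch, the same strings can be computed in Perl with the core `POSIX` module. The helper name `date_strings` is hypothetical, and the 0-based month is only my reading of the script's `-1 months` trick; what the search form actually expects is an assumption to verify.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper (not part of the original script): derive the date
# strings from an epoch time without calling the external `date` command.
sub date_strings {
    my ($now) = @_;
    my @t = localtime($now);
    my @y = localtime($now - 24 * 60 * 60);  # 24 hours earlier; ignores DST edge cases
    return (
        today     => strftime('%Y-%m-%d', @t),
        yesterday => strftime('%Y-%m-%d', @y),
        month     => $t[4],         # 0-based month, as localtime returns it (assumption: matches the search form)
        year      => $t[5] + 1900,  # localtime counts years from 1900
    );
}

my %d = date_strings(time);
print "today=$d{today} yesterday=$d{yesterday}\n";
```

Computing the values in-process avoids the dependency on GNU `date` and removes the need for `chomp` and leading-zero stripping.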
