python2.7 urllib2 爬虫

# _*_ coding:utf-8 _*_

import urllib2
import cookielib
import random
import re
from bs4 import BeautifulSoup
import datetime

dax = datetime.datetime.now().strftime('%Y-%m-%d')
print(dax)

url = 'http://ww=singlemessage&isappinstalled=0'

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
request = urllib2.Request(url)
headers = [
'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 360SE)',
'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon 2.0)',
'Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11',
'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0',
'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50'
]

hds = random.choice(headers)
# print(hds)
request.add_header('User-Agent','%s' % hds)
#response = urllib2.urlopen("http://www.hn1m=singlemessage&isappinstalled=0")
response = urllib2.urlopen(request)
cont = response.read()
#print(cont)

soup = BeautifulSoup(cont,'html.parser',from_encoding='utf-8')
# print(soup)
# listyj = soup.find_all('dl')
# for listyjx in listyj:
# print(listyjx.name,listyjx.attrs,listyjx.gettext())
# # if dax in listyjx:
# # print(listyjx)

python2.7 urllib2 爬虫的更多相关文章

python2下经典爬虫（第一卷）
python2.7的爬虫个人认为比较经典在此我将会用书中的网站http://example.webscraping.com作为案例爬虫第一步:进行背景调研了解网站的结构资源在网站的robots.t ...
Python2和Python3 爬虫转换
由于Python3的不断完善,很多新入Python的小伙伴选择了Python3的阵营,很多人选择了爬虫这一热门话题,但是网络上大部分教程都是Python2 教程,Python3这一块做了些许的改动,对 ...
关于urllib、urllib2爬虫伪装的总结
站在网站管理的角度,如果在同一时间段,大家全部利用爬虫程序对自己的网站进行爬取操作,那么这网站服务器能不能承受这种负荷?肯定不能啊,如果严重超负荷则会时服务器宕机(死机)的,对于一些商业型的网站,宕机 ...
[Selenium2+python2.7][Scrap]爬虫和selenium方式下拉滚动条获取简书作者目录并且生成Markdown格式目录
预计阅读时间: 15分钟环境: win7 + Selenium2.53.6+python2.7 +Firefox 45.2 (具体配置参考 http://www.cnblogs.com/yoyok ...
python2与python3爬虫中get与post对比
python2中的urllib2改为python3中的urllib.request 四种方式对比: python2的get: # coding=utf-8 import urllib import u ...
python2.x urllib2和urllib的使用
1.最简单用法 urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT,...) import urllib2 import ur ...
Python2 基于urllib2 的HTTP请求类
一个利用urllib2模块编写的下载器,虽然有了requests模块,但是毕竟标准库 import urllib2,random class strong_down(): def __init__(s ...
python2中urllib2模块带cookies使用方法
#!/usr/bin/python # coding=utf-8 #############方式1######################### import urllib2 cookie = & ...
python3--网络爬虫--爬取图片
网上大多爬虫仍旧是python2的urllib2写的,不过,坚持用python3(3.5以上版本可以使用异步I/O) 相信有不少人爬虫第一次爬的是Mm图,网上很多爬虫的视频教程也是爬mm图,看了某人的 ...

随机推荐

position实现分层和遮罩层功能
很多网站,当点了一个按钮后,弹出一个窗口,底层变透明不可选,就是用到层的概念,至少三层第一层,底层原始层第二层,遮罩层,用到positon: fixed; top bottom left righ ...
linux 服务器之间配置免密登录
客户机:172.16.1.2 远程机:172.16.1.3 1.远程机 a.允许root用户通过22端口登录 vi /etc/ssh/sshd_config PORT 22 PermitRootLog ...
Laravel 输出最后一条sql
$queries = DB::getQueryLog(); $last_query = end($queries); print_r( $last_query);
QT中webkit去掉默认的右键菜单
在qt设计师中,选择webview,按下图选择那一行设置contextMenuPolicy属性:
C++11--智能指针shared_ptr，weak_ptr，unique_ptr <memory>
共享指针 shared_ptr /*********** Shared_ptr ***********/ // 为什么要使用智能指针,直接使用裸指针经常会出现以下情况 // 1. 当指针的生命长于所指 ...
WhereHows前后端配置文件
前端: # This is the main configuration file for the application. # ~~~~~ # Secret key # ~~~~~ # The se ...
Flashbuilder的bug FlashBuilder 1119: 访问可能未定义的属性 on (通过 static 类型
FlashBuilder 1119: 访问可能未定义的属性 on (通过 static 类型当此问题出现的时候无论刷新清理注释删除乃至重启电脑都无济于事. 解决方法:备份此类到另外一个地 ...
高CPU排查方法分享
1 软件性能较差,占用CPU较多,往往是由于某段代码逻辑算法不佳导致,那如何在数以千计的函数中找到问题函数呢?2 在使用!runaway命令比较不同时间各线程占用CPU时间,找到CPU时间增涨较多的线 ...
C与C++的部分区别
1.函数无形参情况 void test() { } int main() { test(,); ; } 在C语言中形参括号没有参数时代表接受任意多的参数,而在C++语言中代表void(无参数) 所以上 ...
廖雪峰Java3异常处理-1错误处理-4自定义异常
JDK已有的异常: RuntimeException * NullPointerException * IndexOutOfBoundsException * SecurityException * ...

python2.7 urllib2 爬虫

python2.7 urllib2 爬虫的更多相关文章

随机推荐

热门专题