python: scraping http://www.kuaidaili.com/ and saving the results as xls

The code is below:

Copy it and try it on Python 3 first ^_^

# -*- coding: utf-8 -*-

"""

Created on Mon Jun 12 13:27:59 2017

@author: admin

"""

import urllib.request

import os

import re

from bs4 import BeautifulSoup

import xlwt

os.chdir(r'C:\Users\admin\Desktop') # save the output file to the desktop

url='http://www.kuaidaili.com/'     # page URL

req=urllib.request.Request(url)     # build the request object (nothing is sent yet)

req.add_header('User-Agent','Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36')

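# add a User-Agent header so the request looks like it comes from a browser and draws less attention from the server

response=urllib.request.urlopen(req)    # send the request and open the page

html=response.read().decode('utf-8')    # decode the response bytes into a str

soup=BeautifulSoup(html,'lxml')   # BeautifulSoup is extremely handy; it is the core of this script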

ww = soup.find_all('tbody')

ww=str(ww)

#rr = re.findall(r'<td (.*)</td>',ww)

#print (ww)      # these commented-out lines are unused, but they show that ww cannot be

#print(rr)       # searched with re directly; it has to be converted with str() first

biaoti=re.findall(r'"(.*)"',ww)

biaoti=list(dict.fromkeys(biaoti))  # remove duplicate titles while keeping page order (a set would not preserve order, so no manual reordering is needed)

list_name=[]

result=[]

for guanjianzi in biaoti:

#    if rr[i].find(guanjianzi) != -1:

    list_name=re.findall(r'"%s">(.*)</td>'%re.escape(guanjianzi),ww)  # all cell values under this title

    list_name.insert(0,guanjianzi)      # put the title at the front of its column

    result.extend(list_name)

hh=[]

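for i in range(0,len(result),11): # split the flat list into chunks of 11 (1 title + 10 values; assumes every column has exactly 10 values)

    hh.append(result[i:i+11])     # giving a nested list with one inner list per column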

workbook=xlwt.Workbook()

worksheet=workbook.add_sheet('sheet1',cell_overwrite_ok = True)

for i in range(len(hh)):

    for e in range(len(hh[i])):

        worksheet.write(e,i,hh[i][e])   # row e, column i: each inner list becomes one spreadsheet column

workbook.save('123.xls')
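
The column-building step can be tried offline, without touching the site. The following is only a minimal sketch: it assumes the table cells look like `<td data-title="IP">...</td>`, which is what the regular expressions in the script imply but which the live page may no longer match. The sample HTML and the helper name `extract_columns` are made up for illustration.

```python
import re

# Hypothetical sample markup; the real page may be structured differently.
SAMPLE_HTML = """
<tbody>
<tr>
<td data-title="IP">10.0.0.1</td>
<td data-title="PORT">8080</td>
</tr>
<tr>
<td data-title="IP">10.0.0.2</td>
<td data-title="PORT">3128</td>
</tr>
</tbody>
"""


def extract_columns(html):
    """Return a list of columns; each column is a list starting with its title."""
    # Titles are the quoted attribute values. dict.fromkeys drops
    # duplicates while keeping the order in which they first appear.
    titles = list(dict.fromkeys(re.findall(r'"([^"]*)"', html)))
    columns = []
    for title in titles:
        # re.escape guards against titles that contain regex metacharacters.
        values = re.findall(r'"%s">(.*?)</td>' % re.escape(title), html)
        columns.append([title] + values)
    return columns


columns = extract_columns(SAMPLE_HTML)

# Transpose the columns into rows, mirroring how the script writes
# column i of the sheet from the i-th inner list.
rows = list(zip(*columns))

for row in rows:
    print(row)
```

Building one list per title directly avoids the fixed chunk size of 11, so the sketch keeps working when the table has more or fewer than ten rows.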
