Scraping property listings with PhantomJS

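Target page: https://sf.taobao.com/item_list.htm

The page is served over HTTPS, so start PhantomJS with an SSL switch:

driver = webdriver.PhantomJS(service_args=['--ssl-protocol=any'])

or

driver = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true'])

Image loading can be switched off as well, which usually makes pages load faster:

cur_driver = webdriver.PhantomJS(service_args=['--ssl-protocol=any', '--load-images=false'])

service_args=['--load-images=false']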
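The switches shown above are plain strings, so they can be assembled from a few flags before the driver is created. The following is only a minimal sketch: `build_service_args` and its parameter names are hypothetical helpers invented for illustration, not part of Selenium or PhantomJS. Only the three command-line switches themselves come from the examples above.

```python
def build_service_args(ssl_protocol="any", ignore_ssl_errors=False, load_images=True):
    """Assemble PhantomJS command-line switches as a list of strings.

    Hypothetical helper: the name and parameters are assumptions made
    for this sketch; only the switch strings appear in the article.
    """
    args = []
    if ssl_protocol:
        # e.g. --ssl-protocol=any
        args.append("--ssl-protocol=%s" % ssl_protocol)
    if ignore_ssl_errors:
        args.append("--ignore-ssl-errors=true")
    if not load_images:
        args.append("--load-images=false")
    return args


if __name__ == "__main__":
    print(build_service_args(load_images=False))
    # The list would then be handed to the driver, for example:
    # driver = webdriver.PhantomJS(service_args=build_service_args(load_images=False))
```

Keeping the switches in one place makes it easier to toggle image loading or SSL handling per run without editing the driver call itself.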

Scraping code

# coding=utf-8
# Note: webdriver.PhantomJS requires a Selenium release that still ships
# the PhantomJS driver (it was deprecated in Selenium 3).

from selenium import webdriver
from pyquery import PyQuery as pq


class taobao(object):

    def __init__(self):
        # The target site uses HTTPS, so allow any SSL/TLS protocol version
        self.driver = webdriver.PhantomJS(service_args=['--ssl-protocol=any'])
        self.driver.set_page_load_timeout(10)  # seconds
        self.driver.maximize_window()
        self.url = 'https://sf.taobao.com/item_list.htm'

    def scrapy_date(self):
        try:
            self.driver.get(self.url)
            # Take the DOM after JavaScript has run and parse it with pyquery
            selenium_html = self.driver.execute_script("return document.documentElement.outerHTML")
            doc = pq(selenium_html)
            # Select only the auctions that are currently in progress
            elements = doc('ul[class="sf-pai-item-list"]').find('li[class="pai-item pai-status-doing"]')
            for element in elements.items():
                priceinfo = element('div[class="info-section"]').find('p').text().strip()
                # The trailing space in "header-section " is part of the exact attribute match
                title = element('div[class="header-section "]').find('p').text().strip()
                print(title)
                print(priceinfo)
                print('-' * 80)
        except Exception as e:
            print(e)
        finally:
            # Always shut down the PhantomJS process
            self.driver.quit()


obj = taobao()
obj.scrapy_date()
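The selector logic can be tried without launching a browser by running it against a saved snippet. The sketch below is an approximation under stated assumptions: it swaps pyquery for the standard library's `html.parser` so that it has no dependencies, `ListingParser` and `extract_listings` are hypothetical names, and `SAMPLE_HTML` is invented markup that only imitates the class names used in the selectors above. The real page structure may differ.

```python
from html.parser import HTMLParser

# Invented sample markup (an assumption): it mimics the class names that
# the selectors in the article look for, not the real page.
SAMPLE_HTML = """
<ul class="sf-pai-item-list">
  <li class="pai-item pai-status-doing">
    <div class="header-section "><p>Sample flat A</p></div>
    <div class="info-section"><p>Current price 100</p><p>Appraisal 150</p></div>
  </li>
  <li class="pai-item pai-status-done">
    <div class="header-section "><p>Sample flat B</p></div>
    <div class="info-section"><p>Current price 200</p></div>
  </li>
</ul>
"""


class ListingParser(HTMLParser):
    """Collect (title, price_info) for <li> items whose class is ITEM_CLASS.

    Simplified: assumes the two sections are not nested inside other <div>s.
    """

    ITEM_CLASS = "pai-item pai-status-doing"

    def __init__(self):
        super().__init__()
        self.items = []       # finished (title, price_info) tuples
        self._item = None     # text buckets for the <li> being read
        self._section = None  # "title" or "price" while inside that <div>
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class") or ""
        if tag == "li" and cls == self.ITEM_CLASS:
            self._item = {"title": [], "price": []}
        elif tag == "div" and self._item is not None:
            # strip() tolerates the trailing space in "header-section "
            if cls.strip() == "header-section":
                self._section = "title"
            elif cls.strip() == "info-section":
                self._section = "price"
        elif tag == "p" and self._section:
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False
        elif tag == "div":
            self._section = None
        elif tag == "li" and self._item is not None:
            self.items.append((" ".join(self._item["title"]),
                               " ".join(self._item["price"])))
            self._item = None

    def handle_data(self, data):
        text = data.strip()
        if self._in_p and self._item is not None and text:
            self._item[self._section].append(text)


def extract_listings(html):
    """Return a list of (title, price_info) tuples for in-progress items."""
    parser = ListingParser()
    parser.feed(html)
    return parser.items


if __name__ == "__main__":
    for title, priceinfo in extract_listings(SAMPLE_HTML):
        print(title)
        print(priceinfo)
```

With the sample markup, only the first item is returned, because the second one carries the class `pai-status-done` rather than `pai-status-doing`.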

Scraping results
