Selenium Cheat Sheet (Python Edition)

1. Installation and configuration

pip install selenium

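Selenium is mostly used here to load dynamically generated page content for crawlers, so it is usually paired with a headless browser such as PhantomJS.

To set up PhantomJS on macOS, check which directories are on your PATH and symlink the binary into one of them:

echo $PATH

ln -s <path to phantomjs> <any directory on PATH>

chromeDriver is configured the same way. Download page:

https://sites.google.com/a/chromium.org/chrom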
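To confirm that the symlink took effect, the lookup the shell performs can be repeated from Python. This is a minimal sketch using only the standard library; the helper name `find_driver` and the binary names `phantomjs` and `chromedriver` are assumptions and may differ on your machine.

```python
import shutil


def find_driver(names=("phantomjs", "chromedriver")):
    """Map each driver binary name to its resolved path on PATH, or None."""
    return {name: shutil.which(name) for name in names}


# Report which driver binaries the current PATH can resolve.
for name, path in find_driver().items():
    print(name, "->", path or "not found on PATH")
```

A `None` entry means no directory on PATH contains that binary, so Selenium would most likely fail to start the corresponding browser.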

2. Code sample

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait  # available since 2.4.0
from selenium.webdriver.support import expected_conditions as EC  # available since 2.26.0

# Create a new instance of the Firefox driver
driver = webdriver.Firefox()

# Go to the Google home page
driver.get("http://www.google.com")

# The page is ajaxy, so the title is originally this:
print(driver.title)

# Find the element whose name attribute is q (the Google search box)
inputElement = driver.find_element(By.NAME, "q")

# Type in the search
inputElement.send_keys("cheese!")

# Submit the form (although Google now searches automatically without submitting)
inputElement.submit()

try:
    # We have to wait for the page to refresh; the last thing that seems to be updated is the title
    WebDriverWait(driver, 10).until(EC.title_contains("cheese!"))

    # You should see "cheese! - Google Search"
    print(driver.title)
finally:
    driver.quit()
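Conceptually, WebDriverWait(driver, 10).until(...) keeps calling the condition until it returns something truthy or the timeout elapses. The sketch below illustrates that polling idea with the standard library only. It is not Selenium's implementation: `wait_until`, the fake title sequence and the short timings are all assumptions made so the example runs without a browser.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it is truthy or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.2f s" % timeout)
        time.sleep(poll)


# Stand-in for a page whose title only becomes correct on the third check.
titles = iter(["Google", "Google", "cheese! - Google Search"])
current = {"title": ""}


def title_contains_cheese():
    current["title"] = next(titles)
    return "cheese!" in current["title"]


wait_until(title_contains_cheese, timeout=2.0, poll=0.01)
print(current["title"])
```

With a real driver the condition would read driver.title instead of a canned list, which is what EC.title_contains does for you.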

3. API quick reference

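3.1 Locating elements

Each lookup is shown in two forms: a find_element_by_* shortcut and the generic find_element(By.<strategy>, value) call. The shortcuts were removed in Selenium 4.3.0, so only the By form works on current releases.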

3.1.1 Find by id:

element = driver.find_element_by_id("coolestWidgetEvah")

or

from selenium.webdriver.common.by import By
element = driver.find_element(by=By.ID, value="coolestWidgetEvah")

3.1.2 Find by class name

cheeses = driver.find_elements_by_class_name("cheese")

or

from selenium.webdriver.common.by import By
cheeses = driver.find_elements(By.CLASS_NAME, "cheese")

3.1.3 Find by tag name

target_div = driver.find_element_by_tag_name("div")

or

from selenium.webdriver.common.by import By
target_div = driver.find_element(By.TAG_NAME, "div")

3.1.4 Find by name attribute

btn = driver.find_element_by_name("input_btn")

or

from selenium.webdriver.common.by import By
btn = driver.find_element(By.NAME, "input_btn")

3.1.5 Find by link text

# "下一页" is the exact visible text of the link ("next page")
next_page = driver.find_element_by_link_text("下一页")

or

from selenium.webdriver.common.by import By
next_page = driver.find_element(By.LINK_TEXT, "下一页")

3.1.6 Find by partial link text

# Matches any link whose visible text contains "下一页" ("next page"), e.g. "去下一页"
next_page = driver.find_element_by_partial_link_text("下一页")

or

from selenium.webdriver.common.by import By
next_page = driver.find_element(By.PARTIAL_LINK_TEXT, "下一页")

3.1.7 Find by CSS selector

cheese = driver.find_element_by_css_selector("#food span.dairy.aged")

or

from selenium.webdriver.common.by import By
cheese = driver.find_element(By.CSS_SELECTOR, "#food span.dairy.aged")

3.1.8 Find by XPath

inputs = driver.find_elements_by_xpath("//input")

or

from selenium.webdriver.common.by import By
inputs = driver.find_elements(By.XPATH, "//input")
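Every lookup so far has the same shape: a strategy plus a value. When a page's markup is unstable, one option is to try several (strategy, value) pairs in order and take the first hit. The following is a minimal sketch of that idea; `first_match` and `StubDriver` are hypothetical names, and the stub exists only so the example runs without a browser. The strings "id" and "css selector" are the values behind By.ID and By.CSS_SELECTOR; with a real driver you would pass the By constants themselves.

```python
def first_match(driver, locators):
    """Return the first element found by any (strategy, value) pair, else None."""
    for strategy, value in locators:
        # find_elements returns an empty list (rather than raising) when nothing matches.
        found = driver.find_elements(strategy, value)
        if found:
            return found[0]
    return None


class StubDriver:
    """Hypothetical stand-in for a WebDriver, used only to make the sketch runnable."""

    def __init__(self, page):
        self.page = page  # maps (strategy, value) -> list of fake elements

    def find_elements(self, by, value):
        return self.page.get((by, value), [])


stub = StubDriver({("css selector", "#food span.dairy.aged"): ["<cheddar>"]})
hit = first_match(stub, [
    ("id", "coolestWidgetEvah"),                # not present on the stub page
    ("css selector", "#food span.dairy.aged"),  # present, so this one wins
])
print(hit)
```

Returning None instead of raising keeps the caller in control of what "not found" means, which tends to be convenient in crawlers.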

3.1.9 Find via JavaScript

from selenium.webdriver.common.by import By
labels = driver.find_elements(By.TAG_NAME, "label")

# For each <label>, look up the input named by its "for" attribute
inputs = driver.execute_script(
    "var labels = arguments[0], inputs = []; for (var i=0; i < labels.length; i++){" +
    "inputs.push(document.getElementById(labels[i].getAttribute('for'))); } return inputs;", labels)

3.2 Getting an element's text

from selenium.webdriver.common.by import By
element = driver.find_element(By.ID, "element_id")

# The visible text of the element
element.text

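3.3 Changing the user agent

# Selenium 4 style: set the preference on FirefoxOptions.
# (Older releases passed a webdriver.FirefoxProfile() to webdriver.Firefox() instead.)
options = webdriver.FirefoxOptions()
options.set_preference("general.useragent.override", "some UA string")
driver = webdriver.Firefox(options=options)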

3.4 Cookies

# Go to the correct domain
driver.get("http://www.example.com")

# Now set the cookie. This one applies to the entire domain;
# the cookie name is 'key' and its value is 'value'
driver.add_cookie({'name': 'key', 'value': 'value', 'path': '/'})

# Additional keys that can be passed in are:
# 'domain' -> String,
# 'secure' -> Boolean,
# 'expiry' -> Seconds since the Unix epoch at which the cookie should expire.

# Now print all the cookies available for the current URL
for cookie in driver.get_cookies():
    print("%s -> %s" % (cookie['name'], cookie['value']))

# You can delete cookies in two ways
# By name
driver.delete_cookie("CookieName")

# Or all of them
driver.delete_all_cookies()
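The `expiry` key is the easiest one to get wrong. The W3C WebDriver specification defines it as integer seconds since the Unix epoch, and the sketch below assumes that unit. `build_cookie` is a hypothetical helper, not part of Selenium; it only builds the dictionary that would be handed to driver.add_cookie().

```python
import time


def build_cookie(name, value, ttl_seconds, now=None):
    """Build an add_cookie()-style dict that expires ttl_seconds from now."""
    now = time.time() if now is None else now
    return {
        "name": name,
        "value": value,
        "path": "/",
        # Integer seconds since the Unix epoch (assumed unit, see the lead-in).
        "expiry": int(now) + ttl_seconds,
    }


# A fixed "now" keeps the example reproducible.
cookie = build_cookie("key", "value", ttl_seconds=3600, now=1700000000)
print(cookie["expiry"])  # 1700003600
```

Passing `now` explicitly is only there to make the result predictable; in a real script you would leave it out and call driver.add_cookie(build_cookie(...)).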

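Finally, here is a sample of my own. It finds the search box, types in a search keyword, clicks the search button, then opens every search result and prints the page source.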

# coding=utf-8
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

# Create a new instance of the Chrome driver
driver = webdriver.Chrome()

# Go to the home page
driver.get("http://www.zjcredit.gov.cn")

# Remember the handle of the current (main) window
nowhandle = driver.current_window_handle
print(driver.title)

# Find the element whose name attribute is qymc (the search box)
inputElement = driver.find_element(By.NAME, "qymc")
print(inputElement)

# Type in the search keyword
inputElement.send_keys(u"同花顺")

# Unlike the Google example, this search box is not a standard form and cannot be
# submitted with inputElement.submit(), so click the search button instead
driver.find_element(By.NAME, "imageField").click()

try:
    # Scroll to the bottom first; otherwise the last link is overlapped by
    # another, unrelated link
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Find every result link and click it
    for item in driver.find_elements(By.XPATH, '//*[@id="pagetest2"]/div/table/tbody/tr/td/a'):
        item.click()
        time.sleep(10)

    # Get the handles of all open windows
    allhandles = driver.window_handles

    # Visit each newly opened window
    for handle in allhandles:
        if handle != nowhandle:
            # These steps run inside the popup window, which shows we really switched to it
            driver.switch_to.window(handle)
            print(driver.page_source)

            # Return to the main window
            driver.switch_to.window(nowhandle)
finally:
    driver.quit()
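The window-switching part of the sample boils down to one question: which handles were not there before the clicks? That step can be isolated into a small pure function, sketched below. `new_handles` is a hypothetical helper and the handle strings are made up; real values come from driver.current_window_handle and driver.window_handles.

```python
def new_handles(all_handles, known_handles):
    """Return the handles in all_handles that are not in known_handles, in order."""
    known = set(known_handles)
    return [handle for handle in all_handles if handle not in known]


# Made-up handle strings standing in for real window handles.
before = ["window-main"]
after = ["window-main", "window-a", "window-b"]
print(new_handles(after, before))  # ['window-a', 'window-b']
```

Keeping this logic separate from the driver makes it easy to check without opening a browser.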

Further reading, a well-written article:

http://www.cnblogs.com/tobecrazy/p/4570494.html
