Notes: selenium + chrome headless

1. selenium+chrome headless

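selenium has dropped support for phantomjs, so use a different headless browser.

chrome also has a headless mode. Find the chromedriver version that matches your installed chrome and set up the test environment with it.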

1.1. Basic usage

Code first. Below is the usual way to call it.

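import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

url = 'https://www.guazi.com/bj/buy/'
urls = ['https://www.taobao.com/', 'https://www.tmall.com/', 'https://www.csdn.net/']

time1 = time.time()
browser = None
try:
    cookie_t = {}
    chrome_option = Options()
    chrome_option.add_argument('--headless')
    # chrome_option.add_argument('--disable-gpu')
    # Current selenium releases take options=; chrome_options= is the old, deprecated keyword
    browser = webdriver.Chrome(options=chrome_option)
    browser.get(url)
    # get_cookie() returns None when the cookie is missing
    cookie = browser.get_cookie('antipas')
    if cookie:
        cookie_t['antipas'] = cookie['value']
    print(cookie_t)
    for page_url in urls:
        browser.get(page_url)
        time.sleep(3)
        # Append each page's HTML to the same file
        with open('xxx.txt', 'a+', encoding='utf-8') as fi:
            fi.write(browser.page_source)
except Exception as e:
    print('error', e)
finally:
    # quit() closes every window and ends the session, so no separate close() is needed
    if browser is not None:
        browser.quit()
time2 = time.time()
print(time2 - time1)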

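One thing to watch for in crawler code: when you need to trigger an event, it is better not to call the corresponding method (click, for example) directly. Trigger it through an injected js script instead. Crawler code runs very fast and the front-end element structure often has not caught up yet, which produces "element not visible" or "element does not exist" errors.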
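A minimal sketch of the js-based click described above. The helper name `js_click` is made up for illustration, and `driver` is assumed to be a selenium WebDriver such as the `browser` object from the example. It hands an already-located element to `execute_script`, so the click runs inside the page.

```python
def js_click(driver, element):
    """Click an element by running JavaScript inside the page.

    driver  -- a selenium WebDriver (assumption), e.g. `browser` above
    element -- a WebElement already located via driver.find_element(...)
    """
    # Inside the script, arguments[0] is the first extra value passed in.
    return driver.execute_script("arguments[0].click();", element)
```

A call could look like `js_click(browser, browser.find_element(By.ID, 'submit'))`, where the id `submit` is a placeholder and `By` comes from `selenium.webdriver.common.by`. The element still has to exist in the DOM, so this is usually paired with a wait.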

Other common settings:

# Set a proxy
# Important: no spaces around "=". This is wrong: --proxy-server = http://202.20.16.82:10152
chrome_option.add_argument("--proxy-server=http://202.20.16.82:10152")
browser = webdriver.Chrome(options=chrome_option)
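Since a stray space around `=` makes the proxy switch wrong, one option is to build the string in a small helper. This is only a sketch; the function name `proxy_argument` is hypothetical and not part of selenium.

```python
def proxy_argument(proxy_url):
    """Build a --proxy-server switch with no spaces around '='."""
    proxy_url = proxy_url.strip()
    # Reject empty values and values with whitespace inside them.
    if not proxy_url or any(ch.isspace() for ch in proxy_url):
        raise ValueError("proxy URL must be non-empty and contain no spaces")
    return "--proxy-server=" + proxy_url
```

The result can be passed straight to `add_argument`, for example `chrome_option.add_argument(proxy_argument("http://202.20.16.82:10152"))`.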

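1.2. More settings and operations

1.2.1. Browser window operations

Some actions in the browser bring up the system's native confirmation box. It is not part of the page, so locating elements will not get you to the step you need. In that case you have to operate on the browser window itself.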

1. The popup is a Confirm box

To confirm:

Alert al = driver.switchTo().alert();
al.accept();

To cancel:

Alert al = driver.switchTo().alert();
al.dismiss();

2. The popup is an Alert box

Alert al = driver.switchTo().alert();
al.accept();

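3. Maximize the browser window

driver.manage().window().maximize();

4. Close the browser window

// Use one or the other, not both in sequence
driver.quit();
driver.close();

The difference between quit and close: quit closes every window of the browser and ends the driver session, while close only closes the current tab or window.

5. Refresh / forward / back

driver.navigate().refresh();
driver.navigate().forward();
driver.navigate().back();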
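The Java calls in this section have direct counterparts in selenium's Python binding, which is what the example in 1.1 uses. A rough sketch follows; the function names `answer_dialog` and `browse_then_quit` are made up for illustration, and `driver` is assumed to be a selenium WebDriver.

```python
def answer_dialog(driver, accept=True):
    """Accept or dismiss the native alert/confirm box that is open.

    selenium raises NoAlertPresentException if no dialog is showing.
    Returns the dialog's message text.
    """
    alert = driver.switch_to.alert
    text = alert.text
    if accept:
        alert.accept()     # "OK" on both Alert and Confirm boxes
    else:
        alert.dismiss()    # "Cancel" on a Confirm box
    return text


def browse_then_quit(driver, first_url, second_url):
    """Walk through the window and navigation calls, then end the session."""
    driver.maximize_window()
    driver.get(first_url)
    driver.get(second_url)
    driver.back()        # back to first_url
    driver.forward()     # forward to second_url again
    driver.refresh()
    driver.quit()        # closes all windows; driver.close() closes only the current one
```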

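1.2.2. Ways to wait

When using selenium, you usually have to wait for the next element to appear while the page loads before acting on it, so some form of waiting is needed. There are 3 kinds of wait in selenium: webDriverWait(), implicitly_wait(), and sleep().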

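1) sleep(): forced wait with a fixed sleep time. It always waits the full time, whatever happens.

// Java's Thread.sleep takes milliseconds and throws InterruptedException;
// Python's time.sleep(5) takes seconds instead
Thread.sleep(5000);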

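2) implicitly_wait(): implicit wait. It waits for an element to be found or a command to finish, and throws an exception once the configured time is exceeded.

// Set the maximum time the script waits when looking up an element
WebDriver driver = new ChromeDriver();
// Selenium 3 signature; Selenium 4 takes a Duration: implicitlyWait(Duration.ofSeconds(15))
driver.manage().timeouts().implicitlyWait(15, TimeUnit.SECONDS);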

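3) webDriverWait(): explicit wait. You name the element to wait for, and if it is not found within the given time an Exception is thrown.

// Wait at most 10 s (Selenium 4 takes Duration.ofSeconds(10) in place of the bare 10)
WebDriverWait wait = new WebDriverWait(driver, 10);
wait.until(ExpectedConditions.presenceOfElementLocated(By.xpath("//div[@id='appContentContainer']/div/div/div[1]/div[2]/div/div/button")));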
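To show what sets an explicit wait apart from a fixed sleep, here is a rough polling sketch in plain Python. `wait_until` is a hypothetical helper, not selenium's own implementation. It returns as soon as the condition holds and only gives up once the timeout has passed.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result          # done early, no need to sit out the full timeout
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)           # short pause between checks
```

In real scripts the Python binding already provides this as `WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, xpath)))`, with `EC` imported from `selenium.webdriver.support.expected_conditions`.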