Capturing a Website's CAPTCHA Image with Selenium WebDriver

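When working on crawler projects, you sometimes run into CAPTCHAs. On some sites the CAPTCHA is generated dynamically, so the same link can produce a different CAPTCHA each time it is visited. My first idea was to open the CAPTCHA link, save the image locally with a GET request from Java code, run it through a CAPTCHA-solving tool, and type the result into the input box automatically. The catch is that every request to that address produces a different image, so the downloaded image never matches the one shown in the browser, and this approach does not work. The only option is to capture the image that is actually displayed on the page with Selenium WebDriver, save it locally, solve it with the CAPTCHA-solving tool, and fill in the answer automatically. That takes care of the CAPTCHA problem nicely.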
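To make the failure of the first approach concrete, here is a small sketch that simulates it. It does not talk to any real site: a local `HttpServer` (the hypothetical `/captcha` endpoint below) stands in for a dynamic CAPTCHA URL and returns a different body on every request. The class and method names are made up for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class DynamicCaptchaDemo {

    // Starts a local stand-in for a CAPTCHA endpoint (an assumption for this demo):
    // every request receives a different response body.
    public static HttpServer startFakeCaptchaServer() throws Exception {
        AtomicInteger counter = new AtomicInteger();
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/captcha", exchange -> {
            byte[] body = ("captcha-" + counter.incrementAndGet())
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Downloads the full response body of a GET request.
    public static byte[] fetch(String url) throws Exception {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return in.readAllBytes();
        }
    }

    // Returns true only if two GETs of the same URL return identical bytes.
    public static boolean sameOnRefetch(String url) throws Exception {
        return Arrays.equals(fetch(url), fetch(url));
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startFakeCaptchaServer();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/captcha";
            System.out.println("same image on refetch? " + sameOnRefetch(url));
        } finally {
            server.stop(0);
        }
    }
}
```

Because the second request gets a new body, whatever a separate GET downloads is not what the browser is showing, which is why the image has to be taken from the rendered page instead.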

package com.entrym.main;

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import javax.imageio.ImageIO;

import org.apache.commons.io.FileUtils;
import org.openqa.selenium.By;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.Point;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

import com.entrym.crawler.util.verifyCode.Captcha;
import com.entrym.crawler.util.verifyCode.DamaUtil;
import com.entrym.util.ConfigUtil;

public class WebTest {

	// Project settings (GET_TITLE and BASEURL are not used in this example)
	private static final String GET_TITLE = "/titles/getxiaoshuo";
	private static final String PATH = new File("config/config.properties").getAbsolutePath();
	// chromedriver binaries for Windows and Linux
	private static final String CHROME_HOME = new File("config/chromedriver.exe").getAbsolutePath();
	private static final String CHROME_HOME_LINUX = new File("config/chromedriver").getAbsolutePath();
	private static final String BASEURL = ConfigUtil.reads(PATH, "baseurl");

	public static void main(String[] args) throws IOException {
		System.out.println(PATH);

		// Point Selenium at the chromedriver binary that matches the current OS
		String osname = System.getProperty("os.name").toLowerCase();
		if (osname.contains("linux")) {
			System.setProperty("webdriver.chrome.driver", CHROME_HOME_LINUX);
		} else {
			System.setProperty("webdriver.chrome.driver", CHROME_HOME);
		}
		WebDriver driver = new ChromeDriver();

		// Open the anti-spider page that shows the CAPTCHA
		driver.get("http://weixin.sogou.com/antispider/?from=%2fweixin%3Ftype%3d2%26query%3dz+%26ie%3dutf8%26s_from%3dinput%26_sug_%3dy%26_sug_type_%3d");
		WebElement ele = driver.findElement(By.id("seccodeImage"));

		// Get entire page screenshot
		File screenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
		BufferedImage fullImg = ImageIO.read(screenshot);

		// Get the location of element on the page
		Point point = ele.getLocation();

		// Get width and height of the element
		int eleWidth = ele.getSize().getWidth();
		int eleHeight = ele.getSize().getHeight();

		// Crop the entire page screenshot to get only element screenshot
		// (this assumes the element's page coordinates map 1:1 to screenshot pixels)
		BufferedImage eleScreenshot = fullImg.getSubimage(point.getX(), point.getY(),
		    eleWidth, eleHeight);
		ImageIO.write(eleScreenshot, "png", screenshot);

		// Copy the element screenshot to disk
		File screenshotLocation = new File("D:/captcha/test.png");
		FileUtils.copyFile(screenshot, screenshotLocation);

		WebElement classelement = driver.findElement(By.className("p2"));
		String errorText = classelement.getText();
		System.out.println("Page text: " + errorText);

		// This is the notice shown on the anti-spider page ("your visits are too frequent...").
		// It stays in Chinese because it is matched against the page text.
		if (errorText.contains("用户您好，您的访问过于频繁，为确认本次访问为正常用户行为")) {
			String code = "";           // the recognized CAPTCHA text
			Captcha captcha = new Captcha();
			// Captcha and DamaUtil are this project's own helpers; the path given here
			// must resolve to the image saved above (D:/captcha/test.png)
			captcha.setFilePath("test.png");
			code = DamaUtil.getCaptchaResult(captcha);
			System.out.println("CAPTCHA returned by the solving service: " + code);

			// Type the recognized code into the CAPTCHA input box
			WebElement codeInput = driver.findElement(By.id("seccodeInput"));
			codeInput.sendKeys(code);
			try {
				Thread.sleep(1000);
			} catch (InterruptedException e) {
				// Restore the interrupt flag instead of swallowing the interruption
				Thread.currentThread().interrupt();
			}

			// Submit the form that contains the input
			codeInput.submit();
			System.out.println("Success");
		}
	}
}

That is the complete code. The key part came from Stack Overflow; I have to say, Google really is powerful.


driver.get("http://www.google.com");
WebElement ele = driver.findElement(By.id("hplogo"));

// Get entire page screenshot
File screenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
BufferedImage fullImg = ImageIO.read(screenshot);

// Get the location of element on the page
Point point = ele.getLocation();

// Get width and height of the element
int eleWidth = ele.getSize().getWidth();
int eleHeight = ele.getSize().getHeight();

// Crop the entire page screenshot to get only element screenshot
BufferedImage eleScreenshot = fullImg.getSubimage(point.getX(), point.getY(),
    eleWidth, eleHeight);
ImageIO.write(eleScreenshot, "png", screenshot);

// Copy the element screenshot to disk
File screenshotLocation = new File("C:\\images\\GoogleLogo_screenshot.png");
FileUtils.copyFile(screenshot, screenshotLocation);

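The snippet above is the key capture code. The original answer is on Stack Overflow: http://stackoverflow.com/questions/13832322/how-to-capture-the-screenshot-of-a-specific-element-rather-than-entire-page-usin
Anyone who is interested can dig into it further.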
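One thing to watch with the cropping step: `getSubimage` throws a `RasterFormatException` when the requested rectangle extends past the screenshot, and on high-DPI displays the screenshot may have more pixels than the page coordinates suggest. The sketch below is not from the Stack Overflow answer. It is a hypothetical helper (`ElementCropper.crop`, with a `scale` parameter that I am assuming you would set to the screenshot-to-page pixel ratio) that scales the element rectangle and clips it to the image bounds before cropping. It uses only the standard library, so it can be tried without a browser.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

public class ElementCropper {

    /**
     * Crops an element's rectangle out of a full-page screenshot.
     * x, y, width and height are the values from getLocation()/getSize();
     * scale is the assumed ratio of screenshot pixels to page pixels (1.0 if they match).
     */
    public static BufferedImage crop(BufferedImage fullImg, int x, int y,
                                     int width, int height, double scale) {
        // Convert the element rectangle into screenshot pixels
        Rectangle wanted = new Rectangle(
                (int) Math.round(x * scale), (int) Math.round(y * scale),
                (int) Math.round(width * scale), (int) Math.round(height * scale));

        // Clip to the screenshot so getSubimage never sees an out-of-range rectangle
        Rectangle bounds = new Rectangle(0, 0, fullImg.getWidth(), fullImg.getHeight());
        Rectangle clipped = wanted.intersection(bounds);
        if (clipped.isEmpty()) {
            throw new IllegalArgumentException("Element lies outside the screenshot: " + wanted);
        }
        return fullImg.getSubimage(clipped.x, clipped.y, clipped.width, clipped.height);
    }

    public static void main(String[] args) {
        // A blank 800x600 image stands in for the page screenshot
        BufferedImage page = new BufferedImage(800, 600, BufferedImage.TYPE_INT_RGB);
        // The element hangs over the bottom-right corner, so the crop is clipped
        BufferedImage captcha = crop(page, 700, 550, 200, 100, 1.0);
        System.out.println(captcha.getWidth() + "x" + captcha.getHeight()); // prints 100x50
    }
}
```

In the WebDriver code this would be called as `ElementCropper.crop(fullImg, point.getX(), point.getY(), eleWidth, eleHeight, 1.0)`. A clipped crop still means part of the CAPTCHA is missing, so in practice it is worth scrolling the element into view before taking the screenshot.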
