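I tried several examples and this was the only one that worked reliably. It uses HttpClient 4 and also needs the commons-lang and jsoup jars (the code below imports slf4j for logging as well).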

http://jsoup.org/

http://www.oschina.net/code/snippet_128625_12592?p=2

————————————————————————————————————————————————————————————

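As the title says:
Parsing a page with jsoup alone is very convenient, but doing a login with jsoup is awkward; at least I don't know how to do it.
HttpClient handles login easily, so simulating the login with HttpClient to fetch the HTML and then parsing it with jsoup makes a very good combination.
Swap in your own 163 mailbox account and give it a try.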

HttpClientHelper wrapper

import java.io.IOException;
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

import org.apache.commons.lang.StringUtils;
import org.apache.http.Header;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.HttpClient;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.conn.ClientConnectionManager;
import org.apache.http.conn.scheme.Scheme;
import org.apache.http.conn.scheme.SchemeRegistry;
import org.apache.http.conn.ssl.SSLSocketFactory;
import org.apache.http.cookie.Cookie;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicHeader;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.protocol.BasicHttpContext;
import org.apache.http.protocol.HttpContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
/**
 * HttpClient wrapper.
 *
 * @author bangis.wangdf
 */
public class HttpClientHelper {

    private static Logger    LOG              = LoggerFactory.getLogger(HttpClientHelper.class);
    private HttpClient       httpclient       = new DefaultHttpClient();
    private HttpContext      localContext     = new BasicHttpContext();
    // Cookie store; keeps the session cookies issued during login
    private BasicCookieStore basicCookieStore = new BasicCookieStore();
    // Socket (read) timeout, in seconds
    private int              TIME_OUT         = 3;

    public HttpClientHelper() {
        instance();
    }

    /**
     * Sets the socket timeout and enables cookie storage.
     */
    private void instance() {
        httpclient.getParams().setIntParameter("http.socket.timeout", TIME_OUT * 1000);
        localContext.setAttribute("http.cookie-store", basicCookieStore); // cookie store
    }

    /**
     * @param ssl true to support https URLs; false behaves like the default constructor
     */
    public HttpClientHelper(boolean ssl) {
        instance();
        if (ssl) {
            try {
                // Trust-all manager: it skips certificate validation, so use it for testing only
                X509TrustManager tm = new X509TrustManager() {
                    public void checkClientTrusted(X509Certificate[] xcs, String string) throws CertificateException {
                    }

                    public void checkServerTrusted(X509Certificate[] xcs, String string) throws CertificateException {
                    }

                    public X509Certificate[] getAcceptedIssuers() {
                        return new X509Certificate[0]; // the interface expects a non-null array
                    }
                };
                SSLContext ctx = SSLContext.getInstance("TLS");
                ctx.init(null, new TrustManager[] { tm }, null);
                SSLSocketFactory ssf = new SSLSocketFactory(ctx);
                ClientConnectionManager ccm = httpclient.getConnectionManager();
                SchemeRegistry sr = ccm.getSchemeRegistry();
                sr.register(new Scheme("https", ssf, 443));
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
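    }

    /**
     * @param url     URL to request
     * @param headers headers to send; when none are given a default User-Agent is used
     * @return the wrapped response, or an empty HttpResult if the request failed
     */
    public HttpResult get(String url, Header... headers) {
        HttpResponse response;
        HttpGet httpget = new HttpGet(url);
        // With varargs, get(url) passes an empty array rather than null, so check the length too
        if (headers != null && headers.length > 0) {
            for (Header h : headers) {
                httpget.addHeader(h);
            }
        } else { // no headers given: fall back to a default User-Agent
            Header header = new BasicHeader(
                                            "User-Agent",
                                            "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1;  .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2)");
            httpget.addHeader(header);
        }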
        HttpResult httpResult = HttpResult.empty();
        try {
            response = httpclient.execute(httpget, localContext);
            httpResult = new HttpResult(localContext, response);
        } catch (IOException e) {
            LOG.error(" get ", e);
            httpget.abort();
        }
        return httpResult;
    }

    public HttpResult post(String url, Map<String, String> data, Header... headers) {
        HttpResponse response;
        HttpPost httppost = new HttpPost(url);
        String contentType = null;
        if (headers != null) {
            for (Header h : headers) {
                if (!(h.getName().startsWith("$x-param"))) {
                    httppost.addHeader(h);
                }
                if ("Content-Type".equalsIgnoreCase(h.getName())) {
                    contentType = h.getValue();
                }
            }
        }
        if (contentType != null) {
            httppost.setHeader("Content-Type", contentType);
        } else if (data != null) {
            httppost.setHeader("Content-Type", "application/x-www-form-urlencoded");
        }

        List<NameValuePair> formParams = new ArrayList<NameValuePair>();
        if (data != null) { // a null map means an empty form instead of a NullPointerException
            for (Map.Entry<String, String> entry : data.entrySet()) {
                formParams.add(new BasicNameValuePair(entry.getKey(), entry.getValue()));
            }
        }
        HttpResult httpResult = HttpResult.empty();
        try {
            UrlEncodedFormEntity entity = new UrlEncodedFormEntity(formParams, "UTF-8");
            httppost.setEntity(entity);
            response = httpclient.execute(httppost, localContext);
            httpResult = new HttpResult(localContext, response);
        } catch (IOException e) {
            LOG.error(" post ", e);
            httppost.abort();
        }
        return httpResult;
    }

    public String getCookie(String name, String... domain) {
        String dm = "";
        if (domain != null && domain.length >= 1) {
            dm = domain[0];
        }
        for (Cookie c : basicCookieStore.getCookies()) {
            if (StringUtils.equals(name, c.getName()) && StringUtils.equals(dm, c.getDomain())) {
                return c.getValue();
            }
        }
        return null;
    }

    public void pringCookieAll() {
        for (Cookie c : basicCookieStore.getCookies()) {
            System.out.println(c);
        }
    }
}
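
The post method above hands the map to UrlEncodedFormEntity, which sends it as an application/x-www-form-urlencoded body. To make that wire format concrete, here is a minimal JDK-only sketch. FormBodySketch and its encode method are illustrative names that are not part of the original code, and HttpClient's own encoder may differ in small details such as which characters it leaves unescaped.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only
class FormBodySketch {

    // Percent-encodes one key or value as UTF-8
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    // Joins key=value pairs with '&', in the map's iteration order
    static String encode(Map<String, String> data) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : data.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> data = new LinkedHashMap<String, String>();
        data.put("savelogin", "0");
        data.put("url2", "http://mail.163.com/errorpage/err_163.htm");
        System.out.println(encode(data));
    }
}
```

A LinkedHashMap is used here only so the printed order is predictable; the HashMap in Mail163Test works the same way on the wire, just without a guaranteed field order.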

A further wrapper around the result returned by HttpClient

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

import org.apache.commons.lang.StringUtils;
import org.apache.http.Header;
import org.apache.http.HttpHost;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpUriRequest;
import org.apache.http.protocol.HttpContext;
import org.apache.http.util.EntityUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * A further wrapper around the result returned by HttpClient.
 *
 * @author bangis.wangdf
 */
public class HttpResult {
    
    private static Logger LOG = LoggerFactory.getLogger(HttpResult.class);
    
    private static Pattern headerCharsetPattern = Pattern.compile(
            "charset=((gb2312)|(gbk)|(utf-8))", 2);
    private static Pattern pattern = Pattern
            .compile(
                    "<meta[^>]*content=(['\"])?[^>]*charset=((gb2312)|(gbk)|(utf-8))\\1[^>]*>",
                    2);
    private String headerCharset;
    private String headerContentType;
    private String headerContentEncoding;
    private List<Header> headers;
    private String metaCharset;
    private byte[] response;
    private String responseUrl;
    private int statuCode = -1;
    private static final int BUFFER_SIZE = 4096;

    public static HttpResult empty() {
        return new HttpResult();
    }

    public String getHeaderCharset() {
        return this.headerCharset;
    }

    public String getHeaderContentType() {
        return this.headerContentType;
    }

    public final List<Header> getHeaders() {
        return this.headers;
    }

    public String getHtml() {
        try {
            return getText();
        } catch (UnsupportedEncodingException e) {
            LOG.error("[AGDS-SPIDER]" + e.getMessage(), e);
        }
        return "";
    }

    public String getHtml(String encoding) {
        try {
            return getText(encoding);
        } catch (UnsupportedEncodingException e) {
            LOG.error("[AGDS-SPIDER]" + e.getMessage(), e);
        }
        return "";
    }

    public String getMetaCharset() {
        return this.metaCharset;
    }

    public byte[] getResponse() {
        // An empty HttpResult has no body
        if (this.response == null) {
            return new byte[0];
        }
        return Arrays.copyOf(this.response, this.response.length);
    }

    public String getResponseUrl() {
        return this.responseUrl;
    }

    public int getStatuCode() {
        return this.statuCode;
    }

    public String getText() throws UnsupportedEncodingException {
        return getText("");
    }

    public String getText(String encoding) throws UnsupportedEncodingException {
        if (this.response == null) {
            return "";
        }
        // Precedence: explicit argument, then <meta> charset, then header charset, then UTF-8.
        // Each step tests encodingStr (not the argument), so an earlier match is kept.
        String encodingStr = encoding;
        if (StringUtils.isBlank(encodingStr)) {
            encodingStr = this.metaCharset;
        }
        if (StringUtils.isBlank(encodingStr)) {
            encodingStr = this.headerCharset;
        }
        if (StringUtils.isBlank(encodingStr)) {
            encodingStr = "UTF-8";
        }
        return new String(this.response, encodingStr);
    }

    private String getCharsetFromMeta() {
        StringBuilder builder = new StringBuilder();
        String charset = "";
        for (int i = 0; (i < this.response.length) && ("".equals(charset)); ++i) {
            char c = (char) this.response[i];
            switch (c) {
            case '<':
                builder.delete(0, builder.length());
                builder.append(c);
                break;
            case '>':
                if (builder.length() > 0){
                    builder.append(c);
                }
                String meta = builder.toString();
                if (meta.toLowerCase().startsWith("<meta")) {
                    charset = getCharsetFromMeta(meta);
                }
                break;
            case '=':
            default:
                if (builder.length() > 0) {
                    builder.append(c);
                }
            }
        }
        return charset;
    }

    private String getCharsetFromMeta(String meta) {
        if (StringUtils.isBlank(meta)){
            return "";
        }
        Matcher m = pattern.matcher(meta);
        if (m.find()){
            return m.group(2);
        }
        return "";
    }

    private void getHttpHeaders(HttpResponse httpResponse) {
        String headerName = "";
        String headerValue = "";
        int index = -1;

        Header[] rspHeaders = httpResponse.getAllHeaders();
        for (int i = 0; i < rspHeaders.length; ++i) {
            Header header = rspHeaders[i];
            this.headers.add(header);

            headerName = header.getName();
            if ("Content-Type".equalsIgnoreCase(headerName)) {
                headerValue = header.getValue();
                index = headerValue.indexOf(';');
                // Without a ';' the whole value is the content type
                this.headerContentType = (index > 0) ? headerValue.substring(0, index) : headerValue;
                Matcher m = headerCharsetPattern.matcher(headerValue);
                if (m.find()) {
                    this.headerCharset = m.group(1);
                }
            }

            if ("Content-Encoding".equalsIgnoreCase(headerName)) {
                this.headerContentEncoding = header.getValue();
            }
        }
    }

    private void getResponseUrl(HttpContext httpContext) {
        HttpHost target = (HttpHost) httpContext
                .getAttribute("http.target_host");

        HttpUriRequest req = (HttpUriRequest) httpContext
                .getAttribute("http.request");

        this.responseUrl = target.toString() + req.getURI().toString();
    }

    public HttpResult(HttpContext httpContext, HttpResponse httpResponse) {
        this.headers = new ArrayList<Header>();

        if (httpContext != null) {
            getResponseUrl(httpContext);
        }

        if (httpResponse != null) {
            // Read the status only after the null check
            this.statuCode = httpResponse.getStatusLine().getStatusCode();
            getHttpHeaders(httpResponse);
            try {
                boolean gzip = "gzip".equalsIgnoreCase(this.headerContentEncoding);
                boolean deflate = "deflate".equalsIgnoreCase(this.headerContentEncoding);
                if (gzip || deflate) {
                    InputStream raw = httpResponse.getEntity().getContent();
                    // "deflate" bodies are zlib streams, which GZIPInputStream cannot read
                    InputStream is = gzip ? new GZIPInputStream(raw) : new InflaterInputStream(raw);
                    ByteArrayOutputStream os = new ByteArrayOutputStream();
                    byte[] buffer = new byte[BUFFER_SIZE];
                    int count = 0;
                    while ((count = is.read(buffer)) != -1) {
                        os.write(buffer, 0, count);
                    }
                    this.response = os.toByteArray();
                    os.close();
                    is.close();
                } else {
                    this.response = EntityUtils.toByteArray(httpResponse.getEntity());
                }
            } catch (Exception e) {
                LOG.error("[AGDS-SPIDER]" + e.getMessage(), e);
            }
            if (this.response != null) {
                this.metaCharset = getCharsetFromMeta();
            }
        }
    }

    private HttpResult() {
    }
}
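
HttpResult decides how to decode the body by looking at the charset in the Content-Type header and in the page's meta tag. As a rough JDK-only sketch of that idea: CharsetSniffSketch, fromContentType and choose are illustrative names that do not appear in the original class, and the fallback order (explicit argument, meta charset, header charset, UTF-8) is my reading of what getText intends.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only
class CharsetSniffSketch {

    // Same expression as headerCharsetPattern in HttpResult; the literal flag 2
    // used there has the same value as Pattern.CASE_INSENSITIVE
    private static final Pattern HEADER_CHARSET =
            Pattern.compile("charset=((gb2312)|(gbk)|(utf-8))", Pattern.CASE_INSENSITIVE);

    // Returns the charset named in a Content-Type value, or null if none is recognised
    static String fromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        Matcher m = HEADER_CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : null;
    }

    // The first non-blank candidate wins; UTF-8 is the last resort
    static String choose(String explicit, String metaCharset, String headerCharset) {
        for (String candidate : new String[] { explicit, metaCharset, headerCharset }) {
            if (candidate != null && candidate.trim().length() > 0) {
                return candidate;
            }
        }
        return "UTF-8";
    }

    public static void main(String[] args) {
        String header = fromContentType("text/html; charset=GBK");
        System.out.println(choose("", null, header));
    }
}
```

Note that the pattern only recognises gb2312, gbk and utf-8, as in the original; any other charset falls through to the next candidate.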

Mail163Test

import java.text.MessageFormat;
import java.util.HashMap;
import java.util.Map;

import org.apache.http.Header;
import org.apache.http.message.BasicHeader;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Mail163Test {
    public static final String SESSION_INIT = "http://mail.163.com";
    public static final String LOGIN_URL = "https://ssl.mail.163.com/entry/coremail/fcg/ntesdoor2?df=webmail163&from=web&funcid=loginone&iframe=1&language=-1&net=t&passtype=1&product=mail163&race=-2_-2_-2_db&style=-1&uid=";
    public static final String MAIL_LIST_URL = "http://twebmail.mail.163.com/js4/s?sid={0}&func=mbox:listMessages";
    /**
     * @param args
     */
    public static void main(String[] args) {
        HttpClientHelper hc = new HttpClientHelper(true);
        HttpResult lr = hc.get(SESSION_INIT); // warm-up request, meant to obtain something like a csrfToken
        // Assemble the login form
        Map<String, String> data = new HashMap<String, String>();
        data.put("url2", "http://mail.163.com/errorpage/err_163.htm");
        data.put("savelogin", "0");
        data.put("username", "bangis");
        data.put("password", "*******");
        lr = hc.post(LOGIN_URL, data, setHeader()); // perform the login
        Document doc = Jsoup.parse(lr.getHtml());
        // Cut the session id out of the script in the login response;
        // this depends on the exact format of that page
        String sessionId = doc.select("script").html().split("=")[2];
        sessionId = sessionId.substring(0, sessionId.length() - 2);
        data.clear();
        data.put("var", "<?xml version=\"1.0\"?><object><int name=\"fid\">1</int><boolean name=\"skipLockedFolders\">false</boolean><string name=\"order\">date</string><boolean name=\"desc\">true</boolean><int name=\"start\">0</int><int name=\"limit\">50</int><boolean name=\"topFirst\">true</boolean><boolean name=\"returnTotal\">true</boolean><boolean name=\"returnTag\">true</boolean></object>");
        lr = hc.post(MessageFormat.format(MAIL_LIST_URL, sessionId),
                data, setQueryHeader(sessionId)); // fetch the inbox message list
        System.out.println(lr.getHtml());
    }
    
    public static Header[] setHeader() {
        Header[] result = { 
                new BasicHeader("User-Agent","Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"), 
                new BasicHeader("Accept-Encoding","gzip, deflate"),
                new BasicHeader("Accept-Language","zh-CN"),
                new BasicHeader("Cache-Control","no-cache"),
                new BasicHeader("Connection","Keep-Alive"),
                new BasicHeader("Content-Type","application/x-www-form-urlencoded"),
                new BasicHeader("Host","ssl.mail.163.com"),
                new BasicHeader("Referer","http://mail.163.com/"),
                new BasicHeader("Accept","text/html, application/xhtml+xml, */*")
                
        };
        return result;
    }
    public static Header[] setQueryHeader(String sessionId) {
        Header[] result = { 
                new BasicHeader("User-Agent","Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"), 
                new BasicHeader("Accept-Encoding","gzip, deflate"),
                new BasicHeader("Accept-Language","zh-CN"),
                new BasicHeader("Cache-Control","no-cache"),
                new BasicHeader("Connection","Keep-Alive"),
                new BasicHeader("Content-Type","application/x-www-form-urlencoded"),
                new BasicHeader("Host","twebmail.mail.163.com"),
                new BasicHeader("Referer","http://twebmail.mail.163.com/js4/index.jsp?sid="+sessionId),
                new BasicHeader("Accept","text/javascript")
                
        };
        return result;
    }
}
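
The split("=")[2] plus substring step in main is tied to the exact shape of the login response, so it breaks as soon as the script changes slightly. A regular expression that looks for the sid parameter is a bit more tolerant. This is only a sketch: SidExtractSketch and extractSid are made-up names, and the sample script text is my guess at a response shape consistent with the split logic, not a captured 163 response.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only
class SidExtractSketch {

    // Captures the value of a "sid" query parameter up to the next
    // '&', quote, semicolon or whitespace character
    private static final Pattern SID = Pattern.compile("[?&]sid=([^&\"'\\s;]+)");

    // Returns the session id found in the script text, or null if there is none
    static String extractSid(String script) {
        if (script == null) {
            return null;
        }
        Matcher m = SID.matcher(script);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Assumed sample; the real login response may look different
        String script = "top.location.href = \"http://twebmail.mail.163.com/js4/index.jsp?sid=ABC123\";";
        System.out.println(extractSid(script));
    }
}
```

In Mail163Test this could replace the two sessionId lines with a call such as extractSid(doc.select("script").html()), followed by a null check before requesting the mail list.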
