onWebView: checking Chinese text in a web page

Problem: I needed to check that a piece of text appears in a web page.

At first I wrote it like this:

private static final String SPECIFIED_TEXT = "这个是一段中文"; // "This is a piece of Chinese text"

onWebView().check(webContent(containingTextInNode(SPECIFIED_TEXT)));

and the check failed straight away.

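In adb logcat, the Chinese text of the page showed up as mojibake, and when I printed the length of each Chinese character it came out as 3. The page structure and the data themselves were visible, though.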
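The "length 3" observation is consistent with a charset mix-up: in UTF-8 each of these Chinese characters is encoded as 3 bytes, so if the bytes are decoded one byte per character, every character turns into 3 garbage characters. This is a hypothesis the post does not confirm. The small program below only demonstrates the arithmetic, using ISO-8859-1 as an assumed stand-in for "the wrong charset"; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows why one CJK character can "have length 3".
public class MojibakeDemo {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Wrong decoding (assumed charset): ISO-8859-1 maps every byte to exactly
    // one char, so each 3-byte CJK character becomes 3 garbage characters.
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String chinese = "\u4E2D\u6587"; // two CJK characters, U+4E2D and U+6587
        byte[] utf8 = chinese.getBytes(StandardCharsets.UTF_8);

        String garbled = decodeAsLatin1(utf8);
        String restored = new String(utf8, StandardCharsets.UTF_8);

        System.out.println("chars: " + chinese.length());                 // 2
        System.out.println("UTF-8 bytes: " + utf8Length(chinese));        // 6
        System.out.println("garbled chars: " + garbled.length());         // 6
        System.out.println("round trip ok: " + restored.equals(chinese)); // true
    }
}
```

Decoding the same bytes as UTF-8 restores the original two characters, which is why the fix below is about how the page is parsed rather than about the page itself.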

The text itself sits inside <p></p> and <h2></h2> elements.
I wasn't ready to give up:

So I copied all of the checking code out of the library

and changed the call to:

onWebView().check(userWebContent(containingTextInNode(SPECIFIED_TEXT, "p")));

/**
 * Dumps the page to logcat. Logcat truncates very long messages (roughly 4 KB
 * per entry), so the string is written in chunks of 4000 characters.
 * @param xml the text to log
 */
public static void logall(String xml) {
    final int chunkSize = 4000;
    for (int i = 0; i < xml.length(); i += chunkSize) {
        Log.i(TAG, xml.substring(i, Math.min(i + chunkSize, xml.length())));
    }
}

    /**
     * A WebAssertion which asserts that the document is matched by the provided matcher.
     * Same as the library's webContent(), except that the page is parsed with jsoup
     * instead of TagSoupDocumentParser, which could not display the Chinese text.
     */
    public static WebAssertion<Document> userWebContent(final Matcher<Document> domMatcher) {
        checkNotNull(domMatcher);
        return webMatches(transform(script("return document.documentElement.outerHTML;"),
                new TransformingAtom.Transformer<Evaluation, Document>() {
                    @Override
                    public Document apply(Evaluation eval) {
                        if (eval.getValue() instanceof String) {
                            try {
                                // logall("eval.getValue() " + (String) eval.getValue()); // dumps the whole page here, without mojibake
                                // return TagSoupDocumentParser.newInstance().parse((String) eval.getValue()); // this parser cannot display Chinese
                                // An org.jsoup.nodes.Document cannot be cast to an org.w3c.dom.Document,
                                // so parse with jsoup and convert the result with W3CDom.
                                org.jsoup.helper.W3CDom w3cDom = new W3CDom();
                                org.jsoup.nodes.Document doc = Jsoup.parseBodyFragment((String) eval.getValue());
                                return w3cDom.fromJsoup(doc);
                            } catch (Exception se) {
                                throw new RuntimeException("Parse failed: " + eval.getValue(), se);
                            }
                        }
                        throw new RuntimeException("Value should have been a string: " + eval);
                    }
                }), domMatcher,
                new WebViewAssertions.ResultDescriber<Document>() {
                    @Override
                    public String apply(Document document) {
                        try {
                            // Serialize the DOM back to a string for the failure message.
                            DOMSource docSource = new DOMSource(document);
                            Transformer tf = TransformerFactory.newInstance().newTransformer();
                            StringWriter writer = new StringWriter();
                            tf.transform(docSource, new StreamResult(writer));
                            return writer.toString();
                        } catch (TransformerException e) {
                            return "Could not transform!!! " + e;
                        }
                    }
                });
    }

/**
 * Returns a matcher that matches Documents containing at least one element with the
 * given tag name (e.g. "p") whose text content contains the given text.
 */
public static Matcher<Document> containingTextInNode(String text, final String nodeName) {
    checkNotNull(text);
    return withNodeName(withTextContent(containsString(text)), nodeName);
}

    /**
     * Returns a matcher that matches {@link Document}s in which at least one element
     * with the given tag name is matched by the given element matcher.
     */
    public static Matcher<Document> withNodeName(final Matcher<Element> bodyMatcher, final String nodeName) {
        checkNotNull(bodyMatcher);
        checkNotNull(nodeName);
        return new TypeSafeMatcher<Document>() {
            @Override
            public void describeTo(Description description) {
                description.appendText("with an element named " + nodeName + ": ");
                bodyMatcher.describeTo(description);
            }

            @Override
            public boolean matchesSafely(Document document) {
                NodeList nodeList = document.getElementsByTagName(nodeName);
                // showNode(nodeList, "");
                // Succeed as soon as any element with this tag name matches.
                for (int i = 0; i < nodeList.getLength(); i++) {
                    if (bodyMatcher.matches(nodeList.item(i))) {
                        return true;
                    }
                }
                return false; // also covers the case where no such element exists
            }
        };
    }
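To see what containingTextInNode and withNodeName actually test, here is the same check written against the plain JDK DOM API, without Espresso or Hamcrest. It is a sketch under one assumption: the markup is well-formed XML, so javax.xml's DocumentBuilder can parse it. Real pages generally are not, which is why the post goes through jsoup and W3CDom. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper class mirroring containingTextInNode(text, nodeName).
public class NodeTextCheck {

    // True if any <nodeName> element's text content contains `text`.
    static boolean containsTextInNode(Document doc, String text, String nodeName) {
        NodeList nodes = doc.getElementsByTagName(nodeName);
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i).getTextContent().contains(text)) {
                return true;
            }
        }
        return false;
    }

    // Assumption: the input is well-formed XML, encoded here as UTF-8 bytes.
    static Document parseXml(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical page; the <p> text is the post's SPECIFIED_TEXT written as escapes.
        String page = "<html><body><h2>Title</h2>"
                + "<p>\u8FD9\u4E2A\u662F\u4E00\u6BB5\u4E2D\u6587</p></body></html>";
        Document doc = parseXml(page);
        String text = "\u4E2D\u6587";
        System.out.println(containsTextInNode(doc, text, "p"));  // true
        System.out.println(containsTextInNode(doc, text, "h2")); // false
    }
}
```

Restricting the search to one tag name keeps the check narrow: the same text appearing in, say, a script or attribute elsewhere on the page will not make the assertion pass.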

/**
 * Debug helper: recursively logs every leaf node under the given NodeList (the W3C
 * ordered node collection, indexed from zero) as "path:textLength text".
 *
 * @param nodeList nodes to walk
 * @param path     path of node names leading to these nodes, e.g. "-html-body"
 */
public static void showNode(NodeList nodeList, String path) {
    for (int i = 0; i < nodeList.getLength(); i++) {
        Node node = nodeList.item(i);
        int textLength = node.getTextContent().length();
        NodeList children = node.getChildNodes();
        if (children.getLength() > 0) {
            showNode(children, path + "-" + node.getNodeName());
        } else {
            // No more children: log this leaf node.
            Log.i(TAG, path + "-" + node.getNodeName() + ":" + textLength + " " + node.getTextContent());
        }
    }
}
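The same leaf-node walk can be tried outside Android. This sketch follows showNode's output format but collects the lines into a list instead of calling Log.i, and again assumes well-formed XML input so the JDK parser can be used; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical, logcat-free variant of showNode().
public class LeafDump {

    // Appends "path:textLength text" for every leaf node under nodeList.
    static void collectLeaves(NodeList nodeList, String path, List<String> out) {
        for (int i = 0; i < nodeList.getLength(); i++) {
            Node node = nodeList.item(i);
            String nodePath = path + "-" + node.getNodeName();
            NodeList children = node.getChildNodes();
            if (children.getLength() > 0) {
                collectLeaves(children, nodePath, out); // descend into children
            } else {
                String text = node.getTextContent(); // leaf: record its text
                out.add(nodePath + ":" + text.length() + " " + text);
            }
        }
    }

    // Assumption: the input is well-formed XML.
    static Document parseXml(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseXml("<html><body><p>\u4E2D\u6587</p></body></html>");
        List<String> lines = new ArrayList<>();
        collectLeaves(doc.getChildNodes(), "", lines);
        lines.forEach(System.out::println); // "-html-body-p-#text:2 " followed by the two characters
    }
}
```

With correct decoding the two-character text reports a length of 2; a length of 6 for the same node would point back to the charset problem described earlier.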

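// The code above uses the jsoup library, so add it as a Gradle dependency.
// Also mind the Document conversion: jsoup's org.jsoup.nodes.Document is not an org.w3c.dom.Document.

dependencies {
    compile 'org.jsoup:jsoup:1.9.2'             // only needed if the app code itself uses jsoup
    androidTestCompile 'org.jsoup:jsoup:1.9.2'  // for the tests (androidTestImplementation on Android Gradle Plugin 3.0+)
}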

With this, the Chinese text in the page can be checked successfully. The code is a bit messy, but it will do for now.
