Background

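I have recently been building an RSS reader for my own use. One step fetches a JSON file containing the list of RSS feeds from a server, then downloads and parses the RSS content based on that file. The core code is: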

```kotlin
import android.content.Context
import com.google.gson.GsonBuilder
import java.net.URL

class PresenterImpl(val context: Context, val activity: MainActivity) : IPresenter {

    private val URL_API = "https://vimerzhao.github.io/others/rssreader/RSS.json"

    override fun getRssResource(): RssSource {
        val gson = GsonBuilder().create()
        return gson.fromJson(getFromNet(URL_API), RssSource::class.java)
    }

    // Download the whole response body as a String (UTF-8 by default)
    private fun getFromNet(url: String): String {
        val result = URL(url).readText()
        return result
    }

    ......
}
```

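This worked well until a few days ago, when I bought the domain vimerzhao.top and redirected the original domain vimerzhao.github.io to it. From then on the tool stopped working, yet entering URL_API in a browser still returned the data.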

So why did URL.readText() not get the data?

Redirects are not followed

This can be tested with the following code:

```java
import java.io.*;
import java.net.*;

public class TestRedirect {

    public static void main(String[] args) {
        try {
            URL url1 = new URL("https://vimerzhao.github.io/others/rssreader/RSS.json");
            URL url2 = new URL("http://vimerzhao.top/others/rssreader/RSS.json");
            read(url1);
            System.out.println("=--------------------------------=");
            read(url2);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void read(URL url) {
        try {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream()));
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                System.out.println(inputLine);
            }
            in.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```

The output is:

```
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx</center>
</body>
</html>
=--------------------------------=
{"theme":"tech","author":"zhaoyu","email":"dutzhaoyu@gmail.com","version":"0.01","contents":[{"category":"综合版块","websites":[{"tag":"门户网站","url":["http://geek.csdn.net/admin/news_service/rss","http://blog.jobbole.com/feed/","http://feed.cnblogs.com/blog/sitehome/rss","https://segmentfault.com/feeds","http://www.codeceo.com/article/category/pick/feed"]},{"tag":"知名社区","url":["https://stackoverflow.com/feeds","https://www.v2ex.com/index.xml"]},{"tag":"官方博客","url":["https://www.blog.google/rss/","https://blog.jetbrains.com/feed/"]},{"tag":"个人博客-行业","url":["http://feed.williamlong.info/","https://www.liaoxuefeng.com/feed/articles"]},{"tag":"个人博客-学术","url":["http://www.norvig.com/rss-feed.xml"]}]},{"category":"编程语言","websites":[{"tag":"Kotlin","url":["https://kotliner.cn/api/rss/latest"]},{"tag":"Python","url":["https://www.python.org/dev/peps/peps.rss/"]},{"tag":"Java","url":["http://www.codeceo.com/article/category/develop/java/feed"]}]},{"category":"行业动态","websites":[{"tag":"Android","url":["http://www.codeceo.com/article/category/develop/android/feed"]}]},{"category":"乱七八遭","websites":[{"tag":"Linux-综合","url":["https://linux.cn/rss.xml","http://www.linuxidc.com/rssFeed.aspx","http://www.codeceo.com/article/tag/linux/feed"]},{"tag":"Linux-发行版","url":["https://blog.linuxmint.com/?feed=rss2","https://manjaro.github.io/feed.xml"]}]}]}
```

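HTTP status code 301 means a redirect took place. In a browser the redirect is followed so quickly that we never see the 301 page. Note that URL.readText() is a Kotlin extension function that ultimately calls the openStream method of the URL class. Part of its source: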

```kotlin
.....
/**
 * Reads the entire content of this URL as a String using UTF-8 or the specified [charset].
 *
 * This method is not recommended on huge files.
 *
 * @param charset a character set to use.
 * @return a string with this URL entire content.
 */
@kotlin.internal.InlineOnly
public inline fun URL.readText(charset: Charset = Charsets.UTF_8): String = readBytes().toString(charset)

/**
 * Reads the entire content of the URL as byte array.
 *
 * This method is not recommended on huge files.
 *
 * @return a byte array with this URL entire content.
 */
public fun URL.readBytes(): ByteArray = openStream().use { it.readBytes() }
```

So the test above explains why URL.readText() failed.

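Strictly speaking, the HttpURLConnection behind openStream() does follow redirects by default (HttpURLConnection.getFollowRedirects() returns true unless changed). However, it does not follow a redirect that switches protocol. Here the redirect most likely goes from https://vimerzhao.github.io to http://vimerzhao.top, the address that works directly in the second test, so the 301 page is returned as is. Whether refusing such redirects is reasonable, and why the JDK behaves this way, remains to be explored.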
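To get the data despite a cross-protocol redirect, one option is to turn off automatic redirects and follow the Location header by hand. Below is a minimal sketch, not a library API. It assumes Java 9+ (for InputStream.readAllBytes) and http/https URLs only, and it uses a hypothetical limit of 5 hops. The names RedirectFollower, readText, resolveLocation and isRedirect are made up for illustration. It relies only on standard HttpURLConnection calls (setInstanceFollowRedirects, getResponseCode, getHeaderField).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sketch: follow redirects manually, including https -> http,
// which HttpURLConnection does not follow on its own.
class RedirectFollower {

    // Hypothetical cap to avoid redirect loops.
    static final int MAX_REDIRECTS = 5;

    // Status codes this sketch treats as redirects.
    static boolean isRedirect(int code) {
        return code == HttpURLConnection.HTTP_MOVED_PERM   // 301
            || code == HttpURLConnection.HTTP_MOVED_TEMP   // 302
            || code == HttpURLConnection.HTTP_SEE_OTHER    // 303
            || code == 307 || code == 308;
    }

    // Location may be relative, so resolve it against the current URL.
    static URL resolveLocation(URL current, String location) throws MalformedURLException {
        return new URL(current, location);
    }

    // Read the body as UTF-8 text, following up to MAX_REDIRECTS redirects.
    static String readText(URL url) throws IOException {
        URL current = url;
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            HttpURLConnection conn = (HttpURLConnection) current.openConnection();
            conn.setInstanceFollowRedirects(false); // we handle redirects ourselves
            int code = conn.getResponseCode();
            if (isRedirect(code)) {
                String location = conn.getHeaderField("Location");
                conn.disconnect();
                if (location == null) {
                    throw new IOException("Redirect without Location header from " + current);
                }
                current = resolveLocation(current, location);
                continue;
            }
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
        throw new IOException("Too many redirects starting from " + url);
    }
}
```

With this, RedirectFollower.readText(new URL(URL_API)) would follow the 301 to the http address instead of returning the nginx page. Keep in mind that following an https to http redirect downgrades the connection to plain text, so decide deliberately before allowing it.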

Unstable `equals` method

First, the documentation of equals (from the Java Platform SE 7 Javadoc for URL):

Compares this URL for equality with another object.

If the given object is not a URL then this method immediately returns false.

Two URL objects are equal if they have the same protocol, reference equivalent hosts, have the same port number on the host, and the same file and fragment of the file.

Two hosts are considered equivalent if both host names can be resolved into the same IP addresses; else if either host name can't be resolved, the host names must be equal without regard to case; or both host names equal to null.

Since hosts comparison requires name resolution, this operation is a blocking operation.

Note: The defined behavior for equals is known to be inconsistent with virtual hosting in HTTP.

Next, consider this code:

```java
import java.net.*;

public class TestEquals {

    public static void main(String[] args) {
        try {
            // vimerzhao's blog home page
            URL url1 = new URL("https://vimerzhao.github.io/");
            // zhanglanqing's blog home page
            URL url2 = new URL("https://zhanglanqing.github.io/");
            // the domain vimerzhao's blog home page now redirects to
            URL url3 = new URL("http://vimerzhao.top/");
            System.out.println(url1.equals(url2));
            System.out.println(url1.equals(url3));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```

What should this print according to the definition? Running it gives:

```
true
false
```

You may have guessed right. But if I disconnect the computer from the network and run it again, the result becomes:

```
false
false
```

Yet all three domains resolve to the same IP address, as ping shows:

```
zhaoyu@Inspiron ~/Project $ ping vimezhao.github.io
PING sni.github.map.fastly.net (151.101.77.147) 56(84) bytes of data.
64 bytes from 151.101.77.147: icmp_seq=1 ttl=44 time=396 ms
^C
--- sni.github.map.fastly.net ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 396.692/396.692/396.692/0.000 ms
zhaoyu@Inspiron ~/Project $ ping zhanglanqing.github.io
PING sni.github.map.fastly.net (151.101.77.147) 56(84) bytes of data.
64 bytes from 151.101.77.147: icmp_seq=1 ttl=44 time=396 ms
^C
--- sni.github.map.fastly.net ping statistics ---
2 packets transmitted, 1 received, 50% packet loss, time 1000ms
rtt min/avg/max/mdev = 396.009/396.009/396.009/0.000 ms
zhaoyu@Inspiron ~/Project $ ping vimezhao.top
ping: unknown host vimezhao.top
zhaoyu@Inspiron ~/Project $ ping vimerzhao.top
PING sni.github.map.fastly.net (151.101.77.147) 56(84) bytes of data.
64 bytes from 151.101.77.147: icmp_seq=1 ttl=44 time=409 ms
^C
--- sni.github.map.fastly.net ping statistics ---
2 packets transmitted, 1 received, 50% packet loss, time 1001ms
rtt min/avg/max/mdev = 409.978/409.978/409.978/0.000 ms
```

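Start with the online case. vimerzhao.github.io and zhanglanqing.github.io are my blog and a classmate's blog. Their content differs, but they resolve to the same IP address and share the same protocol, port and file, so they compare equal. vimerzhao.github.io and vimerzhao.top point to the same blog, but one uses https and the other http. The protocols differ (and so do the default ports, 443 versus 80), so they compare unequal. This surely runs against most people's intuition: URLs pointing to different blogs are equal, while URLs pointing to the same blog are not!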
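The port difference can be checked directly, without any network access, since constructing a URL does not resolve its host. This small sketch computes the effective port the same way sameFile does: the explicit port if present, otherwise URL.getDefaultPort(). The class and method names are illustrative.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Sketch: compute the port that sameFile() compares for each URL.
class DefaultPorts {

    // Explicit port if one was given, otherwise the protocol handler's default.
    static int effectivePort(URL url) {
        return url.getPort() != -1 ? url.getPort() : url.getDefaultPort();
    }

    public static void main(String[] args) throws MalformedURLException {
        URL github = new URL("https://vimerzhao.github.io/");
        URL top = new URL("http://vimerzhao.top/");
        System.out.println(effectivePort(github)); // 443
        System.out.println(effectivePort(top));    // 80
    }
}
```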

Now for the offline result. First, the source of URL.equals:



```java
public boolean equals(Object obj) {
    if (!(obj instanceof URL))
        return false;
    URL u2 = (URL)obj;
    return handler.equals(this, u2);
}
```

Then the corresponding method of the handler object (a URLStreamHandler):



```java
protected boolean equals(URL u1, URL u2) {
    String ref1 = u1.getRef();
    String ref2 = u2.getRef();
    return (ref1 == ref2 || (ref1 != null && ref1.equals(ref2))) &&
           sameFile(u1, u2);
}
```

The source of sameFile:



```java
protected boolean sameFile(URL u1, URL u2) {
    // Compare the protocols.
    if (!((u1.getProtocol() == u2.getProtocol()) ||
          (u1.getProtocol() != null &&
           u1.getProtocol().equalsIgnoreCase(u2.getProtocol()))))
        return false;

    // Compare the files.
    if (!(u1.getFile() == u2.getFile() ||
          (u1.getFile() != null && u1.getFile().equals(u2.getFile()))))
        return false;

    // Compare the ports.
    int port1, port2;
    port1 = (u1.getPort() != -1) ? u1.getPort() : u1.handler.getDefaultPort();
    port2 = (u2.getPort() != -1) ? u2.getPort() : u2.handler.getDefaultPort();
    if (port1 != port2)
        return false;

    // Compare the hosts.
    if (!hostsEqual(u1, u2))
        return false; // reached when there is no network connection

    return true;
}
```

Finally, the source of hostsEqual:



```java
protected boolean hostsEqual(URL u1, URL u2) {
    InetAddress a1 = getHostAddress(u1);
    InetAddress a2 = getHostAddress(u2);
    // if we have internet address for both, compare them
    if (a1 != null && a2 != null) {
        return a1.equals(a2);
    // else, if both have host names, compare them
    } else if (u1.getHost() != null && u2.getHost() != null)
        return u1.getHost().equalsIgnoreCase(u2.getHost());
    else
        return u1.getHost() == null && u2.getHost() == null;
}
```

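With a network connection, getHostAddress resolves both host names through DNS, so neither a1 nor a2 is null. The branch return a1.equals(a2) runs and returns true. Without a network connection, resolution fails and getHostAddress returns null, so the second branch, return u1.getHost().equalsIgnoreCase(u2.getHost());, runs instead. The host of url1 (vimerzhao.github.io) clearly differs from the host of url2 (zhanglanqing.github.io), so it returns false. The condition if (!hostsEqual(u1, u2)) is then true, and sameFile returns false.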

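So the equals method of URL is not only counterintuitive but also inconsistent: it gives different results in different environments, which is quite dangerous!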
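If you need a comparison that does not depend on the network, a common workaround is to compare java.net.URI objects instead. URI.equals works purely on the parsed components, comparing scheme and host case-insensitively, and never resolves host names. A minimal sketch follows; the helper name sameResource is illustrative.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Sketch: compare URLs through URI, whose equals() is purely syntactic,
// so the result is the same whether or not the machine is online.
class UriCompare {

    static boolean sameResource(URL a, URL b) throws URISyntaxException {
        return a.toURI().equals(b.toURI());
    }

    public static void main(String[] args) throws MalformedURLException, URISyntaxException {
        URL url1 = new URL("https://vimerzhao.github.io/");
        URL url2 = new URL("https://zhanglanqing.github.io/");
        URL url3 = new URL("http://vimerzhao.top/");
        System.out.println(sameResource(url1, url2)); // false, online or offline
        System.out.println(sameResource(url1, url3)); // false
        // Host comparison ignores case
        System.out.println(sameResource(url1, new URL("https://VIMERZHAO.github.io/"))); // true
    }
}
```

The trade-off is that URI comparison is purely textual. For example, http://vimerzhao.top:80/ and http://vimerzhao.top/ are different URIs, since URI does not fill in default ports. Also, toURI() throws URISyntaxException for URLs that are not valid URIs (for example, ones containing unescaped spaces).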

Slow `equals` method

Moreover, equals is slow, because with a network connection it performs DNS resolution. The same holds for hashCode(), which I use as the example here. The source of URL.hashCode():

```java
public synchronized int hashCode() {
    if (hashCode != -1)
        return hashCode;
    hashCode = handler.hashCode(this);
    return hashCode;
}
```

The handler's hashCode() method:



```java
protected int hashCode(URL u) {
    int h = 0;

    // Generate the protocol part.
    String protocol = u.getProtocol();
    if (protocol != null)
        h += protocol.hashCode();

    // Generate the host part.
    InetAddress addr = getHostAddress(u);
    if (addr != null) {
        h += addr.hashCode();
    } else {
        String host = u.getHost();
        if (host != null)
            h += host.toLowerCase().hashCode();
    }

    // Generate the file part.
    String file = u.getFile();
    if (file != null)
        h += file.hashCode();

    // Generate the port part.
    if (u.getPort() == -1)
        h += getDefaultPort();
    else
        h += u.getPort();

    // Generate the ref part.
    String ref = u.getRef();
    if (ref != null)
        h += ref.hashCode();

    return h;
}
```

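Here getHostAddress() can take a long time, so storing URL objects in hash-based containers can be a disaster. The code below compares adding a URL and a URI to a HashSet 50 times each: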

```java
import java.net.*;
import java.util.*;

public class TestHash {

    public static void main(String[] args) {
        HashSet<URL> urlSet = new HashSet<>();
        HashSet<URI> uriSet = new HashSet<>();
        try {
            URL url = new URL("https://vimerzhao.github.io/");
            URI uri = new URI("https://zhanglanqing.github.io/");
            long cur = System.currentTimeMillis();
            int cnt = 50;
            for (int i = 0; i < cnt; i++) {
                urlSet.add(url);
            }
            System.out.println(System.currentTimeMillis() - cur);
            cur = System.currentTimeMillis();
            for (int i = 0; i < cnt; i++) {
                uriSet.add(uri);
            }
            System.out.println(System.currentTimeMillis() - cur);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```

Output:

```
271
0
```

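URL caches its hash code after the first call (see the hashCode != -1 check above), so most of that time is most likely spent on the single DNS lookup triggered by the first hashCode(). Either way, it is best not to use URL in hash-based containers.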
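As a sketch of the alternative, the following stores feed addresses (taken from the RSS.json shown earlier) in a HashSet&lt;URI&gt;. URI.hashCode() is computed from the string components, so adding elements never triggers a DNS lookup. The class and method names are hypothetical. If you already hold URL objects, url.toURI() converts them. List.of needs Java 9+.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: deduplicate feed addresses using URI keys instead of URL keys.
class UriKeys {

    static Set<URI> dedupe(List<String> links) {
        Set<URI> seen = new HashSet<>();
        for (String link : links) {
            seen.add(URI.create(link)); // parsing only, no name resolution
        }
        return seen;
    }

    public static void main(String[] args) {
        Set<URI> feeds = dedupe(List.of(
                "https://stackoverflow.com/feeds",
                "https://www.v2ex.com/index.xml",
                "https://stackoverflow.com/feeds")); // duplicate entry
        System.out.println(feeds.size()); // 2
    }
}
```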

The role of `TrailingSlash`

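A trailing slash is the slash at the end of a URL. For example, the browser's address bar shows vimerzhao.top, but copying and pasting it yields http://vimerzhao.top/. First, test with the following code: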

```java
import java.net.*;
import java.io.*;

public class TestTrailingSlash {

    public static void main(String[] args) {
        try {
            URL url1 = new URL("https://vimerzhao.github.io/");
            URL url2 = new URL("https://vimerzhao.github.io");
            System.out.println(url1.equals(url2));
            outputInfo(url1);
            outputInfo(url2);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void outputInfo(URL url) {
        System.out.println("------" + url.toString() + "----------");
        System.out.println(url.getRef());
        System.out.println(url.getFile());
        System.out.println(url.getHost());
        System.out.println("----------------");
    }
}
```

The output is:

```
false
------https://vimerzhao.github.io/----------
null
/
vimerzhao.github.io
----------------
------https://vimerzhao.github.io----------
null

vimerzhao.github.io
----------------
```

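In fact, whether you read them with the read() method above or type them into the address bar, url1 and url2 return the same content. But a trailing / marks a directory and its absence marks a file, so getFile() returns "/" for url1 and an empty string for url2, and equals returns false. (For a bare host name, HTTP/1.1 even requires an empty path to be sent as /, so the two requests are identical on the wire.) When typing in the address bar you do not even notice the trailing slash and the result is the same, yet equals returns false. A trap that is hard to guard against!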

This raises another question: if one is a file and the other a directory, why do both give the same result?

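After some digging: when the request path ends with /, the server looks for index.html in that directory. Without it, taking vimerzhao.top/tags as an example, the server first looks for a file named tags. If there is none, it appends a / (typically by redirecting to tags/) and then looks for index.html in the tags directory, as shown in the figure.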
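The directory-versus-file distinction is not just a server convention. It also changes how relative references resolve against a base address: per the merge rule in RFC 2396/3986, everything after the last / of the base path is dropped. A small sketch using java.net.URI.resolve follows. The relative reference java is a made-up tag name, and ResolveDemo is an illustrative class name.

```java
import java.net.URI;

// Sketch: the same relative reference resolves differently
// depending on whether the base path ends with a slash.
class ResolveDemo {

    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // "tags/" is a directory: the reference is appended inside it
        System.out.println(resolve("http://vimerzhao.top/tags/", "java")); // http://vimerzhao.top/tags/java
        // "tags" is treated as a file: its last segment is replaced
        System.out.println(resolve("http://vimerzhao.top/tags", "java"));  // http://vimerzhao.top/java
    }
}
```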

Here is an interesting test with two programs:

```java
import java.net.*;
import java.io.*;

public class TestTrailingSlash {

    public static void main(String[] args) {
        try {
            URL urlWithSlash = new URL("http://vimerzhao.top/tags/");
            int cnt = 5;
            long cur = System.currentTimeMillis();
            for (int i = 0; i < cnt; i++) {
                read(urlWithSlash);
            }
            System.out.println(System.currentTimeMillis() - cur);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void read(URL url) {
        try {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream()));
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                //System.out.println(inputLine);
            }
            in.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```



```java
import java.net.*;
import java.io.*;

public class TestWithoutTrailingSlash {

    public static void main(String[] args) {
        try {
            URL urlWithoutSlash = new URL("http://vimerzhao.top/tags");
            int cnt = 5;
            long cur = System.currentTimeMillis();
            for (int i = 0; i < cnt; i++) {
                read(urlWithoutSlash);
            }
            System.out.println(System.currentTimeMillis() - cur);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void read(URL url) {
        try {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream()));
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                //System.out.println(inputLine);
            }
            in.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```

Run them with this script:

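```sh
#!/bin/bash
# {1..20} is bash brace expansion (plain sh would run the loop once);
# >> appends, so all 20 timings are kept instead of only the last one
for i in {1..20}; do
    java TestTrailingSlash >> out1
    java TestWithoutTrailingSlash >> out2
done
```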

I tabulated the measured times.

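The version with the trailing / is faster, most likely because the server skips looking for a tags file and the extra redirect round trip that follows. The lesson: when a URL names a directory, add the trailing /!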
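Switching to URI does not make the trailing slash irrelevant either. URI.create("https://vimerzhao.github.io/") and URI.create("https://vimerzhao.github.io") are not equal, since their paths are "/" and "". One way to follow the advice above is a small normalizing helper. The sketch below is hypothetical (SlashNormalizer and withTrailingSlash are made-up names). It cannot know whether a path is a directory, so apply it only to URLs you know name directories, not to something like RSS.json.

```java
import java.net.URI;

// Hypothetical helper: append a trailing slash to the path of a hierarchical URI.
// The caller must know that the path names a directory.
class SlashNormalizer {

    static URI withTrailingSlash(URI uri) {
        String path = uri.getRawPath();
        if (uri.isOpaque() || path == null || path.endsWith("/")) {
            return uri; // already normalized, or not a hierarchical URI
        }
        // Rebuild from raw components so existing %-escapes are kept as-is
        StringBuilder sb = new StringBuilder();
        if (uri.getScheme() != null) sb.append(uri.getScheme()).append(':');
        if (uri.getRawAuthority() != null) sb.append("//").append(uri.getRawAuthority());
        sb.append(path).append('/'); // "" becomes "/", "/tags" becomes "/tags/"
        if (uri.getRawQuery() != null) sb.append('?').append(uri.getRawQuery());
        if (uri.getRawFragment() != null) sb.append('#').append(uri.getRawFragment());
        return URI.create(sb.toString());
    }

    public static void main(String[] args) {
        URI bare = URI.create("https://vimerzhao.github.io");
        URI slashed = URI.create("https://vimerzhao.github.io/");
        System.out.println(bare.equals(slashed));                    // false
        System.out.println(withTrailingSlash(bare).equals(slashed)); // true
        System.out.println(withTrailingSlash(URI.create("http://vimerzhao.top/tags"))); // http://vimerzhao.top/tags/
    }
}
```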

Those are the pitfalls I ran into this weekend.
