【java源码】解读HashTable类背后的实现细节

HashTable这个类实现了哈希表从key映射到value的数据结构形式。任何非null的对象都可以作为key或者value。

要在hashtable中存储和检索对象，作为key的对象必须实现hashCode、equals方法。

一般来说，默认的加载因子（0.75）提供了一种对于空间、时间消耗比较好的权衡策略。太高的值（指加载因子loadFactor）虽然减少了空间开销但是增加了检索时间，这反应在对hashtable的很多操作中，比如get、put方法。

初始容量的控制也是在空间消耗和rehash操作耗时(该操作耗时较大)二者之间的权衡。如果初始容量大于哈希表的当前最大的条目数除以加载因子，则不会发生rehash。但是，将初始容量设置过高会浪费空间。

如果有大量的数据需要放进hashtable，则选择设置较大的初始容量比它自动rehash更优。

在Java平台v1.2中，这个类被重新安装以实现Map接口，使它成为Java集合框架的成员。与新的集合实现不同，Hashtable是同步的。如果不需要线程安全的实现，建议使用HashMap代替Hashtable。如果想要一个线程安全的高并发实现，那么建议使用java.util.concurrent.ConcurrentHashMap取代了Hashtable。

重要理解：Java中的HashTable数据存储结构

HashTable 是以数组和单向链表结合的存储形式；
存储元素时，key通过hash映射函数得到在HashTable存储数组中的位置；
该位置存放的是hash值一致的单向链表的首元素；
新的元素存储到该位置指向的列表中；
数组存储哈希后的key，哈希值相同，则使用链表解决哈希碰撞，放到链表中。

结构示意图如下：

![](https://images2018.cnblogs.com/blog/1424536/201807/1424536-20180705103313084-111211141.png)

HashTable 的属性

private transient Entry<?,?>[] table; // table 是存储数据的数组。
private transient int count; // 在哈希表中已经存储的数据个数
private int threshold; // 哈希表将会在存储数据个数达到这个值时进行rehash，该值 = (int)容量 * 加载因子。
private float loadFactor; // 加载因子，默认为 0.75。
private transient int modCount = 0; // 哈希表发生结构性改变的次数(如新增、删除操作)，这个字段用于在创建迭代器时发生快速失败（fail-fast）发生ConcurrentModificationException。

HashTable 构造器

HashTable 提供了常见的4个构造器，常见的有三个：

指定初始容量、加载因子的构造器

public Hashtable(int initialCapacity, float loadFactor) {

        if (initialCapacity < 0)

            throw new IllegalArgumentException("Illegal Capacity: "+

                                               initialCapacity);

        if (loadFactor <= 0 || Float.isNaN(loadFactor))

            throw new IllegalArgumentException("Illegal Load: "+loadFactor);

        if (initialCapacity==0)

            initialCapacity = 1;

        this.loadFactor = loadFactor;

        table = new Entry<?,?>[initialCapacity];

        threshold = (int)Math.min(initialCapacity * loadFactor, MAX_ARRAY_SIZE + 1);

    }

指定初始容量的构造器

指定初始容量的构造器，其默认加载因子为0.75

public Hashtable(int initialCapacity) {

        this(initialCapacity, 0.75f);

    }

空构造器

空构造器，内部使用了默认初始容量为11，加载因子为0.75.

public Hashtable() {

        this(11, 0.75f);

    }

HashTable 的哈希函数

HashTable 的哈希函数是将key的hashCode跟最大整数进行按位与，最后对哈希表的存储数组的长度进行取模，以便得到该 key 在存储数组中的索引值index。

获取key的hashCode值
key的hashCode值与最大整数值进行按位后对存储数组的长度进行取模，得到该key在存储数组中的安放位置index。

int hash = key.hashCode();

int index = (hash & 0x7FFFFFFF) % tab.length;

hash & 0x7FFFFFFF 使用该值(Integer的最大值进行按位与) 是为了取得绝对值，按位与高位为0，则保证符号位为正。

0x7FFFFFFF 的二进制编码如下，高位为0：

哈希桶内部存储数据结构类 Entry<K,V>

HashTable 内部存储数组中的链表对象：数据使用一个静态内部类对象存储，Entry<K,V>，该实体类包含四个属性：

final int hash; // 哈希值 key.hashCode()
final K key; // 键
V value; // 值
Entry<K,V> next; //指向存储数组的该位置的单向链表中的下一个元素



private static class Entry<K,V> implements Map.Entry<K,V> {

        final int hash;

        final K key;

        V value;

        Entry<K,V> next;

		//......

}

put(K key, V value) 存储数据到哈希表

HashTable的put(key, value) 方法，可以将非空数据key-value 键值对放到哈希表中。

put 方法获取 key 的 hashCode值，再将其与Integer的最大值进行按位与（保证符号为正），得到哈希值，哈希值再对存储数组长度取模得到存储位置index。

然后遍历该链表，选择合适的位置放入该元素。

index 作为开始在存储数组中的索引值进行匹配，如果index处没有存储数据（即没有进入到for循环中），则直接在此位置上添加该key-value键值对（调用 addEntry(hash, key, value, index); 方法）。

如果在index中已经存储过数据了，又分两种情况：

（1）已经存在的数据的 hash 值和当前要存储的数据的 hash 值相等，且 key 也相等。就表示是数据对象相等，则将旧的数据对象的value设置为此次需要存储的数据对象的value，并返回旧的数据对象的value。
（2）如果不相等且找到链表末尾，则在链尾位置插入数据----进入 addEntry(hash, key, value, index); 方法。



public synchronized V put(K key, V value) {

        // Make sure the value is not null

        if (value == null) {

            throw new NullPointerException();

        }

        // Makes sure the key is not already in the hashtable.

        Entry<?,?> tab[] = table;

        int hash = key.hashCode();  //hash值

        int index = (hash & 0x7FFFFFFF) % tab.length; // 高位按位与操作：取得绝对值，按位与高位为0，则保证符号位为正

        @SuppressWarnings("unchecked")

        Entry<K,V> entry = (Entry<K,V>)tab[index]; // 哈希函数后，得到指定的存储数组元素（是一个单项链表的首元素）

        for(; entry != null ; entry = entry.next) {

			// 遍历链表

            if ((entry.hash == hash) && entry.key.equals(key)) {

				// hash 值相等，且 key值equals，则说明已经存在该ke，则替换旧值

                V old = entry.value;

                entry.value = value;

                return old;

            }

        }

		// 链表为空， 则

        addEntry(hash, key, value, index);

        return null;

    }

addEntry(int hash, K key, V value, int index)

该方法用于存储数据key-value对象。

首先会将 modCount 自增，表示该哈希表会发生结构性改变。

如果哈希表数据个数 count 还没有达到阈值 threshold，则直接添加该key-value到index此位置，同时count自增。

如果 count >= threshold 了，就执行rehash操作（详见rehash方法描述），哈希表扩大，



private void addEntry(int hash, K key, V value, int index) {

        modCount++;//结构性改变

        Entry<?,?> tab[] = table;

        if (count >= threshold) {

            // Rehash the table if the threshold is exceeded

            rehash();

            tab = table;

            hash = key.hashCode();

            index = (hash & 0x7FFFFFFF) % tab.length;

        }

        // Creates the new entry.

        @SuppressWarnings("unchecked")

        Entry<K,V> e = (Entry<K,V>) tab[index];

        tab[index] = new Entry<>(hash, key, value, e);

        count++;

}

rehash()

rehash 操作是哈希表的重要且耗时的操作。

rehash 操作会扩大、重新组织哈希表，newCapacity = (oldCapacity << 1) + 1: 新的容量 = 旧容量 * 2 + 1。

存储数组的最大长度设置为Integer的最大值 - 8。

private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

如果新容量大于存储数组的最大容量：如果旧容量已经等于存储数组的最大容量，则不做任何动作直接返回；否则，新容量置为存储数组的最大容量（已满）。

步骤：

创建一个新的存储数组，容量为新容量newCapacity。

modCount++，结构性变更。

使用新容量更新阈值 threshold。

双层for循环，使用新的容量取模重新计算在新的存储数组中的位置。

protected void rehash() {

        int oldCapacity = table.length;

        Entry<?,?>[] oldMap = table;

        // 设置rehash后的新容量 START...

        int newCapacity = (oldCapacity << 1) + 1; // 新的容量 = 旧容量 * 2 + 1

        if (newCapacity - MAX_ARRAY_SIZE > 0) { //如果新容量大于存储数组的最大容量

            if (oldCapacity == MAX_ARRAY_SIZE)

                // Keep running with MAX_ARRAY_SIZE buckets

                return;

            newCapacity = MAX_ARRAY_SIZE;

        }

        // 设置rehash后的新容量 END.

        Entry<?,?>[] newMap = new Entry<?,?>[newCapacity];

        modCount++;

		// 使用新容量更新阈值

        threshold = (int)Math.min(newCapacity * loadFactor, MAX_ARRAY_SIZE + 1);

        table = newMap;

        for (int i = oldCapacity ; i-- > 0 ;) {

        // 遍历旧的存储数组

            for (Entry<K,V> old = (Entry<K,V>)oldMap[i] ; old != null ; ) {

	            // 如果当前索引位置的链表存在元素，则处理

                Entry<K,V> e = old; // 记录旧的元素

                old = old.next;  // 下一次内层for循环则处理当前元素的下一个元素

				//旧的元素重新根据新容量计算hash

                int index = (e.hash & 0x7FFFFFFF) % newCapacity;

                e.next = (Entry<K,V>)newMap[index];

				//将旧数据放到新表中的新index位置上

                newMap[index] = e;//放置旧元素

            }

        }

        //for循环完成旧表数据堆放到新表中去, 就是一次rehash

    }

get(Object key)

get(Object key) 方法根据指定的key获取其value。该方法通过key的hash值，再进行高位按位与，获取到索引index，然后遍历存储数组，直到找到 hash、key均相等的元素，返回其value。



public synchronized V get(Object key) {

        Entry<?,?> tab[] = table;

        int hash = key.hashCode();

        int index = (hash & 0x7FFFFFFF) % tab.length;

        for (Entry<?,?> e = tab[index] ; e != null ; e = e.next) {

            if ((e.hash == hash) && e.key.equals(key)) {

                return (V)e.value;

            }

        }

        return null;

    }

contains(Object value)

contains(Object value) 判断哈希表的存储数组中是否包含指定的value。会遍历数组，去匹配value，找到一个就返回true。



public synchronized boolean contains(Object value) {

        if (value == null) {

            throw new NullPointerException();

        }

        Entry<?,?> tab[] = table;

        for (int i = tab.length ; i-- > 0 ;) {

            for (Entry<?,?> e = tab[i] ; e != null ; e = e.next) {

                if (e.value.equals(value)) {

                    return true;

                }

            }

        }

        return false;

    }

containsKey(Object key)

containsKey(Object key) 方法用于检查哈希表中是否包含指定的key。



public synchronized boolean containsKey(Object key) {

        Entry<?,?> tab[] = table;

        int hash = key.hashCode();

        int index = (hash & 0x7FFFFFFF) % tab.length;

        for (Entry<?,?> e = tab[index] ; e != null ; e = e.next) {

            if ((e.hash == hash) && e.key.equals(key)) {

                return true;

            }

        }

        return false;

    }

remove(Object key)

remove 操作移除指定的key。会改变 modCount 、count 的值。



public synchronized V remove(Object key) {

        Entry<?,?> tab[] = table;

        int hash = key.hashCode();

        int index = (hash & 0x7FFFFFFF) % tab.length;

        @SuppressWarnings("unchecked")

        Entry<K,V> e = (Entry<K,V>)tab[index];

        for(Entry<K,V> prev = null ; e != null ; prev = e, e = e.next) {

            if ((e.hash == hash) && e.key.equals(key)) {

                modCount++;

                if (prev != null) {

                    prev.next = e.next;

                } else {

                    tab[index] = e.next;

                }

                count--;

                V oldValue = e.value;

                e.value = null;

                return oldValue;

            }

        }

        return null;

    }

clear()

clear() 方法会改变 modCount、count 的值。将存储数组中元素置为 null。



public synchronized void clear() {

        Entry<?,?> tab[] = table;

        modCount++;

        for (int index = tab.length; --index >= 0; )

            tab[index] = null;

        count = 0;

    }

hashCode()

hashTable 的hashCode(0 方法也是有点意思的。

loadFactor = -loadFactor; 这行代码用于防止递归调用hashCode导致堆栈溢出。



public synchronized int hashCode() {

        /*

         * This code detects the recursion caused by computing the hash code

         * of a self-referential hash table and prevents the stack overflow

         * that would otherwise result.  This allows certain 1.1-era

         * applets with self-referential hash tables to work.  This code

         * abuses the loadFactor field to do double-duty as a hashCode

         * in progress flag, so as not to worsen the space performance.

         * A negative load factor indicates that hash code computation is

         * in progress.

         */

        int h = 0;

        if (count == 0 || loadFactor < 0)

            return h;  // Returns zero

        loadFactor = -loadFactor;  // Mark hashCode computation in progress

        Entry<?,?>[] tab = table;

        for (Entry<?,?> entry : tab) {

            while (entry != null) {

                h += entry.hashCode();

                entry = entry.next;

            }

        }

        loadFactor = -loadFactor;  // Mark hashCode computation complete

        return h;

    }