[Source Code] HashMap Source Analysis, Part 1: The Class Comments


Preface

Open the HashMap source and the first thing that greets you is a long wall of comments. Whatever else you say about the JDK, on comments alone it fills the screen. Those comments reveal many of HashMap's details, so let's start there and see what HashMap has to say.
HashMap's inheritance and implementation relationships are as follows.
[Figure: HashMap class hierarchy (hashmapdiagram.png)]

  1. It implements Cloneable, so it can be copied.
  2. It implements Serializable, so it can be serialized and deserialized.
  3. It extends the abstract class AbstractMap, which mostly supplies Iterator-based traversal, e.g. walking the entries for containsKey(), containsValue(), and so on.
  4. It implements the Map interface, so it can store and operate on key-value pairs.
    Compared with Hashtable, HashMap is a big improvement in both operations and efficiency; the Java implementation puts efficiency first, design purity second. It is full of small details worth borrowing. Those details may not carry much weight in our day-to-day work, but they help enormously in understanding how the class is meant to be used. We will probably never write source code quite like this ourselves, but who knows.

In fact, HashMap's class-level comment lays out a summary of all of this. This post walks through that comment and explains why the authors describe the class the way they do.

Details

The Comments

Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

  1. Hashtable is synchronized and safe for multithreaded use: its methods are declared synchronized, so every Hashtable operation is synchronized. HashMap is not, and is therefore not thread-safe.
  2. Hashtable does not allow null keys or null values, as a glance at Hashtable's put() shows (a runnable comparison with HashMap follows this list):
    public synchronized V put(K key, V value) {
        // Make sure the value is not null
        if (value == null) {
            throw new NullPointerException();
        }
        // ..other
        return value;
    }
  3. Neither Hashtable nor HashMap guarantees the order of insertion.
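A minimal sketch of the null-handling difference (the class name NullKeyDemo is mine): HashMap happily stores a null key and null values, while both of Hashtable's null cases throw NullPointerException.

import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        hashMap.put(null, "value-for-null-key"); // allowed: the null key is routed to bucket 0
        hashMap.put("key", null);                // allowed: null values are stored as-is
        System.out.println(hashMap.get(null));   // prints "value-for-null-key"

        Map<String, String> hashtable = new Hashtable<>();
        try {
            hashtable.put(null, "x");            // NPE: Hashtable calls key.hashCode()
        } catch (NullPointerException e) {
            System.out.println("Hashtable rejects null keys");
        }
        try {
            hashtable.put("k", null);            // NPE: the explicit value check shown above
        } catch (NullPointerException e) {
            System.out.println("Hashtable rejects null values");
        }
    }
}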

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

  1. The basic operations (put and get) run in O(1) constant time, assuming the hash function disperses the keys well; only a lookup inside a treeified bin degrades to O(log n).
  2. Iteration performance is inversely related to the capacity (the total number of buckets, not the number of key-value pairs actually stored). HashMap's underlying structure is an array plus linked lists plus red-black trees. An oversized capacity, or a load factor set too low, leaves the table array very long, and traversing the HashMap means traversing that array, as the KeySet implementation below shows. The outer loop makes traversal O(n), where n is the length of table, so the longer the table, the longer iteration takes.
    HashMap's KeySet:
    final class KeySet extends AbstractSet<K> {
        public final void forEach(Consumer<? super K> action) {
            Node<K,V>[] tab;
            if (action == null)
                throw new NullPointerException();
            if (size > 0 && (tab = table) != null) {
                int mc = modCount;
                for (Node<K,V> e : tab) {
                    for (; e != null; e = e.next)
                        action.accept(e.key);
                }
                if (modCount != mc)
                    throw new ConcurrentModificationException();
            }
        }
        // ... other KeySet methods omitted
    }
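To see the cost, a rough, unscientific timing sketch (names are mine; a serious measurement would use JMH): both maps hold the same 16 entries, but iterating the over-provisioned one has to walk roughly a million mostly empty slots.

import java.util.HashMap;
import java.util.Map;

public class IterationCostDemo {
    public static void main(String[] args) {
        Map<Integer, Integer> small = new HashMap<>();          // default capacity 16
        Map<Integer, Integer> huge  = new HashMap<>(1 << 20);   // ~1M buckets, mostly empty
        for (int i = 0; i < 16; i++) {
            small.put(i, i);
            huge.put(i, i);
        }
        System.out.println("small: " + time(small) + " ns");
        System.out.println("huge:  " + time(huge) + " ns");     // much slower despite equal size
    }

    static long time(Map<Integer, Integer> map) {
        long start = System.nanoTime();
        long sum = 0;
        for (int k : map.keySet())                              // walks the whole table array
            sum += k;
        if (sum < 0) System.out.println(sum);                   // keep the loop observable
        return System.nanoTime() - start;
    }
}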

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

  1. HashMap offers this constructor:
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

    /**
     * Returns a power of two size for the given target capacity.
     */
    // returns the smallest power of two >= the given target capacity
    static final int tableSizeFor(int cap) {
        int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

tableSizeFor thus computes threshold as the smallest power of two (2^n) that is not less than initialCapacity: for initialCapacity = 5, threshold becomes 8 (a runnable sketch of this step function follows the resize code below). The step-function graph:
[Figure: threshold as a step function of initialCapacity (tableForThrehold.png)]
2. The comment also brings up resizing. resize() decides between initialization and expansion as follows:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // The table was already initialized; this call is a real expansion.
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            // double threshold: expansion doubles the bucket count,
            // so the threshold doubles with it
            newThr = oldThr << 1;
    }
    else if (oldThr > 0)
        // initial capacity was placed in threshold: first use of a map
        // constructed with an explicit initial capacity (and load factor)
        newCap = oldThr;
    else {
        // zero initial threshold signifies using defaults: first use of a
        // map constructed without an initial capacity or load factor
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // Split the red-black tree. Nodes are divided into a "low"
                    // and a "high" list by whether (hash & oldCap) == 0 (note
                    // that the hash stored in a node is the key's hashCode
                    // XORed with its own high 16 bits). The high list goes to
                    // index (old index + oldCap) in the new table, and a
                    // resulting list of 6 or fewer nodes is converted back
                    // into a plain linked list.
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
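To make the threshold step function concrete, here is a standalone copy of tableSizeFor (the demo class wrapper and the sample inputs are mine) printing the power of two chosen for a few target capacities:

public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same body as HashMap.tableSizeFor quoted above (JDK 11 form).
    static int tableSizeFor(int cap) {
        int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        for (int cap : new int[] {1, 2, 3, 5, 8, 9, 17, 1000})
            System.out.println(cap + " -> " + tableSizeFor(cap));
        // Prints: 1 -> 1, 2 -> 2, 3 -> 4, 5 -> 8, 8 -> 8, 9 -> 16, 17 -> 32, 1000 -> 1024
    }
}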

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

  1. This is easy to understand: every resize costs CPU and memory (a new array is allocated and every node is rehashed into it), so it is best to settle on the right capacity up front and avoid repeated expansion. A pre-sizing sketch follows.
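Following the rule from the class comment (no rehash ever occurs if the initial capacity exceeds the expected number of entries divided by the load factor), a sketch of pre-sizing; the helper name withExpectedSize is mine, and Guava's Maps.newHashMapWithExpectedSize performs essentially the same arithmetic:

import java.util.HashMap;
import java.util.Map;

public class PresizedMap {
    // Choose a capacity such that `expected` entries never cross the threshold:
    // tableSizeFor rounds ceil(expected / 0.75) up to a power of two, whose
    // threshold (capacity * 0.75) is then >= expected.
    static <K, V> Map<K, V> withExpectedSize(int expected) {
        return new HashMap<>((int) Math.ceil(expected / 0.75d));
    }

    public static void main(String[] args) {
        // ceil(1000 / 0.75) = 1334 -> table of 2048, threshold 1536: no resize below.
        Map<String, Integer> m = withExpectedSize(1000);
        for (int i = 0; i < 1000; i++)
            m.put("key" + i, i);
    }
}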

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.

  1. Keys with the same hashCode always land in the same bucket. Once a bucket's list grows past 8 nodes and the table capacity has reached 64, the list is converted into a red-black tree, and that raises an awkward question: a plain linked list can simply be scanned, finding the target key via equals(), but what does the red-black tree order its nodes by (if the hashCodes are equal, the computed hashes are equal too)? The method below is TreeNode's find() implementation. As it shows, traversal searches primarily by comparing hash values, and the tree is likewise ordered by hash.
       final TreeNode<K,V> find(int h, Object k, Class<?> kc) {
            TreeNode<K,V> p = this;
            do {
                int ph, dir; K pk;
                TreeNode<K,V> pl = p.left, pr = p.right, q;
                if ((ph = p.hash) > h)
                    p = pl;
                else if (ph < h)
                    p = pr;
                else if ((pk = p.key) == k || (k != null && k.equals(pk)))
                    return p;
                else if (pl == null)
                    p = pr;
                else if (pr == null)
                    p = pl;
                else if ((kc != null ||
                          (kc = comparableClassFor(k)) != null) &&
                         (dir = compareComparables(kc, k, pk)) != 0)
                    p = (dir < 0) ? pl : pr;
                else if ((q = pr.find(h, k, kc)) != null)
                    return q;
                else
                    p = pl;
            } while (p != null);
            return null;
        }
  2. Continuing that question: if a bucket's list has been treeified but all the hashCodes are equal, what does HashMap order the nodes by? The answer is the key's class name. If the class names differ, they are compared; if they are equal, System.identityHashCode(obj) breaks the tie. System.identityHashCode() is a native method implemented inside the JVM and is equivalent to the default Object.hashCode(): hashCode() can be overridden to return anything, but identityHashCode() returns the identity-based hash regardless of whether hashCode() was overridden.
static int tieBreakOrder(Object a, Object b) {
    int d;
    // When a == null or b == null, or a's class name equals b's class name,
    // fall back to identityHashCode to break the tie.
    if (a == null || b == null ||
        (d = a.getClass().getName().compareTo(b.getClass().getName())) == 0)
        d = (System.identityHashCode(a) <= System.identityHashCode(b) ?
             -1 : 1);
    return d;
}

    /**
     * Returns the same hash code for the given object as
     * would be returned by the default method hashCode(),
     * whether or not the given object's class overrides
     * hashCode().
     * The hash code for the null reference is zero.
     *
     * @param x object for which the hashCode is to be calculated
     * @return  the hashCode
     * @since   JDK1.1
     */
    public static native int identityHashCode(Object x);
  3. As the find() code in item 1 shows, when all the hash values are equal, HashMap checks whether the key's class implements the Comparable interface, and compares the keys if so. When that comparison also ties, or the key is not Comparable, it calls pr.find(): a recursive probe down the right subtree, and if the key is not found there, the loop continues down the left subtree. A demo follows this list.
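A hypothetical sketch tying the three items together (BadKey is mine): every key reports the same hashCode, so all entries pile into a single bucket, which treeifies once it grows past 8 nodes; because BadKey implements Comparable, lookups in the tree can fall back to compareTo instead of degenerating into a linear scan.

import java.util.HashMap;
import java.util.Map;

public class CollidingKeyDemo {
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }   // every key collides
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey other) {   // lets the tree break ties
            return Integer.compare(id, other.id);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++)
            map.put(new BadKey(i), i);                   // one bucket, treeified early on
        System.out.println(map.get(new BadKey(9_999)));  // found via compareTo, ~O(log n)
    }
}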

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:
Map m = Collections.synchronizedMap(new HashMap(...));

  1. We know very well that HashMap is not thread-safe, but the comment offers a remedy: wrap the HashMap with Collections.synchronizedMap, and synchronization is taken care of. Why does that work? Encapsulation. SynchronizedMap's internal Map field is exactly the HashMap you pass in, and every SynchronizedMap operation is guarded by synchronized.
    /**
     * Returns a synchronized (thread-safe) map backed by the specified
     * map.  In order to guarantee serial access, it is critical that
     * <strong>all</strong> access to the backing map is accomplished
     * through the returned map.<p>
     *
     * It is imperative that the user manually synchronize on the returned
     * map when iterating over any of its collection views:
     * <pre>
     *  Map m = Collections.synchronizedMap(new HashMap());
     *      ...
     *  Set s = m.keySet();  // Needn't be in synchronized block
     *      ...
     *  synchronized (m) {  // Synchronizing on m, not s!
     *      Iterator i = s.iterator(); // Must be in synchronized block
     *      while (i.hasNext())
     *          foo(i.next());
     *  }
     * </pre>
     * Failure to follow this advice may result in non-deterministic behavior.
     *
     * <p>The returned map will be serializable if the specified map is
     * serializable.
     *
     * @param <K> the class of the map keys
     * @param <V> the class of the map values
     * @param  m the map to be "wrapped" in a synchronized map.
     * @return a synchronized view of the specified map.
     */
    public static <K,V> Map<K,V> synchronizedMap(Map<K,V> m) {
        return new SynchronizedMap<>(m);
    }
    private static class SynchronizedMap<K,V>
        implements Map<K,V>, Serializable {
        private static final long serialVersionUID = 1978198479659022715L;

        private final Map<K,V> m;     // Backing Map
        final Object mutex;           // Object on which to synchronize

        SynchronizedMap(Map<K,V> m) {
            this.m = Objects.requireNonNull(m);
            mutex = this;
        }

        public V get(Object key) {
            synchronized (mutex) {return m.get(key);}
        }

        public V put(K key, V value) {
            synchronized (mutex) {return m.put(key, value);}
        }
        // ... every other Map method is wrapped the same way
    }
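And the iteration idiom from the quoted Javadoc, rewritten as a small runnable, generics-era sketch:

import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

public class SynchronizedMapDemo {
    public static void main(String[] args) {
        Map<String, Integer> m = Collections.synchronizedMap(new HashMap<>());
        m.put("a", 1);
        m.put("b", 2);

        Set<String> s = m.keySet();            // needn't be in a synchronized block
        synchronized (m) {                     // synchronizing on m, not s!
            Iterator<String> i = s.iterator(); // must be in the synchronized block
            while (i.hasNext())
                System.out.println(i.next());
        }
    }
}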

The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
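A minimal sketch of the fail-fast behavior (the class name FailFastDemo is mine): a structural modification mid-iteration bumps modCount, and the iterator's very next step detects the mismatch and throws.

import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                if ("a".equals(key))
                    map.remove("b");           // structural modification: modCount++
            }
        } catch (ConcurrentModificationException e) {
            System.out.println("fail-fast: " + e); // thrown on the iterator's next step
        }
    }
}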