
A HashMap is a very commonly used collection: a hash-table based implementation of the Map interface that stores data as key-value pairs. In a HashMap, a key-value pair is always treated as a whole; the system computes the storage location of the pair from the key's hash, so both storing and reading are fast. The following is an analysis of how HashMap stores and retrieves its data.

One, Definition

HashMap implements the Map interface and extends AbstractMap. Map defines the mapping from keys to values, and the AbstractMap class provides a skeletal implementation of the Map interface to minimize the work required to implement it; HashMap builds on that skeleton.

public class HashMap<K,V>
    extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable

Two, Constructors

A HashMap provides three constructors:

HashMap(): constructs an empty HashMap with the default initial capacity (16) and the default load factor (0.75).

HashMap(int initialCapacity): constructs an empty HashMap with the specified initial capacity and the default load factor (0.75).

HashMap(int initialCapacity, float loadFactor): constructs an empty HashMap with the specified initial capacity and load factor.

Two parameters appear here: initial capacity and load factor. They are the two parameters that most affect the performance of a HashMap. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the moment the table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. For a hash table that resolves collisions by chaining, the average cost of finding an element is O(1 + a), where a is the load factor: the larger the load factor, the better the space utilization but the slower the lookup; if the load factor is too small, the table becomes very sparse and a lot of space is wasted. The default load factor is 0.75, and in general there is no need to change it.
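As a small illustration of these parameters (the capacity and load-factor values below are arbitrary examples chosen for this sketch, not recommendations), the three constructors can be used like this:

import java.util.HashMap;
import java.util.Map;

public class ConstructorDemo {
    public static void main(String[] args) {
        // Default: initial capacity 16, load factor 0.75
        Map<String, Integer> a = new HashMap<String, Integer>();

        // Custom initial capacity, default load factor 0.75
        Map<String, Integer> b = new HashMap<String, Integer>(64);

        // Custom initial capacity and load factor
        Map<String, Integer> c = new HashMap<String, Integer>(64, 0.5f);

        a.put("one", 1);
        b.put("two", 2);
        c.put("three", 3);
        System.out.println(a + " " + b + " " + c);
    }
}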

A HashMap is a data structure that supports fast access; to understand its performance, we must first understand its underlying data structure.

Three, data structures

We know that the two most basic structures in Java are arrays and simulated pointers (references), and almost all data structures can be built from these two. A HashMap is in fact a chained hash table: an array whose elements are linked lists, as shown below:

[Figure: HashMap data structure diagram (an array whose elements are linked lists)]

From the figure above we can see that the underlying implementation of HashMap is still an array, except that each element of the array is the head of a linked list. The parameter initialCapacity determines the length of that array. The following is the source code of the HashMap constructor:

public HashMap(int initialCapacity, float loadFactor) {
    // The initial capacity must not be negative
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: "
                + initialCapacity);
    // The initial capacity must not exceed the maximum capacity, which is 2^30
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // The load factor must be positive
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: "
                + loadFactor);
    // Compute the smallest power of two that is >= initialCapacity
    int capacity = 1;
    while (capacity < initialCapacity)
        capacity <<= 1;
    this.loadFactor = loadFactor;
    // threshold is the capacity limit; when the HashMap reaches it, the table is resized
    threshold = (int) (capacity * loadFactor);
    // Initialize the table array
    table = new Entry[capacity];
    init();
}
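The power-of-two rounding in this constructor is easy to observe in isolation. The following sketch copies the rounding loop from the source above into a standalone method (the name roundUpToPowerOfTwo is mine, not the JDK's) and prints the table length that a few requested capacities actually produce:

public class CapacityDemo {
    // Same loop as in the constructor above: smallest power of two >= initialCapacity
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    public static void main(String[] args) {
        int[] requested = {1, 15, 16, 17, 20, 100};
        for (int n : requested) {
            // e.g. a requested capacity of 20 ends up as a table of length 32
            System.out.println(n + " -> " + roundUpToPowerOfTwo(n));
        }
    }
}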

As you can see from the source code, a table array is initialized each time a HashMap is created. The elements of the table array are Entry nodes.

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    final int hash;

    /**
     * Creates new entry.
     */
    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }
    // ......
}

Entry is an inner class of HashMap. It contains the key, the value, the hash of the key, and a reference next to the following node. This is important, because Entry is the unit in which HashMap actually stores its key-value pairs, and the next references are what link entries together into chains.

That is a brief analysis of the HashMap data structure; next we explore how HashMap achieves fast storage and retrieval.

Four, storage implementation: put(key, value)

Let's first look at the source code.

public V put(K key, V value) {
    // If key is null, call putForNullKey and store the value in the first
    // slot of table; this is why HashMap allows a null key
    if (key == null)
        return putForNullKey(value);
    // Compute the hash of the key
    int hash = hash(key.hashCode());                      // ------(1)
    // Compute the position of this hash in the table array
    int i = indexFor(hash, table.length);                 // ------(2)
    // Iterate the chain starting at i to find where the key is stored
    for (Entry<K, V> e = table[i]; e != null; e = e.next) {
        Object k;
        // If an entry on this chain has the same hash and an equal key,
        // overwrite its value and return the old value
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue; // return the old value
        }
    }
    // Increase the modification count
    modCount++;
    // Add the key and value at position i
    addEntry(hash, key, value, i);
    return null;
}

From the source code we can clearly see the process of storing data in a HashMap. First, check whether key is null; if it is, call putForNullKey directly. Otherwise, compute the hash of the key and use it to find the index in the table array. If there is already an element at that position, iterate over its chain: if an entry with an equal key exists, overwrite its value and return the old value; otherwise insert the new element at the head of the chain (so the element saved first ends up at the end of the chain). If there is no element at that position, store the new entry directly.
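Before looking at the finer points, here is a small usage sketch of the behaviour just described. It relies on the facts that a null key is accepted, that putting an existing key returns the old value, and that the strings "Aa" and "BB" happen to have the same hashCode in Java, so they land in the same bucket and form a chain while remaining individually retrievable:

import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<String, String>();

        // A null key is allowed; it is handled by putForNullKey
        map.put(null, "null-key value");

        // The first put of a key returns null; a second put returns the old value
        System.out.println(map.put("k", "v1"));   // null
        System.out.println(map.put("k", "v2"));   // v1 (value overwritten)

        // "Aa" and "BB" have the same hashCode (2112), so they share a bucket
        // and are chained together, yet both stay accessible
        map.put("Aa", "first");
        map.put("BB", "second");
        System.out.println(map.get("Aa") + " / " + map.get("BB"));
    }
}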

This process looks simple, but a few points deserve closer attention. 1, the iteration first. The reason for iterating the chain is to check whether the same key already exists: if an entry with the same hash and an equal key is found, HashMap simply replaces its value with the new one.

2, lines (1) and (2). This is the essence of HashMap. First comes the hash method, a purely mathematical computation that further scrambles the bits of the hash value h:

static int hash(int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

We want the data to be distributed uniformly across the table (ideally one element per bucket, so that any element can be found directly), neither too crowded nor too sparse: crowding makes queries slow, sparseness wastes space. How do we make the distribution of table elements uniform? The first idea is the modulo operation, but a plain modulo is relatively expensive, so HashMap does it differently: it calls the indexFor method.

static int indexFor(int h, int length) {
    return h & (length - 1);
}

The length of the underlying array of a HashMap is always a power of two; the constructor guarantees this with capacity <<= 1. When length is a power of two, h & (length - 1) is equivalent to taking h modulo length, and it is much faster than a direct modulo, which is one of HashMap's speed optimizations. As for why the length must be a power of two, we will see shortly.
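The modulo equivalence is easy to verify. Below is a minimal sketch with indexFor copied from the source above (the class name and the sample values are only for illustration; the equivalence holds for non-negative h):

public class IndexForDemo {
    // Copied from the HashMap source shown above
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // a power of two, as the constructor guarantees
        for (int h : new int[]{5, 6, 7, 21, 1000003}) {
            // For non-negative h and a power-of-two length,
            // h & (length - 1) gives the same result as h % length
            System.out.println(h + ": " + indexFor(h, length) + " == " + (h % length));
        }
    }
}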

Let's return to the indexFor method. It has only one statement, h & (length - 1), and besides replacing the modulo operation described above it has another important responsibility: to spread the data across the table and make full use of the space.

Assume length is 16 (a power of two) and 15, and h is 5, 6 and 7.

[Table: h & (length - 1) for h = 5, 6, 7 with length = 16 and length = 15]

With length = 15, the results for h = 6 and h = 7 are the same, which means they map to the same position in the table: a collision. 6 and 7 then form a linked list, which slows down queries. Only three numbers are analyzed here, so let's look at h from 0 to 15.

[Table: h & (length - 1) for h = 0 to 15 with length = 16 and length = 15]

From the table above we can see that with length = 15 there are 8 collisions, and the waste of space is serious: positions 1, 3, 5, 7, 9, 11, 13 and 15 never store any data. This is because h is ANDed with 14 (binary 1110), so the lowest bit of the result is always 0 and the odd indexes 0001, 0011, 0101, 0111, 1001, 1011, 1101 and 1111 can never be produced; the remaining buckets receive more collisions, which slows down queries. With length = 16, length - 1 is 15 (binary 1111), so the AND simply keeps the low bits of h and the result is h's low four bits. So when length is a power of two, the probability that different hash values collide is relatively small, and the data is distributed evenly across the table array.
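The distribution argument can be checked directly. The sketch below prints h & (length - 1) for h from 0 to 15 with length 15 and length 16 (a standalone illustration, not HashMap code):

public class DistributionDemo {
    public static void main(String[] args) {
        for (int length : new int[]{15, 16}) {
            System.out.print("length = " + length + ": ");
            for (int h = 0; h <= 15; h++) {
                // bucket index as indexFor would compute it
                System.out.print((h & (length - 1)) + " ");
            }
            System.out.println();
        }
        // With length 15 only even indexes appear (0, 0, 2, 2, ..., 14, 14),
        // with length 16 every index 0..15 is used exactly once.
    }
}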

Let's review the put process. When we add a key-value pair to a HashMap, the system first computes the hash of the key and then uses that hash to determine the position in the table. If there is no element at that position, the new entry is inserted directly. Otherwise, the chain at that position is iterated and each entry's hash and key are compared; if an entry with an equal hash and an equal key is found (e.hash == hash && ((k = e.key) == key || key.equals(k))), its value is overwritten with the new value. If the hashes are equal but the keys are not, the new node is inserted at the head of the chain. The concrete implementation is in the addEntry method, as follows:

void addEntry(int hash, K key, V value, int bucketIndex) {
    // Get the existing Entry at bucketIndex
    Entry<K, V> e = table[bucketIndex];
    // Put the new Entry at bucketIndex and let it point to the old Entry
    table[bucketIndex] = new Entry<K, V>(hash, key, value, e);
    // If the number of elements exceeds the threshold, double the capacity
    if (size++ >= threshold)
        resize(2 * table.length);
}

There are two points in this method:

One, the Entry chain. This is a very elegant design. The system always puts the new Entry at bucketIndex; if there is already an Entry at bucketIndex, the newly added Entry points to the original Entry, forming an Entry chain, but if there is no Entry at bucketIndex, the new Entry's next reference is null and no chain is formed.

Two, expansion problems.

As the number of elements in a HashMap grows, the probability of collisions grows with it, the chains get longer, and queries slow down, so the table has to be expanded. The critical point is reached when the number of elements equals table length * load factor. Expansion is a very time-consuming operation, because the position of every element must be recomputed for the new table array and the data copied over. So if we can predict how many elements a HashMap will hold, setting a suitable initial capacity instead of relying on the default can effectively improve the HashMap's performance.
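As a hedged sketch of that advice: if the number of entries is known in advance, choosing an initial capacity such that the element count never reaches capacity * loadFactor avoids any resize while filling the map (the sizing formula below is a common rule of thumb, not something taken from the HashMap source):

import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    public static void main(String[] args) {
        int expectedEntries = 10000;
        float loadFactor = 0.75f;

        // Choose an initial capacity large enough that expectedEntries
        // never crosses capacity * loadFactor, so the table is never
        // resized while it is being filled
        int initialCapacity = (int) (expectedEntries / loadFactor) + 1;
        Map<Integer, String> map = new HashMap<Integer, String>(initialCapacity, loadFactor);

        for (int i = 0; i < expectedEntries; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println("size = " + map.size());
    }
}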

Five, read implementation: get(key)

Reading is relatively simple compared with storing: use the key's hash to find the Entry at the corresponding index of the table array, then walk the chain and return the value that belongs to the key.

public V get(Object key) {
    // If key is null, call getForNullKey to return the corresponding value
    if (key == null)
        return getForNullKey();
    // Compute the hash from the key's hashCode
    int hash = hash(key.hashCode());
    // Walk the chain at the computed index of the table array
    for (Entry<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
        Object k;
        // If an entry with the same hash and an equal key is found, return its value
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}

Being able to retrieve a value quickly from a key depends not only on the HashMap's data structure but also on Entry. As mentioned above, a HashMap does not store the key and the value separately: it stores them together as a whole, the Entry object, and the value is essentially an attachment to the key. During storage the system determines the Entry's position in the table array from the key's hashCode, and during retrieval it takes the corresponding Entry object from that position in the same way.
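Since each pair lives in a single Entry object, the pairs can also be read back as Map.Entry instances through entrySet(); a short usage sketch:

import java.util.HashMap;
import java.util.Map;

public class EntryDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("one", 1);
        map.put("two", 2);

        // get(key) walks the bucket chain and returns the matching Entry's value
        System.out.println(map.get("two"));

        // Each pair is stored as one Entry object; entrySet() exposes them directly
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}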





