There are two critical aspects to designing a hash table: one is the hash function; the other is handling hash collisions, which in the worst case turn a hash table into a linked list if the chaining approach is used.
There are several ways to build a hash table without chained collisions or a pre-defined bucket size, and the cuckoo hash table is one of them. The cuckoo is a bird that kicks the host's eggs out of the nest and puts its own in. See this link: http://www.nwf.org/news-and-magazines/national-wildlife/birds/archives/1997/bullies-of-the-bird-world.aspx
Cuckoo hashing does exactly the same: when both of a key's candidate slots are occupied, the new key kicks out the resident key, which is then reinserted into its own alternate slot. There is an interesting paper by a group of computer scientists from CMU, "Cuckoo Filter: Practically Better Than Bloom", in which they develop a cuckoo-based filter that performs better than a Bloom filter.
Below is my untested version of a cuckoo hash.
import java.util.Random;
public class CuckooHash {
    //the buckets
    private Entry[] cache;
    //the number of buckets
    private int capacity;
    //the number of filled buckets
    private int size;
    //the maximum ratio of size over capacity before resizing
    private final float loadFactor;
    //the two hash functions used to pick a key's candidate slots
    private final IntHashFuncI[] hashFunctions;
    public CuckooHash(int iniCapacity, float loadfactor) {
        this(iniCapacity, loadfactor, new IntHashFuncI[]{new HashFunc(1), new HashFunc(2)});
    }

    public CuckooHash() {
        this(16, 0.75f, new IntHashFuncI[]{new HashFunc(1), new HashFunc(2)});
    }

    public CuckooHash(int iniCapacity, float loadfactor, IntHashFuncI[] hashFuncs) {
        cache = new Entry[iniCapacity];
        loadFactor = loadfactor;
        capacity = iniCapacity;
        hashFunctions = hashFuncs;
    }
    //look up a key in its two candidate slots
    public Object get(int id) {
        for (int i = 0; i < 2; i++) {
            int index = hashFunctions[i].hash(id, capacity);
            if (cache[index] != null && cache[index].id == id)
                return cache[index].t;
        }
        return null;
    }
    //try to place the key directly into one of its two candidate slots
    private boolean tryDirectPut(int id, Object t) {
        for (int i = 0; i < 2; i++) {
            int index = hashFunctions[i].hash(id, capacity);
            if (cache[index] == null) {
                cache[index] = new Entry(id, t);
                this.size++;
                return true;
            }
        }
        return false;
    }
    //grow the table once the load factor is exceeded
    private void ensureCapacity(int id) {
        if (this.size >= this.loadFactor * this.capacity) {
            System.out.format("ensureCapacity %d, %d %d %f%n", id, this.size, this.capacity, this.loadFactor);
            rehash(this.capacity << 1);
        }
    }
    //insert a key, evicting residents to their alternate slot when needed
    public void put(int id, Object t) {
        ensureCapacity(id);
        if (tryDirectPut(id, t))
            return;
        //start the cuckoo bullying process
        Entry insert = new Entry(id, t);
        Entry current = insert;
        int counter = 0;
        int index = hashFunctions[0].hash(id, capacity);
        while (counter++ < this.capacity) {
            if (cache[index] == null) {
                cache[index] = current;
                size++;
                return;
            }
            //kick out the resident entry and carry it to its other slot
            Entry temp = cache[index];
            cache[index] = current;
            current = temp;
            if (index == hashFunctions[0].hash(current.id, capacity))
                index = hashFunctions[1].hash(current.id, capacity);
            else
                index = hashFunctions[0].hash(current.id, capacity);
        }
        //still homeless after capacity evictions: assume a cycle, grow and retry
        rehash(this.capacity << 1);
        put(current.id, current.t);
    }
    //double the hash table size and reinsert every entry
    private void rehash(int newSize) {
        System.out.println("rehash to " + newSize);
        this.capacity = newSize;
        Entry[] oldCache = cache;
        cache = new Entry[newSize];
        this.size = 0; //put() re-counts each reinserted entry
        for (Entry e : oldCache) {
            if (e != null)
                this.put(e.id, e.t);
        }
        System.out.println("rehash and size is " + this.size);
    }
    //hash table entry for an int primitive key
    private class Entry {
        private final int id;
        private final Object t;

        public Entry(int id, Object t) {
            this.id = id;
            this.t = t;
        }
    }
    //pseudo-random hash function: seed a Random with the key and take the
    //(round+1)-th value, so HashFunc(1) and HashFunc(2) behave as two
    //independent hash functions
    private static class HashFunc implements IntHashFuncI {
        private static final Random GENERATOR = new Random();
        private final int round;

        HashFunc(int loop) {
            round = loop;
        }

        public int hash(int key, int range) {
            GENERATOR.setSeed(key);
            int hash = GENERATOR.nextInt(range);
            for (int i = 0; i < this.round; i++)
                hash = GENERATOR.nextInt(range);
            return hash;
        }
    }
    private static interface IntHashFuncI {
        public int hash(int key, int range);
    }
}
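Here is a minimal, untested sketch of how the class above would be used:

public class CuckooHashDemo {
    public static void main(String[] args) {
        CuckooHash table = new CuckooHash(); //16 buckets, 0.75 load factor
        for (int id = 1; id <= 100; id++)
            table.put(id, "value-" + id); //triggers several rehashes on the way
        System.out.println(table.get(42)); //prints value-42
        System.out.println(table.get(1000)); //prints null
    }
}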
It looks pretty cool. As I read, cuckoo hashing has amortized time; queries would be fast. It would be cooler if you used a generic type for this class.
I'm fighting with the Spring Batch framework. It has nothing to do with algorithms… oops...
I am working on a generic type version and a thread-safe version as well.
I mean amortized time for insertion.
When you resize, for example, from capacity 10 to capacity 20, will this lead to an O(n) operation?
I did a quick reading; it is said that add() takes amortized O(1) time even if it starts with capacity 1.
http://www.cs.cornell.edu/courses/cs312/2008sp/lectures/lec20.html
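A quick sketch of why doubling gives amortized O(1) insertion: starting from capacity 1, the rehashes triggered by n insertions copy 1 + 2 + 4 + ... + n/2 < n entries in total, so the copying adds less than one extra move per insertion on average, even though any single resize is O(n).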
Another question: what if we want to save Long-typed data to the hash table? The hash codes you (and Java) wrote are ints. How can we store Long keys with a small hash code?
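One common approach, for what it's worth, is to fold the 64-bit key down to 32 bits before hashing; the formula below is the same one java.lang.Long.hashCode() uses (the helper name is just for illustration):

    //fold a 64-bit key into a 32-bit hash input; same formula as Long.hashCode()
    static int foldLongKey(long key) {
        return (int) (key ^ (key >>> 32));
    }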
I have another question. Since put() and rehash() call each other, is there a chance that they would call each other forever?
It doesn't happen, since rehash() doubles the capacity and puts the keys in different slots. Another improvement is to add a fixed-size list called a stash: if a key cannot be inserted, put it in the stash.
A further improvement is to use three hash functions, and fixed-size buckets that allow two keys per bucket.
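To illustrate the stash idea from the replies above, a lookup would simply fall back to scanning the small fixed-size array after checking the candidate slots. This is only a hypothetical sketch: it assumes the CuckooHash class gains an Entry[] stash field, which the code above does not have.

    //hypothetical: lookup that falls back to a small fixed-size stash
    Object getWithStash(int id) {
        for (IntHashFuncI f : hashFunctions) {
            int index = f.hash(id, capacity);
            if (cache[index] != null && cache[index].id == id)
                return cache[index].t;
        }
        for (Entry e : stash) { //stash: assumed small Entry[] field
            if (e != null && e.id == id)
                return e.t;
        }
        return null;
    }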