Friday, March 27, 2015

Cuckoo hash table

As we all know search in a list is O(n), search in a binary search tree is O(logN), and hash is O(1).

There are two critical aspects for designing hash, one is the hash function; the other is the hash collision, which in worse case will turn a hash table into linked list, if the chaining approach is used in constructing hash.

There are several ways to build hash table without any collisions or pre-defined bucket size. Cuckoo hash table is one of them. Cuckoo is one kind of birds who kicks off the host's egg and put its own in. See this link http://www.nwf.org/news-and-magazines/national-wildlife/birds/archives/1997/bullies-of-the-bird-world.aspx

Cuckoo hash do exactly the same, there is one interesting paper written by a group of computer scientists from CMU, Cuckoo Filter: Practically Better Than Bloom, they developed a cuckoo algorithm which performs better than bloom function.

The below is my untested version of cuckoo hash.

import java.util.Collections;
import java.util.Random;


public class CuckooHash {
//the buckets
private Entry[] cache;
//the size of the buckets
private int capacity;
//the number of filled buckets
private int size;
//the maximum ratio of size over capacity until resizing
private final float loadFactor;
private final transient IntHashFuncI[] hashFunctions;
public CuckooHash(int iniCapacity, float loadfactor){
this(iniCapacity, loadfactor, new IntHashFuncI[]{new HashFunc(1) , new HashFunc(2)});
}
CuckooHash() {
this(16, 0.75f, new IntHashFuncI[]{new HashFunc(1) , new HashFunc(2)});
}
private CuckooHash(int iniCapacity, float loadfactor, IntHashFuncI[] hashFuncs){
cache = new Entry[iniCapacity];
loadFactor = loadfactor;
capacity = iniCapacity;
hashFunctions = hashFuncs;

}
public Object get(int id){

for(int i=0; i<2; i++){
int index = hashFunctions[i].hash(id, capacity);
if(cache[index]!=null && cache[index].id == id)
return cache[index].t;
}
return null;
}
boolean insert(int id, Object tags){
for(int i=0; i<2; i++){
int index = hashFunctions[i].hash(id, capacity);
if(cache[index]==null){
cache[index] = new Entry(id, t);
this.size++;
return true;
}
}
return false;
}
private void ensureCapacity (int id) {
if(this.size>=this.loadFactor*this.capacity){
System.out.format("ensureCapacity %d, %d %d %f", id, this.size, this.capacitythis.loadFactor);
rehash(this.capacity<<1);
}
}
 
void put(int id, Object t){
ensureCapacity(id);
if(this.insert(id, t))
return;
//start the cuckoo bullying process
Entry insert = new Entry(id, t);
Entry current = insert;
int counter = 0;
int index = hashFunctions[0].hash(id, capacity);
while(counter++<this.capacity || current!=insert ){
if(cache[index]==null){
cache[index] = current;
size++;
return;
}
Entry temp = cache[index];
cache[index] = current;
current = temp;
if(index == hashFunctions[0].hash(current.id, capacity))
index = hashFunctions[1].hash(current.id, capacity);
else
index = hashFunctions[0].hash(current.id, capacity);
}
rehash(this.capacity<<1);
put(id, t);
}
//double hash table size
private void rehash(int newSize) {
System.out.println("rehash to " + newSize);
int temp = this.size;
this.capacity = newSize;
Entry[] oldCache = cache;
cache = new Entry[newSize];
for(Entry e : oldCache){
if(e!=null)
this.put(e.id, e.tags);
}
this.size = temp;
System.out.println("rehash and size is " + this.size);
}

//int primitive HashMap entry
private class Entry {
private final int id;
private final Object t;
public Entry(int id, Object t) {
super();
this.id = id;
this.t = t;
}
}
static class HashFunc implements IntHashFuncI {
private static final Random GENERATOR = new Random();
private int round;
HashFunc(int loop){
round = loop;
}
public int hash(int key, int range){
GENERATOR.setSeed(key);
int hash = GENERATOR.nextInt(range);
for(int i=0; i<this.round; i++)
hash = GENERATOR.nextInt(range);
return hash;
}
}
private static interface IntHashFuncI {
public int hash(int key, int range);
}

}

Thursday, March 5, 2015

In-place rearrangement - Index-based sort - parking lot problem

Image we have a parking lot with 100 spots, there are 99 cars in each spot, and given one empty spot and array of random unique integers between 0 and 98, e.g. 1, 3, 5, 0, 2, 4.., we want to move cars at index i to array[i].  We can only move car from one spot to another spot, can't park car in any places except those designated spots.

The above similar is analogous to a problem we could face in sorting application, sometimes we have to rearrange the blocks in a disk so we can scan the blocks repeatedly in certain order.  If the disk is full, then we have to rearrange the blocks in-place. Loading one block into memory is to park one car in the extra spot.

This issue is also related to index-based sort problem, that is if the data sort we need to sort is big, in stead of sorting the data, we index the record then sort the indices in stead.

The basic in-place array rearrangement operation is called rotation. We can define a rotation operation at index i like this:

input: array[i], 3, 2, 1, n-1, ..., 0

int startPos = i;

int next_value = array[startPos];

while( startPos ! = next_value ){
       int next_index = next_value;
       next_value = array[next_index];

       array[next_index] = startPos;
       startPos  = next_index;

       if(next_index == i)
              break;
}

One thing we need to improve above is to label the item which we already process, we can simply do negation operation on it. Here is the code:


public static void circleArrange(int[] array){

for(int i=0; i<array.length; i++){
if(array[i]<0) //negative item is already moved
continue;
int startPos = i;
int next_value = array[startPos];
while( startPos != next_value ){

int next_index = next_value;
next_value = array[next_index];
array[next_index] = (startPos + 1) * -1; //mark it negative 
startPos  = next_index;
if(next_index == i)
break;
}
}

for(int i = 0; i< array.length; i++){
if(array[i]<0)
array[i] = array[i]*-1 -1;
}
}


Sunday, March 1, 2015

Circular Sort: an improved insertion sort algorithm

Insertion sort is great, it is in place, online, and stable, usually faster than selection sort and bubble sort.
Selection sort is the one of the sort algorithm using the least number of swaps, which is usually when swap in memory operation is expensive such as in flash.

In reality if the input is partially sorted or input size is small, insertion sort is quicker than quick sort.
When the input size is small, some of the fast sorting algorithms implementation, e.g. in Java, when using quick sort, heap sort, merge sort will fall back to insertion sort. The problem with insertion sort is the numbers of potential moves or shifts in array could be O(n^2), e.g. if the array is in reverse order.

Circular sort is an improved version of insertion sort which using circular buffer or circular array to solve the insertion sort problem.

I strongly recommend you to read the wiki page about circular buffer (cb), http://en.wikipedia.org/wiki/Circular_buffer.

A simple CB implementation involves a start, end pointer and buffer size parameter. An elegant CB implementation using mirroring algorithm.

Here is the code doing circular sort, which essentially an insertion sort, but can append to front and end in O(1) time thanks to the circular buffer.


public static int[] circularSort(int[] nums){
int len = nums.length;
int[] cb = new int[len];
cb[len/2] = nums[0];
int start = len/2, end = len/2;
for(int i=1; i<len; i++){
//if nums[i] is the smallest, append to the front
if(nums[i]<=cb[start]){
start = start == 0? len-1:start-1;
cb[start] = nums[i];
//if nums[i] is the biggest, append to the end
}else if(nums[i]>=cb[end]){
end = end == len-1? 0 : end+1;
cb[end] = nums[i];
}else{
//if nums[i] falls into first half
if(nums[i]<cb[len/2]){
start = start == 0? len-1:start-1;
int p = start;
//shift the first half to left circularly 
while(cb[p]<nums[i]){
int next = p-1>=0? p-1: len-1;
cb[next] = cb[p];
p++;
}
cb[p] = nums[i];
//if nums[i] falls into second half
}else{
end = end == len-1? 0 : end+1;
int p = end;
//shift the second half to right circularly
while(cb[p]>nums[i]){
int next = p+1<len? p+1: 0;
cb[next] = cb[p];
p--;
}
cb[p] = nums[i];
}
}
}
return cb;
}

Algorithm analysis: 
The best case is when the input is already in increasing or decreasing order, because only append or prepend operations are required.  O(N).

For random order input data is handled much better than Insertion Sort, because the number of moves is less. 

Potential improvements or modifications:
1. using binary search to find the position for nuns[i] in either half of the buffer
2. using bigger buffer like 2*N-1 so we don't need to wrap-around operation in circular buffer.
3. doing the sort in place which is the cycle sort implementation.