Wednesday, May 27, 2015

Reservoir sampling

Reservoir sampling is algorithm to randomly choose m samples from a large sample pool. The algorithm is as following: Supposed index starts from 0 1. choose first m sample from the pool 2. for each sample after that, e.g. index at k, randomly choose number i in the range of [0, k] if(i<=m) swap number at i and k Output the numbers at index from 0 to m.


public int[] ReservoirSampling(int[] input, int m) {
if(input.length<=m)
return input;

int[] output = new int[m];
for(int i = 0; i<input.length; i++){
if(i<m)
output[i] = input[i];
else{
int pick = (int) (Math.random()*(i+1));
if(pick<m){
output[pick] = input[i];
}
}
}

return output;
}

Proof by induction:

If at step i-1, the probability of number in the list is m/i-1

The at step i, 

the probability of any number in the output remains is probability of number in step i-1 and
probability it survives the step 1: m/(i-1) * i-1/i = m/i, 

the probability of the number at i being chosen is m/i.

Thank you for my friend Peng Li for presenting this problem. Please see his link on this: http://allenlipeng47.com/PersonalPage/index/view/161/nkey

Friday, May 22, 2015

Rabin Karp Algorithm

There are couple string match algorithms which can achieve linear time to do the search in stead of O(n*m). Suffix tree algorithm requires to pre-compute the suffix tree and requires O(K*N) space, K is the alphabet size and N is the maximum length of the strings. KPM also needs to pre-compute a table of  failure function to quickly find the next starting positions in two string. Rabin Karp algorithm is probably the simplest one, which use hash to compare two strings.

The key to Rabin Karp algorithm is to compute the hash effectively and quickly. This is done by dynamic programming, which is current substring hash value is computed based on the previous one. 

Here is the Rabin Karp algorithm to solve an online algorithm for palindrome checking.

Thanks to Peng Li for bringing up this problem. Here is the link to his page on this: http://allenlipeng47.com/PersonalPage/index/view/157/nkey

Tuesday, May 12, 2015

Merge sort linked list

Merge sort, compared to quick sort has the best and worst time complexity of O(NLgN) and requires O(N) space.

However for linked list, merge sort doesn't require additional space.