Wednesday, May 27, 2015

Reservoir sampling

Reservoir sampling is algorithm to randomly choose m samples from a large sample pool. The algorithm is as following: Supposed index starts from 0 1. choose first m sample from the pool 2. for each sample after that, e.g. index at k, randomly choose number i in the range of [0, k] if(i<=m) swap number at i and k Output the numbers at index from 0 to m.


public int[] ReservoirSampling(int[] input, int m) {
if(input.length<=m)
return input;

int[] output = new int[m];
for(int i = 0; i<input.length; i++){
if(i<m)
output[i] = input[i];
else{
int pick = (int) (Math.random()*(i+1));
if(pick<m){
output[pick] = input[i];
}
}
}

return output;
}

Proof by induction:

If at step i-1, the probability of number in the list is m/i-1

The at step i, 

the probability of any number in the output remains is probability of number in step i-1 and
probability it survives the step 1: m/(i-1) * i-1/i = m/i, 

the probability of the number at i being chosen is m/i.

Thank you for my friend Peng Li for presenting this problem. Please see his link on this: http://allenlipeng47.com/PersonalPage/index/view/161/nkey

Friday, May 22, 2015

Rabin Karp Algorithm

There are couple string match algorithms which can achieve linear time to do the search in stead of O(n*m). Suffix tree algorithm requires to pre-compute the suffix tree and requires O(K*N) space, K is the alphabet size and N is the maximum length of the strings. KPM also needs to pre-compute a table of  failure function to quickly find the next starting positions in two string. Rabin Karp algorithm is probably the simplest one, which use hash to compare two strings.

The key to Rabin Karp algorithm is to compute the hash effectively and quickly. This is done by dynamic programming, which is current substring hash value is computed based on the previous one. 

Here is the Rabin Karp algorithm to solve an online algorithm for palindrome checking.

package org.blueocean;
public class RabinKarpString {
static final int prime = 101;
static final int base = 103;
public static void rabinKarp(String s){
assert(s!=null && !s.isEmpty());
long leftHash = s.charAt(0);
long rightHash = s.charAt(0);
System.out.println("Yes");
int h = 1;
for(int i = 1; i < s.length(); i++){
h = (h*prime)%base;
leftHash = (leftHash*prime + s.charAt(i))%base;
rightHash = ( (long) (s.charAt(i)*h + rightHash))%base;
if(leftHash == rightHash){
isPali(s, i);
}else
System.out.println("No");
}
}
public static void isPali(String s, int i){
for(int l=0,r=i; l<=r; l++, r--){
if(s.charAt(l) != s.charAt(r)){
System.out.println("No");
}
}
System.out.println("Yes");
}
}
view raw gistfile1.txt hosted with ❤ by GitHub
Thanks to Peng Li for bringing up this problem. Here is the link to his page on this: http://allenlipeng47.com/PersonalPage/index/view/157/nkey

Tuesday, May 12, 2015

Merge sort linked list

Merge sort, compared to quick sort has the best and worst time complexity of O(NLgN) and requires O(N) space.

However for linked list, merge sort doesn't require additional space.

public class SortQ {
public static class Node {
int d;
Node next;
}
public static Node merge_sort2(Node head){
assert(head!=null);
if(head.next == null)
return head;
Node fast = head, slow = head;
while(fast!=null && fast.next !=null){
fast = fast.next.next;
slow = slow.next;
}
Node right;
if(slow.next!=null){
right = slow.next;
//cut half
slow.next = null;
}else{
right = head.next;
head.next = null;
}
Node sortLeft = merge_sort2(head).next;
Node sortRight = merge_sort2(right).next;
//create fake head
Node ret = new Node();
Node p = ret;
while(sortLeft!=null && sortRight!=null){
if(sortLeft.d > sortRight.d){
p.next = sortLeft;
sortLeft = sortLeft.next;
}else{
p.next = sortRight;
sortRight = sortRight.next;
}
p = p.next;
}
if(sortLeft!=null)
p.next = sortLeft;
if(sortRight!=null)
p.next = sortRight;
return ret;
}
public static Node merge_sort(Node head){
int size = 0;
Node current = head;
while(current!=null)
size++;
return merge_sort(head, 0, size-1);
}
public static Node merge_sort(Node head, int start, int end){
if(start==end)
return head;
int mid = (start+end)>>>1;
Node left = merge_sort(head, start, mid);
Node current = head;
int counter = 0;
while(counter<=mid-start)
current = current.next;
Node right = merge_sort(current, mid+1, end);
int leftC = 0, rightC =0;
Node result;
if(left.d > right.d){
result = left;
left = left.next;
leftC++;
}
else {
result = right;
right = right.next;
rightC++;
}
Node p = result;
while(leftC < mid - start + 1 && rightC < end - mid ){
if(left.d > right.d){
p.next = left;
left = left.next;
leftC++;
}else{
rightC++;
p.next = right;
right = right.next;
}
p = p.next;
}
if(leftC++ < mid - start + 1)
p.next = left;
if(rightC++ < end - mid)
p.next = right;
return head;
}
}
view raw gistfile1.java hosted with ❤ by GitHub