Reservoir sampling is algorithm to randomly choose m samples from a large sample pool.
The algorithm is as following:
Supposed index starts from 0
1. choose first m sample from the pool
2. for each sample after that, e.g. index at k, randomly choose number i in the range of [0, k]
if(i<=m) swap number at i and k
Output the numbers at index from 0 to m.
public int[] ReservoirSampling(int[] input, int m) {
if(input.length<=m)
return input;
int[] output = new int[m];
for(int i = 0; i<input.length; i++){
if(i<m)
output[i] = input[i];
else{
int pick = (int) (Math.random()*(i+1));
if(pick<m){
output[pick] = input[i];
}
}
}
return output;
}
Proof by induction:
If at step i-1, the probability of number in the list is m/i-1
The at step i,
the probability of any number in the output remains is probability of number in step i-1 and
probability it survives the step 1: m/(i-1) * i-1/i = m/i,
the probability of the number at i being chosen is m/i.
Thank you for my friend Peng Li for presenting this problem. Please see his link on this: http://allenlipeng47.com/PersonalPage/index/view/161/nkey
Wednesday, May 27, 2015
Friday, May 22, 2015
Rabin Karp Algorithm
There are couple string match algorithms which can achieve linear time to do the search in stead of O(n*m). Suffix tree algorithm requires to pre-compute the suffix tree and requires O(K*N) space, K is the alphabet size and N is the maximum length of the strings. KPM also needs to pre-compute a table of failure function to quickly find the next starting positions in two string. Rabin Karp algorithm is probably the simplest one, which use hash to compare two strings.
The key to Rabin Karp algorithm is to compute the hash effectively and quickly. This is done by dynamic programming, which is current substring hash value is computed based on the previous one.
Here is the Rabin Karp algorithm to solve an online algorithm for palindrome checking.
Thanks to Peng Li for bringing up this problem. Here is the link to his page on this: http://allenlipeng47.com/PersonalPage/index/view/157/nkey
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
package org.blueocean; | |
public class RabinKarpString { | |
static final int prime = 101; | |
static final int base = 103; | |
public static void rabinKarp(String s){ | |
assert(s!=null && !s.isEmpty()); | |
long leftHash = s.charAt(0); | |
long rightHash = s.charAt(0); | |
System.out.println("Yes"); | |
int h = 1; | |
for(int i = 1; i < s.length(); i++){ | |
h = (h*prime)%base; | |
leftHash = (leftHash*prime + s.charAt(i))%base; | |
rightHash = ( (long) (s.charAt(i)*h + rightHash))%base; | |
if(leftHash == rightHash){ | |
isPali(s, i); | |
}else | |
System.out.println("No"); | |
} | |
} | |
public static void isPali(String s, int i){ | |
for(int l=0,r=i; l<=r; l++, r--){ | |
if(s.charAt(l) != s.charAt(r)){ | |
System.out.println("No"); | |
} | |
} | |
System.out.println("Yes"); | |
} | |
} |
Tuesday, May 12, 2015
Merge sort linked list
Merge sort, compared to quick sort has the best and worst time complexity of O(NLgN) and requires O(N) space.
However for linked list, merge sort doesn't require additional space.
However for linked list, merge sort doesn't require additional space.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
public class SortQ { | |
public static class Node { | |
int d; | |
Node next; | |
} | |
public static Node merge_sort2(Node head){ | |
assert(head!=null); | |
if(head.next == null) | |
return head; | |
Node fast = head, slow = head; | |
while(fast!=null && fast.next !=null){ | |
fast = fast.next.next; | |
slow = slow.next; | |
} | |
Node right; | |
if(slow.next!=null){ | |
right = slow.next; | |
//cut half | |
slow.next = null; | |
}else{ | |
right = head.next; | |
head.next = null; | |
} | |
Node sortLeft = merge_sort2(head).next; | |
Node sortRight = merge_sort2(right).next; | |
//create fake head | |
Node ret = new Node(); | |
Node p = ret; | |
while(sortLeft!=null && sortRight!=null){ | |
if(sortLeft.d > sortRight.d){ | |
p.next = sortLeft; | |
sortLeft = sortLeft.next; | |
}else{ | |
p.next = sortRight; | |
sortRight = sortRight.next; | |
} | |
p = p.next; | |
} | |
if(sortLeft!=null) | |
p.next = sortLeft; | |
if(sortRight!=null) | |
p.next = sortRight; | |
return ret; | |
} | |
public static Node merge_sort(Node head){ | |
int size = 0; | |
Node current = head; | |
while(current!=null) | |
size++; | |
return merge_sort(head, 0, size-1); | |
} | |
public static Node merge_sort(Node head, int start, int end){ | |
if(start==end) | |
return head; | |
int mid = (start+end)>>>1; | |
Node left = merge_sort(head, start, mid); | |
Node current = head; | |
int counter = 0; | |
while(counter<=mid-start) | |
current = current.next; | |
Node right = merge_sort(current, mid+1, end); | |
int leftC = 0, rightC =0; | |
Node result; | |
if(left.d > right.d){ | |
result = left; | |
left = left.next; | |
leftC++; | |
} | |
else { | |
result = right; | |
right = right.next; | |
rightC++; | |
} | |
Node p = result; | |
while(leftC < mid - start + 1 && rightC < end - mid ){ | |
if(left.d > right.d){ | |
p.next = left; | |
left = left.next; | |
leftC++; | |
}else{ | |
rightC++; | |
p.next = right; | |
right = right.next; | |
} | |
p = p.next; | |
} | |
if(leftC++ < mid - start + 1) | |
p.next = left; | |
if(rightC++ < end - mid) | |
p.next = right; | |
return head; | |
} | |
} |
Subscribe to:
Posts (Atom)