Tuesday, February 3, 2015

Find median in the stream of numbers

A typical on-time problem or streaming problem is that the number of input is unknown or very large, and data is continuously feed in.

Find median of numbers belongs to the category of problems that find kth largest or smallest numbers, which can be solved by quick selection algorithm in O(n) time.

How about if the input size is unknown, how can we compute the median in less than linear time? The solution is to maintain two heaps data structure to hold the first half and second half of the data found so far, then compute the median by looking at the root of each heaps.

In the below solution, to demonstrate the idea without loosing its generality, input is an array.

Then a maxHeap and minHeap is maintained when looping through the numbers, the first half of the data is loaded into maxHeap, and second half in minHeap. Also an invariant that the maxHeap size is equal or one element larger than minHeap is maintained throughout the loop.


/*
* given a stream of numbers, find the median of them at any given time
*/
public static Integer findMedianMinMaxHeap(int[] nums){
PriorityQueue<Integer> minQ = new PriorityQueue<Integer>();
PriorityQueue<Integer> maxQ = new PriorityQueue<Integer>(nums.length>>>2, new Comparator<Integer>(){
@Override
public int compare(Integer o1, Integer o2) {
return o2 - 01;
}
});
if(nums.length == 0)
return null;
if(nums.length>0)
minQ.add(nums[0]);
if(nums.length>1)
maxQ.add(nums[1]);
for(int i = 2; i<nums.length; i++){
if(nums[i] > minQ.peek())
minQ.add(nums[i]);
else
maxQ.add(nums[i]);
//maintain the maxQ and minQ sizes condition mentioned above
if(minQ.size()>maxQ.size())
maxQ.add(minQ.poll());
if(maxQ.size() - minQ.size() == 2)
minQ.add(maxQ.poll());
}
if(nums.length%2==1)
return minQ.peek();
else
return (minQ.peek() + maxQ.peek())>>>1;
}

1 comment:

  1. how about: PriorityQueue maxQ = new PriorityQueue(nums.length >> 2, Collections.reverseOrder());
    I don't understand why you use '>>>', instead of '>>'. For example, -8>>>1 won't be -4.

    ReplyDelete