Monday, January 26, 2015

Segment tree - an order statistics binary tree

Binary tree like binary search tree is useful for many operations: search any element, min or max, update, delete, they all have time complexity of O(lgN). However in many high order statistics like range query, like find max/min/sum in the range of index from i to j in O(lgN) time, the order statistics binary trees are often used. Segment tree is one of them.

Segment tree is a binary tree which stores all original elements are leaves in sequential order of their indices, and internal nodes of the tree store the aggregate information of their children. The tree is often represented as binary heap-like data structure. Node i's left children is 2*i+1, right children is 2*i+2. The height of the tree is Ceiling(lg(SIZE_OF_INPUT)). For example, array of 8 elements (0, 1, ..., 7), its height will be 3, array of 6 elements, its height will be 3 too.  So the heap storage size will be (1 + 2 + 4 + .. + 2^height) = 2^(height+1)-1. Note the size of the tree is 2*SIZE_OF_INPUT-1.

The nice thing about segment tree is that the operation on it: search, add, update is very similar to binary heap's bubble down operation, merge operation is like heap's bubble up. The index property of parent and children are exactly like binary heap.


/*
* a segment tree representation using array
* note this supports min query only, but can be adopted to other types like sum, max and etc.
*/
public class SegmentTree {
int[] tree;
int len;
public SegmentTree(int[] nums){
len = nums.length;
int x = (int) Math.ceil(Math.log10(len)/Math.log10(2));
tree = new int[(int) (Math.pow(2, x)*2-1)];
constructTree(0, nums, 0, nums.length-1);
}
//Time Complexity for tree construction is O(n). There are total 2n-1 nodes, and value of //every node is calculated only once in tree construction.
private int constructTree(int current, int[] nums, int start, int end){
if(start==end){
tree[current] = nums[start];
}else{
int left = current*2+1;
int right = current*2+2;
int mid = start + (end-start)/2;
tree[current] = Math.min(constructTree(left, nums, start, mid), constructTree(right, nums, mid+1, end));
}
return tree[current];
}
//Time complexity to query is O(Logn). To query a range minimum, we process at most 
//two nodes at every level and number of levels is O(Logn). 
public int getMin(int l, int r){
return getMinRec(l, r, 0, 0, len-1);
}
private int getMinRec(int l, int r, int index, int start, int end){
if(l<=start && r>=end)
return tree[index];
else if(end<l || start>r){
return Integer.MAX_VALUE;
}else{
int mid = start + (end-start)/2;
return Math.min(getMinRec(l, r, 2*index+1, start, mid), getMinRec(l, r, 2*index+2, mid+1, end));
}
}

public void update(int index, int v){
   update(index, v, 0, 0, len-1);

}

//bubble down

private int update(int index, int v, int treeIndex, int start, int end){
   if(index<start || index >end)
return tree[treeIndex];
 
   if(index == start && index == end){
tree[treeIndex] = v;
return v;
  }
         
  int mid = start + (end-start)/2;
tree[treeIndex] = Math.min(update(index, v, 2*treeIndex+1, start, mid), update(index, v, 2*treeIndex+2, mid+1, end));      
return tree[treeIndex];
       
}
}

The segment tree is a static data structure, that is, in order to construct the tree the original input has to be known first. However it can be used to solve on-line or streaming problem. For example, we can initialized the tree to a large size with values for any future incoming elements are set to default values. For example, for min query, we can set the default values to Integer.MIN_VALUE. In this way, any insertion becomes update operation. Of course the drawback of this implementation is that we are wasting some space in the tree implementation.


6 comments:

  1. I'm looking forward to the insert() funciton.

    ReplyDelete
    Replies
    1. it was described in the last paragraph, e.g. given input array size of 3, we just treat it as array size of 3000, so we can insert element up to 3000.

      Delete
  2. I don't understand what you said "a large size with values for unknown elements". Can you share the blog of it, or draw a picture?

    ReplyDelete
    Replies
    1. yes, input array: 1, 2, 4, 3, 7. we can use the following array for max range query instead:
      1, 2, 4, 3, 7, Integer.MIN_VALUE, Integer.MIN_VALUE, Integer.MIN_VALUE, Integer.MIN_VALUE, Integer.MIN_VALUE, Integer.MIN_VALUE, .....

      Delete
    2. but this is more like a hack than an implementation.

      Delete
    3. suppose array is [1, 2, 4, 3, 7], now I want to insert a new 8 in position 2. Let it become [1, 2, 8, 4, 3, 7]. Can you do that?

      Delete