Sunday, February 8, 2015

Interval Tree Implementation

Interval Tree is a kind of ordered statistics tree data structure. The data stored on each node of the tree are intervals along with statistics data on the subtree rooted on current node. The interval values can be some kind of measurement in time and space. It is regularly used for geometry queries, find the streets intersect a street on a digital map, or detect if viewers watch a whole youtube video. Along with sweep-line algorithm it was used to solve an important problem during the design of Very Large Scale Integration (VLSI) microprocessor.




public class IntervalTree{
     
        public class Node implements Comparable<Node>{
            int max;
            Interval interval;
         
            Node left, right;
            public Node(Interval inter) {
                this.interval = inter;
                this.max = inter.right;
            }
            @Override
            public int compareTo(Node o){
                return this.interval.left - o.interval.left;
            }
         
            public boolean overlap(Interval inter){
                return this.interval.left <= inter.right && this.interval.right >= inter.left;
            }
        }
     
        public class Interval implements Comparable<Interval>{
            int left, right;
            public Interval(int l, int r){
                this.left = l; this.right = r;
            }
            @Override
            public int compareTo(Interval i){
                return this.left - i.left;
            }
         
            public boolean overlap(Interval inter){
                return this.left <= inter.right && this.right >= inter.left;
            }
         
        }
     
        private Node tree;
     
        public IntervalTree(Interval[] pairs){
            treeBuild(pairs);
        }
     
        /*
         * this implementation assuming static input, use array to present the ordered intervals
         * for dynamically input such as add/delete on the fly,
         * we can use balanced interval tree, then in-order travel to process the tree
         * storage O(n)
         * query time O(n)
         * construction: O(nlog(n))
         * insertion: o(n)
         */
        public int totalLength(Interval[] pairs){
            //sort by the left corr
            Arrays.sort(pairs);
            List<Interval> merged = new ArrayList<Interval>();
            merged.add(pairs[0]);
         
            for(int i=1; i<pairs.length; i++){
                int size = merged.size();
                Interval lastInter = merged.get(size-1);
                if(lastInter.overlap(pairs[i])){
                    lastInter.right = Math.max(lastInter.right, pairs[i].right);
                }else{
                    merged.add(pairs[i]);
                }
            }
         
            int sum = 0;
            for(Interval inter : merged){
                sum += inter.right - inter.left;
            }
            return sum;
        }
     
        public Interval findOne(Interval input){
            Node curr = tree;
            while(curr!=null){
                if(curr.overlap(input))
                    return curr.interval;
                else if(curr.left==null)
                    curr = curr.right;
                else if(curr.left.max < input.left)
                    curr = curr.right;
                else
                    curr = curr.left;
            }
         
            return null;
        }
     
        public List<Interval> findAll(Interval input){
            List<Interval> result = new ArrayList<Interval>();
            findAll(tree, input, result);
            return result;
        }
     
        //in-order travel with condition check against curr.max and curr.left
        public void findAll(Node curr, Interval input, List<Interval> result){
            if(curr==null)
                return;
           // input is to the right of all intervals, no match
            if(curr.max<input.left)
              return;

            if(curr.left!=null)
                findAll(curr.left, input, result);
         
            if(curr.overlap(input))
                result.add(curr.interval);    
            //input is to the left of the all intervals on the right subtree.
            if(input.right < curr.interval.left)
                return;
         
            if(curr.right!=null)
                findAll(curr.right, input, result);
         
        }
     
     
     
        //a simple tree build without balanced
        //ideally we should use splay tree, or r-b tree
        private void treeBuild(Interval[] pairs){
            tree = new Node(pairs[0]);        
            for(int i=1; i<pairs.length; i++)
                insert(pairs[i], tree);
        }
     
        private void insert(Interval inter, Node curr){
            curr.max = Math.max(curr.max, inter.right);
            if(inter.left<=curr.interval.left){
                if(curr.left==null){
                    curr.left = new Node(inter);
                }else
                    insert(inter, curr.left);
            }
            else{
                if(curr.right==null){
                    curr.right = new Node(inter);
                }else
                    insert(inter, curr.right);
            }
        }
     
    }

13 comments:

  1. I understand your findAll function. It works well! But do you see, it takes O(n) time. Since you might traverse both left and right subtree. I think this is same as iterating all intervals in a linear time. What I expect is a findAll function better than linear time.

    ReplyDelete
    Replies
    1. Nope, it doesn't go down the branch if it found no possible candidates, it is lgN +R

      Delete
    2. For example, build the interval tree for [0,10], [1,9], [2,8], [3,7], [4,6]. Now we want to get all the intervals who covers 5. For this case, it traverse the whole tree, because all intervals comply.

      Delete
    3. I have explained well on this algorithm and its time complexity already, and the code is pretty straightforward.

      Delete
    4. we can always pre-computing the result in certain way, but we have to optimize the solution, consider all of the costs in real world situations.

      Delete
    5. maybe it is logN+R. For the finding 5 among [0,10], [1,9], [2,8], [3,7], [4,6] case, each iteration is a part of result output, which should be considered as a part of R. Sorry for being picky. This is a question I was asked. The given answer is the solution 2 which I posted.

      Delete
    6. And this reminds me of this problem: http://allenlipeng47.com/PersonalPage/index/view/115/nkey
      I think find range in BST has O(logN+R) time.

      Delete
    7. I already said it is lgN + R in the comments above. ;)

      Delete
    8. plus the solution 2 in the blog: http://allenlipeng47.com/PersonalPage/index/view/130/nkey
      is questionable. the algorithm will traverse half of each tree, so it is O(N), N is the size of the tree.

      Delete
    9. and you are not picky here, i like you question my solution so i can defend it, but I was afraid you didn't understand this algorithm here (now you got it) and the issues with your solution 2 and 3.

      Delete
    10. Yes, the solution 2 in my post was told O(logN+R). But the R looks very expensive.

      Delete
    11. Did you read the slide I gave? It said interval tree has 3 versions. I wonder if version 2 provides better to implement findAll() function. But I'm not sure how version 2 interval tree look like.

      Delete
    12. lgN + R is reasonable in practice for average case, even for worse case if there are millions of results, processing them one by one is understandable too. people are aware of its expense. thinking about those findAll() database queries, and most of time the records are fetched from disk to caches, then process in cache.

      Delete