Sunday, August 16, 2015

Trie implementation and related string search problems

Trie is special purpose tree to store set of strings to solve a specific set of problems. In stead of storing the characters in the nodes, it is stored on the edges. The string can be constructed by concatenating all the characters found in the path.

For example it can store a dictionary contains a set of words, then it can answer the questions such as:

1. Does word S exist in the dictionary?
2. Given a string S, find all of the words which has the prefix equals to S. Think about design a data structure to support a auto-complete search box.

The followings are Java implementation of Trie to store the set of words from dictionary and support word search in the dictionary.


/*
* trie node definition
* it assumes the character are ascii chacaters
* each word in the dictionary is stored at leaf level
*/


public class TrieNode {
        private final static ALPHABET_SIZE = 256;
        TrieNode[] children = new SuffixNode[ALPHABET_SIZE];
boolean isLeaf;
String word;
}

/*
* suffix tree class
* it defines couple operations to support string search in a dictionary 
*/
public class Trie {
TrieNode root;
public Trie(){
root = new TrieNode();
}
/*
* insert a string into suffix tree
*/
public void insert(String s){
insertHelper(s, 0, root);
}
private void insertHelper(String s, int index, TrieNode node){
if(index == s.length()){
node.isLeaf = true;
node.word = s;
return;
}
TrieNode trieNode = node.children[s.charAt(index)];
if(trieNode==null)
trieNode = new TrieNode();
insertHelper(s, index+1, trieNode);
}
/*
* check if string is prefix of one of the words contained in the tree
*/
public boolean containsPrefix(String s){
return containsPrefix(s, 0, root);
}
private boolean containsPrefix(String s, int index, TrieNode node){
if(index == s.length())
return true;
TrieNode trieNode = node.children[s.charAt(index)];
if(trieNode != null)
return containsPrefix(s, index+1, trieNode);
else
return false;
}

/*
* check if the string is a word existing in the suffix tree
*/
public boolean isWord(String s){
return isWordHelper(s, 0, root);
}
private boolean isWordHelper(String s, int index, TrieNode node){
if(index == s.length())
return node.isLeaf; //the current node must be leaf node
TrieNode trieNode = node.children[s.charAt(index)];
if(trieNode != null)
return isWordHelper(s, index+1, trieNode);
else
return false;
}
/*
* find if the word is prefix of any words contained in the trie tree
* and return the last node which contains the word
*/
public TrieNode searchFor(String s){
return searchFor(s, 0, root);
}
private TrieNode searchFor(String s, int index, TrieNode node){
if(index == s.length())
return node;
TrieNode trieNode = node.children[s.charAt(index)];
if(trieNode != null)
return searchFor(s, index+1, trieNode);
return null;
}
}

No comments:

Post a Comment