Monday, January 5, 2015

Given sorted dictionary, find precedence characters


Topology sort algorithm is used to find the order of elements in a dependency graph.

Here is the problem:
http://www.geeksforgeeks.org/given-sorted-dictionary-find-precedence-characters/

public static class CNode {
char c;
Set<CNode> outgoing = new HashSet<CNode>();
CNode(char i){this.c=i;}

  //these two methods are important for hash
  //but we are not using them in this solution
public int hasCode() {return this.c;}
public boolean equals(Object c) {
if(c == this)
return true;
if(c instanceof CNode){
return this.c == ((CNode)c).c;
}
return false;
}
}
public static List<Character> findLanguageOrderDFS(String[] words){

Set<CNode> vertices = new HashSet<CNode>();
createGraph(words, vertices);
List<Character> result = new ArrayList<Character>();
//add those vertices without any incoming edges
Set<CNode> visited = new HashSet<CNode>();
Set<CNode> processed = new HashSet<CNode>();
Stack<CNode> stack = new Stack<CNode>();
for(CNode n : vertices){
if(visited.contains(n))
     continue;
   if(processed.contains(n))
     throw new IllegalArgumentException("cycle found");
DFS(n, visited, processed, stack);
visited.add(n);
stack.add(n);
}
while(!stack.isEmpty()){
result.add(stack.pop().c);
}
return result;
}
public static void DFS(CNode v, Set<CNode> visited,  Set<CNode> processed, Stack<CNode> s){
if(visited.contains(v))
return;
if(processed.contains(v))
throw new IllegalArgumentException("cycle found");
processed.add(v);
for(CNode n : v.outgoing){
if(!visited.contains(n)){
DFS(n, visited, processed, s);
}
}
visited.add(v);
s.push(v);
}


private static void createGraph(String[] words,
Set<CNode> vertices) {
//we may not need this hash map if we can trust the hashcode() written in CNode class
Map<Character, CNode> nodes = new HashMap<Character, CNode>();
for(int i=0; i<words.length-1; i++){
String current = words[i], next = words[i+1];
int j = 0;
for(j=0; j<current.length() && j<next.length() && current.charAt(j) == next.charAt(j); j++){}

char c1=current.charAt(j), c2=next.charAt(j);
CNode start = null, end = null;
if(!nodes.containsKey(c1)){
start = new CNode(c1);
nodes.put(c1, start);
vertices.add(start);
}
if(!nodes.containsKey(c2)){
end = new CNode(c2);
nodes.put(c2, end);
vertices.add(end);
}
start = nodes.get(c1);
end = nodes.get(c2);
start.outgoing.add(end);
}
}


16 comments:

  1. 1. Since you already rewritten hashcode() for CNode, I think you don't need the the nodes for auxiliary. Because you can already do vertices.add() without problem.
    2. For collections class, like "Set vertices", if we want to use the add(), remove(), my feeling is that we only rewrite hashcode(). It has nothing to do with equals().
    Correct me if I'm wrong.
    Besides, from your code, I learned how to do topological sorting in DFS way. Thanks!

    ReplyDelete
    Replies
    1. 2. If we rewrite equals(), we must rewrite hashcode() also. But if we want to rewrite hashcode(), is it necessary that we rewrite equals() too?
      Besides, I understand that it is good that we always rewrite hashcode(), equals(), toString() in a class

      Delete
    2. 1. correct, i don't need the Map nodes, i can use CNode[] vertices = new CNode[N], where N is the size of known characters in the dictionary, e.g. in English, it will be 52. The key here is to find a CNode object based on the character,

      Delete
    3. my topology sort code is based on wiki page, it is doing sorting and cycle detection, hope it works. you can do a lookup on wiki to find it out.

      Delete
    4. "But if we want to rewrite hashcode(), is it necessary that we rewrite equals() too?", no. There is nice chapter about this topic on Effective Java. You can take a look, i feel this is very important topic. Almost every java interview will ask about this and anything related to hash.

      Delete
  2. I read again. You use Map nodes, is it because you want to use nodes.get() function, yet Set doesn't has get() function. If it is like that, why you don't use a Map for vertices?

    ReplyDelete
    Replies
    1. This comment has been removed by the author.

      Delete
    2. I read again. You use Map < Character, CNode> nodes, is it because you want to use nodes.get() function, yet Set doesn't has get() function. If it is like that, why you don't use a Map < Character, CNode> for vertices directly?

      Delete
    3. agree. i can use map directly. let me clean the code.

      Delete
  3. I think advantage of DFS for topological sorting is that it doesn't require in-degree for the node.

    ReplyDelete
    Replies
    1. exactly, i have another implementation which has to use in and out edges. but that solution looks clean. i will share it later this week.

      Delete
  4. Besides, I was thinking, if standard definition of tree has parent pointer, so many problem can be solved with O(1) space.

    ReplyDelete
    Replies
    1. ummm, maybe not O(1) space, but much easier.

      Delete
    2. but coding interview questions are usually tricky, in practice, i believe we can have parent points if that makes things easy, as long as space is not an issue.

      i found out most of tree or binary tree problems can be solved by recursive and dynamic program, going down the tree and then going up along with intermediate result, sometimes, we may need to going down again during the process of going up.

      Delete
    3. I said that, it's because in your another diameter tree post, you assume the node knows its parent like a graph. If the node doesn't know the parent, the solution would be like the problem of finding diameter in binary tree too.

      Delete
    4. no, no, the diameter tree, there is no parents or children relationship. just edge connecting two vertex. and we use adjacent list representation of graph, http://en.wikipedia.org/wiki/Adjacency_list

      Delete