Monday, February 16, 2015

Find repeating sub-sequences in a string

This is a problem given by Peng Li (http://allenlipeng47.com/PersonalPage/index/list/1/nkey)
Given a string , find if there is any sub-sequence that repeats itself. Here, sub-sequence can be a non-contiguous pattern, with the same relative order. 


1. abab <------yes, ab is repeated
2. abba <---- No, a and b follow different order
3. acbdaghfb <-------- yes there is a followed by b at two places
4. abcdacb <----- yes a followed by b twice
5. aacc< ----yes, aacc, the underline ‘ac’ and red ‘ac’ are the repeating sub-sequences.

We can reduce this problem to the problem of finding common sub-sequence amount two strings, in this case, the two strings are identical. Also note the diagonal values of the DP table will be the string itself or identical sub-sequences which we should not consider.

For example:

negative example: abba

before and after we set diagonal values to 0


len(sub-seq)
a
b
b
a
a
1
1
1
2
b
1
2
2
2
b
1
2
3
3
a
1
2
3
4

len(sub-seq)
a
b
b
a
a
0
0
0
1
b
0
0
1
1
b
0
1
0
1
a
1
1
1
0

positive example: abab



len(sub-seq)
a
b
a
b
a
0
0
1
1
b
0
0
1
2
a
1
1
0
2
b
1
2
2
0





public static boolean findLongestSubsequence(String s1, String s2){
 int len1 = s1.length(), len2 = s2.length();
//part1 build DP table bottom-up
 int[][] table = new int[len1+1][len2+1];
     
 for(int i = 1; i<len1+1; i++){
     for(int j=1; j<len2+1; j++){
          if(i!=j){ <=== can not consider the subsequence sharing the same characters
          if(s1.charAt(i-1) == s2.charAt(j-1))
            table[i][j] = table[i-1][j-1]+1;
          else
            table[i][j] = Math.max(table[i-1][j], table[i][j-1]);        
          }
     }
 } 

  for(int i = 0; i<len2+1; i++){
     if(table[len1][i]>1)
        return true;
  }  

return false;

}


2 comments:

  1. You could save a bit of time and unnecessary computation if you move the check for if(table[len1][i] > 1) within the for loop, so that you go ahead and return true as soon as you find an adequate sequence. Great solution overall though! Thanks for the post!

    ReplyDelete