Module gopy.dynamicProgramming.longest_common_subsequence

Longest Common Subsequence

If a set of sequences are given, the longest common subsequence problem is to find a common subsequence of all the sequences that is of maximal length.

The longest common subsequence problem is a classic computer science problem, the basis of data comparison programs such as the diff-utility, and has applications in bioinformatics. It is also widely used by revision control systems, such as SVN and Git, for reconciling multiple changes made to a revision-controlled collection of files.

Naïve Method

Let X be a sequence of length m and Y a sequence of length n. Check for every subsequence of X whether it is a subsequence of Y, and return the longest common subsequence found.

There are 2m subsequences of X. Testing sequences whether or not it is a subsequence of Y takes O(n) time. Thus, the naïve algorithm would take O(n2m) time.

Dynamic Programming

Let X = < x1, x2, x3,…, xm > and Y = < y1, y2, y3,…, yn > be the sequences. To compute the length of an element the following algorithm is used.

In this procedure, table C[m, n] is computed in row major order and another table B[m,n] is computed to construct optimal solution.

Algorithm:

LCS-Length-Table-Formulation (X, Y)
m := length(X) 
n := length(Y) 
for i = 1 to m do 
   C[i, 0] := 0 
for j = 1 to n do 
   C[0, j] := 0 
for i = 1 to m do 
   for j = 1 to n do 
      if xi = yj 
         C[i, j] := C[i - 1, j - 1] + 1 
         B[i, j] := ‘D’ 
      else 
         if C[i -1, j] ≥ C[i, j -1] 
            C[i, j] := C[i - 1, j] + 1 
            B[i, j] := ‘U’ 
         else 
         C[i, j] := C[i, j - 1]
         B[i, j] := ‘L’ 
return C and B

Analysis

To populate the table, the outer for loop iterates m times and the inner for loop iterates n times. Hence, the complexity of the algorithm is O(m, n), where m and n are the length of two strings.

Expand source code
"""
### Longest Common Subsequence

If a set of sequences are given, the longest common subsequence problem is to find a common subsequence of all the sequences that is of maximal length.

The longest common subsequence problem is a classic computer science problem, the basis of data comparison programs such as the diff-utility, and has applications in bioinformatics. It is also widely used by revision control systems, such as SVN and Git, for reconciling multiple changes made to a revision-controlled collection of files.

### Naïve Method

Let X be a sequence of length m and Y a sequence of length n. Check for every subsequence of X whether it is a subsequence of Y, and return the longest common subsequence found.

There are 2m subsequences of X. Testing sequences whether or not it is a subsequence of Y takes O(n) time. Thus, the naïve algorithm would take O(n2m) time.

### Dynamic Programming

Let X = < x1, x2, x3,…, xm > and Y = < y1, y2, y3,…, yn > be the sequences. To compute the length of an element the following algorithm is used.

In this procedure, table C[m, n] is computed in row major order and another table B[m,n] is computed to construct optimal solution.

### Algorithm:

```
LCS-Length-Table-Formulation (X, Y)
m := length(X) 
n := length(Y) 
for i = 1 to m do 
   C[i, 0] := 0 
for j = 1 to n do 
   C[0, j] := 0 
for i = 1 to m do 
   for j = 1 to n do 
      if xi = yj 
         C[i, j] := C[i - 1, j - 1] + 1 
         B[i, j] := ‘D’ 
      else 
         if C[i -1, j] ≥ C[i, j -1] 
            C[i, j] := C[i - 1, j] + 1 
            B[i, j] := ‘U’ 
         else 
         C[i, j] := C[i, j - 1]
         B[i, j] := ‘L’ 
return C and B
```

### Analysis

To populate the table, the outer for loop iterates m times and the inner for loop iterates n times. Hence, the complexity of the algorithm is O(m, n), where m and n are the length of two strings.
"""
def lcs(x,y):
    m= len(x)
    n= len(y)
    # print(m,n)
    k = [[0 for i in range(n)] for i in range(m)]
    for i in range(m):
        for j in range(n):
            if i==0 or y==0:
                # print(i,j)
                k[i][j] = 0
            if x[i-1]==y[j-1]:
                k[i][j] = 1+k[i-1][j-1]
            else:
                k[i][j] = max(k[i-1][j], k[i][j-1])
    
    return k[m-1][n-1]

Functions

def lcs(x, y)
Expand source code
def lcs(x,y):
    m= len(x)
    n= len(y)
    # print(m,n)
    k = [[0 for i in range(n)] for i in range(m)]
    for i in range(m):
        for j in range(n):
            if i==0 or y==0:
                # print(i,j)
                k[i][j] = 0
            if x[i-1]==y[j-1]:
                k[i][j] = 1+k[i-1][j-1]
            else:
                k[i][j] = max(k[i-1][j], k[i][j-1])
    
    return k[m-1][n-1]