Text comparison with the longest common subsequence (LCS) algorithm
'''
Merge two configuration files, using aFile as the base.
Content that bFile adds relative to aFile is inserted into the result.
For example, 'bbb' is the added content:
-----------------------------------------------------------
a file content | b file content | c merged file content
111            | 111            | 111
aaa            | bbb            | aaa
               |                | bbb
222            | 222            | 222
-----------------------------------------------------------
'''
def mergeFiles(aPath, bPath, cPath):
    # normalize each line: strip surrounding whitespace, then restore the newline
    with open(aPath, 'r') as f:
        aLines = [line.strip() + '\n' for line in f]
    with open(bPath, 'r') as f:
        bLines = [line.strip() + '\n' for line in f]
    cLines = mergeSequences(aLines, bLines)
    with open(cPath, 'w') as f:
        f.writelines(cLines)
'''
Merge two sequences of lines, keeping the lines that appear only in
aLines as well as the lines that appear only in bLines.
'''
def mergeSequences(aLines, bLines):
    record = {}
    lcs = findLCS(record, aLines, 0, bLines, 0)
    currA = currB = 0
    merged = []
    for (line, aI, bI) in lcs:
        # lines present only in aLines (deleted in bLines)
        if aI > currA:
            merged.extend(aLines[currA:aI])
        currA = aI + 1
        # lines present only in bLines (added in bLines)
        if bI > currB:
            merged.extend(bLines[currB:bI])
        currB = bI + 1
        # the common line itself
        merged.append(line)
    # whatever remains after the last common line
    if currA < len(aLines):
        merged.extend(aLines[currA:])
    if currB < len(bLines):
        merged.extend(bLines[currB:])
    return merged
'''
Find the longest common subsequence.
Returns a list of (line, x, y), where line is a common line,
x is its index in aLines and y is its index in bLines.
record memoizes results already computed for a pair of start offsets.
TODO: eliminate the recursion; use dynamic programming instead
'''
def findLCS(record, aLines, aStart, bLines, bStart):
    # memoize on the pair of start offsets
    key = (aStart, bStart)
    if key in record:
        return record[key]
    if aStart < len(aLines) and bStart < len(bLines):
        if aLines[aStart] == bLines[bStart]:
            # matching heads always belong to some longest common subsequence
            lcs = [(aLines[aStart], aStart, bStart)]
            lcs.extend(findLCS(record, aLines, aStart + 1, bLines, bStart + 1))
        else:
            # skip one line on either side and keep the longer result
            aLcs = findLCS(record, aLines, aStart, bLines, bStart + 1)
            bLcs = findLCS(record, aLines, aStart + 1, bLines, bStart)
            lcs = aLcs if len(aLcs) > len(bLcs) else bLcs
        record[key] = lcs
        return lcs
    else:
        return []
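
The TODO above asks for a version without recursion. The following is a minimal sketch of one way to do that, assuming a bottom-up dynamic programming table of LCS lengths followed by a walk through the table to recover the matched lines. The names `lcs_pairs` and `merge_sequences` are hypothetical and are not part of the original listing; the tie-breaking rule (skip a line of `a` when both choices are equally good) is chosen to mirror the recursive `findLCS`.

```python
def lcs_pairs(a, b):
    """Return [(item, i, j), ...] for one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # length[i][j] holds the LCS length of a[i:] and b[j:]
    length = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                length[i][j] = length[i + 1][j + 1] + 1
            else:
                length[i][j] = max(length[i + 1][j], length[i][j + 1])

    # walk the table from the top-left corner to recover the matched items
    pairs = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((a[i], i, j))
            i += 1
            j += 1
        elif length[i][j + 1] > length[i + 1][j]:
            j += 1  # skipping b[j] keeps a strictly longer subsequence
        else:
            i += 1  # on a tie, skip a[i], as the recursive version does
    return pairs


def merge_sequences(a, b):
    """Merge a and b, keeping the items unique to each around the common ones."""
    merged = []
    cur_a = cur_b = 0
    for item, i, j in lcs_pairs(a, b):
        merged.extend(a[cur_a:i])  # items only in a
        merged.extend(b[cur_b:j])  # items only in b
        merged.append(item)        # the common item
        cur_a, cur_b = i + 1, j + 1
    merged.extend(a[cur_a:])
    merged.extend(b[cur_b:])
    return merged


# the example from the header comment
print(merge_sequences(['111', 'aaa', '222'], ['111', 'bbb', '222']))
# prints ['111', 'aaa', 'bbb', '222']
```

The table needs O(n*m) time and memory for inputs of n and m lines, the same order as the memoized recursion, but it does not depend on the interpreter's recursion limit, which the recursive version can hit on long files.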