编辑距离算法-DP问题
Levenshtein Distance
The Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.
Example
For example, the Levenshtein distance between kitten
and sitting
is 3
, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:
- kitten → sitten (substitution of "s" for "k")
- sitten → sittin (substitution of "i" for "e")
- sittin → sitting (insertion of "g" at the end)
思路:
Let’s take a simple example of finding minimum edit distance between strings ME
and MY
. Intuitively you already know that minimum edit distance here is 1
operation and this operation. And it is replacing E
with Y
. But let’s try to formalize it in a form of the algorithm in order to be able to do more complex examples like transforming Saturday
into Sunday
.
To apply the mathematical formula mentioned above to ME → MY
transformation we need to know minimum edit distances of ME → M
, M → MY
and M → M
transformations in prior. Then we will need to pick the minimum one and add one operation to transform last letters E → Y
. So minimum edit distance of ME → MY
transformation is being calculated based on three previously possible transformations.
To explain this further let’s draw the following matrix:
- Cell
(0:1)
contains red number 1. It means that we need 1 operation to transformM
to an empty string. And it is by deletingM
. This is why this number is red. - Cell
(0:2)
contains red number 2. It means that we need 2 operations to transformME
to an empty string. And it is by deletingE
andM
. - Cell
(1:0)
contains green number 1. It means that we need 1 operation to transform an empty string toM
. And it is by insertingM
. This is why this number is green. - Cell
(2:0)
contains green number 2. It means that we need 2 operations to transform an empty string toMY
. And it is by insertingY
andM
. - Cell
(1:1)
contains number 0. It means that it costs nothing to transformM
intoM
. - Cell
(1:2)
contains red number 1. It means that we need 1 operation to transformME
toM
. And it is by deletingE
. - And so on...
This looks easy for such small matrix as ours (it is only 3x3
). But here you may find basic concepts that may be applied to calculate all those numbers for bigger matrices (let’s say a 9x7
matrix for Saturday → Sunday
transformation).
According to the formula you only need three adjacent cells (i-1:j)
, (i-1:j-1)
, and (i:j-1)
to calculate the number for current cell (i:j)
. All we need to do is to find the minimum of those three cells and then add 1
in case if we have different letters in i
's row and j
's column.如果等的话,找最小就好。
代码如下:
/**
* @param {string} a
* @param {string} b
* @return {number}
*/
export default function levenshteinDistance(a, b) {
// Create empty edit distance matrix for all possible modifications of
// substrings of a to substrings of b.
const distanceMatrix = Array(b.length + ).fill(null).map(() => Array(a.length + ).fill(null)); // Fill the first row of the matrix.
// If this is first row then we're transforming empty string to a.
// In this case the number of transformations equals to size of a substring.
for (let i = ; i <= a.length; i += ) {
distanceMatrix[][i] = i;
} // Fill the first column of the matrix.
// If this is first column then we're transforming empty string to b.
// In this case the number of transformations equals to size of b substring.
for (let j = ; j <= b.length; j += ) {
distanceMatrix[j][] = j;
} for (let j = ; j <= b.length; j += ) {
for (let i = ; i <= a.length; i += ) {
const indicator = a[i - ] === b[j - ] ? : ;
distanceMatrix[j][i] = Math.min(
distanceMatrix[j][i - ] + , // deletion
distanceMatrix[j - ][i] + , // insertion
distanceMatrix[j - ][i - ] + indicator, // substitution
);
}
} return distanceMatrix[b.length][a.length];
}
编辑距离算法-DP问题的更多相关文章
- 用C#实现字符串相似度算法(编辑距离算法 Levenshtein Distance)
在搞验证码识别的时候需要比较字符代码的相似度用到"编辑距离算法",关于原理和C#实现做个记录. 据百度百科介绍: 编辑距离,又称Levenshtein距离(也叫做Edit Dist ...
- [转]字符串相似度算法(编辑距离算法 Levenshtein Distance)
转自:http://www.sigvc.org/bbs/forum.php?mod=viewthread&tid=981 http://www.cnblogs.com/ivanyb/archi ...
- Levenshtein Distance算法(编辑距离算法)
编辑距离 编辑距离(Edit Distance),又称Levenshtein距离,是指两个字串之间,由一个转成另一个所需的最少编辑操作次数.许可的编辑操作包括将一个字符替换成另一个字符,插入一个字符, ...
- 字符串相似度算法(编辑距离算法 Levenshtein Distance)(转)
在搞验证码识别的时候需要比较字符代码的相似度用到“编辑距离算法”,关于原理和C#实现做个记录. 据百度百科介绍: 编辑距离,又称Levenshtein距离(也叫做Edit Distance),是指两个 ...
- 字符串相似度算法(编辑距离算法 Levenshtein Distance)
在搞验证码识别的时候需要比较字符代码的相似度用到“编辑距离算法”,关于原理和C#实现做个记录.据百度百科介绍:编辑距离,又称Levenshtein距离(也叫做Edit Distance),是指两个字串 ...
- 自然语言处理(5)之Levenshtein最小编辑距离算法
自然语言处理(5)之Levenshtein最小编辑距离算法 题记:之前在公司使用Levenshtein最小编辑距离算法来实现相似车牌的计算的特性开发,正好本节来总结下Levenshtein最小编辑距离 ...
- 编辑距离算法(Levenshtein)
编辑距离定义: 编辑距离,又称Levenshtein距离,是指两个字串之间,由一个转成另一个所需的最少编辑操作次数. 许可的编辑操作包括:将一个字符替换成另一个字符,插入一个字符,删除一个字符. 例如 ...
- Java实现编辑距离算法
Java实现编辑距离算法 编辑距离,又称Levenshtein距离(莱文斯坦距离也叫做Edit Distance),是指两个字串之间,由一个转成另一个所需的最少编辑操作次数,如果它们的距离越大,说明它 ...
- Levenshtein distance 编辑距离算法
这几天再看 virtrual-dom,关于两个列表的对比,讲到了 Levenshtein distance 距离,周末抽空做一下总结. Levenshtein Distance 介绍 在信息理论和计算 ...
随机推荐
- Excel Old format or invalid type library 错误原因
Old format or invalid type library 错误原因 调用excel方法失败,Old format or invalid type library 解决方案: 1,这是Exc ...
- G6:AntV 的图可视化与图分析
导读 G6 是 AntV 旗下的一款专业级图可视化引擎,它在高定制能力的基础上,提供简单.易用的接口以及一系列设计优雅的图可视化解决方案,是阿里经济体图可视化与图分析的基础设施.今年 AntV 11. ...
- 67)vector的begin() end() 和 front() back()的区别 rbegin() rend()
1) ·············· 2)`````````v1.begin() 和v1.end() 是作为迭代器v1的 第一个位置 和 最后一个元素的下一个位置. `````````````v1. ...
- 吴裕雄--天生自然ShellX学习笔记:Shell 变量
定义变量时,变量名不加美元符号($,PHP语言中变量需要),如: your_name="runoob.com" 注意,变量名和等号之间不能有空格,这可能和你熟悉的所有编程语言都不一 ...
- 吴裕雄--天生自然 PYTHON3开发学习:基础语法
#!/usr/bin/python3 # 第一个注释 print ("Hello, Python!") # 第二个注释 #!/usr/bin/python3 # 第一个注释 # 第 ...
- error: snap "electronic-wechat" has "install-snap" change in progress
今天因为要使用 wechat ,但是因为 wechat 并没有官方的 Ubuntu 版本,幸好有大神出了 electronic-wechat ,可以直接在应用商店中搜到,然后直接安装,也可以命令行安装 ...
- Postgresql的导表
背景 前面已经介绍了常用的备份与恢复了,接下来介绍一下导表. 正文 很多情况,会有把数据导出的需求,轻重缓急总会有特别紧急的情况,但是又不是专业干db的人,还是记录下来,以防不时之需. 针对于导表,个 ...
- 记一次修复Windows
打开了一堆网页和应用,然后桌面不见了.. 于是很着急..就按各种快捷键..Win+R挂了.. 本来以为要reboot(机房电脑) , 然后问老师发现会格式化 然后发现Ctrl+Alt+Delete还存 ...
- typescript-学习使用ts-1
Hello World 新建 greeter.ts 并写入以下内容: function greeter(person) { return "Hello, " + person; } ...
- 二、Shell脚本高级编程实战第二部
一.什么是变量? 变量就是一个固定的字符串替代更多更复杂的内容,当然内容里面可能还有变量.路径.字符串等等内容,最大的特点就是方便,更好开展工作 1.变量有环境变量(全局变量)和局部变量 环境变量就是 ...