Algorithm description:

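In computer science, the Boyer-Moore string-search algorithm is a very efficient string-searching algorithm. It was designed by Bob Boyer and J Strother Moore in 1977. The algorithm preprocesses only the string being searched for (the pattern), not the text being searched. Its running time still depends linearly on the size of the text, but in practice it is usually only a fraction of that of other algorithms, because it does not compare every character of the text one by one and skips over some of them. Generally, the longer the pattern, the faster the search. Its efficiency comes from the fact that after each failed match attempt, the algorithm uses the information it has gathered to rule out as many positions as possible that cannot match.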

How it works:

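Suppose the text being searched is "1234567890" and the pattern is "MOORE". A naive search tries every alignment from left to right. With a 10-character text and a 5-character pattern there are 6 possible alignments, and each one fails on its very first comparison, so it takes 6 comparisons to conclude that there is no match.

Text:         1234567890
Attempt 1:    M....       (M vs 1: mismatch)
Attempt 2:     M....      (M vs 2: mismatch)
Attempt 3:      M....     (M vs 3: mismatch)
...
Attempt 6:         M....  (M vs 6: mismatch)

Note: pattern characters that take part in no comparison are shown as ".".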
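To make the count concrete, here is a minimal C++ sketch of the naive approach that also counts character comparisons. `naive_search` and `SearchStats` are illustrative names, not part of the original code. For the "1234567890" / "MOORE" example it should report 6 comparisons and no match.

```cpp
#include <cstddef>
#include <string>

// Where the pattern was found (npos if nowhere) and how many
// character comparisons the search needed.
struct SearchStats {
    std::size_t position = std::string::npos;
    std::size_t comparisons = 0;
};

// Naive search: try each of the text.size() - pattern.size() + 1
// alignments in turn and compare the pattern left to right.
SearchStats naive_search(const std::string& text, const std::string& pattern) {
    SearchStats stats;
    if (pattern.empty() || pattern.size() > text.size()) return stats;
    for (std::size_t pos = 0; pos + pattern.size() <= text.size(); ++pos) {
        std::size_t j = 0;
        while (j < pattern.size()) {
            ++stats.comparisons;
            if (text[pos + j] != pattern[j]) break;  // mismatch: try the next alignment
            ++j;
        }
        if (j == pattern.size()) {  // every character matched
            stats.position = pos;
            return stats;
        }
    }
    return stats;
}
```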

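The BM algorithm needs only 2 comparisons:

Text:         1234567890
Attempt 1:    ....E       (E vs 5: mismatch, and 5 is not any character of MOORE)
Attempt 2:         ....E  (E vs 0: mismatch, and 0 is not any character of MOORE)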
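A minimal sketch that reproduces the two comparisons above. Note the assumption: it uses only the bad-character shift taken from the text character under the last pattern position (the Boyer-Moore-Horspool simplification), not the full Boyer-Moore algorithm with its good-suffix rule. `horspool_search` and `SearchStats` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Where the pattern was found (npos if nowhere) and how many
// character comparisons the search needed.
struct SearchStats {
    std::size_t position = std::string::npos;
    std::size_t comparisons = 0;
};

// Horspool-style search: compare right to left; after each attempt, shift by
// the table entry of the text character under the pattern's last position.
SearchStats horspool_search(const std::string& text, const std::string& pattern) {
    SearchStats stats;
    const std::size_t m = pattern.size(), n = text.size();
    if (m == 0 || m > n) return stats;

    std::array<std::size_t, 256> shift;
    shift.fill(m);  // characters not in the pattern: skip the whole pattern length
    for (std::size_t i = 0; i + 1 < m; ++i)
        shift[static_cast<unsigned char>(pattern[i])] = m - 1 - i;

    for (std::size_t pos = 0; pos + m <= n;
         pos += shift[static_cast<unsigned char>(text[pos + m - 1])]) {
        std::size_t j = m;  // compare pattern[j - 1] with text[pos + j - 1], moving left
        while (j > 0) {
            ++stats.comparisons;
            if (text[pos + j - 1] != pattern[j - 1]) break;
            --j;
        }
        if (j == 0) {  // every character matched
            stats.position = pos;
            return stats;
        }
    }
    return stats;
}
```

On "1234567890" the first attempt compares E with 5, shifts by 5 because 5 is not in MOORE, compares E with 0, and then runs past the end of the text.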

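The first comparison starts at the end of the pattern, because if the 5th character of the text is not E, there can be no match at this alignment no matter what the first four characters are. That much is easy to see. But why is the next comparison not between E and 6?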

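This is another clever part of BM. When E is compared with 5, we learn not only that they differ, but also that 5 does not equal any character of the pattern MOORE, so all of the following comparisons can be skipped.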

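Text:         1234567890
Unnecessary:     ...R.      (comparing E with 5 already showed that 5 is not R)
Unnecessary:      ..O..     (comparing E with 5 already showed that 5 is not O)
Unnecessary:       .O...    (comparing E with 5 already showed that 5 is not O)
Unnecessary:        M....   (comparing E with 5 already showed that 5 is not M)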
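The reasoning above is the bad-character rule. Below is a sketch of one common form of it, assumed here for illustration (`bad_char_shift` is a hypothetical helper): after a mismatch at pattern index `j` against text character `c`, slide the pattern so that the rightmost copy of `c` to the left of `j` sits under `c`, or slide it past `c` entirely if there is none. For MOORE with `j = 4` and `c = '5'` it gives 5, which skips exactly the four alignments listed; had the text character been R, O or M, the shifts would have been 1, 2 and 4.

```cpp
#include <cstddef>
#include <string>

// Bad-character shift: pattern[j] mismatched text character c.
// Align the rightmost occurrence of c in pattern[0..j-1] with c;
// if c does not occur there, move the whole pattern past c.
std::size_t bad_char_shift(const std::string& pattern, std::size_t j, char c) {
    for (std::size_t k = j; k > 0; --k) {
        if (pattern[k - 1] == c) return j - (k - 1);  // occurrence at index k - 1
    }
    return j + 1;  // no alignment that puts c under pattern[0..j] can match
}
```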

Here is the code I wrote:

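The heart of the code is a shift table: every character code is mapped to a shift length, and at each step of the scan the character under the last position of the pattern is looked up in this table to decide how far to advance. (Strictly speaking, this uses only Boyer-Moore's bad-character idea, in the simplified form known as Boyer-Moore-Horspool; the full algorithm also has a good-suffix rule.)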
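As a standalone illustration of that table (a sketch, not the author's `maketable`; `make_shift_table` is a hypothetical name), the same rule can be written over all 256 byte values. For "texture" the entry for 't' comes out as 3 and for 'r' as 1, while a character that does not occur in the pattern gets the full length 7.

```cpp
#include <array>
#include <string>

// Shift for each byte value: the distance of its last occurrence from the
// end of the pattern, ignoring the final position. Later occurrences
// overwrite earlier ones, so the smallest distance wins. Everything else
// shifts by the full pattern length.
std::array<int, 256> make_shift_table(const std::string& pattern) {
    std::array<int, 256> table;
    const int m = static_cast<int>(pattern.size());
    table.fill(m);
    for (int i = 0; i < m - 1; ++i)
        table[static_cast<unsigned char>(pattern[i])] = m - 1 - i;
    return table;
}
```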

#include <iostream>
#include <cstring>
#include <ctime>
using namespace std;

// A long string to search in, and its size.
// Adjacent string literals are concatenated, so the text can span several lines.
char CHARS[] = "beautiful, but men seldom realized it when caught by her charmas the Tarleton twins were. In her face were too sharply blended the delicate features of her mother,a Coast aristocrat of French descent, and the heavy ones of her florid Irish father. But it was anarresting face, pointed of chin, square of jaw. Her eyes were pale green without a touch of hazel,starred with bristly black lashes and slightly tilted at the ends. Above them, her thick black browsslanted upward, cutting a startling oblique line in her Seated with Stuart and Brent Tarleton in the cool shade of the porch of Tara, her father’splantation, that bright April afternoon of 1861, she made a pretty picture. Her new green flowered-muslin dress spread its twelve yards of billowing material over her hoops and exactly matched theflat-heeled green morocco slippers her father had recently brought her from Atlanta. The dress set off to perfection the seventeen-inch waist, the smallest in three counties, and the tightly fittingbasque showed breasts well matured for her sixteen years. But for all the modesty of her spreadingskirts, the demureness of hair netted smoothly into a chignon and the quietness of small whitehands folded in her lap, her true self was poorly concealed. The green eyes in the carefully sweetface were turbulent, willful, lusty with life, distinctly at variance with her decorous demeanor. Hermanners had been imposed upon her by her mother’s gentle admonitions and the sterner disciplineof her mammy; her eyes were her own.On either side of her, the twins lounged easily in their chairs, squinting at the sunlight throughtall mint-garnished glasses as they laughed and talked, their long legs, booted to the knee and thickwith saddle muscles, crossed negligently. "
"Nineteen years old, six feet two inches tall, long of boneand hard of muscle, with sunburned faces and deep auburn hair, their eyes merry and arrogant,their bodies clothed in identical blue coats and mustard-colored breeches, they were as much alikeas two bolls of cotton.Outside, the late afternoon sun slanted down in the yard, throwing into gleaming brightness thedogwood trees that were solid masses of white blossoms against the background of new green. Thetwins’ horses were hitched in the driveway, big animals, red as their masters’ hair; and around thehorses’ legs quarreled the pack of lean, nervous possum hounds that accompanied Stuart and Brentwherever they went. A little aloof, as became an aristocrat, lay a black-spotted carriage dog,muzzle on paws, patiently waiting for the boys to go home to supper.Although born to the ease of plantation life, waited on hand and foot since infancy, the faces ofthe three on the porch were neither slack nor soft. They had the vigor and alertness of countrypeople who have spent all their lives in the open and troubled their heads very little with dullthings in books. Life in the north Georgia county of Clayton was still new and, according to thestandards of Augusta, Savannah and Charleston, a little crude. The more sedate and older sectionsof the South looked down their noses at the up-country Georgians, but here in north Georgia, alack of the niceties of classical education carried no shame, provided a man was smart in the thingsthat mattered. And raising good cotton, riding well, shooting straight, dancing lightly, squiring theladies with elegance and carrying one’s liquor like a gentleman were the things that mattered.It was for this precise reason that Stuart and Brent were idling on the porch of Tara this Aprilafternoon. "
"They had just been expelled from the University of Georgia, the fourth university thathad thrown them out in two years; and their older brothers, Tom and Boyd, had come home withthem, because they refused to remain at an institution where the twins were not welcome. Stuartand Brent considered their latest expulsion a fine joke, and Scarlett, who had not willingly opened a book since leaving the Fayetteville Female Academy the year before, thought it just as amusingas they did.“I know you two don’t care about being expelled, or Tom either,” she said. “But what aboutBoyd? He’s kind of set on getting an education, and you two have pulled him out of the Universityof Virginia and Alabama and South Carolina and now Georgia. He’ll never get finished at thisrate.”She meant what she said, for she could never long endure any conversation of which she wasnot the chief subject. But she smiled when she spoke, consciously deepening her dimple andfluttering her bristly black lashes as swiftly as butterflies’ wings. The boys were enchanted, as shehad intended them to be, and they hastened to apologize for boring her. They thought none the lessof her for her lack of interest. Indeed, they thought more. War was men’s business, not ladies’, andthey took her attitude as evidence of her femininity.";
#define MAXSIZE strlen(CHARS)

// The pattern to search for, and its length
#define search "when"
int Ssize = strlen(search);

// Shift table: one entry per byte value (0-255), indexed as unsigned char
#define stepsize 256
int stepvalue[stepsize];

// Build the shift table: the shift of each pattern character is its distance
// from the end of the pattern; characters not in the pattern shift by the pattern length.
void maketable(const char *keys)
{
    int i = 0;
    // First set every entry to the pattern length
    for (; i < stepsize; i++)
    {
        stepvalue[i] = Ssize;
    }
    // Since we compare from the end, the first character gets shift Ssize-1. The last
    // character is skipped (a shift of 0 would never advance), so it keeps Ssize unless it
    // also occurs earlier; the loop therefore stops at the second-to-last character.
    // A later occurrence always gets a smaller shift than an earlier one, so plain
    // overwriting is enough (in "texture", 't' ends up with 3, not 6).
    for (i = 0; i <= Ssize - 2; i++)
    {
        stepvalue[(unsigned char)keys[i]] = Ssize - i - 1;
    }
}

// Compare size characters from right to left: char_1 points at the text character
// aligned with the last pattern character, char_2 points at the start of the pattern.
// Returns true if they are equal, false otherwise.
bool Comparechars(const char *char_1, const char *char_2, int size)
{
    for (int i = 0; i <= size - 1; i++)
    {
        if (*(char_1 - i) != *(char_2 + size - 1 - i))
        {
            return false; // return early, no point comparing the rest
        }
    }
    return true;
}

// Check whether the character *a occurs in the search pattern.
// Returns true if it does, false if it does not. (Not used by main.)
bool InSearch(const char *a)
{
    const char temp[] = search;
    for (int i = 0; i <= Ssize - 1; i++)
    {
        if (*a == temp[i])
        {
            return true;
        }
    }
    return false;
}

int main()
{
    int count_num = 0;        // number of occurrences found
    clock_t start, end;       // for timing
    size_t n = MAXSIZE;       // text length, computed once instead of on every iteration
    size_t s = Ssize - 1;     // scan cursor: text index under the last pattern character
    maketable(search);        // build the shift table
    cout << "Pattern found at positions: ";
    start = clock();
    while (s < n)
    {
        // Does the pattern end at position s?
        if (Comparechars(CHARS + s, search, Ssize))
        {
            count_num++;
            cout << s + 1 - Ssize << " ";
        }
        // Advance by the shift of the character at s. Non-ASCII bytes (e.g. the UTF-8
        // bytes of ’) never occur in the pattern and keep the default shift Ssize.
        s += stepvalue[(unsigned char)CHARS[s]];
    }
    // search finished
    end = clock();
    cout << endl;
    cout << "Text size: " << MAXSIZE << ", occurrences found: " << count_num << endl;
    cout << "Elapsed time: " << (double)(end - start) / CLOCKS_PER_SEC << " s" << endl;
    return 0;
}
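One way to sanity-check a search like this (an assumption, not something the listing does) is to compare its positions with those from `std::string::find`. The sketch below packages the same shift-table scan as a hypothetical `find_all_horspool` and a reference `find_all_reference`; both should return the same positions, including overlapping matches.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// All (possibly overlapping) match positions, using the shift-table scan.
std::vector<std::size_t> find_all_horspool(const std::string& text, const std::string& pattern) {
    std::vector<std::size_t> hits;
    const std::size_t m = pattern.size(), n = text.size();
    if (m == 0 || m > n) return hits;
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i)
        shift[static_cast<unsigned char>(pattern[i])] = m - 1 - i;
    // end = text index under the last pattern character
    for (std::size_t end = m - 1; end < n; end += shift[static_cast<unsigned char>(text[end])]) {
        std::size_t k = 0;  // compare right to left
        while (k < m && text[end - k] == pattern[m - 1 - k]) ++k;
        if (k == m) hits.push_back(end - (m - 1));
    }
    return hits;
}

// Reference answer from std::string::find, advancing one position after
// each hit so that overlapping matches are reported too.
std::vector<std::size_t> find_all_reference(const std::string& text, const std::string& pattern) {
    std::vector<std::size_t> hits;
    if (pattern.empty()) return hits;
    for (std::size_t p = text.find(pattern); p != std::string::npos; p = text.find(pattern, p + 1))
        hits.push_back(p);
    return hits;
}
```

For example, both return {0, 8, 16} for the pattern "the" in "the cat the hat the".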


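Below is the result of searching for "when"; the performance is excellent. The elapsed time printed as 0 seconds. That is not precision lost in the conversion to double: the two clock() readings were simply identical, because the whole search finished within one tick of clock()'s resolution. The run is too fast to measure this way, which shows how cheap the search is on a text of this size.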
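If a non-zero figure is wanted, one option (assumed here, not used in the original) is `std::chrono::steady_clock`, whose resolution is typically much finer than that of `clock()`, combined with repeating the search and averaging. `average_microseconds` is a hypothetical helper; the callable passed to it should produce an observable result, otherwise the compiler may optimize the work away.

```cpp
#include <chrono>

// Average wall-clock time of search_fn over `repetitions` runs (must be > 0),
// measured with the monotonic steady_clock and reported in microseconds.
template <typename SearchFn>
double average_microseconds(SearchFn search_fn, int repetitions) {
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repetitions; ++r) search_fn();
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> total = end - start;
    return total.count() / repetitions;
}
```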

Result for "when":

Next, the result of searching for "the":

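We can see that as the pattern gets shorter and occurs more often, the search takes longer. The reason is that the longer the pattern, the longer each shift, so scanning the whole text finishes sooner; and the more often the pattern occurs, the more time is spent verifying matches, which lengthens the scan. Boyer-Moore performs remarkably well when matching long patterns!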
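The effect of pattern length on shift size can be checked directly by counting how many alignments the scan visits. This is a sketch with a hypothetical `horspool_alignments` helper: on a text of 12 'x' characters, which contains no pattern characters, the 2-character pattern "ab" visits 6 alignments and the 4-character "abcd" only 3.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Number of alignments the shift-table scan visits before running off the
// end of the text. Fewer alignments means longer average shifts.
std::size_t horspool_alignments(const std::string& text, const std::string& pattern) {
    const std::size_t m = pattern.size(), n = text.size();
    if (m == 0 || m > n) return 0;
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i)
        shift[static_cast<unsigned char>(pattern[i])] = m - 1 - i;
    std::size_t alignments = 0;
    for (std::size_t end = m - 1; end < n; end += shift[static_cast<unsigned char>(text[end])])
        ++alignments;
    return alignments;
}
```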
