Algorithm description:

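In computer science, the Boyer-Moore string-search algorithm is a very efficient string-searching algorithm. It was designed by Bob Boyer and J Strother Moore in 1977. The algorithm preprocesses only the string being searched for (the pattern), not the string being searched in. Its running time still grows linearly with the size of the searched text, but in practice it is usually only a fraction of that of other algorithms: it does not compare every character of the text one by one, and instead skips over some parts of it. In general, the longer the pattern, the faster the search. Its efficiency comes from the fact that, after every failed match attempt, it uses the information gained to rule out as many positions as possible where a match cannot occur.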

How it works:

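Suppose the text to be searched is "1234567890" and the pattern is "MOORE". A naive left-to-right search needs six comparisons before it can conclude that there is no match: the pattern can only start at the first six positions of the text, and at each of them the very first comparison (M against a digit) fails.

Text:                 1234567890
1st comparison:       M....       (M vs 1, mismatch)
2nd comparison:        M....      (M vs 2, mismatch)
3rd comparison:         M....     (M vs 3, mismatch)
...
6th comparison:            M....  (M vs 6, mismatch)

※ Pattern characters that take no part in a comparison are shown as dots (.).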
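To make the count above concrete, here is a small C++ sketch of the naive left-to-right search that also reports how many character comparisons it made. The function name naive_search is hypothetical, not from the original code:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: naive left-to-right search. Returns the index of the first
// match (or std::string::npos) and stores the number of character comparisons made.
std::size_t naive_search(const std::string& text, const std::string& pattern,
                         std::size_t& comparisons)
{
    comparisons = 0;
    if (pattern.size() > text.size()) return std::string::npos;
    // The pattern can only start at positions 0 .. n-m
    for (std::size_t start = 0; start + pattern.size() <= text.size(); ++start) {
        std::size_t j = 0;
        while (j < pattern.size()) {
            ++comparisons;
            if (text[start + j] != pattern[j]) break;  // mismatch: try the next start
            ++j;
        }
        if (j == pattern.size()) return start;         // every character matched
    }
    return std::string::npos;
}
```

For the text "1234567890" and the pattern "MOORE" it returns std::string::npos and reports 6 comparisons, one per possible starting position.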

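The BM algorithm needs only two comparisons:

Text:                 1234567890
1st comparison:       ....E       (E vs 5: mismatch, and 5 does not occur in MOORE)
2nd comparison:            ....E  (E vs 0: mismatch, and 0 does not occur in MOORE)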
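The two-comparison scan can be sketched the same way. This is a minimal sketch assuming only the bad-character shift taken from the character under the pattern's last position (Horspool's simplification of Boyer-Moore); the function name bm_search_count is hypothetical:

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: right-to-left comparison, then a shift looked up from the text
// character under the pattern's last position. Returns the first match index or npos
// and stores the number of character comparisons made.
std::size_t bm_search_count(const std::string& text, const std::string& pattern,
                            std::size_t& comparisons)
{
    comparisons = 0;
    const std::size_t m = pattern.size(), n = text.size();
    if (m == 0) return 0;
    if (m > n) return std::string::npos;
    std::array<std::size_t, 256> shift;
    shift.fill(m);                                    // absent characters: jump by m
    for (std::size_t i = 0; i + 1 < m; ++i)           // every character except the last
        shift[static_cast<unsigned char>(pattern[i])] = m - 1 - i;
    std::size_t pos = 0;                              // start of the current window
    while (pos + m <= n) {
        std::size_t j = m;
        while (j > 0) {                               // compare from the end of the pattern
            ++comparisons;
            if (text[pos + j - 1] != pattern[j - 1]) break;
            --j;
        }
        if (j == 0) return pos;                       // whole window matched
        pos += shift[static_cast<unsigned char>(text[pos + m - 1])];
    }
    return std::string::npos;
}
```

On "1234567890" and "MOORE" it reports 2 comparisons: E against 5, then E against 0.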

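The first comparison is made at the end of the pattern, because if the 5th character of the text is not E, the pattern cannot match at this position no matter what the first four characters are. That part is easy to see. But why is the second comparison not E against 6?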

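This is another clever part of BM. When E is compared with 5, we learn not only that they differ but also, from the pattern that was preprocessed in advance, that 5 is not equal to any character of MOORE. That makes all of the following comparisons unnecessary: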

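Text:                 1234567890
Skipped comparison:    ...R.      (5 is already known not to be R, so this comparison is unnecessary)
Skipped comparison:     ..O..     (5 is already known not to be O, so this comparison is unnecessary)
Skipped comparison:      .O...    (5 is already known not to be O, so this comparison is unnecessary)
Skipped comparison:       M....   (5 is already known not to be M, so this comparison is unnecessary)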
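The skipped alignments above correspond to shifting the pattern by 1, 2, 3 and 4: each places a different pattern character over the 5. A hypothetical helper safe_shift can compute the smallest shift that puts an equal character there, and falls back to the full pattern length when the character does not occur in the pattern:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: how far the pattern may slide when the text character c sits
// under the pattern's last position. A shift of k places pattern[m-1-k] over c, so the
// first k for which that character equals c is the smallest shift that can still match.
std::size_t safe_shift(const std::string& pattern, char c)
{
    const std::size_t m = pattern.size();
    for (std::size_t k = 1; k < m; ++k) {   // candidate shifts 1, 2, ..., m-1
        if (pattern[m - 1 - k] == c) return k;
    }
    return m;                               // c does not occur: skip the whole pattern
}
```

safe_shift("MOORE", '5') is 5, which is why the second comparison is E against 0 rather than E against 6.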

Here is the code I wrote:

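The main idea of the code is a step table: the ASCII code of each printable character is mapped to a step length, and during the scan the character at the current position (the one under the last character of the pattern) determines how far to advance. Only this bad-character rule on the window's last character is used, so the code is essentially Horspool's simplification of Boyer-Moore; the full algorithm also has a good-suffix rule.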
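A minimal sketch of that step table as a standalone function. The name make_step_table is hypothetical; it indexes through unsigned char so that all 256 byte values have an entry:

```cpp
#include <array>
#include <string>

// Hypothetical helper: every byte value starts with the pattern length; each pattern
// character except the last then gets its distance from the end. A later occurrence
// overwrites an earlier one with the smaller step, which is the value the scan needs.
std::array<int, 256> make_step_table(const std::string& pattern)
{
    const int m = static_cast<int>(pattern.size());
    std::array<int, 256> step;
    step.fill(m);
    for (int i = 0; i <= m - 2; ++i)
        step[static_cast<unsigned char>(pattern[i])] = m - 1 - i;
    return step;
}
```

For "texture" this gives 3 for 't', 5 for 'e' (the final 'e' is not entered), 1 for 'r' and 7 for any character that does not occur in the pattern.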

#include <iostream>
#include <cstring>   // strlen
#include <ctime>     // clock, clock_t, CLOCKS_PER_SEC
using namespace std;
// Sample text to search in. It is one long string literal: a backslash at the very end of a line continues it on the next line.
char CHARS[] = "beautiful, but men seldom realized it when caught by her charmas the Tarleton twins were. In her face were too sharply blended the delicate features of her mother,a Coast aristocrat of French descent, and the heavy ones of her florid Irish father. But it was anarresting face, pointed of chin, square of jaw. Her eyes were pale green without a touch of hazel,starred with bristly black lashes and slightly tilted at the ends. Above them, her thick black browsslanted upward, cutting a startling oblique line in her Seated with Stuart and Brent Tarleton in the cool shade of the porch of Tara, her father’splantation, that bright April afternoon of 1861, she made a pretty picture. Her new green flowered-muslin dress spread its twelve yards of billowing material over her hoops and exactly matched theflat-heeled green morocco slippers her father had recently brought her from Atlanta. The dress set off to perfection the seventeen-inch waist, the smallest in three counties, and the tightly fittingbasque showed breasts well matured for her sixteen years. But for all the modesty of her spreadingskirts, the demureness of hair netted smoothly into a chignon and the quietness of small whitehands folded in her lap, her true self was poorly concealed. The green eyes in the carefully sweetface were turbulent, willful, lusty with life, distinctly at variance with her decorous demeanor. Hermanners had been imposed upon her by her mother’s gentle admonitions and the sterner disciplineof her mammy; her eyes were her own.On either side of her, the twins lounged easily in their chairs, squinting at the sunlight throughtall mint-garnished glasses as they laughed and talked, their long legs, booted to the knee and thickwith saddle muscles, crossed negligently. \
Nineteen years old, six feet two inches tall, long of boneand hard of muscle, with sunburned faces and deep auburn hair, their eyes merry and arrogant,their bodies clothed in identical blue coats and mustard-colored breeches, they were as much alikeas two bolls of cotton.Outside, the late afternoon sun slanted down in the yard, throwing into gleaming brightness thedogwood trees that were solid masses of white blossoms against the background of new green. Thetwins’ horses were hitched in the driveway, big animals, red as their masters’ hair; and around thehorses’ legs quarreled the pack of lean, nervous possum hounds that accompanied Stuart and Brentwherever they went. A little aloof, as became an aristocrat, lay a black-spotted carriage dog,muzzle on paws, patiently waiting for the boys to go home to supper.Although born to the ease of plantation life, waited on hand and foot since infancy, the faces ofthe three on the porch were neither slack nor soft. They had the vigor and alertness of countrypeople who have spent all their lives in the open and troubled their heads very little with dullthings in books. Life in the north Georgia county of Clayton was still new and, according to thestandards of Augusta, Savannah and Charleston, a little crude. The more sedate and older sectionsof the South looked down their noses at the up-country Georgians, but here in north Georgia, alack of the niceties of classical education carried no shame, provided a man was smart in the thingsthat mattered. And raising good cotton, riding well, shooting straight, dancing lightly, squiring theladies with elegance and carrying one’s liquor like a gentleman were the things that mattered.It was for this precise reason that Stuart and Brent were idling on the porch of Tara this Aprilafternoon. \
They had just been expelled from the University of Georgia, the fourth university thathad thrown them out in two years; and their older brothers, Tom and Boyd, had come home withthem, because they refused to remain at an institution where the twins were not welcome. Stuartand Brent considered their latest expulsion a fine joke, and Scarlett, who had not willingly opened a book since leaving the Fayetteville Female Academy the year before, thought it just as amusingas they did.“I know you two don’t care about being expelled, or Tom either,” she said. “But what aboutBoyd? He’s kind of set on getting an education, and you two have pulled him out of the Universityof Virginia and Alabama and South Carolina and now Georgia. He’ll never get finished at thisrate.”She meant what she said, for she could never long endure any conversation of which she wasnot the chief subject. But she smiled when she spoke, consciously deepening her dimple andfluttering her bristly black lashes as swiftly as butterflies’ wings. The boys were enchanted, as shehad intended them to be, and they hastened to apologize for boring her. They thought none the lessof her for her lack of interest. Indeed, they thought more. War was men’s business, not ladies’, andthey took her attitude as evidence of her femininity.";
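#define MAXSIZE strlen(CHARS)
// The pattern to search for, and its length
#define search "when"
int Ssize = (int)strlen(search);
// Step table: one entry for every possible byte value (indexed through unsigned char,
// so non-ASCII bytes such as the UTF-8 curly quotes in CHARS stay in range)
#define stepsize 256
int stepvalue[stepsize];
// Build the step table: for each character of the pattern, how far the scan may advance
// when that character is under the pattern's last position; characters that do not
// occur in the pattern get the full pattern length
void maketable(const char *keys)
{
    int i = 0;
    // First set every entry to the pattern length
    for (; i < stepsize; i++)
    {
        stepvalue[i] = Ssize;
    }
    // The scan compares from the end, so the first character gets step Ssize-1 and the
    // last character keeps step Ssize; the loop therefore stops at the second-to-last one.
    // A later occurrence always yields a smaller step than an earlier one and simply
    // overwrites it, which is the value we want (in "texture", 't' gets 3, not 6),
    // so no comparison is needed.
    for (i = 0; i <= Ssize - 2; i++)
    {
        stepvalue[(unsigned char)keys[i]] = Ssize - i - 1;
    }
}
// Compare size characters, walking backwards from char_1 in the text and from the end
// of char_2 (the pattern); true means equal, false means different
bool Comparechars(const char *char_1, const char *char_2, int size)
{
    int i = 0;
    while (i <= size - 1)
    {
        // Return at the first mismatch to avoid pointless further comparisons
        if (*(char_1 - i) != *(char_2 + size - 1 - i))
        {
            return false;
        }
        i++;
    }
    return true;
}
// Check whether the character *a occurs in the pattern
// Return value: true if it occurs, false otherwise (not used by main)
bool InSearch(const char *a)
{
    const char temp[] = search;
    for (int i = 0; i <= Ssize - 1; i++)
    {
        if (*a == temp[i])
        {
            return true;
        }
    }
    return false;
}
int main()
{
    int count_num = 0;           // number of occurrences found
    clock_t start, end;          // for timing
    size_t n = strlen(CHARS);    // text length, computed once instead of on every iteration
    size_t s = Ssize - 1;        // scan cursor: text index under the pattern's last character
    maketable(search);           // build the step table
    cout << "Found \"" << search << "\" at positions: ";
    start = clock();
    while (s < n)
    {
        // Does the window ending at s match the pattern?
        if (Comparechars(CHARS + s, search, Ssize))
        {
            count_num++;
            cout << s - Ssize + 1 << " ";   // start index of the match
        }
        // Advance by the step of the character at the window's last position
        s += stepvalue[(unsigned char)CHARS[s]];
    }
    // Search finished
    end = clock();
    cout << endl;
    cout << "Text size: " << MAXSIZE << ", occurrences found: " << count_num << endl;
    // CLOCKS_PER_SEC is the standard name; CLK_TCK is an obsolete, non-standard alias
    cout << "Total time: " << (double)(end - start) / CLOCKS_PER_SEC << " s" << endl;
    return 0;
}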
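The same scan can also be packaged as a reusable function that returns every match position instead of printing it. This is a sketch under the same step-table rule, not the author's code; the name find_all_horspool is hypothetical:

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: start index of every occurrence of pattern in text, found with
// the step-table (Horspool) scan. unsigned char indexing keeps non-ASCII bytes in range.
std::vector<std::size_t> find_all_horspool(const std::string& text, const std::string& pattern)
{
    std::vector<std::size_t> hits;
    const std::size_t m = pattern.size(), n = text.size();
    if (m == 0 || m > n) return hits;
    std::array<std::size_t, 256> step;
    step.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i)
        step[static_cast<unsigned char>(pattern[i])] = m - 1 - i;
    // s is the text index under the pattern's last character
    for (std::size_t s = m - 1; s < n; s += step[static_cast<unsigned char>(text[s])]) {
        if (text.compare(s + 1 - m, m, pattern) == 0)   // verify the whole window
            hits.push_back(s + 1 - m);
    }
    return hits;
}
```

Because it verifies every window the step table lands on, overlapping matches are reported too: searching "abababa" for "aba" gives 0, 2 and 4.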


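Below is the result of searching for "when"; the search is very fast. The reported time was 0 seconds. That is not because converting (end - start) to double loses precision: the whole search finished in less than one tick of clock(), so end - start is simply 0. This shows that the run is very short on this input, but a single 0-second reading says nothing precise about time complexity.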
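One way to get a non-zero reading (an assumption, not part of the original program) is to repeat the search many times, time the whole batch with std::chrono::steady_clock, and divide by the number of repetitions. The helper name average_seconds is hypothetical:

```cpp
#include <chrono>

// Hypothetical helper: calls run() the given number of times and returns the average
// wall-clock time per call in seconds, measured with a monotonic clock.
template <typename Run>
double average_seconds(Run run, int repetitions)
{
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repetitions; ++r) run();
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double> total = stop - start;   // seconds as a double
    return total.count() / repetitions;
}
```

In real use, keep or print something derived from each search result (for example the match count), so that the compiler cannot optimize the repeated searches away.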


Next, the result of searching for "the":

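Comparing the two results, the shorter pattern, which also occurs more often, takes longer to search for. There are two reasons: the longer the pattern, the longer each step and the less time it takes to get through the whole text; and the more often the pattern occurs, the more time is spent verifying matches. Boyer-Moore performs very well when the pattern is long.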
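The effect of pattern length on the number of steps can be checked with a hypothetical counter, windows_examined, that reports how many windows the step-table scan examines. It is a sketch that deliberately ignores the cost of verifying matches, the second factor named above:

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: number of windows the step-table scan visits before it runs
// past the end of the text (verification work is not counted).
std::size_t windows_examined(const std::string& text, const std::string& pattern)
{
    const std::size_t m = pattern.size(), n = text.size();
    if (m == 0 || m > n) return 0;
    std::array<std::size_t, 256> step;
    step.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i)
        step[static_cast<unsigned char>(pattern[i])] = m - 1 - i;
    std::size_t windows = 0;
    for (std::size_t s = m - 1; s < n; s += step[static_cast<unsigned char>(text[s])])
        ++windows;
    return windows;
}
```

On a text of 20 'x' characters, a 2-character pattern that does not occur in it examines 10 windows, while a 4-character one examines only 5.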
