【POJ3415】 Common Substrings（后缀数组|SAM）

Common Substrings

Description

A substring of a string T is defined as:

T(i, k)=T_iT_i₊₁...T_i+k_-1, 1≤i≤i+k-1≤|T|.

Given two strings A, B and one integer K, we define S, a set of triples (i, j, k):

S = {(i, j, k) | k≥K, A(i, k)=B(j, k)}.

You are to give the value of |S| for specific A, B and K.

Input

The input file contains several blocks of data. For each block, the first line contains one integer K, followed by two lines containing strings A and B, respectively. The input file is ended by K=0.

1 ≤ |A|, |B| ≤ 10⁵
1 ≤ K ≤ min{|A|, |B|}
Characters of A and B are all Latin letters.

Output

For each case, output an integer |S|.

Sample Input
2

aababaa

abaabaa

1

xx

xx

0
Sample Output
22

5

【题意】

　　给两个长度不超过100000的字符串A和B, 求三元组(i, j, k)的数量, 即满足A串从i开始的后缀与B串从j开始的后缀有长度为k的公共前缀, 还要求k不小于某个给你的数K.

【分析】

　　如果i后缀与j后缀的LCP长度为L, 在L不小于K的情况下, 它对答案的贡献为L - K + 1.

　　所以问题就是快速求出两个串的后缀的LCP。

　　就要用到后缀数组，把两个串拼成一个串，中间加一个特殊字符，然后做后缀数组。

　　求出height数组后根据k分组（min大于等于k的分在一组），同一组的LCP一定大于等于k。（height数组的性质啦，这个很基本）

　　接下来就是求出sum｛L-K+1｝，如果直接暴力的话for2遍加一个RMQ也很慢。

　　然后看到大神们说用什么单调栈，哦~~ 单调的优美性质！！！

　　可以分成两步求，先求B串在下，A串在上的ans。再反过来求一遍。

　　过程好像比较难说清楚。

　　反正...对于B串上面的一堆A串，因为LCP是求min，所以LCP必定是单调递增，所以像下图那样做：

　　开了三个数组，s表示其单位贡献值，cnt表示有多少个A串是这个贡献值，h为从当前位置到底部的cnt*s的和（h便于计算的）。

　　然后就更新和替换。遇到一个B串就把h[tp]累加到ans中即可。　

　　反过来做也是一样的。

　　单调！！单调！！单调哦！！

　　好像很厉害！！

　　主要看代码~~

　　还有，这道题是有大写字母的！！！坑的我RE了巨久！！

　　还有记得long long！！

代码如下：

 #include<cstdio>

 #include<cstdlib>

 #include<cstring>

 #include<iostream>

 #include<algorithm>

 #include<queue>

 using namespace std;

 #define INF 0xfffffff

 #define Maxl 200010

 #define LL long long

 int k,la;

 char a[Maxl],b[Maxl];

 int c[Maxl];

 int cl;

 int sa[Maxl],rank[Maxl],Rs[Maxl],wr[Maxl],y[Maxl];

 //sa -> 排名第几的是谁

 //rank -> i的排名

 //Rs数值小于等于i的有多少个

 //y -> 第二关键字排名第几的是谁（类似sa）

 int height[Maxl];

 void get_sa(int m)

 {

    memcpy(rank,c,sizeof(rank));

     for(int i=;i<=m;i++) Rs[i]=;

     for(int i=;i<=cl;i++) Rs[rank[i]]++;

     for(int i=;i<=m;i++) Rs[i]+=Rs[i-];

     for(int i=cl;i>=;i--) sa[Rs[rank[i]]--]=i;

     int ln=,p=;

     while(p<cl)

     {

        int k=;

         for(int i=cl-ln+;i<=cl;i++) y[++k]=i;

         for(int i=;i<=cl;i++) if(sa[i]>ln) y[++k]=sa[i]-ln;

         for(int i=;i<=cl;i++) wr[i]=rank[y[i]];

         for(int i=;i<=m;i++) Rs[i]=;

         for(int i=;i<=cl;i++) Rs[wr[i]]++;

         for(int i=;i<=m;i++) Rs[i]+=Rs[i-];

         for(int i=cl;i>=;i--) sa[Rs[wr[i]]--]=y[i];

         for(int i=;i<=cl;i++) wr[i]=rank[i];

         for(int i=cl+;i<=cl+ln;i++) wr[i]=;

         p=;rank[sa[]]=;

         for(int i=;i<=cl;i++)

         {

             if(wr[sa[i]]!=wr[sa[i-]]||wr[sa[i]+ln]!=wr[sa[i-]+ln]) p++;

             rank[sa[i]]=p;

         }

         m=p,ln*=;

     }

     sa[]=rank[]=;

 }

 void get_he()

 {

     int kk=;

     for(int i=;i<=cl;i++)

     {

         // if(rank[i]==1) break;

         int j=sa[rank[i]-];

         if(kk) kk--;

         while(c[i+kk]==c[j+kk]&&i+kk<=cl&&j+kk<=cl) kk++;

         height[rank[i]]=kk;

     }

 }

 LL h[Maxl],cnt[Maxl],s[Maxl];

 int tp;

 void ffind()

 {

     LL ans=;tp=;

     s[]=INF;

     //******

     for(int i=;i<=cl-;i++)

     {

         if(sa[i]>=la+) //B串

          ans+=h[tp];

         if(height[i+]-k+<s[tp])//替换

         {

             LL sum=;

             while(height[i+]-k+<=s[tp]&&tp) sum+=cnt[tp--];

             s[++tp]=(height[i+]-k+),cnt[tp]=sum,h[tp]=h[tp-]+cnt[tp]*s[tp];

         }

         else s[++tp]=height[i+]-k+,cnt[tp]=,h[tp]=h[tp-];

         if(sa[i]<=la) cnt[tp]++,h[tp]+=s[tp];//A串

         if(height[i+]<k)

         {

             tp=;s[]=INF;

         }

     }tp=;s[tp]=INF;

     for(int i=;i<=cl-;i++)

     {

         if(sa[i]<=la) //A串

          ans+=h[tp];

         if(height[i+]-k+<s[tp])//替换

         {

             LL sum=;

             while(height[i+]-k+<=s[tp]&&tp) sum+=cnt[tp--];

             s[++tp]=(height[i+]-k+),cnt[tp]=sum,h[tp]=h[tp-]+cnt[tp]*s[tp];

         }

         else s[++tp]=height[i+]-k+,cnt[tp]=,h[tp]=h[tp-];

         if(sa[i]>=la+) cnt[tp]++,h[tp]+=s[tp];//B串

         if(height[i+]<k)

         {

             tp=;s[tp]=INF;

         }

     }

     printf("%I64d\n",ans);

 }

 void init()

 {

     scanf("%s%s",a,b);

     int l=strlen(a);cl=;

     la=l;

     for(int i=;i<l;i++)

     {

         if(a[i]>='a'&&a[i]<='z') c[++cl]=a[i]-'a'+;

         else c[++cl]=a[i]-'A'+;

     }

     c[++cl]=;

     l=strlen(b);

     for(int i=;i<l;i++)

     {

         if(b[i]>='a'&&b[i]<='z') c[++cl]=b[i]-'a'+;

         else c[++cl]=b[i]-'A'+;

     }

 }

 int main()

 {

     while()

     {

         scanf("%d",&k);

         if(k==) break;

         init();

         get_sa();

         get_he();

         ffind();

     }

     return ;

 }

[POJ3415]

2016-07-17 15:00:50

update：

哈哈学了SAM的我回来更新这道题啦！！

然而很搞笑的是...我打的SAM其实跟后缀数组差不多长！！

而且蒟蒻表示调试了很久！狗带！！SAM太难调了！！

好吧我一开始想不出来..是因为·不太懂那个SAM上某个点表示的串究竟是什么~~

其实是一个连续的区间，记为min(x)~max(x)（这是长度）,

其中min(x)=step[pre[x]]+1,max(x)=step[x]...

为什么呢..其实SAM就是省掉了重复子串吧，我觉得~~


    比如说上面这个，parent最大串一定是他的后缀来的，后面那个b表示的后缀其实有aab、ab、b、xaab....
但是aab、ab、b明显他的parent也有，就不会用后面那个来表示，
所以最短的都有step[pre[x]]+1那么长而且到时如果还要表示aab-----的串的话，其实信息也是保存到parent那里了。（大概，意会一下~）

 关于这道题，我们处理的目标是前缀的后缀~（上面用后缀数组的方法处理的是后缀的前缀，都叫后缀xx其实怎么我觉得是反过来的呢~）
　　
用A串建机，B串跑。

跑法有点类似AC自动机上的跑法，跑出来的长度啊都是让后缀匹配最长的！！

我们知道如果有长度大于k的子串s和ss，如过ss是s的子串，而且s合法（就是在A串出现），那么ss也合法，而且ss的答案一定大于等于s的答案。

我一开始就是不知道怎么弄ss多余部分的答案。
　　
    很容易想到就是用rt[x]*(length-k+1),但是直接减的话，你算一些小的串，其实x并不能代表这个串，就是说你现在扫到的位置不一定是这个小串的首位置，那么直接用right数组就会有问题，就会漏解。
　　
    上面说了，x表示的串为min(x)~max（x），我们就在x的位置先算他表示的串，然后在parent那里打标记，用parent的right数组来算，就可以避免这种问题。

　　具体怎么做看GDXB讲解：
先对第一个串构造 SAM,逆拓扑序求出right存入r[]。
某个节点的right集合表示Min(x)~Max(x)这一段子串都出现了r[x]次！
用第二个串对 SAM 做 LCS，当前节点x LCS>=K时，ans+=ans+=r[x]*(len-maxx(k,step[pre[x]]+1)+1);（当前匹配的最长串的子串数）。如果step[pre[x]]>=k,cnt[pre[x]]++;
拓扑序逆序统计不是最长公共子串的状态但是被子串包含的个数

，ans+=cnt[p]*(step[p]- max(K,Min(p)+1)*r[p]，同时维护cnt:cnt[pre[p]]+=cnt[p]。

说一说right数组怎么求，因为我这个一开始都打错了！！

right[x]表示x这个点表示的串在原串出现多少次。

我们把主链上的right全部清成1，然后按拓扑序把i的答案放到pre[i]上即可。

代码如下：(SAM)

 #include<cstdio>

 #include<cstdlib>

 #include<cstring>

 #include<iostream>

 #include<algorithm>

 #include<queue>

 #include<cmath>

 using namespace std;

 #define Maxn 100010

 #define LL long long

 char s[Maxn],ss[Maxn];

 int len,ml;

 LL lim;

 LL ans;

 LL mymax(LL x,LL y) {return x>y?x:y;}

 struct node

 {

     int tot,pre[*Maxn],son[*Maxn][];

     LL step[*Maxn],rt[*Maxn],cnt[*Maxn];

     int last;

     void extend(int k)

     {

         int np=++tot,p=last;

         step[np]=step[last]+;

         rt[np]=;

         while(p&&!son[p][k])

         {

             son[p][k]=np;

             p=pre[p];

         }

         if(!p) pre[np]=;

         else

         {

             int q=son[p][k];

             if(step[q]==step[p]+) pre[np]=q;

             else

             {

                 int nq=++tot;

                 step[nq]=step[p]+;

                 memcpy(son[nq],son[q],sizeof(son[nq]));

                 pre[nq]=pre[q];

                 pre[q]=pre[np]=nq;

                 while(p&&son[p][k]==q)

                 {

                     son[p][k]=nq;

                     p=pre[p];

                 }

             }

         }

         last=np;

     }

     void init()

     {

         memset(pre,,sizeof(pre));

         memset(son,,sizeof(son));

         memset(rt,,sizeof(rt));

         memset(cnt,,sizeof(cnt));

     }

     void build()

     {

         step[]=;tot=;

         last=;

         for(int i=;i<=len;i++)

         {

             int k=s[i]-'a'+;

             if(k<) k=s[i]-'A'+;

             extend(k);

         }

     }

     int aq[*Maxn],tp[*Maxn],Rs[*Maxn];

     void get_tp()

     {

         for(int i=;i<=tot;i++) Rs[i]=;

         for(int i=;i<=tot;i++) Rs[step[i]]++;

         for(int i=;i<=tot;i++) Rs[i]+=Rs[i-];

         for(int i=;i<=tot;i++) tp[Rs[step[i]]--]=i;

     }

     void get_rt()

     {

         int now=;

         for(int i=tot;i>=;i--) rt[pre[tp[i]]]+=rt[tp[i]];

     }

     void ffind()

     {

         get_tp();

         get_rt();

         int now=,sp=;

         ans=;

         for(int i=;i<=ml;i++)

         {

             int k=ss[i]-'a'+;

             if(k<) k=ss[i]-'A'+;

             while(now&&!son[now][k]) now=pre[now],sp=step[now];

             if(son[now][k]) now=son[now][k],sp++;

             else now=,sp=;

             if(sp>=lim) ans+=(sp-mymax(lim,step[pre[now]]+)+)*rt[now];

             if(lim<step[pre[now]]+) cnt[pre[now]]++;

         }

         for(int i=tot;i>=;i--)

             cnt[pre[tp[i]]]+=cnt[tp[i]];

         for(int i=;i<=tot;i++) if(step[i]>=lim)

             ans+=rt[i]*cnt[i]*(step[i]-mymax(step[pre[i]]+,lim)+);

         for(int i=;i<=tot;i++) pre[i]=;

         for(int i=;i<=tot;i++)

          for(int j=;j<=;j++) son[i][j]=;

         for(int i=;i<=tot;i++) rt[i]=;

         for(int i=;i<=tot;i++) cnt[i]=;

     }

 }suf;

 int main()

 {

     suf.init();

     while()

     {

         scanf("%lld",&lim);

         if(lim==) break;

         scanf("%s%s",s+,ss+);

         len=strlen(s+);

         ml=strlen(ss+);

         suf.build();

         suf.ffind();

         printf("%lld\n",ans);

     }

     return ;

 }

[POJ3415]

记得long long~

2016-09-16 11:47:27

更新

【POJ3415】 Common Substrings（后缀数组|SAM）的更多相关文章

POJ3415 Common Substrings —— 后缀数组 + 单调栈公共子串个数
题目链接:https://vjudge.net/problem/POJ-3415 Common Substrings Time Limit: 5000MS Memory Limit: 65536K ...
POJ3415 Common Substrings(后缀数组单调栈)
借用罗穗骞论文中的讲解: 计算A 的所有后缀和B 的所有后缀之间的最长公共前缀的长度,把最长公共前缀长度不小于k 的部分全部加起来.先将两个字符串连起来,中间用一个没有出现过的字符隔开.按height ...
poj3415 Common Substrings (后缀数组+单调队列)
Time Limit: 5000MS Memory Limit: 65536K Total Submissions: 9414 Accepted: 3123 Description A sub ...
POJ 3415 Common Substrings 后缀数组+并查集
后缀数组,看到网上很多题解都是单调栈,这里提供一个不是单调栈的做法, 首先将两个串连接起来求height 求完之后按height值从大往小合并. height值代表的是 sa[i]和sa[i ...
poj 3415 Common Substrings——后缀数组+单调栈
题目:http://poj.org/problem?id=3415 因为求 LCP 是后缀数组的 ht[ ] 上的一段取 min ,所以考虑算出 ht[ ] 之后枚举每个位置作为右端的贡献. 一开始想 ...
poj 3415 Common Substrings —— 后缀数组+单调栈
题目:http://poj.org/problem?id=3415 先用后缀数组处理出 ht[i]: 用单调栈维护当前位置 ht[i] 对之前的 ht[j] 取 min 的结果,也就是当前的后缀与之前 ...
POJ - 3415 Common Substrings (后缀数组）
A substring of a string T is defined as: T( i, k)= TiTi +1... Ti+k -1, 1≤ i≤ i+k-1≤| T|. Given two s ...
poj 3415 Common Substrings 后缀数组+单调栈
题目链接题意:求解两个字符串长度大于等于k的所有相同子串对有多少个,子串可以相同,只要位置不同即可:两个字符串的长度不超过1e5; 如 s1 = "xx" 和 s2 = &qu ...
POJ - 3415 Common Substrings(后缀数组求长度不小于 k 的公共子串的个数+单调栈优化)
Description A substring of a string T is defined as: T( i, k)= TiTi+1... Ti+k-1, 1≤ i≤ i+k-1≤| T|. G ...
POJ.3145.Common Substrings(后缀数组倍增单调栈)
题目链接 \(Description\) 求两个字符串长度不小于k的公共子串对数. \(Solution\) 求出ht[]后先减去k,这样对于两个后缀A',B',它们之间的贡献为min{ht(A)}( ...

随机推荐

GDB-Dashboard－GDB可视化界面
项目地址 https://github.com/cyrus-and/gdb-dashboard 项目介绍 gdb-dashboard是一个gdb的可视化界面,可以通过web或者终端来现实可视化信息,支 ...
C#_ 项目打包附加数据库
C#_ 项目打包附加数据库 2010-07-11 23:22:45| 分类: Winfrom|举报|字号订阅实现效果:安装项目时直接附加数据库. 1.首先在需要部署的项目的解决方案资源 ...
（亲测）设置myeclipse打开默认工作空间
亲测一: 1.找到D:\MyEclipse 8.5\configuration\ config.ini 这个文件 2.找到这一行instance.area.default 3.将后面的地址替换为你想要 ...
jdk1.5多线程Lock接口及Condition接口
jdk1.5多线程的实现的方式: jdk1.5之前对锁的操作是隐式的 synchronized(对象) //获取锁 { } //释放锁 jdk1.5锁的操作是显示的:在包java.util.concu ...
.net缓存应用与分析
在 ASP.NET 提供的许多特性中,相比 ASP.NET 的所有其他特性,缓存对应用程序的性能具有最大的潜在影响,利用缓存和其他机制,ASP.NET 开发人员可以接受使用开销很大的控件(例如,Dat ...
iOS UIKit：TableView之表格创建（1）
Table View是UITableView类的实例对象,其是使用节(section)来描述信息的一种滚动列表.但与普通的表格不同,tableView只有一行,且只能在垂直方向进行滚动.tableVi ...
Mybatis中实现oracle的批量插入、更新
oracle 实现在Mybatis中批量插入,下面测试可以使用,在批量插入中不能使用insert 标签,只能使用select标签进行批量插入,否则会提示错误 ### Cause: java.sql.S ...
Mybatis下配置调用Oracle自定义函数返回的游标结果集
在ibatis和Mybatis对存储过程和函数函数的调用的配置Xml是不一样的,以下是针对Mybatis 3.2的环境进行操作的. 第一步配置Mapper的xml内容 <mapper names ...
php curl_init函数用法
原文地址:curl_init函数用法">php curl_init函数用法作者:loading 使用PHP的cURL库可以简单和有效地去抓网页.你只需要运行一个脚本,然后分析一下你所抓 ...
Singleton设计模式分类：设计模式 2014-12-03 17:54 59人阅读评论(0) 收藏
实现方法: public class SingleTon<T> where T : class, new() { protected SingleTon() { } pri ...

【POJ3415】 Common Substrings（后缀数组|SAM）

【POJ3415】 Common Substrings（后缀数组|SAM）的更多相关文章

随机推荐

热门专题