UVALive 6869 Repeated Substrings
Repeated Substrings
Time Limit: 3000MS Memory Limit: Unknown 64bit IO Format: %lld & %llu
Description
String analysis often arises in applications from biology and chemistry, such as the study of DNA and protein molecules. One interesting problem is to find how many substrings are repeated (at least twice) in a long string. In this problem, you will write a program to find the total number of repeated substrings in a string of at most 100 000 alphabetic characters. Any unique substring that occurs more than once is counted. As an example, if the string is “aabaab”, there are 5 repeated substrings: “a”, “aa”, “aab”, “ab”, “b”. If the string is “aaaaa”, the repeated substrings are “a”, “aa”, “aaa”, “aaaa”. Note that repeated occurrences of a substring may overlap (e.g. “aaaa” in the second case).
Input
The input consists of at most 10 cases. The first line contains a positive integer specifying the number of cases to follow. Each of the following lines contains a nonempty string of up to 100 000 alphabetic characters.
Output
For each line of input, output one line containing the number of unique substrings that are repeated. You may assume that the correct answer fits in a signed 32-bit integer.
Sample Input
3
aabaab
aaaaa
AaAaA
Sample Output
5
4
5
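Solution: an application of the suffix array's LCP array. Scan the suffixes in sorted order; whenever lcp[i] > lcp[i-1], add lcp[i] - lcp[i-1] to the answer.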
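#include <bits/stdc++.h>
using namespace std;
// Assumed size: 100 000 characters plus the sentinel, with a little slack.
const int maxn = 100010;
int rk[maxn],wb[maxn],wv[maxn],wd[maxn],lcp[maxn];
// True if the two length-2k blocks starting at i and j have equal rank pairs.
bool cmp(int *r,int i,int j,int k) {
    return r[i] == r[j] && r[i+k] == r[j+k];
}
// Prefix-doubling suffix array construction.
// r[0..n-1] must end with a sentinel 0 that is smaller than every other value;
// m is an upper bound (exclusive) on the values in r.
void da(int *r,int *sa,int n,int m) {
    int i,k,p,*x = rk,*y = wb;
    // Counting sort by the first character.
    for(i = 0; i < m; ++i) wd[i] = 0;
    for(i = 0; i < n; ++i) wd[x[i] = r[i]]++;
    for(i = 1; i < m; ++i) wd[i] += wd[i-1];
    for(i = n-1; i >= 0; --i) sa[--wd[x[i]]] = i;
    for(p = k = 1; p < n; k <<= 1,m = p) {
        // Order by the second key (rank of the block starting k later).
        for(p = 0,i = n-k; i < n; ++i) y[p++] = i;
        for(i = 0; i < n; ++i) if(sa[i] >= k) y[p++] = sa[i] - k;
        // Stable counting sort by the first key.
        for(i = 0; i < n; ++i) wv[i] = x[y[i]];
        for(i = 0; i < m; ++i) wd[i] = 0;
        for(i = 0; i < n; ++i) wd[wv[i]]++;
        for(i = 1; i < m; ++i) wd[i] += wd[i-1];
        for(i = n-1; i >= 0; --i) sa[--wd[wv[i]]] = y[i];
        // Recompute ranks for blocks of length 2k.
        swap(x,y);
        x[sa[0]] = 0;
        for(p = i = 1; i < n; ++i)
            x[sa[i]] = cmp(y,sa[i-1],sa[i],k)?p-1:p++;
    }
}
// Kasai-style LCP: lcp[i] is the LCP of the suffixes at sa[i-1] and sa[i].
// n is the string length without the sentinel; sa[0] is the sentinel suffix.
void calcp(int *r,int *sa,int n) {
    for(int i = 1; i <= n; ++i) rk[sa[i]] = i;
    int h = 0;
    for(int i = 0; i < n; ++i) {
        if(h > 0) h--;
        for(int j = sa[rk[i]-1]; i+h < n && j+h < n; h++)
            if(r[i+h] != r[j+h]) break;
        lcp[rk[i]] = h;
    }
}
int r[maxn],sa[maxn];
char str[maxn];
int main() {
    int cs,ret;
    scanf("%d",&cs);
    while(cs--) {
        scanf("%s",str);
        int len = strlen(str);
        for(int i = 0; str[i]; ++i)
            r[i] = str[i];
        ret = r[len] = 0;       // append the sentinel and reset the answer
        da(r,sa,len+1,128);     // 128 is an assumed bound covering ASCII letters
        calcp(r,sa,len);
        // lcp[0] stays 0, so the first comparison is against an empty prefix.
        for(int i = 1; i <= len; ++i)
            if(lcp[i] > lcp[i-1]) ret += lcp[i] - lcp[i-1];
        printf("%d\n",ret);
    }
    return 0;
}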
Suffix automaton
#include <bits/stdc++.h>
using namespace std;
// Assumed size: a suffix automaton has at most 2n states for a string of length n.
const int maxn = 200010;
int cnt[maxn],c[maxn],sa[maxn];
struct node{
    // Transitions are indexed by the raw ASCII code, so 128 entries are assumed.
    int son[128],f,len;
    void init(){
        memset(son,-1,sizeof son);
        f = -1;     // suffix link
        len = 0;    // length of the longest string in this state
    }
};
struct SAM{
    node e[maxn];
    int tot,last;
    int newnode(int len = 0){
        e[tot].init();
        e[tot].len = len;
        return tot++;
    }
    void init(){
        tot = last = 0;
        newnode();  // state 0 is the root
    }
    void add(int c){
        int p = last,np = newnode(e[p].len + 1);
        while(p != -1 && e[p].son[c] == -1){
            e[p].son[c] = np;
            p = e[p].f;
        }
        if(p == -1) e[np].f = 0;
        else{
            int q = e[p].son[c];
            if(e[p].len + 1 == e[q].len) e[np].f = q;
            else{
                // Clone q so that the clone has length e[p].len + 1.
                int nq = newnode();
                e[nq] = e[q];
                e[nq].len = e[p].len + 1;
                e[q].f = e[np].f = nq;
                while(p != -1 && e[p].son[c] == q){
                    e[p].son[c] = nq;
                    p = e[p].f;
                }
            }
        }
        last = np;
        cnt[np] = 1;    // only non-clone states mark an end position
    }
}sam;
char str[maxn];
int main(){
    int kase;
    scanf("%d",&kase);
    while(kase--){
        scanf("%s",str);
        sam.init();
        memset(cnt,0,sizeof cnt);
        int len = strlen(str);
        for(int i = 0; str[i]; ++i)
            sam.add(str[i]);
        node *e = sam.e;
        // Counting sort of the states by len, giving a topological order.
        memset(c,0,sizeof c);
        for(int i = 0; i < sam.tot; ++i) c[e[i].len]++;
        for(int i = 1; i <= len; ++i) c[i] += c[i-1];
        for(int i = sam.tot-1; i >= 0; --i) sa[--c[e[i].len]] = i;
        // Push occurrence counts up the suffix links, longest states first.
        for(int i = sam.tot-1; i > 0; --i){
            int v = sa[i];
            cnt[e[v].f] += cnt[v];
        }
        // A state seen at least twice contributes len - len(link) distinct substrings.
        int ret = 0;
        for(int i = 1; i < sam.tot; ++i){
            if(cnt[i] <= 1) continue;
            ret += e[i].len - e[e[i].f].len;
        }
        printf("%d\n",ret);
    }
    return 0;
}