CSU-1632 Repeated Substrings (Suffix Array)
Description
String analysis often arises in applications from biology and chemistry, such as the study of DNA and protein molecules. One interesting problem is to find how many substrings are repeated (at least twice) in a long string. In this problem, you will write a program to find the total number of repeated substrings in a string of at most 100 000 alphabetic characters. Any unique substring that occurs more than once is counted. As an example, if the string is “aabaab”, there are 5 repeated substrings: “a”, “aa”, “aab”, “ab”, “b”. If the string is “aaaaa”, the repeated substrings are “a”, “aa”, “aaa”, “aaaa”. Note that repeated occurrences of a substring may overlap (e.g. “aaaa” in the second case).
Input
The input consists of at most 10 cases. The first line contains a positive integer, specifying the number of
cases to follow. Each of the following lines contains a nonempty string of up to 100 000 alphabetic characters.
Output
For each line of input, output one line containing the number of unique substrings that are repeated. You
may assume that the correct answer fits in a signed 32-bit integer.
Sample Input
3
aabaab
aaaaa
AaAaA
Sample Output
5
4
5
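To make the counting rule concrete, here is a brute-force reference that can be checked against the samples above. It is only a sketch for very short strings, not the intended solution: it enumerates every substring, so it is far too slow for 100 000 characters. The name `countRepeatedBrute` is hypothetical and does not appear in the original post.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper (not from the original post): counts the distinct
// substrings of s that occur at least twice, overlaps allowed.
// It materialises every substring, so use it only on tiny inputs.
int countRepeatedBrute(const std::string& s) {
    std::map<std::string, int> occurrences;
    for (std::size_t start = 0; start < s.size(); ++start) {
        for (std::size_t len = 1; start + len <= s.size(); ++len) {
            ++occurrences[s.substr(start, len)];
        }
    }
    int repeated = 0;
    for (const auto& entry : occurrences) {
        if (entry.second >= 2) ++repeated;  // seen at two or more positions
    }
    return repeated;
}
```

Calling it on the three sample strings gives 5, 4 and 5, matching the sample output. Uppercase and lowercase letters are treated as different characters, as the third sample requires.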
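Problem summary: count the distinct substrings that occur at least twice in the string.
Analysis: the answer is sum(max(height(i)-height(i-1),0)) over i = 1..n-1, where height(i) is the length of the longest common prefix of the suffixes ranked i-1 and i in sorted order, and height(0) = 0. When height rises from one rank to the next, the extra prefix lengths are repeated substrings that no earlier-ranked suffix starts with, so each is counted exactly once.
The code is as follows: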
//# define AC
# ifndef AC
# include<iostream>
# include<cstdio>
# include<cstring>
# include<vector>
# include<queue>
# include<list>
# include<cmath>
# include<set>
# include<map>
# include<string>
# include<cstdlib>
# include<algorithm>
using namespace std;
# define mid (l+(r-l)/2)
typedef long long LL;
typedef unsigned long long ULL;
const int N=100000;
const int mod=1e9+7;
const int INF=0x7fffffff;
const LL oo=0x7fffffffffffffff;
int SA[N+5];     // SA[r] = start index of the suffix with rank r
int tSA[N+5];    // scratch buffer used during construction
int cnt[N+5];    // counting-sort buckets
int rk[N+5];     // rk[i] = rank of the suffix starting at i
int *x,*y;       // x: current ranks, y: second-key order / previous ranks
int height[N+5]; // height[r] = LCP of suffixes SA[r-1] and SA[r]
// Maps a letter to 0..51 (lowercase first, then uppercase).
int idx(char c)
{
    if('a'<=c&&c<='z') return c-'a';
    return c-'A'+26;
}
// True if suffixes i and j agree on both halves of the doubled key.
bool same(int i,int j,int k,int n)
{
    if(y[i]-y[j]) return false;
    if(i+k<n&&j+k>=n) return false;
    if(i+k>=n&&j+k<n) return false;
    return y[i+k]==y[j+k];
}
// Builds the suffix array by prefix doubling with counting sort.
void buildSA(char *s)
{
    int n=strlen(s);
    int m=52;
    x=rk,y=tSA;
    // Initial sort by the first character.
    for(int i=0;i<m;++i) cnt[i]=0;
    for(int i=0;i<n;++i) ++cnt[x[i]=idx(s[i])];
    for(int i=1;i<m;++i) cnt[i]+=cnt[i-1];
    for(int i=n-1;i>=0;--i) SA[--cnt[x[i]]]=i;
    for(int k=1;k<=n;k<<=1){
        int p=0;
        // Order by the second key: suffixes shorter than k come first.
        for(int i=n-k;i<n;++i) y[p++]=i;
        for(int i=0;i<n;++i) if(SA[i]>=k) y[p++]=SA[i]-k;
        // Stable counting sort by the first key.
        for(int i=0;i<m;++i) cnt[i]=0;
        for(int i=0;i<n;++i) ++cnt[x[y[i]]];
        for(int i=1;i<m;++i) cnt[i]+=cnt[i-1];
        for(int i=n-1;i>=0;--i) SA[--cnt[x[y[i]]]]=y[i];
        // Recompute ranks for prefixes of length 2k.
        p=1;
        swap(x,y);
        x[SA[0]]=0;
        for(int i=1;i<n;++i)
            x[SA[i]]=same(SA[i],SA[i-1],k,n)?p-1:p++;
        if(p>=n) break; // all ranks distinct: done
        m=p;
    }
}
// Computes height[] in O(n) by walking suffixes in text order.
void getHeight(char *s)
{
    int n=strlen(s);
    for(int i=0;i<n;++i) rk[SA[i]]=i;
    int k=0;
    for(int i=0;i<n;++i){
        if(rk[i]==0){
            height[rk[i]]=k=0;
        }else{
            if(k) --k;
            int j=SA[rk[i]-1];
            while(i+k<n&&j+k<n&&s[i+k]==s[j+k])
                ++k;
            height[rk[i]]=k;
        }
    }
}
char str[N+5];
void solve()
{
    int n=strlen(str);
    int ans=0;
    // height[0] is 0, so start at i=1; starting at 0 would read height[-1].
    for(int i=1;i<n;++i){
        if(height[i]>height[i-1])
            ans+=height[i]-height[i-1];
    }
    printf("%d\n",ans);
}
int main()
{
    int T;
    scanf("%d",&T);
    while(T--)
    {
        scanf("%s",str);
        buildSA(str);
        getHeight(str);
        solve();
    }
    return 0;
}
# endif
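The height formula from the analysis can be cross-checked on small inputs without the full prefix-doubling construction. The sketch below sorts the suffixes naively with `std::sort` and computes each adjacent LCP by direct comparison. Because of that it is quadratic or worse and is meant only for verification, not as a replacement for the solution above. The function name `countRepeatedByHeight` is an assumption introduced for illustration.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical checker (not from the original post): applies
// sum(max(height[i] - height[i-1], 0)) using a naively sorted suffix list.
int countRepeatedByHeight(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> sa(n);
    std::iota(sa.begin(), sa.end(), 0);
    // Naive suffix sort: compare the suffixes starting at a and b directly.
    std::sort(sa.begin(), sa.end(), [&s](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    int answer = 0;
    int previousHeight = 0;  // height of rank 0 is defined as 0
    for (int i = 1; i < n; ++i) {
        int a = sa[i - 1], b = sa[i], h = 0;
        // Longest common prefix of the two neighbouring suffixes.
        while (a + h < n && b + h < n && s[a + h] == s[b + h]) ++h;
        if (h > previousHeight) answer += h - previousHeight;
        previousHeight = h;
    }
    return answer;
}
```

For "aabaab" the sorted suffixes are aab, aabaab, ab, abaab, b, baab, with heights 0, 3, 1, 2, 0, 1. The increases are 3, 1 and 1, which sum to 5. The character order used for sorting differs from `idx` in the solution, but this does not change the result, since the formula depends only on which suffixes share prefixes.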