Index Generation

Time Limit: 1000MS		Memory Limit: 10000K
Total Submissions: 230		Accepted: 89

Description

Most nonfiction and reference books have an index to help readers find references to specific terms or concepts in the text. Here is a sample index.

larch, 4, 237, 238, 414
+ Monty Python and, 64, 65, 66
+ planting of, 17
Lenny Kravitz, 50
+ going his way, 53
lumbago, 107
mango
+ Chris Kattan, 380
+ storage of, 87, 90
+ use in Nethack, 500, 501
+ Vitamin C content, 192

Each index entry contains a primary entry followed by zero or more secondary entries, which begin with a '+'. Entries will normally be followed by a list of page references, but a primary entry might not be if at least one secondary entry is present (as is the case with mango, above). Primary entries are sorted, and secondary entries following a primary entry are also sorted. Sorting is case-insensitive. Page references for an entry are in ascending order and do not include duplicates. (A duplicate could occur if there are two or more identical entries on the same page.)
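The ordering rules above map naturally onto ordered containers. The following is a minimal sketch, not the approach used by the solution later in this post; `CaseInsensitiveLess`, `PrimaryEntry`, `Index` and `add_reference` are hypothetical names introduced here. It assumes that entries differing only in letter case count as the same entry, which the problem statement does not spell out.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <string>

// Case-insensitive "less than" for entry names (hypothetical helper).
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        std::string::size_type n = a.size() < b.size() ? a.size() : b.size();
        for (std::string::size_type i = 0; i < n; ++i) {
            int x = std::toupper(static_cast<unsigned char>(a[i]));
            int y = std::toupper(static_cast<unsigned char>(b[i]));
            if (x != y) return x < y;
        }
        return a.size() < b.size();  // a shorter prefix sorts first
    }
};

// std::set keeps page numbers ascending and silently drops duplicates.
typedef std::set<int> Pages;
typedef std::map<std::string, Pages, CaseInsensitiveLess> Secondaries;

struct PrimaryEntry {
    Pages pages;              // may stay empty if only secondaries are referenced
    Secondaries secondaries;  // kept sorted by the comparator
};

typedef std::map<std::string, PrimaryEntry, CaseInsensitiveLess> Index;

// Record one page reference. An empty secondary means the reference
// belongs to the primary entry itself.
void add_reference(Index& index, const std::string& primary,
                   const std::string& secondary, int page) {
    assert(page >= 1);  // pages are numbered from 1
    PrimaryEntry& e = index[primary];
    if (secondary.empty())
        e.pages.insert(page);
    else
        e.secondaries[secondary].insert(page);
}
```

With this layout, iterating over `Index` visits primaries in sorted order, and a primary such as mango simply has an empty `pages` set.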

Your task is to read a document that has index information embedded within it and produce the index. Documents consist of one or more lines of ASCII text. The page number starts at 1, and the character '&' indicates the start of a new page (which adds 1 to the current page number). Index entries are indicated by a marker, which in its most elaborate form has the following syntax:

{text%primary$secondary}
Here text is the text to be indexed, primary is an alternative primary entry, and secondary is a secondary entry. Both '%primary' and '$secondary' are optional, but if both are present they must appear in the order given. If primary is present then it is used as the primary entry, and if not then text is used as the primary entry. If secondary is present then the marker adds a page reference for that secondary entry; otherwise it adds a page reference for the primary entry. A single marker cannot add a page reference for both a primary and secondary entry. Here are examples of each of the four possible types of marker, which correspond to four of the entries in the sample index above.

... his {lumbago} was acting up, so ...
... {Lenny%Lenny Kravitz} lit up the crowd with his version of ...
... Monty Python often used the {larch$Monty Python and} in ...
... when storing {mangos%mango$storage of}, be sure to ...
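The four marker forms can be resolved with two searches. This is a minimal sketch under the assumption that the text between '{' and '}' has already been extracted and had its ignorable spaces removed; `resolve_marker` is a hypothetical name, not part of the solution below.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Split the text between '{' and '}' into (primary, secondary).
// An empty secondary means the page reference goes to the primary entry.
std::pair<std::string, std::string> resolve_marker(const std::string& body) {
    // Precondition: the braces themselves have been stripped.
    assert(body.find('{') == std::string::npos &&
           body.find('}') == std::string::npos);

    // '$' (if present) always comes after '%', so cut the secondary off first.
    std::string::size_type dollar = body.find('$');
    std::string secondary =
        dollar == std::string::npos ? std::string() : body.substr(dollar + 1);
    std::string head = body.substr(0, dollar);  // npos means "to the end"

    // Inside the head, '%primary' overrides the indexed text.
    std::string::size_type percent = head.find('%');
    std::string primary =
        percent == std::string::npos ? head : head.substr(percent + 1);

    return std::make_pair(primary, secondary);
}
```

For example, `resolve_marker("mangos%mango$storage of")` yields the pair ("mango", "storage of"), while `resolve_marker("lumbago")` yields ("lumbago", "").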

Input

The input consists of one or more documents, followed by a line containing only '**' that signals the end of the input. Documents are implicitly numbered starting with 1. Each document consists of one or more lines of text followed by a line containing only '*'. Each line of text will be at most 79 characters long, not counting end-of-line characters.

Output

For document i, output the line 'DOCUMENT i' followed by the sorted index using the exact output format shown in the examples.

Note:

A document will contain at most 100 markers, with at most 20 primary entries.
A primary entry will have at most 5 secondary entries.
An entry will have at most 10 unique page references (not including duplicates).
The character '&' will not appear anywhere within a marker, and will appear at most 500 times within a document.
The character '*' is used only to signal the end of a document or the end of the input.
The characters '{', '}', '%', and '$' will only be used to define markers, and will not appear in any text or entries.
A marker may span one or more lines. Every end-of-line within a marker must be converted to a single space.
A space within a marker (including a converted end-of-line) is normally included in the text/entry, just like any other character. However, any space that immediately follows '{', immediately precedes '}', or is immediately adjacent to '%' or '$' must be ignored.
The total length of a marker, measured from the opening '{' to the closing '}', and in which all embedded end-of-lines are converted to spaces, will be at most 79 characters.
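The two whitespace rules in the notes can be sketched as a small normalization pass. This is an illustration only, reading the rules literally (one space on each side of a delimiter is dropped); `normalize_marker` is a hypothetical name and takes the raw text between '{' and '}'.

```cpp
#include <cassert>
#include <string>

// Normalize the raw text between '{' and '}':
// 1. every end-of-line becomes a single space;
// 2. a space right after '{', right before '}', or adjacent to
//    '%' or '$' is dropped.
std::string normalize_marker(const std::string& raw) {
    // Precondition: the braces themselves have been stripped.
    assert(raw.find('{') == std::string::npos &&
           raw.find('}') == std::string::npos);

    std::string s;
    for (std::string::size_type i = 0; i < raw.size(); ++i)
        s += (raw[i] == '\n') ? ' ' : raw[i];

    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (s[i] == ' ') {
            bool after_open   = (i == 0);             // follows '{'
            bool before_close = (i + 1 == s.size());  // precedes '}'
            bool after_delim  = i > 0 && (s[i - 1] == '%' || s[i - 1] == '$');
            bool before_delim = i + 1 < s.size() &&
                                (s[i + 1] == '%' || s[i + 1] == '$');
            if (after_open || before_close || after_delim || before_delim)
                continue;  // ignorable space
        }
        out += s[i];
    }
    return out;
}
```

Spaces that are not next to a delimiter, such as the one in "Dr. Seuss", are kept as part of the entry.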

Sample Input

Call me Ishmael.
*
One {fish $unary}, two {fish$ binary},&red {fish $ scarlet}, blue {fish$
azure}. & By { Dr. Seuss }.
*
This is a {simple } & & { document} that &{
simply %simple
$adverb
} & {illustrates %vision} &&&&& one {simple-minded% simple} {Judge}'s {vision}
for what a {document } might { look % vision} like.
*
**

Sample Output

DOCUMENT 1
DOCUMENT 2
Dr. Seuss, 3
fish
+ azure, 2
+ binary, 1
+ scarlet, 2
+ unary, 1
DOCUMENT 3
document, 3, 10
Judge, 10
simple, 1, 10
+ adverb, 4
vision, 5, 10
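The output layout shown in the sample can be sketched as a formatter over already-sorted data. This is an assumption-laden illustration: `Line`, `Primary` and `format_index` are hypothetical names, and the caller is assumed to have sorted the entries and removed duplicate pages beforehand.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct Line {                 // one entry name with its page list
    std::string name;
    std::vector<int> pages;   // ascending, no duplicates (caller's job)
};

struct Primary {
    Line self;                     // pages may be empty
    std::vector<Line> secondaries; // already sorted
};

// Build the text for one document: a header line, then one line per
// primary entry, each followed by its "+ " secondary lines.
std::string format_index(int document, const std::vector<Primary>& index) {
    assert(document >= 1);  // documents are numbered from 1
    std::ostringstream out;
    out << "DOCUMENT " << document << '\n';
    for (std::vector<Primary>::size_type i = 0; i < index.size(); ++i) {
        const Primary& p = index[i];
        out << p.self.name;
        for (std::vector<int>::size_type k = 0; k < p.self.pages.size(); ++k)
            out << ", " << p.self.pages[k];
        out << '\n';
        for (std::vector<Line>::size_type j = 0; j < p.secondaries.size(); ++j) {
            const Line& s = p.secondaries[j];
            out << "+ " << s.name;
            for (std::vector<int>::size_type k = 0; k < s.pages.size(); ++k)
                out << ", " << s.pages[k];
            out << '\n';
        }
    }
    return out.str();
}
```

A document with no markers produces only its header line, as DOCUMENT 1 does in the sample.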

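/*
    Name: Shangli_Cloud
    Copyright: Shangli_Cloud
    Author: Shangli_Cloud
    Date: 10/10/14 08:15
    Description:
    A string-processing problem.
    While reading the input, increment page every time '&' is seen.
    The objects to process are the {...} markers.
    Each one has primary, secondary and page attributes, so it is stored in a struct.
    When '{' is seen, a new entry is added; the marker is parsed on the spot.
    Sorting uses three keys: primary, secondary, page.
    Markers are handled through a tokenizer, next_token(),
    with next_char() as its helper.
    Note: for some characters the following character must be read to decide
    what the token is. A later next_char() call would then read yet another
    character, and the one already read would be skipped.
    The flag lookahead therefore records that ch already holds an unconsumed character.
*/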

#include <iostream>
#include <cctype>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;

// Sentinel tokens. Any two distinct values that cannot occur as ASCII
// input characters will do.
const char EndOfDocument = -1;
const char EndOfFile = -2;

char ch;         // most recently read character
char token;      // most recently produced token
int page;        // current page number
bool lookahead;  // true if ch has been read but not yet consumed

struct Entry
{
    string primary;
    string secondary;
    int page;
    // Captures the global page counter at the moment the marker is read.
    Entry(string p, string s) : primary(p), secondary(s), page(::page) {}
};

vector<Entry> entry;

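// Case-insensitive three-way comparison: negative, zero or positive
// when s sorts before, equal to, or after t.
int string_compare(const string &s, const string &t)
{
    int m = s.length();
    int n = t.length();
    int k = m < n ? m : n;
    for (int i = 0; i < k; i++)
    {
        int a = toupper(s[i]);
        int b = toupper(t[i]);
        if (a != b)
            return a - b;
    }
    return m == n ? 0 : m < n ? -1 : 1;
}

// Sort order: primary, then secondary, then page.
bool less_than(const Entry &s, const Entry &t)
{
    int cmp = string_compare(s.primary, t.primary);
    if (cmp < 0)
        return true;
    if (cmp > 0)
        return false;
    cmp = string_compare(s.secondary, t.secondary);
    if (cmp < 0)
        return true;
    if (cmp > 0)
        return false;
    return s.page < t.page;
}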

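// Returns the next input character, reusing ch if it was read ahead.
inline char next_char()
{
    if (lookahead)
        lookahead = false;
    else
        ch = cin.get();
    return ch;
}

char next_token()
{
    switch (next_char())
    {
    case '*':
        // "**" ends the input; a single '*' ends the document.
        token = (next_char() == '*') ? EndOfFile : EndOfDocument;
        break;
    case ' ':
    case '\n':
        next_char();
        if (ch == '%' || ch == '$' || ch == '}')
            token = ch;  // the space before a delimiter is dropped
        else
        {
            token = ' ';  // a space or end-of-line yields a single space
            lookahead = true;
            break;
        }
        // Intentional fall-through: handle the delimiter just read.
    case '{': case '%': case '$':
        token = ch;
        // Drop one whitespace character right after the delimiter.
        lookahead = !isspace(next_char());
        break;
    default:
        token = ch;
    }
    return token;
}

inline bool is_delimiter(char t)
{
    return t == '%' || t == '$' || t == '}';
}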

// Parses one marker, starting just after '{', and records its entry.
void add_entry()
{
    string primary, secondary;
    while (!is_delimiter(next_token()))
        primary += token;  // the indexed text is the default primary
    if (token == '%')
    {
        primary.erase();  // an explicit primary replaces the text
        while (!is_delimiter(next_token()))
            primary += token;
    }
    if (token == '$')
        while (!is_delimiter(next_token()))
            secondary += token;
    entry.push_back(Entry(primary, secondary));
}

int main()
{
    for (int document = 1; ; ++document)
    {
        if (next_token() == EndOfFile) break;
        cout << "DOCUMENT " << document;
        page = 1;
        entry.clear();
        // Sentinel: sorts first, so entry[i-1] is always valid below.
        entry.push_back(Entry("", ""));
        do {
            if (token == '&')
                ++page;
            else if (token == '{')
                add_entry();
        } while (next_token() != EndOfDocument);
        sort(entry.begin(), entry.end(), less_than);
        for (size_t i = 1; i < entry.size(); ++i)
        {
            if (entry[i].primary == entry[i-1].primary)
            {
                if (entry[i].secondary == entry[i-1].secondary)
                {
                    // Same entry: print the page unless it is a duplicate.
                    if (entry[i].page != entry[i-1].page)
                        cout << ", " << entry[i].page;
                }
                else
                    cout << "\n+ " << entry[i].secondary << ", " << entry[i].page;
            }
            else
            {
                cout << '\n' << entry[i].primary;
                if (entry[i].secondary == "")
                    cout << ", " << entry[i].page;
                else
                    cout << "\n+ " << entry[i].secondary << ", " << entry[i].page;
            }
        }
        cout << endl;
    }
    return 0;
}
