[算法竞赛入门经典 (Classic Introduction to Algorithm Competitions)] Message Decoding, ACM/ICPC World Finals 1991, UVa 213

Description

Some message encoding schemes require that an encoded message be sent in two parts. The first part, called the header, contains the characters of the message. The second part contains a pattern that represents the message. You must write a program that can decode messages under such a scheme.

The heart of the encoding scheme for your program is a sequence of “key” strings of 0’s and 1’s as follows:

0, 00, 01, 10, 000, 001, 010, 011, 100, 101, 110, 0000, 0001, . . . , 1101, 1110, 00000, . . .

The first key in the sequence is of length 1, the next 3 are of length 2, the next 7 of length 3, the next 15 of length 4, etc. If two adjacent keys have the same length, the second can be obtained from the first by adding 1 (base 2). Notice that there are no keys in the sequence that consist only of 1’s.

The keys are mapped to the characters in the header in order. That is, the first key (0) is mapped to the first character in the header, the second key (00) to the second character in the header, the kth key is mapped to the kth character in the header. For example, suppose the header is:

AB#TANCnrtXc

Then 0 is mapped to A, 00 to B, 01 to #, 10 to T, 000 to A, ..., 110 to X, and 0000 to c.

The encoded message contains only 0’s and 1’s and possibly carriage returns, which are to be ignored. The message is divided into segments. The first 3 digits of a segment give the binary representation of the length of the keys in the segment. For example, if the first 3 digits are 010, then the remainder of the segment consists of keys of length 2 (00, 01, or 10). The end of the segment is a string of 1’s which is the same length as the length of the keys in the segment. So a segment of keys of length 2 is terminated by 11. The entire encoded message is terminated by 000 (which would signify a segment in which the keys have length 0). The message is decoded by translating the keys in the segments one-at-a-time into the header characters to which they have been mapped.

Input

The input file contains several data sets. Each data set consists of a header, which is on a single line by itself, and a message, which may extend over several lines. The length of the header is limited only by the fact that key strings have a maximum length of 7 (111 in binary). If there are multiple copies of a character in a header, then several keys will map to that character. The encoded message contains only 0’s and 1’s, and it is a legitimate encoding according to the described scheme. That is, the message segments begin with the 3-digit length sequence and end with the appropriate sequence of 1’s. The keys in any given segment are all of the same length, and they all correspond to characters in the header. The message is terminated by 000.
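To see what that limit works out to: the 3-bit length field allows key lengths 1 through 7, and each length n has 2^n - 1 keys (the all-ones string is never a key), so the longest header that can be fully addressed is:

```latex
\sum_{n=1}^{7} \left(2^{n} - 1\right) = \left(2^{8} - 2\right) - 7 = 247
```

This is the same 1 + 3 + 7 + 15 + 31 + 63 + 127 characters that the readCodes() function in the solution below can store.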

Carriage returns may appear anywhere within the message part. They are not to be considered as part of the message.

Output

For each data set, your program must write its decoded message on a separate line. There should not be blank lines between messages.

Sample input

TNM AEIOU
0010101100011
1010001001110110011
11000
$#**\
0100000101101100011100101000

Sample output

TAN ME
##*\$

Analysis

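While working through the Purple Book (紫书, i.e. 《算法竞赛入门经典》), I was stuck on this problem for quite a while. The statement itself was clear enough, but Liu Rujia's reference code looked rather cryptic at first glance; once it clicked, though, it turned out to be very clever.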

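The problem defines a sequence of binary keys that count upward within each length. Each data set first gives a header made of arbitrary characters; the header's characters are mapped, in order, to the keys of the sequence, and the encoded message that follows is translated back into characters through this mapping.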

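What does "within each length" mean? For example:

1 bit: 0

2 bits: 00 01 10

3 bits: 000 001 010 011 100 101 110

...

(The all-ones string of each length, such as 11 or 111, is not a key.)

Written on a single line, this gives:

0,00,01,10,000,001,010,011,100,101,110,0000, . . . (infinitely long)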
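To make the ordering concrete, here is a small C sketch that finds the k-th key of the sequence (the names `nth_key` and `key_to_string` are my own, not from the book). It skips whole groups of keys, each length L contributing 2^L - 1 of them, and whatever offset remains is the key's value written with L binary digits.

```c
#include <assert.h>

/* Write the len-digit binary form of value into out, most significant digit
   first; e.g. value 2 with len 3 gives "010". out needs len + 1 bytes. */
static void key_to_string(int value, int len, char *out) {
    for (int i = 0; i < len; i++)
        out[i] = ((value >> (len - 1 - i)) & 1) ? '1' : '0';
    out[len] = '\0';
}

/* Find the k-th key (0-based) of the sequence 0, 00, 01, 10, 000, ...
   Keys of length L are the values 0 .. 2^L - 2 (every L-digit string except
   all 1s), so each length contributes 2^L - 1 keys. */
static void nth_key(int k, int *len, int *value) {
    assert(k >= 0);
    int L = 1;
    while (k >= (1 << L) - 1) {   /* k lies beyond this whole group */
        k -= (1 << L) - 1;
        L++;
    }
    *len = L;
    *value = k;
}
```

For example, k = 10 gives length 3 and value 6, i.e. 110, the last 3-digit key; k = 11 moves on to 0000.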

If the header is AB#TANC, the mapping looks like this:

key:   0  00  01  10  000  001  010

char:  A  B   #   T   A    N    C
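The mapping can also be computed instead of stored. Keys shorter than n occupy the sum over k = 1..n-1 of (2^k - 1), which is 2^n - n - 1 header positions, so the key of length n and value v sits at 0-based position 2^n - n - 1 + v. A minimal C sketch of that arithmetic follows; `key_index` and `decode_key` are hypothetical helper names, and the book's solution uses a code[len][v] lookup table instead.

```c
#include <assert.h>
#include <string.h>

/* 0-based header position of the key with length len and value v:
   the 2^len - len - 1 shorter keys come first, then v within this length. */
static int key_index(int len, int v) {
    assert(len >= 1 && len <= 7);
    assert(v >= 0 && v < (1 << len) - 1);   /* the all-ones string is not a key */
    return (1 << len) - len - 1 + v;
}

/* Look up the header character for a key given as a string of '0'/'1'.
   Returns '\0' if the header is too short to contain that key. */
static char decode_key(const char *header, const char *key) {
    int len = (int)strlen(key);
    int v = 0;
    for (int i = 0; i < len; i++)
        v = v * 2 + (key[i] - '0');         /* binary string -> integer */
    int idx = key_index(len, v);
    return idx < (int)strlen(header) ? header[idx] : '\0';
}
```

With the header from the problem statement, AB#TANCnrtXc, key 110 gives position 8 - 3 - 1 + 6 = 10, which is X, matching the statement.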

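The encoded message is read exactly as the statement describes: each segment starts with a 3-bit key length, continues with keys of that length, and ends with that many 1s; 000 ends the whole message, and line breaks anywhere inside it are ignored.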
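Putting those segment rules together, here is a sketch that decodes a message held in a string rather than read from stdin. `decode_message` and `read_bits` are illustrative names, not the book's; the sketch locates each key in the header with the position formula 2^len - len - 1 + v (the number of shorter keys plus the key's own value) instead of a lookup table, and it assumes a well-formed message, as the problem guarantees.

```c
#include <assert.h>
#include <string.h>

/* Read the next n binary digits from *p, skipping line breaks,
   and return their value. */
static int read_bits(const char **p, int n) {
    int v = 0;
    while (n > 0) {
        char c = *(*p)++;
        assert(c != '\0');              /* message ended too early */
        if (c == '\n' || c == '\r')
            continue;                   /* line breaks are not part of the message */
        v = v * 2 + (c - '0');
        n--;
    }
    return v;
}

/* Decode msg with the given header into out (caller provides enough room). */
static void decode_message(const char *header, const char *msg, char *out) {
    int hlen = (int)strlen(header);
    const char *p = msg;
    for (;;) {
        int len = read_bits(&p, 3);     /* 3-bit key length of the next segment */
        if (len == 0)
            break;                      /* 000 terminates the whole message */
        for (;;) {
            int v = read_bits(&p, len);
            if (v == (1 << len) - 1)
                break;                  /* all 1s ends this segment */
            int idx = (1 << len) - len - 1 + v;  /* position of the key in the header */
            assert(idx < hlen);
            *out++ = header[idx];
        }
    }
    *out = '\0';
}
```

Feeding it the first sample (header TNM AEIOU, message lines joined with '\n') produces TAN ME.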

Now let's walk through the core function, readCodes(), which reads the header line at the start of each data set:

int readCodes() {
    memset(code, 0, sizeof(code));
    code[1][0] = readChar();        // read the first header character (skipping leftover line breaks)
    for(int len = 2; len <= 7; len++) {
        for(int i = 0; i < (1 << len) - 1; i++) {
            int ch = getchar();
            // end of file: no more data sets, stop the program
            if(ch == EOF) {
                return 0;
            }
            // end of the header line: the header has been fully read
            if(ch == '\n' || ch == '\r') {
                return 1;
            }
            code[len][i] = ch;
        }
    }
    return 1;
}

How should readCodes() be understood? For the header AB#TANC used earlier, the matching keys and their lengths are:

key:     0  00  01  10  000  001  010

length:  1  2   2   2   3    3    3

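There is only one 1-bit key (0), so no loop is needed: code[1][0] = readChar(); reads the first character directly. It calls readChar() rather than getchar() so that the line break left over after the previous message's 000 is skipped.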

Why not start from code[0][0]? Purely for readability: with the first index equal to the key length, 1-bit keys live in code[1][...] rather than code[0][...].

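From length 2 onward, where the loops start, each length n has $2^n - 1$ keys rather than $2^n$ (the count also holds for n = 1, giving the single key 0). The reason is that in the encoded message the all-ones string of length n marks the end of a segment, so it never needs to map to a header character and is simply skipped.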

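So the inner loop for(int i = 0; i < (1 << len) - 1; i++) runs over the decimal values $0$ to $2^{len} - 2$ of all keys of the current length (1 << len equals $2^{len}$), and each i is used directly as the second index of the code array.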

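The outer loop, for(int len = 2; len <= 7; len++), runs over the key lengths up to the maximum of 7. If the header line ends early, the '\n' check returns immediately and the remaining entries stay 0.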

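For the same example, AB#TANC ends up in the array like this (each entry stores the character's ASCII code):

Length 1: code[1][0 (0)] = 'A'

Length 2: code[2][0 (00)] = 'B', code[2][1 (01)] = '#', code[2][2 (10)] = 'T'

Length 3: code[3][0 (000)] = 'A', code[3][1 (001)] = 'N', code[3][2 (010)] = 'C'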
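To check this layout without typing input, the same filling order can be replayed on an in-memory string. This is a hedged sketch rather than the book's code: `fill_codes` is a hypothetical name, and unlike readCodes() it starts the loop at length 1, which works because (1 << 1) - 1 = 1 leaves exactly one slot, code[1][0]. readCodes() handles that first character separately only so it can call readChar() and skip the leftover line break.

```c
#include <assert.h>
#include <string.h>

static int code[8][1 << 7];   /* code[len][v]: header character for key (len, v) */

/* Fill code[][] from a header string in the same order readCodes() uses:
   code[1][0], then code[2][0..2], code[3][0..6], ...
   Returns the number of header characters stored. */
static int fill_codes(const char *header) {
    assert(header != NULL);
    memset(code, 0, sizeof(code));
    int pos = 0, n = (int)strlen(header);
    for (int len = 1; len <= 7; len++) {
        for (int v = 0; v < (1 << len) - 1; v++) {
            if (pos == n)
                return pos;           /* header exhausted: remaining keys stay 0 */
            code[len][v] = header[pos++];
        }
    }
    return pos;
}
```

After fill_codes("AB#TANC"), code[2][2] holds 'T' and code[3][2] holds 'C', as in the listing above, while code[3][3] is still 0.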

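That should be detailed enough; this is essentially the only tricky part. The rest happens in main(): readInt(3) reads a segment's key length and stops on 0 (the 000 terminator), then readInt(len) reads keys one at a time and prints code[len][v] until it reads the all-ones terminator (1 << len) - 1.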

Code

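#include <stdio.h>
#include <string.h>
#include <stdbool.h>    // needed for true when compiled as C; harmless in C++

int code[8][1<<8];      // code[len][v]: header character for the key of length len and value v

// Read the next character that is not a line break
int readChar() {
    while(true) {
        int ch = getchar();
        if(ch != '\n' && ch != '\r') {
            return ch;
        }
    }
}

// Read the next c binary digits (skipping line breaks) and return their decimal value
int readInt(int c) {
    int v = 0;
    while(c--) {
        v = v * 2 + readChar() - '0';
    }
    return v;
}

// Read one header line into code[][]; returns 0 at end of input
int readCodes() {
    memset(code, 0, sizeof(code));
    // Read the first header character (readChar() skips the previous line break)
    code[1][0] = readChar();
    for(int len = 2; len <= 7; len++) {
        for(int i = 0; i < (1 << len) - 1; i++) {
            int ch = getchar();
            if(ch == EOF) {
                return 0;
            }
            if(ch == '\n' || ch == '\r') {
                return 1;
            }
            code[len][i] = ch;
        }
    }
    return 1;
}

int main() {
    while(readCodes()) {                // read a header
        while(true) {
            int len = readInt(3);       // key length of the next segment
            if(!len) {
                break;                  // 000 ends the message
            }
            while(true) {               // read the keys of this segment
                int v = readInt(len);
                if(v == (1 << len) - 1) {
                    break;              // all 1s ends the segment
                }
                putchar(code[len][v]);
            }
        }
        putchar('\n');
    }
    return 0;
}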