Western Subregional of NEERC, Minsk, Wednesday, November 4, 2015 Problem K. UTF-8 Decoder (simulation)
Problem K. UTF-8 Decoder
Problem link:
http://opentrains.snarknews.info/~ejudge/team.cgi?SID=c75360ed7f2c7022&all_runs=1&action=140
Description
UTF-8 is a character encoding capable of encoding all possible characters, or code points, in Unicode.
Nowadays UTF-8 is the dominant character encoding for the World Wide Web, accounting for 85.1% of
all Web pages in September 2015.
Peter works in a large company as a software engineer and develops a new Internet search engine.
Its crawler needs a UTF-8 decoder to parse Web pages and put them into index. Peter has already
checked if there are any ready-made solutions available. He used his own search engine to look for open-source
implementations on the Web and found nothing that satisfied him. Several huge libraries following
the ‘batteries included’ philosophy were rejected because they are too heavy and contain tons of code. Several
small but relevant libraries didn’t get to the top of the search results page because Peter’s search engine is
not perfect at present... So Peter decided to reinvent the wheel and write his custom lightweight UTF-8
decoder.
Let’s define a code point as an integer from the range [0, 2^31). One code point is encoded into a variable-length
sequence of 8-bit units (bytes).
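The design of UTF-8 can be seen in this table (the x characters are replaced by the bits of the code point):
Bytes  Bits  First code point  Last code point  Byte sequence
1      7     0x00              0x7F             0xxxxxxx
2      11    0x80              0x7FF            110xxxxx 10xxxxxx
3      16    0x800             0xFFFF           1110xxxx 10xxxxxx 10xxxxxx
4      21    0x10000           0x1FFFFF         11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
5      26    0x200000          0x3FFFFFF        111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
6      31    0x4000000         0x7FFFFFFF       1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx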
One-byte codes are used only for the ASCII code point values 0 through 127. In this case the UTF-8 code
has the same value as the ASCII code. The high-order bit of these codes is always 0. This means that
ASCII text is valid UTF-8.
Code points larger than 127 are represented by multi-byte sequences, composed of a leading byte and
one or more continuation bytes. The leading byte has two or more high-order 1s followed by a 0, while
continuation bytes all have 10 in the high-order position. UTF-8 offers a clear distinction between multi-byte
and single-byte characters. The high-order bits of every byte determine the type of byte; single bytes
(0xxxxxxx), leading bytes (11xxxxxx), and continuation bytes (10xxxxxx) do not share values.
The number of high-order 1s in the leading byte of a multi-byte sequence indicates the number of bytes
in the sequence. The remaining bits of the encoding (the x bits in the above patterns) are used for the
bits of the code point being encoded, padded with high-order 0s if necessary. The high-order bits go in
the lead byte, lower-order bits in succeeding continuation bytes.
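The rule about high-order 1s can be checked directly on a byte value. The following is a minimal sketch, not part of the problem statement; the helper names `sequenceLength` and `leadPayloadBits` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: total length in bytes of the sequence that starts with
// `lead`, or 0 if `lead` cannot start a sequence (a continuation byte
// 10xxxxxx, or one of 11111110 / 11111111).
int sequenceLength(std::uint8_t lead) {
    int ones = 0;
    // Count consecutive 1 bits starting from the most significant bit.
    for (int bit = 7; bit >= 0 && ((lead >> bit) & 1); --bit) ++ones;
    if (ones == 0) return 1;               // 0xxxxxxx: single-byte code
    if (ones == 1 || ones >= 7) return 0;  // not a valid leading byte
    return ones;                           // 2 to 6 bytes
}

// Hypothetical helper: number of payload (x) bits in the leading byte of a
// sequence of `len` bytes: 7 for one byte, otherwise 7 - len.
int leadPayloadBits(int len) {
    assert(len >= 1 && len <= 6);
    return len == 1 ? 7 : 7 - len;
}
```

For instance, the byte E2 (11100010) has three leading 1s, so it starts a three-byte sequence and contributes four payload bits; each continuation byte then contributes six more.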
The standard specifies that the correct encoding of a code point use only the minimum number of bytes
required to hold the significant bits of the code point. Longer encodings are called overlong and are not
valid UTF-8 representations of the code point. This rule maintains a one-to-one correspondence between
code points and their valid encodings, so that there is a unique valid encoding for each code point. This
ensures that string comparisons and searches are well-defined.
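One way to test for an overlong encoding is to compare the decoded value with the smallest code point that actually needs that many bytes. This is a sketch under that assumption; `minCodePoint` and `isOverlong` are hypothetical names, and the thresholds follow from the payload sizes of 7, 11, 16, 21, 26 and 31 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest code point that needs `len` bytes (len in 1..6).
// A 2-byte sequence must exceed the 7 bits a single byte can hold, a 3-byte
// sequence must exceed the 11 bits of a 2-byte sequence, and so on.
std::uint32_t minCodePoint(int len) {
    static const std::uint32_t kMin[7] = {
        0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000};
    assert(len >= 1 && len <= 6);
    return kMin[len];
}

// True if a value decoded from `len` bytes would have fit in fewer bytes.
bool isOverlong(std::uint32_t codePoint, int len) {
    return codePoint < minCodePoint(len);
}
```

For example, the bytes C0 80 decode to 0, which fits in one byte, so the two-byte form is overlong and must be rejected.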
Modern real-life UTF-8 encoding contains more restrictions. For instance, RFC 3629 removed all 5- and 6-byte
sequences and some 4-byte sequences in order to match the constraints of the UTF-16 character
encoding. Peter wants his decoder to be flexible and to be able to decode as many texts as possible, that’s
why Peter does not implement these additional restrictions.
Input
The first line of input contains an integer N (1 ≤ N ≤ 100 000). The second line contains the values of
N bytes (in range between 0 and 255 each, inclusive) given in hexadecimal. A value consists of two hex
digits. The symbols 0–9 represent digits zero to nine, and A, B, C, D, E, F represent digits ten to fifteen.
Values of bytes are separated by single spaces.
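Each input token can be turned into a byte value with simple digit arithmetic. A minimal sketch, assuming tokens are exactly two uppercase hex digits as the statement says; `hexDigit` and `parseByte` are hypothetical names.

```cpp
#include <string>

// Value of one uppercase hex digit, or -1 if the character is not 0-9 or A-F.
int hexDigit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Converts a two-digit token such as "E2" to a value in 0..255,
// or returns -1 if the token is malformed.
int parseByte(const std::string& token) {
    if (token.size() != 2) return -1;
    int hi = hexDigit(token[0]);
    int lo = hexDigit(token[1]);
    if (hi < 0 || lo < 0) return -1;
    return hi * 16 + lo;
}
```

With the sample input, `parseByte("24")` yields 36, which is also the expected code point because 0x24 is a single-byte (ASCII) code.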
Output
If the input sequence of N bytes can be decoded successfully into a sequence of L code points, then in the
first line print the number L and in the second line print L code point values (31-bit integers in usual
decimal notation with no leading zeros) separated by spaces.
If the input cannot be decoded, output a single line Epic Fail.
Sample Input
1
24
Sample Output
1
36
Hint
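Problem summary
In short, you are asked to write a UTF-8 decoder.
Roughly speaking, 0xxxxxxx is a single-byte code, 10xxxxxx is a continuation byte, and a leading byte of the form 11...10x...x tells you, by its number of leading 1s, how many bytes the sequence has.
Then convert all the values to binary and work on the bits.
You also have to detect invalid input.
In addition, every code point must be written in its shortest possible encoding.
Solution:
A simulation problem: just simulate everything the statement describes.
Code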
#include<bits/stdc++.h>
using namespace std;
string s[100005];        // s[i]: the i-th input byte as a string of 8 binary digits
string tmp;              // current two-digit hex token
vector<long long>ans;    // decoded code points
// Map one uppercase hex digit to its 4-bit binary string.
string get(char c){
    if(c=='0')return "0000";
    if(c=='1')return "0001";
    if(c=='2')return "0010";
    if(c=='3')return "0011";
    if(c=='4')return "0100";
    if(c=='5')return "0101";
    if(c=='6')return "0110";
    if(c=='7')return "0111";
    if(c=='8')return "1000";
    if(c=='9')return "1001";
    if(c=='A')return "1010";
    if(c=='B')return "1011";
    if(c=='C')return "1100";
    if(c=='D')return "1101";
    if(c=='E')return "1110";
    if(c=='F')return "1111";
    return "";           // not reached for valid input; avoids falling off the end
}
int main(){
    int n;
    scanf("%d",&n);
    // Read each byte and expand it to 8 binary digits.
    for(int i=0;i<n;i++){
        cin>>tmp;
        s[i]+=get(tmp[0]);
        s[i]+=get(tmp[1]);
    }
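    string now;   // payload bits of the sequence being decoded, most significant first
    for(int i=0;i<n;i++){
        // A sequence must not start with a continuation byte (10xxxxxx).
        if(s[i][0]=='1'&&s[i][1]=='0'){
            printf("Epic Fail");
            return 0;
        }
        // j = number of leading 1s = length of the sequence in bytes (0 means a single byte).
        int j;
        for(j=0;j<s[i].size();j++)
            if(s[i][j]=='0')break;
        // 11111110 and 11111111 are not valid leading bytes.
        if(j==8||j==7){
            printf("Epic Fail");
            return 0;
        }
        // Payload bits of the leading byte: everything after the first 0.
        for(int t=j+1;t<s[i].size();t++)
            now+=s[i][t];
        // The sequence must not run past the end of the input.
        if(n<i+j){
            printf("Epic Fail");
            return 0;
        }
        // Each of the next j-1 bytes must be a continuation byte; take its low 6 bits.
        for(int t=i+1;t<i+j;t++){
            if(s[t][0]!='1'||s[t][1]!='0'){
                printf("Epic Fail");
                return 0;
            }
            for(int k=2;k<s[t].size();k++)
                now+=s[t][k];
        }
        // Skip the continuation bytes that were just consumed.
        if(j!=0)i=i+j-1;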
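        // Reverse so that index t holds the bit of weight 2^t; k is the index of the highest set bit.
        reverse(now.begin(),now.end());
        int k = 0;
        for(int t=0;t<now.size();t++)
            if(now[t]=='1')k=t;
        // Reject overlong encodings: the value must not fit into the payload of a
        // shorter sequence (7, 11, 16, 21 and 26 bits for 1 to 5 bytes).
        if(now.size()==11&&k<7){
            printf("Epic Fail");
            return 0;
        }
        if(now.size()==16&&k<11){
            printf("Epic Fail");
            return 0;
        }
        if(now.size()==21&&k<16){
            printf("Epic Fail");
            return 0;
        }
        if(now.size()==26&&k<21){
            printf("Epic Fail");
            return 0;
        }
        if(now.size()==31&&k<26){
            printf("Epic Fail");
            return 0;
        }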
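        // Convert the reversed bit string to an integer.
        long long tmp = 1;   // weight of bit t (shadows the global string tmp)
        long long Ans = 0;
        for(int t=0;t<now.size();t++){
            if(now[t]=='1')Ans+=tmp;
            tmp*=2;
        }
        ans.push_back(Ans);
        now="";
    }
    // Print the number of code points, then the code points themselves.
    cout<<ans.size()<<endl;
    for(int i=0;i<ans.size();i++)
        cout<<ans[i]<<" ";
    cout<<endl;
}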
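The solution above works on strings of '0' and '1' characters. As an alternative, the same simulation can be sketched with bit operations on the byte values themselves. This is only an illustration of the rules in the statement, not the author's code; the function name `decodeUtf8` and the table `kMin` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes `bytes` under the rules of the problem (1- to 6-byte sequences, no
// overlong forms, no RFC 3629 restrictions). Returns false in the cases where
// the statement asks for "Epic Fail"; otherwise fills `out` with code points.
bool decodeUtf8(const std::vector<std::uint8_t>& bytes,
                std::vector<std::uint32_t>& out) {
    // kMin[len]: smallest code point that really needs `len` bytes.
    static const std::uint32_t kMin[7] = {
        0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000};
    out.clear();
    std::size_t i = 0;
    while (i < bytes.size()) {
        std::uint8_t lead = bytes[i];
        int len = 0;
        // Count the leading 1 bits of the first byte.
        while (len < 8 && (lead & (0x80 >> len))) ++len;
        if (len == 1 || len >= 7) return false;   // stray continuation or bad lead
        if (len == 0) {                           // single-byte code
            out.push_back(lead);
            ++i;
            continue;
        }
        if (i + len > bytes.size()) return false; // sequence is cut off
        std::uint32_t cp = lead & (0x7F >> len);  // payload bits of the lead byte
        for (int k = 1; k < len; ++k) {
            std::uint8_t b = bytes[i + k];
            if ((b & 0xC0) != 0x80) return false; // must look like 10xxxxxx
            cp = (cp << 6) | (b & 0x3F);          // append its 6 payload bits
        }
        if (cp < kMin[len]) return false;         // overlong encoding
        out.push_back(cp);
        i += len;
    }
    return true;
}
```

A caller would parse the hex tokens into a `std::vector<std::uint8_t>`, call `decodeUtf8`, and print either the code points or `Epic Fail`. The largest possible value, 2^31 - 1 from a six-byte sequence, still fits in a 32-bit unsigned integer.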