393. UTF-8 Validation

A character in UTF-8 can be from 1 to 4 bytes long, subject to the following rules:

  1. For a 1-byte character, the first bit is 0, followed by its Unicode code point.
  2. For an n-byte character, the first n bits are all ones and the (n+1)-th bit is 0. It is followed by n-1 bytes whose two most significant bits are 10.
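As a minimal sketch of rule 2, the number of leading 1 bits in a lead byte can be read off with bit masks. The class `LeadByte` and its method `length` are illustrative names, not part of the problem.

```java
// Sketch: derive the character length from the leading 1 bits of the first byte.
final class LeadByte {
    private LeadByte() {}

    /**
     * Returns the character length (1 to 4) implied by a lead byte,
     * or -1 if the byte cannot start a character.
     * Only the low 8 bits of b are considered.
     */
    static int length(int b) {
        b &= 0xFF;                        // keep the least significant 8 bits
        if ((b & 0x80) == 0x00) return 1; // 0xxxxxxx
        if ((b & 0xE0) == 0xC0) return 2; // 110xxxxx
        if ((b & 0xF0) == 0xE0) return 3; // 1110xxxx
        if ((b & 0xF8) == 0xF0) return 4; // 11110xxx
        return -1;                        // 10xxxxxx (continuation) or 11111xxx
    }
}
```

For example, `LeadByte.length(197)` is 2 because 197 is 11000101, and `LeadByte.length(130)` is -1 because 130 (10000010) is a continuation byte.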

This is how UTF-8 encoding works:

 Char. number range  |        UTF-8 octet sequence
   (hexadecimal)     |              (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
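To make the table concrete, here is a sketch of an encoder that packs a code point into the bit patterns above. `Utf8Encode.encode` is a hypothetical helper; it returns unsigned byte values as `int`s to match the problem's input format, and for brevity it does not reject the surrogate range U+D800 to U+DFFF.

```java
// Sketch: encode one code point following the four rows of the table.
final class Utf8Encode {
    private Utf8Encode() {}

    static int[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF) {
            throw new IllegalArgumentException("code point out of range: " + cp);
        }
        if (cp <= 0x7F) {
            return new int[] {cp};                     // 0xxxxxxx
        }
        if (cp <= 0x7FF) {
            return new int[] {0xC0 | (cp >> 6),        // 110xxxxx
                              0x80 | (cp & 0x3F)};     // 10xxxxxx
        }
        if (cp <= 0xFFFF) {
            return new int[] {0xE0 | (cp >> 12),       // 1110xxxx
                              0x80 | ((cp >> 6) & 0x3F),
                              0x80 | (cp & 0x3F)};
        }
        return new int[] {0xF0 | (cp >> 18),           // 11110xxx
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)};
    }
}
```

For instance, `Utf8Encode.encode(0x142)` returns `{197, 130}`, which are the first two bytes of Example 1.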

Given an array of integers representing the data, return whether it is a valid UTF-8 encoding.

Note:

The input is an array of integers. Only the least significant 8 bits of each integer are used to store the data. This means each integer represents only 1 byte of data.
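A minimal sketch of the masking step the note implies: `& 0xFF` discards everything above the lowest 8 bits. `LowByte.lowBytes` is an illustrative name.

```java
// Sketch: reduce each int to the single byte of data it carries.
final class LowByte {
    private LowByte() {}

    static int[] lowBytes(int[] data) {
        int[] out = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = data[i] & 0xFF; // keep bits 0..7, drop the rest
        }
        return out;
    }
}
```

For example, 453 (0x1C5) and -59 (0xFFFFFFC5 in two's complement) both reduce to 197, because their low 8 bits are 11000101.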

Example 1:

data = [197, 130, 1], which represents the octet sequence: 11000101 10000010 00000001.

Return true.
It is a valid UTF-8 encoding for a 2-byte character followed by a 1-byte character.
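The octet sequences in the examples can be reproduced with a small formatting helper that left-pads each byte to 8 binary digits. `OctetFormat.toOctets` is a hypothetical name used only for illustration.

```java
// Sketch: print each data value as an 8-digit binary octet.
final class OctetFormat {
    private OctetFormat() {}

    static String toOctets(int[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            String bits = Integer.toBinaryString(data[i] & 0xFF);
            for (int k = bits.length(); k < 8; k++) {
                sb.append('0'); // left-pad to a full octet
            }
            sb.append(bits);
        }
        return sb.toString();
    }
}
```

`OctetFormat.toOctets(new int[] {197, 130, 1})` returns `"11000101 10000010 00000001"`, matching Example 1.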

Example 2:

data = [235, 140, 4], which represents the octet sequence: 11101011 10001100 00000100.

Return false.
The first 3 bits are all ones and the 4th bit is 0, which means it is a 3-byte character.
The next byte is a continuation byte that starts with 10, which is correct.
But the second continuation byte does not start with 10, so it is invalid.
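The test that fails in Example 2 is the continuation-byte check. As a sketch, it can be written as a mask comparison; `Continuation.isContinuation` is an illustrative name. Testing the top two bits against 10 is equivalent to checking that the byte lies in the range 0x80 to 0xBF.

```java
// Sketch: a continuation byte has the form 10xxxxxx.
final class Continuation {
    private Continuation() {}

    static boolean isContinuation(int b) {
        return (b & 0xC0) == 0x80; // top two bits must be 1 then 0
    }
}
```

Here `isContinuation(140)` is true (10001100) while `isContinuation(4)` is false (00000100), which is why Example 2 is rejected.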
Algorithm Analysis

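The algorithm is simple: take the low 8 bits of each number in turn and check that it falls in a valid range. If a number is between 0x00 and 0x7F, it is a 1-byte character, so move on to the next number. If a number is between 0xC0 and 0xDF, it should be a 2-byte character, so the next number must be between 0x80 and 0xBF. If a number is between 0xE0 and 0xEF, it should be a 3-byte character, so the next two numbers must each be between 0x80 and 0xBF. If a number is between 0xF0 and 0xF7, it should be a 4-byte character, so the next three numbers must each be between 0x80 and 0xBF. Any other lead value, or running out of numbers before a character is complete, makes the encoding invalid.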

Java implementation:

public class Solution {
    public boolean validUtf8(int[] data) {
        int len = data.length;
        int index = 0;
        int num, num1, num2, num3;
        while (index < len) {
            num = data[index];
            num &= 0xff; // only the low 8 bits carry data
            if (num >= 0 && num <= 0x7f) {
                // 1-byte character: 0xxxxxxx
                index++;
            }
            else if (num >= 0xc0 && num <= 0xdf) {
                // 2-byte character: 110xxxxx 10xxxxxx
                if (index + 1 < len) {
                    num1 = data[index + 1];
                    num1 &= 0xff;
                    if (!(num1 <= 0xbf && num1 >= 0x80)) {
                        return false;
                    }
                    // the continuation byte is valid
                    index += 2;
                }
                else {
                    // not enough bytes left
                    return false;
                }
            }
            else if (num >= 0xe0 && num <= 0xef) {
                // 3-byte character: 1110xxxx 10xxxxxx 10xxxxxx
                if (index + 2 < len) {
                    num1 = data[index + 1];
                    num2 = data[index + 2];
                    num1 &= 0xff;
                    num2 &= 0xff;
                    if (!(num1 >= 0x80 && num1 <= 0xbf && num2 >= 0x80 && num2 <= 0xbf)) {
                        return false;
                    }
                    index += 3;
                }
                else {
                    return false;
                }
            }
            else if (num >= 0xf0 && num <= 0xf7) {
                // 4-byte character: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                if (index + 3 < len) {
                    num1 = data[index + 1];
                    num2 = data[index + 2];
                    num3 = data[index + 3];
                    num1 &= 0xff;
                    num2 &= 0xff;
                    num3 &= 0xff;
                    if (!(num1 >= 0x80 && num1 <= 0xbf && num2 >= 0x80 && num2 <= 0xbf && num3 >= 0x80 && num3 <= 0xbf)) {
                        return false;
                    }
                    index += 4;
                }
                else {
                    return false;
                }
            }
            else {
                // 10xxxxxx (stray continuation byte) or 11111xxx cannot start a character
                return false;
            }
        }
        return true;
    }
}
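An alternative formulation, not the author's code, replaces the explicit look-ahead with a counter of continuation bytes still expected, so each number is read exactly once. `Utf8Validator` is a hypothetical class name. Like the solution above, it checks only the structural rules stated in the problem, so it does not reject overlong forms such as 0xC0 0x80 or lead bytes 0xF5 to 0xF7, which can only encode values above U+10FFFF.

```java
// Sketch: single-pass validation with a "remaining continuation bytes" counter.
final class Utf8Validator {
    private Utf8Validator() {}

    static boolean validUtf8(int[] data) {
        int remaining = 0; // continuation bytes still owed by the current character
        for (int value : data) {
            int b = value & 0xFF; // only the low 8 bits carry data
            if (remaining > 0) {
                if ((b & 0xC0) != 0x80) {
                    return false; // expected 10xxxxxx
                }
                remaining--;
            } else if ((b & 0xE0) == 0xC0) {
                remaining = 1; // 110xxxxx: 2-byte character
            } else if ((b & 0xF0) == 0xE0) {
                remaining = 2; // 1110xxxx: 3-byte character
            } else if ((b & 0xF8) == 0xF0) {
                remaining = 3; // 11110xxx: 4-byte character
            } else if ((b & 0x80) != 0) {
                return false; // stray 10xxxxxx or 11111xxx
            }
            // otherwise b is 0xxxxxxx, a complete 1-byte character
        }
        return remaining == 0; // a character cut off at the end is invalid
    }
}
```

Because `(b & 0xE0) == 0xC0` holds exactly when 0xC0 <= b <= 0xDF (and likewise for the other masks), this version should accept the same inputs as the range checks in `Solution`: Example 1 returns true and Example 2 returns false.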
