A character in UTF8 can be from 1 to 4 bytes long, subjected to the following rules:

For 1-byte character, the first bit is a 0, followed by its unicode code.
For n-bytes character, the first n-bits are all one's, the n+1 bit is 0, followed by n-1 bytes with most significant 2 bits being 10.
This is how the UTF-8 encoding would work: Char. number range | UTF-8 octet sequence
(hexadecimal) | (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
Given an array of integers representing the data, return whether it is a valid utf-8 encoding. Note:
The input is an array of integers. Only the least significant 8 bits of each integer is used to store the data. This means each integer represents only 1 byte of data. Example 1: data = [197, 130, 1], which represents the octet sequence: 11000101 10000010 00000001. Return true.
It is a valid utf-8 encoding for a 2-bytes character followed by a 1-byte character.
Example 2: data = [235, 140, 4], which represented the octet sequence: 11101011 10001100 00000100. Return false.
The first 3 bits are all one's and the 4th bit is 0 means it is a 3-bytes character.
The next byte is a continuation byte which starts with 10 and that's correct.
But the second continuation byte does not start with 10, so it is invalid.

这道题题干给出了判断 one single UTF-8 char的方法,然后给一个UTF-8 char sequence,判断是不是正确sequence. (读题读了很久)

这道题关键是要学到用 & 取出一个bit sequence当中几位的方法

二进制数表示法:在前面加 0b, 八进制加0o, 十六进制加0x

 public class Solution {
public boolean validUtf8(int[] data) {
if (data==null || data.length==0) return false;
for (int i=0; i<data.length; i++) {
if (data[i] > 255) return false;
int moreChecks = 0; //moreCheck is the number of more bytes that need to check for this char
if ((data[i] & 0b10000000) == 0) moreChecks = 0;
else if ((data[i] & 0b11100000) == 0b11000000) moreChecks = 1;
else if ((data[i] & 0b11110000) == 0b11100000) moreChecks = 2;
else if ((data[i] & 0b11111000) == 0b11110000) moreChecks = 3;
else return false;
for (int j=1; j<=moreChecks; j++) {
if (i+j >= data.length) return false;
if ((data[i+j] & 0b11000000) != 0b10000000) return false;
}
i = i + moreChecks;
}
return true;
}
}

Leetcode: UTF-8 Validation的更多相关文章

  1. [LeetCode] 393. UTF-8 Validation 编码验证

    A character in UTF8 can be from 1 to 4 bytes long, subjected to the following rules: For 1-byte char ...

  2. [LeetCode] UTF-8 Validation 编码验证

    A character in UTF8 can be from 1 to 4 bytes long, subjected to the following rules: For 1-byte char ...

  3. LeetCode赛题393----UTF-8 Validation

    393. UTF-8 Validation A character in UTF8 can be from 1 to 4 bytes long, subjected to the following ...

  4. 【LeetCode】393. UTF-8 Validation 解题报告(Python)

    作者: 负雪明烛 id: fuxuemingzhu 个人博客: http://fuxuemingzhu.cn/ 题目地址: https://leetcode.com/problems/utf-8-va ...

  5. LeetCode All in One 题目讲解汇总(持续更新中...)

    终于将LeetCode的免费题刷完了,真是漫长的第一遍啊,估计很多题都忘的差不多了,这次开个题目汇总贴,并附上每道题目的解题连接,方便之后查阅吧~ 477 Total Hamming Distance ...

  6. leetcode N-Queens/N-Queens II, backtracking, hdu 2553 count N-Queens, dfs 分类: leetcode hdoj 2015-07-09 02:07 102人阅读 评论(0) 收藏

    for the backtracking part, thanks to the video of stanford cs106b lecture 10 by Julie Zelenski for t ...

  7. 393. UTF-8 Validation

    393. UTF-8 Validation 这个题很明确,刚开始我以为只能是一个utf,长度大于5的都判断为false,后来才明白题意. 有个小trick,就是长度大于1的时候,判断第一个数字开始1的 ...

  8. LeetCode 53. Maximum Subarray(最大的子数组)

    Find the contiguous subarray within an array (containing at least one number) which has the largest ...

  9. LeetCode 26. Remove Duplicates from Sorted Array (从有序序列里移除重复项)

    Given a sorted array, remove the duplicates in place such that each element appear only once and ret ...

随机推荐

  1. 大话数据结构(五)(java程序)——顺序存储结构的插入与删除

    获得元素操作 对于线性表的顺序存储结构来说,我们要实现getElement操作,即将线性表的第i个位置元素返回即可 插入操作 插入算法思路: 1.如果插入位置不合理,抛出异常 2.如果插入表的长度大于 ...

  2. JBoss的安装与配置(对应eclipse配置)【转】

    安装JBoss纯粹是目的就是学习EJB3...至少现在是这样的 EJB需要运行在EJB容器中.每个J2EE应用服务器都含有EJB容器和Web容器.这样,既支持运行EJB,也可以运行Web应用 目前EJ ...

  3. 将真彩色转换成增强色的方法(即RGB32位或RGB24位颜色转换成RGB16位颜色的函数)

    今天由于程序需要,需要将真彩色转换成增强色进行颜色匹配,上网搜了一下没搜到相应函数,于是研究了一下RGB16位的增强色,写了这个函数: public static int RGB16(int argb ...

  4. MVC3中几个小知识点

    1.ViewBag.Name~ViewBag.name等价,即不区分大小写.在此小心,下次见到不要奇怪,不过最好还是写成一样的比较好. 2.JS字符串不允许有换行符,\'等字符,需提前处理.

  5. uploadify

    uploadify 返回值(回调函数)总结: 最近使用开发一个图片上传模块的时候使用了一个jq插件--uploadify,但是下面就是让人很苦逼的一个下午……一直调试不好,无法接收返回值.google ...

  6. JavaScript函数小结

    JS基础知识 /********************** 1:基础知识 1 创建脚本块 1: <script language=”JavaScript”> 2: JavaScript ...

  7. zepto源码--qsa--学习笔记

    zepto内部选择器qsa方法的实现. 简述实现原理: 通过判断传入的参数类型: 如果是'#id',则使用getElementById(id)来获取元素,并且将结果包装成数组形式: 如果是'.clas ...

  8. sublime修改TAB缩进

    菜单:Preferences ->Settings – User 添加配置信息: "tab_size": 4, "translate_tabs_to_spaces& ...

  9. android通过pc脚本执行sqlite3脚本

    最近在调研市面上的一些android db框架,需要经常重复的输入一堆比如 adb shell cd /data/data/com.example.testandroiddb/databases sq ...

  10. android Listview item 中有button,item就不响应触摸事件

    为button设置 beanButton.getButton().setFocusable(false); beanButton.getButton().setFocusableInTouchMode ...