UTF-8和Unicode
Is it true that UPDATE Many are saying unicode is a standard not an encoding,but most editors support save as Unicode encoding actually. |
As Rasmus states in his article "The difference between UTF-8 and Unicode?" (link fixed):
If asked the question, "What is the difference between UTF-8 and Unicode?", would you confidently reply with a short and precise answer? In these days of internationalization all developers should be able to do that. I suspect many of us do not understand these concepts as well as we should. If you feel you belong to this group, you should read this ultra short introduction to character sets and encodings.
Actually, comparing UTF-8 and Unicode is like comparing apples and oranges:
UTF-8 is an encoding - Unicode is a character set
A character set is a list of characters with unique numbers (these numbers are sometimes referred to as "code points"). For example, in the Unicode character set, the number for A is 41.
An encoding on the other hand, is an algorithm that translates a list of numbers to binary so it can be stored on disk. For example UTF-8 would translate the number sequence 1, 2, 3, 4 like this:
00000001 00000010 00000011 00000100
Our data is now translated into binary and can now be saved to disk.
All together now
Say an application reads the following from the disk:
1101000 1100101 1101100 1101100 1101111
The app knows this data represent a Unicode string encoded with UTF-8 and must show this as text to the user. First step, is to convert the binary data to numbers. The app uses the UTF-8 algorithm to decode the data. In this case, the decoder returns this:
104 101 108 108 111
Since the app knows this is a Unicode string, it can assume each number represents a character. We use the Unicode character set to translate each number to a corresponding character. The resulting string is "hello".
Conclusion
So when somebody asks you "What is the difference between UTF-8 and Unicode?", you can now confidently answer short and precise:
UTF-8 and Unicode cannot be compared. UTF-8 is an encoding used to translate numbers into binary data. Unicode is a character set used to translate characters into numbers.
|
UTF-8和Unicode的更多相关文章
- Unicode、UTF-8 和 ISO8859-1到底有什么区别
说明:本文转载于新浪博客,旨在方便知识总结.原文地址:http://blog.sina.com.cn/s/blog_673c81990100t1lc.html 本文主要包括以下几个方面:编码基本知识, ...
- UNICODE UTF编码方式解析
先明确几个概念 基础概念部分 1.字符编码方式CEF(Character Encoding Form) 对符号进行编码,便于处理与显示 常用的编码方式有 GB2312(汉字国标码 2字节) ASCII ...
- RapidJSON 代码剖析(三):Unicode 的编码与解码
根据 RFC-7159: 8.1 Character Encoding JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The defa ...
- Unicode与UTF8相互转化(使用MultiByteToWideChar)
1.简述 最近在发送网络请求时遇到了中文字符乱码的问题,在代码中调试字符正常,用抓包工具抓的包中文字符显示正常,就是发送到服务器就显示乱码了,那就要将客户端和服务器设置统一的编码(UTF-8),而我们 ...
- Unicode其实是Latin1的扩展。只有一个低字节的Uncode字符其实就是Latin1字符——附各种字符编码表及转换表
一.概念 1,ASCII ASCII(American Standard Code for Information Interchange),中文名称为美国信息交换标准代码.是 ...
- 关于JAVA字符编码:Unicode,ISO-8859-1,GBK,UTF-8编码及相互转换
我们最初学习计算机的时候,都学过ASCII编码. 但是为了表示各种各样的语言,在计算机技术的发展过程中,逐渐出现了很多不同标准的编码格式, 重要的有Unicode.UTF.ISO-8859-1和中国人 ...
- ASCII码、ISO8859-1、Unicode、GBK和UTF-8 的区别
为什么需要编码? 计算机中最小的存储单位是字节(byte),一个字节所能表示的字符数又有限,1byte=8bit,一个字节最多也只能表示255个字符,而世界上的语种又多,都有各种不同的字符,无法用一个 ...
- 精确解释Unicode
来自:http://blog.csdn.net/gqqnb/article/details/6266542 ---------------------------------------------- ...
- Unicode与UTF-8/UTF-16/UTF-32的区别
Unicode的最初目标,是用1个16位的编码来为超过65000字符提供映射.但这还不够,它不能覆盖全部历史上的文字,也不能解决传输的问题 (implantation head-ache's),尤其在 ...
- 使用 WideCharToMultiByte Unicode 与 UTF-8互转
1.简述 最近在发送网络请求时遇到了中文字符乱码的问题,在代码中调试字符正常,用抓包工具抓的包中文字符显示正常,就是发送到服务器就显示乱码了,那就要将客户端和服务器设置统一的编码(UTF-8),而我们 ...
随机推荐
- x01.Game.Main: 从零开始
一切从零开始,一切皆有可能. 浅墨,90后,<逐梦之旅>深入浅出,堪比大师. 1.安装 DXSDK_June10.exe 或更新版本. 2.运行 vs2012,新建 VC Win32 空项 ...
- Java Eclipse解决中文字体太小
解决方式有两种: 一.把字体设置为Courier New 操作步骤:打开Elcipse,点击菜单栏上的“Windows”——点击“Preferences”——点击“Genneral”——点击 ...
- C语言中怎么将文件里的数据创建到(读到)链表中?
定义的结构体: struct student { ]; //学生学号 ]; //学生姓名 struct student *next; //next 指针 指向 struct student 类型的变量 ...
- [转]Stop Sharing Session State between Multiple Tabs of Browser
本文转自:http://jinaldesai.net/stop-sharing-session-state-between-multiple-tabs-of-browser/ Scenario: By ...
- Lucene 4.x Spellcheck使用说明
Spellcheck是Lucene新版本的功能,在介绍spellcheck之前,我们需要弄清楚Spellcheck支持几种数据源.Spellcheck构造函数需要传入Dictionary接口: pac ...
- MMORGP大型游戏设计与开发(客户端架构 part13 of vegine)
一些数据是需要不断改动的,程序不可能因为这些改动而不厌其烦的去改动代码,早期的这种做法就成了程序员们最悲哀的痛苦.自从有了数据管理后,程序的世界逐渐清晰,这些烦恼也不再出现,不过若是要很好的管理数据可 ...
- MMORPG大型游戏设计与开发(客户端架构 part6 of vegine)
客户端的变量模块部分主要是将一些常用可变的值集中管理,如窗口的大小,是否开启音乐,音量的大小等等.这些变量通常会应该到客户端的操作,一般来说变量改变的时候会调用一个回调进行处理.下面我们就看看该模块的 ...
- 为什么 Java 不提供无符号类型呢?
网上查资料,无意中找到一个java写的开源论坛,用的人还挺多 http://jforum.net/ 查MD5,了解到 Java getBytes方法详解(字符集问题) http://liushilan ...
- u3d_shader_surface_shader_5
CubeMap 的实现 参考: http://blog.csdn.net/candycat1992/article/details/21827365 制作cubeMap三维纹理,surface ...
- u3d_shader_surface_shader_1
http://docs.unity3d.com/Manual/SL-SurfaceShaders.html 一:surface shader是啥 Writing shaders that intera ...