The bytes/str dichotomy in Python 3 [transport]
reference and transporting from: http://eli.thegreenplace.net/2012/01/30/the-bytesstr-dichotomy-in-python-3/
Arguably the most significant new feature of Python 3 is a much cleaner separation between text and binary data. Text is always Unicode and is represented by the str type, and binary data is represented by the bytes type. What makes the separation particularly clean is that str and bytes can't be mixed in Python 3 in any implicit way. You can't concatenate them, look for one inside another, and generally pass one to a function that expects the other. This is a good thing.
However, boundaries between strings and bytes are inevitable, and this is where the following diagram is always important to keep in mind:

Strings can be encoded to bytes, and bytes can be decoded back to strings.
>>> '€20'.encode('utf-8')
b'\xe2\x82\xac20'
>>> b'\xe2\x82\xac20'.decode('utf-8')
'€20'
Think of it this way: a string is an abstract representation of text. A string consists of characters, which are also abstract entities not tied to any particular binary representation. When manipulating strings, we're living in blissful ignorance. We can split and slice them, concatenate and search inside them. We don't care how they are represented internally and how many bytes it takes to hold each character in them. We only start caring about this when encoding strings into bytes (for example, in order to send them over a communication channel), or decoding strings from bytes (for the other direction).
The argument given to encode and decode is the encoding (or codec). The encoding is a way to represent abstract characters in binary data. There are many possible encodings. UTF-8, shown above, is one. Here's another:
>>> '€20'.encode('iso-8859-15')
b'\xa420'
>>> b'\xa420'.decode('iso-8859-15')
'€20'
>>> '你好啊,傻傻分不清'.encode('utf-8')
b'\xe4\xbd\xa0\xe5\xa5\xbd\xe5\x95\x8a\xef\xbc\x8c\xe5\x82\xbb\xe5\x82\xbb\xe5\x88\x86\xe4\xb8\x8d\xe6\xb8\x85'
>>> b'\xe4\xbd\xa0\xe5\xa5\xbd\xe5\x95\x8a\xef\xbc\x8c\xe5\x82\xbb\xe5\x82\xbb\xe5\x88\x86\xe4\xb8\x8d\xe6\xb8\x85'.decode('utf-8')
'你好啊,傻傻分不清'
The encoding is a crucial part of this translation process. Without the encoding, the bytes object b'\xa420' is just a bunch of bits. The encoding gives it meaning. Using a different encoding, this bunch of bits can have a different meaning:
>>> b'\xa420'.decode('windows-1255')
'₪20'
That's 80% of the money lost due to using the wrong encoding, so be careful.
The bytes/str dichotomy in Python 3 [transport]的更多相关文章
- The bytes/str dichotomy in Python 3
The bytes/str dichotomy in Python 3 - Eli Bendersky's website https://eli.thegreenplace.net/2012/01/ ...
- 小白的Python之路 day1 Python3的bytes/str之别
原文:The bytes/str dichotomy in Python 3 Python 3最重要的新特性大概要算是对文本和二进制数据作了更为清晰的区分.文本总是Unicode,由str类型表示,二 ...
- python2 与python3中最大的区别(编码问题bytes&str
1,在python2.x 中是不区分bytes和str类型的,在python3中bytes和str中是区分开的,str的所有操作bytes都支持 python2 中 >>> s = ...
- Python3的bytes/str之别
Python 3最重要的新特性大概要算是对文本和二进制数据作了更为清晰的区分.文本总是Unicode,由str类型表示,二进制数据则由bytes类型表示.Python 3不会以任意隐式的方式混用str ...
- python bytes/str
http://eli.thegreenplace.net/2012/01/30/the-bytesstr-dichotomy-in-python-3/
- repr. str, ascii in Python
repr和str a="Hello" print(str(a)) print(repr(a)) 结果: Hello 'Hello' 可以看出,repr的结果中多了左右两个引号. r ...
- python——TypeError: 'str' does not support the buffer interface
import socket import sys port=51423 host="localhost" data=b"x"*10485760 #在字符串前加 ...
- Python学习之路-Day2-Python基础2
Python学习之路第二天 学习内容: 1.模块初识 2.pyc是什么 3.python数据类型 4.数据运算 5.bytes/str之别 6.列表 7.元组 8.字典 9.字符串常用操作 1.模块初 ...
- Centos7 环境下 Python2.7 换成 Python3.7 运行 scrapy 应用所遇到的问题记录
参考网友的安装过程 Linux系统Centos安装Python3.7 设置Python默认为Python3.7 mv /usr/bin/python /usr/bin/python.bak ln -s ...
随机推荐
- Java 编程规范,常见规范,命名规范,复杂度
方法/步骤 1. *不允许把多个短语句写在一行中,即一行只写一条语句 1. 示例:如下例子不符合规范. LogFilename now = null; LogFilename t ...
- JS 数组的一些方法
1.push() //可以接受任意参数,然后添加到数组的末尾 2.pop()//栈方法,在数组末尾删除一条数据,并且返回这条数据 3.shift()//队列方法,与pop()相似,但是与其相反,在数组 ...
- 【推荐系统】Netflix 推荐系统:第二部分
原文链接:http://techblog.netflix.com/2012/06/netflix-recommendations-beyond-5-stars.htm 在 blog 的第一部分,我们详 ...
- 实现Unity对Dictionary的序列化
若有尝试过想在unity的inspector检视面板中像List或者数组那样可以编辑Dictionary变量的童鞋应该知道,Dictionary变量不会出现在inspector中,unity并不会直接 ...
- JSP注释
------------------siwuxie095 在 JSP 文件中可以使用 HTML 注释 HTML 注释使用 < ...
- C++中const型数据的小结
由于与对象又管的const型数据种类较多,形式又有些相似,往往难记,容易混淆,因此总结一下相关用法,具体用法可查看下方链接 C++中对象的常引用 C++中指向对象的常指针和指向常对象的指针 C++中的 ...
- p2590&bzoj1036 树的统计
传送门(洛谷) 传送门(bzoj) 题目 一棵树上有n个节点,编号分别为1到n,每个节点都有一个权值w.我们将以下面的形式来要求你对这棵树完成一些操作: I. CHANGE u t : 把结点u的权值 ...
- 在Visual Studio开发的项目中引用GAC中的dll
Open the windows Run dialog (Windows Key + r) Type C:\Windows\assembly\gac_msil. This is some sort o ...
- Multi-task Pose-Invariant Face Recognition 论文笔记
摘要: 在不受限制的环境中拍摄的人脸图像通常包含显著的姿态变化,这会显著降低设计用于识别正面的算法的性能.本文提出了一种新颖的面部识别框架,能够处理±90°偏航范围内的全方位姿势变化.所提出的框架首先 ...
- 更改数据,ExecuteNonQuery()
using (mycon) { mycon.Open(); string MyTime; DateTime dtDate; MyTime = textBox1.Text.ToString(); str ...