The bytes/str dichotomy in Python 3 [transport]
reference and transporting from: http://eli.thegreenplace.net/2012/01/30/the-bytesstr-dichotomy-in-python-3/
Arguably the most significant new feature of Python 3 is a much cleaner separation between text and binary data. Text is always Unicode and is represented by the str type, and binary data is represented by the bytes type. What makes the separation particularly clean is that str and bytes can't be mixed in Python 3 in any implicit way. You can't concatenate them, look for one inside another, and generally pass one to a function that expects the other. This is a good thing.
However, boundaries between strings and bytes are inevitable, and this is where the following diagram is always important to keep in mind:
Strings can be encoded to bytes, and bytes can be decoded back to strings.
>>> '€20'.encode('utf-8')
b'\xe2\x82\xac20'
>>> b'\xe2\x82\xac20'.decode('utf-8')
'€20'
Think of it this way: a string is an abstract representation of text. A string consists of characters, which are also abstract entities not tied to any particular binary representation. When manipulating strings, we're living in blissful ignorance. We can split and slice them, concatenate and search inside them. We don't care how they are represented internally and how many bytes it takes to hold each character in them. We only start caring about this when encoding strings into bytes (for example, in order to send them over a communication channel), or decoding strings from bytes (for the other direction).
The argument given to encode and decode is the encoding (or codec). The encoding is a way to represent abstract characters in binary data. There are many possible encodings. UTF-8, shown above, is one. Here's another:
>>> '€20'.encode('iso-8859-15')
b'\xa420'
>>> b'\xa420'.decode('iso-8859-15')
'€20'
>>> '你好啊,傻傻分不清'.encode('utf-8')
b'\xe4\xbd\xa0\xe5\xa5\xbd\xe5\x95\x8a\xef\xbc\x8c\xe5\x82\xbb\xe5\x82\xbb\xe5\x88\x86\xe4\xb8\x8d\xe6\xb8\x85'
>>> b'\xe4\xbd\xa0\xe5\xa5\xbd\xe5\x95\x8a\xef\xbc\x8c\xe5\x82\xbb\xe5\x82\xbb\xe5\x88\x86\xe4\xb8\x8d\xe6\xb8\x85'.decode('utf-8')
'你好啊,傻傻分不清'
The encoding is a crucial part of this translation process. Without the encoding, the bytes object b'\xa420' is just a bunch of bits. The encoding gives it meaning. Using a different encoding, this bunch of bits can have a different meaning:
>>> b'\xa420'.decode('windows-1255')
'₪20'
That's 80% of the money lost due to using the wrong encoding, so be careful.
The bytes/str dichotomy in Python 3 [transport]的更多相关文章
- The bytes/str dichotomy in Python 3
The bytes/str dichotomy in Python 3 - Eli Bendersky's website https://eli.thegreenplace.net/2012/01/ ...
- 小白的Python之路 day1 Python3的bytes/str之别
原文:The bytes/str dichotomy in Python 3 Python 3最重要的新特性大概要算是对文本和二进制数据作了更为清晰的区分.文本总是Unicode,由str类型表示,二 ...
- python2 与python3中最大的区别(编码问题bytes&str
1,在python2.x 中是不区分bytes和str类型的,在python3中bytes和str中是区分开的,str的所有操作bytes都支持 python2 中 >>> s = ...
- Python3的bytes/str之别
Python 3最重要的新特性大概要算是对文本和二进制数据作了更为清晰的区分.文本总是Unicode,由str类型表示,二进制数据则由bytes类型表示.Python 3不会以任意隐式的方式混用str ...
- python bytes/str
http://eli.thegreenplace.net/2012/01/30/the-bytesstr-dichotomy-in-python-3/
- repr. str, ascii in Python
repr和str a="Hello" print(str(a)) print(repr(a)) 结果: Hello 'Hello' 可以看出,repr的结果中多了左右两个引号. r ...
- python——TypeError: 'str' does not support the buffer interface
import socket import sys port=51423 host="localhost" data=b"x"*10485760 #在字符串前加 ...
- Python学习之路-Day2-Python基础2
Python学习之路第二天 学习内容: 1.模块初识 2.pyc是什么 3.python数据类型 4.数据运算 5.bytes/str之别 6.列表 7.元组 8.字典 9.字符串常用操作 1.模块初 ...
- Centos7 环境下 Python2.7 换成 Python3.7 运行 scrapy 应用所遇到的问题记录
参考网友的安装过程 Linux系统Centos安装Python3.7 设置Python默认为Python3.7 mv /usr/bin/python /usr/bin/python.bak ln -s ...
随机推荐
- Java中String和byte[]间的 转换
数据库的字段中使用了blob类型时,在entity中此字段可以对应为byte[] 类型,保存到数据库中时需要把传入的参数转为byte[]类型,读取的时候再通过将byte[]类型转换为String类型. ...
- 机器学习:k-NN算法(也叫k近邻算法)
一.kNN算法基础 # kNN:k-Nearest Neighboors # 多用于解决分裂问题 1)特点: 是机器学习中唯一一个不需要训练过程的算法,可以别认为是没有模型的算法,也可以认为训练数据集 ...
- spark提交模式
spark基本的提交语句: ./bin/spark-submit \ --class <main-class> \ --master <master-url> \ --depl ...
- 【转】深刻理解render 和 redirect_to
由于最近老是在表单提交后出现没有反应的现象,发现是在action中的使用render 和 redirect_to的原因,于是就想搞清楚他两真正的区别在哪里,上一遍的blog也谈到了这二者的区别,但是有 ...
- Python模块-shelve模块
shelve模块也是用来序列化的,可以持久化任何pickle可支持的python数据格式,比pickle好用,也是python专属,可以dump多次数据,也可以直接修改数据 序列化 # -*- cod ...
- Python模块-requests(二)
会话对象 会话对象能够跨请求保持某些参数. 它也会在同一个 Session 实例发出的所有请求之间保持 cookie, 期间使用 urllib3 的 connection pooling 功能. 所以 ...
- docker 部署服务时,node(结点)显示no such image
1. 问题描述 ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS 9cn5x84lnmga getstartedlab_web.1 ...
- iOS开发者福利之精品源码汇总!免费下载
汇总一些看着不错的源码,有需要的朋友过来下载吧!{:4_102:} 1.用swift制作的色彩炫丽的进度条-KDCircularProgressKDCircularProgress是使用swift制作 ...
- LAMP 1.2 Apache编译安装
1.下载 ...
- OpenGL 使用GLFW创建全屏窗口
OpenGL 使用GLFW创建全屏窗口 GLFW库里面的glfwCreateWindow()函数是用来创建窗口的函数. 这样函数的原型是: GLFWwindow* glfwCreateWindow(i ...