reference and transporting from: http://eli.thegreenplace.net/2012/01/30/the-bytesstr-dichotomy-in-python-3/

Arguably the most significant new feature of Python 3 is a much cleaner separation between text and binary data. Text is always Unicode and is represented by the str type, and binary data is represented by the bytes type. What makes the separation particularly clean is that str and bytes can't be mixed in Python 3 in any implicit way. You can't concatenate them, look for one inside another, and generally pass one to a function that expects the other. This is a good thing.

However, boundaries between strings and bytes are inevitable, and this is where the following diagram is always important to keep in mind:

Strings can be encoded to bytes, and bytes can be decoded back to strings.

>>> '€20'.encode('utf-8')
b'\xe2\x82\xac20'
>>> b'\xe2\x82\xac20'.decode('utf-8')
'€20'

Think of it this way: a string is an abstract representation of text. A string consists of characters, which are also abstract entities not tied to any particular binary representation. When manipulating strings, we're living in blissful ignorance. We can split and slice them, concatenate and search inside them. We don't care how they are represented internally and how many bytes it takes to hold each character in them. We only start caring about this when encoding strings into bytes (for example, in order to send them over a communication channel), or decoding strings from bytes (for the other direction).

The argument given to encode and decode is the encoding (or codec). The encoding is a way to represent abstract characters in binary data. There are many possible encodings. UTF-8, shown above, is one. Here's another:

>>> '€20'.encode('iso-8859-15')
b'\xa420'
>>> b'\xa420'.decode('iso-8859-15')
'€20'
>>> '你好啊,傻傻分不清'.encode('utf-8')
b'\xe4\xbd\xa0\xe5\xa5\xbd\xe5\x95\x8a\xef\xbc\x8c\xe5\x82\xbb\xe5\x82\xbb\xe5\x88\x86\xe4\xb8\x8d\xe6\xb8\x85'
>>> b'\xe4\xbd\xa0\xe5\xa5\xbd\xe5\x95\x8a\xef\xbc\x8c\xe5\x82\xbb\xe5\x82\xbb\xe5\x88\x86\xe4\xb8\x8d\xe6\xb8\x85'.decode('utf-8')
'你好啊,傻傻分不清'

The encoding is a crucial part of this translation process. Without the encoding, the bytes object b'\xa420' is just a bunch of bits. The encoding gives it meaning. Using a different encoding, this bunch of bits can have a different meaning:

>>> b'\xa420'.decode('windows-1255')
'₪20'

That's 80% of the money lost due to using the wrong encoding, so be careful.

The bytes/str dichotomy in Python 3 [transport]的更多相关文章

  1. The bytes/str dichotomy in Python 3

    The bytes/str dichotomy in Python 3 - Eli Bendersky's website https://eli.thegreenplace.net/2012/01/ ...

  2. 小白的Python之路 day1 Python3的bytes/str之别

    原文:The bytes/str dichotomy in Python 3 Python 3最重要的新特性大概要算是对文本和二进制数据作了更为清晰的区分.文本总是Unicode,由str类型表示,二 ...

  3. python2 与python3中最大的区别(编码问题bytes&str

    1,在python2.x 中是不区分bytes和str类型的,在python3中bytes和str中是区分开的,str的所有操作bytes都支持 python2 中 >>> s = ...

  4. Python3的bytes/str之别

    Python 3最重要的新特性大概要算是对文本和二进制数据作了更为清晰的区分.文本总是Unicode,由str类型表示,二进制数据则由bytes类型表示.Python 3不会以任意隐式的方式混用str ...

  5. python bytes/str

    http://eli.thegreenplace.net/2012/01/30/the-bytesstr-dichotomy-in-python-3/

  6. repr. str, ascii in Python

    repr和str a="Hello" print(str(a)) print(repr(a)) 结果: Hello 'Hello' 可以看出,repr的结果中多了左右两个引号. r ...

  7. python——TypeError: 'str' does not support the buffer interface

    import socket import sys port=51423 host="localhost" data=b"x"*10485760 #在字符串前加 ...

  8. Python学习之路-Day2-Python基础2

    Python学习之路第二天 学习内容: 1.模块初识 2.pyc是什么 3.python数据类型 4.数据运算 5.bytes/str之别 6.列表 7.元组 8.字典 9.字符串常用操作 1.模块初 ...

  9. Centos7 环境下 Python2.7 换成 Python3.7 运行 scrapy 应用所遇到的问题记录

    参考网友的安装过程 Linux系统Centos安装Python3.7 设置Python默认为Python3.7 mv /usr/bin/python /usr/bin/python.bak ln -s ...

随机推荐

  1. mysqllog

    -- mysql delete log online 1  mysql命令purge mysql> purge master logs to "mysql-bin.000410&quo ...

  2. jenkins pipline 用法收集

    1.下载多个项目 node { stage('clone'){ dir('test1'){ checkout([$class: 'GitSCM', branches: [[name: '*/maste ...

  3. Erlang pool management -- Emysql pool

    从这篇开始,这一系列主要分析在开源社区中,Erlang 相关pool 的管理和使用. 在开源社区,Emysql 是Erlang 较为受欢迎的一个MySQL 驱动. Emysql 对pool 的管理和使 ...

  4. HDUj2612(两个起点找到最近的目的地)

    Find a way Time Limit: 3000/1000 MS (Java/Others)    Memory Limit: 32768/32768 K (Java/Others)Total ...

  5. HDOJ1677(铺砖问题)

    Nested Dolls Time Limit: 3000/1000 MS (Java/Others)    Memory Limit: 32768/32768 K (Java/Others)Tota ...

  6. VisualGDB系列4:概述-Linux程序与VS

    根据VisualGDB官网(https://visualgdb.com)的帮助文档大致翻译而成.主要是作为个人学习记录.有错误的地方,Robin欢迎大家指正. 本文将会阐述如何使用VisualGDB来 ...

  7. shell入门-grep-3-egrep

    grep -E == egrep [root@wangshaojun ~]# grep --color 'r\?o' 1.txt == egrep --color 'r?o' 1.txt ^C[roo ...

  8. Linux查看并修改mysql的编码

    系统:阿里云 一.查看mysql字符集 输入:show variables like 'character_set_%'; 二.修改某一个数据库的编码 输入:alter database dbname ...

  9. 13. CTF综合靶机渗透(六)

    靶机说明 Breach1.0是一个难度为初级到中级的BooT2Root/CTF挑战. VM虚机配置有静态IP地址(192.168.110.140),需要将虚拟机网卡设置为host-only方式组网,并 ...

  10. hdu1069

    #include <iostream> #include <algorithm> #include <cstring> using namespace std; c ...