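Below are some difficulties I ran into, and solved, as a beginner learning to write web crawlers. I am writing them down for those who come after me. If I have missed anything, corrections from more experienced readers are very welcome. Many thanks!

  First, my code looked like this: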


import requests

url = 'http://www.acfun.tv/'
html = requests.get(url)
print(html.text)

Solution in Python 2 (an aside)

Reference: http://www.cnblogs.com/zhaoyl/p/3770340.html

Add the following code at the top of the script:

import sys
reload(sys)  # Python 2.5 deletes sys.setdefaultencoding after initialization, so sys must be reloaded to get it back
sys.setdefaultencoding('utf-8')

That usually solves the problem.

In Python 3, however:

When the script is run in the console, it fails with a UnicodeEncodeError:

[Screenshot: UnicodeEncodeError traceback in the Windows console]

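1. Cause

  Reference: http://www.tuicool.com/articles/nEjiEv

  First, html.text in the code above automatically decodes the downloaded content into Unicode text. (This differs from html.content: html.content is of type bytes, while html.text is of type str. Decoding bytes gives a str, and encoding a str gives bytes.) To print a str, Python must first encode it with the encoding of the output stream. UTF-8 can encode every character, but on a Chinese-language Windows system the default encoding of cmd is GBK.

  This leads to the following situation:

  Python tries to print text taken from a UTF-8 page in a GBK cmd window. Any character that GBK cannot represent fails to encode, hence the error.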
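
The cause can be reproduced without a network connection. The following is a minimal sketch, not the original crawler code: the sample string is made up for illustration, and the non-breaking space '\xa0' stands in for any character that GBK cannot represent.

```python
# Sample text (an assumption for illustration): Chinese characters plus a
# non-breaking space, which has no GBK representation.
text = "中文\xa0page"

# str -> bytes with UTF-8 works for every character.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")  # bytes -> str

# str -> bytes with GBK fails on the non-breaking space, which is what
# print() runs into when the console encoding is GBK.
try:
    text.encode("gbk")
    failed_char = None
except UnicodeEncodeError as exc:
    failed_char = exc.object[exc.start:exc.end]

print(type(utf8_bytes).__name__, type(round_trip).__name__, repr(failed_char))
```

The Chinese characters themselves encode fine in GBK. It is usually a stray symbol elsewhere in the page that triggers the error.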

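2. Solution

1. Open CMD.

2. Type CHCP and press Enter to see the current code page.

3. Type CHCP 65001 (code page 65001 is UTF-8) and press Enter.

4. This alone is still not enough to display UTF-8 correctly. Right-click the window, choose Properties, and set the font.

5. After the steps above, the Lucida Console font should be listed even if it was not before. Select it. If it was already listed, try selecting it first, and run the steps above only if that does not work.

(Reference: http://blog.useasp.net/archive/2012/04/24/how_to_use_UTF8_encoding_in_Windows_CMD.aspx)

If this is still unclear, search Baidu for "在cmd上显示utf-8" (displaying UTF-8 in cmd).

You can also run the script in the PyCharm IDE, where the Chinese text displays correctly.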
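
If changing the console code page is not an option, another workaround is to replace the characters the console cannot encode before printing. This is a sketch under assumptions: console_safe is a hypothetical helper, not part of requests or the standard library, and the fallback encoding "gbk" is only a guess for the case where sys.stdout.encoding is unavailable.

```python
import sys

def console_safe(text, encoding=None):
    """Return text with characters the console cannot encode replaced by '?'."""
    # sys.stdout.encoding can be None in some redirected setups, so fall back
    # to an assumed default.
    encoding = encoding or sys.stdout.encoding or "gbk"
    return text.encode(encoding, errors="replace").decode(encoding)

print(console_safe("price:\xa0100", encoding="gbk"))
```

The result is lossy, so this suits debugging output, not data that will be saved.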

  


  

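UnicodeEncodeError raised when writing to a file:

Reference: https://segmentfault.com/a/1190000004269037

  During testing, writing to a file repeatedly failed with "UnicodeEncodeError: 'ascii' codec can't encode character '\u56de' in position 0: ordinal not in range(128)". The crawled page is UTF-8 text, but the file was opened without an explicit encoding, so Python fell back to the platform default (ASCII in this case), which cannot encode Chinese characters.

  The fix:
Specify UTF-8 as the encoding when opening the file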

f = open('content.txt','a',encoding='UTF-8')
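
A slightly fuller sketch of the same idea, using a with block so the file is closed automatically. The temporary directory and the sample text are assumptions made so the example can run anywhere. In the crawler, the text written would be html.text.

```python
import os
import tempfile

text = "回答: crawled content"  # stand-in for html.text
path = os.path.join(tempfile.mkdtemp(), "content.txt")

# Pass the encoding explicitly so the result does not depend on the
# platform default.
with open(path, "a", encoding="utf-8") as f:
    f.write(text)

# Read the file back with the same encoding.
with open(path, encoding="utf-8") as f:
    read_back = f.read()

print(read_back == text)
```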

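Also note that the encoding needs to be specified for the page fetched with requests as well. Here the charset declared in the HTML is rewritten to UTF-8:

import re
html = requests.get(url)
# Rewrite the charset declared in the page (e.g. charset=gb2312) to UTF-8
html = re.sub(r'charset=[\w-]+', 'charset=UTF-8', html.text)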
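
The substitution can be tried offline on a sample string. The HTML snippet below is invented for illustration, and the pattern [\w-]+ is one reasonable choice that also covers charset names containing a hyphen.

```python
import re

# Hypothetical meta tag, as might appear in a page served in GB2312.
sample = '<meta http-equiv="Content-Type" content="text/html; charset=gb2312">'

# Replace whatever charset name is declared with UTF-8.
fixed = re.sub(r'charset=[\w-]+', 'charset=UTF-8', sample)
print(fixed)
```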
