1. I/O API tools

Reader function	Writer function
read_csv	to_csv
read_excel	to_excel
read_hdf	to_hdf
read_sql	to_sql
read_json	to_json
read_html	to_html
read_stata	to_stata
read_clipboard	to_clipboard
read_pickle	to_pickle
read_msgpack	to_msgpack
read_gbq	to_gbq

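2. Reading and writing CSV files

A file in which the elements on each line are separated by commas is called a CSV file.

2.1. Reading data from a CSV file

Simple read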

white,read,blue,green,animal

1,5,2,3,cat

2,7,8,5,dog

3,3,6,7,horse

2,2,8,3,duck

4,4,2,1,mouse

>>> csvframe = pd.read_csv('E:\\Python\\Codes\\excited.csv')

>>> csvframe

   white  read  blue  green animal

0      1     5     2      3    cat

1      2     7     8      5    dog

2      3     3     6      7  horse

3      2     2     8      3   duck

4      4     4     2      1  mouse
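For readers who do not have the file on disk, the same read can be reproduced from an in-memory string. This is only a minimal sketch: io.StringIO stands in for the file path, and csv_text is a name introduced here for illustration.

```python
import io
import pandas as pd

# The CSV content from above, held in memory instead of in excited.csv.
csv_text = """white,read,blue,green,animal
1,5,2,3,cat
2,7,8,5,dog
3,3,6,7,horse
2,2,8,3,duck
4,4,2,1,mouse
"""

# read_csv accepts a path or any file-like object.
csvframe = pd.read_csv(io.StringIO(csv_text))

print(csvframe.shape)          # (rows, columns)
print(list(csvframe.columns))  # column names taken from the first line
print(csvframe.dtypes)         # one inferred dtype per column
```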

Specifying the header with header and names

1,5,2,3,cat

2,7,8,5,dog

3,3,6,7,horse

2,2,8,3,duck

4,4,2,1,mouse

>>> csvframe = pd.read_csv('E:\\Python\\Codes\\excited.csv', header=None)

>>> csvframe

   0  1  2  3      4

0  1  5  2  3    cat

1  2  7  8  5    dog

2  3  3  6  7  horse

3  2  2  8  3   duck

4  4  4  2  1  mouse

>>> csvframe = pd.read_csv('E:\\Python\\Codes\\excited.csv', names=['white', 'red', 'blue', 'green', 'animal'])

>>> csvframe

   white  red  blue  green animal

0      1    5     2      3    cat

1      2    7     8      5    dog

2      3    3     6      7  horse

3      2    2     8      3   duck

4      4    4     2      1  mouse
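The two parameters can also be combined. The sketch below assumes a file that already has a header line whose names should be replaced: header=0 marks that line as the header so that names overrides it rather than the line being read as data. The variable names are made up for the example.

```python
import io
import pandas as pd

with_header = "white,read,blue,green,animal\n1,5,2,3,cat\n2,7,8,5,dog\n"
no_header = "1,5,2,3,cat\n2,7,8,5,dog\n"
names = ['white', 'red', 'blue', 'green', 'animal']

# No header line in the file: names supplies the column labels.
a = pd.read_csv(io.StringIO(no_header), names=names)

# Header line present but unwanted: header=0 marks it as the header,
# and names replaces it.
b = pd.read_csv(io.StringIO(with_header), header=0, names=names)

print(list(b.columns))
print(a.equals(b))
```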

Creating a DataFrame with a hierarchical index

color,status,item1,item2,item3

black,up,3,4,6

black,down,2,6,7

white,up,5,5,5

white,down,3,3,2

white,left,1,2,1

red,up,2,2,2

red,down,1,1,4

>>> csvframe = pd.read_csv('E:\\Python\\Codes\\excited.csv', index_col=['color', 'status'])

>>> csvframe

              item1  item2  item3

color status

black up          3      4      6

      down        2      6      7

white up          5      5      5

      down        3      3      2

      left        1      2      1

red   up          2      2      2

      down        1      1      4
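Once the two columns form a hierarchical index, rows can be selected by one or both levels. A small sketch, again reading from an in-memory copy of the data above; sort_index() is an extra step added here because tuple lookups on an unsorted MultiIndex can trigger a performance warning.

```python
import io
import pandas as pd

csv_text = """color,status,item1,item2,item3
black,up,3,4,6
black,down,2,6,7
white,up,5,5,5
white,down,3,3,2
white,left,1,2,1
red,up,2,2,2
red,down,1,1,4
"""

csvframe = pd.read_csv(io.StringIO(csv_text), index_col=['color', 'status'])
csvframe = csvframe.sort_index()

# Outer level only: all rows for one color, indexed by status.
white_rows = csvframe.loc['white']

# Both levels: a single row, returned as a Series.
black_down = csvframe.loc[('black', 'down')]

print(white_rows)
print(black_down['item2'])
```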

2.2. Writing data to a CSV file

Simple write

>>> frame = pd.DataFrame(np.arange(16).reshape((4,4)), columns = ['red', 'blue', 'orange', 'black'], index = ['a', 'b', 'c', 'd'])

>>> frame

   red  blue  orange  black

a    0     1       2      3

b    4     5       6      7

c    8     9      10     11

d   12    13      14     15

>>> frame.to_csv('E:\\Python\\Codes\\excited.csv')

,red,blue,orange,black

a,0,1,2,3

b,4,5,6,7

c,8,9,10,11

d,12,13,14,15

Note that the first line starts with a ',': the index column has no name, so its header field is empty.
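If the empty first field is unwanted, the index column can be given a name when writing. A minimal sketch using the index_label parameter of to_csv; 'id' is an arbitrary label chosen for the example.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape((4, 4)),
                     columns=['red', 'blue', 'orange', 'black'],
                     index=['a', 'b', 'c', 'd'])

# With no path, to_csv returns the CSV text as a string.
default_text = frame.to_csv()

# index_label names the index column in the header line.
labeled_text = frame.to_csv(index_label='id')

print(default_text.splitlines()[0])
print(labeled_text.splitlines()[0])
```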

Writing without the index and the column names

>>> frame.to_csv('E:\\Python\\Codes\\excited.csv', index = False, header = False)

0,1,2,3

4,5,6,7

8,9,10,11

12,13,14,15

Handling NaN values

>>> frame = pd.DataFrame([[3, 2, np.nan], [np.nan, np.nan, np.nan], [2, 3, 3]], index = ['a', 'b', 'c'], columns = ['red', 'black', 'orange'])

>>> frame

   red  black  orange

a  3.0    2.0     NaN

b  NaN    NaN     NaN

c  2.0    3.0     3.0

>>> frame.to_csv('E:\\Python\\Codes\\excited.csv')

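,red,black,orange

a,3.0,2.0,

b,,,

c,2.0,3.0,3.0

By default, every NaN is written as an empty field.

Use the na_rep parameter to replace the empty fields:

>>> frame.to_csv('E:\\Python\\Codes\\excited.csv', na_rep = 'lalala')

,red,black,orange

a,3.0,2.0,lalala

b,lalala,lalala,lalala

c,2.0,3.0,3.0

The first field of the header line is still empty: it holds the name of the index, which is unset here, rather than a NaN value, so na_rep does not apply to it.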
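When such a file is read back, the placeholder has to be declared as a missing-value marker, otherwise the affected columns come back as strings. A sketch of the round trip, using the na_values parameter of read_csv and an in-memory string instead of the file:

```python
import io
import numpy as np
import pandas as pd

frame = pd.DataFrame([[3, 2, np.nan], [np.nan, np.nan, np.nan], [2, 3, 3]],
                     index=['a', 'b', 'c'],
                     columns=['red', 'black', 'orange'])

text = frame.to_csv(na_rep='lalala')

# Tell read_csv that the placeholder stands for a missing value.
restored = pd.read_csv(io.StringIO(text), index_col=0, na_values=['lalala'])

print(restored)
print(restored.isna().sum())  # number of missing values per column
```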

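3. Reading and writing TXT files

TXT files are not necessarily delimited by commas or semicolons; in that case, pass a regular expression as the separator. The pattern usually carries a quantifier such as '*' (zero or more) or '+' (one or more).

For example, '\s*'.

Symbol	Meaning
.	Any single character except a newline
\d	Digit
\D	Non-digit character
\s	Whitespace character
\S	Non-whitespace character
\n	Newline
\t	Tab
\uxxxx	Unicode character given by the hexadecimal number xxxx

3.1. Reading data from a TXT file

Simple read

A file with irregularly placed spaces and tabs: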

white red blue green

 1   5 2 3

2 7  8   5

2 3 3 3

>>> pd.read_table('E:\\Python\\Codes\\excited.txt', sep = '\s*')

__main__:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.

E:\Python\Python3\lib\site-packages\pandas\io\parsers.py:2137: FutureWarning: split() requires a non-empty pattern match.

  yield pat.split(line.strip())

E:\Python\Python3\lib\site-packages\pandas\io\parsers.py:2139: FutureWarning: split() requires a non-empty pattern match.

  yield pat.split(line.strip())

   white  red  blue  green

0      1    5     2      3

1      2    7     8      5

2      2    3     3      3

The first attempt produced warnings (not an error), so I followed the hint in the message and added engine='python':

>>> pd.read_table('E:\\Python\\Codes\\excited.txt', sep = '\s*', engine = 'python')

   white  red  blue  green

0      1    5     2      3

1      2    7     8      5

2      2    3     3      3

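This works. Here '*' means "zero or more". Because that also matches the empty string (the cause of the FutureWarning above), '\s+', which requires at least one whitespace character, is the safer separator.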
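The difference between the two quantifiers can be checked directly. The sketch below reads the same irregular text with sep=r'\s+' and then compares re.split for both patterns; it assumes Python 3.7 or later, where re.split also splits on empty matches.

```python
import io
import re
import pandas as pd

txt = "white red blue green\n 1   5 2 3\n2 7  8   5\n2 3 3 3\n"

# r'\s+' needs at least one whitespace character per match.
frame = pd.read_csv(io.StringIO(txt), sep=r'\s+')
print(frame)

# The same difference at the regex level.
line = '1   5 2 3'
plus_fields = re.split(r'\s+', line)  # splits only on runs of whitespace
star_fields = re.split(r'\s*', line)  # also splits on empty matches
print(plus_fields)
print(star_fields)
```

With '\s*' the second split contains extra empty strings, which is why a pattern that can match nothing is a poor choice for a separator.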

Skipping rows while reading

12#$@!%$!$#!@$!@$!@

#$%^$^%$#!

@#%!

white red blue green

!$#$!@$#!@$

 1   5 2 3

2 7  8   5

2 3 3 3

^&##$^@FGSDQAS

>>> pd.read_table('E:\\Python\\Codes\\excited.txt', sep = '\s*', engine = 'python', skiprows = [0, 1, 2, 4, 8])

   white  red  blue  green

0      1    5     2      3

1      2    7     8      5

2      2    3     3      3

The list gives the 0-based numbers of the lines to skip.
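skiprows also accepts a function that is called with each line number and returns True for lines to skip. A sketch comparing both forms on an in-memory copy of the text above; junk_lines is a name introduced for the example, and '\s+' is used as the separator.

```python
import io
import pandas as pd

txt = """12#$@!%$!$#!@$!@$!@
#$%^$^%$#!
@#%!
white red blue green
!$#$!@$#!@$
 1   5 2 3
2 7  8   5
2 3 3 3
^&##$^@FGSDQAS
"""

junk_lines = {0, 1, 2, 4, 8}  # 0-based, counted from the top of the file

# A list of line numbers, as in the example above.
by_list = pd.read_csv(io.StringIO(txt), sep=r'\s+', skiprows=sorted(junk_lines))

# A callable that returns True for every line number to skip.
by_func = pd.read_csv(io.StringIO(txt), sep=r'\s+',
                      skiprows=lambda i: i in junk_lines)

print(by_list)
print(by_list.equals(by_func))
```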

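Reading part of the data

It turns out that sep works with read_csv as well. nrows is the number of data rows to read; for example, nrows=3 reads three rows.

chunksize splits the file into pieces; with chunksize=3, each piece has three rows.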

white red blue green black orange golden

 1   5 2 3 111 222 233

100 7    8   5 2333 23333 233333

20 3 3 3 12222 1222 23232

2000 7   8   5 2333 23333 233333

300 3 3 3 12222 1222 23232

>>> frame = pd.read_csv('E:\\Python\\Codes\\excited.txt', sep = '\s*', skiprows=[2], nrows = 3, engine = 'python')

>>> frame

   white  red  blue  green  black  orange  golden

0      1    5     2      3    111     222     233

1     20    3     3      3  12222    1222   23232

2   2000    7     8      5   2333   23333  233333

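Three rows are read from the start, and the third line of the file (line number 2, counting the header as line 0) is skipped.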

>>> pieces = pd.read_csv('E:\\Python\\Codes\\excited.txt', sep = '\s*', chunksize = 2, engine = 'python')

>>> for piece in pieces:

...   print (piece)

...   print (type(piece))

...

   white  red  blue  green  black  orange  golden

0      1    5     2      3    111     222     233

1    100    7     8      5   2333   23333  233333

<class 'pandas.core.frame.DataFrame'>

   white  red  blue  green  black  orange  golden

2     20    3     3      3  12222    1222   23232

3   2000    7     8      5   2333   23333  233333

<class 'pandas.core.frame.DataFrame'>

   white  red  blue  green  black  orange  golden

4    300    3     3      3  12222    1222   23232

<class 'pandas.core.frame.DataFrame'>

Each chunk holds two rows (the last one holds the single remaining row), and each chunk is a DataFrame.
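The usual reason for reading in chunks is to process a file that does not fit in memory, keeping only a running result. A minimal sketch that sums one column chunk by chunk, using an in-memory copy of the text above and '\s+' as the separator:

```python
import io
import pandas as pd

txt = """white red blue green black orange golden
 1   5 2 3 111 222 233
100 7    8   5 2333 23333 233333
20 3 3 3 12222 1222 23232
2000 7   8   5 2333 23333 233333
300 3 3 3 12222 1222 23232
"""

total = 0
rows = 0
# Each chunk is an ordinary DataFrame with at most chunksize rows.
for piece in pd.read_csv(io.StringIO(txt), sep=r'\s+', chunksize=2):
    total += piece['white'].sum()
    rows += len(piece)

print(rows, total)
```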

3.2. Writing data to a TXT file

Writing works the same way as for CSV.
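The only thing that typically changes is the delimiter. A small sketch, assuming a tab-separated file is wanted; it writes to a string rather than to a path so that the result can be inspected directly.

```python
import io
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(6).reshape((2, 3)),
                     columns=['red', 'blue', 'green'])

# sep sets the field delimiter; here a tab instead of a comma.
text = frame.to_csv(sep='\t', index=False)
print(text)

# Reading it back with the same separator recovers the values.
restored = pd.read_csv(io.StringIO(text), sep='\t')
print(restored)
```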

4. Reading and writing HTML files

4.1. Writing data to an HTML file

First, a look at the to_html() method:

>>> frame

   white  red  blue  green  black  orange  golden

0      1    5     2      3    111     222     233

1    100    7     8      5   2333   23333  233333

2     20    3     3      3  12222    1222   23232

3   2000    7     8      5   2333   23333  233333

4    300    3     3      3  12222    1222   23232

>>> print(frame.to_html())

<table border="1" class="dataframe">

  <thead>

    <tr style="text-align: right;">

      <th></th>

      <th>white</th>

      <th>red</th>

      <th>blue</th>

      <th>green</th>

      <th>black</th>

      <th>orange</th>

      <th>golden</th>

    </tr>

  </thead>

  <tbody>

    <tr>

      <th>0</th>

      <td>1</td>

      <td>5</td>

      <td>2</td>

      <td>3</td>

      <td>111</td>

      <td>222</td>

      <td>233</td>

    </tr>

    <tr>

      <th>1</th>

      <td>100</td>

      <td>7</td>

      <td>8</td>

      <td>5</td>

      <td>2333</td>

      <td>23333</td>

      <td>233333</td>

    </tr>

    <tr>

      <th>2</th>

      <td>20</td>

      <td>3</td>

      <td>3</td>

      <td>3</td>

      <td>12222</td>

      <td>1222</td>

      <td>23232</td>

    </tr>

    <tr>

      <th>3</th>

      <td>2000</td>

      <td>7</td>

      <td>8</td>

      <td>5</td>

      <td>2333</td>

      <td>23333</td>

      <td>233333</td>

    </tr>

    <tr>

      <th>4</th>

      <td>300</td>

      <td>3</td>

      <td>3</td>

      <td>3</td>

      <td>12222</td>

      <td>1222</td>

      <td>23232</td>

    </tr>

  </tbody>

</table>

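As shown, DataFrame.to_html() converts a DataFrame directly into the HTML for a table. So to turn a DataFrame into an HTML file that can be opened in a browser, we only need to wrap it in a little extra markup.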

>>> s = ['<HTML>']

>>> s.append('<HEAD><TITLE>DataFrame</TITLE></HEAD>')

>>> s.append('<BODY>')

>>> s.append(frame.to_html())

>>> s.append('</BODY></HTML>')

>>> html = ''.join(s)

>>> html_file = open('E:\\Python\\Codes\\DataFrame.html', 'w')

>>> html_file.write(html)

1193

>>> html_file.close()

	white	red	blue	green	black	orange	golden
0	1	5	2	3	111	222	233
1	100	7	8	5	2333	23333	233333
2	20	3	3	3	12222	1222	23232
3	2000	7	8	5	2333	23333	233333
4	300	3	3	3	12222	1222	23232
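The same page can be written with a with block, which closes the file automatically. This is only a sketch: a temporary directory stands in for the E:\Python\Codes path, and the small frame is made up for the example.

```python
import os
import tempfile
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(4).reshape((2, 2)), columns=['white', 'red'])

page = '\n'.join([
    '<HTML>',
    '<HEAD><TITLE>DataFrame</TITLE></HEAD>',
    '<BODY>',
    frame.to_html(),
    '</BODY></HTML>',
])

# Hypothetical output location for the example.
path = os.path.join(tempfile.mkdtemp(), 'DataFrame.html')

# The with block closes the file even if write() raises.
with open(path, 'w', encoding='utf-8') as html_file:
    html_file.write(page)

print(os.path.getsize(path) > 0)
```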

4.2. Reading data from an HTML file

read_html() returns every table on the page, so the result is a list of DataFrames.

Reading the file from the previous example:

>>> web_frames = pd.read_html('E:\\Python\\Codes\\DataFrame.html')

>>> for web_frame in web_frames:

...   print (web_frame)

...

   Unnamed: 0  white  red  blue  green  black  orange  golden

0           0      1    5     2      3    111     222     233

1           1    100    7     8      5   2333   23333  233333

2           2     20    3     3      3  12222    1222   23232

3           3   2000    7     8      5   2333   23333  233333

4           4    300    3     3      3  12222    1222   23232
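The extra "Unnamed: 0" column is the index that to_html() wrote out. A sketch of one way to get it back as the index, using the index_col parameter of read_html; it assumes an HTML parser such as lxml is installed, and falls back to None otherwise. The small frame is made up for the example.

```python
import io
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(4).reshape((2, 2)), columns=['white', 'red'])
html = frame.to_html()

try:
    # index_col=0 turns the first HTML column back into the index.
    tables = pd.read_html(io.StringIO(html), index_col=0)
except ImportError:
    # read_html needs an HTML parser such as lxml, or bs4 with html5lib.
    tables = None

if tables is not None:
    print(len(tables))
    print(tables[0])
```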

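Best of all, read_html() accepts a URL and parses and extracts the tables from the web page directly.

So I tried it on the episode list of "Your Lie in April" (四月是你的谎言) on Baidu Baike: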

>>> favors = pd.read_html('http://baike.baidu.com/item/%E5%9B%9B%E6%9C%88%E6%98%AF%E4%BD%A0%E7%9A%84%E8%B0%8E%E8%A8%80/13382872#viewPageContent')

>>> now = favors[0].copy()

>>> now = now.set_index(0)

>>> now.columns = now.loc['话']

>>> now.index.name = None

>>> now.drop('话')

话             标题(日/中)               剧本  \

1    モノトーン・カラフル 单调·多彩          吉 冈 孝 夫

2             友人A 友人A             石黑恭平

3             春の中 春光里              神户守

4              旅立ち 启程  岩田和也 河野亚矢子 石黑恭平

5          どんてんもよう 阴天             石滨真史

6              帰り道 归途             井端义秀

7         カゲささやく 暗影低语              神户守

8               响け 回响             后藤圭二

9               共鸣 共鸣              神户守

10     君といた景色 与你共赏的景色             中村章子

11           命の灯 生命之光             朝仓海斗

12  トゥインクル リトルスター 小星星              神户守

13         爱の悲しみ 爱的忧伤             仓田绫子

14              足迹 足迹             柴山智隆

15            うそつき 骗子              神户守

16        似たもの同士 相似的人             黑木美幸

17          トワイライト 暮光              神户守

18          心重ねる 心心相印             石井俊匡

19     さよならヒーロー 再见了英雄             井端义秀

20            手と手 手与手              神户守

21                雪 雪        仓田绫子 柴山智隆

22              春风 春风             石黑恭平

23            MOMENTS             岩田和也   

话                                         分镜  \

1                                       石黑恭平

2                                       原田孝宏

3                                       岩田和也

4   三木俊明 河合拓也 牧田昌也 野野下伊织 山田慎也 菅井爱明 小泉初荣 浅贺和行

5                                  石滨真史 小岛崇史

6                                      野野下伊织

7                                       间岛崇宽

8                                       高桥英俊

9                                       黑木美幸

10                                      原田孝宏

11                                 石黑恭平 川越崇弘

12                                      福岛利规

13                                     野野下伊织

14                                      小泉初荣

15                                       矢岛武

16                 山田真也 野野下伊织 小泉初荣 三木俊明 浅贺和行

17                                     河野亚矢子

18                                      河合拓也

19                                       こさや

20                                       矢岛武

21            野野下伊织 小泉初荣 门之园惠美 高野绫 河合拓也 山田真也

22                                 石黑恭平 黑木美幸

23                      爱敬由纪子 奥田佳子 山田真也 伊藤香织   

话                                                  演出       作画监督 演奏 作画监督 总作画监督

1                                               爱敬由纪子       浅贺和行       -   NaN

2                                           三木俊明 小林惠祐      爱敬由纪子     NaN   NaN

3                                                河合拓也        NaN     NaN   NaN

4                                           浅贺和行 仓田绫子  爱敬由纪子 高野绫     NaN   NaN

5                                                小岛崇史          -   爱敬由纪子   NaN

6                                                浅贺和行        NaN     NaN   NaN

7                                                山田真也          -     NaN   NaN

8                                                河合拓也       浅贺和行     NaN   NaN

9                                                小泉初荣        NaN     NaN   NaN

10                                                高野绫        NaN     NaN   NaN

11                                           山下惠 中野彰子          -     NaN   NaN

12                                               长森佳容       浅贺和行     NaN   NaN

13                                                NaN        NaN     NaN   NaN

14                                                  -        NaN     NaN   NaN

15                  北岛勇树 山下惠 C Company NAMU Animation       浅贺和行     NaN   NaN

16                                                  -        高野绫     NaN   NaN

17                                           三木俊明 高田晃       浅贺和行   爱敬由纪子   NaN

18                                                NaN        NaN     NaN   NaN

19                           小泉初荣 野野下伊织 高野绫 山田真也 河合拓也        NaN     NaN   NaN

20  野野下伊织 小泉初荣 河合拓也 山田真也 高野绫 薗部爱子 奥田佳子 加藤万由子 高田晃 薮本和彦        NaN     NaN   NaN

21                                                NaN        NaN     NaN   NaN

22       奥田桂子 河合拓也 野野下伊织 高野绫 小泉初荣 伊藤香织 浅贺和行 高田晃 爱敬由纪子        NaN     NaN   NaN

23                                                NaN        NaN     NaN   NaN

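Very powerful. However, the table came in shifted by one row (the header ended up as a data row), so it took me quite a while to get it to display properly.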
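The clean-up steps can be tried without network access. The sketch below uses a small made-up frame in place of favors[0] and selects the header row with .loc, since .ix was removed in pandas 1.0; the cell values are placeholders.

```python
import pandas as pd

# Stand-in for favors[0]: integer column labels, real header in row 0.
raw = pd.DataFrame([['No.', 'Title', 'Script'],
                    ['1', 'Monotone/Colorful', 'A'],
                    ['2', 'Friend A', 'B']])

now = raw.set_index(0)        # first column becomes the index
now.columns = now.loc['No.']  # promote the header row to column labels
now.columns.name = None
now.index.name = None
now = now.drop('No.')         # drop returns a new frame, so keep the result

print(now)
```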

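5. Other formats

Besides the formats covered above, pandas also supports the other formats listed in the table in section 1, such as HDF5 and pickle.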
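As a quick illustration of one of these, here is a sketch of a pickle round trip with to_pickle and read_pickle. The temporary path is made up for the example. Pickle files should only be loaded from sources you trust.

```python
import os
import tempfile
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape((4, 4)),
                     index=['a', 'b', 'c', 'd'],
                     columns=['red', 'blue', 'orange', 'black'])

# Hypothetical output location for the example.
path = os.path.join(tempfile.mkdtemp(), 'frame.pkl')

frame.to_pickle(path)
restored = pd.read_pickle(path)

# The index and column labels come back without any extra arguments.
print(restored)
print(restored.equals(frame))
```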

Learning the Pandas library in Python (2): reading and writing data