pandas Notes (Basic Functionality)
1. Reindexing
In [1]: import numpy as np
In [2]: from pandas import Series, DataFrame
In [3]: obj = Series([4.5,7.2,-5.3,3.6], index=["d","b","a","c"])
In [4]: obj
Out[4]:
d 4.5
b 7.2
a -5.3
c 3.6
dtype: float64
In [6]: obj2 = obj.reindex(["a","b","c","d","e"])
In [7]: obj2
Out[7]:
a -5.3
b 7.2
c 3.6
d 4.5
e NaN
dtype: float64
In [8]: obj3 = Series(["blue","purple","yellow"], index=[0,2,4])
In [9]: obj3.reindex(range(6), method="ffill")
Out[9]:
0 blue
1 blue
2 purple
3 purple
4 yellow
5 yellow
dtype: object
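As a minimal sketch of the two fill options when reindexing: `fill_value` replaces the NaN that would otherwise appear for new labels, while `method="ffill"` carries the previous value forward. The variable names `filled`, `colors` and `forward` are illustrative and do not appear in the session above.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=["d", "b", "a", "c"])

# Labels missing from the original index get fill_value instead of NaN.
filled = obj.reindex(["a", "b", "c", "d", "e"], fill_value=0)

# method="ffill" propagates the last valid value forward; it requires
# a monotonic index on the original object.
colors = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])
forward = colors.reindex(range(6), method="ffill")

print(filled)
print(forward)
```

Note that `reindex` returns a new object in both cases; `obj` and `colors` are left unchanged.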
In [12]: obj = Series(np.arange(5.), index=["a","b","c","d","e"])
In [13]: new_obj = obj.drop("c")
In [14]: new_obj
Out[14]:
a 0.0
b 1.0
d 3.0
e 4.0
dtype: float64
A DataFrame can drop index values from either axis.
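A short sketch of dropping labels from each axis of a DataFrame. The frame built here is a hypothetical example chosen for illustration; `rows_dropped` and `cols_dropped` are likewise illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame used only for this illustration.
data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# axis=0 (the default) drops row labels.
rows_dropped = data.drop(["Colorado", "Ohio"])

# axis=1 drops column labels.
cols_dropped = data.drop(["two", "four"], axis=1)

print(rows_dropped)
print(cols_dropped)
```

`drop` returns a new object, so `data` itself still has all four rows and columns afterwards.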
In [4]: obj = Series(np.arange(4.), index=["a","b","c","d"])
Out[6]:
a 0.0
b 1.0
dtype: float64
In [7]: obj[obj<2]
Out[7]:
a 0.0
b 1.0
dtype: float64
In [8]: obj["b":"c"]
Out[8]:
b 1.0
c 2.0
dtype: float64
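The slice `obj["b":"c"]` above includes its end label, unlike ordinary Python slicing. A minimal sketch contrasting label slices with positional slices follows; `by_label`, `by_position` and `modified` are illustrative names.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.), index=["a", "b", "c", "d"])

by_label = obj["b":"c"]      # label slice: the end label "c" is included
by_position = obj.iloc[1:2]  # positional slice: position 2 is excluded

# Assigning through a label slice sets every element in the range.
modified = obj.copy()
modified["b":"c"] = 5

print(by_label)
print(by_position)
print(modified)
```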
In [10]: data
Out[10]:
one two three four
Ohio 0 1 2 3
Colorado 4 5 6 7
Utah 8 9 10 11
New York 12 13 14 15
In [11]: data['two']
Out[11]:
Ohio 1
Colorado 5
Utah 9
New York 13
Name: two, dtype: int32
In [12]: data[:2]
Out[12]:
one two three four
Ohio 0 1 2 3
Colorado 4 5 6 7
In [13]: data > 5
Out[13]:
one two three four
Ohio False False False False
Colorado False False True True
Utah True True True True
New York True True True True
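The session does not show how `data` was created. The sketch below assumes a construction that reproduces the printed values, then uses boolean results like `data > 5` to filter rows and to assign through a mask. The names `selected` and `masked` are illustrative.

```python
import numpy as np
import pandas as pd

# Assumed construction of `data`; it reproduces the values printed above.
data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# A boolean Series selects the rows where the condition holds.
selected = data[data["three"] > 5]

# A boolean DataFrame can act as a mask for assignment.
masked = data.copy()
masked[masked < 5] = 0

print(selected)
print(masked)
```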
In [18]: data.loc['Colorado',['two','three']]
Out[18]:
two 5
three 6
Name: Colorado, dtype: int32
In [19]: data.loc[['Colorado','Utah'],data.columns[[3,0,1]]]
Out[19]:
four one two
Colorado 7 4 5
Utah 11 8 9
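Label-based and position-based selection can be written separately with `loc` and `iloc` (the older `ix` indexer, which mixed both, was removed in pandas 1.0). A minimal sketch, assuming the same construction of `data` that reproduces the values printed earlier:

```python
import numpy as np
import pandas as pd

# Assumed construction of `data`; it matches the values printed earlier.
data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Purely label-based selection.
by_label = data.loc["Colorado", ["two", "three"]]

# Purely position-based selection of the same block as Out[19].
by_position = data.iloc[[1, 2], [3, 0, 1]]

# Row labels combined with column positions, translated to labels first.
mixed = data.loc[["Colorado", "Utah"], data.columns[[3, 0, 1]]]

print(by_label)
print(by_position)
```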
In [20]: s1 = Series([7.3,-2.5,3.4,1.5],index=['a','c','d','e'])
In [21]: s2 = Series([-2.1, 3.6, -1.5, 4, 3.1],index=['a','c','e','f','g'])
In [22]: s1+s2
Out[22]:
a 5.2
c 1.1
d NaN
e 0.0
f NaN
g NaN
dtype: float64
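In the sum above, labels present in only one Series produce NaN. As a sketch of one alternative, `Series.add` accepts `fill_value`, which substitutes a value for the missing side before adding. The name `total` is illustrative.

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=["a", "c", "d", "e"])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=["a", "c", "e", "f", "g"])

# Labels found in only one operand are treated as 0 on the other side.
total = s1.add(s2, fill_value=0)

print(total)
```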
In [26]: df1
Out[26]:
b d e
Utah 0.0 1.0 2.0
Ohio 3.0 4.0 5.0
Texas 6.0 7.0 8.0
Oregon 9.0 10.0 11.0
In [27]: df2
Out[27]:
b c d
Ohio 0.0 1.0 2.0
Texas 3.0 4.0 5.0
Colorado 6.0 7.0 8.0
In [28]: df1+df2
Out[28]:
b c d e
Colorado NaN NaN NaN NaN
Ohio 3.0 NaN 6.0 NaN
Oregon NaN NaN NaN NaN
Texas 9.0 NaN 12.0 NaN
Utah NaN NaN NaN NaN
In [30]: df2.add(df1,fill_value=0)
Out[30]:
b c d e
Colorado 6.0 7.0 8.0 NaN
Ohio 3.0 1.0 6.0 5.0
Oregon 9.0 NaN 10.0 11.0
Texas 9.0 4.0 12.0 8.0
Utah 0.0 NaN 1.0 2.0
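The session prints `df1` and `df2` without showing how they were built. The sketch below assumes constructions that reproduce the printed values, so the two results can be regenerated. `plain` and `filled` are illustrative names.

```python
import numpy as np
import pandas as pd

# Assumed constructions; they reproduce the frames printed above.
df1 = pd.DataFrame(
    np.arange(12.).reshape((4, 3)),
    columns=list("bde"),
    index=["Utah", "Ohio", "Texas", "Oregon"],
)
df2 = pd.DataFrame(
    np.arange(9.).reshape((3, 3)),
    columns=list("bcd"),
    index=["Ohio", "Texas", "Colorado"],
)

plain = df1 + df2                    # NaN wherever either side lacks the label
filled = df2.add(df1, fill_value=0)  # a missing side counts as 0

print(plain)
print(filled)
```

A cell stays NaN in `filled` only when the label pair is absent from both frames, as with row Colorado and column e.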
In [31]: arr = np.arange(12.).reshape((3,4))
In [32]: arr
Out[32]:
array([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.]])
In [33]: arr - arr[1]
Out[33]:
array([[-4., -4., -4., -4.],
[ 0., 0., 0., 0.],
[ 4., 4., 4., 4.]])
In [35]: frame = DataFrame(np.arange(12.).reshape((4,3)),columns=list('bde'),index=['Utah','Ohio','Texas','Oregon'])
In [39]: series = frame.iloc[0]
In [40]: frame
Out[40]:
b d e
Utah 0.0 1.0 2.0
Ohio 3.0 4.0 5.0
Texas 6.0 7.0 8.0
Oregon 9.0 10.0 11.0
In [41]: series
Out[41]:
b 0.0
d 1.0
e 2.0
Name: Utah, dtype: float64
In [43]: frame - series
Out[43]:
b d e
Utah 0.0 0.0 0.0
Ohio 3.0 3.0 3.0
Texas 6.0 6.0 6.0
Oregon 9.0 9.0 9.0
In [45]: frame + series2
Out[45]:
b d e f
Utah 0.0 NaN 3.0 NaN
Ohio 3.0 NaN 6.0 NaN
Texas 6.0 NaN 9.0 NaN
Oregon 9.0 NaN 12.0 NaN
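`series2` is used above without a definition. The sketch below assumes one that is consistent with the printed result for columns b and e; the value under label f is a guess, since it cannot be recovered from the output. It shows that the Series index is matched against the frame's columns and the union of labels is kept.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12.).reshape((4, 3)),
    columns=list("bde"),
    index=["Utah", "Ohio", "Texas", "Oregon"],
)

# Assumption: series2 is not defined in the session; these values
# reproduce the printed columns b and e.
series2 = pd.Series(range(3), index=["b", "e", "f"])

# Series labels align with frame columns; unmatched labels give NaN columns.
result = frame + series2

print(result)
```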
In [46]: series3 = frame['d']
In [47]: frame.sub(series3, axis=0)
Out[47]:
b d e
Utah -1.0 0.0 1.0
Ohio -1.0 0.0 1.0
Texas -1.0 0.0 1.0
Oregon -1.0 0.0 1.0
In [49]: frame = DataFrame(np.random.randn(4,3), columns=list('bde'),index=['Utah','Ohio','Texas','Oregon'])
In [50]: frame
Out[50]:
b d e
Utah 0.913051 -1.289725 -0.590573
Ohio 1.417612 -1.835357 -0.010755
Texas 0.328839 -0.121878 -1.209583
Oregon 1.315330 -1.026557 -1.777427
In [51]: np.abs(frame)
Out[51]:
b d e
Utah 0.913051 1.289725 0.590573
Ohio 1.417612 1.835357 0.010755
Texas 0.328839 0.121878 1.209583
Oregon 1.315330 1.026557 1.777427
DataFrame's apply method applies a function to the one-dimensional arrays formed by each column or each row:
In [52]: f = lambda x:x.max() - x.min()
In [53]: frame.apply(f)
Out[53]:
b 1.088773
d 1.713479
e 1.766671
dtype: float64
In [54]: frame.apply(f, axis=1)
Out[54]:
Utah 2.202776
Ohio 3.252969
Texas 1.538421
Oregon 3.092757
dtype: float64
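The function passed to `apply` does not have to return a scalar. As a sketch, returning a Series gives one output row per label, and `Series.map` handles element-wise work on a single column. The helper `min_max` and the seeded random data are illustrative, so the numbers differ from the session above.

```python
import numpy as np
import pandas as pd

# Seeded random data so the example is repeatable.
rng = np.random.default_rng(0)
frame = pd.DataFrame(
    rng.standard_normal((4, 3)),
    columns=list("bde"),
    index=["Utah", "Ohio", "Texas", "Oregon"],
)

def min_max(x):
    # x is one column; the returned Series becomes one output column.
    return pd.Series([x.min(), x.max()], index=["min", "max"])

summary = frame.apply(min_max)

# Element-wise formatting of one column.
formatted = frame["e"].map(lambda v: f"{v:.2f}")

print(summary)
print(formatted)
```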
In [57]: obj = Series(range(4), index=['d','a','b','c'])
In [58]: obj
Out[58]:
d 0
a 1
b 2
c 3
dtype: int64
In [59]: obj.sort_index()
Out[59]:
a 1
b 2
c 3
d 0
dtype: int64
In [62]: frame.sort_index()
Out[62]:
b d e
Ohio 1.417612 -1.835357 -0.010755
Oregon 1.315330 -1.026557 -1.777427
Texas 0.328839 -0.121878 -1.209583
Utah 0.913051 -1.289725 -0.590573
In [63]: frame.sort_index(axis=1)
Out[63]:
b d e
Utah 0.913051 -1.289725 -0.590573
Ohio 1.417612 -1.835357 -0.010755
Texas 0.328839 -0.121878 -1.209583
Oregon 1.315330 -1.026557 -1.777427
In [65]: frame.sort_index(axis=1,ascending=False)
Out[65]:
e d b
Utah -0.590573 -1.289725 0.913051
Ohio -0.010755 -1.835357 1.417612
Texas -1.209583 -0.121878 0.328839
Oregon -1.777427 -1.026557 1.315330
In [67]: frame.sort_values(by='b')
Out[67]:
b d e
Texas 0.328839 -0.121878 -1.209583
Utah 0.913051 -1.289725 -0.590573
Oregon 1.315330 -1.026557 -1.777427
Ohio 1.417612 -1.835357 -0.010755
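Two further uses of `sort_values`, sketched with small hypothetical data: sorting a DataFrame by several columns, and sorting a Series that contains missing values, which are placed at the end by default.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
frame = pd.DataFrame({"b": [4, 7, -3, 2], "a": [0, 1, 0, 1]})

# Sort by "a" first, then by "b" within equal values of "a".
by_two = frame.sort_values(by=["a", "b"])

obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])

# NaN values are placed last by default.
sorted_obj = obj.sort_values()

print(by_two)
print(sorted_obj)
```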
In [70]: obj
Out[70]:
0 7
1 -5
2 7
3 4
4 2
5 0
6 4
dtype: int64
In [71]: obj.rank()
Out[71]:
0 6.5
1 1.0
2 6.5
3 4.5
4 3.0
5 2.0
6 4.5
dtype: float64
In [72]: obj.rank(method='first')
Out[72]:
0 6.0
1 1.0
2 7.0
3 4.0
4 3.0
5 2.0
6 5.0
dtype: float64
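By default `rank` breaks ties by assigning the average rank, and `method='first'` uses order of appearance. A brief sketch of other tie-breaking options on the same values follows; the variable names are illustrative.

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

# Rank from largest to smallest; tied values share the highest rank of the group.
desc_max = obj.rank(ascending=False, method="max")

# Tied values share the lowest rank of the group.
asc_min = obj.rank(method="min")

# Like "min", but ranks increase by 1 between groups.
dense = obj.rank(method="dense")

print(desc_max)
print(asc_min)
print(dense)
```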
In [73]: obj = Series(range(5),index=['a','a','b','b','c'])
In [74]: obj
Out[74]:
a 0
a 1
b 2
b 3
c 4
dtype: int64
In [75]: obj.index.is_unique
Out[75]: False
In [76]: obj['a']
Out[76]:
a 0
a 1
dtype: int64
The same applies to a DataFrame.
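A minimal sketch of the DataFrame case, using a hypothetical frame with repeated row labels: selecting a duplicated label returns several rows, while a unique label returns a single row as a Series.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with duplicate row labels.
df = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=["a", "a", "b", "c"],
    columns=["x", "y", "z"],
)

print(df.index.is_unique)  # False

dup = df.loc["a"]     # duplicated label: a DataFrame with two rows
single = df.loc["c"]  # unique label: a Series holding one row

print(dup)
print(single)
```

Because the return type depends on whether the label is duplicated, code that indexes by label may need to handle both cases.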