pandas Key Points (Basic Functionality)
1. Reindexing
In [1]: from pandas import Series, DataFrame
In [2]: import numpy as np
In [3]: obj = Series([4.5,7.2,-5.3,3.6], index=["d","b","a","c"])
In [4]: obj
Out[4]:
d 4.5
b 7.2
a -5.3
c 3.6
dtype: float64
In [6]: obj2 = obj.reindex(["a","b","c","d","e"])
In [7]: obj2
Out[7]:
a -5.3
b 7.2
c 3.6
d 4.5
e NaN
dtype: float64
In [8]: obj3 = Series(["blue","purple","yellow"], index=[0,2,4])
In [9]: obj3.reindex(range(6), method="ffill")
Out[9]:
0 blue
1 blue
2 purple
3 purple
4 yellow
5 yellow
dtype: object
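As a small companion to the forward-fill example, the sketch below repeats it and contrasts it with `fill_value`, which substitutes a constant for labels that are absent from the original index. The placeholder string "unknown" is an arbitrary choice made for illustration.

```python
import pandas as pd

obj3 = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])

# Forward fill: each missing label takes the last valid value before it.
filled = obj3.reindex(range(6), method="ffill")

# Constant fill: missing labels get a fixed placeholder instead of NaN.
# "unknown" is an illustrative value, not something from the original post.
constant = obj3.reindex(range(6), fill_value="unknown")

print(filled)
print(constant)
```

Note that `method="ffill"` requires the original index to be monotonic, which holds here for `[0, 2, 4]`.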
In [12]: obj = Series(np.arange(5.), index=["a","b","c","d","e"])
In [13]: new_obj = obj.drop("c")
In [14]: new_obj
Out[14]:
a 0.0
b 1.0
d 3.0
e 4.0
dtype: float64
A DataFrame can drop index values from either axis.
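A minimal sketch of dropping along each axis, using a small illustrative frame built here for the purpose. `drop` returns a new object and leaves the original untouched.

```python
import numpy as np
import pandas as pd

# Illustrative 4x4 frame; the labels are chosen only for this example.
data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# axis=0 (the default) drops row labels.
rows_dropped = data.drop(["Colorado", "Ohio"])

# axis=1 drops column labels.
cols_dropped = data.drop(["two", "four"], axis=1)

print(rows_dropped)
print(cols_dropped)
```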
In [4]: obj = Series(np.arange(4.), index=["a","b","c","d"])
Out[6]:
a 0.0
b 1.0
dtype: float64
In [7]: obj[obj<2]
Out[7]:
a 0.0
b 1.0
dtype: float64
In [8]: obj["b":"c"]
Out[8]:
b 1.0
c 2.0
dtype: float64
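The label slice above includes its end point, unlike ordinary Python slicing. The sketch below contrasts the two behaviours and shows that assigning through a label slice modifies the matching part of the Series. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.), index=["a", "b", "c", "d"])

# Label-based slice: both "b" and "c" are included.
by_label = obj["b":"c"]

# Position-based slice: the stop position is excluded, as in plain Python.
by_position = obj.iloc[1:2]

# Assignment through a label slice sets every element in the range.
updated = obj.copy()
updated.loc["b":"c"] = 5

print(by_label)
print(by_position)
print(updated)
```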
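In [9]: data = DataFrame(np.arange(16).reshape((4,4)), index=['Ohio','Colorado','Utah','New York'], columns=['one','two','three','four'])
In [10]: data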
Out[10]:
one two three four
Ohio 0 1 2 3
Colorado 4 5 6 7
Utah 8 9 10 11
New York 12 13 14 15
In [11]: data['two']
Out[11]:
Ohio 1
Colorado 5
Utah 9
New York 13
Name: two, dtype: int32
In [12]: data[:2]
Out[12]:
one two three four
Ohio 0 1 2 3
Colorado 4 5 6 7
In [13]: data > 5
Out[13]:
one two three four
Ohio False False False False
Colorado False False True True
Utah True True True True
New York True True True True
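Boolean results like the one above are mostly useful as masks. The sketch below, which rebuilds the same 4x4 frame so that it runs on its own, filters rows with a boolean Series and overwrites cells with a boolean DataFrame.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# A boolean Series selects the rows where the condition holds.
rows = data[data["three"] > 5]

# A boolean DataFrame selects individual cells for assignment.
masked = data.copy()
masked[masked < 5] = 0

print(rows)
print(masked)
```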
In [18]: data.loc['Colorado',['two','three']]
Out[18]:
two 5
three 6
Name: Colorado, dtype: int32
In [19]: data.loc[['Colorado','Utah'], data.columns[[3,0,1]]]
Out[19]:
four one two
Colorado 7 4 5
Utah 11 8 9
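Label-based and position-based selection are handled by `.loc` and `.iloc` respectively (the older `.ix` indexer is no longer available in pandas 1.0 and later). The sketch below rebuilds the frame so that it is self-contained and shows one way to express purely label-based, purely positional, and mixed selections.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Labels on both axes.
by_label = data.loc["Colorado", ["two", "three"]]

# Integer positions on both axes.
by_position = data.iloc[[1, 2], [3, 0, 1]]

# Row labels with column positions: translate the positions to labels first.
mixed = data.loc[["Colorado", "Utah"], data.columns[[3, 0, 1]]]

print(by_label)
print(by_position)
print(mixed)
```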
In [20]: s1 = Series([7.3,-2.5,3.4,1.5],index=['a','c','d','e'])
In [21]: s2 = Series([-2.1, 3.6, -1.5, 4, 3.1],index=['a','c','e','f','g'])
In [22]: s1+s2
Out[22]:
a 5.2
c 1.1
d NaN
e 0.0
f NaN
g NaN
dtype: float64
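In [24]: df1 = DataFrame(np.arange(12.).reshape((4,3)), columns=list('bde'), index=['Utah','Ohio','Texas','Oregon'])
In [26]: df1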
Out[26]:
b d e
Utah 0.0 1.0 2.0
Ohio 3.0 4.0 5.0
Texas 6.0 7.0 8.0
Oregon 9.0 10.0 11.0
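In [25]: df2 = DataFrame(np.arange(9.).reshape((3,3)), columns=list('bcd'), index=['Ohio','Texas','Colorado'])
In [27]: df2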
Out[27]:
b c d
Ohio 0.0 1.0 2.0
Texas 3.0 4.0 5.0
Colorado 6.0 7.0 8.0
In [28]: df1+df2
Out[28]:
b c d e
Colorado NaN NaN NaN NaN
Ohio 3.0 NaN 6.0 NaN
Oregon NaN NaN NaN NaN
Texas 9.0 NaN 12.0 NaN
Utah NaN NaN NaN NaN
In [30]: df2.add(df1,fill_value=0)
Out[30]:
b c d e
Colorado 6.0 7.0 8.0 NaN
Ohio 3.0 1.0 6.0 5.0
Oregon 9.0 NaN 10.0 11.0
Texas 9.0 4.0 12.0 8.0
Utah 0.0 NaN 1.0 2.0
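The same `fill_value` idea applies to Series. In the sketch below, a label present in only one operand is treated as 0 in the other, so the result is NaN only where a label is missing from both sides. The values are the `s1` and `s2` used earlier.

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=["a", "c", "d", "e"])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=["a", "c", "e", "f", "g"])

# The plain operator leaves NaN wherever the labels do not overlap.
plain = s1 + s2

# The method form lets the missing side default to 0 before adding.
filled = s1.add(s2, fill_value=0)

print(plain)
print(filled)
```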
In [31]: arr = np.arange(12.).reshape((3,4))
In [32]: arr
Out[32]:
array([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.]])
In [33]: arr - arr[1]
Out[33]:
array([[-4., -4., -4., -4.],
[ 0., 0., 0., 0.],
[ 4., 4., 4., 4.]])
In [35]: frame = DataFrame(np.arange(12.).reshape((4,3)),columns=list('bde'),index=['Utah','Ohio','Texas','Oregon'])
In [39]: series = frame.iloc[0]
In [40]: frame
Out[40]:
b d e
Utah 0.0 1.0 2.0
Ohio 3.0 4.0 5.0
Texas 6.0 7.0 8.0
Oregon 9.0 10.0 11.0
In [41]: series
Out[41]:
b 0.0
d 1.0
e 2.0
Name: Utah, dtype: float64
In [43]: frame - series
Out[43]:
b d e
Utah 0.0 0.0 0.0
Ohio 3.0 3.0 3.0
Texas 6.0 6.0 6.0
Oregon 9.0 9.0 9.0
In [44]: series2 = Series(range(3), index=['b','e','f'])
In [45]: frame + series2
Out[45]:
b d e f
Utah 0.0 NaN 3.0 NaN
Ohio 3.0 NaN 6.0 NaN
Texas 6.0 NaN 9.0 NaN
Oregon 9.0 NaN 12.0 NaN
In [46]: series3 = frame['d']
In [47]: frame.sub(series3, axis=0)
Out[47]:
b d e
Utah -1.0 0.0 1.0
Ohio -1.0 0.0 1.0
Texas -1.0 0.0 1.0
Oregon -1.0 0.0 1.0
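A common use of these two broadcasting directions is centering data. The sketch below, using the same `arange`-based frame, subtracts column means with the default alignment and row means with `axis=0`.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12.).reshape((4, 3)),
    columns=list("bde"),
    index=["Utah", "Ohio", "Texas", "Oregon"],
)

# Default: the Series index is matched against the columns,
# so each column has its own mean subtracted.
col_centered = frame - frame.mean()

# axis=0: the Series index is matched against the row labels,
# so each row has its own mean subtracted.
row_centered = frame.sub(frame.mean(axis=1), axis=0)

print(col_centered)
print(row_centered)
```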
In [49]: frame = DataFrame(np.random.randn(4,3), columns=list('bde'),index=['Utah','Ohio','Texas','Oregon'])
In [50]: frame
Out[50]:
b d e
Utah 0.913051 -1.289725 -0.590573
Ohio 1.417612 -1.835357 -0.010755
Texas 0.328839 -0.121878 -1.209583
Oregon 1.315330 -1.026557 -1.777427
In [51]: np.abs(frame)
Out[51]:
b d e
Utah 0.913051 1.289725 0.590573
Ohio 1.417612 1.835357 0.010755
Texas 0.328839 0.121878 1.209583
Oregon 1.315330 1.026557 1.777427
DataFrame's apply method applies a function to the one-dimensional arrays formed by each column or each row:
In [52]: f = lambda x:x.max() - x.min()
In [53]: frame.apply(f)
Out[53]:
b 1.088773
d 1.713479
e 1.766671
dtype: float64
In [54]: frame.apply(f, axis=1)
Out[54]:
Utah 2.202776
Ohio 3.252969
Texas 1.538421
Oregon 3.092757
dtype: float64
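The function passed to `apply` does not have to return a scalar. As a sketch, the helper `min_max` below (a name made up for this example) returns a Series, so the result is a DataFrame with one row per returned label. A deterministic frame is used instead of random numbers so the output is reproducible.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12.).reshape((4, 3)),
    columns=list("bde"),
    index=["Utah", "Ohio", "Texas", "Oregon"],
)

def min_max(values):
    # Hypothetical helper: summarise one column (or row) as two numbers.
    return pd.Series([values.min(), values.max()], index=["min", "max"])

# Applied column by column (the default, axis=0).
summary = frame.apply(min_max)

# A scalar-returning function applied row by row.
row_range = frame.apply(lambda x: x.max() - x.min(), axis=1)

print(summary)
print(row_range)
```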
In [57]: obj = Series(range(4), index=['d','a','b','c'])
In [58]: obj
Out[58]:
d 0
a 1
b 2
c 3
dtype: int64
In [59]: obj.sort_index()
Out[59]:
a 1
b 2
c 3
d 0
dtype: int64
In [62]: frame.sort_index()
Out[62]:
b d e
Ohio 1.417612 -1.835357 -0.010755
Oregon 1.315330 -1.026557 -1.777427
Texas 0.328839 -0.121878 -1.209583
Utah 0.913051 -1.289725 -0.590573
In [63]: frame.sort_index(axis=1)
Out[63]:
b d e
Utah 0.913051 -1.289725 -0.590573
Ohio 1.417612 -1.835357 -0.010755
Texas 0.328839 -0.121878 -1.209583
Oregon 1.315330 -1.026557 -1.777427
In [65]: frame.sort_index(axis=1,ascending=False)
Out[65]:
e d b
Utah -0.590573 -1.289725 0.913051
Ohio -0.010755 -1.835357 1.417612
Texas -1.209583 -0.121878 0.328839
Oregon -1.777427 -1.026557 1.315330
In [67]: frame.sort_values(by='b')
Out[67]:
b d e
Texas 0.328839 -0.121878 -1.209583
Utah 0.913051 -1.289725 -0.590573
Oregon 1.315330 -1.026557 -1.777427
Ohio 1.417612 -1.835357 -0.010755
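Two further details of `sort_values` are worth a quick sketch: sorting by several columns at once, and the placement of missing values, which go to the end by default. The small frame and Series below are made up for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"b": [4, 7, -3, 2], "a": [0, 1, 0, 1]})

# Sort by "a" first, then by "b" within each group of equal "a".
by_two_keys = frame.sort_values(by=["a", "b"])

# In a Series, NaN values are placed last by default.
obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])
sorted_obj = obj.sort_values()

print(by_two_keys)
print(sorted_obj)
```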
In [69]: obj = Series([7,-5,7,4,2,0,4])
In [70]: obj
Out[70]:
0 7
1 -5
2 7
3 4
4 2
5 0
6 4
dtype: int64
In [71]: obj.rank()
Out[71]:
0 6.5
1 1.0
2 6.5
3 4.5
4 3.0
5 2.0
6 4.5
dtype: float64
In [72]: obj.rank(method='first')
Out[72]:
0 6.0
1 1.0
2 7.0
3 4.0
4 3.0
5 2.0
6 5.0
dtype: float64
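Besides the default average and `method='first'`, `rank` accepts other tie-breaking rules. The sketch below runs a few of them on the same seven values so the differences are easy to compare.

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

# Ties share the lowest rank of their group.
rank_min = obj.rank(method="min")

# Like "min", but ranks increase by exactly 1 between groups.
rank_dense = obj.rank(method="dense")

# Rank from largest to smallest; ties share the highest rank of their group.
rank_desc_max = obj.rank(ascending=False, method="max")

print(rank_min)
print(rank_dense)
print(rank_desc_max)
```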
In [73]: obj = Series(range(5),index=['a','a','b','b','c'])
In [74]: obj
Out[74]:
a 0
a 1
b 2
b 3
c 4
dtype: int64
In [75]: obj.index.is_unique
Out[75]: False
In [76]: obj['a']
Out[76]:
a 0
a 1
dtype: int64
The same applies to a DataFrame.
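A minimal sketch of the DataFrame case, using an illustrative frame whose row index repeats the label "a". Selecting a duplicated label returns a DataFrame with every matching row, while a unique label returns a single row as a Series.

```python
import numpy as np
import pandas as pd

# Illustrative frame; the labels and values are chosen only for this example.
df = pd.DataFrame(
    np.arange(12.).reshape((4, 3)),
    index=["a", "a", "b", "c"],
    columns=["x", "y", "z"],
)

print(df.index.is_unique)

# Duplicated label: all matching rows come back as a DataFrame.
dup_rows = df.loc["a"]

# Unique label: a single row comes back as a Series.
single_row = df.loc["c"]

print(dup_rows)
print(single_row)
```

Code that indexes by label may therefore need to handle both return types when the index is not guaranteed to be unique.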