Python Data Analysis Basics: pandas Tutorial

Reference: the official pandas documentation:

http://pandas.pydata.org/pandas-docs/stable/10min.html#min

1. Data types in pandas

Series: a one-dimensional labeled array that can hold any data type

 # Import the libraries first

 >>import pandas as pd

 >>import numpy as np

 # General form (data and index are placeholders):

 # s = pd.Series(data, index=index)

 # Create from an ndarray

 >>indexs = ['a', 'b', 'c']

 >>s = pd.Series(np.random.randn(3), index=indexs)

 >>s

 a   -1.817485

 b    0.012912

 c    0.866929

 dtype: float64

 >>s.index

 Index(['a', 'b', 'c'], dtype='object')

 # Default index (0, 1, 2, ...) when none is given

 >>s = pd.Series(np.random.randn(3))

 >>s

 0    1.985833

 1    0.467035

 2    0.636828

 dtype: float64

 # Create from a dict

 # By default the dict keys become the index

 >>d = {'a' : 0., 'b' : 1., 'c' : 2.}

 >>pd.Series(d)

 a    0.0

 b    1.0

 c    2.0

 dtype: float64

 # Specify the index explicitly; labels missing from the dict get NaN

 >>pd.Series(d, index=['b', 'c', 'd', 'a'])

 b    1.0

 c    2.0

 d    NaN

 a    0.0

 dtype: float64

 # Create from a scalar value

 >>pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])

 a    5.0

 b    5.0

 c    5.0

 d    5.0

 e    5.0

 dtype: float64
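The construction routes above can be collected into one small script. This is only a sketch with fixed values instead of random ones, and the variable names (from_array, from_dict, reordered, from_scalar) are made up for illustration.

```python
import numpy as np
import pandas as pd

# From an ndarray with explicit labels
from_array = pd.Series(np.array([1.0, 2.0, 3.0]), index=["a", "b", "c"])

# From a dict: the keys become the index
from_dict = pd.Series({"a": 0.0, "b": 1.0, "c": 2.0})

# From a dict with an explicit index: 'd' is not a key, so it becomes NaN
reordered = pd.Series({"a": 0.0, "b": 1.0, "c": 2.0}, index=["b", "c", "d", "a"])

# From a scalar: the value is repeated for every label
from_scalar = pd.Series(5.0, index=["a", "b"])

# isna() marks which labels ended up without a value
print(reordered.isna())
```

Checking isna() right after construction is a quick way to spot labels that were requested but not present in the source data.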

A Series behaves much like an ndarray and supports much of NumPy's syntax

>>s = pd.Series(np.random.randn(5),index=['a', 'b', 'c', 'd', 'e'])

>>s

a   -1.329486

b    0.396057

c   -1.156737

d   -1.152107

e   -0.787661

dtype: float64

# Indexing by position (s[0] also works in older pandas, but is deprecated in recent versions)

>>s.iloc[0]

-1.3294860342555725

# Slicing

>>s[:3]

a   -1.329486

b    0.396057

c   -1.156737

dtype: float64

# Boolean indexing: keep only the values above the median

>>s[s > s.median()]

b    0.396057

e   -0.787661

dtype: float64

# Select several positions, in the given order

>>s.iloc[[4,3,1]]

e   -0.787661

d   -1.152107

b    0.396057

dtype: float64

>>np.exp(s)

a    0.264613

b    1.485954

c    0.314511

d    0.315970

e    0.454908

dtype: float64
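Because integer keys on a label-indexed Series can be ambiguous, it may help to spell out whether a lookup is positional or label-based. The sketch below uses fixed values (an assumption, so the results are reproducible) and the standard .iloc and .loc accessors.

```python
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=["a", "b", "c", "d", "e"])

# Positional access: .iloc counts from 0, regardless of the labels
first = s.iloc[0]
picked = s.iloc[[4, 3, 1]]

# Label-based access: .loc uses the index labels
by_label = s.loc[["e", "d", "b"]]

# Boolean mask: keeps the entries where the condition is True
above_median = s[s > s.median()]

print(first)
print(picked)
print(above_median)
```

Here picked and by_label select the same three entries, once by position and once by label.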

A Series also behaves like a dict: values can be read and set by index label

>>s['a']

-1.3294860342555725

>>s['e']=12

>>s

a    -1.329486

b     0.396057

c    -1.156737

d    -1.152107

e    12.000000

dtype: float64

>>'e' in s

True

>>s.get('e')

12.0

>>s+s

a    -2.658972

b     0.792115

c    -2.313474

d    -2.304214

e    24.000000

dtype: float64

>>s*2

a    -2.658972

b     0.792115

c    -2.313474

d    -2.304214

e    24.000000

dtype: float64

# Operations align automatically on index labels

# s[1:] has no 'a' and s[:-1] has no 'e', so both labels give NaN

>>s[1:] + s[:-1]

a         NaN

b    0.792115

c   -2.313474

d   -2.304214

e         NaN

dtype: float64
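If NaN is not the desired result for unmatched labels, Series.add accepts a fill_value, and dropna removes the unmatched entries afterwards. A minimal sketch with fixed values (the data and variable names are assumptions for illustration):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"])

# Plain addition: labels present on only one side give NaN
aligned = s.iloc[1:] + s.iloc[:-1]

# add() with fill_value treats a label missing on one side as 0
filled = s.iloc[1:].add(s.iloc[:-1], fill_value=0)

# Alternatively, keep only the labels that matched on both sides
matched_only = aligned.dropna()

print(aligned)
print(filled)
print(matched_only)
```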

The name attribute of a Series; rename creates a new object

# Note the name attribute

>>s = pd.Series(np.random.randn(5),name='sth')

>>s

0    1.338578

1    2.074678

2   -0.462777

3    0.518763

4   -0.372692

Name: sth, dtype: float64

# Use the rename method

>>s2 = s.rename('dif')

>>s2

0    1.338578

1    2.074678

2   -0.462777

3    0.518763

4   -0.372692

Name: dif, dtype: float64

>>id(s)

2669465319632

>>id(s2)

2669465320416

# s and s2 are different objects: their values are equal, but their ids (addresses) differ
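The same observation can be checked without comparing raw id values. This sketch assumes fixed data and uses the is operator, which compares object identity directly.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], name="sth")
s2 = s.rename("dif")

# rename returns a new Series object; the original keeps its name
same_object = s2 is s
same_values = s.tolist() == s2.tolist()
names = (s.name, s2.name)

print(same_object, same_values, names)
```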

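DataFrame: a two-dimensional labeled array, similar to a SQL table; columns can hold different data types

index holds the row labels, columns holds the column labels

# Create a DataFrame from a dict of Series (or a dict of dicts)

>>d = {'one': pd.Series([1.,2.,3.], index=['a','b','c']),

     'two': pd.Series([1.,2.,3.,4.], index=['a','b','c','d'])}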

>>df = pd.DataFrame(d)

>>df

   one  two

a  1.0  1.0

b  2.0  2.0

c  3.0  3.0

d  NaN  4.0

# Select and order rows with the index argument

>>pd.DataFrame(d, index=['d','b','a'])

   one  two

d  NaN  4.0

b  2.0  2.0

a  1.0  1.0

>>df.index

Index(['a', 'b', 'c', 'd'], dtype='object')

>>df.columns

Index(['one', 'two'], dtype='object')

# Create from a dict of ndarrays/lists

>>d = {'one':[1.,2.,3.,4.],'two':[4.,3.,2.,1.]}

>>pd.DataFrame(d)

   one  two

0  1.0  4.0

1  2.0  3.0

2  3.0  2.0

3  4.0  1.0

# Specify the index

>>pd.DataFrame(d,index=['a','b','c','d'])

   one  two

a  1.0  4.0

b  2.0  3.0

c  3.0  2.0

d  4.0  1.0
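The examples above only use float columns. To illustrate that the columns of one DataFrame can have different data types, here is a small sketch; the column names and values are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["x", "y", "z"],      # text column
        "count": [1, 2, 3],           # integer column
        "score": [0.5, 1.5, 2.5],     # float column
    },
    index=["a", "b", "c"],
)

# dtypes reports the data type of each column
print(df.dtypes)
print(df.shape)
```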

DataFrame operations

Selecting, adding, and deleting columns

>>df['one']

a    1.0

b    2.0

c    3.0

d    NaN

Name: one, dtype: float64

# Add the three and flag columns; new columns are always appended at the end

>>df['three'] = df['one'] * df['two']

>>df['flag']=df['one']>2

>>df

   one  two  three   flag

a  1.0  1.0    1.0  False

b  2.0  2.0    4.0  False

c  3.0  3.0    9.0   True

d  NaN  4.0    NaN  False

# Delete columns (del removes a column; pop removes it and returns it)

>>del df['two']

>>three = df.pop('three')

>>three

a    1.0

b    4.0

c    9.0

d    NaN

Name: three, dtype: float64

>>df

   one   flag

a  1.0  False

b  2.0  False

c  3.0   True

d  NaN  False

# Assigning a sliced column: labels missing from the slice are filled with NaN

>>df['one_trunc'] = df['one'][:2]

>>df

   one   flag  one_trunc

a  1.0  False        1.0

b  2.0  False        2.0

c  3.0   True        NaN

d  NaN  False        NaN

>>df['foo'] = 'bar'

>>df

   one   flag  one_trunc  foo

a  1.0  False        1.0  bar

b  2.0  False        2.0  bar

c  3.0   True        NaN  bar

d  NaN  False        NaN  bar

# insert places a new column at a given position

# Insert at position 1, i.e. right after the first column

>>df.insert(1,'ba',df['one'])

>>df

   one   ba   flag  one_trunc  foo

a  1.0  1.0  False        1.0  bar

b  2.0  2.0  False        2.0  bar

c  3.0  3.0   True        NaN  bar

d  NaN  NaN  False        NaN  bar
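The column operations in this part can be replayed on a small frame to see where each column ends up. This is a sketch with made-up data; copy_of_one is a hypothetical column name.

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]},
    index=["a", "b", "c"],
)

# Plain assignment appends the new column at the end
df["three"] = df["one"] * df["two"]

# insert puts the new column at position 1 (after the first column)
df.insert(1, "copy_of_one", df["one"])

# pop removes the column from df and returns it as a Series
popped = df.pop("two")

columns_now = list(df.columns)
print(columns_now)
print(popped)
```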

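Indexing and selecting rows

Operation                          Syntax          Result

Select a column                    df[col]         Series

Select a row by label              df.loc[label]   Series

Select a row by integer position   df.iloc[loc]    Series

Slice rows                         df[5:10]        DataFrame

Select rows by boolean vector      df[bool_vec]    DataFrame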

>>d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),

     'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

>>df = pd.DataFrame(d)

>>df

   one  two

a  1.0  1.0

b  2.0  2.0

c  3.0  3.0

d  NaN  4.0

# Select a row by label

>>df.loc['b']

one    2.0

two    2.0

Name: b, dtype: float64

>>type(df.loc['b'])

pandas.core.series.Series

# Select a row by integer position

>>df.iloc[2]

one    3.0

two    3.0

Name: c, dtype: float64

# Slice rows

>>df[1:3]

   one  two

b  2.0  2.0

c  3.0  3.0

>>type(df[1:3])

pandas.core.frame.DataFrame

Selecting columns (attribute access df.one gives the same result as df['one'])

>>df.one

a    1.0

b    2.0

c    3.0

d    NaN

Name: one, dtype: float64

>>df['one']

a    1.0

b    2.0

c    3.0

d    NaN

Name: one, dtype: float64
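The selection forms listed above return different types, which is easy to verify on a small frame. The sketch below rebuilds the same one/two frame with literal values and adds a boolean-vector selection, which the examples above list but do not demonstrate.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0, np.nan], "two": [1.0, 2.0, 3.0, 4.0]},
    index=["a", "b", "c", "d"],
)

row_by_label = df.loc["b"]     # one row by label, returned as a Series
row_by_pos = df.iloc[2]        # one row by integer position, returned as a Series
row_slice = df.iloc[1:3]       # positional row slice, returned as a DataFrame
masked = df[df["two"] > 2]     # rows where the boolean vector is True

print(type(row_by_label).__name__, type(row_slice).__name__)
print(masked)
```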

Data alignment and arithmetic

Alignment: column and row labels are aligned automatically

>>da = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])

>>db = pd.DataFrame(np.random.randn(7, 3), columns=['A', 'B', 'C'])

>>da + db

          A         B         C   D

0 -0.920370 -0.529455 -2.386419 NaN

1 -1.277148  1.292130  1.196099 NaN

2  1.182199  0.454546  0.381586 NaN

3  1.100170 -1.830894  1.105932 NaN

4  0.507649  1.291516 -2.084368 NaN

5 -1.198811 -2.180978  0.342185 NaN

6  0.667211  2.141364  0.044136 NaN

7       NaN       NaN       NaN NaN

8       NaN       NaN       NaN NaN

9       NaN       NaN       NaN NaN

# NumPy functions can be applied directly

>>np.exp(da)

>>np.asarray(da)
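With random data it is hard to see exactly which cells become NaN. The following sketch uses frames filled with ones (an assumption made purely for readability) and also shows DataFrame.add with fill_value as one way to avoid NaN for labels present on only one side.

```python
import numpy as np
import pandas as pd

da = pd.DataFrame(np.ones((4, 3)), columns=["A", "B", "C"])
db = pd.DataFrame(np.ones((2, 2)), columns=["A", "B"])

# Rows 2-3 and column C exist only in da, so those cells become NaN
total = da + db

# fill_value=0 substitutes 0 for a cell missing in one of the two frames
filled = da.add(db, fill_value=0)

# The result converts to a plain ndarray like any other DataFrame
as_array = total.to_numpy()

print(total)
print(filled)
```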

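The 3-dimensional data type Panel has been deprecated since version 0.20.0 and was removed in 0.25.0

For multi-dimensional data, the separate xarray package is the recommended replacement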
