Pandas选择与索引
Series和DataFrame对象与Numpy数组和标准Python字典的数据索引和选择模式一样。
字典形式选择和索引
Series
In [1]: import pandas as pd
In [2]: data = pd.Series([0.25,0.5,0.75,1.0],index=['a','b','c','d'])
In [3]: data
Out[3]:
a 0.25
b 0.50
c 0.75
d 1.00
dtype: float64
#用判断字典键的方法,判断某字段是否包含在Series的Index对象内
In [4]: 'a' in data
Out[4]: True
#用获取字典键的方法,获取Index对象
In [6]: data.keys()
Out[6]: Index(['a', 'b', 'c', 'd'], dtype='object')
#用获取字典键值对象的方法,获取Index/value映射对象
In [9]: list(data.items())
Out[9]: [('a', 0.25), ('b', 0.5), ('c', 0.75), ('d', 1.0)]
#用添加字典键值对象的方法,添加Series数据
In [10]: data['e'] = 2
In [11]: data
Out[11]:
a 0.25
b 0.50
c 0.75
d 1.00
e 2.00
dtype: float64
DataFrame:相当于若干个Series对象构成的字典
In [13]: area = pd.Series({'California':43231,'Texas':53432,'New York':67632,
'Florida':89734,'Illinois':63321})
In [14]: pop = pd.Series({'California':453231,'Texas':533432,'New York':676732,
'Florida':897534,'Illinois':633241})
#以Series对象为字典对象进行创建DataFrame对象
In [15]: data = pd.DataFrame({'area':area,'pop':pop})
In [16]: data
Out[16]:
area pop
California 43231 453231
Texas 53432 533432
New York 67632 676732
Florida 89734 897534
Illinois 63321 633241
#获取某列(相当于获取字典某个键值)
In [17]: data['area']
Out[17]:
California 43231
Texas 53432
New York 67632
Florida 89734
Illinois 63321
Name: area, dtype: int64
#以属性形式获取某列
In [18]: data.area
Out[18]:
California 43231
Texas 53432
New York 67632
Florida 89734
Illinois 63321
Name: area, dtype: int64
#以字典形式赋值一个新列
In [19]: data['density'] = data['pop']/data['area']
In [20]: data
Out[20]:
area pop density
California 43231 453231 10.483935
Texas 53432 533432 9.983381
New York 67632 676732 10.006092
Florida 89734 897534 10.002162
Illinois 63321 633241 10.000490
数组形式选择和索引
Series相当于Numpy的一维数组
In [48]: data
Out[48]:
a 0.25
b 0.50
c 0.75
d 1.00
dtype: float64
#以显式索引进行切片,包含最后一个索引对应的值
In [49]: data['a':'c']
Out[49]:
a 0.25
b 0.50
c 0.75
dtype: float64
#以隐式索引进行切片,不包含最后一个索引对应的值
In [50]: data[0:2]
Out[50]:
a 0.25
b 0.50
dtype: float64
#利用布尔索引进行取值
In [51]: data[(data > 0.3) & (data < 0.8)]
Out[51]:
b 0.50
c 0.75
dtype: float64
#利用花哨索引进行取值
In [52]: data[['a','d']]
Out[52]:
a 0.25
d 1.00
dtype: float64
DataFrame相当于Numpy的二维数组
In [22]: data
Out[22]:
area pop density
California 43231 453231 10.483935
Texas 53432 533432 9.983381
New York 67632 676732 10.006092
Florida 89734 897534 10.002162
Illinois 63321 633241 10.000490
#获取DataFrame的数组值,返回的是Numpy二维数组
In [23]: data.values
Out[23]:
array([[4.32310000e+04, 4.53231000e+05, 1.04839351e+01],
[5.34320000e+04, 5.33432000e+05, 9.98338075e+00],
[6.76320000e+04, 6.76732000e+05, 1.00060918e+01],
[8.97340000e+04, 8.97534000e+05, 1.00021619e+01],
[6.33210000e+04, 6.33241000e+05, 1.00004896e+01]])
#对DataFrame的数组进行装置,相当于Numpy二维数组的转置
In [24]: data.T
Out[24]:
California Texas New York Florida Illinois
area 43231.000000 53432.000000 67632.000000 89734.000000 63321.00000
pop 453231.000000 533432.000000 676732.000000 897534.000000 633241.00000
density 10.483935 9.983381 10.006092 10.002162 10.00049
#获得某一行的数组(方法同Numpy数组)
In [25]: data.values[0]
Out[25]: array([4.32310000e+04, 4.53231000e+05, 1.04839351e+01])
#布尔索引
In [27]: data[data > 10]
Out[27]:
area pop density
California 43231 453231 10.483935
Texas 53432 533432 NaN
New York 67632 676732 10.006092
Florida 89734 897534 10.002162
Illinois 63321 633241 10.000490
In [28]: data[data['density']>10]
Out[28]:
area pop density
California 43231 453231 10.483935
New York 67632 676732 10.006092
Florida 89734 897534 10.002162
Illinois 63321 633241 10.000490
索引选择器(iloc 和 loc)
为了避免在Index和Columns为整数时,造成切片和取值时的混乱,Panda提供了一些索引器(indexer)属性来进行取值。
iloc:隐私索引(顺序整数)
#Series
In [53]: data
Out[53]:
a 0.25
b 0.50
c 0.75
d 1.00
dtype: float64
In [54]: data.iloc[0:2]
Out[54]:
a 0.25
b 0.50
dtype: float64
In [55]: data.iloc[2]
Out[55]: 0.75
#DataFrame
In [22]: data
Out[22]:
area pop density
California 43231 453231 10.483935
Texas 53432 533432 9.983381
New York 67632 676732 10.006092
Florida 89734 897534 10.002162
Illinois 63321 633241 10.000490
In [31]: data.iloc[:3,:2]
Out[31]:
area pop
California 43231 453231
Texas 53432 533432
New York 67632 676732
In [32]: data.iloc[0,2] = 100
In [33]: data
Out[33]:
area pop density
California 43231 453231 100.000000
Texas 53432 533432 9.983381
New York 67632 676732 10.006092
Florida 89734 897534 10.002162
Illinois 63321 633241 10.000490
In [34]: data.iloc[3,2]
Out[34]: 10.002161945305012
In [35]: data.iloc[:3,2]
Out[35]:
California 100.000000
Texas 9.983381
New York 10.006092
Name: density, dtype: float64
In [36]: data.iloc[3,:2]
Out[36]:
area 89734.0
pop 897534.0
Name: Florida, dtype: float64
#切出的是DataFrame结构
In [37]: data.iloc[2:3,:2]
Out[37]:
area pop
New York 67632 676732
loc:显示索引(自定义)
#Series
In [53]: data
Out[53]:
a 0.25
b 0.50
c 0.75
d 1.00
dtype: float64
In [56]: data.loc['b']
Out[56]: 0.5
In [57]: data.loc['a':'b']
Out[57]:
a 0.25
b 0.50
dtype: float64
In [58]: data.loc[['a','c']]
Out[58]:
a 0.25
c 0.75
dtype: float64
#DataFrame
In [38]: data
Out[38]:
area pop density
California 43231 453231 100.000000
Texas 53432 533432 9.983381
New York 67632 676732 10.006092
Florida 89734 897534 10.002162
Illinois 63321 633241 10.000490
#显性索引切片时与整数切片不同,最后一个切片位置也要包含在内
In [39]: data['California':'Texas']
Out[39]:
area pop density
California 43231 453231 100.000000
Texas 53432 533432 9.983381
In [40]: data.loc['California':'Texas']
Out[40]:
area pop density
California 43231 453231 100.000000
Texas 53432 533432 9.983381
In [41]: data.loc[data['density']>10,['area','density']]
Out[41]:
area density
California 43231 100.000000
New York 67632 10.006092
Florida 89734 10.002162
Illinois 63321 10.000490
In [44]: data.loc[:,['pop']]
Out[44]:
pop
California 453231
Texas 533432
New York 676732
Florida 897534
Illinois 633241
多级索引应用(Hierarchical Indexing)
为解决高维数组表达的需求而引入多级索引(Hierarchical indexing)。Pandas的MultiIndex类型提供了丰富的操作手法。
多级索引创建
#直接创建,index写成多级模式,MultiIndex在后台自动创建
In [3]: data = pd.DataFrame(np.random.rand(4,2),
index = [['a','a','b','b'],[1,2,1,2]],
columns=['data1','data2'])
In [4]: data
Out[4]:
data1 data2
a 1 0.154166 0.773928
2 0.859236 0.777436
b 1 0.468589 0.010060
2 0.849230 0.929585
#使用元组字典进行创建,自动把元组元素分割转换成MultiIndex
In [5]: data = {('California',2000):3334323,('California',2010):33444332,
('Texas',2000):5555999,('Texas',2010):7778839}
In [6]: pd.Series(data)
Out[6]:
California 2000 3334323
2010 33444332
Texas 2000 5555999
2010 7778839
dtype: int64
#使用列表进行创建
In [59]: pd.MultiIndex.from_arrays([['a','a','b','b'],[1,2,1,2]])
Out[59]:
MultiIndex([('a', 1),
('a', 2),
('b', 1),
('b', 2)],)
#使用元组进行创建
In [60]: pd.MultiIndex.from_tuples([('a',1),('a',2),('b',1),('b',2)])
Out[60]:
MultiIndex([('a', 1),
('a', 2),
('b', 1),
('b', 2)],
)
#使用笛卡尔积进行创建
In [62]: pd.MultiIndex.from_product([['a','b'],[1,2]])
Out[62]:
MultiIndex([('a', 1),
('a', 2),
('b', 1),
('b', 2)],
)
#直接利用levels和labels(codes)进行创建
In [63]: pd.MultiIndex(levels=[['a','b'],[1,2]],labels=[[0,0,1,1],[0,1,0,1]])
D:\Anaconda\install\Scripts\ipython:1: FutureWarning: the 'labels' keyword is deprecated, use 'codes' instead
Out[63]:
MultiIndex([('a', 1),
('a', 2),
('b', 1),
('b', 2)],
)
In [64]: pd.MultiIndex(levels=[['a','b'],[1,2]],codes=[[0,0,1,1],[0,1,0,1]])
Out[64]:
MultiIndex([('a', 1),
('a', 2),
('b', 1),
('b', 2)],
)
多级索引取值与切片
Series
In [1]: import pandas as pd
In [2]: pop_index = pd.MultiIndex.from_product([['California','Texas','New York'],
['2000','2010']])
In [3]: data = [23343,33453,78893,34569,56493,89392]
In [4]: pop = pd.Series(data, index=pop_index, name='population')
In [5]: pop
Out[5]:
California 2000 23343
2010 33453
Texas 2000 78893
2010 34569
New York 2000 56493
2010 89392
Name: population, dtype: int64
#添加索引名称
In [6]: pop.index.names = ['state','year']
In [7]: pop
Out[7]:
state year
California 2000 23343
2010 33453
Texas 2000 78893
2010 34569
New York 2000 56493
2010 89392
Name: population, dtype: int64
In [23]: pop
Out[23]:
state year
California 2000 23343
2010 33453
Texas 2000 78893
2010 34569
New York 2000 56493
2010 89392
Name: population, dtype: int64
#利用显性索引进行局部切片时,需要MultiIndex按顺序排列的(利用sort_index函数实现)
In [25]: pop
Out[25]:
state year
California 2000 23343
2010 33453
Texas 2000 78893
2010 34569
New York 2000 56493
2010 89392
Name: population, dtype: int64
In [26]: pop = pop.sort_index()
In [27]: pop
Out[27]:
state year
California 2000 23343
2010 33453
New York 2000 56493
2010 89392
Texas 2000 78893
2010 34569
Name: population, dtype: int64
#显性索引局部切片
In [28]: pop.loc['California':'New York']
Out[28]:
state year
California 2000 23343
2010 33453
New York 2000 56493
2010 89392
Name: population, dtype: int64
In [29]: pop.loc['California':'New York','2000']
Out[29]:
state year
California 2000 23343
New York 2000 56493
Name: population, dtype: int64
#显性+花哨索引取值
In [30]: pop.loc[['California','Texas']]
Out[30]:
state year
California 2000 23343
2010 33453
Texas 2000 78893
2010 34569
Name: population, dtype: int64
#隐性索引局部切片
In [32]: pop.iloc[:2]
Out[32]:
state year
California 2000 23343
2010 33453
Name: population, dtype: int64
#布尔索引取值
In [33]: pop[pop > 30000]
Out[33]:
state year
California 2010 33453
New York 2000 56493
2010 89392
Texas 2000 78893
2010 34569
Name: population, dtype: int64
DataFrame
In [1]: import pandas as pd
In [4]: import numpy as np
#行标签
In [2]: index = pd.MultiIndex.from_product([[2019,2020],[1,2,3,4]],
names=['year','quarter'])
#列标签
In [3]: columns = pd.MultiIndex.from_product([['Bill','Andy','Coco'],
['Temp','HR']],names=['name','type'])
#获取模拟数据
In [12]: data = np.round(np.random.randn(8,6),1)
In [15]: data[:,::2] *10
Out[15]:
array([[ 8., -5., -10.],
[ 4., 22., 4.],
[ 14., -19., -18.],
[ 8., -1., -9.],
[ 2., -5., 4.],
[ 17., 7., 4.],
[ 0., 3., 2.],
[ 3., 11., -11.]])
In [17]: data += 37
In [18]: data
Out[18]:
array([[45. , 35.4, 32. , 38.1, 27. , 37.7],
[41. , 36.9, 59. , 35.4, 41. , 37.1],
[51. , 37.1, 18. , 37.5, 19. , 36.7],
[45. , 37.6, 36. , 37.6, 28. , 37.5],
[39. , 35.9, 32. , 37.2, 41. , 38. ],
[54. , 36.8, 44. , 38.7, 41. , 36.9],
[37. , 37. , 40. , 35.1, 39. , 37.9],
[40. , 37.5, 48. , 37.5, 26. , 35.7]])
#创建带行列标签的DataFrame
In [19]: health_data = pd.DataFrame(data, index=index, columns=columns)
In [20]: health_data
Out[20]:
name Bill Andy Coco
type Temp HR Temp HR Temp HR
year quarter
2019 1 45.0 35.4 32.0 38.1 27.0 37.7
2 41.0 36.9 59.0 35.4 41.0 37.1
3 51.0 37.1 18.0 37.5 19.0 36.7
4 45.0 37.6 36.0 37.6 28.0 37.5
2020 1 39.0 35.9 32.0 37.2 41.0 38.0
2 54.0 36.8 44.0 38.7 41.0 36.9
3 37.0 37.0 40.0 35.1 39.0 37.9
4 40.0 37.5 48.0 37.5 26.0 35.7
#直接根据标签名称索引获取值
In [21]: health_data['Bill']
Out[21]:
type Temp HR
year quarter
2019 1 45.0 35.4
2 41.0 36.9
3 51.0 37.1
4 45.0 37.6
2020 1 39.0 35.9
2 54.0 36.8
3 37.0 37.0
4 40.0 37.5
In [22]: health_data['Bill', 'HR']
Out[22]:
year quarter
2019 1 35.4
2 36.9
3 37.1
4 37.6
2020 1 35.9
2 36.8
3 37.0
4 37.5
Name: (Bill, HR), dtype: float64
#隐性索引获取值
In [24]: health_data.iloc[:2,:2]
Out[24]:
name Bill
type Temp HR
year quarter
2019 1 45.0 35.4
2 41.0 36.9
#显性索引获取值
In [23]: health_data.loc[:,('Bill','HR')]
Out[23]:
year quarter
2019 1 35.4
2 36.9
3 37.1
4 37.6
2020 1 35.9
2 36.8
3 37.0
4 37.5
Name: (Bill, HR), dtype: float64
#使用IndexSlice对象对显性索引进行分割
In [25]: idx = pd.IndexSlice
In [28]: health_data.loc[idx[:,1],idx[:,'HR']]
Out[28]:
name Bill Andy Coco
type HR HR HR
year quarter
2019 1 35.4 38.1 37.7
2020 1 35.9 37.2 38.0
In [29]: health_data.loc[idx[:,2],idx[:,'HR']]
Out[29]:
name Bill Andy Coco
type HR HR HR
year quarter
2019 2 36.9 35.4 37.1
2020 2 36.8 38.7 36.9
多级索引排序和转换
In [18]: frame = pd.DataFrame(np.arange(12).reshape((4,3)),
...: index=[list('aabb'),list('1212')],
...: columns = [['Ohio','Ohio','Colorado'],
...: ['Green','Red','Green']])
In [19]: frame
Out[19]:
Ohio Colorado
Green Red Green
a 1 0 1 2
2 3 4 5
b 1 6 7 8
2 9 10 11
In [20]: frame.index.names = ['key1','key2']
In [21]: frame.columns.names = ['state','color']
In [22]: frame
Out[22]:
state Ohio Colorado
color Green Red Green
key1 key2
a 1 0 1 2
2 3 4 5
b 1 6 7 8
2 9 10 11
#交换Index
In [26]: frame.swaplevel('key1','key2')
Out[26]:
state Ohio Colorado
color Green Red Green
key2 key1
1 a 0 1 2
2 a 3 4 5
1 b 6 7 8
2 b 9 10 11
In [27]: frame
Out[27]:
state Ohio Colorado
color Green Red Green
key1 key2
a 1 0 1 2
2 3 4 5
b 1 6 7 8
2 9 10 11
In [28]: frame.sort_index(level=1)
Out[28]:
state Ohio Colorado
color Green Red Green
key1 key2
a 1 0 1 2
b 1 6 7 8
a 2 3 4 5
b 2 9 10 11
In [29]: frame.swaplevel(0,1).sort_index(level=0)
Out[29]:
state Ohio Colorado
color Green Red Green
key2 key1
1 a 0 1 2
b 6 7 8
2 a 3 4 5
b 9 10 11
多级索引行列转换和重置(stack、unstack、set_index、reset_index)
Series
In [6]: pop
Out[6]:
state year
California 2000 23343
2010 33453
Texas 2000 78893
2010 34569
New York 2000 56493
2010 89392
Name: population, dtype: int64
#通过level参数设置转换索引的层级
In [7]: pop.unstack(level=0)
Out[7]:
state California New York Texas
year
2000 23343 56493 78893
2010 33453 89392 34569
In [8]: pop.unstack(level=1)
Out[8]:
year 2000 2010
state
California 23343 33453
New York 56493 89392
Texas 78893 34569
In [9]: pop.unstack(level='year')
Out[9]:
year 2000 2010
state
California 23343 33453
New York 56493 89392
Texas 78893 34569
In [10]: pop.unstack(level='year').stack(level='year')
Out[10]:
state year
California 2000 23343
2010 33453
New York 2000 56493
2010 89392
Texas 2000 78893
2010 34569
dtype: int64
#reset_index()把层级索引变成列标签
In [11]: pop_flat = pop.reset_index(name='population')
In [12]: pop_flat
Out[12]:
state year population
0 California 2000 23343
1 California 2010 33453
2 Texas 2000 78893
3 Texas 2010 34569
4 New York 2000 56493
5 New York 2010 89392
#set_index()把列标签设置成层级索引
In [13]: pop_flat.set_index(['state','year'])
Out[13]:
population
state year
California 2000 23343
2010 33453
Texas 2000 78893
2010 34569
New York 2000 56493
2010 89392
DataFrame
In [30]: health_data
Out[30]:
name Bill Andy Coco
type Temp HR Temp HR Temp HR
year quarter
2019 1 45.0 35.4 32.0 38.1 27.0 37.7
2 41.0 36.9 59.0 35.4 41.0 37.1
3 51.0 37.1 18.0 37.5 19.0 36.7
4 45.0 37.6 36.0 37.6 28.0 37.5
2020 1 39.0 35.9 32.0 37.2 41.0 38.0
2 54.0 36.8 44.0 38.7 41.0 36.9
3 37.0 37.0 40.0 35.1 39.0 37.9
4 40.0 37.5 48.0 37.5 26.0 35.7
#从列堆积到行:stack(level=*),*表示索引的序号,用隐性索引则自上而下从0开始索引,用显性索引可直接赋值索引名称
#默认level=1
In [31]: health_data.stack()
Out[31]:
name Andy Bill Coco
year quarter type
2019 1 HR 38.1 35.4 37.7
Temp 32.0 45.0 27.0
2 HR 35.4 36.9 37.1
Temp 59.0 41.0 41.0
3 HR 37.5 37.1 36.7
Temp 18.0 51.0 19.0
4 HR 37.6 37.6 37.5
Temp 36.0 45.0 28.0
2020 1 HR 37.2 35.9 38.0
Temp 32.0 39.0 41.0
2 HR 38.7 36.8 36.9
Temp 44.0 54.0 41.0
3 HR 35.1 37.0 37.9
Temp 40.0 37.0 39.0
4 HR 37.5 37.5 35.7
Temp 48.0 40.0 26.0
In [33]: health_data.stack(level=1)
Out[33]:
name Andy Bill Coco
year quarter type
2019 1 HR 38.1 35.4 37.7
Temp 32.0 45.0 27.0
2 HR 35.4 36.9 37.1
Temp 59.0 41.0 41.0
3 HR 37.5 37.1 36.7
Temp 18.0 51.0 19.0
4 HR 37.6 37.6 37.5
Temp 36.0 45.0 28.0
2020 1 HR 37.2 35.9 38.0
Temp 32.0 39.0 41.0
2 HR 38.7 36.8 36.9
Temp 44.0 54.0 41.0
3 HR 35.1 37.0 37.9
Temp 40.0 37.0 39.0
4 HR 37.5 37.5 35.7
Temp 48.0 40.0 26.0
In [32]: health_data.stack(level=0)
Out[32]:
type HR Temp
year quarter name
2019 1 Andy 38.1 32.0
Bill 35.4 45.0
Coco 37.7 27.0
2 Andy 35.4 59.0
Bill 36.9 41.0
Coco 37.1 41.0
3 Andy 37.5 18.0
Bill 37.1 51.0
Coco 36.7 19.0
4 Andy 37.6 36.0
Bill 37.6 45.0
Coco 37.5 28.0
2020 1 Andy 37.2 32.0
Bill 35.9 39.0
Coco 38.0 41.0
2 Andy 38.7 44.0
Bill 36.8 54.0
Coco 36.9 41.0
3 Andy 35.1 40.0
Bill 37.0 37.0
Coco 37.9 39.0
4 Andy 37.5 48.0
Bill 37.5 40.0
Coco 35.7 26.0
In [38]: health_data.stack(level='name')
Out[38]:
type HR Temp
year quarter name
2019 1 Andy 38.1 32.0
Bill 35.4 45.0
Coco 37.7 27.0
2 Andy 35.4 59.0
Bill 36.9 41.0
Coco 37.1 41.0
3 Andy 37.5 18.0
Bill 37.1 51.0
Coco 36.7 19.0
4 Andy 37.6 36.0
Bill 37.6 45.0
Coco 37.5 28.0
2020 1 Andy 37.2 32.0
Bill 35.9 39.0
Coco 38.0 41.0
2 Andy 38.7 44.0
Bill 36.8 54.0
Coco 36.9 41.0
3 Andy 35.1 40.0
Bill 37.0 37.0
Coco 37.9 39.0
4 Andy 37.5 48.0
Bill 37.5 40.0
Coco 35.7 26.0
#从行散开到列:unstack(level=*),*表示索引的序号,用隐性索引则自左而右从0开始索引,用显性索引可直接赋值索引名称
#默认level=1
In [35]: health_data.unstack(level=0)
Out[35]:
name Bill Andy Coco
type Temp HR Temp HR Temp HR
year 2019 2020 2019 2020 2019 2020 2019 2020 2019 2020 2019 2020
quarter
1 45.0 39.0 35.4 35.9 32.0 32.0 38.1 37.2 27.0 41.0 37.7 38.0
2 41.0 54.0 36.9 36.8 59.0 44.0 35.4 38.7 41.0 41.0 37.1 36.9
3 51.0 37.0 37.1 37.0 18.0 40.0 37.5 35.1 19.0 39.0 36.7 37.9
4 45.0 40.0 37.6 37.5 36.0 48.0 37.6 37.5 28.0 26.0 37.5 35.7
In [36]: health_data.unstack(level=1)
Out[36]:
name Bill Andy ... Coco
type Temp HR Temp ... HR Temp HR
quarter 1 2 3 4 1 2 3 4 1 ... 4 1 2 3 4 1 2 3 4
year ...
2019 45.0 41.0 51.0 45.0 35.4 36.9 37.1 37.6 32.0 ... 37.6 27.0 41.0 19.0 28.0 37.7 37.1 36.7 37.5
2020 39.0 54.0 37.0 40.0 35.9 36.8 37.0 37.5 32.0 ... 37.5 41.0 41.0 39.0 26.0 38.0 36.9 37.9 35.7
[2 rows x 24 columns]
In [39]: health_data.unstack(level='year')
Out[39]:
name Bill Andy Coco
type Temp HR Temp HR Temp HR
year 2019 2020 2019 2020 2019 2020 2019 2020 2019 2020 2019 2020
quarter
1 45.0 39.0 35.4 35.9 32.0 32.0 38.1 37.2 27.0 41.0 37.7 38.0
2 41.0 54.0 36.9 36.8 59.0 44.0 35.4 38.7 41.0 41.0 37.1 36.9
3 51.0 37.0 37.1 37.0 18.0 40.0 37.5 35.1 19.0 39.0 36.7 37.9
4 45.0 40.0 37.6 37.5 36.0 48.0 37.6 37.5 28.0 26.0 37.5 35.7
多级索引数据统计方法
In [23]: health_data
Out[23]:
name Bill Andy Coco
type Temp HR Temp HR Temp HR
year quarter
2019 1 37.1 35.3 37.2 37.5 36.2 37.6
2 36.9 37.0 37.8 35.9 37.5 36.6
3 36.8 34.7 35.4 37.5 37.0 38.2
4 36.8 37.6 36.2 37.4 36.4 36.7
2020 1 37.5 34.8 36.8 36.5 36.3 37.4
2 36.2 36.7 37.3 37.2 36.3 36.4
3 37.7 38.1 36.9 37.7 36.6 36.1
4 36.5 36.3 37.6 36.3 35.5 37.3
#level参数设置累计的索引层级
In [24]: data_mean = health_data.mean(level='year')
In [25]: data_mean
Out[25]:
name Bill Andy Coco
type Temp HR Temp HR Temp HR
year
2019 36.900 36.150 36.65 37.075 36.775 37.275
2020 36.975 36.475 37.15 36.925 36.175 36.800
#axis参数设置累计的轴
In [26]: data_mean.mean(axis=1,level='type')
Out[26]:
type Temp HR
year
2019 36.775000 36.833333
2020 36.766667 36.733333
Pandas选择与索引的更多相关文章
- 如何选择普通索引和唯一索引《死磕MySQL系列 五》
系列文章 一.原来一条select语句在MySQL是这样执行的<死磕MySQL系列 一> 二.一生挚友redo log.binlog<死磕MySQL系列 二> 三.MySQL强 ...
- pandas选择数据-【老鱼学pandas】
选择列 根据列名来选择某列的数据 import pandas as pd import numpy as np dates = pd.date_range("2017-01-08" ...
- pandas选择单元格,选择行列
首先创建示例df: df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('5678')) d ...
- pandas 之 多层索引
In many applications, data may be spread across a number of files or datasets or be arranged in a fo ...
- numpy和pandas的基础索引切片
Numpy的索引切片 索引 In [72]: arr = np.array([[[1,1,1],[2,2,2]],[[3,3,3],[4,4,4]]]) In [73]: arr Out[73]: a ...
- 利用Python进行数据分析(11) pandas基础: 层次化索引
层次化索引 层次化索引指你能在一个数组上拥有多个索引,例如: 有点像Excel里的合并单元格对么? 根据索引选择数据子集 以外层索引的方式选择数据子集: 以内层索引的方式选择数据: 多重索引S ...
- 深入学习jQuery选择器系列第三篇——过滤选择器之索引选择器
× 目录 [1]通用形式 [2]首尾索引 [3]奇偶索引[4]范围索引 前面的话 上一篇介绍了过滤选择器中的子元素选择器部分,本文开始介绍极易与之混淆的索引选择器 通用形式 $(':eq(index) ...
- 但从谈论性能点SQL Server选择聚集索引键
简单介绍 在SQL Server中,数据是按页进行存放的.而为表加上聚集索引后,SQL Server对于数据的查找就是依照聚集索引的列作为keyword进行了. 因此对于聚集索引的选择对性能的影响就变 ...
- pandas 选择列或者添加列生成新的DataFrame
选择某些列 import pandas as pd # 从Excel中读取数据,生成DataFrame数据 # 导入Excel路径和sheet name df = pd.read_excel(exce ...
- pandas基础用法——索引
# -*- coding: utf-8 -*- # Time : 2016/11/28 15:14 # Author : XiaoDeng # version : python3.5 # Softwa ...
随机推荐
- gin中间件&控制器
中间件 Gin 框架允许开发者在处理请求的过程中,加入用户自己的钩子(Hook)函数.这个钩子函数就叫中间件,中间件适合处理一些公共的业务逻辑,比如登录认证.权限校验.数据分页.记录日志.耗时统计等. ...
- 前端面试题(四)—— 事件委托(Event Delegation)
一.什么是事件委托 事件委托(Event Delegation)是一种常用的技术. 它利用事件冒泡的特性,在父元素上监听事件,而不是在子元素上直接添加事件监听器. 通过在父元素上捕获事件,然后根据事件 ...
- 【Azure 存储服务】Azure Storage Account 下的 Table 查询的性能调优
问题描述 Azure Storage Account 下的 Table 查询的性能调优? 问题解答 因为Azure Storage Table服务(表服务) 与常规的关系型数据库不一样(例如:MySQ ...
- 【Azure 应用服务】Azure SignalR 是否可以同时支持近十万人在线互动
什么是 Azure SignalR 服务? Azure SignalR Service 简化了通过 HTTP 向应用程序添加实时 Web 功能的过程. 这种实时功能允许服务将内容更新推送到连接的客户端 ...
- xml中xsd、xsi、xmlns的含义
XML是可扩展标记语言,它定义了按格式编码文件的一系列规则[3],编码的文件是机器可读和人可读的.XML文件对于机器可读是基于XSD(XML Schema Definition)[1]的.XSD是受W ...
- wsl使用记录
# wsl使用记录 安装 直接参考微软官方文档使用 WSL 在 Windows 上安装 Linux ubuntu可用发行版安装 https://ubuntu.com/wsl 访问 方式一:在资源管理器 ...
- C++B树的实现
B树的实现 今天我们就来实现以下B树,B树有什么特点那?我们来列举一下 每个非叶子节点中存放若干关键字数据,并且有若干指向儿子节点的指针.指针数目=关键字数目+1 根节点有最少1个,最多m-1个关键字 ...
- 安装swagger过程及遇到问题
参考:欢迎点击原文:https://www.cnblogs.com/liconglong/p/11477401.html(主要) https://blog.csdn.net/liangjiabao55 ...
- LinuxDNS分析从入门到放弃(记一次有趣的dns问题排查记录,ping 源码分析,getaddrinfo源码分析)
PS:要转载请注明出处,本人版权所有. PS: 这个只是基于<我自己>的理解, 如果和你的原则及想法相冲突,请谅解,勿喷. 环境说明 ubuntu 18.04 前言 我们这里有一块 ...
- Cesium之CustomShader
1. 引言 Cesium自1.87.1版本,开始支持3DTileset使用CustomShader: Added CustomShader class for styling Cesium3DTile ...