Python data types: pandas

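Definition of a DataFrame:

DataFrame is one of the two main data structures in pandas; the other is Series.

- A tabular data structure

- Holds an ordered set of columns

- Can roughly be seen as a collection of Series that share the same index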
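To make the "collection of Series sharing one index" idea concrete, here is a minimal sketch. The variable names (names, pays, pay_column) are only illustrative and do not come from the examples that follow.

```python
import pandas as pd

# Two Series that share the same index labels
names = pd.Series(['Wangdachui', 'Linling', 'Niuyun'], index=[1, 2, 3])
pays = pd.Series([4000, 5000, 6000], index=[1, 2, 3])

# Combining them under column names yields a DataFrame with that shared index
frame = pd.DataFrame({'name': names, 'pay': pays})

# Selecting a column gives back a Series that carries the same index
pay_column = frame['pay']
print(type(pay_column).__name__)    # Series
print(pay_column.index.tolist())    # [1, 2, 3]
```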

Creating a DataFrame:

Default creation, from a dict of lists:

>>> import pandas as pd

>>> data = {'name':['Wangdachui','Linling','Niuyun'],'pay':[4000,5000,6000]}

>>> frame = pd.DataFrame(data)

>>> frame

         name   pay

0  Wangdachui  4000

1     Linling  5000

2      Niuyun  6000

Creation with an explicit index:

>>> import numpy as np

>>> data = np.array([('Wangdachui',4000),('Linling',5000),('Niuyun',6000)])

>>> frame = pd.DataFrame(data,index = range(1,4),columns=['name','pay'])

>>> frame

         name   pay

1  Wangdachui  4000

2     Linling  5000

3      Niuyun  6000

>>> frame.index

RangeIndex(start=1, stop=4, step=1)

>>> frame.columns

Index(['name', 'pay'], dtype='object')

>>> frame.values

array([['Wangdachui', '4000'],

       ['Linling', '5000'],

       ['Niuyun', '6000']], dtype=object)
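The quoted values in frame.values above are strings, not numbers: a NumPy array has a single dtype, so mixing names and integers turns everything into strings. The sketch below contrasts that with passing the list of tuples straight to pd.DataFrame. The names rows, frame_str and frame_num are illustrative only.

```python
import numpy as np
import pandas as pd

rows = [('Wangdachui', 4000), ('Linling', 5000), ('Niuyun', 6000)]

# Route 1: through a NumPy array. The array has one dtype, so the
# integers are converted to strings together with the names.
as_array = np.array(rows)
frame_str = pd.DataFrame(as_array, index=range(1, 4), columns=['name', 'pay'])
print(isinstance(frame_str.iloc[0, 1], str))   # True

# Route 2: pass the list of tuples directly; each column keeps its own type
frame_num = pd.DataFrame(rows, index=range(1, 4), columns=['name', 'pay'])
print(frame_num['pay'].sum())                  # 15000
```

If the pay column is going to be used for arithmetic or comparisons, the second route is probably the safer choice.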

Basic DataFrame operations:

Selecting rows and columns of a DataFrame

>>> frame
         name   pay
1  Wangdachui  4000
2     Linling  5000
3      Niuyun  6000

>>> frame['name']

1    Wangdachui

2       Linling

3        Niuyun

Name: name, dtype: object

>>> frame.pay

1    4000

2    5000

3    6000

Name: pay, dtype: object

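Selecting specific rows or columns

>>> frame.iloc[:2,1]  # rows at positions 0 and 1, column at position 1

1    4000

2    5000

Name: pay, dtype: object

>>> frame.iloc[:1,0]  # row at position 0, column at position 0

1    Wangdachui

Name: name, dtype: object

>>> frame.iloc[2,1]  # row at position 2, column at position 1

'6000'

>>> frame.iloc[2]  # row at position 2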

name    Niuyun

pay       6000

Name: 3, dtype: object
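Because this frame's index starts at 1, positions and labels do not line up, which is easy to trip over. The following sketch rebuilds a similar frame (with numeric pay, an assumption made for simplicity) and compares position-based iloc with label-based loc.

```python
import pandas as pd

frame = pd.DataFrame(
    {'name': ['Wangdachui', 'Linling', 'Niuyun'], 'pay': [4000, 5000, 6000]},
    index=range(1, 4),
)

# iloc counts positions from 0, regardless of the index labels
by_position = frame.iloc[0, 0]       # first row, first column
# loc looks up labels; the first row of this frame is labelled 1
by_label = frame.loc[1, 'name']
print(by_position == by_label)       # True

# Slices differ too: iloc excludes the end position, loc includes the end label
print(len(frame.iloc[1:2]))          # 1
print(len(frame.loc[1:2]))           # 2
```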

Modifying and deleting data in a DataFrame

>>> frame['name']= 'admin'

>>> frame

    name   pay

1  admin  4000

2  admin  5000

3  admin  6000

>>> del frame['pay']

>>> frame

    name

1  admin

2  admin

3  admin

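Statistics on a DataFrame

Find the lowest pay and the people whose pay is at least 5000 (pay holds strings in this frame, so the comparison value is a string too)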

>>> frame

         name   pay

1  Wangdachui  4000

2     Linling  5000

3      Niuyun  6000

>>> frame.pay.min()

'4000'

>>> frame[frame.pay >= '5000']

      name   pay

2  Linling  5000

3   Niuyun  6000
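The string comparison above gives the expected answer only because every pay value has the same number of digits. A more robust approach is to convert the column to numbers first. The sketch below assumes a frame whose pay column holds strings and uses pd.to_numeric for the conversion; high_paid is an illustrative name.

```python
import pandas as pd

# pay stored as strings, as happens when the frame is built from a mixed NumPy array
frame = pd.DataFrame(
    {'name': ['Wangdachui', 'Linling', 'Niuyun'], 'pay': ['4000', '5000', '6000']},
    index=range(1, 4),
)

# Strings compare character by character, so '10000' sorts before '4000'
print('10000' < '4000')               # True

# Convert the column to numbers before doing statistics
frame['pay'] = pd.to_numeric(frame['pay'])
print(frame['pay'].min())             # 4000

high_paid = frame[frame['pay'] >= 5000]
print(high_paid['name'].tolist())     # ['Linling', 'Niuyun']
```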

Example:

Suppose a list holds a set of music data:

music_data = [("the rolling stones","Satisfaction"),("Beatles","Let It Be"),("Guns N'Roses","Don't Cry"),("Metallica","Nothing Else Matters")]. Using this data, create the following DataFrame:

               singer             song_name
1  the rolling stones          Satisfaction
2             Beatles             Let It Be
3        Guns N'Roses             Don't Cry
4           Metallica  Nothing Else Matters

One way to do it:

>>> import pandas as pd

>>> music_data = [("the rolling stones","Satisfaction"),("Beatles","Let It Be"),("Guns N'Roses","Don't Cry"),("Metallica","Nothing Else Matters")]

>>> music_table = pd.DataFrame(music_data)

>>> music_table

                    0                     1

0  the rolling stones          Satisfaction

1             Beatles             Let It Be

2        Guns N'Roses             Don't Cry

3           Metallica  Nothing Else Matters

>>> music_table.index = range(1,5)

>>> music_table.columns = ['singer','song_name']

>>> print(music_table)

               singer             song_name

1  the rolling stones          Satisfaction

2             Beatles             Let It Be

3        Guns N'Roses             Don't Cry

4           Metallica  Nothing Else Matters
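The same table can also be built in a single step by passing index and columns to the constructor, just as in the "creation with an explicit index" example earlier. This is only an alternative sketch, not a replacement for the step-by-step version above.

```python
import pandas as pd

music_data = [
    ("the rolling stones", "Satisfaction"),
    ("Beatles", "Let It Be"),
    ("Guns N'Roses", "Don't Cry"),
    ("Metallica", "Nothing Else Matters"),
]

# Set the row labels 1..4 and the column names at construction time
music_table = pd.DataFrame(
    music_data,
    index=range(1, 5),
    columns=['singer', 'song_name'],
)
print(music_table)
```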

More basic DataFrame operations

Start from the following DataFrame:

>>> frame

         name   pay

1  Wangdachui  4000

2     Linling  5000

3      Niuyun  6000

(1) Adding a column

A column can be added by direct assignment, for example adding a tax column to frame:

>>> frame['tax'] = [0.05,0.05,0.1]

>>> frame

         name   pay   tax

1  Wangdachui  4000  0.05

2     Linling  5000  0.05

3      Niuyun  6000  0.10

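(2) Adding a row

A row can be added by assigning to a new label through loc, or by combining frames with pd.concat(). Note that iloc indexes by position and cannot add rows, and that DataFrame.append() was removed in pandas 2.0. This example uses loc: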

>>> frame.loc[5] = {'name':'Liuxi','pay':5000,'tax':0.05}

>>> frame

         name   pay   tax

1  Wangdachui  4000  0.05

2     Linling  5000  0.05

3      Niuyun  6000  0.10

5       Liuxi  5000  0.05
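For comparison, here is a minimal sketch of the pd.concat() route to adding a row. It rebuilds the three-row frame with numeric pay (an assumption for the sake of a self-contained example); new_row and extended are illustrative names.

```python
import pandas as pd

frame = pd.DataFrame(
    {
        'name': ['Wangdachui', 'Linling', 'Niuyun'],
        'pay': [4000, 5000, 6000],
        'tax': [0.05, 0.05, 0.1],
    },
    index=range(1, 4),
)

# The new row is a one-row DataFrame whose index label is 5
new_row = pd.DataFrame({'name': ['Liuxi'], 'pay': [5000], 'tax': [0.05]}, index=[5])

# concat returns a new DataFrame; frame itself is left unchanged
extended = pd.concat([frame, new_row])
print(extended.index.tolist())   # [1, 2, 3, 5]
print(len(frame))                # 3
```

Unlike assignment through loc, this leaves the original frame as it was, which can be handy when the original is still needed.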

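(3) Deleting elements

Data can be deleted directly with 'del', but that modifies the original object in place, which is risky. The drop() method instead returns a new object with the given labels removed from the specified axis: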

>>> frame.drop(5)

         name   pay   tax

1  Wangdachui  4000  0.05

2     Linling  5000  0.05

3      Niuyun  6000  0.10

>>> frame.drop('tax',axis = 1)

         name   pay

1  Wangdachui  4000

2     Linling  5000

3      Niuyun  6000

5       Liuxi  5000

frame itself is unaffected at this point:

>>> frame

         name   pay   tax

1  Wangdachui  4000  0.05

2     Linling  5000  0.05

3      Niuyun  6000  0.10

5       Liuxi  5000  0.05
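Since drop() returns a new object, the result has to be assigned to a name if it is to be kept. The sketch below shows that, using the index= and columns= keywords as an alternative to axis, and contrasts it with the in-place behaviour of del. The name trimmed is illustrative.

```python
import pandas as pd

frame = pd.DataFrame(
    {
        'name': ['Wangdachui', 'Linling', 'Niuyun', 'Liuxi'],
        'pay': [4000, 5000, 6000, 5000],
        'tax': [0.05, 0.05, 0.1, 0.05],
    },
    index=[1, 2, 3, 5],
)

# drop() returns a new DataFrame, so assign the result to keep the change
trimmed = frame.drop(index=5).drop(columns='tax')
print(trimmed.shape)           # (3, 2)
print(frame.shape)             # (4, 3)

# del, by contrast, removes the column from frame itself
del frame['tax']
print(list(frame.columns))     # ['name', 'pay']
```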

(4) Modifying

Continuing with the frame above, set tax to 0.03 for every row:

>>> frame['tax'] = 0.03

>>> frame

         name   pay   tax

1  Wangdachui  4000  0.03

2     Linling  5000  0.03

3      Niuyun  6000  0.03

5       Liuxi  5000  0.03

A whole row can also be modified directly through loc:

>>> frame.loc[5] = ['Liuxi',9800,0.05]

>>> frame

         name   pay   tax

1  Wangdachui  4000  0.03

2     Linling  5000  0.03

3      Niuyun  6000  0.03

5       Liuxi  9800  0.05
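loc can also target a single cell, or only the rows that match a condition, instead of replacing a whole row. The following is a small sketch under the assumption that pay is numeric; the threshold and the new tax value are made up for illustration.

```python
import pandas as pd

frame = pd.DataFrame(
    {
        'name': ['Wangdachui', 'Linling', 'Niuyun', 'Liuxi'],
        'pay': [4000, 5000, 6000, 5000],
        'tax': [0.03, 0.03, 0.03, 0.03],
    },
    index=[1, 2, 3, 5],
)

# Change a single cell: row label 5, column 'pay'
frame.loc[5, 'pay'] = 9800
print(frame.loc[5, 'pay'])       # 9800

# Change one column only for the rows that match a condition
frame.loc[frame['pay'] > 5000, 'tax'] = 0.1
print(frame['tax'].tolist())     # [0.03, 0.03, 0.1, 0.1]
```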