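1: Reindexing

  The reindex method rearranges a Series according to a new index. For a DataFrame it can change the row index, the column index, or both at once. It returns a new object and leaves the original unchanged.

  1) For a Series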

 In [2]: seri = pd.Series([4.5,7.2,-5.3,3.6],index = ['d','b','a','c'])

 In [3]: seri
Out[3]:
d 4.5
b 7.2
a -5.3
c 3.6
dtype: float64

In [4]: seri1 = seri.reindex(['a','b','c','d','e'])

In [5]: seri1
Out[5]:
a -5.3
b 7.2
c 3.6
d 4.5
e NaN   # labels missing from the original index become NaN
dtype: float64

In [6]: seri.reindex(['a','b','c','d','e'], fill_value=0)
Out[6]:
a -5.3
b 7.2
c 3.6
d 4.5
e 0.0   # missing labels are filled with 0
dtype: float64

In [7]: seri
Out[7]:
d 4.5
b 7.2
a -5.3
c 3.6
dtype: float64

In [8]: seri_2 = pd.Series(['blue','purple','yellow'], index=[0,2,4])

In [9]: seri_2
Out[9]:
0 blue
2 purple
4 yellow
dtype: object

# fill methods available to reindex: ffill fills forward, bfill fills backward
In [10]: seri_2.reindex(range(6),method='ffill')
Out[10]:
0 blue
1 blue
2 purple
3 purple
4 yellow
5 yellow
dtype: object

In [11]: seri_2.reindex(range(6),method='bfill')
Out[11]:
0 blue
1 purple
2 purple
3 yellow
4 yellow
5 NaN
dtype: object

Reindexing a Series
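The Series session above can be condensed into a short script. This is a minimal sketch that reuses the same data; the variable names `with_nan`, `with_zero`, `colors`, `forward`, and `backward` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

seri = pd.Series([4.5, 7.2, -5.3, 3.6], index=["d", "b", "a", "c"])

# Labels that are missing from the original index become NaN ...
with_nan = seri.reindex(["a", "b", "c", "d", "e"])
# ... unless fill_value supplies a replacement.
with_zero = seri.reindex(["a", "b", "c", "d", "e"], fill_value=0)

colors = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])
# ffill copies the previous value forward, bfill copies the next value backward.
forward = colors.reindex(range(6), method="ffill")
backward = colors.reindex(range(6), method="bfill")

print(with_nan)
print(forward)
print(backward)
```

Because reindex returns a new object, `seri` still has its original order `d, b, a, c` afterwards.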

  2) For a DataFrame

    Its reindex accepts parameters such as method='ffill'/'bfill' and fill_value=... (the value used in place of NaN), among others.

 In [4]: dframe_1 = pd.DataFrame(np.arange(9).reshape((3,3)),index=['a','b','c'],
columns=['Ohio','Texas','Cal'])
In [5]: dframe_1
Out[5]:
Ohio Texas Cal
a 0 1 2
b 3 4 5
c 6 7 8

In [6]: dframe_2 = dframe_1.reindex(['a','b','c','d'])

In [7]: dframe_2
Out[7]:
Ohio Texas Cal
a 0 1 2
b 3 4 5
c 6 7 8
d NaN NaN NaN

In [16]: dframe_1.reindex(index=['a','b','c','d'],method='ffill',columns=['Ohio','Beijin','Cal'])
Out[16]:
Ohio Beijin Cal
a 0 NaN 2
b 3 NaN 5
c 6 NaN 8
d 6 NaN 8

In [17]: dframe_1.reindex(index=['a','b','c','d'],fill_value='Z',columns=['Ohio','Beijin','Cal'])
Out[17]:
Ohio Beijin Cal
a 0 Z 2
b 3 Z 5
c 6 Z 8
d Z Z Z

In [8]: dframe_1.reindex(columns=['Chengdu','Beijin','Shanghai','Guangdong'])
Out[8]:
Chengdu Beijin Shanghai Guangdong
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

In [9]: dframe_1
Out[9]:
Ohio Texas Cal
a 0 1 2
b 3 4 5
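c 6 7 8

# use the ix indexer to change the row and column labels at once
# (ix was removed in pandas 1.0; reindex(index=..., columns=...) gives the same result)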
In [10]: dframe_1.ix[['a','b','c','d'],['Ohio','Beijing','Guangdong']]
Out[10]:
Ohio Beijing Guangdong
a 0 NaN NaN
b 3 NaN NaN
c 6 NaN NaN
d NaN NaN NaN

Reindexing a DataFrame
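Since the `ix` indexer no longer exists in pandas 1.0 and later, the last step of the session needs a different spelling there. The following is a sketch of one way to get the same result with reindex, using the same `dframe_1`; the names `both` and `existing` are illustrative.

```python
import numpy as np
import pandas as pd

dframe_1 = pd.DataFrame(np.arange(9).reshape((3, 3)),
                        index=["a", "b", "c"],
                        columns=["Ohio", "Texas", "Cal"])

# reindex accepts labels that do not exist yet and fills them with NaN.
both = dframe_1.reindex(index=["a", "b", "c", "d"],
                        columns=["Ohio", "Beijing", "Guangdong"])

# .loc also selects by label, but only labels that already exist;
# a missing label in the list raises KeyError.
existing = dframe_1.loc[["a", "b"], ["Ohio", "Cal"]]

print(both)
print(existing)
```

Use reindex when new labels may appear, and `.loc` when every requested label is expected to be present.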

2: Dropping entries from an axis

  The drop method removes entries by index label.

  1) For a Series

 In [21]: seri = pd.Series(np.arange(5),index=['a','b','c','d','e'])

 In [22]: seri
Out[22]:
a 0
b 1
c 2
d 3
e 4
dtype: int32

In [23]: seri.drop('b')
Out[23]:
a 0
c 2
d 3
e 4
dtype: int32

In [24]: seri.drop(['d','e'])
Out[24]:
a 0
b 1
c 2
dtype: int32

Dropping data from a Series

  2) For a DataFrame

In [29]: dframe = pd.DataFrame(np.arange(16).reshape((4,4)),index=['Chen','Bei','Shang','Guang'],columns=['one','two','three','four'])

In [30]: dframe
Out[30]:
one two three four
Chen 0 1 2 3
Bei 4 5 6 7
Shang 8 9 10 11
Guang 12 13 14 15

# drop rows
In [31]: dframe.drop(['Bei','Shang'])
Out[31]:
one two three four
Chen 0 1 2 3
Guang 12 13 14 15

# drop columns
In [33]: dframe.drop(['two','three'],axis=1)
Out[33]:
one four
Chen 0 3
Bei 4 7
Shang 8 11
Guang 12 15

# when only a single label is dropped, the list brackets can be omitted

Dropping data from a DataFrame
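The same drops, written as a small script. This is a sketch on the data above; `rows_dropped`, `cols_dropped`, and `single` are illustrative names, and the `columns=` keyword is an alternative spelling of `axis=1` in newer pandas releases.

```python
import numpy as np
import pandas as pd

dframe = pd.DataFrame(np.arange(16).reshape((4, 4)),
                      index=["Chen", "Bei", "Shang", "Guang"],
                      columns=["one", "two", "three", "four"])

# Rows are dropped by default (axis=0).
rows_dropped = dframe.drop(["Bei", "Shang"])

# Columns need axis=1, or equivalently the columns= keyword.
cols_dropped = dframe.drop(columns=["two", "three"])

# A single label does not need to be wrapped in a list.
single = dframe.drop("two", axis=1)

print(rows_dropped)
print(cols_dropped)
```

drop returns a new object, so `dframe` itself keeps all four rows and columns.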

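3: Indexing, selection, and filtering

  1) Series

    A Series can still be accessed by integer position, like a list, but I don't think that is a good idea. It is better to access it by index label, and labels can also be used for slicing. Note that a label slice such as seri['a':'c'] includes its end point.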

In [4]: seri = pd.Series(np.arange(4),index=['a','b','c','d'])

In [5]: seri
Out[5]:
a 0
b 1
c 2
d 3
dtype: int32

In [6]: seri['a']
Out[6]: 0

In [7]: seri[['b','a']]   # the result follows the order of the requested labels
Out[7]:
b 1
a 0
dtype: int32

In [18]: seri[seri<2]   # the element-wise comparison produces a boolean mask
Out[18]:
a 0
b 1
dtype: int32

In [11]: seri['a':'c']   # slicing by label (end point included)
Out[11]:
a 0
b 1
c 2
dtype: int32

In [12]: seri['a':'c']='z'

In [13]: seri
Out[13]:
a z
b z
c z
d 3
dtype: object

Selecting from a Series
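To make the label-versus-position distinction explicit, the accessors `.loc` (labels) and `.iloc` (positions) can be used. A minimal sketch on the same Series follows; the result names are illustrative.

```python
import numpy as np
import pandas as pd

seri = pd.Series(np.arange(4), index=["a", "b", "c", "d"])

by_label = seri.loc["a"]          # label-based lookup
by_position = seri.iloc[0]        # position-based lookup
label_slice = seri.loc["a":"c"]   # label slices include the end point
pos_slice = seri.iloc[0:2]        # positional slices exclude it
small = seri[seri < 2]            # boolean mask from an element-wise comparison

print(label_slice)
print(pos_slice)
print(small)
```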

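  2) DataFrame

    This is really a matter of retrieving one or more columns. A DataFrame can be viewed as several Series that share the same index, and for indexing with [] it is the column labels in the header row that act as the keys. So dframe[[list of labels]] selects several columns, and every label must be a column label, otherwise a KeyError is raised. Rows, on the other hand, can be selected by slicing: dframe[:2] selects rows 0 and 1. ix[arg1 (rows), arg2 (columns)] selects along both directions at once.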

In [19]: dframe = pd.DataFrame(np.arange(16).reshape((4,4)),index=['one','two','three','four'],columns=['Bei','Shang','Guang','Sheng'])

In [21]: dframe
Out[21]:
Bei Shang Guang Sheng
one 0 1 2 3
two 4 5 6 7
three 8 9 10 11
four 12 13 14 15

In [22]: dframe[['one']]   # 'one' is a row label, not a column label, hence the KeyError described above
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-22-c2522043b676> in <module>()
----> 1 dframe[['one']]

In [25]: dframe[['Bei']]
Out[25]:
Bei
one 0
two 4
three 8
four 12

In [26]: dframe[['Bei','Sheng']]
Out[26]:
Bei Sheng
one 0 3
two 4 7
three 8 11
four 12 15

In [27]: dframe[:2]   # select rows
Out[27]:
Bei Shang Guang Sheng
one 0 1 2 3
two 4 5 6 7

In [32]: # to index a DataFrame by label, use ix; its first argument selects rows, the second selects columns

In [33]: dframe.ix[['one','two'],['Bei','Shang']]
Out[33]:
Bei Shang
one 0 1
two 4 5

# each column label is also exposed as an attribute of the dframe instance
In [35]: dframe.Bei
Out[35]:
one 0
two 4
three 8
four 12
Name: Bei, dtype: int32

In [36]: dframe[dframe.Bei<5]
Out[36]:
Bei Shang Guang Sheng
one 0 1 2 3
two 4 5 6 7

In [38]: dframe.ix[dframe.Bei<5,:2]
Out[38]:
Bei Shang
one 0 1
two 4 5

In [43]: dframe.ix[:'two',['Shang','Bei']]
Out[43]:
Shang Bei
one 1 0
two 5 4

Selecting from a DataFrame
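The `ix` calls in the session mix labels, boolean masks, and positions. In pandas 1.0 and later, where `ix` is gone, roughly the same selections can be written with `.loc` and `.iloc`. This is a sketch under that assumption; the result names are illustrative.

```python
import numpy as np
import pandas as pd

dframe = pd.DataFrame(np.arange(16).reshape((4, 4)),
                      index=["one", "two", "three", "four"],
                      columns=["Bei", "Shang", "Guang", "Sheng"])

# Rows and columns by label.
by_label = dframe.loc[["one", "two"], ["Bei", "Shang"]]

# Boolean row mask, then the first two columns by position.
mask_then_pos = dframe.loc[dframe["Bei"] < 5].iloc[:, :2]

# Label slice on the rows (end point included), explicit column order.
label_slice = dframe.loc[:"two", ["Shang", "Bei"]]

print(by_label)
print(mask_then_pos)
print(label_slice)
```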

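4: Arithmetic

  1) Series

    Arithmetic aligns the operands by index label before computing, and labels that do not overlap produce NaN in the result. Using the arithmetic methods with fill_value avoids this.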

 In [4]: seri_1 = pd.Series([1,2,3,4],index = ['a','b','c','d'])

 In [5]: seri_2 = pd.Series([5,6,7,8,9],index = ['a','c','e','g','f'])

 In [6]: seri_1 + seri_2
Out[6]:
a 6
b NaN
c 9
d NaN
e NaN
f NaN
g NaN
dtype: float64

In [8]: seri_1.add(seri_2)
Out[8]:
a 6
b NaN
c 9
d NaN
e NaN
f NaN
g NaN
dtype: float64

In [7]: seri_1.add(seri_2,fill_value = 0)
Out[7]:
a 6
b 2
c 9
d 4
e 7
f 9
g 8
dtype: float64

# with fill_value, the non-overlapping labels keep a value instead of NaN
# method equivalents of the operators: add: +; mul: *; sub: -; div: /

Series arithmetic
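A compact sketch of the alignment behaviour, using the same two Series (renamed `s1` and `s2` here for brevity; `plain` and `filled` are illustrative names):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
s2 = pd.Series([5, 6, 7, 8, 9], index=["a", "c", "e", "g", "f"])

# The + operator aligns on labels; labels present on one side only give NaN.
plain = s1 + s2

# add(..., fill_value=0) treats the missing side as 0 before adding.
filled = s1.add(s2, fill_value=0)

print(plain)
print(filled)
```

Only `a` and `c` exist in both Series, so those are the only non-NaN entries of `plain`.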

  2)DataFrame

In [10]: df_1 = pd.DataFrame(np.arange(12).reshape((3,4)),columns = list('abcd'))

In [11]: df_2 = pd.DataFrame(np.arange(20).reshape((4,5)),columns = list('abcde'))

In [12]: df_1 + df_2
Out[12]:
a b c d e
0 0 2 4 6 NaN
1 9 11 13 15 NaN
2 18 20 22 24 NaN
3 NaN NaN NaN NaN NaN

In [13]: df_1.add(df_2)
Out[13]:
a b c d e
0 0 2 4 6 NaN
1 9 11 13 15 NaN
2 18 20 22 24 NaN
3 NaN NaN NaN NaN NaN

In [14]: df_1.add(df_2, fill_value = 0)
Out[14]:
a b c d e
0 0 2 4 6 4
1 9 11 13 15 9
2 18 20 22 24 14
3 15 16 17 18 19

DataFrame arithmetic

  3) Operations between a DataFrame and a Series

  This works like broadcasting with np.array:

 In [15]: arr_1 = np.arange(12).reshape((3,4))

 In [16]: arr_1 - arr_1[0]
Out[16]:
array([[0, 0, 0, 0],
[4, 4, 4, 4],
[8, 8, 8, 8]])

In [17]: arr_1
Out[17]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])

The array case

In [18]: dframe_1 = pd.DataFrame(np.arange(12).reshape((4,3)),columns=list('bde'),index = ['Chen','Bei','Shang','Sheng'])

In [19]: dframe_1
Out[19]:
b d e
Chen 0 1 2
Bei 3 4 5
Shang 6 7 8
Sheng 9 10 11

In [20]: seri = dframe_1.ix[0]

In [21]: seri
Out[21]:
b 0
d 1
e 2
Name: Chen, dtype: int32

In [22]: dframe_1 - seri   # the Series is matched against the columns and subtracted from every row
Out[22]:
b d e
Chen 0 0 0
Bei 3 3 3
Shang 6 6 6
Sheng 9 9 9

In [23]: seri_2 = pd.Series(range(3),index=['b','e','f'])

In [24]: dframe_1 - seri_2
Out[24]:
b d e f
Chen 0 NaN 1 NaN
Bei 3 NaN 4 NaN
Shang 6 NaN 7 NaN
Sheng 9 NaN 10 NaN

In [27]: seri_3 = dframe_1['d']

In [28]: seri_3   # note: the index of seri_3 holds the row labels, not the columns of dframe_1, unlike the operation above
Out[28]:
Chen 1
Bei 4
Shang 7
Sheng 10
Name: d, dtype: int32

In [29]: dframe_1 - seri_3
Out[29]:
Bei Chen Shang Sheng b d e
Chen NaN NaN NaN NaN NaN NaN NaN
Bei NaN NaN NaN NaN NaN NaN NaN
Shang NaN NaN NaN NaN NaN NaN NaN
Sheng NaN NaN NaN NaN NaN NaN NaN
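# note: the result's columns are the union of the Series index and dframe_1's own columns
# the axis parameter of the arithmetic methods changes the matching axis and avoids this
# axis=0 matches the Series against the row index; axis=1 matches it against the columns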
In [31]: dframe_1.sub(seri_3,axis=0)
Out[31]:
b d e
Chen -1 0 1
Bei -1 0 1
Shang -1 0 1
Sheng -1 0 1

In [33]: dframe_1.sub(seri_3,axis=1)
Out[33]:
Bei Chen Shang Sheng b d e
Chen NaN NaN NaN NaN NaN NaN NaN
Bei NaN NaN NaN NaN NaN NaN NaN
Shang NaN NaN NaN NaN NaN NaN NaN
Sheng NaN NaN NaN NaN NaN NaN NaN

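DataFrame & Series operations

    Note: the axis argument can be read as 0: a Series whose index is the frame's index (the vertical axis); 1: a Series whose index is the frame's columns (the horizontal axis).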
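The two matching directions can be compared side by side. This is a sketch on the same `dframe_1`; `.iloc[0]` stands in for the `ix[0]` used in the session, and `row`, `col`, `by_row`, and `by_col` are illustrative names.

```python
import numpy as np
import pandas as pd

dframe_1 = pd.DataFrame(np.arange(12).reshape((4, 3)),
                        columns=list("bde"),
                        index=["Chen", "Bei", "Shang", "Sheng"])

# A row taken from the frame is indexed by the column labels b, d, e.
row = dframe_1.iloc[0]
# Default matching is on the columns, so the row is subtracted from every row.
by_row = dframe_1 - row

# A column taken from the frame is indexed by the row labels.
col = dframe_1["d"]
# axis=0 matches on the row index, so the column is subtracted from every column.
by_col = dframe_1.sub(col, axis=0)

print(by_row)
print(by_col)
```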

5: Applying functions

In [6]: dframe=pd.DataFrame(np.random.randn(4,3),columns=list('bde'),index=['Chen','Bei','Shang','Sheng'])

In [7]: dframe
Out[7]:
b d e
Chen 1.838620 1.023421 0.641420
Bei 0.920563 -2.037778 -0.853871
Shang -0.587332 0.576442 0.596269
Sheng 0.366174 -0.689582 -1.064030

In [8]: np.abs(dframe)   # absolute value
Out[8]:
b d e
Chen 1.838620 1.023421 0.641420
Bei 0.920563 2.037778 0.853871
Shang 0.587332 0.576442 0.596269
Sheng 0.366174 0.689582 1.064030

In [9]: func = lambda x: x.max() - x.min()

In [10]: dframe.apply(func)
Out[10]:
b 2.425952
d 3.061200
e 1.705449
dtype: float64

In [11]: dframe.apply(func,axis=1)
Out[11]:
Chen 1.197200
Bei 2.958341
Shang 1.183602
Sheng 1.430204
dtype: float64

In [12]: dframe.max()   # same as dframe.max(axis=0)
Out[12]:
b 1.838620
d 1.023421
e 0.641420
dtype: float64

In [15]: dframe.max(axis=1)
Out[15]:
Chen 1.838620
Bei 0.920563
Shang 0.596269
Sheng 0.366174
dtype: float64
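Because the session uses random numbers, its output cannot be reproduced exactly. The sketch below repeats the same calls on a small fixed frame (made-up values, chosen only so the results are easy to check by hand); `spread` plays the role of `func`.

```python
import pandas as pd

# Fixed, hypothetical values instead of np.random.randn.
dframe = pd.DataFrame([[1.0, -2.0, 3.0],
                       [4.0, 0.5, -6.0]],
                      columns=list("bde"),
                      index=["Chen", "Bei"])


def spread(x):
    """Range of a Series: max minus min."""
    return x.max() - x.min()


absolute = dframe.abs()                 # element-wise, like np.abs(dframe)
per_column = dframe.apply(spread)       # axis=0: one result per column
per_row = dframe.apply(spread, axis=1)  # axis=1: one result per row

print(per_column)
print(per_row)
```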

6: Sorting

  1) Sorting by index: sort_index([axis=0/1, ascending=False/True]). By default axis is 0 (sort by the row index) and ascending is True (ascending order).

 In [16]: seri = pd.Series(range(4),index=['d','a','d','c'])

 In [17]: seri
Out[17]:
d 0
a 1
d 2
c 3
dtype: int64

In [18]: seri.sort_index()
Out[18]:
a 1
c 3
d 2
d 0
dtype: int64

Sorting a Series by index

 In [22]: dframe
Out[22]:
c a b
Chen 1.838620 1.023421 0.641420
Bei 0.920563 -2.037778 -0.853871
Shang -0.587332 0.576442 0.596269
Sheng 0.366174 -0.689582 -1.064030

In [23]: dframe.sort_index()
Out[23]:
c a b
Bei 0.920563 -2.037778 -0.853871
Chen 1.838620 1.023421 0.641420
Shang -0.587332 0.576442 0.596269
Sheng 0.366174 -0.689582 -1.064030

In [24]: dframe.sort_index(axis=1)
Out[24]:
a b c
Chen 1.023421 0.641420 1.838620
Bei -2.037778 -0.853871 0.920563
Shang 0.576442 0.596269 -0.587332
Sheng -0.689582 -1.064030 0.366174

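Sorting a DataFrame by index; axis specifies whether to sort by the index (0, the default) or by the columns (1)

  2) Sorting by value: the sort_values method (note: the older order method was deprecated and has been removed in later pandas releases)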

 In [32]: seri =pd.Series([4,7,np.nan,-1,2,np.nan])

 In [33]: seri
Out[33]:
0 4
1 7
2 NaN
3 -1
4 2
5 NaN
dtype: float64

In [34]: seri.sort_values()
Out[34]:
3 -1
4 2
0 4
1 7
2 NaN
5 NaN
dtype: float64

# NaN values are placed at the end by default

Sorting a Series by value

 In [38]: dframe = pd.DataFrame({'b':[4,7,-3,2],'a':[0,1,0,1]})

 In [39]: dframe
Out[39]:
a b
0 0 4
1 1 7
2 0 -3
3 1 2

In [54]: dframe.sort_values('a')
Out[54]:
a b
0 0 4
2 0 -3
1 1 7
3 1 2

In [55]: dframe.sort_values('b')
Out[55]:
a b
2 0 -3
3 1 2
0 0 4
1 1 7

In [57]: dframe.sort_values(['a','b'])
Out[57]:
a b
2 0 -3
0 0 4
3 1 2
1 1 7

In [58]: dframe.sort_values(['b','a'])
Out[58]:
a b
2 0 -3
3 1 2
0 0 4
1 1 7

Sorting a DataFrame by value
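The sorting calls from this section, gathered into one sketch on the same small frame. The result names are illustrative, and the descending sort is an extra case not shown in the session.

```python
import pandas as pd

dframe = pd.DataFrame({"b": [4, 7, -3, 2], "a": [0, 1, 0, 1]})

# Sort the column labels alphabetically.
by_columns = dframe.sort_index(axis=1)

# Sort rows by column a, breaking ties with column b.
by_a_then_b = dframe.sort_values(["a", "b"])

# Sort rows by column b, largest first.
b_descending = dframe.sort_values("b", ascending=False)

print(by_columns)
print(by_a_then_b)
print(b_descending)
```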

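7: Ranking

  The rank method assigns each value its rank, starting from 1; by default, tied values receive the average of their ranks.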
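The section names rank without showing it, so here is a minimal sketch. The data is made up for illustration; `method` and `ascending` are parameters of `Series.rank`.

```python
import pandas as pd

# Hypothetical data containing two pairs of ties (7, 7 and 4, 4).
s = pd.Series([7, -5, 7, 4, 2, 0, 4])

average = s.rank()                                # ties share the mean of their ranks
first = s.rank(method="first")                    # ties ranked in order of appearance
descending = s.rank(ascending=False, method="max")  # largest value ranked first; ties take the highest rank in the group

print(average.tolist())     # [6.5, 1.0, 6.5, 4.5, 3.0, 2.0, 4.5]
print(first.tolist())       # [6.0, 1.0, 7.0, 4.0, 3.0, 2.0, 5.0]
print(descending.tolist())  # [2.0, 7.0, 2.0, 4.0, 5.0, 6.0, 4.0]
```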

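8: Descriptive statistics

  count: number of non-NaN values  describe: summary statistics for a Series or for each DataFrame column  min, max  argmin, argmax: integer positions of the minimum and maximum  idxmin, idxmax: index labels of the minimum and maximum

  sum  mean: arithmetic mean  var: sample variance  std: sample standard deviation  kurt: kurtosis  cumsum: cumulative sum  cummin/cummax: cumulative minimum/maximum  pct_change: percent change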

In [63]: df = pd.DataFrame([[1.4,np.nan],[7.1,-4.5],[np.nan,np.nan],[0.75,-1.3]],index=['a','b','c','d'],columns=['one','two'])

In [64]: df
Out[64]:
one two
a 1.40 NaN
b 7.10 -4.5
c NaN NaN
d 0.75 -1.3

In [66]: df.sum()
Out[66]:
one 9.25
two -5.80
dtype: float64

In [67]: df.sum(axis=1)
Out[67]:
a 1.40
b 2.60
c NaN
d -0.55
dtype: float64

# mean; skipna controls whether NaN is skipped (True by default)
In [68]: df.mean(axis=1,skipna=False)
Out[68]:
a NaN
b 1.300
c NaN
d -0.275
dtype: float64

In [70]: df.idxmax()
Out[70]:
one b
two d
dtype: object

In [71]: df.cumsum()
Out[71]:
one two
a 1.40 NaN
b 8.50 -4.5
c NaN NaN
d 9.25 -5.8

In [72]: df.describe()
Out[72]:
one two
count 3.000000 2.000000
mean 3.083333 -2.900000
std 3.493685 2.262742
min 0.750000 -4.500000
25% 1.075000 -3.700000
50% 1.400000 -2.900000
75% 4.250000 -2.100000
max 7.100000 -1.300000

Some descriptive statistics
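A short sketch of the reductions above on the same `df`. One version-dependent detail is worth noting: the session shows NaN for the sum of the all-NaN row `c`, whereas pandas 0.22 and later return 0.0 there unless `min_count=1` is passed. The result names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
                  index=["a", "b", "c", "d"], columns=["one", "two"])

col_sum = df.sum()                              # NaN is skipped by default
row_sum = df.sum(axis=1)                        # all-NaN row sums to 0.0 in pandas >= 0.22
row_sum_nan = df.sum(axis=1, min_count=1)       # require at least one value, else NaN
strict_mean = df.mean(axis=1, skipna=False)     # any NaN in a row makes the mean NaN
best = df.idxmax()                              # label of the maximum in each column

print(col_sum)
print(row_sum_nan)
print(strict_mean)
print(best)
```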

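9: Unique values, value counts, and membership

  The unique method; value_counts, which is also available as the top-level function pd.value_counts; and the isin method.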

 In [74]: seri = pd.Series(['c','a','d','a','a','b','b','c','c'])

 In [75]: seri
Out[75]:
0 c
1 a
2 d
3 a
4 a
5 b
6 b
7 c
8 c
dtype: object

In [76]: seri.unique()
Out[76]: array(['c', 'a', 'd', 'b'], dtype=object)

In [77]: seri.value_counts()
Out[77]:
c 3
a 3
b 2
d 1
dtype: int64

In [78]: pd.value_counts(seri.values,sort=False)
Out[78]:
a 3
c 3
b 2
d 1
dtype: int64

In [81]: seri.isin(['b','c'])
Out[81]:
0 True
1 False
2 False
3 False
4 False
5 True
6 True
7 True
8 True
dtype: bool

Unique values, value counts, and membership
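A minimal sketch of the three methods on the same data, including the common follow-up of using the isin mask as a filter. The result names are illustrative.

```python
import pandas as pd

seri = pd.Series(["c", "a", "d", "a", "a", "b", "b", "c", "c"])

uniques = seri.unique()          # distinct values in order of first appearance
counts = seri.value_counts()     # frequency of each value, most frequent first
mask = seri.isin(["b", "c"])     # True where the value is one of the candidates
subset = seri[mask]              # the mask can be used directly as a filter

print(list(uniques))
print(counts)
print(subset)
```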

10: Handling missing data

  I) Dropping NaN: the dropna method

    1) Series

      pandas treats Python's None as missing data, just like NumPy's NaN.

 In [3]: seri = pd.Series(['aaa','bbb',np.nan,'ccc'])

 In [4]: seri[0]=None

 In [5]: seri
Out[5]:
0 None
1 bbb
2 NaN
3 ccc
dtype: object

In [7]: seri.isnull()
Out[7]:
0 True
1 False
2 True
3 False
dtype: bool

In [8]: seri.dropna()   # returns the non-missing values
Out[8]:
1 bbb
3 ccc
dtype: object

In [9]: seri
Out[9]:
0 None
1 bbb
2 NaN
3 ccc
dtype: object

In [10]: seri[seri.notnull()]   # the same result via a boolean mask
Out[10]:
1 bbb
3 ccc
dtype: object

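Missing data in a Series

    2) DataFrame

      For a DataFrame things are a little more complicated: you may want to drop only the rows or columns that are entirely NaN, or those that contain any NaN.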

In [15]: df = pd.DataFrame([[1,6.5,3],[1,np.nan,np.nan],[np.nan,np.nan,np.nan],[np.nan,6.5,3]])

In [16]: df
Out[16]:
0 1 2
0 1 6.5 3
1 1 NaN NaN
2 NaN NaN NaN
3 NaN 6.5 3

In [17]: df.dropna()   # works on rows by default (axis=0) and drops every row that contains a NaN
Out[17]:
0 1 2
0 1 6.5 3

In [19]: df.dropna(how='all')   # drops only the rows that are entirely NaN
Out[19]:
0 1 2
0 1 6.5 3
1 1 NaN NaN
3 NaN 6.5 3

In [21]: df.dropna(axis=1,how='all')   # works on columns: drops the columns that are entirely NaN (none here)
Out[21]:
0 1 2
0 1 6.5 3
1 1 NaN NaN
2 NaN NaN NaN
3 NaN 6.5 3

In [22]: df.dropna(axis=1)
Out[22]:
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3]

Missing data in a DataFrame

  II) Filling NaN: the fillna method

 In [88]: df
Out[88]:
one two
a 1.40 NaN
b 7.10 -4.5
c NaN NaN
d 0.75 -1.3

In [90]: df.fillna(0)
Out[90]:
one two
a 1.40 0.0
b 7.10 -4.5
c 0.00 0.0
d 0.75 -1.3

Filling NaN
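The dropna and fillna variants can be compared on the 4x3 frame from the dropna session. This is a sketch; the `thresh` argument and the per-column dict passed to fillna go slightly beyond what the session shows, and the result names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 6.5, 3],
                   [1, np.nan, np.nan],
                   [np.nan, np.nan, np.nan],
                   [np.nan, 6.5, 3]])

complete_rows = df.dropna()            # keep rows without any NaN
not_all_nan = df.dropna(how="all")     # drop only rows that are entirely NaN
at_least_two = df.dropna(thresh=2)     # keep rows with at least 2 non-NaN values

filled = df.fillna(0)                  # one replacement value everywhere
per_column = df.fillna({0: -1, 1: 0.5})  # a different value per column label

print(complete_rows)
print(at_least_two)
print(per_column)
```

Columns not named in the dict (column 2 here) keep their NaN values.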

11: Hierarchical indexing

In [30]: seri = pd.Series(np.random.randn(10),index=[['a','a','a','b','b','b','c','c','d','d'],[1,2,3,1,2,3,1,2,2,3]])

In [31]: seri
Out[31]:
a 1 0.528387
2 -0.152286
3 -0.776540
b 1 0.025425
2 -1.412776
3 0.969498
c 1 0.478260
2 0.116301
d 2 1.464144
3 2.266069
dtype: float64

In [32]: seri['a']
Out[32]:
1 0.528387
2 -0.152286
3 -0.776540
dtype: float64

In [33]: seri.index
Out[33]:
MultiIndex(levels=[[u'a', u'b', u'c', u'd'], [1, 2, 3]],
labels=[[0, 0, 0, 1, 1, 1, 2, 2, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 1, 2]])

In [35]: seri['a':'c']
Out[35]:
a 1 0.528387
2 -0.152286
3 -0.776540
b 1 0.025425
2 -1.412776
3 0.969498
c 1 0.478260
2 0.116301
dtype: float64

In [45]: seri.unstack()
Out[45]:
1 2 3
a 0.528387 -0.152286 -0.776540
b 0.025425 -1.412776 0.969498
c 0.478260 0.116301 NaN
d NaN 1.464144 2.266069

In [46]: seri.unstack().stack()
Out[46]:
a 1 0.528387
2 -0.152286
3 -0.776540
b 1 0.025425
2 -1.412776
3 0.969498
c 1 0.478260
2 0.116301
d 2 1.464144
3 2.266069
dtype: float64

Hierarchical indexing on a Series; the unstack method converts it into a DataFrame

In [48]: df = pd.DataFrame(np.arange(12).reshape((4,3)),index=[['a','a','b','b'],[1,2,1,2]],columns=[['Ohio','Ohio','Colorado'],['Green','Red','Green']])

In [49]: df
Out[49]:
Ohio Colorado
Green Red Green
a 1 0 1 2
2 3 4 5
b 1 6 7 8
2 9 10 11

In [50]: df.index
Out[50]:
MultiIndex(levels=[[u'a', u'b'], [1, 2]],
labels=[[0, 0, 1, 1], [0, 1, 0, 1]])

In [51]: df.columns
Out[51]:
MultiIndex(levels=[[u'Colorado', u'Ohio'], [u'Green', u'Red']],
labels=[[1, 1, 0], [0, 1, 0]])

In [53]: df['Ohio']
Out[53]:
Green Red
a 1 0 1
2 3 4
b 1 6 7
2 9 10

In [57]: df.ix['a','Ohio']
Out[57]:
Green Red
1 0 1
2 3 4

In [61]: df.ix['a','Ohio'].ix[1,'Red']
Out[61]: 1

Hierarchical indexing on a DataFrame
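A reproducible sketch of the Series part, with `np.arange` in place of random values so the results can be checked. It uses `.loc` instead of `ix`, and the result names are illustrative. The trailing `dropna()` is there because whether stack itself discards NaN differs between pandas versions.

```python
import numpy as np
import pandas as pd

# Two-level index: outer letters, inner integers (same layout as the session).
seri = pd.Series(np.arange(10.0),
                 index=[list("aaabbbccdd"),
                        [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]])

outer = seri.loc["a"]      # all entries under outer label 'a'
inner = seri.loc[:, 2]     # inner label 2 across every outer label

wide = seri.unstack()      # inner level becomes the columns; gaps become NaN
round_trip = wide.stack().dropna()  # back to a two-level Series

print(outer)
print(inner)
print(wide)
```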

 
