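3.1 pandas: Basic Functionality
I: Reindexing
The reindex method conforms an object to a new index. On a Series it rearranges the values to match the new labels; on a DataFrame it can change the row index, the column index, or both at once.
1) Series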
In [2]: seri = pd.Series([4.5,7.2,-5.3,3.6],index = ['d','b','a','c'])

In [3]: seri
Out[3]:
d 4.5
b 7.2
a -5.3
c 3.6
dtype: float64

In [4]: seri1 = seri.reindex(['a','b','c','d','e'])

In [5]: seri1
Out[5]:
a -5.3
b 7.2
c 3.6
d 4.5
e NaN #labels missing from the original become NaN
dtype: float64

In [6]: seri.reindex(['a','b','c','d','e'], fill_value=0)
Out[6]:
a -5.3
b 7.2
c 3.6
d 4.5
e 0.0 #missing labels are filled with 0
dtype: float64

In [7]: seri
Out[7]:
d 4.5
b 7.2
a -5.3
c 3.6
dtype: float64

In [8]: seri_2 = pd.Series(['blue','purple','yellow'], index=[0,2,4])

In [9]: seri_2
Out[9]:
0 blue
2 purple
4 yellow
dtype: object

#fill methods available to reindex: ffill fills forward (carries the previous value), bfill fills backward (uses the next value)
In [10]: seri_2.reindex(range(6),method='ffill')
Out[10]:
0 blue
1 blue
2 purple
3 purple
4 yellow
5 yellow
dtype: object

In [11]: seri_2.reindex(range(6),method='bfill')
Out[11]:
0 blue
1 purple
2 purple
3 yellow
4 yellow
5 NaN
dtype: object
Reindexing a Series
2) DataFrame
Parameters of reindex include method='ffill' or 'bfill' and fill_value=... (the value used instead of NaN for newly introduced labels), among others.
In [4]: dframe_1 = pd.DataFrame(np.arange(9).reshape((3,3)),index=['a','b','c'],
columns=['Ohio','Texas','Cal'])
In [5]: dframe_1
Out[5]:
Ohio Texas Cal
a 0 1 2
b 3 4 5
c 6 7 8

In [6]: dframe_2 = dframe_1.reindex(['a','b','c','d'])

In [7]: dframe_2
Out[7]:
Ohio Texas Cal
a 0 1 2
b 3 4 5
c 6 7 8
d NaN NaN NaN

In [16]: dframe_1.reindex(index=['a','b','c','d'],method='ffill',columns=['Ohio','Beijin','Cal'])
Out[16]:
Ohio Beijin Cal
a 0 NaN 2
b 3 NaN 5
c 6 NaN 8
d 6 NaN 8

In [17]: dframe_1.reindex(index=['a','b','c','d'],fill_value='Z',columns=['Ohio','Beijin','Cal'])
Out[17]:
Ohio Beijin Cal
a 0 Z 2
b 3 Z 5
c 6 Z 8
d Z Z Z

In [8]: dframe_1.reindex(columns=['Chengdu','Beijin','Shanghai','Guangdong'])
Out[8]:
Chengdu Beijin Shanghai Guangdong
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN In [9]: dframe_1
Out[9]:
Ohio Texas Cal
a 0 1 2
b 3 4 5
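c 6 7 8

#use ix to change the row and column labels at once (ix was removed in pandas 1.0; reindex(index=..., columns=...) does the same on current versions)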
In [10]: dframe_1.ix[['a','b','c','d'],['Ohio','Beijing','Guangdong']]
Out[10]:
Ohio Beijing Guangdong
a 0 NaN NaN
b 3 NaN NaN
c 6 NaN NaN
d NaN NaN NaN
Reindexing a DataFrame
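The ix indexer used in the last transcript is not available in pandas 1.0 and later. A minimal sketch of the same row-and-column relabeling on a current version, assuming the same dframe_1 as above, passes both index and columns to reindex:

```python
import numpy as np
import pandas as pd

dframe_1 = pd.DataFrame(np.arange(9).reshape((3, 3)),
                        index=['a', 'b', 'c'],
                        columns=['Ohio', 'Texas', 'Cal'])

# Relabel rows and columns in one call. Labels that do not exist in the
# original ('d', 'Beijing', 'Guangdong') are introduced and filled with NaN.
both = dframe_1.reindex(index=['a', 'b', 'c', 'd'],
                        columns=['Ohio', 'Beijing', 'Guangdong'])
print(both)
```

Because NaN is a float, the surviving 'Ohio' column is upcast from integer to float in the result.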
II: Dropping Entries from an Axis
The drop method removes entries by index label and returns a new object.
1) Series
In [21]: seri = pd.Series(np.arange(5),index=['a','b','c','d','e'])

In [22]: seri
Out[22]:
a 0
b 1
c 2
d 3
e 4
dtype: int32

In [23]: seri.drop('b')
Out[23]:
a 0
c 2
d 3
e 4
dtype: int32

In [24]: seri.drop(['d','e'])
Out[24]:
a 0
b 1
c 2
dtype: int32
Dropping entries from a Series
2) DataFrame
In [29]: dframe = pd.DataFrame(np.arange(16).reshape((4,4)),index=['Chen','Bei','Shang','Guang'],columns=['one','two','three','four'])

In [30]: dframe
Out[30]:
one two three four
Chen 0 1 2 3
Bei 4 5 6 7
Shang 8 9 10 11
Guang 12 13 14 15

#drop rows
In [31]: dframe.drop(['Bei','Shang'])
Out[31]:
one two three four
Chen 0 1 2 3
Guang 12 13 14 15

#drop columns
In [33]: dframe.drop(['two','three'],axis=1)
Out[33]:
one four
Chen 0 3
Bei 4 7
Shang 8 11
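Guang 12 15

#when only a single label is dropped, the list brackets [] can be omitted
Dropping entries from a DataFrame
III: Indexing, Selection, and Filtering
1) Series
A Series can still be accessed by integer position, as with a list, but I find that less clear; it is better to access values by index label. Labels can also be used for slicing, and unlike a positional slice, a label slice includes its end label.
In [4]: seri = pd.Series(np.arange(4),index=['a','b','c','d'])

In [5]: seri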
Out[5]:
a 0
b 1
c 2
d 3
dtype: int32

In [6]: seri['a']
Out[6]: 0

In [7]: seri[['b','a']] #the result follows the order of the requested labels
Out[7]:
b 1
a 0
dtype: int32

In [18]: seri[seri<2] #the comparison is element-wise and yields a boolean mask
Out[18]:
a 0
b 1
dtype: int32

In [11]: seri['a':'c'] #slicing by label
Out[11]:
a 0
b 1
c 2
dtype: int32

In [12]: seri['a':'c']='z'

In [13]: seri
Out[13]:
a z
b z
c z
d 3
dtype: object
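Selecting from a Series
2) DataFrame
Selecting from a DataFrame is essentially a matter of retrieving one or more columns. A DataFrame can be viewed as a set of Series that share the same index, and for [] indexing it is the column labels in the header row that act as keys. dframe[[label, ...]] therefore selects several columns, and every label must be a column label, otherwise a KeyError is raised. Rows, by contrast, can be selected with a slice: dframe[:2] selects rows 0 and 1. ix[rows, columns] selects along both axes at once. (ix was removed in pandas 1.0; on current versions use loc for labels and iloc for integer positions.)
In [19]: dframe = pd.DataFrame(np.arange(16).reshape((4,4)),index=['one','two','three','four'],columns=['Bei','Shang','Guang','Sheng'])

In [21]: dframe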
Out[21]:
Bei Shang Guang Sheng
one 0 1 2 3
two 4 5 6 7
three 8 9 10 11
four 12 13 14 15

In [22]: dframe[['one']] #KeyError: 'one' is a row label, not a column label, as explained above
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-22-c2522043b676> in <module>()
----> 1 dframe[['one']]

In [25]: dframe[['Bei']]
Out[25]:
Bei
one 0
two 4
three 8
four 12

In [26]: dframe[['Bei','Sheng']]
Out[26]:
Bei Sheng
one 0 3
two 4 7
three 8 11
four 12 15

In [27]: dframe[:2] #a slice selects rows
Out[27]:
Bei Shang Guang Sheng
one 0 1 2 3
two 4 5 6 7

In [32]: #ix brings label-based indexing to a DataFrame: its first argument selects rows, its second selects columns

In [33]: dframe.ix[['one','two'],['Bei','Shang']]
Out[33]:
Bei Shang
one 0 1
two 4 5

#each column label can also be accessed as an attribute of the DataFrame
In [35]: dframe.Bei
Out[35]:
one 0
two 4
three 8
four 12
Name: Bei, dtype: int32

In [36]: dframe[dframe.Bei<5]
Out[36]:
Bei Shang Guang Sheng
one 0 1 2 3
two 4 5 6 7

In [38]: dframe.ix[dframe.Bei<5,:2]
Out[38]:
Bei Shang
one 0 1
two 4 5

In [43]: dframe.ix[:'two',['Shang','Bei']]
Out[43]:
Shang Bei
one 1 0
two 5 4
Selecting from a DataFrame
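Since ix no longer exists in current pandas, the selections above can be reproduced with loc (labels) and iloc (integer positions). This is a sketch under that assumption, rebuilding the same dframe; the variable names by_label, by_mask and by_slice are only illustrative:

```python
import numpy as np
import pandas as pd

dframe = pd.DataFrame(np.arange(16).reshape((4, 4)),
                      index=['one', 'two', 'three', 'four'],
                      columns=['Bei', 'Shang', 'Guang', 'Sheng'])

# Label-based selection: rows first, then columns.
by_label = dframe.loc[['one', 'two'], ['Bei', 'Shang']]

# Boolean row mask via loc, then the first two columns by position via iloc.
mask = dframe['Bei'] < 5
by_mask = dframe.loc[mask].iloc[:, :2]

# A label slice includes its end label, so 'two' is part of the result.
by_slice = dframe.loc[:'two', ['Shang', 'Bei']]

print(by_label, by_mask, by_slice, sep='\n\n')
```

Keeping label access (loc) and positional access (iloc) separate avoids the ambiguity that ix had when the index itself contained integers.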
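IV: Arithmetic
1) Series
Operands are aligned on their index labels before the operation is carried out. A label that appears in only one operand produces NaN in the result; the arithmetic methods, called with fill_value, avoid this.
In [4]: seri_1 = pd.Series([1,2,3,4],index = ['a','b','c','d'])

In [5]: seri_2 = pd.Series([5,6,7,8,9],index = ['a','c','e','g','f'])

In [6]: seri_1 + seri_2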
Out[6]:
a 6
b NaN
c 9
d NaN
e NaN
f NaN
g NaN
dtype: float64

In [8]: seri_1.add(seri_2)
Out[8]:
a 6
b NaN
c 9
d NaN
e NaN
f NaN
g NaN
dtype: float64

In [7]: seri_1.add(seri_2,fill_value = 0)
Out[7]:
a 6
b 2
c 9
d 4
e 7
f 9
g 8
dtype: float64

#the non-overlapping labels above now show real values instead of NaN
#the methods and their operators: add is +, sub is -, mul is *, div is /
Series arithmetic
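The other arithmetic methods take fill_value in the same way as add. A small sketch with the same two Series as above (the names plain, diff and prod are only illustrative):

```python
import pandas as pd

seri_1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
seri_2 = pd.Series([5, 6, 7, 8, 9], index=['a', 'c', 'e', 'g', 'f'])

# Operator form: a label present on only one side yields NaN.
plain = seri_1 - seri_2

# Method form: a label missing on one side is treated as fill_value there.
diff = seri_1.sub(seri_2, fill_value=0)
prod = seri_1.mul(seri_2, fill_value=1)

print(plain, diff, prod, sep='\n\n')
```

The fill value should be the identity of the operation (0 for add and sub, 1 for mul and div) if the intent is to keep one-sided values unchanged in magnitude.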
2) DataFrame
In [10]: df_1 = pd.DataFrame(np.arange(12).reshape((3,4)),columns = list('abcd'))

In [11]: df_2 = pd.DataFrame(np.arange(20).reshape((4,5)),columns = list('abcde'))

In [12]: df_1 + df_2
Out[12]:
a b c d e
0 0 2 4 6 NaN
1 9 11 13 15 NaN
2 18 20 22 24 NaN
3 NaN NaN NaN NaN NaN

In [13]: df_1.add(df_2)
Out[13]:
a b c d e
0 0 2 4 6 NaN
1 9 11 13 15 NaN
2 18 20 22 24 NaN
3 NaN NaN NaN NaN NaN

In [14]: df_1.add(df_2, fill_value = 0)
Out[14]:
a b c d e
0 0 2 4 6 4
1 9 11 13 15 9
2 18 20 22 24 14
3 15 16 17 18 19
DataFrame arithmetic
3) Arithmetic between a DataFrame and a Series
This works like broadcasting on a NumPy array:
In [15]: arr_1 = np.arange(12).reshape((3,4))

In [16]: arr_1 - arr_1[0]
Out[16]:
array([[0, 0, 0, 0],
[4, 4, 4, 4],
[8, 8, 8, 8]])

In [17]: arr_1
Out[17]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Broadcasting with a NumPy array
In [18]: dframe_1 = pd.DataFrame(np.arange(12).reshape((4,3)),columns=list('bde'),index = ['Chen','Bei','Shang','Sheng'])

In [19]: dframe_1
Out[19]:
b d e
Chen 0 1 2
Bei 3 4 5
Shang 6 7 8
Sheng 9 10 11

In [20]: seri = dframe_1.ix[0]

In [21]: seri
Out[21]:
b 0
d 1
e 2
Name: Chen, dtype: int32

In [22]: dframe_1 - seri #the Series is matched on the column labels and subtracted from every row
Out[22]:
b d e
Chen 0 0 0
Bei 3 3 3
Shang 6 6 6
Sheng 9 9 9

In [23]: seri_2 = pd.Series(range(3),index=['b','e','f'])

In [24]: dframe_1 - seri_2
Out[24]:
b d e f
Chen 0 NaN 1 NaN
Bei 3 NaN 4 NaN
Shang 6 NaN 7 NaN
Sheng 9 NaN 10 NaN

In [27]: seri_3 = dframe_1['d']

In [28]: seri_3 #note: seri_3 is indexed by dframe_1's row labels, not its column labels, so it behaves differently from the Series above
Out[28]:
Chen 1
Bei 4
Shang 7
Sheng 10
Name: d, dtype: int32

In [29]: dframe_1 - seri_3
Out[29]:
Bei Chen Shang Sheng b d e
Chen NaN NaN NaN NaN NaN NaN NaN
Bei NaN NaN NaN NaN NaN NaN NaN
Shang NaN NaN NaN NaN NaN NaN NaN
Sheng NaN NaN NaN NaN NaN NaN NaN
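#note that the result's columns are the union of the Series index and the DataFrame's own columns
#the axis argument of the arithmetic methods changes which axis is matched and avoids this
#axis=0 matches the Series against the row index (operating column by column); axis=1 matches it against the columns (operating row by row)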
In [31]: dframe_1.sub(seri_3,axis=0)
Out[31]:
b d e
Chen -1 0 1
Bei -1 0 1
Shang -1 0 1
Sheng -1 0 1

In [33]: dframe_1.sub(seri_3,axis=1)
Out[33]:
Bei Chen Shang Sheng b d e
Chen NaN NaN NaN NaN NaN NaN NaN
Bei NaN NaN NaN NaN NaN NaN NaN
Shang NaN NaN NaN NaN NaN NaN NaN
Sheng NaN NaN NaN NaN NaN NaN NaN
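Arithmetic between a DataFrame and a Series
Note: axis states which labels the Series is matched against. 0: the Series is indexed by the row index (vertical axis); 1: the Series is indexed by the columns (horizontal axis).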
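The two matching directions can be put side by side. This sketch rebuilds dframe_1 from above and uses iloc in place of ix; the string axis names 'columns' and 'index' are equivalent to 1 and 0:

```python
import numpy as np
import pandas as pd

dframe_1 = pd.DataFrame(np.arange(12).reshape((4, 3)),
                        columns=list('bde'),
                        index=['Chen', 'Bei', 'Shang', 'Sheng'])

row = dframe_1.iloc[0]   # Series indexed by the column labels b, d, e
col = dframe_1['d']      # Series indexed by the row labels Chen, Bei, ...

# axis='columns' (the default, same as 1): match the Series index against
# the column labels, then subtract the Series from every row.
by_columns = dframe_1.sub(row, axis='columns')

# axis='index' (same as 0): match the Series index against the row labels,
# then subtract the Series from every column.
by_rows = dframe_1.sub(col, axis='index')

print(by_columns, by_rows, sep='\n\n')
```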
V: Function Application
In [6]: dframe=pd.DataFrame(np.random.randn(4,3),columns=list('bde'),index=['Chen','Bei','Shang','Sheng'])

In [7]: dframe
Out[7]:
b d e
Chen 1.838620 1.023421 0.641420
Bei 0.920563 -2.037778 -0.853871
Shang -0.587332 0.576442 0.596269
Sheng 0.366174 -0.689582 -1.064030

In [8]: np.abs(dframe) #element-wise absolute value
Out[8]:
b d e
Chen 1.838620 1.023421 0.641420
Bei 0.920563 2.037778 0.853871
Shang 0.587332 0.576442 0.596269
Sheng 0.366174 0.689582 1.064030

In [9]: func = lambda x: x.max() - x.min()

In [10]: dframe.apply(func)
Out[10]:
b 2.425952
d 3.061200
e 1.705449
dtype: float64

In [11]: dframe.apply(func,axis=1)
Out[11]:
Chen 1.197200
Bei 2.958341
Shang 1.183602
Sheng 1.430204
dtype: float64

In [12]: dframe.max() #same as dframe.max(axis=0)
Out[12]:
b 1.838620
d 1.023421
e 0.641420
dtype: float64

In [15]: dframe.max(axis=1)
Out[15]:
Chen 1.838620
Bei 0.920563
Shang 0.596269
Sheng 0.366174
dtype: float64
VI: Sorting
1) Sorting by index: sort_index([axis=0/1, ascending=True/False]). By default axis is 0 (sort the row index) and ascending is True (ascending order).
In [16]: seri = pd.Series(range(4),index=['d','a','d','c'])

In [17]: seri
Out[17]:
d 0
a 1
d 2
c 3
dtype: int64

In [18]: seri.sort_index()
Out[18]:
a 1
c 3
d 2
d 0
dtype: int64
Sorting a Series by index
In [22]: dframe
Out[22]:
c a b
Chen 1.838620 1.023421 0.641420
Bei 0.920563 -2.037778 -0.853871
Shang -0.587332 0.576442 0.596269
Sheng 0.366174 -0.689582 -1.064030

In [23]: dframe.sort_index()
Out[23]:
c a b
Bei 0.920563 -2.037778 -0.853871
Chen 1.838620 1.023421 0.641420
Shang -0.587332 0.576442 0.596269
Sheng 0.366174 -0.689582 -1.064030

In [24]: dframe.sort_index(axis=1)
Out[24]:
a b c
Chen 1.023421 0.641420 1.838620
Bei -2.037778 -0.853871 0.920563
Shang 0.576442 0.596269 -0.587332
Sheng -0.689582 -1.064030 0.366174
Sorting a DataFrame by index; axis specifies whether to sort by the row index (0, the default) or by the columns (1)
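The ascending parameter mentioned above is not exercised in the transcripts. A short sketch with made-up data (the Series and DataFrame here are illustrative, not the ones from the session):

```python
import pandas as pd

seri = pd.Series(range(4), index=['d', 'a', 'b', 'c'])

# ascending=False reverses the default ascending order of the labels.
desc = seri.sort_index(ascending=False)

dframe = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                      index=['y', 'x'], columns=['c', 'a', 'b'])

# axis=1 sorts the column labels instead of the row labels.
cols_desc = dframe.sort_index(axis=1, ascending=False)

print(desc, cols_desc, sep='\n\n')
```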
2) Sorting by value: the sort_values method (note: the older order method is deprecated and should no longer be used)
In [32]: seri =pd.Series([4,7,np.nan,-1,2,np.nan])

In [33]: seri
Out[33]:
0 4
1 7
2 NaN
3 -1
4 2
5 NaN
dtype: float64

In [34]: seri.sort_values()
Out[34]:
3 -1
4 2
0 4
1 7
2 NaN
5 NaN
dtype: float64

#NaN values are placed at the end by default
Sorting a Series by value
In [38]: dframe = pd.DataFrame({'b':[4,7,-3,2],'a':[0,1,0,1]})

In [39]: dframe
Out[39]:
a b
0 0 4
1 1 7
2 0 -3
3 1 2

In [54]: dframe.sort_values('a')
Out[54]:
a b
0 0 4
2 0 -3
1 1 7
3 1 2

In [55]: dframe.sort_values('b')
Out[55]:
a b
2 0 -3
3 1 2
0 0 4
1 1 7

In [57]: dframe.sort_values(['a','b'])
Out[57]:
a b
2 0 -3
0 0 4
3 1 2
1 1 7

In [58]: dframe.sort_values(['b','a'])
Out[58]:
a b
2 0 -3
3 1 2
0 0 4
1 1 7
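Sorting a DataFrame by value
VII: Ranking
The rank method assigns each value its rank within the data; by default, tied values receive the average of their ranks.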
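The rank method has no transcript in this section, so here is a minimal sketch with illustrative data showing how the method argument changes the treatment of ties:

```python
import pandas as pd

seri = pd.Series([7, -5, 7, 4, 2, 0, 4])

# Default method='average': tied values share the mean of their ranks.
avg_rank = seri.rank()

# method='first': ties are ranked in the order they appear in the data.
first_rank = seri.rank(method='first')

# ascending=False ranks the largest value as 1; method='min' gives every
# member of a tie the lowest rank in its group.
desc_rank = seri.rank(ascending=False, method='min')

print(avg_rank.tolist())    # [6.5, 1.0, 6.5, 4.5, 3.0, 2.0, 4.5]
print(first_rank.tolist())  # [6.0, 1.0, 7.0, 4.0, 3.0, 2.0, 5.0]
print(desc_rank.tolist())   # [1.0, 7.0, 1.0, 3.0, 5.0, 6.0, 3.0]
```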
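VIII: Summary Statistics
count: number of non-NaN values
describe: summary statistics for a Series or for each column of a DataFrame
min, max: minimum and maximum
argmin, argmax: integer position of the minimum or maximum
idxmin, idxmax: index label of the minimum or maximum
sum: sum of the values
mean: arithmetic mean
var: sample variance
std: sample standard deviation
kurt: sample kurtosis
cumsum: cumulative sum
cummin, cummax: cumulative minimum or maximum
pct_change: percent change
In [63]: df = pd.DataFrame([[1.4,np.nan],[7.1,-4.5],[np.nan,np.nan],[0.75,-1.3]],index=['a','b','c','d'],columns=['one','two'])

In [64]: df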
Out[64]:
one two
a 1.40 NaN
b 7.10 -4.5
c NaN NaN
d 0.75 -1.3

In [66]: df.sum()
Out[66]:
one 9.25
two -5.80
dtype: float64

In [67]: df.sum(axis=1)
Out[67]:
a 1.40
b 2.60
c NaN
d -0.55
dtype: float64

#mean; skipna controls whether NaN values are skipped (True by default)
In [68]: df.mean(axis=1,skipna=False)
Out[68]:
a NaN
b 1.300
c NaN
d -0.275
dtype: float64

In [70]: df.idxmax()
Out[70]:
one b
two d
dtype: object

In [71]: df.cumsum()
Out[71]:
one two
a 1.40 NaN
b 8.50 -4.5
c NaN NaN
d 9.25 -5.8

In [72]: df.describe()
Out[72]:
one two
count 3.000000 2.000000
mean 3.083333 -2.900000
std 3.493685 2.262742
min 0.750000 -4.500000
25% 1.075000 -3.700000
50% 1.400000 -2.900000
75% 4.250000 -2.100000
max 7.100000 -1.300000
Some summary statistics
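Several of the listed methods are not shown in the transcript. The following sketch applies a few of them to the same df; the variable names are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'], columns=['one', 'two'])

n_valid = df.count()          # number of non-NaN values per column
low_label = df.idxmin()       # index label of each column's minimum
run_max = df.cummax()         # running maximum; NaN entries stay NaN
sample_var = df['two'].var()  # sample variance (ddof=1 by default)

print(n_valid, low_label, run_max, sample_var, sep='\n\n')
```

The sample variance of column two (5.12) is the square of the std value 2.262742 reported by describe above.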
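IX: Unique Values, Value Counts, and Membership
unique returns the distinct values; value_counts counts how often each value occurs (in the pandas version used here it is also available as the top-level function pd.value_counts); isin tests membership in a set of values.
In [74]: seri = pd.Series(['c','a','d','a','a','b','b','c','c'])

In [75]: seri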
Out[75]:
0 c
1 a
2 d
3 a
4 a
5 b
6 b
7 c
8 c
dtype: object

In [76]: seri.unique()
Out[76]: array(['c', 'a', 'd', 'b'], dtype=object)

In [77]: seri.value_counts()
Out[77]:
c 3
a 3
b 2
d 1
dtype: int64

In [78]: pd.value_counts(seri.values,sort=False)
Out[78]:
a 3
c 3
b 2
d 1
dtype: int64

In [81]: seri.isin(['b','c'])
Out[81]:
0 True
1 False
2 False
3 False
4 False
5 True
6 True
7 True
8 True
dtype: bool
Unique values, value counts, and membership
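The boolean mask returned by isin is most often used to filter the data. A brief sketch with the same Series (subset and counts are illustrative names):

```python
import pandas as pd

seri = pd.Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])

# isin returns a boolean mask, which can be used to filter the Series.
subset = seri[seri.isin(['b', 'c'])]

# value_counts sorts by frequency, most frequent first.
counts = seri.value_counts()

print(subset, counts, sep='\n\n')
```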
X: Handling Missing Data
A) Dropping NaN: the dropna method
1) Series
pandas treats Python's None as a missing value, just like NumPy's NaN.
In [3]: seri = pd.Series(['aaa','bbb',np.nan,'ccc'])

In [4]: seri[0]=None

In [5]: seri
Out[5]:
0 None
1 bbb
2 NaN
3 ccc
dtype: object

In [7]: seri.isnull()
Out[7]:
0 True
1 False
2 True
3 False
dtype: bool

In [8]: seri.dropna() #returns only the non-missing values
Out[8]:
1 bbb
3 ccc
dtype: object

In [9]: seri
Out[9]:
0 None
1 bbb
2 NaN
3 ccc
dtype: object

In [10]: seri[seri.notnull()] #the same result through a boolean mask
Out[10]:
1 bbb
3 ccc
dtype: object
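Missing data in a Series
2) DataFrame
With a DataFrame things are slightly more involved: you may want to drop only the rows or columns that are entirely NaN, or every row or column that contains any NaN.
In [15]: df = pd.DataFrame([[1,6.5,3],[1,np.nan,np.nan],[np.nan,np.nan,np.nan],[np.nan,6.5,3]])

In [16]: df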
Out[16]:
0 1 2
0 1 6.5 3
1 1 NaN NaN
2 NaN NaN NaN
3 NaN 6.5 3

In [17]: df.dropna() #by default works on rows (axis=0) and drops every row that contains a NaN
Out[17]:
0 1 2
0 1 6.5 3

In [19]: df.dropna(how='all') #drops only the rows that are entirely NaN
Out[19]:
0 1 2
0 1 6.5 3
1 1 NaN NaN
3 NaN 6.5 3

In [21]: df.dropna(axis=1,how='all') #works on columns: drops columns that are entirely NaN (none here)
Out[21]:
0 1 2
0 1 6.5 3
1 1 NaN NaN
2 NaN NaN NaN
3 NaN 6.5 3

In [22]: df.dropna(axis=1)
Out[22]:
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3]
Missing data in a DataFrame
B) Filling NaN: the fillna method
In [88]: df
Out[88]:
one two
a 1.40 NaN
b 7.10 -4.5
c NaN NaN
d 0.75 -1.3

In [90]: df.fillna(0)
Out[90]:
one two
a 1.40 0.0
b 7.10 -4.5
c 0.00 0.0
d 0.75 -1.3
Filling NaN
XI: Hierarchical Indexing
In [30]: seri = pd.Series(np.random.randn(10),index=[['a','a','a','b','b','b','c','c','d','d'],[1,2,3,1,2,3,1,2,2,3]])

In [31]: seri
Out[31]:
a 1 0.528387
2 -0.152286
3 -0.776540
b 1 0.025425
2 -1.412776
3 0.969498
c 1 0.478260
2 0.116301
d 2 1.464144
3 2.266069
dtype: float64

In [32]: seri['a']
Out[32]:
1 0.528387
2 -0.152286
3 -0.776540
dtype: float64

In [33]: seri.index
Out[33]:
MultiIndex(levels=[[u'a', u'b', u'c', u'd'], [1, 2, 3]],
labels=[[0, 0, 0, 1, 1, 1, 2, 2, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 1, 2]])

In [35]: seri['a':'c']
Out[35]:
a 1 0.528387
2 -0.152286
3 -0.776540
b 1 0.025425
2 -1.412776
3 0.969498
c 1 0.478260
2 0.116301
dtype: float64

In [45]: seri.unstack()
Out[45]:
1 2 3
a 0.528387 -0.152286 -0.776540
b 0.025425 -1.412776 0.969498
c 0.478260 0.116301 NaN
d NaN 1.464144 2.266069

In [46]: seri.unstack().stack()
Out[46]:
a 1 0.528387
2 -0.152286
3 -0.776540
b 1 0.025425
2 -1.412776
3 0.969498
c 1 0.478260
2 0.116301
d 2 1.464144
3 2.266069
dtype: float64
Hierarchical indexing on a Series; the unstack method pivots it into a DataFrame
In [48]: df = pd.DataFrame(np.arange(12).reshape((4,3)),index=[['a','a','b','b'],[1,2,1,2]],columns=[['Ohio','Ohio','Colorado'],['Green','Red','Green']])

In [49]: df
Out[49]:
Ohio Colorado
Green Red Green
a 1 0 1 2
2 3 4 5
b 1 6 7 8
2 9 10 11

In [50]: df.index
Out[50]:
MultiIndex(levels=[[u'a', u'b'], [1, 2]],
labels=[[0, 0, 1, 1], [0, 1, 0, 1]])

In [51]: df.columns
Out[51]:
MultiIndex(levels=[[u'Colorado', u'Ohio'], [u'Green', u'Red']],
labels=[[1, 1, 0], [0, 1, 0]])

In [53]: df['Ohio']
Out[53]:
Green Red
a 1 0 1
2 3 4
b 1 6 7
2 9 10

In [57]: df.ix['a','Ohio']
Out[57]:
Green Red
1 0 1
2 3 4

In [61]: df.ix['a','Ohio'].ix[1,'Red']
Out[61]: 1
Hierarchical indexing on a DataFrame
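On pandas versions without ix, the same hierarchical selections can be written with loc. This is a sketch assuming the df built above; block and cell are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((4, 3)),
                  index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                  columns=[['Ohio', 'Ohio', 'Colorado'],
                           ['Green', 'Red', 'Green']])

# Outer row label 'a', then outer column label 'Ohio';
# each step drops the outer level that was selected.
block = df.loc['a']['Ohio']

# One tuple per axis addresses a single cell through both levels.
cell = df.loc[('a', 1), ('Ohio', 'Red')]

print(block)
print(cell)  # 1
```

Passing a full tuple for each axis keeps the lookup unambiguous, whereas chained lookups are fine for reading but should be avoided when assigning values.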