Python 数据科学系列 の Numpy、Series 和 DataFrame介绍
本課主題
- Numpy 的介绍和操作实战
- Series 的介绍和操作实战
- DataFrame 的介绍和操作实战
Numpy 的介绍和操作实战
numpy 是 Python 在数据计算领域里很常用的模块
import numpy as np
np.array([11,22,33]) #接受一个列表数据
- 创建 numpy array
>>> import numpy as np
>>> mylist = [1,2,3]
>>> x = np.array(mylist)
>>> x
array([1, 2, 3])
>>> y = np.array([4,5,6])
>>> y
array([4, 5, 6])
>>> m = np.array([[7,8,9],[10,11,12]])
>>> m
array([[ 7, 8, 9],
[10, 11, 12]])创建 numpy array(例子)
- 查看 numpy array 的
>>> m.shape #array([1, 2, 3])
(2, 3) >>> x.shape #array([4, 5, 6])
(3,) >>> y.shape #array([[ 7, 8, 9], [10, 11, 12]])
(3,) - numpy.arrange
>>> n = np.arange(0,30,2)
>>> n
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])numpy.arrange( )(例子)
- 改变numpy array的位置
>>> n = np.arange(0,30,2)
>>> n
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
>>> n.shape
(15,) >>> n = n.reshape(3,5) #从15列改成3列5行
>>> n array([[ 0, 2, 4, 6, 8],
[10, 12, 14, 16, 18],
[20, 22, 24, 26, 28]])numpy.reshape( )(例子一)
>>> o = np.linspace(0,4,9)
>>> o
array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. ])
>>> o.resize(3,3)
>>> o
array([[ 0. , 0.5, 1. ],
[ 1.5, 2. , 2.5],
[ 3. , 3.5, 4. ]])numpy.reshape( )(例子二)
- numpy.ones( ) ,numpy.zeros( ),numpy.eye( )
>>> r1 = np.ones((3,2))
>>> r1
array([[ 1., 1.],
[ 1., 1.],
[ 1., 1.]]) >>> r1 = np.zeros((2,3))
>>> r1
array([[ 0., 0., 0.],
[ 0., 0., 0.]]) >>> r2 = np.eye(3)
>>> r2
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])numpy.ones/zeros/eye( )(例子)
可以定义整数
>>> r5 = np.ones([2,3], int)
>>> r5
array([[1, 1, 1],
[1, 1, 1]]) >>> r5 = np.ones([2,3])
>>> r5
array([[ 1., 1., 1.],
[ 1., 1., 1.]])numpy.ones(x,int)(例子)
- numpy.diag( )
>>> y = np.array([4,5,6])
>>> y
array([4, 5, 6]) >>> np.diag(y)
array([[4, 0, 0],
[0, 5, 0],
[0, 0, 6]])diag( )(例子)
- 复制 numpy array
>>> r3 = np.array([1,2,3] * 3)
>>> r3
array([1, 2, 3, 1, 2, 3, 1, 2, 3]) >>> r4 = np.repeat([1,2,3],3)
>>> r4
array([1, 1, 1, 2, 2, 2, 3, 3, 3])复制numpy array(例子)
- numpy中的 vstack和 hstack
>>> r5 = np.ones([2,3], int)
>>> r5
array([[1, 1, 1],
[1, 1, 1]]) >>> r6 = np.vstack([r5,2*r5])
>>> r6
array([[1, 1, 1],
[1, 1, 1],
[2, 2, 2],
[2, 2, 2]]) >>> r7 = np.hstack([r5,2*r5])
>>> r7
array([[1, 1, 1, 2, 2, 2],
[1, 1, 1, 2, 2, 2]])numpy.vstack( )和np.hstack( )(例子)
- numpy 中的加减乘除操作一 (+-*/)
>>> mylist = [1,2,3]
>>> x = np.array(mylist)
>>> y = np.array([4,5,6]) >>> x+y
array([5, 7, 9]) >>> x-y
array([-3, -3, -3]) >>> x*y
array([ 4, 10, 18]) >>> x**2
array([1, 4, 9]) >>> x.dot(y)
32numpy中的加减乘除(例子一)
- numpy 中的加减乘除操作二:sum( )、max( )、min( )、mean( )、std( )
>>> a = np.array([1,2,3,4,5])
>>> a.sum()
15 >>> a.max()
5 >>> a.min()
1 >>> a.mean()
3.0 >>> a.std()
1.4142135623730951 >>> a.argmax()
4 >>> a.argmin()
0numpy中的加减乘除(例子二)
- 查看numpy array 的数据类型
>>> y = np.array([4,5,6])
>>> z = np.array([y, y**2])
>>> z
array([[ 4, 5, 6],
[16, 25, 36]]) >>> z.shape
(2, 3) >>> z.T.shape
(3, 2) >>> z.dtype
dtype('int64') >>> z = z.astype('f') >>> z.dtype
dtype('float32')numpy array 的数据类型
- numpy 中的索引和切片
>>> s = np.arange(13)
>>> s
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]) >>> s = np.arange(13) ** 2
>>> s
array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144]) >>> s[0],s[4],s[0:3]
(0, 16, array([0, 1, 4])) >>> s[1:5]
array([ 1, 4, 9, 16]) >>> s[-4:]
array([ 81, 100, 121, 144]) >>> s[-5:-2]
array([ 64, 81, 100])numpy索引和切片(例子一)
>>> r = np.arange(36)
>>> r.resize((6,6))
>>> r
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]]) >>> r[2,2]
14 >>> r[3,3:6]
array([21, 22, 23]) >>> r[:2,:-1]
array([[ 0, 1, 2, 3, 4],
[ 6, 7, 8, 9, 10]]) >>> r[-1,::2]
array([30, 32, 34]) >>> r[r > 30] #取r大于30的数据
array([31, 32, 33, 34, 35]) >>> re2 = r[r > 30] = 30
>>> re2
30
>>> r8 = r[:3,:3]
>>> r8 array([[ 0, 1, 2],
[ 6, 7, 8],
[12, 13, 14]]) >>> r8[:] = 0 >>> r8
array([[0, 0, 0],
[0, 0, 0],
[0, 0, 0]]) >>> r
array([[ 0, 0, 0, 3, 4, 5],
[ 0, 0, 0, 9, 10, 11],
[ 0, 0, 0, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 30, 30, 30, 30, 30]])numpy索引和切片(例子二)
- copy numpy array 的数组
>>> r = np.arange(36)
>>> r.resize((6,6))
>>> r_copy = r.copy()
>>> r
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]]) >>> r_copy
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]]) >>> r_copy[:] = 10 >>> r_copy
array([[10, 10, 10, 10, 10, 10],
[10, 10, 10, 10, 10, 10],
[10, 10, 10, 10, 10, 10],
[10, 10, 10, 10, 10, 10],
[10, 10, 10, 10, 10, 10],
[10, 10, 10, 10, 10, 10]])copy( )例子
- 其他操作
>>> test = np.random.randint(0,10,(4,3))
>>> test
array([[3, 5, 2],
[7, 7, 9],
[8, 9, 2],
[2, 9, 1]]) >>> for row in test:
... print(row)
...
[3 5 2]
[7 7 9]
[8 9 2]
[2 9 1] >>> for i in range(len(test)):
... print(test[i])
...
[3 5 2]
[7 7 9]
[8 9 2]
[2 9 1] >>> for i, row in enumerate(test):
... print('row', i, 'is', row)
...
row 0 is [3 5 2]
row 1 is [7 7 9]
row 2 is [8 9 2]
row 3 is [2 9 1] >>> test2 = test ** 2
>>> test2
array([[ 9, 25, 4],
[49, 49, 81],
[64, 81, 4],
[ 4, 81, 1]]) >>> for i,j, in zip(test,test2):
... print(i, '+', j, '=', i + j)
...
[3 5 2] + [ 9 25 4] = [12 30 6]
[7 7 9] + [49 49 81] = [56 56 90]
[8 9 2] + [64 81 4] = [72 90 6]
[2 9 1] + [ 4 81 1] = [ 6 90 2]
>>>numpy array 的其他操作例子
Series 的介绍和操作实战
如果是输入一个字典类型的话,字典的键会自动变成 Index,然后它的值是Value
from pandas import Series, DataFrame
import pandas as pd
pd.Series(['Dog','Bear','Tiger','Moose','Giraffe','Hippopotamus','Mouse'], name='Animals') #接受一个列表类型的数据
def __init__(self, data=None, index=None, dtype=None, name=None,
copy=False, fastpath=False):
Series的__init__方法
- 创建 Series 类型
第一:你可以传入一个列表或者是字典来创建 Series,如果传入的是列表,Python会自动把 [0,1,2] 作为 Series 的索引。
第二:如果你传入的是字符串类型的数据,Series 返回的dtype是object;如果你传入的是数字类型的数据,Series 返回的dtype是int64>>> from pandas import Series, DataFrame
>>> import pandas as pd
>>> animals = ['Tiger','Bear','Moose'] >>> s1 = pd.Series(animals)
>>> s1
0 Tiger
1 Bear
2 Moose
dtype: object >>> s2 = pd.Series([1,2,3])
>>> s2
0 1
1 2
2 3
dtype: int64创建 Series
Series如何处理 NaN的数据?
>>> animals2 = ['Tiger','Bear',None]
>>> s3 = pd.Series(animals2)
>>> s3
0 Tiger
1 Bear
2 None
dtype: object >>> s4 = pd.Series([1,2,None])
>>> s4
0 1.0
1 2.0
2 NaN
dtype: float64Series NaN数据(范例)
- Series 中的 NaN数据和如何检查 NaN数据是否相等,这时候需要调用 np.isnan( )方法
>>> import numpy as np
>>> np.nan == None
False >>> np.nan == np.nan
False >>> np.isnan(np.nan)
Truenp.isnan( )
- Series 默应 Index 是 [0,1,2],但也可以自定义 Series 中的Index
>>> import numpy as np
>>> sports = {
... 'Archery':'Bhutan',
... 'Golf':'Scotland',
... 'Sumo':'Japan',
... 'Taekwondo':'South Korea'
... } >>> s5 = pd.Series(sports)
>>> s5
Archery Bhutan
Golf Scotland
Sumo Japan
Taekwondo South Korea
dtype: object >>> s5.index
Index(['Archery', 'Golf', 'Sumo', 'Taekwondo'], dtype='object')自定义 Series 中的Index(例子一)
>>> from pandas import Series, DataFrame
>>> import pandas as pd
>>> s6 = pd.Series(['Tiger','Bear','Moose'], index=['India','America','Canada'])
>>> s6
India Tiger
America Bear
Canada Moose
dtype: object自定义 Series 中的Index(例子一)
- 查询 Series 的数据有两种方法,第一是通过index方法 e.g. s.iloc[2];第二是通过label方法 e.g. s.loc['America']
>>> from pandas import Series, DataFrame
>>> import pandas as pd
>>> s6
India Tiger
America Bear
Canada Moose
dtype: object >>> s6.iloc[2] #获取 index2位置的数据
'Moose' >>> s6.loc['America'] #获取 label: America 的值
'Bear' >>> s6[1] #底层调用了 s6.iloc[1]
'Bear' >>> s6['India'] #底层调用了 s6.loc['India']
'Tiger'查询Series(例子)
- Series 的数据操作: sum( ),它底层也是调用 numpy 的方法
>>> s7 = pd.Series([100.00,120.00,101.00,3.00])
>>> s7
0 100.0
1 120.0
2 101.0
3 3.0
dtype: float64 >>> total = 0
>>> for item in s7:
... total +=item
...
>>> total
324.0 >>> total2 = np.sum(s7)
>>> total2
324.0np.sum(s7)
>>> s8 = pd.Series(np.random.randint(0,1000,10000))
>>> s8.head()
0 25
1 399
2 326
3 479
4 603
dtype: int64
>>> len(s8)
10000head( )例子
- Series 也可以存储混合型数据
>>> s9 = pd.Series([1,2,3])
>>> s9.loc['Animals'] = 'Bears'
>>> s9
0 1
1 2
2 3
Animals Bears
dtype: object混合型存储数据(例子)
- Series 中的 append( ) 用法
>>> original_sports = pd.Series({'Archery':'Bhutan',
... 'Golf':'Scotland',
... 'Sumo':'Japan',
... 'Taekwondo':'South Korea'})
>>> cricket_loving_countries = pd.Series(['Australia', 'Barbados','Pakistan','England'],
... index=['Cricket','Cricket','Cricket','Cricket'])
>>> all_countries = original_sports.append(cricket_loving_countries) >>> original_sports
Archery Bhutan
Golf Scotland
Sumo Japan
Taekwondo South Korea
dtype: object >>> cricket_loving_countries
Cricket Australia
Cricket Barbados
Cricket Pakistan
Cricket England
dtype: object >>> all_countries
Archery Bhutan
Golf Scotland
Sumo Japan
Taekwondo South Korea
Cricket Australia
Cricket Barbados
Cricket Pakistan
Cricket England
dtype: objectSeries类型的append( )
DataFrame
这是创建一个DataFrame对象的基本语句:接受字典类型的数据;字典中的Key (e.g. Animals, Owners) 对应 DataFrame中的Columns,它的 Value 也相当于数据库表中的每一行数据。
data = {
'Animals':['Dog','Bear','Tiger','Moose','Giraffe','Hippopotamus','Mouse'],
'Owners':['Chris','Kevyn','Bob','Vinod','Daniel','Fil','Stephanie']
}
df = DataFrame(data, columns=['Animals','Owners'])
基础操作
- 创建DataFrame
>>> from pandas import Series, DataFrame
>>> import pandas as pd
>>> data = {'name':['yahoo','google','facebook'],
... 'marks':[200,400,800],
... 'price':[9,3,7]}
>>> df = DataFrame(data)
>>> df
marks name price
0 200 yahoo 9
1 400 google 3
2 800 facebook 7创建DataFrame(例子一)
>>> df2 = DataFrame(data, columns=['name','price','marks'])
>>> df2
name price marks
0 yahoo 9 200
1 google 3 400
2 facebook 7 800 >>> df3 = DataFrame(data, columns=['name','price','marks'], index=['a','b','c'])
>>> df3
name price marks
a yahoo 9 200
b google 3 400
c facebook 7 800 >>> df4 = DataFrame(data, columns=['name','price','marks', 'debt'], index=['a','b','c'])
>>> df4
name price marks debt
a yahoo 9 200 NaN
b google 3 400 NaN
c facebook 7 800 NaN创建DataFrame(例子二)
>>> import pandas as pd
>>> purchase_1 = pd.Series({'Name':'Chris','Item Purchased':'Dog Food','Cost':22.50})
>>> purchase_2 = pd.Series({'Name':'Kelvin','Item Purchased':'Kitty Litter','Cost':2.50})
>>> purchase_3 = pd.Series({'Name':'Vinod','Item Purchased':'Bird Seed','Cost':5.00})
>>>
>>> df = pd.DataFrame([purchase_1,purchase_2,purchase_3],index=['Store 1','Store 2','Store 1'])
>>> df
Cost Item Purchased Name
Store 1 22.5 Dog Food Chris
Store 2 2.5 Kitty Litter Kelvin
Store 1 5.0 Bird Seed Vinod创建DataFrame(例子三)
- 查询 dataframe 的index:df.loc['index']
>>> df.loc['Store 2']
Cost 2.5
Item Purchased Kitty Litter
Name Kelvin
Name: Store 2, dtype: objectdf.loc['Store 2']
>>> df.loc['Store 1']
Cost Item Purchased Name
Store 1 22.5 Dog Food Chris
Store 1 5.0 Bird Seed Vinoddf.loc['Store 1']
>>> df['Item Purchased']
Store 1 Dog Food
Store 2 Kitty Litter
Store 1 Bird Seed
Name: Item Purchased, dtype: objectdf['Item Purchased']
- 查 store1 的 cost 是多少
>>> df.loc['Store 1', 'Cost']
Store 1 22.5
Store 1 5.0
Name: Cost, dtype: float64df.loc['Store 1', 'Cost']
- 查询Cost大于3的Name
>>> df['Name'][df['Cost']>3]
Store 1 Chris
Store 1 Vinod
Name: Name, dtype: objectdf['Name'][df['Cost']>3]
- 查询DataFrame 的类型
>>> type(df.loc['Store 2'])
<class 'pandas.core.series.Series'>type( )例子
- drop dataframe (但这不会把原来的 dataframe drop 掉)
>>> df.drop('Store 1')
Cost Item Purchased Name
Store 2 2.5 Kitty Litter Kelvin >>> df
Cost Item Purchased Name
Store 1 22.5 Dog Food Chris
Store 2 2.5 Kitty Litter Kelvin
Store 1 5.0 Bird Seed Vinoddf.drop('Store 1')
>>> copy_df = df.copy()
>>> copy_df
Cost Item Purchased Name
Store 1 22.5 Dog Food Chris
Store 2 2.5 Kitty Litter Kelvin
Store 1 5.0 Bird Seed Vinod
>>> copy_df = df.drop('Store 1')
>>> copy_df
Cost Item Purchased Name
Store 2 2.5 Kitty Litter Kelvin把dataframe数据drop的例子
也可以用 del 把 Column 列删除掉
>>> del copy_df['Name']
>>> copy_df
Cost Item Purchased
Store 2 2.5 Kitty Litterdel copy_df['Name']
- set_index
- rename column
- 可以修改dataframe里的数据
>>> df = pd.DataFrame([purchase_1,purchase_2,purchase_3],index=['Store 1','Store 2','Store 1'])
>>> df
Cost Item Purchased Name
Store 1 22.5 Dog Food Chris
Store 2 2.5 Kitty Litter Kelvin
Store 1 5.0 Bird Seed Vinod >>> df['Cost'] = df['Cost'] * 0.8
>>> df
Cost Item Purchased Name
Store 1 18.0 Dog Food Chris
Store 2 2.0 Kitty Litter Kelvin
Store 1 4.0 Bird Seed Vinoddf['Cost'] * 0.8
>>> df = pd.DataFrame([purchase_1,purchase_2,purchase_3],index=['Store 1','Store 2','Store 1'])
>>> costs = df['Cost']
>>> costs
Store 1 22.5
Store 2 2.5
Store 1 5.0
Name: Cost, dtype: float64
>>> costs += 2
>>> costs
Store 1 24.5
Store 2 4.5
Store 1 7.0
Name: Cost, dtype: float64costs = df['Cost']
进阶操作
- Merge
Full Outer Join
Inner Join
Left Join
Right Join - apply
- group by
- agg
- astype
- cut
s = pd.Series([168, 180, 174, 190, 170, 185, 179, 181, 175, 169, 182, 177, 180, 171])
pd.cut(s, 3)
pd.cut(s, 3, labels=['Small', 'Medium', 'Large'])cut( )
- pivot table
Date in DataFrame
- Timestampe
- period
- DatetimeINdex
- PeriodIndex
- to_datetime
- Timedelta
- date_range
- difference between date value
- resample
- asfreq - changing the frequency of the date
读取 csv 文件
import pandas as pd
pd.read_csv('student.csv')
- 读取csv
>>> from pandas import Series, DataFrame
>>> import pandas as pd
>>> df_student = pd.read_csv('student.csv')
>>> df_student
name class marks age
janice python 80 22
alex python 95 21
peter python 85 25
ken java 75 28
lawerance java 50 22pd.read_csv('student.csv')(例子一)
df_student = pd.read_csv('student.csv', index_col=0, skiprows=1)
pd.read_csv('student.csv')(例子二)
- 获取分数大于70的数据
>>> df_student['marks'] > 70
True
True
True
True
False
Name: marks, dtype: bool方法一: df_student['marks'] > 70
>>> df_student.where(df_student['marks']>70)
name class marks age
janice python 80.0 22.0
alex python 95.0 21.0
peter python 85.0 25.0
ken java 75.0 28.0
NaN NaN NaN NaN方法二: df_student.where(df_student['marks']>70)
>>> df_student[df_student['marks'] > 70]
name class marks age
0 janice python 80 22
1 alex python 95 21
2 peter python 85 25
3 ken java 75 28方法三: df_student[df_student['marks'] > 70]
- 获取class = 'python' 的数据,df.count( ) 是不会把 NaN数据计算在其中
>>> df2 = df_student.where(df_student['class'] == 'python')
>>> df2
name class marks age
0 janice python 80.0 22.0
1 alex python 95.0 21.0
2 peter python 85.0 25.0
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN >>> df2 = df_student[df_student['class'] == 'python']
>>> df2
name class marks age
0 janice python 80 22
1 alex python 95 21
2 peter python 85 25df_student.where( )例子
- 计算 class 的数目 e.g. count( )
>>> df2['class'].count() #不会把 NaN也计算
3 >>> df_student['class'].count() #会把 NaN也计算
5df.count( )例子
- 删取NaN数据
>>> df3 = df2.dropna()
>>> df3
name class marks age
0 janice python 80.0 22.0
1 alex python 95.0 21.0
2 peter python 85.0 25.0df2.dropna()
- 获取age大于23 学生的数据
>>> df_student
name class marks age
0 janice python 80 22
1 alex python 95 21
2 peter python 85 25
3 ken java 75 28
4 lawerance java 50 22 >>> df_student[df_student['age'] > 23]
name class marks age
2 peter python 85 25
3 ken java 75 28 >>> df_student['age'] > 23
0 False
1 False
2 True
3 True
4 False
Name: age, dtype: bool >>> len(df_student[df_student['age'] > 23])
2df_student[df_student['age'] > 23]
- 获取age大于23和分数大于80分学生的数据
>>> df_student
name class marks age
0 janice python 80 22
1 alex python 95 21
2 peter python 85 25
3 ken java 75 28
4 lawerance java 50 22
>>> df_and = df_student[(df_student['age'] > 23) & (df_student['marks'] > 80)]
>>> df_and
name class marks age
2 peter python 85 25df_student[(df_student['age'] > 23) & (df_student['marks'] > 80)]
- 获取age大于23或分数大于80分学生的数据
>>> df_student
name class marks age
0 janice python 80 22
1 alex python 95 21
2 peter python 85 25
3 ken java 75 28
4 lawerance java 50 22 >>> df_or = df_student[(df_student['age'] > 23) | (df_student['marks'] > 80)]
>>> df_or
name class marks age
1 alex python 95 21
2 peter python 85 25
3 ken java 75 28df_student[(df_student['age'] > 23) | (df_student['marks'] > 80)]
- 重新定义index的数值 df.set_index( )
>>> df_student = pd.read_csv('student.csv')
>>> df_student
name class marks age
0 janice python 80 22
1 alex python 95 21
2 peter python 85 25
3 ken java 75 28
4 lawerance java 50 22 >>> df_student['order_id'] = df_student.index
>>> df_student
name class marks age order_id
0 janice python 80 22 0
1 alex python 95 21 1
2 peter python 85 25 2
3 ken java 75 28 3
4 lawerance java 50 22 4 >>> df_student = df_student.set_index('class')
>>> df_student
name marks age order_id
class
python janice 80 22 0
python alex 95 21 1
python peter 85 25 2
java ken 75 28 3
java lawerance 50 22 4df_student.set_index( )例子
- 获取在 dataframe column 中唯一的数据
>>> df_student = pd.read_csv('student.csv')
>>> df_student['class'].unique()
array(['python', 'java'], dtype=object)df.unique( )例子
python 的可视化 matplotlib
- plot
參考資料
Coursera: Introduction to Data Science in Python
Data Science: GoodHart's Law | Goodhart's Law
Python 数据科学系列 の Numpy、Series 和 DataFrame介绍的更多相关文章
- Python数据科学手册-Numpy入门
通过Python有效导入.存储和操作内存数据的技巧 数据来源:文档.图像.声音.数值等等,将所有的数据简单的看做数字数组 非常有助于 理解和处理数据 不管数据是何种形式,第一步都是 将这些数据转换成 ...
- [python]-数据科学库Numpy学习
一.Numpy简介: Python中用列表(list)保存一组值,可以用来当作数组使用,不过由于列表的元素可以是任何对象,因此列表中所保存的是对象的指针.这样为了保存一个简单的[1,2,3],需要有3 ...
- Python数据科学手册-Numpy的结构化数组
结构化数组 和 记录数组 为复合的.异构的数据提供了非常有效的存储 (一般使用pandas 的 DataFrame来实现) 传入的dtpye 使用 Numpy数据类型 Character Descri ...
- Python数据科学手册-Numpy数组的排序
1) Numpy中的快速排序: np.sort 和 np.argsort np.sort 是快速排序,算法复杂度 O[ N log N] ,也可以选择归并排序和堆排序 如果不想修改原始输入数组,返 ...
- Python数据科学手册-Numpy数组的计算:比较、掩码和布尔逻辑,花哨的索引
Numpy的通用函数可以用来替代循环, 快速实现数组的逐元素的 运算 同样,使用其他通用函数实现数组的逐元素的 比较 < > 这些运算结果 是一个布尔数据类型的数组. 有6种标准的比较操作 ...
- Python数据科学手册-Numpy数组的计算,通用函数
Python的默认实现(CPython)处理某些操作非常慢,因为动态性和解释性, CPython 在每次循环必须左数据类型的检查和函数的调度..在编译是进行这样的操作.就会加快执行速度. 通用函数介绍 ...
- Python数据科学手册-Numpy数组的计算:广播
广播可以简单理解为用于不同大小数组的二元通用函数(加减乘等)的一组规则 二元运算符是对相应元素逐个计算 广播允许这些二元运算符可以用于不同大小的数组 更高维度的数组 更复杂的情况,对俩个数组的同时广播 ...
- python书籍推荐:Python数据科学手册
所属网站分类: 资源下载 > python电子书 作者:today 链接:http://www.pythonheidong.com/blog/article/448/ 来源:python黑洞网 ...
- 干货!小白入门Python数据科学全教程
前言 本文讲解了从零开始学习Python数据科学的全过程,涵盖各种工具和方法 你将会学习到如何使用python做基本的数据分析 你还可以了解机器学习算法的原理和使用 说明 先说一段题外话.我是一名数据 ...
随机推荐
- 简单理解OpenGL模型视图变换
前几天学习了OpenGL的绘图原理(其实就是坐标的不停变换变换),看到网上有个比较好的例程,于是学习了下,并在自己感兴趣的部分做了注释. 首先通过glMatrixMode(GL_MODELVIEW)设 ...
- [Maven实战](7)坐标
1. 简单介绍 maven的世界中拥有数量很巨大的构件,也就是平时用的一些jar,war等文件. 在maven为这些构件引入坐标概念之前,我们无法使用不论什么一种方式来唯一标识全部这些构件. 因此,当 ...
- 设置Eclipse的workspace路径
首次启动Eclipse/MyEclipse时, 会弹出"Workspace Launcher"对话框, 提示设置Workspace路径. 设定好路径后, 若勾选了"Use ...
- ZOJ ACM 1204 (JAVA)
毕业好几年了,对算法还是比較有兴趣,所以想又一次開始做ACM题.俺做题比較任意,一般先挑通过率高的题来做. 第1204题,详细描写叙述请參考,ZOJ ACM 1204 1)难度分析 这个题目,基本的难 ...
- 在Azure Container Service创建Kubernetes(k8s)群集运行ASP.NET Core跨平台应用程序
引子 在此前的一篇文章中,我介绍了如何在本地docker环境中运行ASP.NET Core跨平台应用程序(http://www.cnblogs.com/chenxizhang/p/7148657.ht ...
- 自学Python5.1-模块简介
模块简介 在C语言中如果要引用sqrt这个函数,必须用语句"#include<math.h>"引入math.h这个头文件,否则是无法正常进行调用的.那么在Python中 ...
- Effective Java 第三版——7. 消除过期的对象引用
Tips <Effective Java, Third Edition>一书英文版已经出版,这本书的第二版想必很多人都读过,号称Java四大名著之一,不过第二版2009年出版,到现在已经将 ...
- python filter map reduce
filter(function, iterable): Construct a list from those elements of iterable for which function retu ...
- Java之路上,让我们Stand Up Again
在开始之前,先发表一下个人想法吧. 在读书的时候每天忙的不可开交,也就没有了所谓的自由,突然参加工作,传统的朝八晚五,标准的八小时工作制,每天都是两点一线,工作中涉及商业机密,公司的东西也不能带回家, ...
- 《编程珠玑(第2版)》【PDF】下载
<编程珠玑(第2版)>[PDF]下载链接: https://u253469.pipipan.com/fs/253469-230382225 内容简介 书的内容围绕程序设计人员面对的一系列实 ...