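Today's topics

Boolean selectors
Indexing
Data alignment
Data operations (create, delete, update, read)
Arithmetic methods
DataFrame (Excel-style tabular data)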

Boolean selectors

import numpy as np

import pandas as pd

res = pd.Series([True,False,False,True,False])

price = pd.Series([321321,123,324,5654,645])

# Must know

price[res]

0    321321

3      5654

dtype: int64

# Good to know

price|res

0    True

1    True

2    True

3    True

4    True

dtype: bool

price&res

0     True

1    False

2    False

3    False

4    False

dtype: bool

# Must know

(price > 100) & (price < 700)

0    False

1     True

2     True

3    False

4     True

dtype: bool

price[(price > 100) & (price < 700)]

1    123

2    324

4    645

dtype: int64
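The filtering pattern above can be sketched a little further. This is a minimal example, assuming the same price data; the names in_range, extreme and not_in_range are made up for illustration. It combines conditions with & (and), | (or) and ~ (not).

```python
import pandas as pd

price = pd.Series([321321, 123, 324, 5654, 645])

# Each comparison yields a boolean Series; wrap each one in parentheses
# because & and | bind more tightly than > and <.
in_range = (price > 100) & (price < 700)      # both conditions hold
extreme = (price < 200) | (price > 100000)    # either condition holds
not_in_range = ~in_range                      # logical negation

selected = price[in_range]
print(selected.tolist())             # [123, 324, 645]
print(price[extreme].tolist())       # [321321, 123]
print(price[not_in_range].tolist())  # [321321, 5654]
```

Building the mask first and giving it a name keeps long filters readable, and the same mask can be reused on other Series that share the index.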

Indexes and labels

res1 = pd.Series({'a':111,'b':222,'c':333,'d':444,'e':555})

res1

a    111

b    222

c    333

d    444

e    555

dtype: int64

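# Access by position (recent pandas versions deprecate integer positions in [] on a labeled Series; res1.iloc[0] is the explicit form)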

res1[0]

111

# Access by label

res1['a']

111

# Get all the labels

res1.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

# Give the index a name

res1.index.name = 'STA'

res1

STA

a    111

b    222

c    333

d    444

e    555

dtype: int64

# date_range: generate dates at a fixed interval

res2 = pd.date_range('2020-01-01','2020-12-01',freq='M')  # freq sets the interval (year 'Y', month 'M', day 'D')

res2

DatetimeIndex(['2020-01-31', '2020-02-29', '2020-03-31', '2020-04-30',

               '2020-05-31', '2020-06-30', '2020-07-31', '2020-08-31',

               '2020-09-30', '2020-10-31', '2020-11-30'],

              dtype='datetime64[ns]', freq='M')

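# Dates can also be used as the labels of a Series

res3 = pd.Series([111,222,333,444,555],index=res2[:5])  # take the first five dates so the index length matches the data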

res3

2020-01-31    111

2020-02-29    222

2020-03-31    333

2020-04-30    444

2020-05-31    555

Freq: M, dtype: int64

res3.index.name = 'date'

res3

date

2020-01-31    111

2020-02-29    222

2020-03-31    333

2020-04-30    444

2020-05-31    555

Freq: M, dtype: int64
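Once dates are the labels, they can be used for lookups and slices. Here is a rough sketch, assuming a daily index built with periods= instead of an end date; the names days and sales are hypothetical and not from the notes above.

```python
import pandas as pd

# Five consecutive days; periods= fixes the length so it matches the data.
days = pd.date_range('2020-01-01', periods=5, freq='D')
sales = pd.Series([111, 222, 333, 444, 555], index=days)
sales.index.name = 'date'

# Label-based lookup with a Timestamp.
first = sales.loc[pd.Timestamp('2020-01-01')]

# Label slices on a sorted DatetimeIndex include both endpoints.
window = sales.loc['2020-01-02':'2020-01-04']

print(first)            # 111
print(window.tolist())  # [222, 333, 444]
```

Note that, unlike positional slices, the label slice keeps the end date.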

Integer indexes

x1 = pd.Series(np.arange(11))

x1

0      0

1      1

2      2

3      3

4      4

5      5

6      6

7      7

8      8

9      9

10    10

dtype: int32

x2 = x1[4:]

x2

4      4

5      5

6      6

7      7

8      8

9      9

10    10

dtype: int32

##################################################################################################

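# Access by index

# x2[1]  # raises KeyError: with an integer index, [] treats 1 as a label, and x2's labels start at 4

'''For lookups, use the explicit accessors below from now on'''

# iloc selects by position

# loc selects by label

# x1.iloc[1] # 1

x1.loc[3]  # 3

'''Very important: be sure to memorize this'''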

###################################################################################################
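The difference between the two accessors is easiest to see on the sliced Series, where labels and positions no longer agree. This is a small sketch, assuming the same x1 as above; x1.iloc[4:] is used here as the explicit positional form of x1[4:].

```python
import numpy as np
import pandas as pd

x1 = pd.Series(np.arange(11))
x2 = x1.iloc[4:]          # labels are now 4..10, positions are 0..6

by_position = x2.iloc[1]  # second element by position
by_label = x2.loc[4]      # element whose label is 4

try:
    x2.loc[1]             # no label 1 exists in x2
    missing_label = False
except KeyError:
    missing_label = True

print(by_position, by_label, missing_label)  # 5 4 True
```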

Data alignment

a1 = pd.Series([12,23,34,45],index=['c','a','d','b'])

a2 = pd.Series([11,20,10,30],index=['d','c','a','b'])

a1 + a2

Result:

a    33

b    75

c    32

d    45

dtype: int64

# Thanks to this index alignment, two Series objects can be combined directly in arithmetic

a3 = pd.Series([11,20,10,14],index=['d','c','a','e'])

a1 + a3

Result:

a    33.0

b     NaN

c    32.0

d    45.0

e     NaN

dtype: float64

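# a1 and a3 do not share all of their labels, so the labels found in only one of them (b and e) cannot be computed and come back as NaN, a missing value

'''

Question: why does the dtype change from int64 to float64 after the operation?

Because NaN is itself a float

type(np.nan)

Result: float

'''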
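The alignment and dtype change described above can be checked directly. The sketch below assumes the same a1 and a3; the names total and shared are made up for illustration.

```python
import numpy as np
import pandas as pd

a1 = pd.Series([12, 23, 34, 45], index=['c', 'a', 'd', 'b'])
a3 = pd.Series([11, 20, 10, 14], index=['d', 'c', 'a', 'e'])

total = a1 + a3   # aligned on the union of labels: a, b, c, d, e

print(isinstance(np.nan, float))           # True
print(total.dtype)                         # float64
print(total[total.isna()].index.tolist())  # ['b', 'e']

# Dropping the NaN rows keeps only labels present in both Series;
# the values are whole numbers again, so casting back to int is safe here.
shared = total.dropna().astype('int64')
print(shared.to_dict())                    # {'a': 33, 'c': 32, 'd': 45}
```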

Data operations

'''Create, delete, update, read'''

a3= pd.Series([11,20,10,14],index=['d','c','a','e'])

a3

d    11

c    20

a    10

e    14

dtype: int64

# Read

a3.loc['a']

10

# Update

a3.iloc[2]= 100

a3

d     11

c     20

a    100

e     14

dtype: int64

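# Create

# Option 1: append does not modify the original data (Series.append was removed in pandas 2.0; pd.concat is the replacement)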

a3.append(pd.Series([66],index=['e']))

d     11

c     20

a    100

e     14

e     66

dtype: int64

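# Option 2: set_value modifies the original data in place (later pandas versions removed it in favor of .at[], as the warning below says)

a3.set_value('f',999)  # emits a warning; suppressing it requires configuration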

C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: set_value is deprecated and will be removed in a future release. Please use .at[] or .iat[] accessors instead

d     11

c     20

a    100

e     14

f    999

dtype: int64

a3

d     11

c     20

a    100

e     14

f    999

dtype: int64

# Delete: the del keyword also acts on the original data

del a3['f']

a3

d     11

c     20

a    100

e     14

dtype: int64
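The append and set_value calls above only run on old pandas releases. As a hedged sketch of how the same four operations might look on current versions, the example below uses pd.concat, .loc, .at and drop; the names extended and trimmed are hypothetical.

```python
import pandas as pd

a3 = pd.Series([11, 20, 10, 14], index=['d', 'c', 'a', 'e'])

# Create (copy): pd.concat returns a new Series and leaves a3 untouched.
extended = pd.concat([a3, pd.Series([66], index=['e'])])

# Create (in place): assigning to a new label through .loc adds it.
a3.loc['f'] = 999

# Update (in place): .at sets a single value by label.
a3.at['a'] = 100

# Delete (copy): drop returns a new Series without the label.
trimmed = a3.drop('f')

print(extended.index.tolist())  # ['d', 'c', 'a', 'e', 'e']
print(a3.to_dict())             # {'d': 11, 'c': 20, 'a': 100, 'e': 14, 'f': 999}
print(trimmed.index.tolist())   # ['d', 'c', 'a', 'e']
```

As with append, concatenation does not deduplicate labels, so 'e' appears twice in the result.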

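Flexible arithmetic methods

"""

For arithmetic such as addition, subtraction, multiplication and division

you can use the operators directly

or use the equivalent methods (which offer extra options)

add

sub

div

mul

"""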

b1 = pd.Series([12,23,34], index=['c','a','b'])

b3 = pd.Series([11,20,10,14], index=['d','c','a','b'])

b1

c    12

a    23

b    34

dtype: int64

b3

d    11

c    20

a    10

b    14

dtype: int64

tes = b1 + b3

tes

a    33.0

b    48.0

c    32.0

d     NaN

dtype: float64

tes1 = b1*b3

tes1

a    230.0

b    476.0

c    240.0

d      NaN

dtype: float64

b1.add(b3,fill_value=666)

b1

c    12

a    23

b    34

dtype: int64

b3

d    11

c    20

a    10

b    14

dtype: int64

fill_value

b1.add(b3,fill_value=0) # before computing, a label missing from one of the two Series is filled with fill_value, then the operation runs

a    33.0

b    48.0

c    32.0

d    11.0

dtype: float64

b1.mul(b3,fill_value=1)

a    230.0

b    476.0

c    240.0

d     11.0

dtype: float64
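To make the effect of fill_value concrete, here is a small side-by-side comparison, assuming the same b1 and b3; plain and filled are illustrative names only.

```python
import numpy as np
import pandas as pd

b1 = pd.Series([12, 23, 34], index=['c', 'a', 'b'])
b3 = pd.Series([11, 20, 10, 14], index=['d', 'c', 'a', 'b'])

plain = b1 + b3                    # 'd' is missing from b1, so the result is NaN
filled = b1.add(b3, fill_value=0)  # 'd' is treated as 0 in b1

print(np.isnan(plain['d']))  # True
print(filled['d'])           # 11.0

# The method forms exist for the other operators too.
print(b1.sub(b3, fill_value=0)['d'])  # -11.0
print(b1.mul(b3, fill_value=1)['d'])  # 11.0
print(b1.div(b3, fill_value=1)['a'])  # 2.3
```

Choosing 0 for add and sub, and 1 for mul and div, keeps the existing value unchanged where the other side is missing.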

DataFrame

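A tabular data structure, comparable to a two-dimensional array. It holds an ordered set of columns and can also be viewed as a group of Series that share one index.

Basic usage

# There are many ways to create a DataFrame, but in practice we rarely build one by hand; we usually load an Excel file straight into a DataFrame

# Option 1: pass a dict; its keys become the column names, and the row labels default to integer positions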

import numpy as np

import pandas as pd

res = pd.DataFrame({'one':[1,2,3,4],'two':[4,3,2,1]})

res

   one  two

0    1    4

1    2    3

2    3    2

3    4    1

# Selecting values

res['one']  # a single column comes back as a Series

0    1

1    2

2    3

3    4

Name: one, dtype: int64

res[['two']] # wrapping the name in another [] returns a DataFrame

   two

0    4

1    3

2    2

3    1

res['two'][1] # the first brackets select the column, the second select the row

3
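The three kinds of selection can be compared in one place. The sketch below assumes the same res table; it also shows .loc and .iloc as a single-step way to reach one cell, which is an addition here and not something the notes above rely on.

```python
import pandas as pd

res = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [4, 3, 2, 1]})

column = res['two']         # Series
table = res[['two']]        # one-column DataFrame
cell = res.loc[1, 'two']    # row label 1, column 'two', in one step
same_cell = res.iloc[1, 1]  # row position 1, column position 1

print(type(column).__name__, type(table).__name__)  # Series DataFrame
print(cell, same_cell)                              # 3 3
```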

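# Option 2: pass Series as the dict values; if the Series have custom labels, those labels become the DataFrame's row labels, and the values are aligned on them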

res1 = pd.DataFrame({'one':pd.Series([1,2,3],index=['c','b','a']),'two':pd.Series([1,2,3],index=['b','a','c'])})

res1

   one  two

a    3    2

b    2    1

c    1    3

# Option 3: set rows and columns explicitly: index for the rows, columns for the columns

pd.DataFrame(np.array([[10,20,30],[40,50,60]]),index=['a','b'],columns=['c1','c2','c3'])

   c1  c2  c3

a  10  20  30

b  40  50  60

arange

# Option 4: each element of the list becomes one row; without explicit rows and columns, both default to integer positions

pd.DataFrame([np.arange(1,8),np.arange(11,18),np.arange(21,28)])

    0   1   2   3   4   5   6

0   1   2   3   4   5   6   7

1  11  12  13  14  15  16  17

2  21  22  23  24  25  26  27

# Option 5: values are matched to their row labels automatically; entries with no match become NaN (missing values)

s1 = pd.Series(np.arange(1,9,2))

s2 = pd.Series(np.arange(2,10,2))

s3 = pd.Series(np.arange(5,7),index=[1,2])

s1

0    1

1    3

2    5

3    7

dtype: int32

s2

0    2

1    4

2    6

3    8

dtype: int32

s3

1    5

2    6

dtype: int32

df5 = pd.DataFrame({'c1':s1,'c2':s2,'c3':s3})

df5

   c1  c2   c3

0   1   2  NaN

1   3   4  5.0

2   5   6  6.0

3   7   8  NaN

'''The creation options above are for reference only, because at work the data in a DataFrame usually comes from reading external files'''
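Since the note above says real data usually comes from external files, here is a minimal stand-in for that workflow. It is only a sketch: the CSV text, the column names and the variable csv_text are all made up, and pd.read_csv over an in-memory buffer is used so the example runs without any file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for an external file.
csv_text = "name,price\napple,3\nbanana,2\n"

df = pd.read_csv(io.StringIO(csv_text))

print(df.columns.tolist())  # ['name', 'price']
print(df.shape)             # (2, 2)
print(df['price'].sum())    # 5
```

The header row becomes the column names and the row labels default to integer positions, the same as in Option 1.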

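Common attributes and methods

1. index     row index

2. columns   column index

3. T         transpose

4. values    the data as a two-dimensional array

5. describe  quick summary statistics

# index returns the row index

df5.index

Int64Index([0, 1, 2, 3], dtype='int64')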

# columns returns the column index

df5.columns

Index(['c1', 'c2', 'c3'], dtype='object')

# T transposes the table: rows and columns swap

df5.T

      0    1    2    3

c1  1.0  3.0  5.0  7.0

c2  2.0  4.0  6.0  8.0

c3  NaN  5.0  6.0  NaN

df5

   c1  c2   c3

0   1   2  NaN

1   3   4  5.0

2   5   6  6.0

3   7   8  NaN

values

# values returns the table data as a two-dimensional array

df5.values

array([[ 1.,  2., nan],

       [ 3.,  4.,  5.],

       [ 5.,  6.,  6.],

       [ 7.,  8., nan]])

# describe computes common summary statistics

df5.describe()

             c1        c2        c3

count  4.000000  4.000000  2.000000

mean   4.000000  5.000000  5.500000

std    2.581989  2.581989  0.707107

min    1.000000  2.000000  5.000000

25%    2.500000  3.500000  5.250000

50%    4.000000  5.000000  5.500000

75%    5.500000  6.500000  5.750000

max    7.000000  8.000000  6.000000
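The attributes above can also be read programmatically, without relying on the printed tables. This is a short sketch, assuming the same df5; to_numpy() is used as the method form of values, and stats is an illustrative name.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(1, 9, 2))
s2 = pd.Series(np.arange(2, 10, 2))
s3 = pd.Series(np.arange(5, 7), index=[1, 2])
df5 = pd.DataFrame({'c1': s1, 'c2': s2, 'c3': s3})

print(df5.index.tolist())    # [0, 1, 2, 3]
print(df5.columns.tolist())  # ['c1', 'c2', 'c3']
print(df5.T.shape)           # (3, 4)
print(df5.to_numpy().shape)  # (4, 3)

# describe returns a DataFrame, so single statistics can be picked out with loc.
stats = df5.describe()
print(stats.loc['count', 'c3'])  # 2.0
print(stats.loc['mean', 'c1'])   # 4.0
```

The count of 2 for c3 shows that describe skips the NaN entries.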
