Data Processing in Python with pandas (Part 2)

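14 Sampling

 df.sample(10, replace=True) # draw 10 rows with replacement

 df.sample(3) # draw 3 rows without replacement

 df.sample(frac=0.5) # sample a fraction of the rows

 df.sample(frac=10, replace=True, weights=np.random.randint(1, 10, 6)) # weighted sampling: one weight per row (df has 6 rows); frac > 1 requires replace=True

 df.sample(3, axis=1) # sample columns (variables) instead of rows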
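The draws above change on every run. A minimal sketch of making them repeatable with the random_state argument, and of how weights steer the draw; the frame `demo` and its columns are made up for illustration and are not the df from Part 1:

```python
import numpy as np
import pandas as pd

# Hypothetical 6-row frame standing in for df
demo = pd.DataFrame({"A": np.arange(6), "B": list("abcdef")})

# The same random_state gives the same rows
s1 = demo.sample(3, random_state=0)
s2 = demo.sample(3, random_state=0)

# Weights are normalized internally; a weight of 0 means the row is never drawn
weights = [0, 0, 1, 1, 1, 1]
weighted = demo.sample(20, replace=True, weights=weights, random_state=0)

print(s1.equals(s2))
print(sorted(weighted["A"].unique()))
```

Passing random_state per call keeps the result independent of whatever else has consumed NumPy's global random state.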

15 Join (i.e. merge)

 pd.merge(df.sample(4), df.sample(4), how="left", on="A", indicator=True) # indicator=True adds a _merge column showing which side each row came from
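Because the one-liner above merges two random samples, its output differs between runs. The following sketch uses two small hand-made frames (`left` and `right` are hypothetical names) to show what the _merge column contains for a left join:

```python
import pandas as pd

left = pd.DataFrame({"A": [1, 2, 3], "x": ["a", "b", "c"]})
right = pd.DataFrame({"A": [2, 3, 4], "y": ["p", "q", "r"]})

# Left join: every row of `left` is kept; unmatched rows get NaN in `y`
merged = pd.merge(left, right, how="left", on="A", indicator=True)

print(merged)
```

Rows whose key exists only in the left frame are marked left_only, and matched rows are marked both. With how="left", right_only never appears.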

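16 Random numbers

np.random.rand(3, 2) # uniform random numbers in [0, 1) with the given shape

np.random.randn(2, 5) # standard normal random numbers with the given shape

np.random.randint(2, size=10) # randint(low[, high, size]) returns random integers in [low, high); if high is omitted the range is [0, low); size defaults to None, which returns a single integer

np.random.bytes(10) # random bytes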

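a = np.arange(10)

np.random.shuffle(a) # shuffle in place

a = np.arange(9).reshape(3, 3)

np.random.shuffle(a) # for a multi-dimensional array, only the first axis is shuffled

np.random.permutation(10) # random permutation of arange(10); arrays are accepted too and are copied, then shuffled along the first axis

np.random.permutation(10).reshape(2, 5)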
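The difference between shuffle and permutation is easy to miss: shuffle modifies its argument and returns None, while permutation returns a new array. A small sketch that checks both, and confirms that rows of a 2-D array move as whole units:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
original = a.copy()

# permutation returns a shuffled copy; `a` itself is untouched
p = np.random.permutation(a)
unchanged = np.array_equal(a, original)

# shuffle works in place and returns None
result = np.random.shuffle(a)

# Only the first axis is shuffled, so every row is still one of the original rows
rows_intact = sorted(a.tolist()) == sorted(original.tolist())

print(unchanged, result, rows_intact)
```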

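np.random.seed(1000) # set the seed so results are reproducible

np.random.normal(2, 3, [5, 2]) # Gaussian with mean 2 and standard deviation 3, shape (5, 2); other distributions are listed at the link below

# http://docs.scipy.org/doc/numpy-1.10.1/reference/routines.random.html

import scipy.stats

np.random.seed(12345678)

x = scipy.stats.norm.rvs(loc=5, scale=3, size=100) # scipy can generate these random numbers too, and also provides statistical tests

scipy.stats.shapiro(x) # Shapiro-Wilk normality test; returns the test statistic and the p-value

# http://docs.scipy.org/doc/scipy-0.17.0/reference/stats.html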
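As an alternative to seeding NumPy's global state, rvs also takes a random_state argument. The sketch below uses that argument (a swapped-in variant, not the call used above) to draw the same sample twice and then run the Shapiro-Wilk test on it:

```python
import numpy as np
import scipy.stats

# Same random_state, same sample
x1 = scipy.stats.norm.rvs(loc=5, scale=3, size=100, random_state=12345678)
x2 = scipy.stats.norm.rvs(loc=5, scale=3, size=100, random_state=12345678)
same = np.array_equal(x1, x2)

# Shapiro-Wilk: the null hypothesis is that the data come from a normal distribution
stat, p_value = scipy.stats.shapiro(x1)

print(same)
print(stat, p_value)
```

A small p-value would be evidence against normality. A large one only means the test found no evidence against it.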

17 gather and spread

 # gather:

 def gather(df, key, value, cols):
     # columns not listed in cols are kept as identifier columns
     id_vars = [col for col in df.columns if col not in cols]
     return pd.melt(df, id_vars=id_vars, value_vars=cols, var_name=key, value_name=value)

 # The function above is only a thin wrapper around pd.melt. Wide to long is gather; long to wide is spread.

 pd.melt(df, id_vars=['E','F'], value_vars=['A','C'])

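 # spread:

 df.pivot(index="D", columns="E", values="F") # long to wide; current pandas takes column names, and the older pd.pivot(index, columns, values) form that took three Series no longer works

 df3 = df2.pivot(index='D', columns='variable', values='value') # assumes df2 is a melted frame with columns D, variable and value

 df3.reset_index(level=0, inplace=True) # turn the index back into a column so the result looks like df again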
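To make the gather/spread relationship concrete, here is a round trip on a tiny made-up frame (`wide` and its columns are hypothetical). Note that pivot needs each (index, columns) pair to be unique, which holds here:

```python
import pandas as pd

wide = pd.DataFrame({"id": [1, 2], "A": [10, 20], "C": [30, 40]})

# gather: wide to long, one row per (id, variable) pair
long = pd.melt(wide, id_vars=["id"], value_vars=["A", "C"],
               var_name="variable", value_name="value")

# spread: long to wide again
back = long.pivot(index="id", columns="variable", values="value").reset_index()
back.columns.name = None  # drop the leftover "variable" label on the column axis

print(long)
print(back)
```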

18 Entropy

 scipy.stats.entropy(np.arange(10)) # Shannon entropy in nats; the input is normalized to sum to 1 first
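Since entropy normalizes its input, raw counts can be passed directly. The sketch below reproduces the result by hand with the definition $H = -\sum_i p_i \ln p_i$, skipping zero-probability entries:

```python
import numpy as np
import scipy.stats

counts = np.arange(10)

# Normalize to probabilities, then apply -sum(p * log(p)) over the nonzero entries
p = counts / counts.sum()
nonzero = p[p > 0]
manual = -np.sum(nonzero * np.log(nonzero))

auto = scipy.stats.entropy(counts)

# Sanity check: a uniform distribution over 4 outcomes has entropy log(4)
uniform = scipy.stats.entropy(np.ones(4))

print(manual, auto, uniform)
```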

19 String concatenation

 [",".join(['a','b','d'])]

 df[['E','F']].groupby('F')['E'].apply(lambda x: "{%s}" % ', '.join(x)) # concatenate within each group; the joined column must contain strings

 df[['E','F']].astype(str).groupby('E')['F'].apply(lambda x: "%s" % ', '.join(x)) # so cast everything to str first
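A self-contained sketch of the grouped concatenation, on a made-up frame whose E column is numeric and therefore has to be cast before joining:

```python
import pandas as pd

demo = pd.DataFrame({"F": ["u", "u", "v"], "E": [1, 2, 3]})

# Cast to str, then join the E values within each F group
joined = demo.astype(str).groupby("F")["E"].apply(", ".join)

print(joined)
```

Without the astype(str) call, str.join raises a TypeError because it only accepts strings.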

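20 Random string generation

 import random, string

 df2 = pd.DataFrame(range(10), columns=['y'])

 df2["x"] = [",".join(random.sample(string.ascii_lowercase, random.randint(2, 5))) for i in range(10)] # string.lowercase exists only in Python 2; use string.ascii_lowercase in Python 3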

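21 Splitting a column and building a lookup (hash) table

 # uses the sample data from section 20

 df3 = pd.DataFrame(df2.x.str.split(',').tolist(), index=df2.y).stack().dropna().reset_index(level=0) # dropna() removes any padding left over from rows with fewer items

 df3.columns = ["y", "x"]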
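The split-then-stack idiom turns each comma-separated cell into one row per item. The sketch below runs it on a fixed two-row frame (hypothetical data, so the result is predictable) and compares it with DataFrame.explode, a swapped-in alternative that does the same job more directly:

```python
import pandas as pd

demo = pd.DataFrame({"y": [0, 1], "x": ["a,b", "c,d,e"]})

# Split into columns (short rows are padded), stack into rows, drop the padding
stacked = (
    pd.DataFrame(demo.x.str.split(",").tolist(), index=demo.y)
    .stack()
    .dropna()
    .reset_index(level=0)
)
stacked.columns = ["y", "x"]

# Alternative: keep the lists in one column and explode them into rows
exploded = demo.assign(x=demo.x.str.split(",")).explode("x").reset_index(drop=True)

print(stacked)
print(exploded)
```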

22 Deduplication

 df[["F","E"]].drop_duplicates()
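Selecting columns first and then deduplicating returns only those columns. If the full rows are needed, the subset and keep arguments do the same comparison while keeping every column. A sketch on a made-up frame:

```python
import pandas as pd

demo = pd.DataFrame({
    "F": ["u", "u", "v", "u"],
    "E": [1, 1, 2, 3],
    "A": [0.1, 0.2, 0.3, 0.4],
})

# Distinct (F, E) pairs only
pairs = demo[["F", "E"]].drop_duplicates()

# Full rows, keeping the first or the last row of each (F, E) pair
first_rows = demo.drop_duplicates(subset=["F", "E"])
last_rows = demo.drop_duplicates(subset=["F", "E"], keep="last")

print(pairs)
print(first_rows)
print(last_rows)
```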

23 Discretization

 pd.cut(df.A, range(-1,2,1)) # bin A into the intervals (-1, 0] and (0, 1]; values outside the bins become NaN
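A short sketch of how pd.cut treats bin edges, using fixed values instead of random data. Intervals are closed on the right by default, and the labels argument replaces the interval notation with names (the label names here are made up):

```python
import pandas as pd

values = pd.Series([-0.5, 0.0, 0.3, 1.0, 1.7])

# Bins (-1, 0] and (0, 1]; 1.7 falls outside both and becomes NaN
binned = pd.cut(values, [-1, 0, 1])

# Same bins with hypothetical names instead of interval labels
labeled = pd.cut(values, [-1, 0, 1], labels=["low", "high"])

print(binned)
print(labeled)
```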
