import numpy as np

The numpy.random module supplements the built-in Python random module with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions. For example, you can get a 4x4 array of samples from the standard normal distribution using normal:

samples = np.random.normal(size=(4,4))
samples
array([[-0.49777854,  1.01894039,  0.3542692 ,  1.0187122 ],
[-0.07139068, -0.44245259, -2.05535526, 0.49974435],
[ 0.80183078, -0.11299759, 1.22759314, 1.37571884],
[ 0.32086762, -0.79930024, -0.31965109, 0.23004107]])

Python's built-in random module, by contrast, only samples one value at a time. As you can see from this benchmark, numpy.random is well over an order of magnitude faster for generating very large samples:

from random import normalvariate

n = 100000

# Python run time
%timeit samples = [normalvariate(0, 1) for _ in range(n)]
127 ms ± 7.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# np.random run time
%timeit np.random.normal(size=n)
4.2 ms ± 277 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%time
# I later came to prefer %%time for timing a whole cell

from random import normalvariate

n = 1000000

# Python run time
samples = [normalvariate(0, 1) for _ in range(n)]
Wall time: 1.08 s
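
Outside IPython, where %timeit and %%time are not available, roughly the same comparison can be sketched with the standard-library timeit module. This is a minimal sketch rather than the benchmark shown above; the helper names python_samples and numpy_samples are made up for illustration, and the absolute timings will vary from machine to machine.

```python
import timeit
from random import normalvariate

import numpy as np

n = 100_000

def python_samples():
    # Hypothetical helper: one call to normalvariate per value.
    return [normalvariate(0, 1) for _ in range(n)]

def numpy_samples():
    # Hypothetical helper: a single vectorized call.
    return np.random.normal(size=n)

# Run each function a few times and keep the fastest run.
py_time = min(timeit.repeat(python_samples, number=1, repeat=3))
np_time = min(timeit.repeat(numpy_samples, number=1, repeat=3))

print(f"pure Python: {py_time * 1e3:.1f} ms")
print(f"np.random:   {np_time * 1e3:.1f} ms")
```

Taking the minimum of several repeats is a common way to reduce noise from other processes on the machine.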

We say that these are pseudorandom numbers because they are generated by an algorithm with deterministic behavior. You can change NumPy's random number generation seed using np.random.seed:

# cj's earlier question: does seed() keep the algorithm from changing, so every run is the same?
np.random.seed(1234)
# Answer: seed(n) uses n as the random seed, which sets the generator's starting state.
# Reseeding with the same n reproduces the same sequence of draws; n is not a count of draws.
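
A small sketch of what seeding does in practice: reseeding with the same value restarts the same sequence, while drawing again without reseeding simply continues it. The variable names first, second, and third are only illustrative.

```python
import numpy as np

np.random.seed(1234)
first = np.random.normal(size=3)

np.random.seed(1234)                 # reset to the same starting state
second = np.random.normal(size=3)

third = np.random.normal(size=3)     # no reseed: the sequence continues

print(np.array_equal(first, second))  # True
print(np.array_equal(second, third))  # False
```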

The data generation functions in numpy.random use a global random seed. To avoid global state, you can use numpy.random.RandomState to create a random number generator isolated from others:

rng = np.random.RandomState(1234)

rng.randn(10)
array([ 0.47143516, -1.19097569,  1.43270697, -0.3126519 , -0.72058873,
0.88716294, 0.85958841, -0.6365235 , 0.01569637, -2.24268495])
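
To see the isolation at work, the sketch below creates two RandomState objects with the same seed and draws from the global generator in between. The names rng_a and rng_b are assumptions made for this example.

```python
import numpy as np

rng_a = np.random.RandomState(1234)
rng_b = np.random.RandomState(1234)

a = rng_a.randn(5)

# Reseeding and drawing from the global generator does not touch rng_b.
np.random.seed(0)
_ = np.random.randn(100)

b = rng_b.randn(5)
print(np.array_equal(a, b))  # True
```

Because each RandomState carries its own state, code that receives one as an argument stays reproducible even if other code uses np.random freely.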

See Table 4-8 for a partial list of functions available in numpy.random. I will give some examples of leveraging these functions' ability to generate large arrays of samples all at once in the next section.

  • seed Seed the random number generator
  • permutation Return a random permutation of a sequence, or return a permuted range
  • shuffle Randomly permute a sequence in place

  • rand Draw samples from a uniform distribution on [0, 1), where every value in the interval is equally likely; rand(d0, d1, ...)
  • uniform Draw samples from a uniform distribution on [low, high); uniform(low=0, high=1, size)
  • randint Draw random integers from a given low-to-high range; low is included, high is excluded

  • randn Draw samples from the standard normal distribution \(N(\mu=0, \sigma=1)\); randn(d0, d1, ...)
  • normal Draw samples from a normal (Gaussian) distribution \(N(\mu, \sigma)\); normal(loc=0, scale=1, size)
  • binomial Draw samples from a binomial distribution
  • chisquare Draw samples from a chi-square distribution
  • beta Draw samples from a beta distribution
  • gamma Draw samples from a gamma distribution

# permutation(x): return a random permutation of the integers 0 to x-1
np.random.permutation(6)
array([1, 2, 5, 0, 3, 4])
# shuffle(seq): randomly reorder a sequence in place
arr1 = np.arange(6)
np.random.shuffle(arr1)
arr1
array([5, 2, 3, 0, 4, 1])

# rand(d0, d1, ...): uniform samples on [0, 1); the arguments give the shape
# shape (2, 1, 1) means 2 blocks, each holding 1 row with 1 value
np.random.rand(2, 1, 1)
array([[[0.06152103]],

       [[0.8725525 ]]])
# uniform(low, high, size): uniform samples on [low, high)
np.random.uniform(3, 4, 5)
array([3.35219682, 3.62783057, 3.51758469, 3.37434633, 3.64026243])

# randn(d0, d1, ...): standard normal distribution
np.random.randn(1, 2, 3)
array([[[0.49112636, 0.90638754, 0.05000051],
        [1.21431522, 0.67847748, 1.3797269 ]]])
# normal(loc=0, scale=1, size=None): here mean 80, standard deviation 20, 6 samples
np.random.normal(80, 20, 6)
array([55.07130243, 56.34397557, 68.95608996, 31.40875572, 89.80741058,
       37.38567435])

# binomial(n, p, size): n trials, success probability p, 5 samples
np.random.binomial(100, 0.4, 5)
array([47, 42, 41, 34, 44])
# chisquare(10, 2): 2 samples from a chi-square distribution with 10 degrees of freedom
np.random.chisquare(10, 2)
array([20.07301382, 14.54581473])
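
As a rough sanity check on the functions above, the following sketch draws large samples and compares them with what the parameters imply. It uses its own RandomState so it does not disturb the global generator; the sample size of 100,000 is an arbitrary choice. One point worth confirming is that np.random.randint excludes its upper bound, unlike the built-in random.randint, which includes both ends.

```python
import random

import numpy as np

rng = np.random.RandomState(1234)
size = 100_000

# np.random.randint: low is included, high is excluded.
ints = rng.randint(0, 2, size=size)
print(np.unique(ints))            # [0 1]

# Built-in random.randint: both ends are included.
random.seed(0)
py_ints = {random.randint(0, 1) for _ in range(1000)}
print(sorted(py_ints))            # [0, 1]

# Sample means and spreads should land near the distribution parameters.
normal = rng.normal(80, 20, size)
binom = rng.binomial(100, 0.4, size)
chi2 = rng.chisquare(10, size)

print(round(normal.mean(), 1), round(normal.std(), 1))  # near 80 and 20
print(round(binom.mean(), 1))                           # near 100 * 0.4 = 40
print(round(chi2.mean(), 1))                            # near 10 (the degrees of freedom)
```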

Example: Random Walks

The simulation of random walks provides an illustrative application of utilizing array operations. Let's first consider a simple random walk starting at 0 with steps of 1 and -1 occurring with equal probability.

Here is a pure Python way to implement a single random walk with 1000 steps using the built-in random module.

import random
import matplotlib.pyplot as plt

def random_walk(position=0, steps=1000):
    """Simulate one random walk.

    position: starting position
    steps: number of steps to take
    Returns the list of positions after each step.
    """
    walk = []  # result list, created per call to avoid a shared mutable default
    for i in range(steps):
        state = 1 if random.randint(0, 1) else -1
        position += state
        # store the position after each step
        walk.append(position)
    return walk

# test
walk_result = random_walk()
# plot the first 100 values of this random walk (a line plot by default)
plt.plot(walk_result[:100])
[<matplotlib.lines.Line2D at 0x19329c95208>]

You might make the observation that walk is simply the cumulative sum of the random steps and could be evaluated as an array expression. Thus, I use the np.random module to draw 1,000 coin flips at once, set these to 1 and -1, and compute the cumulative sum:

nsteps = 1000

# use size instead of a for loop: array-oriented programming
draws = np.random.randint(0, 2, size=nsteps)
# np.where works as a vectorized ternary, replacing if-else
steps = np.where(draws > 0, 1, -1)
# cumsum: cumulative sum
walk = steps.cumsum()
plt.plot(walk)
[<matplotlib.lines.Line2D at 0x1932a032278>]
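
To check that the array expression really matches the loop, the sketch below feeds the same steps to both versions and compares the results. The names loop_walk and array_walk are made up for this comparison.

```python
import numpy as np

rng = np.random.RandomState(0)
steps = np.where(rng.randint(0, 2, size=1000) > 0, 1, -1)

# Loop version: accumulate the position one step at a time.
position = 0
loop_walk = []
for step in steps:
    position += step
    loop_walk.append(position)

# Array version: a single cumulative sum.
array_walk = steps.cumsum()

print(np.array_equal(loop_walk, array_walk))  # True
```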

From this we can begin to extract statistics like the minimum and maximum values along the walk's trajectory:

# extreme values of walk
walk.min()
-17
walk.max()
15
type(walk)
numpy.ndarray

# cj test cumsum()
cj_arr = np.array([[1, 2, 3], [4, 5, 6]])
cj_arr
array([[1, 2, 3],
       [4, 5, 6]])
# cumsum() returns an ndarray of running totals, so each intermediate sum is visible
np.cumsum(cj_arr, axis=0)  # downward: running totals along each column
array([[1, 2, 3],
       [5, 7, 9]], dtype=int32)
np.cumsum(cj_arr, axis=1)  # rightward: running totals along each row
array([[ 1,  3,  6],
       [ 4,  9, 15]], dtype=int32)

A more complicated statistic is the first crossing time, the step at which the random walk reaches a particular value. Here we might want to know how long it took the random walk to get at least 10 steps away from the origin 0 in either direction. np.abs(walk) >= 10 gives us a boolean array indicating where the walk has reached or exceeded 10, but we want the index of the first 10 or -10. It turns out we can compute this using argmax, which returns the first index of the maximum value in the boolean array (True is the maximum value):

# index of the first position in the walk whose absolute distance from 0 is at least 10
(np.abs(walk) >= 10).argmax()
49

Note that using argmax here is not always efficient because it always makes a full scan of the array. In this special case, once a True is observed we know it to be the maximum value.
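
One way to picture the early-exit idea is a plain loop that stops at the first hit. The helper first_crossing below is hypothetical, not a NumPy function, and for large arrays a Python loop may still be slower than argmax despite stopping early. The sketch also shows a pitfall: when no element satisfies the condition, argmax returns 0, which looks like a crossing at the first step.

```python
import numpy as np

def first_crossing(walk, threshold):
    """Return the first index where |walk| >= threshold, or -1 if there is none."""
    for i, position in enumerate(walk):
        if abs(position) >= threshold:
            return i  # stop scanning at the first hit
    return -1

demo_walk = np.array([1, 2, 1, 2, 3, 2, 3, 4])

print(first_crossing(demo_walk, 3))           # 4
print((np.abs(demo_walk) >= 3).argmax())      # 4, same answer via argmax
print(first_crossing(demo_walk, 10))          # -1
print((np.abs(demo_walk) >= 10).argmax())     # 0, misleading: no True exists
```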

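# cj test
np.max([1, 3, 2, 5, 7])
7
# maximum returns an array: each element is compared with the second argument and the larger of the two is kept
np.maximum([10, 1, 3, 2, 5, 7], 6)
array([10,  6,  6,  6,  6,  7])
# to be completed: this raises every |walk| value below 10 up to 10, so argmax
# gives the index of the largest |walk|, not the first crossing time
np.maximum(np.abs(walk), 10).argmax()
530
# with ties, argmax returns the first index of the maximum
np.array([10, 6, 6, 6, 6, 7]).argmax()
0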

Simulating Many Random Walks at Once

If your goal is to simulate many random walks, say 5,000 of them, you can generate all of the random walks with minor modifications to the preceding code. If passed a 2-tuple, the numpy.random functions will generate a two-dimensional array of draws, and we can compute the cumulative sum across the rows to compute all 5,000 random walks in one shot:

# cj test
np.random.randint(0, 2, size=(3, 4))
array([[0, 0, 0, 1],
       [0, 1, 0, 1],
       [0, 0, 1, 1]])

# generate 5000 walks at once, 1000 steps each: a 5000 x 1000 dataset
nwalks = 5000
nsteps = 1000
# each draw is 0 or 1
%time draws = np.random.randint(0, 2, size=(nwalks, nsteps))
Wall time: 58 ms
# 5,000,000 draws in well under a second (1 s = 10^3 ms)
# draws > 0 becomes 1, otherwise -1
steps = np.where(draws > 0, 1, -1)
# axis=1 runs across the columns, so the sum accumulates along each row
walks = steps.cumsum(axis=1)
walks
array([[ -1,   0,  -1, ...,  22,  21,  22],
       [ -1,  -2,  -3, ...,  52,  53,  54],
       [  1,   0,  -1, ..., -20, -19, -20],
       ...,
       [ -1,  -2,  -3, ..., -48, -47, -48],
       [  1,   2,   1, ...,  -8,  -7,  -8],
       [ -1,  -2,  -1, ..., -10, -11, -10]], dtype=int32)

Now, we can compute the maximum and minimum values obtained over all of the walks:

# over the whole array
'max', walks.max()
('max', 112)
'min', walks.min()
('min', -109)

Out of these walks, let's compute the minimum crossing time to 30 or -30. This is slightly tricky because not all 5,000 of them reach 30. We can check this using the any method:

hist30 = (np.abs(walks) >= 30).any(axis=1)  # any() along each row
hist30
array([ True,  True,  True, ...,  True,  True,  True])
# number of walks (rows) that hit 30 or -30
hist30.sum()
3441

We can use this boolean array to select out the rows of walks that actually cross the absolute 30 level and call argmax across axis=1 to get the crossing times:

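# keep only the walks that reach 30 or -30; argmax(1) then finds the first True
# in each remaining row, which is that walk's crossing time
crossing_times = (np.abs(walks[hist30]) >= 30).argmax(1)
crossing_times.mean()
497.102877070619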
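
The reason for filtering the rows first can be seen on a tiny hand-made array: a row that never reaches the threshold contains only False, and argmax reports 0 for it, which would drag the average crossing time down. The array and the threshold of 3 below are invented for illustration.

```python
import numpy as np

demo_walks = np.array([
    [1, 2, 3, 2, 3],       # reaches 3 at index 2
    [1, 0, 1, 0, 1],       # never reaches 3
    [-1, -2, -3, -4, -3],  # reaches -3 at index 2
])

hits = (np.abs(demo_walks) >= 3).any(axis=1)
print(hits.tolist())                                             # [True, False, True]

# Without the mask, the walk that never crosses is reported as crossing at step 0.
print((np.abs(demo_walks) >= 3).argmax(axis=1).tolist())         # [2, 0, 2]

# With the mask, only genuine crossing times remain.
print((np.abs(demo_walks[hits]) >= 3).argmax(axis=1).tolist())   # [2, 2]
```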

Feel free to experiment with other distributions for the steps other than equal-sized coin flips. You need only use a different random number generation function, like normal, to generate normally distributed steps with some mean and standard deviation:

import sys

steps = np.random.normal(loc=0, scale=0.25, size=(5000, 1000))

# check how much memory this object occupies
"{} array of {} occupies {} bytes".format(steps.shape, steps.dtype, sys.getsizeof(steps))
'(5000, 1000) array of float64 occupies 40000112 bytes'
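
The reported size is almost entirely the raw data: 5000 x 1000 values at 8 bytes each is 40,000,000 bytes, and the remainder is the array object's own overhead. A smaller sketch of the same arithmetic follows; the shape (50, 1000) is chosen only to keep the example light, and the exact overhead depends on the NumPy version and platform.

```python
import sys

import numpy as np

steps = np.random.normal(loc=0, scale=0.25, size=(50, 1000))

data_bytes = steps.size * steps.itemsize      # 50 * 1000 values * 8 bytes each
print(data_bytes)                             # 400000
print(steps.nbytes)                           # 400000, the same figure from NumPy

# getsizeof also counts the array object's header, so it is slightly larger.
print(sys.getsizeof(steps) - steps.nbytes)
```

When only the data size matters, nbytes is usually the more direct attribute to read.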

Conclusion

While much of the rest of the book will focus on building data wrangling skills with pandas, we will continue to work in a similar array-based style. In Appendix A, we will dig deeper into NumPy features to help you further develop your array computing skills.
