I learned A/B testing from a YouTube video: https://www.youtube.com/watch?v=Bu7OqjYk0jM.

I will divide these notes into two parts. The first part is mainly an overview of hypothesis testing. Most of the concepts are covered in the article "Statistics Basics: Main Concepts in Hypothesis Testing", so I will focus on practical applications here.

                    Actual T (H₁)    Actual F (H₀)
Predicted T (H₁)    TP               FP (α)
Predicted F (H₀)    FN (β)           TN

Precision: P = TP/(TP+FP)

Recall (power): R = 1-β = TP/(TP+FN)
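As a small illustration, the sketch below computes precision and recall (power) from confusion-matrix counts. The counts and the function name `precision_recall` are made up for this example; they do not come from the video.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of "reject H0" decisions that were correct
    recall = tp / (tp + fn)     # share of real effects that were detected (power, 1 - beta)
    return precision, recall


# Hypothetical counts: 80 true positives, 5 false positives, 20 false negatives
precision, recall = precision_recall(tp=80, fp=5, fn=20)
print('precision = {:.3f}, recall = {:.3f}'.format(precision, recall))
```

Note that precision depends on FP while recall depends on FN, so the two can move independently.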

Python Example

Case #1: Alternate hypothesis is true

import numpy as np

from scipy import stats

n = 100

p1 = 0.4

p2 = 0.6

# Compute distributions

x = np.arange(0,n+1)

pmf1 = stats.binom.pmf(x,n,p1)

pmf2 = stats.binom.pmf(x,n,p2)

# `plot` is a plotting helper that is not defined in these notes

plot(x,pmf1,pmf2)

The distributions of Coin 1 and Coin 2 are clearly different. Next we test a few example outcomes, where m1 and m2 are the observed success counts out of n for each coin.

# Example outcomes

m1, m2 = 40, 60

table = [[m1, n-m1], [m2, n-m2]]

chi2, pval, dof, expected = stats.chi2_contingency(table)

decision = 'reject H0' if pval<0.05 else 'accept H0'

print('{} ({})'.format(pval,decision))

0.007209570764742524 (reject H0)

# Example outcomes

m1, m2 = 43, 57

table = [[m1, n-m1], [m2, n-m2]]

chi2, pval, dof, expected = stats.chi2_contingency(table)

decision = 'reject H0' if pval<0.05 else 'accept H0'

print('{} ({})'.format(pval,decision))

0.06599205505934735 (accept H0)

In the second example, m1 and m2 do not differ significantly enough to reject H0, even though H1 is true. This is a false negative (Type II error).
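To make the test less of a black box, here is a minimal sketch of what `stats.chi2_contingency` computes for a 2x2 table with two groups of equal size: a chi-square statistic with Yates' continuity correction (SciPy's default when there is one degree of freedom), turned into a p-value. The function name `chi2_pvalue_2x2` is my own, and the sketch uses only the standard library.

```python
import math


def chi2_pvalue_2x2(m1, m2, n):
    """p-value of the Yates-corrected chi-square test for [[m1, n-m1], [m2, n-m2]]."""
    observed = [[m1, n - m1], [m2, n - m2]]
    col_totals = [m1 + m2, 2 * n - m1 - m2]
    chi2 = 0.0
    for row in observed:
        for obs, col_total in zip(row, col_totals):
            expected = col_total / 2.0      # both rows contain n trials
            if expected == 0:
                return 1.0                  # degenerate table, nothing to test
            diff = abs(obs - expected)
            diff -= min(0.5, diff)          # Yates' continuity correction
            chi2 += diff ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))


for m1, m2 in [(40, 60), (43, 57)]:
    print(m1, m2, chi2_pvalue_2x2(m1, m2, 100))
```

The two p-values should agree with the ones printed above up to floating-point rounding.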

Case #2: Null hypothesis is true

n = 100

p1 = 0.5

p2 = 0.5

# Compute distributions

x = np.arange(0,n+1)

pmf1 = stats.binom.pmf(x,n,p1)

pmf2 = stats.binom.pmf(x,n,p2)

plot(x,pmf1,pmf2)

In this case the two distributions coincide because p1 and p2 are equal.

# Example outcomes

m1, m2 = 49, 51

table = [[m1, n-m1], [m2, n-m2]]

chi2, pval, dof, expected = stats.chi2_contingency(table)

decision = 'reject H0' if pval<0.05 else 'accept H0'

print('{} ({})'.format(pval,decision))

0.887537083981715 (accept H0)

Actually, we can only say that m1 and m2 do not differ significantly enough to reject H0; that does not mean we should accept H0. The explanation is given in the previous article.

# Example outcomes

m1, m2 = 42, 58

table = [[m1, n-m1], [m2, n-m2]]

chi2, pval, dof, expected = stats.chi2_contingency(table)

decision = 'reject H0' if pval<0.05 else 'accept H0'

print('{} ({})'.format(pval,decision))

0.033894853524689295 (reject H0)
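The last outcome is rejected although H0 is true, which is a false positive. The single examples above do not show how often each kind of error happens, so the sketch below enumerates every possible pair (m1, m2) and adds up the probability of the pairs that lead to a rejection. It assumes the same Yates-corrected chi-square test and the same `pval<0.05` rule as above; the helper names are my own and only the standard library is used.

```python
import math


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def rejects(m1, m2, n, alpha=0.05):
    """True if the Yates-corrected chi-square test rejects H0 at level alpha."""
    successes = m1 + m2
    if successes == 0 or successes == 2 * n:
        return False  # degenerate table: no evidence of a difference
    diff = abs(m1 - successes / 2.0)
    diff -= min(0.5, diff)  # Yates' continuity correction
    # All four cells share the same |observed - expected|
    chi2 = 2 * diff ** 2 * (2.0 / successes + 2.0 / (2 * n - successes))
    return math.erfc(math.sqrt(chi2 / 2.0)) < alpha


def rejection_probability(n, p1, p2, alpha=0.05):
    """Exact probability that the test rejects H0 for the given true rates."""
    pmf1 = [binom_pmf(k, n, p1) for k in range(n + 1)]
    pmf2 = [binom_pmf(k, n, p2) for k in range(n + 1)]
    return sum(pmf1[a] * pmf2[b]
               for a in range(n + 1) for b in range(n + 1)
               if rejects(a, b, n, alpha))


false_alarm = rejection_probability(100, 0.5, 0.5)  # Case #2: H0 is true
power = rejection_probability(100, 0.4, 0.6)        # Case #1: H1 is true
print('false alarm rate = {:.4f}, power = {:.4f}'.format(false_alarm, power))
```

The first number is the chance of a false positive as in the last example of Case #2; one minus the second is the chance of a false negative as in the second example of Case #1.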

First, calculate the sample size:

p1, p2 = 0.0500, 0.0515

alpha = 0.05

beta = 0.05

# Evaluate quantile function

p_bar = (p1+p2)/2.0

za = stats.norm.ppf(1-alpha/2) # Two-sided test

zb = stats.norm.ppf(1-beta)

# Compute correction factor

A = (za*np.sqrt(2*p_bar*(1-p_bar))+ zb*np.sqrt(p1*(1-p1)+p2*(1-p2)))**2

# Estimate samples required

n = A*(((1+np.sqrt(1+4*(p1-p2)/A)))/(2*(p1-p2)))**2

print(n) # users per group; test and control together need 2n

555118.7638311392

So for test and control combined we need at least 2n ≈ 1.1 million users. This is where things get hard. You are trying to measure something that doesn't happen often, and that is usually the thing you care about, because rare events tend to be valuable. You also usually can't change it by much: if you could change it a lot, your business would be easy, and the thing you care about most is usually the hardest to move. So this is the hardest case for A/B testing: the most valuable metrics are rare and barely change.
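As a rough cross-check of that sample size, the sketch below goes the other way: given n users per group, it estimates the power of a two-sided two-proportion test with the normal approximation. This is a simplification of my own, not the calculation from the video: it ignores the continuity correction and the negligible chance of rejecting in the wrong direction. The function name `approx_power` is hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(n, p1, p2, alpha=0.05):
    """Approximate power of a two-sided two-proportion test with n users per group."""
    p_bar = (p1 + p2) / 2.0
    za = NormalDist().inv_cdf(1 - alpha / 2)            # two-sided critical value
    sd_null = math.sqrt(2 * p_bar * (1 - p_bar))        # spread of the difference under H0
    sd_alt = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))   # spread of the difference under H1
    z = (abs(p1 - p2) * math.sqrt(n) - za * sd_null) / sd_alt
    return NormalDist().cdf(z)


print('{:.4f}'.format(approx_power(555119, 0.0500, 0.0515)))
print('{:.4f}'.format(approx_power(100000, 0.0500, 0.0515)))
```

With n = 555119 the estimate should land close to the target power of 1 - beta = 0.95, while a much smaller group size such as 100000 gives noticeably less power.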

Next, we can simulate the A/B test:

n = 555119

n_trials = 10000

# Simulate experimental results when null is true

control0 = stats.binom.rvs(n, p1, size = n_trials)

test0 = stats.binom.rvs(n, p1, size = n_trials) # Test and control are the same

tables0 = [[[a, n-a], [b, n-b]] for a, b in zip(control0, test0)]

results0 = [stats.chi2_contingency(T) for T in tables0]

decisions0 = [x[1] <= alpha for x in results0]

# Simulate experimental results when alternate is true

control1 = stats.binom.rvs(n, p1, size = n_trials)

test1 = stats.binom.rvs(n, p2, size = n_trials) # Test differs from control (p2 instead of p1)

tables1 = [[[a, n-a], [b, n-b]] for a, b in zip(control1, test1)]

results1 = [stats.chi2_contingency(T) for T in tables1]

decisions1 = [x[1] <= alpha for x in results1]

# Compute false alarm and correct detection rates

alpha_est = sum(decisions0)/float(n_trials)

power_est = sum(decisions1)/float(n_trials)

print('Theoretical false alarm rate = {:0.4f}, '.format(alpha)+

     'empirical false alarm rate = {:0.4f}'.format(alpha_est))

print('Theoretical power = {:0.4f}, '.format(1-beta)+

     'empirical power = {:0.4f}'.format(power_est))

Theoretical false alarm rate = 0.0500, empirical false alarm rate = 0.0509

Theoretical power = 0.9500, empirical power = 0.9536
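How close should the empirical rates be to the theoretical ones? Each estimate is a proportion over n_trials independent simulated experiments, so its Monte Carlo standard error is sqrt(r(1-r)/n_trials). The sketch below applies this to the numbers printed above; the function name `mc_standard_error` is my own.

```python
import math


def mc_standard_error(rate, n_trials):
    """Standard error of a proportion estimated from n_trials independent runs."""
    return math.sqrt(rate * (1 - rate) / n_trials)


n_trials = 10000
se_alpha = mc_standard_error(0.05, n_trials)
se_power = mc_standard_error(0.95, n_trials)

# Distance between the empirical and theoretical values, in standard errors
z_alpha = (0.0509 - 0.0500) / se_alpha
z_power = (0.9536 - 0.9500) / se_power
print('se = {:.4f}, z_alpha = {:.2f}, z_power = {:.2f}'.format(se_alpha, z_alpha, z_power))
```

Both empirical values sit within two standard errors of the theoretical ones, so the simulation is consistent with alpha = 0.05 and power = 0.95.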
