Machine Learning Trick of the Day (2): Gaussian Integral Trick
Today's trick, the Gaussian integral trick, is one that allows us to re-express a (potentially troublesome) function in an alternative form, in particular, as an integral of a Gaussian against another function — integrals against a Gaussian turn out not to be too troublesome and can provide many statistical and computational benefits. One popular setting where we can exploit such an alternative representation is for inference in discrete undirected graphical models (think Boltzmann machines or discrete Markov random fields). In such cases, this trick lets us transform our discrete problem into one that has an underlying continuous (Gaussian) representation, which we can then solve using our other machine learning tricks. But this is part of a more general strategy that is used throughout machine learning, whether in Bayesian posterior analysis, deep learning or kernel machines. This trick has many facets, and this post explores the Gaussian integral trick and its more general form, auxiliary variable augmentation.
*Figure: Gaussian integral trick state expansion.*
Gaussian Integral Trick
The Gaussian integral trick is one of a statistical flavour: it allows us to turn a function that is exponential in $x^2$ into an exponential that is linear in $x$. We do this by augmenting the quadratic function with auxiliary variables and then integrating over these auxiliary variables, hence a form of auxiliary variable augmentation. The simplest form of this trick is to apply the following identity:

$$\int_{-\infty}^{\infty} \exp\left(-ay^2 + xy\right) dy = \sqrt{\frac{\pi}{a}} \exp\left(\frac{x^2}{4a}\right)$$
We can prove this to ourselves by exploiting our knowledge of Gaussian distributions (which this looks strikingly similar to) and our ability to complete the square when we see such quadratic forms. Separating out the scaling factor $a$, we get:

$$\int \exp\left(-a\left(y^2 - \frac{xy}{a}\right)\right) dy$$
which by completing the square becomes:

$$\exp\left(\frac{x^2}{4a}\right) \int \exp\left(-a\left(y - \frac{x}{2a}\right)^2\right) dy$$
where the last integral is solved by matching it to a Gaussian with mean $\mu = \frac{x}{2a}$ and variance $\sigma^2 = \frac{1}{2a}$, which we know has a normalisation of $\sqrt{2\pi\sigma^2} = \sqrt{\pi/a}$ — this last step shows how this trick got its name.
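As a quick sanity check, here is a minimal numerical verification of the identity above (the values of $a$ and $x$ are arbitrary illustrative choices, not from the derivation):

```python
import numpy as np
from scipy.integrate import quad

a, x = 1.5, 0.7  # arbitrary test values; any a > 0 works

# Left-hand side: integrate the augmented (linear-in-x) form over y.
lhs, _ = quad(lambda y: np.exp(-a * y ** 2 + x * y), -np.inf, np.inf)

# Right-hand side: sqrt(pi / a) * exp(x^2 / (4a)), the quadratic form.
rhs = np.sqrt(np.pi / a) * np.exp(x ** 2 / (4 * a))

print(lhs, rhs)  # the two values agree to numerical precision
```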
The 'Gaussian integral trick' was coined and initially described by Hertz et al. [Ch 10, pg 253] [1], and is closely related to the Hubbard-Stratonovich transform (which provides the augmentation for $\exp(-x^2)$).
Transforming Binary MRFs
This trick is also valid in the multivariate case, which is what we will most often be interested in. One good place to see this trick in action is in its application to binary MRFs or Boltzmann machines. Binary MRFs have a joint probability, for binary random variables $\mathbf{x}$:

$$p(\mathbf{x}) = \frac{1}{Z} \exp\left(\boldsymbol{\theta}^\top \mathbf{x} + \mathbf{x}^\top W \mathbf{x}\right)$$
where $Z$ is the normalising constant. The (multivariate) Gaussian integral trick can be applied to the quadratic term in this energy function, allowing for an insightful analysis and an interesting reparameterisation that opens the model to alternative inference methods. For example:
- We can conduct an analysis of Boltzmann machines that, when combined with our earlier trick (trick 1, the replica trick), allows for theoretical predictions about the performance of this model. See:
- Formal statistical mechanics of neural networks, section 10.1 (eq 10.5), Hertz et al. [1]
- We can use the trick to create a Gaussian augmented space for discrete MRFs to which Hamiltonian Monte Carlo, previously restricted to continuous and differentiable models, can be applied [2][3]; a numerical sketch of the underlying multivariate identity follows this list. See:
- Continuous Relaxations for Discrete Hamiltonian Monte Carlo, Zhang et al.
- Auxiliary-variable Exact Hamiltonian Monte Carlo Samplers for Binary Distributions, Pakman and Paninski.
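To make the multivariate version concrete, here is a minimal numerical sketch of the identity $\exp(\tfrac{1}{2}\mathbf{x}^\top A\mathbf{x}) = (2\pi)^{-d/2}|A|^{-1/2}\int \exp(-\tfrac{1}{2}\mathbf{y}^\top A^{-1}\mathbf{y} + \mathbf{x}^\top\mathbf{y})\, d\mathbf{y}$ for positive definite $A$; the matrix and binary state below are arbitrary illustrative choices, not taken from the papers above:

```python
import numpy as np
from scipy.integrate import dblquad

# Illustrative choices (not from the papers): a positive definite
# coupling matrix A and a binary state x.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
x = np.array([1.0, 1.0])

Ainv = np.linalg.inv(A)

def integrand(y2, y1):
    """exp(-0.5 y' A^{-1} y + x'y), with y = (y1, y2)."""
    y = np.array([y1, y2])
    return np.exp(-0.5 * y @ Ainv @ y + x @ y)

# Integrate over a box wide enough to capture all the Gaussian mass.
integral, _ = dblquad(integrand, -20, 20, lambda _: -20, lambda _: 20)
lhs = integral / (2 * np.pi * np.sqrt(np.linalg.det(A)))  # divide by (2 pi)^{d/2} |A|^{1/2}, d = 2

# Right-hand side: the quadratic term we wanted to linearise.
rhs = np.exp(0.5 * x @ A @ x)

print(lhs, rhs)  # agree to numerical precision
```

The point is that, conditioned on the auxiliary $\mathbf{y}$, the exponent is linear in $\mathbf{x}$: the binary variables decouple, which is exactly the structure exploited by the samplers in [2] and [3]. When the coupling matrix is not positive definite, a constant can be added to its diagonal to make it so; for binary $x$ this only shifts the linear term, since $x_i^2 = x_i$.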
Variable Augmentation
*Figure: Graphical model for a general augmentation.*
This trick is a special case of a more general strategy called variable (or data) augmentation — I prefer variable augmentation to data augmentation [4], since it will not be confused with observed data preprocessing and manipulation. In this setting, the introduction of auxiliary variables has most often been used to develop better-mixing Markov chain Monte Carlo samplers. This is because, after augmentation, the conditional distributions of the model often have highly convenient and easy-to-sample-from forms.
One recent example of variable augmentation (one that parallels our initial trick) is the Pólya-Gamma variable augmentation. In this case, we can express the sigmoid function that appears when computing the mean of the Bernoulli distribution as:

$$\sigma(\psi) = \frac{e^{\psi}}{1 + e^{\psi}} = \frac{1}{2} e^{\psi/2} \int_0^\infty e^{-y\psi^2/2}\, p(y)\, dy$$
where $p(y)$ is a Pólya-Gamma distribution [5]. This nicely transforms the sigmoid into a Gaussian convolution (a Gaussian in $\psi$ integrated against a Pólya-Gamma random variable) — and gives us a different type of Gaussian integral trick. In fact, similar Gaussian integral tricks abound, and are typically described under the heading of Gaussian scale-mixture distributions.
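As a quick check of this identity, here is a minimal Monte Carlo sketch. It draws approximate PG(1, 0) samples using the distribution's infinite-series representation $y = \frac{1}{2\pi^2}\sum_{k\geq 1} \frac{g_k}{(k-1/2)^2}$ with $g_k \sim \mathrm{Exp}(1)$, truncated at $K$ terms; the truncation level, sample size, and test point $\psi$ below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
psi = 1.3            # arbitrary test point
n, K = 50_000, 100   # Monte Carlo sample size and series truncation level

# Approximate draws from PG(1, 0) via the truncated series
# y = (1 / (2 pi^2)) * sum_k g_k / (k - 1/2)^2,  g_k ~ Exp(1).
g = rng.exponential(1.0, size=(n, K))
k = np.arange(1, K + 1)
y = (g / (k - 0.5) ** 2).sum(axis=1) / (2 * np.pi ** 2)

# Right-hand side: (1/2) e^{psi/2} E[exp(-y psi^2 / 2)] under y ~ PG(1, 0)
rhs = 0.5 * np.exp(psi / 2) * np.mean(np.exp(-y * psi ** 2 / 2))

# Left-hand side: the sigmoid itself
lhs = 1.0 / (1.0 + np.exp(-psi))

print(lhs, rhs)  # close, up to Monte Carlo and truncation error
```

In a Gibbs sampler for Bayesian logistic regression, draws like these (conditioned on the data) render the conditional for the regression weights Gaussian, which is the source of the augmentation's practical value [5].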
There are many examples of variable augmentation to be found, especially for binary and categorical distributions. Much guidance is available, and some papers that demonstrate this are:
- Albert and Chib's paper [4] is one of the first in which the concept of data augmentation is clearly established, and it is to them that data augmentation is most often attributed. It shows augmentation for binary and categorical variables — a classic paper that everyone should read.
- Polson and Scott introduced the Pólya-Gamma augmentation described above [5], which is amongst the more recent augmentation strategies. This augmentation can be used for more effective Monte Carlo or variational inference.
- Ultimately, finding a good augmentation relies on exploiting known and tractable integrals. As such, there can be a bit of an art to creating such augmentations, and this is what Van Dyk and Meng discuss.
Summary
The Gaussian integral trick is just one from a large class of variable augmentation strategies that are widely used in statistics and machine learning. They work by introducing auxiliary variables into our problems that induce an alternative representation, and that then give us additional statistical and computational benefits. Such methods lie at the heart of efficient inference algorithms, whether these be Monte Carlo or deterministic approximate inference schemes, making variable augmentation a favourite in our box of machine learning tricks.
Some References
| [1] | John Hertz, Anders Krogh, Richard G Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, 1991 |
| [2] | Yichuan Zhang, Zoubin Ghahramani, Amos J Storkey, Charles A Sutton, Continuous relaxations for discrete Hamiltonian Monte Carlo, Advances in Neural Information Processing Systems, 2012 |
| [3] | Ari Pakman, Liam Paninski, Auxiliary-variable exact Hamiltonian Monte Carlo samplers for binary distributions, Advances in Neural Information Processing Systems, 2013 |
| [4] | James H Albert, Siddhartha Chib, Bayesian analysis of binary and polychotomous response data, Journal of the American Statistical Association, 1993 |
| [5] | Nicholas G Polson, James G Scott, Jesse Windle, Bayesian inference for logistic models using Pólya-Gamma latent variables, Journal of the American Statistical Association, 2013 |