Machine Learning Trick of the Day (1): Replica Trick

'Tricks' of all sorts are used throughout machine learning, in both research and production settings. They help us address many different types of data analysis problems and are roughly analytical, statistical, algorithmic or numerical in flavour. Today's trick is in the analytical class and comes to us from statistical physics: the popular replica trick.

The replica trick [1][2][3] is used for the analytical computation of log-normalising constants (or log-partition functions). More formally, the replica trick provides one of the tools needed for a replica analysis of a probabilistic model: a theoretical analysis of the properties and expected behaviour of a model. Replica analysis has been used to provide insight into almost all model classes popular in machine learning today, including linear (GLM) regression, latent variable models, multi-layer neural networks and Gaussian processes, amongst others.

We are often interested in making statements about the generalisation ability of our models. Whereas approaches such as PAC learning provide a worst-case analysis, replica analysis allows for average-case statements that can be more useful, especially when verifying our numerical implementations. Replica analysis can also be used to provide insight into transitions that might occur during learning, to show how non-linearities within our models affect learning, and to study the effect of noise in the learning dynamics [4]. This post aims to provide a brief review of the replica trick, the steps typically involved in a replica analysis, and pointers to the many ways it has been used to provide insight into popular machine learning approaches.

Replica Trick

Consider a probabilistic model whose normalising constant is Z(x), for data x. Taking expectations over the distribution of the data, the replica trick says that:

$$\mathbb{E}[\ln Z] = \lim_{n \to 0} \frac{1}{n} \ln \mathbb{E}[Z^n]$$

The left-hand side is, up to a sign and a temperature factor, often called the quenched free energy (the free energy averaged over data sets). The replica trick turns the expectation of a logarithm into the logarithm of an expectation, at the cost of working with the n-th power of the normalising constant Z: the expectation is computed by replicating the normalising constant n times.
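As a quick numerical sanity check of the identity, the Python sketch below assumes a toy setting that is not from the post: ln Z is Gaussian with mean MU and standard deviation SIGMA, so Z is log-normal and both sides are known in closed form. At n = 1 the right-hand side is ln E[Z], the annealed value, which by Jensen's inequality is at least E[ln Z]; as n shrinks it moves towards the quenched value MU. The Monte Carlo column estimates E[Z^n] from samples and shows the same trend without using the closed form. The constants MU and SIGMA and the function names are illustrative choices.

```python
import math
import random

# Illustrative assumption (not from the post): ln Z ~ Normal(MU, SIGMA^2), so Z is
# log-normal and both sides of the replica identity have closed forms:
#   E[ln Z] = MU   and   E[Z^n] = exp(n*MU + n^2 * SIGMA^2 / 2).
MU, SIGMA = 0.3, 1.2


def replica_rhs_exact(n: float) -> float:
    """(1/n) ln E[Z^n] using the closed-form log-normal moment."""
    return (n * MU + 0.5 * n**2 * SIGMA**2) / n


def replica_rhs_mc(n: float, num_samples: int = 200_000, seed: int = 0) -> float:
    """(1/n) ln E[Z^n] with E[Z^n] estimated from samples Z = exp(g), g ~ Normal(MU, SIGMA^2)."""
    rng = random.Random(seed)
    total = sum(math.exp(n * rng.gauss(MU, SIGMA)) for _ in range(num_samples))
    return math.log(total / num_samples) / n


# n = 1 gives the annealed value ln E[Z]; smaller n moves towards the quenched E[ln Z] = MU.
for n in (1.0, 0.1, 0.01):
    print(f"n={n:<5} exact={replica_rhs_exact(n):.4f}  monte_carlo={replica_rhs_mc(n):.4f}")
print(f"quenched E[ln Z] = {MU}")
```

The log-normal case is convenient because (1/n) ln E[Z^n] = MU + n SIGMA^2 / 2 is linear in n, so the limit can be read off directly; for realistic models neither side is available in closed form.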

To see how we obtained this expression, we will exploit two useful identities.

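1. Exponential identity: $$x^n = \exp(n \ln x) = 1 + n \ln x + O(n^2) \quad \text{as } n \to 0$$

$$\Rightarrow \ln x = \lim_{n \to 0} \frac{x^n - 1}{n} \quad \left(= \lim_{n \to 0} \frac{d}{dn} x^n\right)$$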

The application of this identity is often referred to as the replica trick, since it is what allows us to rewrite the initial expectation in its replicated form.
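The exponential identity is easy to check numerically. This small sketch (the value x = 7.5 and the function names are arbitrary illustrative choices) evaluates (x^n - 1)/n for shrinking n and also estimates d/dn x^n at n = 0 with a central difference. Both approach ln x; the first converges only linearly in n, since its leading error term is n (ln x)^2 / 2.

```python
import math


def log_from_power_limit(x: float, n: float) -> float:
    """(x**n - 1) / n, which tends to ln x as n -> 0 (exponential identity)."""
    return (x**n - 1.0) / n


def log_from_power_derivative(x: float, h: float = 1e-5) -> float:
    """Central-difference estimate of d/dn x**n at n = 0, which also equals ln x."""
    return (x**h - x ** (-h)) / (2.0 * h)


x = 7.5
for n in (1.0, 0.1, 0.01, 0.001):
    approx = log_from_power_limit(x, n)
    # For x > 1 the error n*(ln x)^2/2 + ... is positive and shrinks roughly linearly in n.
    print(f"n={n:<6} (x^n - 1)/n = {approx:.6f}  error = {approx - math.log(x):.2e}")
print(f"derivative form: {log_from_power_derivative(x):.6f}  ln x = {math.log(x):.6f}")
```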

2. Logarithmic identity:

$$\ln(1 + nx) \approx nx \quad \text{if } nx \ll 1$$
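In the derivation that follows, the logarithmic identity is used in the scaled form (1/n) ln(1 + n c) -> c as n -> 0. A minimal check, where c = 2 is an arbitrary stand-in for E[ln Z]: for n c >= 0 the error lies between -n c^2 / 2 and 0, because y - y^2/2 <= ln(1 + y) <= y for all y >= 0.

```python
import math


def scaled_log1p(c: float, n: float) -> float:
    """(1/n) ln(1 + n*c); the logarithmic identity says this tends to c when n*c << 1."""
    return math.log1p(n * c) / n  # log1p keeps accuracy when n*c is tiny


c = 2.0  # arbitrary stand-in for E[ln Z], chosen purely for illustration
for n in (0.5, 0.05, 0.005):
    value = scaled_log1p(c, n)
    # For n*c >= 0 the error lies in [-n*c**2/2, 0].
    print(f"n*c={n * c:<6} (1/n) ln(1+nc) = {value:.6f}  error = {value - c:.2e}")
```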

We can use these two identities to show that:

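$$
\begin{aligned}
\mathbb{E}[\ln Z] &= \lim_{n \to 0} \frac{1}{n} \ln\big(1 + n\,\mathbb{E}[\ln Z]\big) && \text{using (2)} \\
&= \lim_{n \to 0} \frac{1}{n} \ln\Big(1 + n\,\mathbb{E}\Big[\frac{Z^n - 1}{n}\Big]\Big) && \text{using (1)} \\
&= \lim_{n \to 0} \frac{1}{n} \ln \mathbb{E}[Z^n]
\end{aligned}
$$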

which is the identity we set out to derive. On its own this does not help much, since E[Z^n] is not obviously any easier to compute than E[ln Z]. The other part of the trick is to carry out the analysis assuming that n is a positive integer, so that Z^n is a product of n copies (replicas) of the normalising constant and the expectation is taken over n replicated integrals. To compute the limit n → 0, we then continue the result from the integers to real n (and hope that the result is valid). This final step is hard to justify rigorously and is one of the main critiques of replica analysis.
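To make the integer-n step and the continuation concrete, here is a sketch on a hypothetical toy model that is not from the post: a single spin s in {-1, +1} in a Gaussian random field x, so Z(x) = 2 cosh(BETA x), with BETA = 0.5 chosen arbitrarily. For integer n, averaging the product of n replicas over x before summing over spins couples the replicas and gives E[Z^n] as a finite sum over replica configurations. The continuation used here is deliberately naive (the quadratic through f(1), f(2), f(3), evaluated at n = 0) and is only one of many possible choices; it is compared against E[ln Z] computed directly by quadrature. Nothing guarantees that such a continuation is correct in general, which is exactly the critique mentioned above.

```python
import math

# Hypothetical toy model (not from the post): a single spin s in {-1, +1} in a Gaussian
# random field x ~ Normal(0, 1), so Z(x) = sum_s exp(BETA * s * x) = 2 cosh(BETA * x).
BETA = 0.5  # arbitrary illustrative field strength


def gaussian_expectation(func, lo: float = -12.0, hi: float = 12.0, steps: int = 24_000) -> float:
    """E[func(x)] for x ~ Normal(0, 1), via the trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * func(x) * math.exp(-0.5 * x * x)
    return total * h / math.sqrt(2.0 * math.pi)


def replicated_moment(n: int) -> float:
    """E[Z^n] for integer n, averaging over the field before summing over replica spins.

    Z^n = sum over (s_1, ..., s_n) of exp(BETA * x * sum_a s_a) and E[exp(t x)] = exp(t^2 / 2),
    so E[Z^n] = sum_k C(n, k) exp(BETA^2 (2k - n)^2 / 2), where k replicas have s_a = +1.
    """
    return sum(math.comb(n, k) * math.exp(0.5 * BETA**2 * (2 * k - n) ** 2) for k in range(n + 1))


# Quenched average E[ln Z], computed directly by quadrature for comparison.
quenched = gaussian_expectation(lambda x: math.log(2.0 * math.cosh(BETA * x)))

# f(n) = (1/n) ln E[Z^n] is only computed at the integers n = 1, 2, 3 (f(1) is the annealed value).
f_int = {n: math.log(replicated_moment(n)) / n for n in (1, 2, 3)}

# Naive continuation (an assumption): the quadratic through f(1), f(2), f(3), evaluated at n = 0.
continued = 3.0 * f_int[1] - 3.0 * f_int[2] + f_int[3]

print(f"f(1), f(2), f(3)     = {f_int[1]:.4f}, {f_int[2]:.4f}, {f_int[3]:.4f}")
print(f"continued to n -> 0  = {continued:.4f}")
print(f"quenched E[ln Z]     = {quenched:.4f}")
```

The weights 3, -3, 1 are the Lagrange interpolation weights at n = 0 for the nodes 1, 2, 3; a different family of interpolants through the same integer values could give a different n -> 0 limit.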

Replica Analysis

Given the replica trick, we can now conduct a replica analysis of our model, also known as a Gardner analysis [1][5]. This typically involves the following steps:

  • Apply the replica trick to the integral problem (the computation of the average free energy).
  • Solve the resulting integral, typically with the saddle-point method (sometimes known as the method of steepest descent). This is not always easy, but certain assumptions can help.
  • Perform an analytic continuation to determine the limit n → 0. This step (and the previous one) involves an assumption, or ansatz, about the structure of the solution; the typical ansatz is replica symmetry, which assumes that the solution is unchanged under permutation of the replica labels. This is reasonable for models with i.i.d. data assumptions, where the specific allocation of data to any of the replicas does not matter. More advanced analyses make use of other assumptions, such as replica symmetry breaking.
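The saddle-point step rests on the fact that, for large N, ln of the integral of exp(N g(q)) dq is approximately N g(q*) + 0.5 ln(2 pi / (N |g''(q*)|)), where q* maximises g. In an actual replica calculation the integration variables would be order parameters such as the overlaps between replicas, and replica symmetry reduces them to a few scalars. The sketch below instead uses an assumed one-dimensional g(q) = ln q - q, chosen only because its integral, Gamma(N + 1) / N^(N + 1), is known exactly; the function name laplace_log_integral is illustrative. The gap per unit N shrinks as N grows, which is why the maximum alone determines the free energy in the large-N limit.

```python
import math


def laplace_log_integral(g, q_star: float, N: float, h: float = 1e-4) -> float:
    """Saddle-point (Laplace) estimate of ln of the integral of exp(N * g(q)) dq.

    Uses ln I ~ N g(q*) + 0.5 * ln(2 pi / (N |g''(q*)|)) around the maximum q*,
    with g'' estimated by a central finite difference.
    """
    g2 = (g(q_star + h) - 2.0 * g(q_star) + g(q_star - h)) / (h * h)
    return N * g(q_star) + 0.5 * math.log(2.0 * math.pi / (N * abs(g2)))


# Illustrative stand-in (an assumption, not a replica free energy): g(q) = ln q - q on q > 0,
# maximised at q* = 1, for which the integral over (0, inf) is Gamma(N + 1) / N^(N + 1) exactly.
def g(q: float) -> float:
    return math.log(q) - q


for N in (10, 100, 1000):
    exact = math.lgamma(N + 1) - (N + 1) * math.log(N)
    approx = laplace_log_integral(g, 1.0, N)
    print(f"N={N:<5} exact={exact:.6f}  saddle={approx:.6f}  gap per N={(exact - approx) / N:.2e}")
```

For this particular g the saddle-point formula reproduces Stirling's approximation, so the remaining gap in ln I is close to 1/(12N).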

Following these steps, replica analysis can be carried out for each of the model classes listed in the introduction.

Summary

One trick we have available in machine learning for the theoretical analysis of our models is the replica trick. Combined with an analytic continuation that lets us compute the n → 0 limit and the saddle-point method for integration, the replica trick allows us to perform a replica analysis. This analysis allows us to examine the generalisation ability of our models, study transitions that might occur during learning, understand how non-linearities within our models affect learning, and study the effect of noise in the learning dynamics. Such analysis provides us with a deeper insight into our models and how to train them, and provides needed stepping stones on our path towards building ever more powerful machine learning systems.


Some References
[1] Andreas Engel, Christian Van den Broeck, Statistical mechanics of learning, 2001
[2] Kevin John Sharp, Effective Bayesian inference for sparse factor analysis models, 2011
[3] Manfred Opper, Statistical mechanics of learning: Generalization, The Handbook of Brain Theory and Neural Networks, 1995
[4] H. S. Seung, Haim Sompolinsky, N. Tishby, Statistical mechanics of learning from examples, Physical Review A, 1992
[5] Tommaso Castellani, Andrea Cavagna, Spin-glass theory for pedestrians, Journal of Statistical Mechanics: Theory and Experiment, 2005
