Suppose you ask a bunch of users to rate a set of movies on a 0-100 scale. In classical factor analysis, you could then try to explain each movie and user in terms of a set of latent factors. For example, movies like Star Wars and Lord of the Rings might have strong associations with a latent science fiction and fantasy factor, and users who like Wall-E and Toy Story might have strong associations with a latent Pixar factor.

Restricted Boltzmann Machines essentially perform a binary version of factor analysis. (This is one way of thinking about RBMs; there are, of course, others, and lots of different ways to use RBMs, but I’ll adopt this approach for this post.) Instead of users rating a set of movies on a continuous scale, they simply tell you whether they like a movie or not, and the RBM will try to discover latent factors that can explain the activation of these movie choices.

More technically, a Restricted Boltzmann Machine is a stochastic neural network (neural network meaning we have neuron-like units whose binary activations depend on the neighbors they’re connected to; stochastic meaning these activations have a probabilistic element) consisting of:

One layer of visible units (users’ movie preferences whose states we know and set);
One layer of hidden units (the latent factors we try to learn); and
A bias unit (whose state is always on, and is a way of adjusting for the different inherent popularities of each movie).

Furthermore, each visible unit is connected to all the hidden units (this connection is undirected, so each hidden unit is also connected to all the visible units), and the bias unit is connected to all the visible units and all the hidden units. To make learning easier, we restrict the network so that no visible unit is connected to any other visible unit and no hidden unit is connected to any other hidden unit.
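One way to picture this connectivity, as a minimal sketch: keep every weight in a single matrix with one row per visible unit plus a bias row, and one column per hidden unit plus a bias column. The names `num_visible`, `num_hidden`, and `weights` are illustrative assumptions, and the small random initialization is just a common choice, not something this post prescribes.

```python
import numpy as np

num_visible = 6   # one visible unit per movie
num_hidden = 2    # latent factors we hope to learn

rng = np.random.default_rng(0)

# Row 0 and column 0 belong to the always-on bias unit;
# the remaining 6 x 2 block holds the visible-hidden weights.
weights = np.zeros((num_visible + 1, num_hidden + 1))
weights[1:, 1:] = 0.1 * rng.standard_normal((num_visible, num_hidden))

# There is no visible-visible or hidden-hidden matrix at all:
# that absence is the "restriction" in Restricted Boltzmann Machine.
print(weights.shape)
```

Storing the bias in the same matrix means a single matrix product can pass messages from one layer to the other, bias included.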

For example, suppose we have a set of six movies (Harry Potter, Avatar, LOTR 3, Gladiator, Titanic, and Glitter) and we ask users to tell us which ones they want to watch. If we want to learn two latent units underlying movie preferences – for example, two natural groups in our set of six movies appear to be SF/fantasy (containing Harry Potter, Avatar, and LOTR 3) and Oscar winners (containing LOTR 3, Gladiator, and Titanic), so we might hope that our latent units will correspond to these categories – then our RBM would look like the following:

(Note the resemblance to a factor analysis graphical model.)

State Activation

Restricted Boltzmann Machines, and neural networks in general, work by updating the states of some neurons given the states of others, so let’s talk about how the states of individual units change. Assuming we know the connection weights in our RBM (we’ll explain how to learn these below), to update the state of unit $i$:

Compute the activation energy $a_i = \sum_j w_{ij} x_j$ of unit $i$, where the sum runs over all units $j$ that unit $i$ is connected to, $w_{ij}$ is the weight of the connection between $i$ and $j$, and $x_j$ is the 0 or 1 state of unit $j$. In other words, all of unit $i$’s neighbors send it a message, and we compute the sum of all these messages.
Let $p_i = \sigma(a_i)$, where $\sigma(x) = 1/(1 + \exp(-x))$ is the logistic function. Note that $p_i$ is close to 1 for large positive activation energies, and $p_i$ is close to 0 for large negative activation energies.
We then turn unit $i$ on with probability $p_i$, and turn it off with probability $1 - p_i$.
(In layman’s terms, units that are positively connected to each other try to get each other to share the same state (i.e., be both on or off), while units that are negatively connected to each other are enemies that prefer to be in different states.)
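The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation discussed later: the function names and the example weights are made up for the sketch.

```python
import math
import random

def logistic(x):
    """Logistic function sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def update_unit(weights_to_i, neighbor_states, rng=random):
    """Resample one unit given the 0/1 states of its neighbors.

    weights_to_i[j] plays the role of w_ij and neighbor_states[j] of x_j.
    Returns (new_state, p_i).
    """
    # Step 1: activation energy, the sum of the neighbors' messages.
    a_i = sum(w * x for w, x in zip(weights_to_i, neighbor_states))
    # Step 2: squash it into a probability.
    p_i = logistic(a_i)
    # Step 3: turn the unit on with probability p_i.
    state = 1 if rng.random() < p_i else 0
    return state, p_i

# Hypothetical weights: two positively connected neighbors that are on,
# and one negatively connected neighbor that is off (so it sends no message).
state, p = update_unit([2.0, 1.5, -3.0], [1, 1, 0], random.Random(0))
print(state, p)
```

Here the activation energy is 2.0 + 1.5 = 3.5, so the unit turns on with probability of roughly 0.97 but is never guaranteed to.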

For example, let’s suppose our two hidden units really do correspond to SF/fantasy and Oscar winners.

If Alice has told us her six binary preferences on our set of movies, we could then ask our RBM which of the hidden units her preferences activate (i.e., ask the RBM to explain her preferences in terms of latent factors). So the six movies send messages to the hidden units, telling them to update themselves. (Note that even if Alice has declared she wants to watch Harry Potter, Avatar, and LOTR 3, this doesn’t guarantee that the SF/fantasy hidden unit will turn on, but only that it will turn on with high probability. This makes a bit of sense: in the real world, Alice wanting to watch all three of those movies makes us highly suspect she likes SF/fantasy in general, but there’s a small chance she wants to watch them for other reasons. Thus, the RBM allows us to generate models of people in the messy, real world.)
Conversely, if we know that one person likes SF/fantasy (so that the SF/fantasy unit is on), we can then ask the RBM which of the movie units that hidden unit turns on (i.e., ask the RBM to generate a set of movie recommendations). So the hidden units send messages to the movie units, telling them to update their states. (Again, note that the SF/fantasy unit being on doesn’t guarantee that we’ll always recommend all three of Harry Potter, Avatar, and LOTR 3 because, hey, not everyone who likes science fiction liked Avatar.)

Learning Weights

So how do we learn the connection weights in our network? Suppose we have a bunch of training examples, where each training example is a binary vector with six elements corresponding to a user’s movie preferences. Then for each epoch, do the following:

Take a training example (a set of six movie preferences). Set the states of the visible units to these preferences.
Next, update the states of the hidden units using the logistic activation rule described above: for the $j$th hidden unit, compute its activation energy $a_j = \sum_i w_{ij} x_i$, and set $x_j$ to 1 with probability $\sigma(a_j)$ and to 0 with probability $1 - \sigma(a_j)$. Then for each edge $e_{ij}$, compute $\text{Positive}(e_{ij}) = x_i \cdot x_j$ (i.e., for each pair of units, measure whether they’re both on).
Now reconstruct the visible units in a similar manner: for each visible unit, compute its activation energy $a_i$, and update its state. (Note that this reconstruction may not match the original preferences.) Then update the hidden units again, and compute $\text{Negative}(e_{ij}) = x_i \cdot x_j$ for each edge.
Update the weight of each edge $e_{ij}$ by setting $w_{ij} = w_{ij} + L \cdot (\text{Positive}(e_{ij}) - \text{Negative}(e_{ij}))$, where $L$ is a learning rate.
Repeat over all training examples.

Continue until the network converges (i.e., the error between the training examples and their reconstructions falls below some threshold) or we reach some maximum number of epochs.
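The procedure above can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the implementation discussed later. It stores the bias unit in row 0 and column 0 of the weight matrix, stops only on a maximum epoch count (no error threshold), and the names `train_rbm`, `learning_rate`, `max_epochs`, and the toy data are all hypothetical.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, num_hidden, learning_rate=0.1, max_epochs=1000, seed=0):
    """Train on one example at a time, following the steps above.

    data: (num_examples, num_visible) array of 0/1 values.
    Returns a (num_visible + 1, num_hidden + 1) weight matrix whose
    row 0 and column 0 belong to the always-on bias unit.
    """
    rng = np.random.default_rng(seed)
    num_visible = data.shape[1]
    weights = np.zeros((num_visible + 1, num_hidden + 1))
    weights[1:, 1:] = 0.1 * rng.standard_normal((num_visible, num_hidden))

    for _ in range(max_epochs):
        for example in data:
            # Clamp the visible units to the training example (bias always on).
            v0 = np.concatenate(([1.0], example))
            # Sample the hidden states, then measure Positive(e_ij) = x_i * x_j.
            h0 = (rng.random(num_hidden + 1) < logistic(v0 @ weights)).astype(float)
            h0[0] = 1.0
            positive = np.outer(v0, h0)

            # Reconstruct the visible units from the hidden states.
            v1 = (rng.random(num_visible + 1) < logistic(weights @ h0)).astype(float)
            v1[0] = 1.0
            # Update the hidden units again and measure Negative(e_ij).
            h1 = (rng.random(num_hidden + 1) < logistic(v1 @ weights)).astype(float)
            h1[0] = 1.0
            negative = np.outer(v1, h1)

            weights += learning_rate * (positive - negative)
    return weights

# Toy data in the six-movie format used in this post (rows are hypothetical users).
toy_data = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
])
trained = train_rbm(toy_data, num_hidden=2, max_epochs=200)
print(trained.shape)
```

Because sampling is random, different seeds will give different weights, and which hidden unit ends up representing which group of movies is arbitrary.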

Why does this update rule make sense? Note that

In the first phase, $\text{Positive}(e_{ij})$ measures the association between the $i$th and $j$th unit that we want the network to learn from our training examples;
In the “reconstruction” phase, where the RBM generates the states of visible units based on its hypotheses about the hidden units alone, $\text{Negative}(e_{ij})$ measures the association that the network itself generates (or “daydreams” about) when no units are fixed to training data.

So by adding $\text{Positive}(e_{ij}) - \text{Negative}(e_{ij})$ to each edge weight, we’re helping the network’s daydreams better match the reality of our training examples.

(You may hear this update rule called contrastive divergence, which is basically a funky term for “approximate gradient descent”.)
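For readers who want to see the gradient being approximated, here is a sketch in the notation above, with $x_i$ a visible unit and $x_j$ a hidden unit. This is the standard result for binary RBMs as presented in Hinton's practical guide, not something derived in this post, and the angle brackets (averages over the indicated distribution) are notation introduced here for the sketch.

```latex
% Exact gradient of the log-likelihood of the training data:
\frac{\partial \log p(\text{data})}{\partial w_{ij}}
  = \langle x_i x_j \rangle_{\text{data}} - \langle x_i x_j \rangle_{\text{model}}

% The update rule above replaces the (expensive) model average
% with the average after a single reconstruction:
\Delta w_{ij}
  = L \left( \langle x_i x_j \rangle_{\text{data}}
           - \langle x_i x_j \rangle_{\text{reconstruction}} \right)
```

The first term corresponds to $\text{Positive}(e_{ij})$ and the second to $\text{Negative}(e_{ij})$; the approximation lies in stopping the “daydream” after one reconstruction instead of letting the network run until it settles.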

Examples

I wrote a simple RBM implementation in Python (the code is heavily commented, so take a look if you’re still a little fuzzy on how everything works), so let’s use it to walk through some examples.

First, I trained the RBM using some fake data.

Alice: (Harry Potter = 1, Avatar = 1, LOTR 3 = 1, Gladiator = 0, Titanic = 0, Glitter = 0). Big SF/fantasy fan.
Bob: (Harry Potter = 1, Avatar = 0, LOTR 3 = 1, Gladiator = 0, Titanic = 0, Glitter = 0). SF/fantasy fan, but doesn’t like Avatar.
Carol: (Harry Potter = 1, Avatar = 1, LOTR 3 = 1, Gladiator = 0, Titanic = 0, Glitter = 0). Big SF/fantasy fan.
David: (Harry Potter = 0, Avatar = 0, LOTR 3 = 1, Gladiator = 1, Titanic = 1, Glitter = 0). Big Oscar winners fan.
Eric: (Harry Potter = 0, Avatar = 0, LOTR 3 = 1, Gladiator = 1, Titanic = 0, Glitter = 0). Oscar winners fan, except for Titanic.
Fred: (Harry Potter = 0, Avatar = 0, LOTR 3 = 1, Gladiator = 1, Titanic = 1, Glitter = 0). Big Oscar winners fan.

The network learned the following weights:

                 Bias Unit       Hidden 1        Hidden 2
  Bias Unit       -0.08257658     -0.19041546      1.57007782
  Harry Potter    -0.82602559     -7.08986885      4.96606654
  Avatar          -1.84023877     -5.18354129      2.27197472
  LOTR 3           3.92321075      2.51720193      4.11061383
  Gladiator        0.10316995      6.74833901     -4.00505343
  Titanic         -0.97646029      3.25474524     -5.59606865
  Glitter         -4.44685751     -2.81563804     -2.91540988

Note that the first hidden unit seems to correspond to the Oscar winners, and the second hidden unit seems to correspond to the SF/fantasy movies, just as we were hoping.

What happens if we give the RBM a new user, George, who has (Harry Potter = 0, Avatar = 0, LOTR 3 = 0, Gladiator = 1, Titanic = 1, Glitter = 0) as his preferences? It turns the Oscar winners unit on (but not the SF/fantasy unit), correctly guessing that George probably likes movies that are Oscar winners.
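To check this against the learned weights, here is a small sketch that computes the probability of each hidden unit turning on for George. It assumes the table is laid out with the bias unit in the first row and first column, and that the bias-row entry for each hidden unit is added to its activation energy; the variable names are illustrative.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

# Learned weights copied from the table above.
# Rows: bias, Harry Potter, Avatar, LOTR 3, Gladiator, Titanic, Glitter.
# Columns: bias, Hidden 1, Hidden 2.
weights = np.array([
    [-0.08257658, -0.19041546,  1.57007782],
    [-0.82602559, -7.08986885,  4.96606654],
    [-1.84023877, -5.18354129,  2.27197472],
    [ 3.92321075,  2.51720193,  4.11061383],
    [ 0.10316995,  6.74833901, -4.00505343],
    [-0.97646029,  3.25474524, -5.59606865],
    [-4.44685751, -2.81563804, -2.91540988],
])

george = np.array([0, 0, 0, 1, 1, 0])
visible = np.concatenate(([1.0], george))        # prepend the always-on bias unit
hidden_probs = logistic(visible @ weights)[1:]   # drop the bias column
print(hidden_probs)
```

Under these assumptions, Hidden 1 (Oscar winners) has an activation energy of about 9.8 and turns on with probability very close to 1, while Hidden 2 (SF/fantasy) has an activation energy of about -8.0 and turns on with probability well below 1%.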

What happens if we activate only the SF/fantasy unit, and run the RBM a bunch of different times? In my trials, it turned on Harry Potter, Avatar, and LOTR 3 three times; it turned on Avatar and LOTR 3, but not Harry Potter, once; and it turned on Harry Potter and LOTR 3, but not Avatar, twice. Note that, based on our training examples, these generated preferences do indeed match what we might expect real SF/fantasy fans want to watch.
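The same kind of check works in the other direction. The sketch below turns on only the SF/fantasy unit (Hidden 2 in the table), computes each movie's probability of turning on, and draws a few samples. It again assumes the bias unit occupies the first row and column of the table; the names and the seed are illustrative, and the samples will not reproduce the exact trials described above.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

movies = ["Harry Potter", "Avatar", "LOTR 3", "Gladiator", "Titanic", "Glitter"]

# Learned weights copied from the table above (bias in row 0 and column 0).
weights = np.array([
    [-0.08257658, -0.19041546,  1.57007782],
    [-0.82602559, -7.08986885,  4.96606654],
    [-1.84023877, -5.18354129,  2.27197472],
    [ 3.92321075,  2.51720193,  4.11061383],
    [ 0.10316995,  6.74833901, -4.00505343],
    [-0.97646029,  3.25474524, -5.59606865],
    [-4.44685751, -2.81563804, -2.91540988],
])

# Bias on, Hidden 1 (Oscar winners) off, Hidden 2 (SF/fantasy) on.
hidden = np.array([1.0, 0.0, 1.0])
visible_probs = logistic(weights @ hidden)[1:]   # drop the bias row

rng = np.random.default_rng(0)
for _ in range(6):
    sample = rng.random(len(movies)) < visible_probs
    print([movie for movie, on in zip(movies, sample) if on])
```

With these weights, Harry Potter and LOTR 3 turn on with probability above 0.98, Avatar only about 0.61, and the other three movies below 0.02, which fits the pattern of Avatar being the movie most often left out.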

Modifications

I tried to keep the connection-learning algorithm I described above pretty simple, so here are some modifications that often appear in practice:

Above, $\text{Negative}(e_{ij})$ was determined by taking the product of the $i$th and $j$th units after reconstructing the visible units once and then updating the hidden units again. We could also take the product after some larger number of reconstructions (i.e., repeat updating the visible units, then the hidden units, then the visible units again, and so on); this is slower, but describes the network’s daydreams more accurately.
Instead of using $\text{Positive}(e_{ij}) = x_i \cdot x_j$, where $x_i$ and $x_j$ are binary 0 or 1 states, we could also let $x_i$ and/or $x_j$ be activation probabilities. Similarly for $\text{Negative}(e_{ij})$.
We could penalize larger edge weights, in order to get a sparser or more regularized model.
When updating edge weights, we could use a momentum factor: we would add to each edge a weighted sum of the current step as described above (i.e., $L \cdot (\text{Positive}(e_{ij}) - \text{Negative}(e_{ij}))$) and the step previously taken.
Instead of updating the weights after every single training example, we could use batches of examples, and only update the network’s weights after passing through all the examples in the batch. This can speed up the learning by taking advantage of fast matrix-multiplication algorithms.
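Several of these modifications can be combined in one update step. The sketch below is one possible arrangement, not a reference implementation: it processes a batch with matrix products, uses activation probabilities instead of 0/1 states in the Positive and Negative terms, applies an L2 penalty on the weights, and adds momentum. The function name, the parameter names, and the default values are all assumptions made for illustration.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def batch_update(weights, velocity, batch, learning_rate=0.1,
                 momentum=0.5, weight_decay=1e-4, rng=None):
    """One update on a batch, with probabilities, weight decay and momentum.

    weights, velocity: (num_visible + 1, num_hidden + 1), bias in row/column 0.
    batch: (batch_size, num_visible) array of 0/1 values.
    Returns the updated (weights, velocity).
    """
    rng = np.random.default_rng() if rng is None else rng
    v0 = np.hstack([np.ones((batch.shape[0], 1)), batch])  # add the bias unit

    h0_probs = logistic(v0 @ weights)
    h0_probs[:, 0] = 1.0                                   # bias is always on
    h0_states = (rng.random(h0_probs.shape) < h0_probs).astype(float)
    h0_states[:, 0] = 1.0
    positive = v0.T @ h0_probs        # probabilities instead of 0/1 states

    v1_probs = logistic(h0_states @ weights.T)             # reconstruction
    v1_probs[:, 0] = 1.0
    h1_probs = logistic(v1_probs @ weights)
    h1_probs[:, 0] = 1.0
    negative = v1_probs.T @ h1_probs

    # Average over the batch; the L2 penalty is applied to every weight,
    # bias weights included, purely to keep the sketch short.
    step = (positive - negative) / batch.shape[0] - weight_decay * weights
    velocity = momentum * velocity + learning_rate * step
    return weights + velocity, velocity

rng = np.random.default_rng(0)
weights = 0.1 * rng.standard_normal((7, 3))
velocity = np.zeros_like(weights)
batch = np.array([[1, 1, 1, 0, 0, 0],
                  [0, 0, 1, 1, 1, 0]])
weights, velocity = batch_update(weights, velocity, batch, rng=rng)
print(weights.shape)
```

An L2 penalty shrinks weights toward zero without making them exactly zero; a penalty on absolute values would be the usual choice if sparsity is the goal.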

Further

If you’re interested in learning more about Restricted Boltzmann Machines, here are some good links.

A Practical guide to training restricted Boltzmann machines, by Geoffrey Hinton.
A talk by Andrew Ng on Unsupervised Feature Learning and Deep Learning.
Restricted Boltzmann Machines for Collaborative Filtering. I found this paper hard to read, but it’s an interesting application to the Netflix Prize.
Geometry of the Restricted Boltzmann Machine. A very readable introduction to RBMs, “starting with the observation that its Zariski closure is a Hadamard power of the first secant variety of the Segre variety of projective lines”. (I kid, I kid.)

from: http://blog.echen.me/2011/07/18/introduction-to-restricted-boltzmann-machines/
