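The first two weeks of homework were mostly about factors and the construction of directed graphs, but probabilistic graphical models (PGMs) have an even more powerful tool: the undirected graph (Markov network). Unlike a directed graph, an undirected graph describes the mutual interaction between two variables (var), and that interaction is still expressed with factors. This week's assignment is very practical: an optical character recognition (OCR) system built on a Markov network.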

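　　OCR has very important applications in artificial intelligence. However, because of image noise, distorted handwriting, and joined-up strokes, image-based text recognition is fairly complicated. PGMs are well suited to problems where the measurement process is complex, an individual measurement is not necessarily right, and measurements arrive in sequence (take a single measurement, compare it with its neighbors, deliberate repeatedly, then find the optimum). English words are sequences of letters, so word recognition is a natural fit for a probabilistic graphical model.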

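The same four steps of a probabilistic graphical model apply to both OCR and SLAM:

Single measurement. OCR: recognizing the image of a single letter is unreliable. SLAM: a single registration gives an inaccurate transformation matrix.

Comparing neighbors. OCR: use the regularities of letter combinations within words. SLAM: use the observations from the previous frame or the previous few frames.

Repeated deliberation. Both: more complex connections in the graphical model.

Finding the optimum. Both: MAP estimation.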

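　　In an OCR system the image of each character (var: I) is always observed, while the letter we want to recover (var: C) is never observed. What we model is therefore P(C|I). A Markov network of this special kind is called a conditional random field (CRF).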
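Written out, the conditional distribution has the usual CRF form. This is a sketch in my own notation: the factor index k, the scope S_k of each factor, and the partition function Z(I) are symbols introduced here, not names from the assignment.

```latex
P(C \mid I) = \frac{1}{Z(I)} \prod_{k} \phi_k\left(C_{S_k}, I\right),
\qquad
Z(I) = \sum_{C'} \prod_{k} \phi_k\left(C'_{S_k}, I\right)
```

Because Z(I) is computed for the observed images only, the model never has to describe the distribution of the images themselves.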

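1. Single measurement

　　Before building a complicated graphical model, start with a simple one. Even though a single observation is unreliable, we should first make an inference from each observation on its own. So, for a given image, we need the factor that relates it to the letter. The graphical model at this stage is shown in the figure.

　　Here each letter is its own separate graph, and we only need to specify one factor between each letter and its image: phi(I,C). Because the image is always observed, the only variable left in the factor's scope is C. The val of the factor differs from one small graph to the next, because val holds the probability of the var taking each of its card possible values. The factor's var is the position of the letter in the word, card = 26 is the number of values the var can take, and val is given by ComputeImageFactor.

　　Because different images produce different distributions over the var, we cannot build one factor and copy it in bulk as before; each one has to be computed separately. The following function is used to compute factor.val: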

 function P = ComputeImageFactor (img, imgModel)
 % This function computes the singleton OCR factor values for a single
 % image.
 %
 % Input:
 %   img: The 16x8 matrix of the image
 %   imgModel: The provided, trained image model
 %
 % Output:
 %   P: A K-by-1 array of the factor values for each of the K possible
 %     character assignments to the given image
 %
 % Copyright (C) Daphne Koller, Stanford University, 2012

 X = img(:);
 N = length(X);
 K = imgModel.K;
 theta = reshape(imgModel.params(1:N*(K-1)), K-1, N);
 bias  = reshape(imgModel.params((1+N*(K-1)):end), K-1, 1);
 W = [ bsxfun(@plus, theta * X, bias) ; 0 ];
 W = bsxfun(@minus, W, max(W));
 W = exp(W);
 P=bsxfun(@rdivide, W, sum(W));
 end
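To make the arithmetic in the function above concrete, here is a minimal sketch that reproduces its steps on made-up numbers. The parameter values and the fake image are assumptions chosen only so the script runs on its own; they are not the trained model.

```matlab
% Sizes taken from the text: a 16x8 image and K = 26 characters.
N = 16 * 8;                         % 128 pixels
K = 26;
nParams = N * (K - 1) + (K - 1);    % weights plus biases
fprintf('parameter count: %d\n', nParams);

% Hypothetical parameters and image, only to exercise the arithmetic.
params = linspace(-1, 1, nParams)';
X = double(mod((1:N)', 2) == 0);    % fake binary image, already flattened

theta = reshape(params(1:N*(K-1)), K-1, N);
bias  = reshape(params(N*(K-1)+1:end), K-1, 1);

scores = [theta * X + bias; 0];     % character K is the reference class, score 0
scores = scores - max(scores);      % shift for numerical stability
P = exp(scores) / sum(exp(scores)); % softmax: K-by-1, sums to 1
fprintf('sum of P: %.6f\n', sum(P));
```

Subtracting max(scores) before exponentiating does not change the result of the softmax; it only keeps exp from overflowing.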

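　　The most important information is hidden in imageModel.params. Opening params shows a 3225×1 vector, and the reshape calls reveal its layout: the first 128×25 = 3200 entries are weights linking each of the 128 pixels to 25 of the letters, and the last 25 entries are biases (the 26th letter acts as the reference class with a fixed score of 0). Finally, a softmax function turns these scores into factor.val. The code for the singleton factors is therefore: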

 function factors = ComputeSingletonFactors (images, imageModel)
 % This function computes the single OCR factors for all of the images in a
 % word.
 %
 % Input:
 %   images: An array of structs containing the 'img' value for each
 %     character in the word. You could, for example, pass in allWords{1} to
 %     use the first word of the provided dataset.
 %   imageModel: The provided OCR image model.
 %
 % Output:
 %   factors: An array of the OCR factors, one for every character in the
 %   image.
 %
 % Hint: You will want to use ComputeImageFactor.m when computing the 'val'
 % entry for each factor.
 %
 % Copyright (C) Daphne Koller, Stanford University, 2012

 % The number of characters in the word
 n = length(images);

 % Preallocate the array of factors
 factors = repmat(struct('var', [], 'card', [], 'val', []), n, 1);

 % Your code here:
 for i = 1:n
     factors(i).var = i;
     factors(i).card = imageModel.K;
     factors(i).val = ComputeImageFactor(images(i).img,imageModel);
 end
 end

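　　For a 9-letter word, running this code builds the PGM. The so-called PGM is really just a list of factors, one per letter at this stage.

　　Even a model this simple supports useful inference. Inference is done by calling precompiled C code, and the result is a character accuracy of 77% and a word accuracy of 22%. At this stage the accuracy is determined almost entirely by the trained params: each image is simply matched to a letter on its own.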
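With no edges between the letters, the most likely word is simply the most likely letter at each position. The sketch below shows this on a hypothetical 3-letter word over a 4-letter alphabet; the factor values are invented for illustration, and the assignment itself uses the provided inference code rather than this loop.

```matlab
% Hypothetical singleton factors: 3 positions, alphabet of size K = 4.
K = 4;
factors = repmat(struct('var', [], 'card', K, 'val', []), 3, 1);
factors(1).var = 1; factors(1).val = [0.1; 0.6; 0.2; 0.1];
factors(2).var = 2; factors(2).val = [0.7; 0.1; 0.1; 0.1];
factors(3).var = 3; factors(3).val = [0.2; 0.2; 0.1; 0.5];

% The factors share no variables, so the joint MAP assignment is the
% argmax of each factor taken separately.
pred = zeros(1, numel(factors));
for i = 1:numel(factors)
    [~, pred(i)] = max(factors(i).val);
end
disp(pred);
```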

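2. Comparing neighbors

2.1 Adjacent letters

　　For English words, combinations of adjacent letters also carry prior information. For example, h is less likely than u to follow q. This kind of relationship helps inference. The graphical model now looks like this:

　　Compared with before, the model now has to account for the relationship (factor) between adjacent letters. Each such factor has two vars, and they must be adjacent: 1 2; 2 3; 3 4; and so on. The card of each factor is [26 26]: once a var in a graph has been assigned its unique number, its card cannot change. The val of a factor over two adjacent letters is much larger, with 26*26 = 676 entries. This val, however, has nothing to do with the image observations; it only describes a relationship between vars. In other words, this factor can be copied: only var changes, and everything else stays the same. The factors are computed as follows: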

 function factors = ComputePairwiseFactors (images, pairwiseModel, K)
 % This function computes the pairwise factors for one word and uses the
 % given pairwise model to set the factor values.
 %
 % Input:
 %   images: An array of structs containing the 'img' value for each
 %     character in the word.
 %   pairwiseModel: The provided pairwise model. It is a K-by-K matrix. For
 %     character i followed by character j, the factor value should be
 %     pairwiseModel(i, j).
 %   K: The alphabet size (accessible in imageModel.K for the provided
 %     imageModel).
 %
 % Output:
 %   factors: The pairwise factors for this word.
 %
 % Copyright (C) Daphne Koller, Stanford University, 2012

 n = length(images);

 % If there are fewer than 2 characters, return an empty factor list.
 if (n < 2)
     factors = [];
     return;
 end

 % Every pairwise factor shares the same value table, so build it once.
 val_ = reshape(pairwiseModel,K*K,1);
 factors = repmat(struct('var', [], 'card', [K K], 'val', val_), n - 1, 1);

 % Your code here:
 for i = 1 : n-1
     factors(i).var =[i,i+1];
 end
 end
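The reshape in the function above works because a column-major K-by-K matrix already has the first variable as the fastest-changing index. The sketch below checks this on a tiny made-up table, using sub2ind as a stand-in for the course's AssignmentToIndex (I am assuming the two agree on index order, which matches how the factor tables are used elsewhere in this post).

```matlab
K = 3;
pairwiseModel = [1 2 3; 4 5 6; 7 8 9];   % hypothetical table, row = first letter
val = reshape(pairwiseModel, K*K, 1);

i = 2; j = 3;                            % letter i followed by letter j
idx = sub2ind([K K], i, j);              % equals i + (j-1)*K
fprintf('val(%d) = %d, pairwiseModel(%d,%d) = %d\n', ...
        idx, val(idx), i, j, pairwiseModel(i, j));
```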

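　　The key val is again determined by a provided parameter, pairwiseModel. It is a 26×26 matrix that specifies how likely two letters are to be adjacent. Opening it shows many zeros, which mean that the two letters are treated as practically never adjacent. Such a model can be obtained from dictionary statistics. Setting entries to exactly 0 is actually rather dangerous, because a zero factor value rules out every word containing that pair no matter what the images say; it is acceptable here only because we are confident enough in the statistics.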
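As an illustration of how such a table could be estimated without hard zeros, here is a sketch that counts letter pairs in a tiny word list and adds a constant to every count. The word list and the smoothing constant alpha are assumptions made up for this example; the provided pairwiseModel was not necessarily built this way.

```matlab
words = {'quit', 'quiz', 'ship'};      % hypothetical tiny dictionary
K = 26;
counts = zeros(K, K);
for w = 1:numel(words)
    idx = double(words{w}) - double('a') + 1;   % 'a' -> 1, ..., 'z' -> 26
    for t = 1:numel(idx) - 1
        counts(idx(t), idx(t+1)) = counts(idx(t), idx(t+1)) + 1;
    end
end

alpha = 1;                             % assumed smoothing constant
smoothed = counts + alpha;             % no entry is exactly zero any more

q = double('q') - double('a') + 1;
u = double('u') - double('a') + 1;
h = double('h') - double('a') + 1;
fprintf('qu: %d, qh: %d\n', smoothed(q, u), smoothed(q, h));
```

With smoothing, an unseen pair such as "qh" is only penalized relative to "qu" instead of being forbidden outright.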

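　　Using adjacent letters to build the PGM yields some additional factors, shown below:

　　The result is now a character accuracy of 79.16% and a word accuracy of 26%. The word accuracy in particular improves noticeably, from 22% to 26%.

2.2 Three-letter combinations

　　Since two-letter combinations improve the accuracy, three-letter combinations should help as well. Considering the relationship among letters i, i+1, and i+2 of a word gives the following graphical model:

　　What we need to do is keep adding factors: phi(i,i+1,i+2), with var i, i+1, i+2 and card [26 26 26]. The all-important val is again determined by provided numbers, and it needs 26*26*26 = 17576 entries. Designing a value for every single combination would take a great deal of computation even with an exhaustive dictionary. So we give a higher weight (greater than 1) only to 2000 common combinations (ing, ght, ...) and assign 1 to all the others (that is, we ignore them). The sparsity of the features is even more evident here. The factor computation code is as follows: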

 function factors = ComputeTripletFactors (images, tripletList, K)
 % images = allWords{1};
 % K = 26;
 % This function computes the triplet factor values for one word.
 %
 % Input:
 %   images: An array of structs containing the 'img' value for each
 %     character in the word.
 %   tripletList: An array of the character triplets we will consider (other
 %     factor values should be 1). tripletList(i).chars gives character
 %     assignment, and tripletList(i).factorVal gives the value for that
 %     entry in the factor table.
 %   K: The alphabet size (accessible in imageModel.K for the provided
 %     imageModel).
 %
 % Hint: Every character triple in the word will use the same 'val' table.
 %   Consider computing that array once and then reusing for each factor.
 %
 % Copyright (C) Daphne Koller, Stanford University, 2012

 n = length(images);

 % If the word has fewer than three characters, then return an empty list.
 if (n < 3)
     factors = [];
     return
 end

 % Start from all ones, then overwrite only the listed triplets.
 val_init = ones(K*K*K,1);
 num_zhiding = length(tripletList);
 for i = 1:num_zhiding
     triplet_i = tripletList(i);
     assign_ = triplet_i.chars;
     index_  = AssignmentToIndex(assign_,[K K K]);
     val_init(index_) = triplet_i.factorVal;
 end
 factors = repmat(struct('var', [], 'card', [K K K], 'val', [val_init]), n - 2, 1);

 % Your code here:
 for i = 1: n-2
    factors(i).var = [i i+1 i+2];
 end
 end
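To see how sparse the triplet table built above is, the sketch below fills in a single entry, the one for "ing". The weight 2.5 is a made-up number rather than a value from tripletList, and sub2ind stands in for the course's AssignmentToIndex under the same index-order assumption as before.

```matlab
K = 26;
val = ones(K^3, 1);                          % default weight 1 for every triplet
chars = double('ing') - double('a') + 1;     % [9 14 7]
factorVal = 2.5;                             % hypothetical weight
idx = sub2ind([K K K], chars(1), chars(2), chars(3));
val(idx) = factorVal;
fprintf('table size: %d, entry for "ing": %d, non-default entries: %d\n', ...
        numel(val), idx, sum(val ~= 1));
```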

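　　This again yields some additional factors, shown below:

　　With these factors added, the character accuracy rises to 80.3%, while the word accuracy is recorded as 24%, which is lower than the 26% obtained with the pairwise factors alone. The improvement has clearly slowed down: many words do not contain any of the listed three-letter combinations, so the triplet factors help only to a limited extent.

3. Repeated deliberation

　　Adjacency information alone can no longer deliver the improvement we want, so we need to find more useful information and bring it into the PGM. In ordinary handwriting, a person tends to write the same letter in a similar way each time. So within one word there should be a link between every pair of letters: if their observations (images) are similar, the two letters are very likely the same. The graphical model now looks like this:

　　Every pair of nodes is connected (for readability, the figure only shows the edges between the first node and the others). Pairs such as 1 2 and 1 3 are already connected, so are these edges duplicates? They are not. The earlier factors encode adjacency information, whereas the new factor encodes how similar two images are. In essence this factor is phi(C1,C2,I1,I2), but because I1 and I2 are observed, its var is only C1, C2, and its card is still [26 26]: card is the number of values the random variable can take, not the range of variable indices, and it must agree with the card used earlier. val is determined by the similarity of the two images. So this is another factor that cannot simply be copied, because it depends on the observations.

　　This factor is given by the following function: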

 function factor = ComputeSimilarityFactor (images, K, i, j)
 % This function computes the similarity factor between two character images
 % in one word --- which characters is given by indices i and j (a
 % description of how the factor should be computed is given below).
 %
 % Input:
 %   images: A struct array of character images from one word.
 %   K: The alphabet size.
 %   i,j: The scope of that factor. That is, you should construct a factor
 %     between characters i and j in the images array.
 %
 % Output:
 %   factor: The similarity factor between these two characters. For any
 %     assignment C_i != C_j, the factor value should be one. For any
 %     assignment C_i == C_j, the factor value should be
 %     ImageSimilarity(I_i, I_j) --- ie, the computed value given by
 %     ImageSimilarity.m on the two images.
 %
 % Copyright (C) Daphne Koller, Stanford University, 2012

 factor = struct('var', [i j], 'card', [K K], 'val',ones(K*K,1));

 % Your code here:
 % The similarity is the same for every diagonal entry, so compute it once.
 sim_ = ImageSimilarity(images(i).img,images(j).img);
 for k_ = 1:K
     % Only assignments with C_i == C_j receive the similarity score.
     indx_ = AssignmentToIndex([k_ k_],[K K]);
     factor.val(indx_) = sim_;
 end
 end

　　Here ImageSimilarity measures how similar two images are, quantified by the cosine of the angle between the two vectorized images.
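The cosine measure and the resulting factor table can be sketched as follows. This assumes a plain cosine with no further rescaling, which may differ from what the provided ImageSimilarity.m actually returns, and the two 2x2 "images" are invented for the example.

```matlab
K = 26;
imgA = [1 0; 1 1];                 % hypothetical tiny images
imgB = [1 0; 0 1];
a = double(imgA(:));
b = double(imgB(:));
cosSim = (a' * b) / (norm(a) * norm(b));   % cosine of the angle between a and b

val = ones(K*K, 1);                % off-diagonal entries (C_i != C_j) stay at 1
val(1:K+1:end) = cosSim;           % linear indices of the K diagonal entries
fprintf('cosine similarity: %.4f\n', cosSim);
```

The stride K+1 walks down the diagonal of a K-by-K table stored column by column, so it touches exactly the assignments where both letters are equal.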

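　　Because this factor cannot be copied directly, we also have to generate the factors for every pair of letters in the word. This is done by the following function: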

 function factors = ComputeAllSimilarityFactors (images, K)
 % This function computes all of the similarity factors for the images in
 % one word.
 %
 % Input:
 %   images: An array of structs containing the 'img' value for each
 %     character in the word.
 %   K: The alphabet size (accessible in imageModel.K for the provided
 %     imageModel).
 %
 % Output:
 %   factors: Every similarity factor in the word. You should use
 %     ComputeSimilarityFactor to compute these.
 %
 % Copyright (C) Daphne Koller, Stanford University, 2012

 n = length(images);
 nFactors = nchoosek (n, 2);
 factors = repmat(struct('var', [], 'card', [K K], 'val', []), nFactors, 1);

 % Your code here:
 num_factor_ =1;
 for i = 1:n
     for j = i+1:n
         factors(num_factor_) = ComputeSimilarityFactor(images,K,i,j);
         num_factor_ = num_factor_+1;
     end
 end
 end
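To get a feel for how many factors the function above creates, here is a quick count for a 9-letter word, alongside the other factor types described in this post. The variable names are mine and exist only for this tally.

```matlab
n = 9;                              % letters in the word
nSingleton  = n;                    % one per letter
nPairwise   = n - 1;                % adjacent pairs
nTriplet    = n - 2;                % adjacent triples
nSimilarity = nchoosek(n, 2);       % every unordered pair of letters
total = nSingleton + nPairwise + nTriplet + nSimilarity;
fprintf('similarity factors: %d, all factors: %d\n', nSimilarity, total);
```

The similarity factors grow quadratically with the word length, while the other three types grow only linearly.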

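　　Note, however, that adding these factors adds the following entries to the graphical model:

　　This screenshot is truncated. For a long word (9 letters, say) the number of factors grows sharply, which makes inference over the model much more expensive. In most cases, though, only a few letters in a word repeat, and allowing for two pairs of repeated letters covers most words. So to reduce the computational cost we cut down the image-similarity factors and keep only the two most similar pairs (those with the largest factor.val). The code is as follows: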

 function factors = ChooseTopSimilarityFactors (allFactors, F)
 % This function chooses the similarity factors with the highest similarity
 % out of all the possibilities.
 %
 % Input:
 %   allFactors: An array of all the similarity factors.
 %   F: The number of factors to select.
 %
 % Output:
 %   factors: The F factors out of allFactors for which the similarity score
 %     is highest.
 %
 % Hint: Recall that the similarity score for two images will be in every
 %   factor table entry (for those two images' factor) where they are
 %   assigned the same character value.
 %
 % Copyright (C) Daphne Koller, Stanford University, 2012

 % If there are fewer than F factors total, just return all of them.
 if (length(allFactors) <= F)
     factors = allFactors;
     return;
 end

 % Your code here:
 n_factors = length(allFactors);
 n_img = max(allFactors(n_factors).var);
 % The similarity factors are assumed to be the last nchoosek(n_img,2) entries.
 start_  = n_factors-nchoosek(n_img,2)+1;
 Similarity =[];
 for i = start_ : n_factors
     % val(1) is the entry for the assignment (1,1), i.e. the similarity
     % score itself; max(val) would return 1 whenever the score is below 1.
     Similarity =[Similarity;i allFactors(i).val(1)];
 end
 % Sort by similarity, highest first, and keep the top F factors.
 Similarity_paixu = sortrows(Similarity,-2);
 factors_to_keep = Similarity_paixu(1:F,1);
 factors_to_remove = setdiff(start_:n_factors,factors_to_keep);
 allFactors(factors_to_remove,:)=[];
 factors = allFactors;
 end
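The selection step in the function above can also be written with sort. The sketch below is an alternative formulation on invented data: three similarity factors over a 2-letter alphabet with made-up scores. It is not a drop-in replacement for the graded function, because it returns the factors in order of decreasing similarity rather than in their original order.

```matlab
K = 2;
% Hypothetical similarity scores for the 3 letter pairs of a 3-letter word.
pairs  = [1 2; 1 3; 2 3];
scores = [1.2; 3.0; 1.0];
allFactors = repmat(struct('var', [], 'card', [K K], 'val', []), 3, 1);
for k = 1:3
    v = ones(K*K, 1);
    v(1:K+1:end) = scores(k);       % similarity on the diagonal entries
    allFactors(k).var = pairs(k, :);
    allFactors(k).val = v;
end

F = 2;                              % number of factors to keep
sim = arrayfun(@(f) f.val(1), allFactors);   % entry for the assignment (1,1)
[~, order] = sort(sim, 'descend');
top = allFactors(order(1:F));
for k = 1:F
    fprintf('kept pair: %d %d\n', top(k).var(1), top(k).var(2));
end
```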

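　　In the end, the factors of the PGM are as follows:

　　Running the inference algorithm on this model gives a character accuracy of 81.6% and a word accuracy of 37%. Compared with recognizing each image on its own, the word accuracy rises from 22% to 37%, a relative improvement of about two thirds. The effect of the probabilistic graphical model is significant.

4. Summary

　　Finally, you are probably curious what the character images actually look like. Here they are:

　　The recognition results are as follows: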

