Reference: https://www.cs.swarthmore.edu/~meeden/cs81/s10/BackPropDeriv.pdf

  I spent nearly an hour deducing the vector form of back propagation. In case I forget the formulas but need them again, I am writing them all down here as a backup.

Structure:

  A standard BP network with $\displaystyle \lambda$ hidden layers, one input layer, and one output layer.

  Activation function: sigmoid.

Notations:

$\displaystyle W^{i+1,i}$, denotes the weight matrix connecting the $i$th layer to the $(i+1)$th layer.

$\displaystyle N^i$, denotes the net input of the $i$th layer.

$\displaystyle A^i$, denotes the activation (output) of the $i$th layer.

$\displaystyle \delta ^i$, denotes the error of the $i$th layer.

$\displaystyle \epsilon$, denotes the learning rate.

$*$, denotes element-wise multiplication.

Juxtaposition (the operator is omitted), denotes matrix multiplication.

  Specifically,

$\displaystyle X$, denotes the input layer, which equals $\displaystyle A^0$.

$\displaystyle A^{\lambda + 1}$, denotes the output layer.

$\displaystyle Y$, denotes the expected output.

Propagations:

  Forward:

$\displaystyle N^i = W^{i,i-1}A^{i-1}$.

$\displaystyle A^i = \frac{1}{1+e^{-N^i}}$.
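
  For concreteness, here is a minimal NumPy sketch of this forward pass (no bias terms, to match the formulas above). The list layout of `W`, the helper names, and the column-vector convention are my own choices for illustration, not part of the reference.

```python
import numpy as np

def sigmoid(n):
    # A^i = 1 / (1 + e^{-N^i}), applied element-wise
    return 1.0 / (1.0 + np.exp(-n))

def forward(W, x):
    """Forward pass.

    W -- list of weight matrices; W[i] plays the role of W^{i+1,i}
    x -- input column vector, i.e. A^0
    Returns the list of activations [A^0, A^1, ..., A^{lambda+1}].
    """
    A = [x]
    for Wi in W:
        N = Wi @ A[-1]        # N^i = W^{i,i-1} A^{i-1}
        A.append(sigmoid(N))  # A^i = sigmoid(N^i)
    return A
```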

  Backward:

$\displaystyle \Delta W^{i+1,i} = \epsilon \delta^{i+1}(A^{i})^{T}$.

$\displaystyle \delta ^i = ((\delta^{i+1})^{T}W^{i+1,i})^{T}*A^{i}*(1-A^{i})$.

$\displaystyle \delta ^{\lambda + 1} = (Y - A^{\lambda + 1})*A^{\lambda + 1}*(1-A^{\lambda + 1})$.
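
  The backward formulas translate almost line for line into code. The sketch below continues the conventions of the `forward` helper above; updating `W` in place is again my own choice. Each $\displaystyle \delta^i$ is computed with the old weights before the corresponding $\displaystyle \Delta W^{i+1,i}$ is applied.

```python
def backward(W, A, y, eps):
    """One vectorized back-propagation step.

    W   -- list of weight matrices (updated in place)
    A   -- activations returned by forward()
    y   -- expected output Y (column vector)
    eps -- learning rate epsilon
    """
    # Output-layer error: delta^{lambda+1} = (Y - A) * A * (1 - A)
    delta = (y - A[-1]) * A[-1] * (1.0 - A[-1])
    for i in reversed(range(len(W))):
        # Delta W^{i+1,i} = eps * delta^{i+1} (A^i)^T
        grad = eps * delta @ A[i].T
        if i > 0:
            # delta^i = (W^{i+1,i})^T delta^{i+1} * A^i * (1 - A^i)
            delta = (W[i].T @ delta) * A[i] * (1.0 - A[i])
        W[i] += grad  # apply the update after delta has used the old weights
```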

Deduction:

  I am not able to take partial derivatives of vectors or matrices with respect to vectors or matrices, so I derived these formulas by writing out the formula for each element of the weight matrix and then extending it to vector form.

$\displaystyle \Delta W^{\lambda+1,\lambda}_{i,j} = \epsilon (Y_i - A^{\lambda+1}_i)A^{\lambda+1}_i(1-A^{\lambda +1}_i)A^{\lambda}_j$.

  Let's define $\displaystyle \delta ^{\lambda+1}_{i} := (Y_i - A^{\lambda+1}_i)A^{\lambda+1}_i(1-A^{\lambda +1}_i)$.

$\displaystyle \Delta W^{\lambda,\lambda-1}_{i,j}=\epsilon (\delta^{\lambda+1})^{T}W^{\lambda+1,\lambda}_{col(i)}A_i^{\lambda}(1-A_i^{\lambda})A_j^{\lambda-1}$, where $\displaystyle W^{\lambda+1,\lambda}_{col(i)}$ denotes the $i$th column of $\displaystyle W^{\lambda+1,\lambda}$.

  Let's define $\displaystyle \delta ^{\lambda}_{i} := (\delta^{\lambda+1})^{T}W^{\lambda+1,\lambda}_{col(i)}A_i^{\lambda}(1-A_i^{\lambda})$.

  The remaining layers follow the same pattern and are left for the reader to complete.
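
  One way to complete the exercise is to verify the vectorized formulas numerically. Assuming the cost being minimized is the squared error $\displaystyle E = \frac{1}{2}\lVert Y - A^{\lambda+1}\rVert^2$ (the cost these updates perform gradient descent on), $\displaystyle \Delta W^{i+1,i}$ with $\displaystyle \epsilon = 1$ should equal $\displaystyle -\partial E/\partial W^{i+1,i}$. The sketch below checks this against central differences; it reuses the `forward` and `backward` helpers above, and the layer sizes and random data are arbitrary.

```python
import numpy as np

def loss(W, x, y):
    # Squared error E = 0.5 * ||Y - A^{lambda+1}||^2
    return 0.5 * np.sum((y - forward(W, x)[-1]) ** 2)

rng = np.random.default_rng(0)
W = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]  # sizes 3 -> 4 -> 2
x = rng.standard_normal((3, 1))
y = rng.random((2, 1))

# Analytic update: with eps = 1, backward() adds Delta W = delta^{i+1} (A^i)^T to each W[i].
W_old = [Wi.copy() for Wi in W]
backward(W, forward(W, x), y, eps=1.0)
analytic = [w_new - w_old for w_new, w_old in zip(W, W_old)]

# Numerical gradient dE/dW by central differences; analytic + numeric should be ~0.
h = 1e-5
for layer in range(len(W_old)):
    numeric = np.zeros_like(W_old[layer])
    for idx in np.ndindex(*W_old[layer].shape):
        Wp = [w.copy() for w in W_old]
        Wm = [w.copy() for w in W_old]
        Wp[layer][idx] += h
        Wm[layer][idx] -= h
        numeric[idx] = (loss(Wp, x, y) - loss(Wm, x, y)) / (2 * h)
    print(layer, np.max(np.abs(analytic[layer] + numeric)))  # expect a very small value (~1e-9 or less)
```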
