[Machine Learning][BP]The Vectorized Back Propagation Algorithm
Reference: https://www.cs.swarthmore.edu/~meeden/cs81/s10/BackPropDeriv.pdf
I spent nearly one hour to deduce the vector form of the back propagation. Just in case that I may forget, but need to utilize them, I will write down all the formula here to make a backup.
Structure:
Standard BP Network with $\displaystyle \lambda$ hidden layers, one input layer and one output layer.
Activation function: sigmoid.
Notations:
$\displaystyle W^{i+1,i}$, denotes the weight matrix connecting from $i$th layer to $i+1$th layer.
$\displaystyle N^i$, denotes the net input of the $i$th layer.
$\displaystyle A^i$, denotes the activation input of the $i$th layer.
$\displaystyle \delta ^i$, denotes the error of the $i$th layer.
$\displaystyle \epsilon$, denotes the learning rate.
*, stands for element by element multiplication.
(omit), stands for matrix multiplication.
Specifically,
$\displaystyle X$, denotes the input layer, while equals $\displaystyle A^0$.
$\displaystyle A^{\lambda + 1}$, denotes the output layer.
$\displaystyle Y$, denotes the expected output.
Propagations:
Forward:
$\displaystyle N^i = W^{i,i-1}A^{i-1}$.
$\displaystyle A^i = \frac{1}{1+e^{-N^i}}$.
Backward:
$\displaystyle \Delta W^{i+1,i} = \epsilon \delta^{i+1}(A^{i})^{T}$.
$\displaystyle \delta ^i = ((\delta^{i+1})^{T}W^{i+1,i})^{T}*A^{i}*(1-A^{i})$.
$\displaystyle \delta ^{\lambda + 1} = (Y - A^{\lambda + 1})*A^{\lambda + 1}*(1-A^{\lambda + 1})$.
Deduction:
I am not capable of taking the partial derivative of vector or matrix over vector or matrix, so I derive these formulas by observing the formula for each element in the matrix and extend it to the vector form.
$\displaystyle \Delta W^{\lambda+1,\lambda}_{i,j} = \epsilon (Y_i - A^{\lambda+1}_i)A^{\lambda+1}_i(1-A^{\lambda +1}_i)A^{\lambda}_j$.
Let's assume $\displaystyle \delta ^{\lambda+1}_{i} := (Y_i - A^{\lambda+1}_i)A^{\lambda+1}_i(1-A^{\lambda +1}_i)$.
$\displaystyle \Delta W^{\lambda,\lambda-1}_{i,j}=\epsilon (\delta^{\lambda+1})^{T}W^{\lambda+1,\lambda}_{col(i)}A_i^{\lambda}(1-A_i^{\lambda})A_j^{\lambda-1}$.
Let's assume $\displaystyle \delta ^{\lambda}_{i} := (\delta^{\lambda+1})^{T}W^{\lambda+1,\lambda}_{col(i)}A_i^{\lambda}(1-A_i^{\lambda})$.
The left are reserved for the readers to complete.
[Machine Learning][BP]The Vectorized Back Propagation Algorithm的更多相关文章
- CheeseZH: Stanford University: Machine Learning Ex4:Training Neural Network(Backpropagation Algorithm)
1. Feedforward and cost function; 2.Regularized cost function: 3.Sigmoid gradient The gradient for t ...
- Bayesian machine learning
from: http://www.metacademy.org/roadmaps/rgrosse/bayesian_machine_learning Created by: Roger Grosse( ...
- 机器学习算法之旅A Tour of Machine Learning Algorithms
In this post we take a tour of the most popular machine learning algorithms. It is useful to tour th ...
- [GPU] Machine Learning on C++
一.MPI为何物? 初步了解:MPI集群环境搭建 二.重新认识Spark 链接:https://www.zhihu.com/question/48743915/answer/115738668 马铁大 ...
- A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning
A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning by Jason Brownlee on S ...
- Machine Learning—Mixtures of Gaussians and the EM algorithm
印象笔记同步分享:Machine Learning-Mixtures of Gaussians and the EM algorithm
- AUTOML --- Machine Learning for Automated Algorithm Design.
自动算法的机器学习: Machine Learning for Automated Algorithm Design. http://www.ml4aad.org/ AutoML——降低机器学习门槛的 ...
- (转)Introduction to Gradient Descent Algorithm (along with variants) in Machine Learning
Introduction Optimization is always the ultimate goal whether you are dealing with a real life probl ...
- machine learning model(algorithm model) .vs. statistical model
https://www.analyticsvidhya.com/blog/2015/07/difference-machine-learning-statistical-modeling/ http: ...
随机推荐
- WEB, Flask - Session&Cookie
参考: https://blog.csdn.net/nunchakushuang/article/details/74652877 http://portal.xiaoxiangzi.com/Prog ...
- kubernetes 使用flannel网络模式 错误分析
今天按照网上和书上的要求,将目前的kubernetes网络换成flannel.其实配置起来还是很简单的,但是一旦出现了问题,将很难解决. 配置方法我这边不给出了.因为网上这样的教程一大把,在说下去也无 ...
- python导入第三方库
2.Python的库一般包含两个方面:第三方库和标准库 3.Python的time标准库主要包含三个方面的内容:(1)时间处理函数(2)时间格式化(3)程序计时 4.turtle画笔运动函数的功能是进 ...
- Android数据库高手秘籍(二):创建表和LitePal的基本用法
原文:http://blog.jobbole.com/77157/ 上一篇文章中我们学习了一些Android数据库相关的基础知识,和几个颇为有用的SQLite命令,都是直接在命令行操作的.但是我们都知 ...
- python 开启http服务并下载文件
Python <= 2.3python -c "import SimpleHTTPServer as s; s.test();" 8000 Python >= 2.4p ...
- SpringCloud 跨域访问cors
import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.Co ...
- 在linux环境中配置solr
第一步:安装linux.jdk.tomcat. 第二步:把solr的压缩包上传到服务器.并解压.我的solr压缩包是解压在/usr/local/solr/包下的 系统默认是没有solr包的需要自己创建 ...
- 6专题总结-动态规划dynamic programming
专题6--动态规划 1.动态规划基础知识 什么情况下可能是动态规划?满足下面三个条件之一:1. Maximum/Minimum -- 最大最小,最长,最短:写程序一般有max/min.2. Yes/N ...
- 「牛客CSP-S2019赛前集训营2」服务器需求
传送门 NowCoder 解题思路 考虑一种贪心选择方法:每次选出最大的 \(m\) 个 \(a_i\) 进行覆盖. 那么就会出现一种特殊情况,最高的那个 \(a_i\) 需要多次选择,而且不得不每次 ...
- GSON使用笔记(3) -- 如何反序列化出List
GSON使用笔记(3) -- 如何反序列化出List 时间 2014-06-26 17:57:06 CSDN博客原文 http://blog.csdn.net/zxhoo/article/deta ...