[Machine Learning][BP]The Vectorized Back Propagation Algorithm
Reference: https://www.cs.swarthmore.edu/~meeden/cs81/s10/BackPropDeriv.pdf
I spent nearly an hour deriving the vector form of back propagation. In case I forget it and need it again, I am writing all the formulas down here as a backup.
Structure:
Standard BP Network with $\displaystyle \lambda$ hidden layers, one input layer and one output layer.
Activation function: sigmoid.
Notations:
$\displaystyle W^{i+1,i}$ denotes the weight matrix connecting the $i$th layer to the $(i+1)$th layer.
$\displaystyle N^i$ denotes the net input of the $i$th layer.
$\displaystyle A^i$ denotes the activation (the output after the sigmoid) of the $i$th layer.
$\displaystyle \delta ^i$ denotes the error of the $i$th layer.
$\displaystyle \epsilon$ denotes the learning rate.
$*$ stands for element-by-element multiplication.
Juxtaposition (no operator symbol) stands for matrix multiplication.
Specifically,
$\displaystyle X$ denotes the input layer, which equals $\displaystyle A^0$.
$\displaystyle A^{\lambda + 1}$ denotes the output layer.
$\displaystyle Y$ denotes the expected output.
Propagations:
Forward:
$\displaystyle N^i = W^{i,i-1}A^{i-1}$.
$\displaystyle A^i = \frac{1}{1+e^{-N^i}}$.
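The forward rules above can be sketched in NumPy as follows. This is only a minimal illustration under assumptions that are not in the original: column-vector inputs, no bias terms (the formulas above have none), a hypothetical list `weights` where `weights[k]` plays the role of $W^{k+1,k}$, and arbitrary layer sizes 3, 4 and 2.

```python
import numpy as np

def sigmoid(n):
    # Element-wise sigmoid: A = 1 / (1 + exp(-N))
    return 1.0 / (1.0 + np.exp(-n))

def forward(weights, x):
    """Return the list [A^0, A^1, ..., A^{lambda+1}] for a column vector x.

    weights[k] stands for W^{k+1,k}; the name and layout are assumptions
    made for this sketch.
    """
    activations = [x]                      # A^0 = X
    for W in weights:
        net = W @ activations[-1]          # N^i = W^{i,i-1} A^{i-1}
        activations.append(sigmoid(net))   # A^i = sigmoid(N^i)
    return activations

# Illustrative sizes: 3 inputs, one hidden layer of 4 units, 2 outputs.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
x = rng.normal(size=(3, 1))

activations = forward(weights, x)
```

Here `activations[0]` is the input $X$ and `activations[-1]` is the network output $A^{\lambda+1}$.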
Backward:
$\displaystyle \Delta W^{i+1,i} = \epsilon \delta^{i+1}(A^{i})^{T}$.
$\displaystyle \delta ^i = ((\delta^{i+1})^{T}W^{i+1,i})^{T}*A^{i}*(1-A^{i})$.
$\displaystyle \delta ^{\lambda + 1} = (Y - A^{\lambda + 1})*A^{\lambda + 1}*(1-A^{\lambda + 1})$.
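A rough NumPy sketch of the backward rules is given below, together with a finite-difference check. Several things here are assumptions rather than part of the original: the helper names, the layer sizes, the list layout `weights[k]` for $W^{k+1,k}$, and the objective $E = \frac{1}{2}\sum_i (Y_i - A^{\lambda+1}_i)^2$, which is the loss for which these formulas give $\Delta W = -\epsilon\,\partial E/\partial W$.

```python
import numpy as np

def sigmoid(n):
    return 1.0 / (1.0 + np.exp(-n))

def forward(weights, x):
    # Returns [A^0, ..., A^{lambda+1}]; weights[k] stands for W^{k+1,k}.
    activations = [x]
    for W in weights:
        activations.append(sigmoid(W @ activations[-1]))
    return activations

def backward(weights, activations, y, eps):
    """Return a list of updates, updates[k] standing for Delta W^{k+1,k}."""
    updates = [None] * len(weights)
    out = activations[-1]
    # delta^{lambda+1} = (Y - A) * A * (1 - A)
    delta = (y - out) * out * (1 - out)
    for k in reversed(range(len(weights))):
        # Delta W^{k+1,k} = eps * delta^{k+1} (A^k)^T
        updates[k] = eps * delta @ activations[k].T
        if k > 0:
            a = activations[k]
            # delta^k = ((delta^{k+1})^T W^{k+1,k})^T * A^k * (1 - A^k)
            delta = (delta.T @ weights[k]).T * a * (1 - a)
    return updates

def loss(weights, x, y):
    # Assumed objective: half the summed squared error.
    return 0.5 * np.sum((y - forward(weights, x)[-1]) ** 2)

rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
x = rng.normal(size=(3, 1))
y = np.array([[0.0], [1.0]])
eps = 0.1

updates = backward(weights, forward(weights, x), y, eps)

# Central finite difference on one entry of the first weight matrix.
h = 1e-6
plus = [W.copy() for W in weights]
minus = [W.copy() for W in weights]
plus[0][1, 2] += h
minus[0][1, 2] -= h
numeric_grad = (loss(plus, x, y) - loss(minus, x, y)) / (2 * h)
analytic_grad = -updates[0][1, 2] / eps   # since Delta W = -eps * dE/dW
```

If the two gradient values agree closely, the vectorized rules match the assumed objective. A training step would then presumably be `weights = [W + dW for W, dW in zip(weights, updates)]`, because the sign of $(Y - A^{\lambda+1})$ already points downhill.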
Derivation:
I am not comfortable taking partial derivatives of a vector or matrix with respect to another vector or matrix, so I derived these formulas by writing out the formula for each element of the matrix and then extending it to vector form.
$\displaystyle \Delta W^{\lambda+1,\lambda}_{i,j} = \epsilon (Y_i - A^{\lambda+1}_i)A^{\lambda+1}_i(1-A^{\lambda +1}_i)A^{\lambda}_j$.
Define $\displaystyle \delta ^{\lambda+1}_{i} := (Y_i - A^{\lambda+1}_i)A^{\lambda+1}_i(1-A^{\lambda +1}_i)$.
$\displaystyle \Delta W^{\lambda,\lambda-1}_{i,j}=\epsilon (\delta^{\lambda+1})^{T}W^{\lambda+1,\lambda}_{col(i)}A_i^{\lambda}(1-A_i^{\lambda})A_j^{\lambda-1}$.
Define $\displaystyle \delta ^{\lambda}_{i} := (\delta^{\lambda+1})^{T}W^{\lambda+1,\lambda}_{col(i)}A_i^{\lambda}(1-A_i^{\lambda})$.
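The step from the per-element definition of $\delta^{\lambda}_i$ to the vector form can be checked numerically. The sketch below uses made-up shapes and random values (all hypothetical) and compares a loop over $i$, which uses column $i$ of $W^{\lambda+1,\lambda}$, against the single vectorized expression.

```python
import numpy as np

rng = np.random.default_rng(1)
W_out = rng.normal(size=(2, 4))                  # stands for W^{lambda+1,lambda}
delta_out = rng.normal(size=(2, 1))              # stands for delta^{lambda+1}
a_hidden = rng.uniform(0.1, 0.9, size=(4, 1))    # stands for A^lambda

# Element by element: delta_i = (delta^{lambda+1})^T W_col(i) * A_i * (1 - A_i)
delta_loop = np.zeros_like(a_hidden)
for i in range(a_hidden.shape[0]):
    col_i = W_out[:, [i]]                        # column i, kept as shape (2, 1)
    inner = (delta_out.T @ col_i).item()         # scalar (delta^{lambda+1})^T W_col(i)
    delta_loop[i, 0] = inner * a_hidden[i, 0] * (1 - a_hidden[i, 0])

# Vector form: ((delta^{lambda+1})^T W)^T * A * (1 - A)
delta_vec = (delta_out.T @ W_out).T * a_hidden * (1 - a_hidden)

match = np.allclose(delta_loop, delta_vec)
```

Stacking the scalars $(\delta^{\lambda+1})^{T}W_{col(i)}$ over all $i$ gives the row vector $(\delta^{\lambda+1})^{T}W^{\lambda+1,\lambda}$, which is why a transpose appears in the vector form.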
The remaining layers follow the same pattern and are left for the reader to complete.