DeepLearning Intro - sigmoid and shallow NN
This is part of a series of machine learning summary notes. I will combine the Deep Learning book with the deeplearning.ai open course. Any feedback is welcome!
First, let's go through some basic NN concepts, using the Bernoulli (binary) classification problem as an example.
Activation Function
1. Bernoulli Output
1. Definition
When dealing with a binary classification problem, what activation function should we use in the output layer?
Basically, given \(x \in R^n\), how do we get \(P(y=1|x)\)?
2. Loss function
Let \(\hat{y} = P(y=1|x)\). We would expect the following output:
\[P(y|x) = \begin{cases}
\hat{y} & \text{when } y = 1\\
1-\hat{y} & \text{when } y = 0
\end{cases}
\]
The above can be written compactly as
\[P(y|x)= \hat{y}^{y}(1-\hat{y})^{1-y}\]
Therefore, the maximum likelihood estimate over m training samples is
\[\theta_{ML} = \underset{\theta}{\arg\max}\prod^{m}_{i=1}P(y^i|x^i)\]
As usual, we take the log of the above function and get the following. (The log also has other advantages for gradient descent, which we will discuss later.)
\[\theta_{ML} = \underset{\theta}{\arg\max}\sum^{m}_{i=1} y^i\log(\hat{y}^i)+(1-y^i)\log(1-\hat{y}^i)\]
And the cost function for optimization is the following:
\[J(w,b) = \sum^{m}_{i=1}L(y^i,\hat{y}^i) = -\sum^{m}_{i=1}\left[y^i\log(\hat{y}^i)+(1-y^i)\log(1-\hat{y}^i)\right]\]
The cost function is the sum of the losses over the m training samples, and it measures the performance of the classification algorithm.
And yes, here it is exactly the negative log likelihood. The cost function can differ from the negative log likelihood, for example when we apply regularization, but let's start with the simple version.
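As a quick illustration, here is a minimal numpy sketch of this cost; the helper name `cross_entropy_cost` and the clipping epsilon are my own additions, not from the course:

```python
import numpy as np

def cross_entropy_cost(y, y_hat, eps=1e-12):
    """Negative log likelihood summed over the m training samples.

    y, y_hat: arrays of shape (m,). eps clips y_hat away from 0 and 1
    so the logs stay finite -- a standard numerical guard.
    """
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
```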
So here comes our next problem: how can we get the 1-dimensional \(\log(\hat{y})\), given the input \(x\), which is an n-dimensional vector?
3. Activation function - Sigmoid
Let \(h\) denote the output from the previous hidden layer that goes into the final output layer. A linear transformation is applied to \(h\) before the activation function.
Let \(z = w^Th +b\)
The assumption here is
\[\log(\hat{y}) = \begin{cases}
z & \text{when } y = 1\\
0 & \text{when } y = 0
\end{cases}
\]
The above can be written compactly as
\[\log(\hat{y}) = yz \quad \to \quad \hat{y} = \exp(yz)\]
This is an unnormalized distribution. Because \(\hat{y}\) denotes a probability, we need to further normalize it to \([0,1]\).
\[\hat{y} = \frac{\exp(yz)}{\sum^1_{y=0}\exp(yz)}
= \frac{\exp(z)}{1+\exp(z)}
= \frac{1}{1+\exp(-z)} = \sigma(z)
\]
Bingo! Here we go - the Sigmoid function: \(\sigma(z) = \frac{1}{1+\exp(-z)}\)
\[p(y|x) = \begin{cases}
\sigma(z) & \text{when } y = 1\\
1-\sigma(z) & \text{when } y = 0
\end{cases}
\]
The sigmoid function has many pretty cool features, like the following:
\[1- \sigma(x) = \sigma(-x)\]
\[\frac{d}{dx} \sigma(x) = \sigma(x)(1-\sigma(x)) = \sigma(x)\sigma(-x)\]
Using the first feature above, we can further simplify the Bernoulli output into the following:
\[p(y|x) = \sigma((2y-1)z)\]
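Here is a small sketch of my own to check these properties numerically, including the compact Bernoulli form \(\sigma((2y-1)z)\):

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z)), written in a numerically stable form."""
    return np.where(z >= 0,
                    1.0 / (1.0 + np.exp(-z)),
                    np.exp(z) / (1.0 + np.exp(z)))

z = np.linspace(-5.0, 5.0, 101)

# Feature 1: 1 - sigma(z) == sigma(-z)
assert np.allclose(1 - sigmoid(z), sigmoid(-z))

# Feature 2: sigma'(z) == sigma(z) * (1 - sigma(z)),
# checked against a central finite difference.
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
assert np.allclose(numeric, sigmoid(z) * (1 - sigmoid(z)), atol=1e-6)

# Compact Bernoulli form: p(y|x) = sigma((2y - 1) z)
for y in (0, 1):
    expected = sigmoid(z) if y == 1 else 1 - sigmoid(z)
    assert np.allclose(sigmoid((2 * y - 1) * z), expected)
```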
4. Gradient descent and back propagation
Now we have a target cost function to optimize. How does the NN learn from the training data? The answer is: back propagation.
Actually, back propagation is not some fancy method designed specifically for neural networks. When the training sample is big, we can use the same gradient-based training for linear regression too.
Back propagation iteratively uses the partial derivatives of the cost function to update the parameters, in order to reach a local optimum.

\[\text{Loop over the } m \text{ samples:} \quad
w = w - \alpha\frac{\partial J(w,b)}{\partial w}, \qquad
b = b - \alpha\frac{\partial J(w,b)}{\partial b}\]
where \(\alpha\) is the learning rate.
Basically, for each training sample \((x,y)\), we compare \(y\) with the \(\hat{y}\) from the output layer, get the difference, and compute how much of that difference comes from each parameter (via partial derivatives). Then we update the parameters accordingly.

And the gradient of the loss can be calculated using the chain rule:
For each training sample, let \(\hat{y}=a = \sigma(z)\)
\[ \frac{\partial L(a,y)}{\partial w} =
\frac{\partial L(a,y)}{\partial a} \cdot
\frac{\partial a}{\partial z} \cdot
\frac{\partial z}{\partial w}\]
Where
1. \(\frac{\partial L(a,y)}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a}\), given the loss function \(L(a,y) = -(y\log(a) + (1-y)\log(1-a))\).
2. \(\frac{\partial a}{\partial z} = \sigma(z)(1-\sigma(z)) = a(1-a)\) (see the sigmoid features above).
3. \(\frac{\partial z}{\partial w} = x\), since in this 1-layer network the input to the output layer is \(x\) itself (\(h = x\)).
Putting them together, we get:
\[ \frac{\partial L(a,y)}{\partial w} = (a-y)x\]
This is exactly the gradient contribution from each training sample \((x,y)\) to the parameter \(w\).
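To build confidence in this result, here is a small sketch of my own that compares the analytic gradient \((a-y)x\) against a finite-difference approximation of the loss for one sample:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)               # one training sample, n = 3
y = 1.0
w = rng.normal(size=3)
b = 0.0

def loss(w, b):
    a = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

a = 1.0 / (1.0 + np.exp(-(w @ x + b)))
analytic = (a - y) * x               # dL/dw = (a - y) x from the chain rule

# Central finite difference for each coordinate of w.
h = 1e-6
numeric = np.zeros_like(w)
for j in range(len(w)):
    e = np.zeros_like(w)
    e[j] = h
    numeric[j] = (loss(w + e, b) - loss(w - e, b)) / (2 * h)

assert np.allclose(analytic, numeric, atol=1e-6)
```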
5. Entire workflow
Summarizing everything: a 1-layer binary classification neural network is trained as follows (a complete sketch in code comes after the list):
- Forward propagation: From \(x\), we calculate \(\hat{y}= \sigma(z)\)
- Calculate the cost function \(J(w,b)\)
- Back propagation: update parameter \((w,b)\) using gradient descent.
- Keep doing the above until the cost function stops improving (improvement < a certain threshold).
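Putting it all together, here is a minimal numpy sketch of the whole loop for a 1-layer network (i.e. logistic regression); the function name, learning rate, and stopping threshold are illustrative choices of my own:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, lr=0.1, n_iter=10000, tol=1e-7):
    """X: (m, n) inputs; y: (m,) labels in {0, 1}."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    prev_cost = np.inf
    for _ in range(n_iter):
        a = sigmoid(X @ w + b)                    # forward propagation
        a_c = np.clip(a, 1e-12, 1 - 1e-12)        # guard the logs
        cost = -np.mean(y * np.log(a_c) + (1 - y) * np.log(1 - a_c))
        if prev_cost - cost < tol:                # cost stopped improving
            break
        prev_cost = cost
        dz = a - y                                # back propagation: dL/dz
        w -= lr * (X.T @ dz) / m                  # dJ/dw averaged over m samples
        b -= lr * np.mean(dz)                     # dJ/db
    return w, b

# Tiny usage example on separable toy data.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = train(X, y)
print(np.round(sigmoid(X @ w + b)))              # expected roughly [0. 0. 1. 1.]
```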
6. What's next?
When an NN has more than one layer, there are hidden layers in between. And to get a non-linear transformation of \(x\), we also need activation functions for the hidden layers.
However, sigmoid is rarely used as a hidden-layer activation function, for the following reasons:
- Vanishing gradient
The reason we can't simply use the clipped linear function on the left of the table below as the activation is that its gradient is exactly 0 when \(z>1\) or \(z<0\). Sigmoid only solves this problem partially, because its gradient still \(\to 0\) as \(z\) moves far from 0 in either direction (see the numeric sketch after the table).
| \(p(y=1\|x)= \max\{0,\min\{1,z\}\}\) | \(p(y=1\|x)= \sigma(z)\) |
|---|---|
| (plot of the clipped linear function) | (plot of the sigmoid function) |
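To see the saturation numerically, here is a tiny sketch of my own: the sigmoid gradient peaks at 0.25 at \(z=0\) and decays exponentially as \(|z|\) grows, which is what makes deep stacks of sigmoids hard to train:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for z in [0.0, 2.0, 5.0, 10.0]:
    grad = sigmoid(z) * (1 - sigmoid(z))   # sigma'(z) = sigma(z)(1 - sigma(z))
    print(f"z = {z:5.1f}   sigmoid'(z) = {grad:.2e}")
# z =   0.0   sigmoid'(z) = 2.50e-01
# z =   2.0   sigmoid'(z) = 1.05e-01
# z =   5.0   sigmoid'(z) = 6.65e-03
# z =  10.0   sigmoid'(z) = 4.54e-05
```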
- Non-zero-centered output
To be continued
Reference
- Ian Goodfellow, Yoshua Bengio, Aaron Courville, "Deep Learning"
- Deeplearning.ai https://www.deeplearning.ai/