Week Six

F Score

\[\begin{aligned}
F_{1} &= \dfrac{2}{\dfrac{1}{P}+\dfrac{1}{R}}\\
&= \dfrac{2PR}{P+R}
\end{aligned}\]
where \(P\) is precision and \(R\) is recall.
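As a quick check, a minimal Octave sketch computing the \(F_{1}\) score from a precision and recall (the numeric values are made up for illustration):

    % hypothetical precision/recall values
    P = 0.8;                     % precision = true positives / predicted positives
    R = 0.5;                     % recall    = true positives / actual positives
    F1 = 2 * P * R / (P + R)     % harmonic mean of P and R -> 0.6154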

Week Seven

Support Vector Machine

Cost Function

\[\begin{aligned}
&\min_{\theta}\left[-\dfrac{1}{m}\sum_{y_{i}\in Y,\,x_{i}\in X}\left(y_{i}\log h(\theta^{T}x_{i})+(1-y_{i})\log(1-h(\theta^{T}x_{i}))\right)+\dfrac{\lambda}{2m}\sum_{\theta_{j}\in\theta}\theta_{j}^{2}\right]\\
&\Rightarrow\min_{\theta}\left[-\sum_{y_{i}\in Y,\,x_{i}\in X}\left(y_{i}\log h(\theta^{T}x_{i})+(1-y_{i})\log(1-h(\theta^{T}x_{i}))\right)+\dfrac{\lambda}{2}\sum_{\theta_{j}\in\theta}\theta_{j}^{2}\right]\\
&\Rightarrow\min_{\theta}\left[-C\sum_{y_{i}\in Y,\,x_{i}\in X}\left(y_{i}\log h(\theta^{T}x_{i})+(1-y_{i})\log(1-h(\theta^{T}x_{i}))\right)+\dfrac{1}{2}\sum_{\theta_{j}\in\theta}\theta_{j}^{2}\right]
\end{aligned}\]
Multiplying the objective by \(m\) and then by \(\dfrac{1}{\lambda}\) does not change the minimizer, so \(C\) plays the role of \(\dfrac{1}{\lambda}\).
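A minimal Octave sketch of the C-form objective above (X, y, theta, and C are assumed given; theta(1) is left unregularized, as in the course convention):

    h = 1 ./ (1 + exp(-X * theta));                     % sigmoid hypothesis
    unreg = -sum(y .* log(h) + (1 - y) .* log(1 - h));  % summed log loss
    J = C * unreg + 0.5 * sum(theta(2:end).^2);         % skip theta_0 in the penalty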

  • Large C (small \(\lambda\)): lower bias, higher variance.
  • Small C (large \(\lambda\)): higher bias, lower variance.
  • Large \(\sigma^2\): features \(f_{i}\) vary more smoothly; higher bias, lower variance.
  • Small \(\sigma^2\): features \(f_{i}\) vary more sharply; lower bias, higher variance. (A sketch of the Gaussian-kernel feature \(f_{i}\) follows the equation below.)

For large \(C\) the data terms must be driven to (near) zero, which leaves the hard-margin problem:
\[\begin{aligned}
&\min_{\theta}\ \dfrac{1}{2} \sum_{\theta_{j} \in \theta}{\theta_{j}^2}\\
&\text{s.t.}\quad \theta^{T}x_{i} \geq 1 \quad \text{if } y_{i} = 1\\
&\phantom{\text{s.t.}\quad}\ \theta^{T}x_{i} \leq -1 \quad \text{if } y_{i} = 0
\end{aligned}\]
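As referenced in the \(\sigma^2\) bullets above, a minimal Octave sketch of one Gaussian-kernel feature (the names x, l, and sigma2 are my own):

    % Gaussian (RBF) kernel feature between example x and landmark l:
    % f is close to 1 when x is near l and decays toward 0 as they move apart;
    % a larger sigma2 makes f vary more smoothly with distance
    f = exp(-sum((x - l).^2) / (2 * sigma2));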

PS

If the number of features n is large relative to the number of training examples m, use logistic regression or an SVM without a kernel (linear kernel).

If n is small and m is intermediate, use an SVM with a Gaussian kernel.

If n is small and m is large, add more features, then use logistic regression or an SVM without a kernel.

Week Eight

K-means

Cost Function

K-means tries to minimize
\[\min_{c,\,\mu}{\dfrac{1}{m} \sum_{i=1}^{m} ||x^{(i)} - \mu_{c^{(i)}}}||^2\]
The inner loop alternates two steps. The cluster-assignment step minimizes the cost with the centroids fixed, assigning each \(x^{(i)}\) in the training set to its closest centroid \(c^{(i)}\). The move-centroid step minimizes the cost with the assignments fixed, moving each centroid to the mean of the points assigned to it.
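A minimal Octave sketch of one such iteration (X is m×n, centroids is K×n; the variable names are my own):

    % cluster-assignment step: index of the closest centroid for each example
    c = zeros(m, 1);
    for i = 1:m
      dists = sum((centroids - X(i,:)).^2, 2);   % squared distance to each centroid
      [~, c(i)] = min(dists);
    end
    % move-centroid step: each centroid becomes the mean of its assigned points
    % (a cluster that loses all its points is left unchanged here for brevity)
    for k = 1:K
      if any(c == k)
        centroids(k,:) = mean(X(c == k, :), 1);
      end
    end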

Initialize

Initialize the centroids randomly: pick K distinct samples from the training set at random and set the centroids to those samples.

K-means can fall into a local minimum, so repeat the random initialization (and rerun K-means) until the cost (distortion) is low enough for your purposes.

K-means always converges, and the cost never increases during training. Adding more centroids should decrease the cost; if it does not, K-means has fallen into a local minimum, so reinitialize the centroids until the cost drops.
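A minimal Octave sketch of random initialization with restarts (runKMeans and costOf are hypothetical helpers standing in for the iteration above and the distortion formula):

    bestCost = Inf;
    for trial = 1:100                               % repeated random initialization
      perm = randperm(m);
      centroids = X(perm(1:K), :);                  % K distinct training examples
      [c, centroids] = runKMeans(X, centroids);     % hypothetical: iterate to convergence
      J = costOf(X, c, centroids);                  % hypothetical: (1/m) * sum ||x - mu_c||^2
      if J < bestCost
        bestCost = J; bestC = c; bestCentroids = centroids;
      end
    end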

PCA (Principal Component Analysis)

Reconstruct x from z so that the following inequality holds (i.e. at least 99% of the variance is retained):
\[1-\dfrac{\dfrac{1}{m} \sum_{i=1}^{m}||x^{(i)}-x^{(i)}_{approximation}||^2}{\dfrac{1}{m} \sum_{i=1}^{m} ||x^{(i)}||^2} \geq 0.99\]
PS:
this inequality can be checked equivalently via the SVD:
\[\begin{aligned}
Sigma &= \dfrac{1}{m}X^{T}X\\
[U, S, V] &= svd(Sigma)\\
U_{reduce} &= U(:, 1:k)\\
z &= U_{reduce}' * x\\
x_{approximation} &= U_{reduce} * z\\\\
S &= \left( \begin{array}{cccc}
s_{11}&0&\cdots&0\\
0&s_{22}&\cdots&0\\
\vdots&\vdots&\ddots&\vdots\\
0&0&\cdots&s_{nn}
\end{array} \right)\\\\
\dfrac{\sum_{i=1}^{k}s_{ii}}{\sum_{i=1}^{n} s_{ii}} &\geq 0.99
\end{aligned}\]
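Putting this together, a minimal Octave sketch that picks the smallest k retaining 99% of the variance (X is assumed mean-normalized; the names are my own):

    [m, n] = size(X);
    Sigma = (1/m) * (X' * X);                    % n x n covariance matrix
    [U, S, V] = svd(Sigma);
    s = diag(S);                                 % singular values s_11 ... s_nn
    k = find(cumsum(s) / sum(s) >= 0.99, 1);     % smallest k with 99% variance retained
    U_reduce = U(:, 1:k);
    Z = X * U_reduce;                            % project all examples: m x k
    X_approx = Z * U_reduce';                    % reconstruct: m x n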

Week Nine

Anomaly Detection

Gaussian Distribution

The multivariate Gaussian distribution takes the correlations between different variables into account:
\[p(x) = \dfrac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}e^{-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)}\]
The single-variable (univariate) Gaussian model is a special case of the multivariate Gaussian in which \(\Sigma\) is diagonal:
\[\Sigma = \left(\begin{array}{cccc}
\sigma_{1}^{2}&&&\\
&\sigma_{2}^{2}&&\\
&&\ddots&\\
&&&\sigma_{n}^{2}
\end{array}\right)\]
When fitting the anomaly-detection model, we can use maximum likelihood estimation:
\[\begin{aligned}
\mu &= \dfrac{1}{m} \sum_{i=1}^{m}x^{(i)}\\
\Sigma &= \dfrac{1}{m} \sum_{i=1}^{m} (x^{(i)}-\mu)(x^{(i)}-\mu)^{T}
\end{aligned}\]
Univariate anomaly detection is computationally much cheaper than the multivariate version, but it may need manually added features (e.g. ratios of existing ones) to separate normal from anomalous examples.
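A minimal Octave sketch of the multivariate fit and density evaluation (X is m×n, x is a new n×1 example; the names, including the threshold epsilon, are my own):

    m = size(X, 1);
    mu = mean(X, 1)';                            % n x 1 mean vector
    Sigma = (1/m) * (X - mu')' * (X - mu');      % n x n covariance (MLE, divides by m)
    n = length(mu);
    d = x - mu;
    p = exp(-0.5 * d' * (Sigma \ d)) / ((2*pi)^(n/2) * sqrt(det(Sigma)));
    isAnomaly = p < epsilon;                     % epsilon chosen on a cross-validation set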

Recommender System

Cost Function

\[\begin{aligned}
J(X,\Theta) &= \dfrac{1}{2} \sum_{(i,j):r(i,j)=1}((\theta^{(j)})^{T}x^{(i)}-y^{(i,j)})^2 + \dfrac{\lambda}{2}\left[\sum_{i=1}^{n_{m}}\sum_{k=1}^{n}(x_k^{(i)})^2 + \sum_{j=1}^{n_{u}} \sum_{k=1}^n(\theta_{k}^{(j)})^2\right]\\
J(X,\Theta) &= \dfrac{1}{2}Sum\{((X\Theta'-Y).*R).^2\} + \dfrac{\lambda}{2}(Sum\{\Theta.^2\} + Sum\{X.^2\})
\end{aligned}\]
\[\begin{aligned}
\dfrac{\partial J}{\partial X} = ((X\Theta'-Y).*R) \Theta + \lambda X\\
\dfrac{\partial J}{\partial \Theta} = ((X\Theta'-Y).*R)'X + \lambda \Theta
\end{aligned}\]
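The vectorized form maps directly to Octave (a minimal sketch; X is n_m×n, Theta is n_u×n, Y and R are n_m×n_u):

    E = (X * Theta' - Y) .* R;                   % errors, zeroed where r(i,j) = 0
    J = 0.5 * sum(sum(E.^2)) ...
        + (lambda/2) * (sum(sum(Theta.^2)) + sum(sum(X.^2)));
    X_grad     = E * Theta + lambda * X;         % dJ/dX
    Theta_grad = E' * X + lambda * Theta;        % dJ/dTheta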
