1. Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$

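2. Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$, where m is the number of training examples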

3. Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
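
Items 1 to 3 can be sketched in Python. This is a minimal illustration, assuming the standard one-variable linear hypothesis h(x) = θ0 + θ1·x and the squared-error cost with the 1/(2m) factor; the function names and the toy data are made up for the example.

```python
def hypothesis(theta0, theta1, x):
    # Assumed linear hypothesis: h(x) = theta0 + theta1 * x
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    # Assumed squared-error cost: J = 1/(2m) * sum((h(x_i) - y_i)^2)
    m = len(xs)
    total = sum((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)


# Toy data lying exactly on the line y = x (illustrative only)
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

perfect = cost(0.0, 1.0, xs, ys)  # parameters that fit the data exactly
poor = cost(0.0, 0.0, xs, ys)     # parameters that predict 0 everywhere
print(perfect, poor)
```

The goal in item 3 is to find the pair (θ0, θ1) that makes this cost as small as possible; here the exact fit gives a cost of zero and the flat line gives a larger one.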

4. Gradient descent algorithm

repeat until convergence {

  $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$   (for j = 0 and j = 1)

}

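note: simultaneous update. Compute the new values of θ0 and θ1 from the current parameters first, then assign both; do not use the already-updated θ0 when computing the update for θ1.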
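
A small sketch of why the simultaneous update matters. The cost J = θ0² + θ0·θ1 + θ1² is an arbitrary function chosen only for illustration (it is not the regression cost), and the function names are hypothetical.

```python
def grad(theta0, theta1):
    # Partial derivatives of the illustrative cost
    # J = theta0^2 + theta0*theta1 + theta1^2
    return 2 * theta0 + theta1, theta0 + 2 * theta1


def step_simultaneous(theta0, theta1, alpha):
    # Correct: both gradients are evaluated at the OLD parameters
    g0, g1 = grad(theta0, theta1)
    return theta0 - alpha * g0, theta1 - alpha * g1


def step_sequential(theta0, theta1, alpha):
    # Incorrect: theta1's gradient sees the already-updated theta0
    theta0 = theta0 - alpha * grad(theta0, theta1)[0]
    theta1 = theta1 - alpha * grad(theta0, theta1)[1]
    return theta0, theta1


sim = step_simultaneous(1.0, 1.0, 0.1)
seq = step_sequential(1.0, 1.0, 0.1)
print(sim, seq)
```

Starting from the same point, the two versions produce different values for θ1, so the sequential variant is no longer a gradient descent step on J.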

α: learning rate

if α is too small, gradient descent can be slow.

if α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
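
The effect of α can be seen on a one-parameter toy problem. The sketch below assumes the cost J(θ) = θ², whose derivative is 2θ and whose minimum is at θ = 0; the cost, the starting point, and the three α values are chosen only for illustration.

```python
def run(alpha, steps=20, theta=1.0):
    # Gradient descent on the illustrative cost J(theta) = theta^2
    for _ in range(steps):
        theta = theta - alpha * 2 * theta  # dJ/dtheta = 2 * theta
    return theta


small = run(0.01)  # too small: still far from the minimum after 20 steps
good = run(0.4)    # moderate: very close to the minimum at 0
large = run(1.1)   # too large: each step overshoots and the iterate grows

print(small, good, large)
```

With α = 1.1 every step multiplies θ by 1 − 2α = −1.2, so the iterate flips sign and grows in magnitude, which is the divergence described above.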

5. Gradient descent algorithm for one variable

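repeat until convergence {

  $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$

  $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$

}  (update θ0 and θ1 simultaneously)
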

6. "batch" gradient descent: each step of gradient descent uses all the training examples
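
A minimal sketch of batch gradient descent for one-variable linear regression, assuming the hypothesis h(x) = θ0 + θ1·x and the squared-error cost. Each iteration sums over every training example, which is what makes it "batch". The function name, the toy data, and the fixed iteration count (used here in place of a convergence test) are assumptions for the example.

```python
def batch_gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction errors over ALL training examples (the "batch")
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients use the old parameters
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Toy data generated from y = 1 + 2x (illustrative only)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

theta0, theta1 = batch_gradient_descent(xs, ys)
print(theta0, theta1)
```

On this noise-free data the result should approach θ0 = 1 and θ1 = 2. A practical version would usually stop when the change in the cost falls below a tolerance rather than after a fixed number of iterations.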

