We want to model a function , our hypotesis is that this function is linear, that is

where is a vector of parameters.

Task: predict from using . Performance metric: MSE

We hypotewsis that reducing will also reduce .

We pack our examples in a matrix by stacking them in rows. We assume that (max rank) (i.e. no usless features).

With this matrix we can write the like

We can use normal equation to minimze the residue!