We want to model a function , our hypotesis is that this function is linear, that is
where is a vector of parameters.
Task: predict from using . Performance metric: MSE
We hypotewsis that reducing will also reduce .
We pack our examples in a matrix by stacking them in rows. We assume that (max rank) (i.e. no usless features).
With this matrix we can write the like
We can use normal equation to minimze the residue!