Suppose that we want to learn a function $f \colon \mathbb{R}^d \to \{0, 1\}$. A simple hypothesis would be a step function:

$$h_{\mathbf{w}}(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{w}^\top \mathbf{x} \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

but this is non-differentiable! Another simple choice that is smooth is the logistic function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
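A quick sketch of the two choices (assuming NumPy; the names `step` and `sigmoid` are mine, not from the notes):

```python
import numpy as np

def step(z):
    # step function: 1 if z >= 0, else 0 -- not differentiable at z = 0
    return (z >= 0).astype(float)

def sigmoid(z):
    # logistic function: smooth, maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-6, 6, 5)
print(step(z))     # [0. 0. 1. 1. 1.] -- jumps abruptly at 0
print(sigmoid(z))  # values strictly between 0 and 1, varying smoothly
```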
so that our model is defined as

- Input: $\mathbf{x} \in \mathbb{R}^d$
- Output: $y \in \{0, 1\}$
- Hypothesis: $h_{\mathbf{w}}(\mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x}) = \dfrac{1}{1 + e^{-\mathbf{w}^\top \mathbf{x}}}$

Since $\sigma(z) \in (0, 1)$, we find that $h_{\mathbf{w}}(\mathbf{x})$ can be read as an estimated probability $\hat{p}(y = 1 \mid \mathbf{x})$.
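A minimal sketch of the hypothesis under these definitions (the helper name `predict_proba` is my assumption):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, X):
    # h_w(x) = sigma(w^T x), computed for every row x of X at once;
    # each output lies in (0, 1) and is read as the estimated p(y=1 | x)
    return sigmoid(X @ w)

w = np.array([1.0, -2.0])
X = np.array([[0.5, 0.1], [3.0, 2.0]])
print(predict_proba(w, X))  # two values in (0, 1)
```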
We call $p(y = 1 \mid \mathbf{x})$ the ‘true’ probability. Our goal is to make $\hat{p}$ as similar as possible to $p$; to do so we introduce

- Performance metric: Binary cross-entropy

$$\mathcal{L}(y, \hat{y}) = -\big[ y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \big]$$

we minimize the average binary cross-entropy over the training examples:

$$J(\mathbf{w}) = -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \log h_{\mathbf{w}}(\mathbf{x}_i) + (1 - y_i) \log\big(1 - h_{\mathbf{w}}(\mathbf{x}_i)\big) \Big]$$
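A sketch of this average loss (the `eps` clipping is my addition, to keep the logarithms finite):

```python
import numpy as np

def bce_loss(y, y_hat, eps=1e-12):
    # average binary cross-entropy; predictions are clipped away from
    # 0 and 1 so the logs stay finite (a numerical safeguard, my addition)
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = np.array([1.0, 0.0, 1.0])
y_hat = np.array([0.9, 0.2, 0.6])
print(bce_loss(y, y_hat))  # lower when predictions match the labels
```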
note that the logistic function satisfies

$$\sigma'(z) = \sigma(z)\,\big(1 - \sigma(z)\big)$$
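This identity is easy to check numerically; a finite-difference sketch (the test point and step size are arbitrary choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z, h = 0.7, 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)  # central difference
analytic = sigmoid(z) * (1 - sigmoid(z))               # the identity above
print(numeric, analytic)  # the two values agree closely
```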
let’s compute the gradient! Using the identity above and the chain rule,

$$\nabla_{\mathbf{w}} J(\mathbf{w}) = \frac{1}{N} \sum_{i=1}^{N} \big( h_{\mathbf{w}}(\mathbf{x}_i) - y_i \big)\, \mathbf{x}_i$$

Imposing that it equals zero,

$$\sum_{i=1}^{N} \big( \sigma(\mathbf{w}^\top \mathbf{x}_i) - y_i \big)\, \mathbf{x}_i = \mathbf{0}$$

which is a system of nonlinear equations in $\mathbf{w}$ (the unknown sits inside the sigmoid).
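In vectorized form the gradient is a single matrix-vector product; a sketch (rows of `X` are the $\mathbf{x}_i$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_gradient(w, X, y):
    # grad J(w) = (1/N) * sum_i (sigma(w^T x_i) - y_i) * x_i,
    # written as one matrix-vector product over the whole training set
    return X.T @ (sigmoid(X @ w) - y) / X.shape[0]
```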
this is not analytically tractable! We need to use numerical techniques like Gradient Methods.
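For instance, plain gradient descent on $J$ looks like this (a minimal sketch; the learning rate, iteration count, and toy data are my choices, not from the notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iters=1000):
    # plain gradient descent on the average binary cross-entropy:
    # w <- w - lr * grad J(w), starting from w = 0
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (sigmoid(X @ w) - y) / X.shape[0]
        w -= lr * grad
    return w

# toy data, invented for illustration: the second column acts as a bias term
X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = fit_logistic(X, y)
print(sigmoid(X @ w))  # predicted probabilities increase with the first feature
```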