Suppose that we want to learn a binary classification function $f \colon \mathbb{R}^d \to \{0, 1\}$. A simple hypothesis would be a step function:

$$
H(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{if } z < 0 \end{cases}
$$
but this is not differentiable! Another simple choice that is smooth is the logistic (sigmoid) function:

$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$
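As a quick numerical illustration (a minimal sketch, not part of the original notes), we can compare the two functions at a few points and see that the sigmoid smoothly interpolates between the two plateaus of the step:

```python
import math

def step(z):
    # Step function: outputs 1 for z >= 0, else 0; not differentiable at z = 0
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    # Logistic function: a smooth, differentiable approximation of the step
    return 1.0 / (1.0 + math.exp(-z))

for z in (-4.0, 0.0, 4.0):
    print(f"z={z:+.1f}  step={step(z)}  sigmoid={sigmoid(z):.4f}")
```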
so that our model is defined as

  • Input: $x \in \mathbb{R}^d$
  • Output: $y \in \{0, 1\}$
  • Hypothesis: $h_\theta(x) = \sigma(\theta^\top x)$
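The hypothesis above can be sketched directly in code (a minimal illustration; the parameter vector `theta` and input `x` below are hypothetical values, not from the notes):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    # Hypothesis: sigmoid of the dot product theta^T x
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

theta = [0.5, -1.0]   # hypothetical parameters
x = [2.0, 1.0]        # hypothetical input; here theta^T x = 0
print(h(theta, x))    # always a value in (0, 1)
```

Because the sigmoid maps all of $\mathbb{R}$ into $(0, 1)$, the output can be read as a probability, which is exactly the interpretation used next.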

Since $h_\theta(x) \in (0, 1)$, we find that it can be interpreted as a probability:

$$
h_\theta(x) = P(y = 1 \mid x; \theta), \qquad 1 - h_\theta(x) = P(y = 0 \mid x; \theta)
$$
We call $y$ the ‘true’ probability. Our goal is to make $h_\theta(x)$ as similar as possible to $y$; to do so we introduce

  • Performance metric: binary cross-entropy

$$
\ell\bigl(h_\theta(x), y\bigr) = -\,y \log h_\theta(x) - (1 - y) \log\bigl(1 - h_\theta(x)\bigr)
$$
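A small sketch of the pointwise loss (illustrative values only): the binary cross-entropy is near zero when a confident prediction is correct and grows large when a confident prediction is wrong.

```python
import math

def bce(p, y):
    # Binary cross-entropy of a predicted probability p against a label y in {0, 1}
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

print(bce(0.9, 1))  # confident and correct -> small loss (~0.105)
print(bce(0.1, 1))  # confident and wrong   -> large loss (~2.303)
```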
we minimize the average binary cross-entropy over the $n$ training examples:

$$
J(\theta) = \frac{1}{n} \sum_{i=1}^{n} \Bigl[ -\,y^{(i)} \log h_\theta(x^{(i)}) - (1 - y^{(i)}) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \Bigr]
$$
note that the sigmoid satisfies

$$
\sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr)
$$
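This derivative identity is what makes the gradient come out so cleanly, and it is easy to sanity-check numerically against a central finite difference (a quick sketch, not from the notes):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    # Identity: sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

eps = 1e-6
for z in (-2.0, 0.0, 1.5):
    numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
    print(z, sigmoid_prime(z), numeric)  # the two columns should agree
```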
let’s compute the gradient! Using the identity above, the terms simplify and we obtain

$$
\nabla_\theta J(\theta) = \frac{1}{n} \sum_{i=1}^{n} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)\, x^{(i)}
$$
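The gradient formula translates directly into code. The sketch below (with hypothetical toy data) accumulates the per-example error $h_\theta(x^{(i)}) - y^{(i)}$ weighted by the input:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def grad_J(theta, X, Y):
    # Gradient of the average BCE: (1/n) * sum_i (h(x_i) - y_i) * x_i
    n = len(X)
    g = [0.0] * len(theta)
    for x, y in zip(X, Y):
        err = h(theta, x) - y
        for j, xj in enumerate(x):
            g[j] += err * xj / n
    return g

# Hypothetical toy data
X = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]]
Y = [1, 0, 1]
print(grad_J([0.0, 0.0], X, Y))
```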
(In the original computation a stray $+1$ appeared that should not be there; in the end the result is very clean.) Imposing that the gradient equals zero,

$$
\frac{1}{n} \sum_{i=1}^{n} \bigl( \sigma(\theta^\top x^{(i)}) - y^{(i)} \bigr)\, x^{(i)} = 0
$$
which is

$$
\sum_{i=1}^{n} \sigma(\theta^\top x^{(i)})\, x^{(i)} = \sum_{i=1}^{n} y^{(i)} x^{(i)}
$$
This is not analytically tractable (the sigmoid makes the system nonlinear in $\theta$)! We need to use numerical techniques like gradient methods.
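A minimal sketch of such a gradient method, plain gradient descent on the average cross-entropy (toy data and learning rate are hypothetical, chosen only for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def grad_J(theta, X, Y):
    # (1/n) * sum_i (h(x_i) - y_i) * x_i
    n = len(X)
    g = [0.0] * len(theta)
    for x, y in zip(X, Y):
        err = h(theta, x) - y
        for j, xj in enumerate(x):
            g[j] += err * xj / n
    return g

# Hypothetical toy data: first feature is a constant 1 (bias term),
# and the label is 1 exactly when the second feature is positive
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
Y = [0, 0, 1, 1]

theta = [0.0, 0.0]
lr = 0.5  # learning rate (assumed)
for _ in range(2000):
    g = grad_J(theta, X, Y)
    theta = [t - lr * gj for t, gj in zip(theta, g)]

for x, y in zip(X, Y):
    print(x, y, round(h(theta, x), 3))
```

After training, the predicted probabilities should be above $0.5$ for the positive examples and below $0.5$ for the negative ones, even though no closed-form solution for $\theta$ exists.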