RBM

Architecture

An RBM is a neural network architecture on a complete bipartite graph. The two layers are called visible and hidden; with $N_v$ visible and $N_h$ hidden units the network has $N = N_v + N_h$ neurons in total.

We denote the state of the visible neurons by $v_i$ ($i = 1, \dots, N_v$) and the state of the hidden neurons by $h_\mu$ ($\mu = 1, \dots, N_h$).

The Hamiltonian is the usual one of neural-network models,

$$H(v, h) = -\sum_i a_i v_i - \sum_\mu b_\mu h_\mu - \sum_{i,\mu} v_i \, w_{i\mu} \, h_\mu .$$
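As a concrete illustration, here is a minimal sketch of this energy function in numpy, assuming binary units; the names `W`, `a`, `b` are illustrative, not taken from the notes.

```python
# Minimal sketch of the RBM energy above for binary units; W, a, b are
# illustrative names for the couplings and the visible/hidden biases.
import numpy as np

def rbm_energy(v, h, W, a, b):
    """E(v, h) = -a.v - b.h - v.W.h for a single configuration."""
    return -(a @ v) - (b @ h) - (v @ W @ h)

# Tiny usage example: 3 visible and 2 hidden units.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))
a, b = np.zeros(3), np.zeros(2)
v = np.array([1.0, 0.0, 1.0])
h = np.array([0.0, 1.0])
print(rbm_energy(v, h, W, a, b))
```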

Supervised learning

The goal is to learn the joint distribution $p(x, y)$, where $x$ is the input and $y$ the label.

The idea is to use the visible neurons to encode $x$, and the hidden neurons to encode the label (class) $y$.

We need to find the parameters such that the equilibrium distribution of the network is as close as possible to the data distribution.

Then we can use the network to predict the correct label of unseen data; the best guess is $\hat{y} = \arg\max_y p(y \mid x)$.
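As an illustration of this prediction step, here is a hedged sketch that assumes the simplest possible encoding, one hidden unit per class (an assumption for the example, not something the notes specify); `W` and `b` are illustrative names for the weights and hidden biases.

```python
# Hedged sketch of label prediction: clamp the input on the visible layer and
# pick the class whose hidden unit is most likely to be active.
# Assumes one hidden unit per class; binary visible units.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_label(x, W, b):
    p_hidden = sigmoid(b + x @ W)    # P(h_mu = 1 | v = x) for every hidden unit
    return int(np.argmax(p_hidden))  # best-guess class
```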

Unsupervised learning

Input: some data points $x$. Task: learn the data distribution $p_{\text{data}}(x)$.

Again we encode our data in the visible-layer neurons, $v = x$. So we need to find the best parameters such that the marginal equilibrium distribution of the visible layer satisfies $p(v) \approx p_{\text{data}}(v)$.

Once the network is trained, it can be used to generate new data, to denoise, and to reconstruct corrupted inputs.
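As a sketch of reconstruction/denoising with a trained network (assuming binary units and the same illustrative parameter names as above), one can run a single Gibbs step $v \to h \to v'$:

```python
# Sketch of reconstruction/denoising with a trained RBM: a single Gibbs step
# v -> h -> v', using the standard conditional probabilities for binary units.
# W, a, b are the (assumed) trained couplings and biases; names are illustrative.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruct(v, W, a, b, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    p_h = sigmoid(b + v @ W)                   # P(h_mu = 1 | v)
    h = (rng.random(p_h.shape) < p_h) * 1.0    # sample the hidden layer
    p_v = sigmoid(a + W @ h)                   # P(v_i = 1 | h)
    return p_v                                 # reconstructed visible layer
```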

Learning

The task is to learn a distribution. The obvious choice is to define a metric (loss) between the network distribution and the target distribution, and to minimize it with gradient descent.

We will use the KL divergence $D_{\mathrm{KL}}(p_{\text{data}} \,\|\, p) = \sum_v p_{\text{data}}(v) \log \frac{p_{\text{data}}(v)}{p(v)}$. Using gradient descent we obtain update rules for the parameters,

$$\theta \;\leftarrow\; \theta - \eta \, \frac{\partial D_{\mathrm{KL}}}{\partial \theta},$$

thus we need to compute these derivatives. Let us consider the RBM in the unsupervised learning setting.

Since we only care about the visible neurons, we have to marginalize over the hidden ones:

$$p(v) = \sum_h p(v, h) = \frac{\sum_h e^{-H(v, h)}}{Z}, \qquad Z = \sum_{v, h} e^{-H(v, h)}.$$

Let us compute the derivatives that appear in the numerator.

Putting everything back together…
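The standard result of this calculation, sketched here with the notation introduced above, is a contrast between a data term and a model term:

$$
-\frac{\partial D_{\mathrm{KL}}}{\partial w_{i\mu}} \;\propto\; \langle v_i h_\mu \rangle_{\text{data}} - \langle v_i h_\mu \rangle_{\text{model}}, \qquad
-\frac{\partial D_{\mathrm{KL}}}{\partial a_i} \;\propto\; \langle v_i \rangle_{\text{data}} - \langle v_i \rangle_{\text{model}}, \qquad
-\frac{\partial D_{\mathrm{KL}}}{\partial b_\mu} \;\propto\; \langle h_\mu \rangle_{\text{data}} - \langle h_\mu \rangle_{\text{model}},
$$

where $\langle \cdot \rangle_{\text{data}}$ averages over the data with the hidden units drawn from $p(h \mid v)$, and $\langle \cdot \rangle_{\text{model}}$ is the average over the equilibrium (Boltzmann) distribution of the network.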

For the training procedure we need to let the network thermalize in order to estimate the model averages… This is computationally intractable for anything but tiny networks.

Techniques to approximate this:

  • MCMC (Markov chain Monte Carlo): the more steps you take, the better (empirically even a single step is enough!); a sketch of this single-step variant follows the list.
  • Mean-field approximation, using the self-consistency equations.
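Here is a minimal sketch of the single-step MCMC variant (known in the literature as contrastive divergence, CD-1), assuming binary units; the parameter names `W`, `a`, `b` and the learning rate `lr` are illustrative, not taken from the notes.

```python
# Minimal sketch of a single CD-1 parameter update for a binary RBM.
# All names (W, a, b, lr) are illustrative.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v_data, W, a, b, lr=0.01, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # Positive phase: data clamped on the visible layer.
    p_h = sigmoid(b + v_data @ W)
    h = (rng.random(p_h.shape) < p_h) * 1.0
    # Negative phase: one Gibbs step v -> h -> v' -> h' replaces full thermalization.
    p_v = sigmoid(a + W @ h)
    v_model = (rng.random(p_v.shape) < p_v) * 1.0
    p_h_model = sigmoid(b + v_model @ W)
    # Gradient step on <v h>_data - <v h>_model and the bias analogues.
    W += lr * (np.outer(v_data, p_h) - np.outer(v_model, p_h_model))
    a += lr * (v_data - v_model)
    b += lr * (p_h - p_h_model)
    return W, a, b
```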