Modelling Sequence Data
Some tasks where the input is sequential in nature:
- speech recognition
- machine translation
- language modeling
- sentiment analysis
- video data
How should we choose our model? Again, FCNs are universal, but the size of the model can grow too much.
If our sequence is long, the FCN would become prohibitively large. Idea: feed the input one element at a time and re-use the same weights! With this idea we no longer care about the sequence length $T$.
To memorize previously seen information, we employ a recurrent mechanism:
$$h_t = f_\theta(h_{t-1}, x_t)$$
We can unfold this formula $t$ times:
$$h_t = f_\theta(f_\theta(\dots f_\theta(h_0, x_1)\dots, x_{t-1}), x_t)$$
A simple RNN model:
$$h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t + b_h), \qquad y_t = W_{hy} h_t + b_y$$
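As a concrete illustration, here is a minimal NumPy sketch of this forward pass (the sizes, the name `rnn_forward`, and the random weights are illustrative assumptions, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4                      # input and hidden sizes (assumed)
W_xh = rng.normal(size=(n_hid, n_in)) * 0.1
W_hh = rng.normal(size=(n_hid, n_hid)) * 0.1
b_h = np.zeros(n_hid)

def rnn_forward(xs, h0=None):
    """Run h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h) over a sequence.

    The same weights are re-used at every step, so the model
    does not depend on the sequence length T.
    """
    h = np.zeros(n_hid) if h0 is None else h0
    hs = []
    for x in xs:                        # one input element at a time
        h = np.tanh(W_hh @ h + W_xh @ x + b_h)
        hs.append(h)
    return np.stack(hs)

xs = rng.normal(size=(5, n_in))         # a sequence of length T = 5
print(rnn_forward(xs).shape)            # (5, 4): one hidden state per step
```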
Training an RNN
It's like we are backpropagating through time. Just unfold the network! The loss is just the sum over time steps:
$$\mathcal{L} = \sum_{t=1}^{T} \mathcal{L}_t$$
For simplicity, let's assume that there is no bias and that the activation function is the identity. Then
$$h_t = W_{hh} h_{t-1} + W_{xh} x_t$$
Let $W = W_{hh}$; then by the chain rule
$$\frac{\partial h_t}{\partial h_k} = \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}}$$
but in our simple case
$$\frac{\partial h_i}{\partial h_{i-1}} = W$$
so that
$$\frac{\partial h_t}{\partial h_k} = W^{\,t-k}.$$
Things can get ugly if $t-k$ is big: if the greatest eigenvalue norm of $W$ is greater than $1$ we get exploding gradients; if it's less than $1$, vanishing gradients.
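A quick numerical sketch of this effect (the diagonal matrix and step count are assumptions chosen for illustration): repeatedly backpropagating through $h_i = W h_{i-1}$ scales the gradient by the largest eigenvalue norm at each step.

```python
import numpy as np

def gradient_norms(spectral_radius, steps=50):
    W = np.diag([spectral_radius, 0.5])  # largest eigenvalue is spectral_radius
    g = np.ones(2)                       # stand-in for dL/dh_t
    norms = []
    for _ in range(steps):
        g = W.T @ g                      # backprop one step through h_i = W h_{i-1}
        norms.append(np.linalg.norm(g))
    return norms

print(gradient_norms(1.1)[-1])   # > 1: explodes (~1.1**50 ≈ 117)
print(gradient_norms(0.9)[-1])   # < 1: vanishes (~0.9**50 ≈ 0.005)
```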
LSTM
Long Short-Term Memory
Past inputs contribute less and less to the loss. We need to add memory!
Main ideas
- a new hidden state, called the cell state $c_t$, with the ability to store long-term information.
- LSTM can read, write, and erase information from the cell.
- Gates are defined to get the ability to select information; they are also vectors of length $n$ (the hidden size).
- At each time step the gates can be open (1), closed (0), or somewhere in between.
Forget Gate
Controls what is kept vs. forgotten from the previous cell state:
$$f_t = \sigma(W_f h_{t-1} + U_f x_t + b_f)$$
Input Gate
Controls what parts of the new input are written to the cell:
$$i_t = \sigma(W_i h_{t-1} + U_i x_t + b_i)$$
Output Gate
Controls what parts of the cell state are transferred to the hidden state:
$$o_t = \sigma(W_o h_{t-1} + U_o x_t + b_o)$$
New Cell Content
$$\tilde{c}_t = \tanh(W_c h_{t-1} + U_c x_t + b_c)$$
We update the content of the cell and the hidden state using the gates:
forget and write
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
the new hidden state uses the output gate
$$h_t = o_t \odot \tanh(c_t)$$
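Putting the gate equations together, here is a minimal single-step LSTM sketch in NumPy (the helper `lstm_step` and the parameter layout, with `W_*` acting on $h_{t-1}$ and `U_*` on $x_t$, are assumptions for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    W_f, U_f, b_f, W_i, U_i, b_i, W_o, U_o, b_o, W_c, U_c, b_c = params
    f = sigmoid(W_f @ h_prev + U_f @ x + b_f)        # forget gate: keep vs. forget c_{t-1}
    i = sigmoid(W_i @ h_prev + U_i @ x + b_i)        # input gate: what to write
    o = sigmoid(W_o @ h_prev + U_o @ x + b_o)        # output gate: what to expose
    c_tilde = np.tanh(W_c @ h_prev + U_c @ x + b_c)  # new cell content
    c = f * c_prev + i * c_tilde                     # forget and write
    h = o * np.tanh(c)                               # new hidden state
    return h, c

n_in, n_hid = 3, 4
rng = np.random.default_rng(0)
params = [rng.normal(size=s) * 0.1
          for s in [(n_hid, n_hid), (n_hid, n_in), (n_hid,)] * 4]
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), params)
print(h.shape, c.shape)   # (4,) (4,)
```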
GRU
Gated Recurrent Units. It doesn't use a cell state, but it has gates!
Reset Gate
Controls what parts of the previous hidden state are used to compute the new content:
$$r_t = \sigma(W_r h_{t-1} + U_r x_t + b_r)$$
Update Gate
Controls how much of the previous hidden state is kept versus replaced by the new content:
$$z_t = \sigma(W_z h_{t-1} + U_z x_t + b_z)$$
We use the gates very similarly to create a new hidden state:
$$\tilde{h}_t = \tanh(W_h (r_t \odot h_{t-1}) + U_h x_t + b_h)$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
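Analogously, a minimal single-step GRU sketch (again, `gru_step` and the parameter layout are illustrative assumptions, not from the notes):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, params):
    W_r, U_r, b_r, W_z, U_z, b_z, W_h, U_h, b_h = params
    r = sigmoid(W_r @ h_prev + U_r @ x + b_r)               # reset gate
    z = sigmoid(W_z @ h_prev + U_z @ x + b_z)               # update gate
    h_tilde = np.tanh(W_h @ (r * h_prev) + U_h @ x + b_h)   # new content
    return (1 - z) * h_prev + z * h_tilde                   # interpolate old and new

n_in, n_hid = 3, 4
rng = np.random.default_rng(0)
params = [rng.normal(size=s) * 0.1
          for s in [(n_hid, n_hid), (n_hid, n_in), (n_hid,)] * 3]
print(gru_step(rng.normal(size=n_in), np.zeros(n_hid), params).shape)  # (4,)
```

Note the design difference from the LSTM: the GRU has no separate cell state, so the update gate $z_t$ directly interpolates between the old hidden state and the new content.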