ReLU is the most popular and most widely used activation function:
Pros:
- Computationally less expensive than other activation functions
- Fewer neurons are activated, leading to network sparsity and thus computational efficiency
- Avoids the vanishing gradient: since its derivative is always 1 in the positive input range, gradients flow well along the active paths of neurons (see the sketch after this list).
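As a quick illustration of the sparsity and constant-gradient points above, here is a minimal NumPy sketch of ReLU and its derivative (the names `relu` and `relu_grad` are just for this example):

```python
import numpy as np

def relu(x):
    # max(0, x): negative inputs are clamped to 0 (inactive / sparse units)
    return np.maximum(0.0, x)

def relu_grad(x):
    # derivative is 1 for x > 0 and 0 otherwise, so gradients pass
    # through active units unchanged
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]  -> sparse activations
print(relu_grad(x))  # [0.  0.  0.  1.  1. ]  -> gradient 1 on active paths
```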
Cons:
- The dying ReLU: if too many pre-activations are negative, most of the network is inactive and the network is unable to learn further
Possible causes of the dying ReLU:
- A high learning rate: a large update can push the weights far into negative values, so the scalar product (pre-activation) becomes negative for most inputs.
- A large negative bias: it shifts the pre-activation below zero regardless of the input, as in the sketch below.
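A hedged sketch of the dying-ReLU effect under the large-negative-bias cause (the variables `W`, `b`, `x` and the bias value are illustrative, not from the notes above): every pre-activation is negative, the layer outputs only zeros, and the gradient reaching the weights is zero, so no further learning can occur.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 10))            # a batch of 32 inputs
W = rng.normal(scale=0.1, size=(10, 5))  # layer weights
b = np.full(5, -100.0)                   # large negative bias (illustrative)

z = x @ W + b           # pre-activations: all strongly negative
a = np.maximum(0.0, z)  # ReLU output: all zeros -> "dead" units

# Backprop through ReLU: the upstream gradient is masked by (z > 0)
upstream = np.ones_like(a)
grad_z = upstream * (z > 0)
grad_W = x.T @ grad_z

print(a.sum())               # 0.0 -> every unit is inactive
print(np.abs(grad_W).sum())  # 0.0 -> weights get no gradient, learning stops
```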