Convolutions from first principles

We would like to build from scratch a new neural network that takes an image as input and produces another image as output. For example, in semantic segmentation each pixel of the input image is assigned a class label.

When an object moves in the image, we want the associated labels to move with it. Hence, before constructing such a neural network, we first need to figure out how to build a layer with this property: when an object is translated in an image, the output of the layer is translated in the same way. This is what we will do here.

Mathematical model

Let’s simplify our setting: instead of images, our inputs are just vectors (tensors of dimension $1$). A translation of such a vector is also called a shift. Define an operator $S$ that shifts each component to the right (modulo $n$), so that $(S x)_{i} = x_{i - 1 \bmod n}$. It’s easy to see that

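$$S = \begin{pmatrix} 0 & 0 & \dots & 0 & 1 \\ 1 & 0 & \dots & 0 & 0 \\ 0 & 1 & \ddots & \vdots & 0 \\ \vdots & \vdots & \ddots & 0 & 0 \\ 0 & 0 & \dots & 1 & 0 \end{pmatrix}$$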
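As a quick sanity check, the shift operator can be built numerically. This is a minimal sketch assuming NumPy (the text does not prescribe a library); the helper name `shift_matrix` is hypothetical.

```python
import numpy as np

def shift_matrix(n):
    """Return the n x n cyclic right-shift matrix S, with S[i, (i - 1) % n] = 1."""
    # Rolling the rows of the identity down by one puts the 1 of row i in column i - 1.
    return np.roll(np.eye(n), 1, axis=0)

S = shift_matrix(5)
x = np.arange(5.0)   # x = (0, 1, 2, 3, 4)
y = S @ x            # y = (4, 0, 1, 2, 3): each component moved one slot to the right

# Shifting n times brings every component back to where it started.
S_to_the_n = np.linalg.matrix_power(S, 5)
```

The last component wraps around to the first position, which is what "modulo $n$" means here.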

Now our task is to find a linear layer $W$ which commutes with $S$: when the input is shifted, the output is shifted as well:

$$W S = S W$$

One can start from a random matrix $W$, and then use gradient descent to minimize the rescaled norm of the commutator:

$$\min_{W} \frac{\| S W - W S \|_{2}^{2}}{\| W \|_{2}^{2}}$$

We rescale to remove the trivial solution $W = 0$.
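The experiment can be sketched in a few lines. The code below is an illustration under stated assumptions, not the original implementation: it uses NumPy, interprets the norm as the Frobenius norm, uses a hand-derived gradient, and renormalizes $W$ to unit norm after every step. The function names, step size, and iteration count are all hypothetical choices.

```python
import numpy as np

def commutator_ratio(W, S):
    """Rescaled objective ||S W - W S||^2 / ||W||^2 (Frobenius norm, an assumption)."""
    C = S @ W - W @ S
    return np.sum(C**2) / np.sum(W**2)

def learn_commuting_matrix(n=8, steps=500, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    S = np.roll(np.eye(n), 1, axis=0)      # cyclic right-shift matrix
    W = rng.standard_normal((n, n))        # random starting point
    W /= np.linalg.norm(W)                 # keep ||W||_F = 1 throughout
    for _ in range(steps):
        C = S @ W - W @ S                  # commutator
        num = np.sum(C**2)
        # Gradient of the ratio; this simplified form is valid because ||W||_F = 1.
        grad = 2 * (S.T @ C - C @ S.T) - 2 * num * W
        W -= lr * grad
        W /= np.linalg.norm(W)             # project back onto the unit sphere
    return W, S

W, S = learn_commuting_matrix()
ratio = commutator_ratio(W, S)             # should be numerically close to zero
```

Printing `np.round(W, 2)` after the run is a simple way to inspect the structure of the learned matrix.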

Note the diagonal structure of the resulting $W$: its entries are constant along each (wrapped-around) diagonal. This is a circulant matrix.

Def (circulant matrix) Given a vector $a = (a_{0}, a_{1}, \dots, a_{n - 1})$ we define the associated matrix $C_{a}$ whose first column is made up of these numbers and each subsequent column is obtained by a shift of the previous column:

$$C_{a} = \begin{pmatrix} a_{0} & a_{n - 1} & \dots & a_{1} \\ a_{1} & a_{0} & \dots & a_{2} \\ a_{2} & a_{1} & \ddots & a_{3} \\ \vdots & \vdots & & \vdots \\ a_{n - 1} & a_{n - 2} & \dots & a_{0} \end{pmatrix}$$
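The definition translates directly into code. Here is a small sketch assuming NumPy; the helper name `circulant` is hypothetical and is written out by hand from the index rule $(C_{a})_{ik} = a_{i - k \bmod n}$.

```python
import numpy as np

def circulant(a):
    """Build C_a: first column is a, each next column is the previous one shifted down."""
    a = np.asarray(a)
    n = len(a)
    i, k = np.indices((n, n))   # row and column index grids
    return a[(i - k) % n]       # entry (i, k) is a_{(i - k) mod n}

C = circulant([1, 2, 3, 4])
# C is
# [[1 4 3 2]
#  [2 1 4 3]
#  [3 2 1 4]
#  [4 3 2 1]]
```

Every diagonal, including the ones that wrap around the edges, holds a single value.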

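Now we prove this characterization.

Prop A matrix $W$ is circulant iff it commutes with the shift, $S W = W S$.

Proof $(⟹)$ It’s easy to check that any circulant matrix $C$ satisfies $SC = CS$: indeed $C_{a} = \sum_{k} a_{k} S^{k}$ is a polynomial in $S$, and every power of $S$ commutes with $S$.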

$(⟸)$ First note that $S_{ik} = δ (i - 1 \bmod n, k)$, where $δ$ is the usual Kronecker symbol (we omit the $\bmod n$ from now on); hence:

$$(S W)_{ij} = \sum_{k} S_{ik} W_{kj} = \sum_{k} \delta(i - 1, k) W_{kj} = W_{i - 1, j}$$

$$(W S)_{ij} = \sum_{k} W_{ik} S_{kj} = \sum_{k} W_{ik} \delta(k - 1, j) = W_{i, j + 1}$$

so that

$$W_{i - 1, j} = W_{i, j + 1} \iff W_{ij} = W_{i + 1, j + 1}$$

This means that $W$ needs to be constant along diagonals, $W_{ij} = W_{i - j, 0}$, i.e. it’s a circulant matrix whose associated vector is its first column:

$$a_{i} := W_{i 0} \qquad \square$$
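The two index identities used in the proof, and the proposition itself, can be checked numerically. This is a sketch assuming NumPy, with a random circulant matrix and a random generic matrix as test cases; all variable names are hypothetical.

```python
import numpy as np

n = 6
rng = np.random.default_rng(0)
S = np.roll(np.eye(n), 1, axis=0)          # cyclic right-shift matrix

# A generic (non-circulant) matrix.
W = rng.standard_normal((n, n))

# Index identities from the proof, valid for any W:
rows_shifted = np.allclose(S @ W, np.roll(W, 1, axis=0))    # (S W)_{ij} = W_{i-1, j}
cols_shifted = np.allclose(W @ S, np.roll(W, -1, axis=1))   # (W S)_{ij} = W_{i, j+1}

# A random circulant matrix, entry (i, j) = a_{(i - j) mod n}.
a = rng.standard_normal(n)
i, j = np.indices((n, n))
C = a[(i - j) % n]

commutes_circulant = np.allclose(S @ C, C @ S)   # expected to hold
commutes_generic = np.allclose(S @ W, W @ S)     # expected to fail for a random W
```

A random matrix fails to commute with $S$ with probability one, so the second check illustrates the "only if" direction rather than proving it.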

The takeaway is that if you want to learn a shift-invariant linear transformation, you only need to learn the vector $a$. In higher dimensions $a$ becomes a matrix, and you apply it by shifting it across the input: these are convolutional layers.

Circular convolutions

What is the connection with convolution? If we apply a circulant matrix to any vector $x$, $y = C_{a} x$, we get

$$y_{i} = \sum_{k} (C_{a})_{ik} x_{k} = \sum_{k} a_{i - k} x_{k} = (a * x)_{i}$$

This is the same as the (discrete) 1D convolution, with the indices taken modulo $n$, hence the name circular convolution!
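The identity can be verified by computing the same product in three ways. This is a sketch assuming NumPy; `circular_convolution` is a hypothetical helper that spells out the sum literally, and the FFT route relies on the standard convolution theorem for circular convolution.

```python
import numpy as np

def circular_convolution(a, x):
    """Literal evaluation of y_i = sum_k a_{(i - k) mod n} x_k."""
    n = len(a)
    return np.array([sum(a[(i - k) % n] * x[k] for k in range(n)) for i in range(n)])

n = 8
rng = np.random.default_rng(0)
a = rng.standard_normal(n)
x = rng.standard_normal(n)

# 1) Matrix-vector product with the circulant matrix C_a.
i, k = np.indices((n, n))
C = a[(i - k) % n]
y_matrix = C @ x

# 2) Explicit convolution sum.
y_sum = circular_convolution(a, x)

# 3) Pointwise product in the Fourier domain (convolution theorem).
y_fft = np.fft.ifft(np.fft.fft(a) * np.fft.fft(x)).real
```

All three results agree up to floating-point error. The matrix form costs $O(n^{2})$ operations, while the FFT route costs $O(n \log n)$.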

Discrete Fourier Transform

All circulant matrices commute with one another and, being polynomials in the unitary matrix $S$, they are normal; this means that they are simultaneously diagonalizable. Let’s find the eigenvectors of $S^{- 1}$, the left shift operator.

The eigenvalues of $S^{- 1}$ are the $n$ -th roots of unity:

$$\rho_{m} = e^{i \frac{2 \pi m}{n}}, \quad m \in \mathbb{Z}_{n}$$

with eigenvectors

$$v_{m} = \left(1, e^{i \frac{2 \pi m}{n}}, e^{i \frac{4 \pi m}{n}}, \dots, e^{i \frac{2 \pi m (n - 1)}{n}}\right)$$

This basis also diagonalizes all circulant matrices! Let’s now find the eigenvalues of a generic circulant matrix $C_{a}$:

$$C_{a} v_{m} = \lambda_{m} v_{m}$$

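Reading off the first row, and using that the first component of $v_{m}$ is $1$:

$$\sum_{k} (C_{a})_{0 k} (v_{m})_{k} = \sum_{k} a_{- k} \rho_{m}^{k} = \lambda_{m}$$

Relabeling $k \to - k$ (indices are taken $\bmod n$):

$$\lambda_{m} = \sum_{k} a_{k} \rho_{m}^{- k} = \sum_{k} a_{k} e^{- i \frac{2 \pi m k}{n}} = \hat{a}_{m}$$

which is the Discrete Fourier Transform of $a$.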
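The diagonalization can be confirmed numerically. This is a sketch assuming NumPy, whose `np.fft.fft` uses the convention $\hat{a}_{m} = \sum_{k} a_{k} e^{- 2 \pi i m k / n}$; with $C_{a}$ defined by its first column, that is the eigenvalue attached to $v_{m}$. Variable names are hypothetical.

```python
import numpy as np

n = 8
rng = np.random.default_rng(0)
a = rng.standard_normal(n)

S = np.roll(np.eye(n), 1, axis=0)          # right shift; S.T is the left shift S^{-1}
i, k = np.indices((n, n))
C = a[(i - k) % n]                          # circulant matrix with first column a

rho = np.exp(2j * np.pi * np.arange(n) / n)     # n-th roots of unity rho_m
V = rho[None, :] ** np.arange(n)[:, None]       # column m is v_m = (1, rho_m, rho_m^2, ...)

# v_m is an eigenvector of the left shift with eigenvalue rho_m.
left_shift_ok = np.allclose(S.T @ V, V * rho)

# The same vectors diagonalize C_a, with eigenvalues given by the DFT of a.
lam = np.fft.fft(a)
circulant_ok = np.allclose(C @ V, V * lam)
```

Multiplying `V` by a 1D array on the right scales column `m` by the `m`-th entry, so `V * lam` is the matrix form of $\lambda_{m} v_{m}$ for all $m$ at once.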

Lorenzo Gregoris
