We would like to build from scratch a new neural network taking as input an image and producing as output another image. For example in semantic segmentation, each pixel in the input image is linked to a class.

When a object moves in the image, we want the associated labels to move with it. Hence, before constructing such a neural network, we first need to figure out a way to build a layer having this property: when an object is translated in an image, the output of the layer should be translated with the same translation. This is what we will do here.

Mathematical model

Let’s simplify our setting, instead of images our input are just vectors (tensors of dimension ). A translation of this vector is also called a shift. Define an operator that shifts to the right each component (modulo ), it’easy to see that

Now our task it to find a linear layer which commutes with respect to S_: when the input is shifted, the output is also shifted:

One can start from a random matrix , and then gradient descent to minimize the rescaled norm of the commutator:

we rescale to remove the trivial solution .

Note the diagonal structure! This is a circulant matrix.

Def (circulant matrix) Given a vector we define  the associated matrix ​ whose first column is made up of these numbers and each subsequent column is obtained by a shift of the previous column:

Not we prove this characterization. Prop A matrix is circulant iff it commutes with shift, Proof It’s easy to check that for any circulant matric it holds that .

First note that where is the usual Kronecker symbol (we omit the further), hence:

so that

this means that need to be constant along diagonals, i.e. it’s a circulant matrix with the associated vector

The get-away from this is that if you want to learni a shift-invariant lineare transformation, you only need to learn the vector . In more dimensione this is a matrix, and you apply it by shifting: convolutional layers.

Circular convolutions

What is the connection with convolution? If we apply any vector to a circulant matrix we get

this is the same as the (discrete) -convolution!

Discrete DFT

All circulant matrix commute, this means that are simultaneously diagonalizable. Let’s find the eigenvector of , the left shift operator.

The eigenvalues of are the -th roots of unity:

with eigenvectors

this basis diagonalize also all circulant matrices! Let’s find now the eigenvalues for a generic circulant matrix :

solving the first row

which is the Discrete Fourier Transform.