As always - no explanation


Positional encoding

From: *Attention Is All You Need*, reprinted here with comments

$$PE_{pos,2i} = \sin(pos / 10000^{2i/d_{model}}) \\ PE_{pos,2i+1} = \cos(pos / 10000^{2i/d_{model}})$$

where $pos$ is the position and $i$ is the dimension. That is, each dimension of the positional encoding corresponds to a sinusoid. The wavelengths form a geometric progression from $2\pi$ to $10000 \cdot 2\pi$. We chose this function because we hypothesized it would allow the model to easily learn to attend by relative positions, since for any fixed offset $k$, $PE_{pos+k}$ can be represented as a linear function of $PE_{pos}$. We chose the sinusoidal version because it may allow the model to extrapolate to sequence lengths longer than the ones encountered during training.
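A minimal NumPy sketch of these two formulas (my own illustration, not code from the paper; it assumes `d_model` is even):

```python
import numpy as np

def positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Return the sinusoidal encodings as a (max_len, d_model) array."""
    pos = np.arange(max_len)[:, None]      # positions 0 .. max_len-1
    two_i = np.arange(0, d_model, 2)       # 2i = 0, 2, 4, ...
    angles = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)           # even dimensions: sin
    pe[:, 1::2] = np.cos(angles)           # odd dimensions: cos
    return pe

pe = positional_encoding(max_len=50, d_model=16)
print(pe.shape)  # (50, 16)
```

Each row is the vector added to the token embedding at that position; each column pair shares one frequency, from wavelength $2\pi$ (fastest) up to $10000 \cdot 2\pi$ (slowest).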

Jupyter notebook to illustrate it

What?

Maybe it's just me, but I don't understand how sin/cos functions can capture positional relations. I can see how convolutional operations transform data based on position.

Why $\sin$ for even indices and $\cos$ for odd ones?

Explanation
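This is exactly where the even/odd $\sin$/$\cos$ pairing matters. Each pair $(\sin(\omega \cdot pos), \cos(\omega \cdot pos))$ shares one frequency $\omega$, and the angle-addition identities

$$\sin(\omega(pos+k)) = \sin(\omega \cdot pos)\cos(\omega k) + \cos(\omega \cdot pos)\sin(\omega k) \\ \cos(\omega(pos+k)) = \cos(\omega \cdot pos)\cos(\omega k) - \sin(\omega \cdot pos)\sin(\omega k)$$

say that shifting a position by $k$ rotates every pair by a fixed angle $\omega k$. So $PE_{pos+k} = PE_{pos} \cdot M(k)$ for a block-diagonal rotation matrix $M(k)$ that depends only on the offset $k$, never on $pos$; that is the paper's "linear function" claim. A quick NumPy check (my own sketch, continuing the snippet above; the names `omegas` and `M` are mine):

```python
import numpy as np

max_len, d_model, k = 100, 16, 3
pos = np.arange(max_len)[:, None]
omegas = 1.0 / np.power(10000.0, np.arange(0, d_model, 2) / d_model)
pe = np.zeros((max_len, d_model))
pe[:, 0::2] = np.sin(pos * omegas)
pe[:, 1::2] = np.cos(pos * omegas)

# Block-diagonal M(k): one 2x2 rotation per sin/cos pair,
# with angle k * omega for that pair's frequency.
M = np.zeros((d_model, d_model))
for j, w in enumerate(omegas):
    c, s = np.cos(k * w), np.sin(k * w)
    M[2*j:2*j+2, 2*j:2*j+2] = [[c, -s], [s, c]]

# PE[pos + k] == PE[pos] @ M(k) for every position pos.
print(np.allclose(pe[:-k] @ M, pe[k:]))  # True
```

With only $\sin$ (or only $\cos$) in each dimension, no such pos-independent linear map exists; the pairing is what makes a relative shift look like a simple rotation.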
