This demo shows how a transformer changes the representation of a query token, depending on the token's relation to the other tokens and on the specified transformation \( v(x) \).

Equation for the updated representation \( x_q' \) of the query token \( x_q \):

\[ x_q' = \sum_{x_k \in M(x_q, S_{x_k})} a(x_q, x_k; \theta_a) v(x_k; \theta_v) \]
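
The update above can be sketched numerically. The following is a minimal illustration, not the demo's own code, and it rests on several assumptions that the equation does not fix: the weight \( a(x_q, x_k; \theta_a) \) is taken to be scaled dot-product attention with a softmax, \( v(x_k; \theta_v) \) is taken to be a linear map, and the set \( M \) is modelled as a boolean mask over the tokens. The names `update_query`, `W_q`, `W_k`, `W_v`, and `mask` are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def update_query(x_q, X, W_q, W_k, W_v, mask=None):
    """Return (x_q', a): the updated query token and its attention weights.

    Assumptions (not taken from the demo):
      a(x_q, x_k) = softmax over k of (W_q x_q) . (W_k x_k) / sqrt(d)
      v(x_k)      = W_v x_k
      M           = the tokens of X selected by the boolean `mask`
    """
    if mask is None:
        mask = np.ones(len(X), dtype=bool)  # attend to every token
    tokens = X[mask]
    q = W_q @ x_q                           # projected query
    keys = tokens @ W_k.T                   # projected keys, one row per token
    values = tokens @ W_v.T                 # tokens in v(x) space
    a = softmax(keys @ q / np.sqrt(len(q)))
    return a @ values, a                    # weighted sum of transformed tokens


# Three 2-D tokens; the query coincides with the first one.
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
x_q = np.array([1.0, 0.0])
I = np.eye(2)

x_new, a = update_query(x_q, X, I, I, I)
print("attention weights:", a)
print("updated query:", x_new)
```

With identity matrices the updated query is a convex combination of the original tokens, pulled toward those most similar to it. Changing `W_v` moves the tokens in \( v(x) \) space and therefore moves the updated query, while the weights stay the same.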

(c) Fayyaz Minhas

Figure panels: tokens in the original feature space; tokens in \( v(x) \); representation of the query token (red) after transformation.