This demo shows how a transformer updates the representation of a query token \( x_q \) based on its relation to the other tokens and the specified value transformation \( v(x) \).
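Equation for the updated query token \( x_q' \):
\[ x_q' = \sum_{x_k \in M(x_q, S_{x_k})} a(x_q, x_k; \theta_a) v(x_k; \theta_v) \]
Here the sum runs over the tokens \( x_k \) in the set \( M(x_q, S_{x_k}) \), \( a(x_q, x_k; \theta_a) \) is the attention weight between the query and \( x_k \) with parameters \( \theta_a \), and \( v(x_k; \theta_v) \) is the transformed token with parameters \( \theta_v \).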
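The weighted sum above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the demo's own code: the attention weight \( a \) is taken to be a softmax over scaled dot products with no learned projections, \( v \) is taken to be a linear map given by a matrix `W_v`, and the query attends to every token in the list. The function names, the example tokens, and `W_v` are all hypothetical.

```python
import math


def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(x_q, tokens):
    # Assumption: a(x_q, x_k) is a softmax over scaled dot products.
    d = len(x_q)
    scores = [
        sum(qi * ki for qi, ki in zip(x_q, x_k)) / math.sqrt(d)
        for x_k in tokens
    ]
    return softmax(scores)


def value_transform(x, W):
    # Assumption: v(x) is the linear map W x (W given as a list of rows).
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def update_query(x_q, tokens, W):
    # x_q' = sum over k of a(x_q, x_k) * v(x_k)
    weights = attention_weights(x_q, tokens)
    values = [value_transform(x_k, W) for x_k in tokens]
    dim = len(values[0])
    return [sum(a * v[j] for a, v in zip(weights, values)) for j in range(dim)]


# Hypothetical example: three 2-D tokens, with the query equal to the first one.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_q = [1.0, 0.0]
W_v = [[2.0, 0.0], [0.0, 0.5]]  # stretches the first axis, shrinks the second

x_q_new = update_query(x_q, tokens, W_v)
print(x_q_new)
```

Because the softmax weights are non-negative and sum to one, the updated query is a convex combination of the transformed tokens, so it always lies inside the region they span in \( v(x) \).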
(c) Fayyaz Minhas
Figure panels: tokens in original feature space; tokens in \( v(x) \); representation of the query token (red) after transformation.