Measuring association: from variance to information

Interactive demo (mean, variance, covariance, correlation, entropy, KL divergence, mutual information, total correlation)

Key idea
If knowing one variable reduces our uncertainty about the other, the variables share information.
Surprise:
$$s(x)=-\log_2 p(x)$$
Entropy:
$$H(X)=-\sum_x p(x)\log_2 p(x)$$
KL divergence:
$$D_{KL}(P\|Q)=\sum_x p(x)\log_2\frac{p(x)}{q(x)}$$
Mutual information:
$$I(X;Y)=D_{KL}(P_{XY}\,\|\,P_X P_Y)$$
Total correlation (two variables):
$$TC(X,Y)=H(X)+H(Y)-H(X,Y)=I(X;Y)$$
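The definitions above can be checked numerically. This is a minimal sketch in pure Python on a hypothetical 2×2 joint distribution (the values are illustrative, not from the demo); it computes entropy, the KL divergence against the independence model, and confirms that the total-correlation form equals the KL form of mutual information.

```python
import math

# Hypothetical 2x2 joint distribution P(X, Y), chosen for illustration.
P = [[0.4, 0.1],
     [0.1, 0.4]]

px = [sum(row) for row in P]                              # marginal P(X)
py = [sum(P[i][j] for i in range(2)) for j in range(2)]   # marginal P(Y)

def entropy(dist):
    """H = -sum p log2 p, skipping zero-probability cells."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

H_X = entropy(px)
H_Y = entropy(py)
H_XY = entropy([p for row in P for p in row])

# KL divergence from the independence model Q(X, Y) = P(X) P(Y)
kl = sum(P[i][j] * math.log2(P[i][j] / (px[i] * py[j]))
         for i in range(2) for j in range(2) if P[i][j] > 0)

# Total-correlation form of mutual information
mi = H_X + H_Y - H_XY

assert abs(mi - kl) < 1e-12   # the two definitions of I(X; Y) agree
```

Both routes give roughly 0.278 bits here: knowing X removes about a quarter of a bit of uncertainty about Y.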
Note: with quantile binning, the marginals are (approximately) uniform by construction, so $P(X)$ and $P(Y)$ will look flat even for strongly non-Gaussian data.
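The flat-marginals effect is easy to reproduce. A small sketch with NumPy (bin count and sample size are arbitrary choices, not the demo's settings): bin a strongly non-Gaussian sample by its own quantiles and the per-bin mass comes out nearly uniform.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(size=10_000)   # strongly non-Gaussian (heavy right tail)

n_bins = 8
# Quantile edges place (approximately) the same number of points in each bin.
edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
bins = np.digitize(x, edges[1:-1])          # bin index 0 .. n_bins-1 per point

counts = np.bincount(bins, minlength=n_bins)
marginal = counts / counts.sum()
# Every entry of `marginal` is close to 1/8: P(X) looks flat by construction.
```

Equal-width bins on the same sample would instead pile most of the mass into the first bin or two, which is exactly the distortion quantile binning trades away.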

Scatter plot (axes locked to a square aspect ratio)
Stacked panels: the observed joint $P(X,Y)$ and the independence model $Q(X,Y)=P(X)P(Y)$, on the same colour scale. Cell values are probabilities rounded to two decimals.
Marginal: X
Marginal: Y
Observed joint $P(X,Y)$ (binned)
Independence model $Q(X,Y)=P(X)P(Y)$
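The independence model shown in the second panel is just the outer product of the two marginals. A sketch with NumPy on a hypothetical 3×3 binned joint (illustrative values), printed as probabilities with two decimals in the same spirit as the demo:

```python
import numpy as np

# Hypothetical observed joint P(X, Y) on a 3x3 grid, for illustration.
P = np.array([[0.20, 0.05, 0.05],
              [0.05, 0.10, 0.05],
              [0.05, 0.05, 0.40]])

px = P.sum(axis=1)      # marginal P(X): sum over Y
py = P.sum(axis=0)      # marginal P(Y): sum over X
Q = np.outer(px, py)    # independence model Q(X, Y) = P(X) P(Y)

# Render both grids as probabilities with two decimals.
for label, grid in (("P", P), ("Q", Q)):
    for row in grid:
        print(label, " ".join(f"{v:.2f}" for v in row))
```

Wherever $P$ exceeds $Q$, the two variables co-occur more often than independence predicts; summing $p\log_2(p/q)$ over the cells turns that cell-by-cell excess into the mutual information.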