Key idea
If knowing one variable reduces our uncertainty about the other, the variables share information.
Surprise: $$s(x)=-\log_2 p(x)$$
Entropy: $$H(X)=-\sum_x p(x)\log_2 p(x)$$
KL divergence: $$D_{KL}(P\|Q)=\sum_x p(x)\log_2\frac{p(x)}{q(x)}$$
Mutual information: $$I(X;Y)=D_{KL}(P_{XY}\|P_XP_Y)$$
Total correlation (two variables): $$TC(X,Y)=H(X)+H(Y)-H(X,Y)=I(X;Y)$$
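The definitions above can be checked numerically. A minimal NumPy sketch, using a small hypothetical 2×2 joint distribution (the probabilities are illustrative, not from the source), that computes entropy, KL divergence, mutual information via $D_{KL}(P_{XY}\|P_XP_Y)$, and total correlation, and confirms the two expressions for $I(X;Y)$ agree:

```python
import numpy as np

# Hypothetical 2x2 joint distribution P(X, Y); rows index X, columns Y.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1)  # marginal P(X)
p_y = p_xy.sum(axis=0)  # marginal P(Y)

def entropy(p):
    """H(P) = -sum_x p(x) log2 p(x), skipping zero-probability cells."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def kl(p, q):
    """D_KL(P || Q) = sum_x p(x) log2(p(x)/q(x)), over cells with p(x) > 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# Mutual information: KL between the joint and the product of marginals.
mi = kl(p_xy.ravel(), np.outer(p_x, p_y).ravel())

# Total correlation (two variables): H(X) + H(Y) - H(X, Y).
tc = entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())

print(mi, tc)  # the two quantities coincide, as the identity states
```

Here $H(X)=H(Y)=1$ bit (the marginals are uniform), so all of the shared information comes from the dependence in the joint table.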
Note: with quantile binning, each bin captures an (approximately) equal share of the samples, so the marginals $P(X)$ and $P(Y)$ are close to uniform by construction and will look flat even for strongly non-Gaussian data.
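To see the flat-marginal effect concretely, here is a small sketch with a lognormal sample standing in for "strongly non-Gaussian data" (the sample size, seed, and bin count are arbitrary choices, not from the source): bin edges are placed at empirical quantiles, so every bin ends up with roughly the same count.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.lognormal(size=10_000)  # heavily skewed, far from Gaussian

# Quantile binning: edges at the empirical quantiles, so each of the
# n_bins bins receives (approximately) the same number of samples.
n_bins = 10
edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
counts, _ = np.histogram(x, bins=edges)

print(counts)  # roughly 1000 samples per bin: the binned marginal is flat
```

Equal-width bins on the same data would instead pile almost everything into the first few bins, which is why the choice of binning matters when reading the marginals.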