Permutation tests: non-parametric p-values by label shuffling
A permutation test (also called a randomization test) is a non-parametric way to compute a p-value
by repeatedly shuffling group labels and measuring how unusual the observed statistic is under the null hypothesis.
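It is especially useful when you do not want to assume normality. It needs no parametric variance model either, but it does rely on exchangeability (section 1), so strongly unequal group variances can still affect a test based on the mean difference.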
1) The key assumption: exchangeability under the null
Suppose you have two groups (e.g., treatment vs. control). Under the null hypothesis \(H_0\) (“no treatment effect”),
the labels are exchangeable: if there is truly no difference, then any assignment of the pooled values
into two groups of sizes \(n_1\) and \(n_2\) should be equally plausible.
This is exactly the logic of randomized experiments: if labels were assigned randomly, then shuffling labels is consistent with \(H_0\).
2) Choose a test statistic
The permutation framework works with many statistics (mean difference, median difference, correlation, regression slopes, etc.).
This calculator uses the difference in means:
\[
T=\bar{x}_1-\bar{x}_2.
\]
Compute the observed statistic \(T_{\text{obs}}\) from your original groups.
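As a minimal sketch of this step, the observed statistic can be computed as below. The function name `mean_difference` and the two small data lists are hypothetical, chosen only for illustration; they are not taken from the calculator.

```python
from statistics import mean

def mean_difference(group1, group2):
    """Return T = mean(group1) - mean(group2)."""
    return mean(group1) - mean(group2)

# Hypothetical data, for illustration only.
treatment = [12.0, 15.0, 14.0, 16.0]
control = [10.0, 11.0, 13.0, 12.0]

t_obs = mean_difference(treatment, control)
print(t_obs)  # 14.25 - 11.5 = 2.75
```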
3) Build the permutation distribution
Pool the data, shuffle, split back into two groups, recompute the statistic, and repeat many times:
\[
\text{Pool } \{x_1,\dots,x_{n_1},y_1,\dots,y_{n_2}\},
\ \text{shuffle labels, form } (\mathbf{x}^*,\mathbf{y}^*),
\ \text{then compute } T^*.
\]
After \(N\) permutations you have \(T^{*(1)},\dots,T^{*(N)}\). This collection approximates the null distribution of \(T\), that is, its distribution under \(H_0\) given the observed pooled values.
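The pool, shuffle, split, recompute loop can be sketched as follows. This is an illustrative version using only the Python standard library, not the calculator's actual code; the function name, the `seed` parameter, and the data are assumptions.

```python
import random

def permutation_distribution(group1, group2, n_permutations=10_000, seed=0):
    """Return a list of T* = mean(x*) - mean(y*) over random relabelings."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the sketch is reproducible
    pooled = list(group1) + list(group2)
    n1, n2 = len(group1), len(group2)
    perm_stats = []
    for _ in range(n_permutations):
        # Shuffling the pooled values and cutting at n1 is equivalent
        # to shuffling the group labels.
        rng.shuffle(pooled)
        x_star, y_star = pooled[:n1], pooled[n1:]
        perm_stats.append(sum(x_star) / n1 - sum(y_star) / n2)
    return perm_stats

# Hypothetical data, for illustration only.
treatment = [12.0, 15.0, 14.0, 16.0]
control = [10.0, 11.0, 13.0, 12.0]
perm_stats = permutation_distribution(treatment, control, n_permutations=2_000)
```

Because every relabeling keeps the group sizes fixed, each \(T^*\) is computed from groups of sizes \(n_1\) and \(n_2\), as in the observed data.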
4) Tail choice and “extreme” permutations
The p-value depends on the alternative hypothesis:
- Two-sided: count permutations with \(|T^*|\ge |T_{\text{obs}}|\).
- Right-tailed: count permutations with \(T^*\ge T_{\text{obs}}\).
- Left-tailed: count permutations with \(T^*\le T_{\text{obs}}\).
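The three counting rules above translate directly into code. The sketch below assumes a hypothetical helper `count_extreme` and the tail names `"two-sided"`, `"right"`, and `"left"`; these names are illustrative and may differ from what the calculator uses.

```python
def count_extreme(perm_stats, t_obs, tail="two-sided"):
    """Count permuted statistics at least as extreme as t_obs for the given tail."""
    if tail == "two-sided":
        return sum(abs(t) >= abs(t_obs) for t in perm_stats)
    if tail == "right":
        return sum(t >= t_obs for t in perm_stats)
    if tail == "left":
        return sum(t <= t_obs for t in perm_stats)
    raise ValueError("tail must be 'two-sided', 'right', or 'left'")

# Tiny hand-made permutation distribution, for illustration only.
example_stats = [-3.0, -1.0, 0.0, 1.0, 2.0, 3.0]
print(count_extreme(example_stats, 2.0, "two-sided"))  # 3: the values -3, 2, 3
print(count_extreme(example_stats, 2.0, "right"))      # 2: the values 2, 3
print(count_extreme(example_stats, 2.0, "left"))       # 5: everything except 3
```

Ties count as extreme because the comparisons are non-strict, matching the \(\ge\) and \(\le\) in the definitions.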
5) Monte Carlo p-value estimate
If \(E\) out of \(N\) permutations are at least as extreme as observed, the standard estimate is:
\[
\hat{p}=\frac{E}{N}.
\]
Many texts also recommend a small adjustment, which counts the observed labeling as one of the permutations and so avoids reporting a p-value of \(0\) when \(E=0\):
\[
\hat{p}_{\text{adj}}=\frac{E+1}{N+1}.
\]
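A short sketch of both estimates, to show how they differ when few or no permutations are extreme. The function name `monte_carlo_p_values` and the example counts are hypothetical.

```python
def monte_carlo_p_values(n_extreme, n_permutations):
    """Return (p_hat, p_adj) = (E / N, (E + 1) / (N + 1))."""
    p_hat = n_extreme / n_permutations
    p_adj = (n_extreme + 1) / (n_permutations + 1)
    return p_hat, p_adj

# With E = 0 extreme permutations out of N = 999, the plain estimate is 0,
# while the adjusted estimate is 1 / 1000.
p_hat, p_adj = monte_carlo_p_values(0, 999)
print(p_hat, p_adj)
```

With the adjustment, the smallest reportable p-value is \(1/(N+1)\), so the resolution of the result is tied to the number of permutations.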
6) Exact vs approximate permutation tests (university note)
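If your sample sizes are very small, you can enumerate all \(\binom{n_1+n_2}{n_1}\) distinct labelings exactly (an “exact permutation test”).
For moderate or large samples this number grows too quickly to enumerate (for example, \(\binom{40}{20}\approx 1.4\times 10^{11}\)), so we use a Monte Carlo approximation with large \(N\).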
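For very small samples, exact enumeration can be sketched with `itertools.combinations`, choosing which pooled positions receive the first label. The function name, the small floating-point tolerance, and the data are assumptions made for this illustration.

```python
from itertools import combinations
from math import comb

def exact_two_sided_p_value(group1, group2):
    """Exact two-sided permutation p-value for the difference in means."""
    pooled = list(group1) + list(group2)
    n1, n2 = len(group1), len(group2)
    total = sum(pooled)
    t_obs = sum(group1) / n1 - sum(group2) / n2
    n_extreme = 0
    n_labelings = 0
    # Each choice of n1 positions for group 1 is one labeling.
    for idx in combinations(range(len(pooled)), n1):
        s1 = sum(pooled[i] for i in idx)
        t_star = s1 / n1 - (total - s1) / n2
        n_labelings += 1
        # Small tolerance (an assumption) so that ties are not lost to rounding.
        if abs(t_star) >= abs(t_obs) - 1e-12:
            n_extreme += 1
    assert n_labelings == comb(n1 + n2, n1)
    return n_extreme / n_labelings

# Hypothetical data, for illustration only: C(8, 4) = 70 labelings.
treatment = [12.0, 15.0, 14.0, 16.0]
control = [10.0, 11.0, 13.0, 12.0]
p_exact = exact_two_sided_p_value(treatment, control)
print(p_exact)  # 6 of the 70 labelings are at least as extreme, so 6/70
```

The observed labeling is itself one of the enumerated labelings, so the exact p-value can never be zero.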
7) Interpreting the histogram animation
The histogram shows the permutation distribution of \(T^*\). The observed \(T_{\text{obs}}\) is drawn as a vertical line,
and the shaded region indicates what counts as “extreme” for the chosen tail. A smaller p-value means the observed statistic
falls in a rare tail region under \(H_0\).
8) Practical tips
- Increase \(N\) to reduce Monte Carlo error: the standard error of \(\hat{p}\) is roughly \(\sqrt{\hat{p}(1-\hat{p})/N}\), and the histogram becomes smoother.
- Permutation tests reflect your chosen statistic: the mean difference is sensitive to location shifts, but not necessarily to changes in variance.
- For paired data, do not shuffle labels across individuals; use a paired permutation approach (e.g., sign flipping).
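For the paired case mentioned in the last tip, a sign-flipping test can be sketched as follows. It assumes that, under \(H_0\), each within-pair difference is equally likely to be positive or negative. The function name, the `seed` parameter, and the data are hypothetical, and the two-sided adjusted estimate \((E+1)/(N+1)\) is used.

```python
import random

def paired_sign_flip_p_value(before, after, n_permutations=10_000, seed=0):
    """Two-sided Monte Carlo p-value for the mean within-pair difference."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t_obs = sum(diffs) / n
    n_extreme = 0
    for _ in range(n_permutations):
        # Flip the sign of each difference independently with probability 1/2;
        # pairs are never mixed across individuals.
        flipped_sum = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped_sum / n) >= abs(t_obs):
            n_extreme += 1
    return (n_extreme + 1) / (n_permutations + 1)

# Hypothetical paired measurements, for illustration only.
before = [10.0, 12.0, 11.0, 13.0, 9.0, 14.0, 10.0, 12.0, 11.0, 13.0]
after = [11.0, 14.0, 12.0, 15.0, 10.0, 15.0, 12.0, 13.0, 13.0, 14.0]
p_paired = paired_sign_flip_p_value(before, after, n_permutations=2_000)
```

Flipping signs within pairs keeps each individual's two measurements together, which is what distinguishes this from shuffling labels across the pooled sample.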