Core idea behind permutation tests
A permutation test (also called a randomization test) evaluates a null hypothesis by generating the distribution of a chosen test statistic under rearrangements that are valid if the null is true. The p-value is computed as the fraction of rearrangements that produce a statistic at least as extreme as the observed statistic.
Key decision in paired vs unpaired permutation tests: the randomization scheme must match the data structure.
- Paired (dependent) samples: observations come in matched pairs (before/after on the same subject, twin pairs, matched units).
- Unpaired (independent) samples: observations come from two separate groups with no natural one-to-one matching.
Paired permutation test (dependent samples)
In paired data, the pairing is essential and must be preserved. A standard approach reduces each pair to a within-pair difference \(d_i\), then tests whether the typical difference is \(0\).
Null hypothesis and statistic
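Typical null: \(H_0:\) the distribution of within-pair differences is symmetric about \(0\) (often stated as “no systematic paired effect”). Symmetry, not merely a center at \(0\), is what makes \(d_i\) and \(-d_i\) equally likely under \(H_0\) and so justifies the sign-flipping described below.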
Choose a test statistic based on the differences, for example:
- Mean difference: \(T = \bar d = \dfrac{1}{n}\sum_{i=1}^{n} d_i\)
- Sum of differences: \(T = \sum_{i=1}^{n} d_i\)
- Median difference (less common for exact enumeration but possible)
Valid permutations for paired data
Under \(H_0\), swapping the two labels inside a pair (e.g., “before” and “after”) should not change the joint distribution. This induces a sign change in the difference:
\[ d_i = (\text{after})_i - (\text{before})_i \quad\Longrightarrow\quad d_i \text{ becomes } -d_i \text{ if the within-pair labels are swapped.} \]
Therefore, a common paired permutation test enumerates all sign patterns \((s_1,\dots,s_n)\) with each \(s_i \in \{+1,-1\}\) (or samples them at random when enumeration is too costly), forming \[ T^{(b)} = \frac{1}{n}\sum_{i=1}^{n} s_i \cdot d_i. \] There are \(2^n\) possible sign patterns, so exact enumeration is feasible for modest \(n\).
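The two ways of obtaining sign patterns can be sketched with the Python standard library. This is a minimal illustration, not a reference implementation: the differences `d` are made-up values, the helper names `all_sign_flip_stats` and `sampled_sign_flip_stats` are hypothetical, and integer differences are assumed so that exact fractions can be used.

```python
import itertools
import random
from fractions import Fraction


def all_sign_flip_stats(d):
    """Return the mean of s_i * d_i for every one of the 2^n sign patterns."""
    n = len(d)
    return [
        Fraction(sum(s * x for s, x in zip(signs, d)), n)
        for signs in itertools.product((1, -1), repeat=n)
    ]


def sampled_sign_flip_stats(d, B, seed=0):
    """Return B statistics, each from an independently drawn random sign pattern."""
    rng = random.Random(seed)
    n = len(d)
    return [
        Fraction(sum(rng.choice((1, -1)) * x for x in d), n)
        for _ in range(B)
    ]


d = [1, 2, 3]  # illustrative integer differences, not taken from the text
exact_stats = all_sign_flip_stats(d)
print(len(exact_stats))  # one statistic per sign pattern, 2**3 in total
print(sorted(exact_stats))

mc_stats = sampled_sign_flip_stats(d, B=1000)  # Monte Carlo alternative
```

Enumeration is the natural choice while \(2^n\) stays small; the sampled version trades exactness for a fixed cost of \(B\) evaluations.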
Paired p-value calculation
For a two-sided test using \(T=\bar d\): \[ p = \frac{\#\left\{b:\, \left|T^{(b)}\right| \ge \left|T_{\text{obs}}\right|\right\}}{2^n}. \] The “\(\ge\)” (not “\(>\)”) ensures that the observed arrangement, which is itself one of the \(2^n\) sign patterns, is counted. The p-value is therefore never \(0\), and the test keeps its nominal level in finite samples.
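The counting rule can be written as a short function. The sketch below assumes integer differences and compares sums rather than means (dividing both sides by \(n\) does not change the comparison, and integer sums avoid rounding problems); the function name `paired_exact_p` and the toy data are hypothetical.

```python
import itertools
from fractions import Fraction


def paired_exact_p(d):
    """Exact two-sided sign-flip p-value for the mean difference."""
    observed = abs(sum(d))
    extreme = sum(
        1
        for signs in itertools.product((1, -1), repeat=len(d))
        # ">=" so that the observed (all +1) pattern is counted
        if abs(sum(s * x for s, x in zip(signs, d))) >= observed
    )
    return Fraction(extreme, 2 ** len(d))


p = paired_exact_p([1, 2, 3])  # illustrative differences
print(p)  # 1/4
```

With every difference positive, only the all-plus and all-minus patterns reach the observed magnitude, so the result is \(2/2^3\). This also shows that the smallest attainable two-sided p-value with \(n\) nonzero differences is \(2/2^n\).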
Unpaired permutation test (independent samples)
With two independent groups, the null hypothesis typically states that both samples come from the same distribution, so group labels are exchangeable.
Null hypothesis and statistic
Typical null: \(H_0:\) the two groups have the same distribution (in particular, there is no shift in location between them).
A common statistic is the difference in sample means: \[ T = \bar X - \bar Y. \] Alternatives include difference in medians, trimmed means, or other robust location measures (the permutation framework remains the same).
Valid permutations for unpaired data
Pool all \(N=n_1+n_2\) observations, then repeatedly reassign \(n_1\) of them to “Group 1” and the rest to “Group 2” (label shuffling). Each reassignment produces a permuted statistic \(T^{(b)}\).
The number of distinct label assignments is \[ \binom{N}{n_1}. \] Exact enumeration is feasible when \(\binom{N}{n_1}\) is not too large; otherwise, Monte Carlo sampling of many random shuffles is used.
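To get a feel for when enumeration stops being practical, the count can be checked with `math.comb`, and the labelings themselves can be generated with `itertools.combinations`. The generator name `labelings` is hypothetical; it picks positions rather than values so that tied observations are still treated as distinct units.

```python
import itertools
import math


def labelings(pooled, n1):
    """Yield (group1, group2) for every way to pick n1 pooled items as Group 1."""
    indices = range(len(pooled))
    for chosen in itertools.combinations(indices, n1):
        chosen_set = set(chosen)
        group1 = [pooled[i] for i in chosen]
        group2 = [pooled[i] for i in indices if i not in chosen_set]
        yield group1, group2


print(math.comb(10, 5))   # 252: easy to enumerate
print(math.comb(40, 20))  # 137846528820: Monte Carlo territory

count = sum(1 for _ in labelings([1, 2, 3, 4, 5], 2))
print(count)  # 10
```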
Unpaired p-value calculation
For a two-sided test: \[ p = \frac{\#\left\{b:\, \left|T^{(b)}\right| \ge \left|T_{\text{obs}}\right|\right\}}{\binom{N}{n_1}} \quad \text{(exact)} \qquad\text{or}\qquad p \approx \frac{1+\#\left\{b:\, \left|T^{(b)}\right| \ge \left|T_{\text{obs}}\right|\right\}}{1+B} \quad \text{(Monte Carlo)}, \] where \(B\) is the number of random shuffles drawn. The “\(+1\)” form counts the observed labeling as one of the arrangements; it is a standard finite-sample correction that prevents a p-value of \(0\) when using random shuffles.
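The Monte Carlo form can be sketched as follows. The data, the function name `unpaired_mc_p`, the default \(B=9999\), the seed, and the small tolerance used to guard against floating-point ties are all assumptions made for illustration.

```python
import random


def unpaired_mc_p(x, y, B=9999, seed=0):
    """Monte Carlo two-sided p-value for the difference in means, with +1 correction."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n1 = len(x)

    def mean_diff(values):
        g1, g2 = values[:n1], values[n1:]
        return sum(g1) / len(g1) - sum(g2) / len(g2)

    observed = abs(mean_diff(pooled))  # pooled is still in its original order here
    tol = 1e-12                        # treat near-equal floats as ties
    hits = 0
    for _ in range(B):
        rng.shuffle(pooled)            # random relabeling of the pooled sample
        if abs(mean_diff(pooled)) >= observed - tol:
            hits += 1
    return (1 + hits) / (1 + B)


x = [5.1, 6.3, 7.0, 6.8]  # illustrative data, not taken from the text
y = [4.2, 5.0, 4.9, 5.5]
p = unpaired_mc_p(x, y)
print(p)
```

Because of the correction, the smallest value this function can return is \(1/(1+B)\), so \(B\) also sets the resolution of the reported p-value.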
Worked examples comparing paired vs unpaired permutation tests
Example A: paired permutation test (sign-flips)
Scenario: measurements taken on the same \(n=6\) subjects before and after an intervention; analyze differences \(d_i=(\text{after})_i-(\text{before})_i\).
| Subject | \(d_i\) |
|---|---|
| 1 | 2 |
| 2 | -1 |
| 3 | 3 |
| 4 | 0 |
| 5 | 4 |
| 6 | -2 |
Observed mean difference: \[ T_{\text{obs}}=\bar d=\frac{2+(-1)+3+0+4+(-2)}{6}=\frac{6}{6}=1. \]
Under \(H_0\), each difference can be sign-flipped. There are \(2^6=64\) sign patterns. The exact two-sided p-value is the fraction of sign patterns with \(\left|\bar d^{(b)}\right|\ge 1\). For this dataset, that fraction equals \[ p=\frac{28}{64}=0.4375. \]
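The count of 28 can be checked by brute force. The sketch below uses only the standard library and works with sums instead of means, since \(\left|\bar d^{(b)}\right|\ge 1\) is the same condition as \(\left|\sum_i s_i d_i\right|\ge 6\).

```python
import itertools
from fractions import Fraction

d = [2, -1, 3, 0, 4, -2]  # differences from the table above
observed = abs(sum(d))    # |sum| = 6, i.e. |mean| = 1

extreme = 0
for signs in itertools.product((1, -1), repeat=len(d)):
    flipped_sum = sum(s * x for s, x in zip(signs, d))
    if abs(flipped_sum) >= observed:  # same as |mean| >= 1
        extreme += 1

p = Fraction(extreme, 2 ** len(d))
print(extreme, float(p))  # 28 0.4375
```

The zero difference for subject 4 is unaffected by its sign, so every pattern of the other five differences is counted twice. Dropping that subject would give \(14/32\), the same p-value.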
Interpretation: evidence against \(H_0\) is weak for a two-sided paired effect using the mean difference as statistic.
Example B: unpaired permutation test (label shuffling)
Scenario: two independent groups (\(n_1=5\), \(n_2=5\)); test for a difference in means using \(T=\bar X-\bar Y\).
| Group 1 values | Group 2 values |
|---|---|
| 12 | 8 |
| 9 | 7 |
| 11 | 9 |
| 10 | 6 |
| 13 | 10 |
Compute sample means: \[ \bar X=\frac{12+9+11+10+13}{5}=\frac{55}{5}=11, \qquad \bar Y=\frac{8+7+9+6+10}{5}=\frac{40}{5}=8. \] Observed statistic: \[ T_{\text{obs}}=\bar X-\bar Y=11-8=3. \]
Under \(H_0\), all \(N=10\) observations are exchangeable across labels. The number of distinct labelings is \[ \binom{10}{5}=252. \] Enumerating all 252 labelings and recomputing \(T^{(b)}\) gives an exact two-sided p-value: \[ p=\frac{\#\left\{b:\, \left|T^{(b)}\right|\ge 3\right\}}{252}=\frac{10}{252}\approx 0.03968. \]
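The enumeration can be reproduced in a few lines. This sketch assumes only the standard library; the helper `scaled_diff` is a hypothetical name, and it compares the integer quantity \(n_1 n_2(\bar X-\bar Y)\) so that no floating-point rounding enters the count.

```python
import itertools

group1 = [12, 9, 11, 10, 13]
group2 = [8, 7, 9, 6, 10]
pooled = group1 + group2
n1, n2 = len(group1), len(group2)
total = sum(pooled)


def scaled_diff(sum1):
    """n1 * n2 * (mean1 - mean2) as an integer, given the Group 1 sum."""
    return n2 * sum1 - n1 * (total - sum1)


observed = abs(scaled_diff(sum(group1)))

# combinations() picks by position, so the tied 9s and 10s count as distinct units.
all_labelings = list(itertools.combinations(pooled, n1))
extreme = sum(1 for combo in all_labelings if abs(scaled_diff(sum(combo))) >= observed)

print(extreme, len(all_labelings), round(extreme / len(all_labelings), 5))
# 10 252 0.03968
```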
Interpretation: with \(p\approx 0.040 < \alpha=0.05\), \(H_0\) is rejected at the 5% level in favor of a two-sided difference in means, assuming independent samples.
Visualization: permutation distribution and “extreme” regions
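One way to look at a permutation distribution without plotting software is a text histogram. The sketch below assumes Example B's data and uses only the standard library. It prints one row per attainable value of \(T^{(b)}\) and marks the rows that count as extreme for the two-sided test. The layout is an illustration only, not a reproduction of any particular figure.

```python
import itertools
from collections import Counter
from fractions import Fraction

group1 = [12, 9, 11, 10, 13]
group2 = [8, 7, 9, 6, 10]
pooled = group1 + group2
n1, n2 = len(group1), len(group2)
total = sum(pooled)


def mean_diff(sum1):
    """Difference in means as an exact fraction, given the Group 1 sum."""
    return Fraction(sum1, n1) - Fraction(total - sum1, n2)


t_obs = mean_diff(sum(group1))
counts = Counter(
    mean_diff(sum(combo)) for combo in itertools.combinations(pooled, n1)
)

# One row per attainable statistic; bar length = number of labelings.
for t in sorted(counts):
    marker = "  <- extreme" if abs(t) >= abs(t_obs) else ""
    print(f"{float(t):+5.1f} {'#' * counts[t]}{marker}")
```

The marked rows sit in both tails, and their bar lengths add up to the numerator of the two-sided p-value.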
How to choose correctly between paired vs unpaired permutation tests
- Use a paired permutation test when each observation in one condition corresponds to a specific observation in the other condition (matched pairs). Randomize within each pair (swap labels or sign-flip differences).
- Use an unpaired permutation test when observations are independent across groups. Randomize by shuffling group labels across the pooled sample.
- Do not break the structure: treating paired data as unpaired discards pairing information and can misstate variability; treating unpaired data as paired invents pairings and invalidates the exchangeability argument.
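The cost of ignoring pairing can be seen by running both schemes on the same numbers. The data below are made up for illustration: five subjects with very different baselines, each improving by exactly 1. The variable names are hypothetical and only the standard library is used.

```python
import itertools
from fractions import Fraction

before = [10, 20, 30, 40, 50]  # illustrative data, not taken from the text
after = [11, 21, 31, 41, 51]   # every subject improves by exactly 1

# Paired analysis: sign-flip the within-pair differences.
d = [a - b for a, b in zip(after, before)]
obs_paired = abs(sum(d))
hits_paired = sum(
    1
    for signs in itertools.product((1, -1), repeat=len(d))
    if abs(sum(s * x for s, x in zip(signs, d))) >= obs_paired
)
p_paired = Fraction(hits_paired, 2 ** len(d))

# Unpaired analysis of the same numbers: shuffle labels over the pooled sample.
pooled = after + before
total = sum(pooled)
n1 = len(after)
# With equal group sizes n, n * (mean1 - mean2) equals 2 * sum1 - total.
obs_unpaired = abs(2 * sum(after) - total)
all_labelings = list(itertools.combinations(pooled, n1))
hits_unpaired = sum(
    1 for combo in all_labelings if abs(2 * sum(combo) - total) >= obs_unpaired
)
p_unpaired = Fraction(hits_unpaired, len(all_labelings))

print(p_paired)  # 1/16
print(float(p_unpaired))
```

The paired test reaches the smallest p-value available with five pairs, because the shift is perfectly consistent within subjects. The unpaired test returns a much larger p-value, because the between-subject spread swamps a shift of 1.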
Practical notes for reliable implementation
- Exact vs Monte Carlo: exact enumeration visits all \(2^n\) sign patterns (paired) or all \(\binom{N}{n_1}\) labelings (unpaired); when that count is too large, approximate with a large number \(B\) of random permutations.
- Two-sided vs one-sided: for a two-sided test with symmetric statistic \(T\), count permutations with \(\left|T^{(b)}\right|\ge\left|T_{\text{obs}}\right|\); for a one-sided test, count \(T^{(b)}\ge T_{\text{obs}}\) (or \(\le\)).
- Choice of statistic: difference in means targets mean shifts; median-based or trimmed statistics target location shifts while being less sensitive to outliers. The permutation logic stays the same as long as the statistic is computed consistently for each shuffle.
- Reporting: state whether the test was paired vs unpaired, specify the statistic, and specify whether the p-value is exact or Monte Carlo (including \(B\)).
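The two-sided and one-sided counting rules from the notes above differ only in the comparison applied to each permuted statistic. A minimal sketch for the paired case, assuming integer differences, is given below. The function name `sign_flip_p` and its `alternative` argument are hypothetical, chosen here for illustration.

```python
import itertools
from fractions import Fraction


def sign_flip_p(d, alternative="two-sided"):
    """Exact sign-flip p-value; alternative is 'two-sided', 'greater', or 'less'."""
    observed = sum(d)
    hits = 0
    for signs in itertools.product((1, -1), repeat=len(d)):
        t = sum(s * x for s, x in zip(signs, d))
        if alternative == "two-sided":
            hits += int(abs(t) >= abs(observed))
        elif alternative == "greater":
            hits += int(t >= observed)
        elif alternative == "less":
            hits += int(t <= observed)
        else:
            raise ValueError(f"unknown alternative: {alternative!r}")
    return Fraction(hits, 2 ** len(d))


d = [2, -1, 3, 0, 4, -2]  # the differences from Example A
results = {alt: sign_flip_p(d, alt) for alt in ("two-sided", "greater", "less")}
for alt, p in results.items():
    print(alt, p)
```

For these differences the one-sided “greater” p-value is half the two-sided one, as expected when the sign-flip distribution is symmetric about \(0\). A report would state which alternative was used alongside the statistic and the exact-versus-Monte-Carlo choice.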
Correct use of paired vs unpaired permutation tests rests on matching the randomization scheme to the data-generating design: preserve pairs when pairs exist, and shuffle labels only when samples are independent.