Wilcoxon Rank Sum / Mann–Whitney U
A nonparametric test for comparing two independent samples. It uses the ranks of the combined data
(not the raw values) and is often described as a test of a location shift between the two populations.
- Independent samples
- Rank-based (robust to outliers)
- Works for ordinal / non-normal data
When there are no ties and distributions have the same shape, this test is commonly interpreted as comparing medians.
More generally, it detects differences in location/distribution using ranks.
When to use it
- You have two separate groups (Sample A and Sample B).
- Data are at least ordinal (can be ranked); continuous data are ideal.
- You want an alternative to the two-sample t-test when normality is doubtful or outliers exist.
Hypotheses
\[
\begin{aligned}
H_0 &: \text{The two populations have the same distribution (no location shift).} \\
H_1 &: \text{Two-sided: the distributions differ} \\
&\text{Right-tailed: A tends to be larger than B} \\
&\text{Left-tailed: A tends to be smaller than B}
\end{aligned}
\]
How the test works (ranking)
Combine all observations from both groups, sort them, and assign ranks \(1,2,\dots,N\).
If values are tied, use midranks (average of the ranks that would have been assigned).
\[
\begin{aligned}
N &= n_1 + n_2 \\
R_1 &= \sum_{i \in A} \text{rank}(x_i), \qquad
R_2 = \sum_{i \in B} \text{rank}(x_i)
\end{aligned}
\]
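As a minimal sketch of the ranking step, the Python below pools two groups, assigns midranks, and sums the ranks per group. The function names `midranks` and `rank_sums` and the sample values are illustrative assumptions, not part of the calculator.

```python
def midranks(values):
    """Return 1-based ranks for `values`, averaging ranks within tie groups."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` to cover every sorted position holding the same value.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end would get ranks start+1..end+1; use their mean.
        shared = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_sums(a, b):
    """Rank the pooled sample and return (R1, R2) for groups A and B."""
    ranks = midranks(list(a) + list(b))
    r1 = sum(ranks[:len(a)])
    r2 = sum(ranks[len(a):])
    return r1, r2


# Made-up data with one tie: the value 3 appears in both groups.
sample_a = [1, 3, 5, 7]
sample_b = [2, 3, 4]
r1, r2 = rank_sums(sample_a, sample_b)
print(r1, r2)  # 17.5 10.5
```

A quick sanity check is that \(R_1 + R_2 = N(N+1)/2\); here \(17.5 + 10.5 = 28 = 7 \cdot 8 / 2\).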
Mann–Whitney statistics
\[
\begin{aligned}
U_1 &= R_1 - \frac{n_1(n_1+1)}{2}, \\
U_2 &= n_1 n_2 - U_1
\end{aligned}
\]
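The formulas above can be sketched in a few lines of Python. The helper names `u_from_rank_sum` and `u1_by_pair_counting` are hypothetical; the second function is a brute-force cross-check that counts, over all (A, B) pairs, how often the A value is larger, with ties counted as one half.

```python
def u_from_rank_sum(r1, n1, n2):
    """Compute (U1, U2) from the rank sum of group A and the two sample sizes."""
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return u1, u2


def u1_by_pair_counting(a, b):
    """Cross-check: count pairs with a > b, counting each tie as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


# Made-up data; R1 = 17.5 is the midrank sum of group A in the pooled sample.
sample_a = [1, 3, 5, 7]
sample_b = [2, 3, 4]
u1, u2 = u_from_rank_sum(17.5, len(sample_a), len(sample_b))
print(u1, u2)                                   # 7.5 4.5
print(u1_by_pair_counting(sample_a, sample_b))  # 7.5
```

The agreement between the two routes reflects that \(U_1\) counts the pairs in which the A observation exceeds the B observation.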
Many references report \(U = \min(U_1, U_2)\) for two-sided tests. In this calculator, \(U_1\) always refers to “A vs B”
(it is computed from the rank sum of group A), and the chosen alternative determines which tail is used.
p-value methods
- Exact (recommended for small samples): uses the exact distribution of \(U\) when ties are absent.
- Permutation (Monte Carlo): simulates random re-labelings of group membership to approximate the tail probability (works with ties).
- Normal approximation: for larger samples, the null distribution of \(U\) is approximated by a normal distribution (usually with a tie correction and an optional continuity correction).
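The first two methods can be illustrated with a short Python sketch for the right-tailed alternative (A tends to be larger than B). This is not necessarily how the calculator computes them: the exact version below simply enumerates every relabeling, which is only feasible for small samples, and the Monte Carlo version uses the common "add one" convention for the estimated p-value. All function names, the seed, and the data are assumptions made for illustration.

```python
import random
from itertools import combinations


def u1_stat(a, b):
    """U1 by pair counting: pairs with a > b count 1, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def exact_p_value_greater(a, b):
    """Right-tailed p-value from all C(N, n1) relabelings of the pooled data."""
    pooled = list(a) + list(b)
    n1 = len(a)
    observed = u1_stat(a, b)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if u1_stat(group_a, group_b) >= observed:
            hits += 1
        total += 1
    return hits / total


def permutation_p_value_greater(a, b, n_resamples=2000, seed=0):
    """Monte Carlo estimate of the same tail probability via random shuffles."""
    rng = random.Random(seed)
    pooled = list(a) + list(b)
    n1 = len(a)
    observed = u1_stat(a, b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        if u1_stat(pooled[:n1], pooled[n1:]) >= observed:
            hits += 1
    # Add-one convention: counts the observed labeling as one of the resamples.
    return (hits + 1) / (n_resamples + 1)


# Made-up data with complete separation: only 1 of the 20 relabelings is as extreme.
print(exact_p_value_greater([6, 7, 8], [1, 2, 3]))        # 0.05
print(permutation_p_value_greater([6, 7, 8], [1, 2, 3]))  # close to 0.05, varies with seed
```

The example also shows why small samples limit what the test can report: with \(n_1 = n_2 = 3\), the smallest attainable one-sided exact p-value is \(1/20\).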
Normal approximation (with tie correction)
\[
\begin{aligned}
\mu_U &= \frac{n_1 n_2}{2} \\
\sigma_U^2 &= \frac{n_1 n_2}{12}\left[(N+1) - \frac{\sum_j (t_j^3 - t_j)}{N(N-1)}\right] \\
z &= \frac{U_1 - \mu_U - c}{\sigma_U}
\end{aligned}
\]
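Here \(t_j\) are the tie group sizes (e.g., a value repeated 3 times has \(t=3\)).
The continuity correction \(c\) is optional. When used, it is typically \(\pm 0.5\), with the sign chosen so that \(|z|\) shrinks
(for a two-sided test, \(c = 0.5\,\operatorname{sgn}(U_1 - \mu_U)\)).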
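A minimal Python sketch of the tie-corrected normal approximation follows. It assumes a two-sided test and takes the continuity correction as 0.5 with the sign of \(U_1 - \mu_U\), which is one common convention rather than something the text above fixes. The function names and data are illustrative.

```python
import math
from collections import Counter


def normal_approx_z(u1, a, b, continuity=True):
    """z-score for U1 using the tie-corrected variance of U."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    mu = n1 * n2 / 2
    # Sum of (t^3 - t) over tie groups; untied values contribute 0.
    tie_term = sum(t**3 - t for t in Counter(list(a) + list(b)).values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        raise ValueError("variance is zero (all pooled values are tied)")
    diff = u1 - mu
    # Assumed convention: shift U1 by 0.5 toward mu_U, so |z| shrinks.
    c = math.copysign(0.5, diff) if continuity and diff != 0 else 0.0
    return (diff - c) / math.sqrt(var)


def two_sided_p(z):
    """Two-sided normal tail probability, 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


# Made-up data with one tie group of size 2 (the value 3); U1 = 7.5 for these groups.
sample_a = [1, 3, 5, 7]
sample_b = [2, 3, 4]
z = normal_approx_z(7.5, sample_a, sample_b)
print(z, two_sided_p(z))
```

With samples this small the approximation is shown only to make the arithmetic concrete; an exact or permutation p-value would normally be preferred.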
Effect size (recommended interpretation)
A rank-based effect size is often more informative than just a p-value.
Common language / AUC
\[
A = \frac{U_1}{n_1 n_2}
\]
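\(A\) is the sample estimate of
\(P(X_A > X_B) + 0.5\,P(X_A = X_B)\), the probability that a randomly chosen A value exceeds a randomly chosen B value, with ties counted as one half.
So \(A=0.50\) suggests no tendency for A to be larger than B.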
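To make the interpretation concrete, the sketch below computes \(A\) directly from pair counts rather than from ranks. The function name `common_language_effect` and the data are assumptions made for illustration.

```python
def common_language_effect(a, b):
    """Estimate P(X_A > X_B) + 0.5 * P(X_A = X_B) over all (A, B) pairs."""
    n_pairs = len(a) * len(b)
    greater = sum(1 for x in a for y in b if x > y)
    equal = sum(1 for x in a for y in b if x == y)
    return (greater + 0.5 * equal) / n_pairs


# Made-up data: 7 of the 12 pairs have A > B and 1 pair is tied.
sample_a = [1, 3, 5, 7]
sample_b = [2, 3, 4]
print(common_language_effect(sample_a, sample_b))  # 0.625
```

Here \(A = 0.625\) means that, in roughly 62.5% of the A–B pairings (counting the tied pair as half), the A value is the larger one.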
Cliff’s delta
\[
\delta = 2A - 1
\]
\(\delta \in [-1,1]\). Positive values suggest A tends to be larger; negative values suggest B tends to be larger.
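The identity \(\delta = 2A - 1\) can be checked against the usual definition of Cliff's delta as the difference between the probability that A exceeds B and the probability that B exceeds A. That definition is standard background and is assumed here; it uses the fact that the three probabilities sum to 1.
\[
\begin{aligned}
\delta &= P(X_A > X_B) - P(X_A < X_B) \\
2A - 1 &= 2\,P(X_A > X_B) + P(X_A = X_B) - 1 \\
&= 2\,P(X_A > X_B) + P(X_A = X_B) - \left[P(X_A > X_B) + P(X_A = X_B) + P(X_A < X_B)\right] \\
&= P(X_A > X_B) - P(X_A < X_B) = \delta
\end{aligned}
\]
For instance, \(A = 0.625\) gives \(\delta = 2(0.625) - 1 = 0.25\).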
Effect sizes help convey practical importance:
a statistically significant result can still have a very small effect, and vice versa.
Key assumptions and notes
- Observations are independent within and between groups.
- Measurement scale is at least ordinal.
- For a “median shift” interpretation, the two distributions should have a similar shape/spread.
- Ties are allowed, but they require midranks and (for the normal approximation) a tie correction.