In statistics, how is the Mann–Whitney U test (Wilcoxon rank sum) carried out for two independent samples, and how is the conclusion interpreted?

The Mann–Whitney U (Wilcoxon rank-sum) test pools and ranks both samples, converts rank sums into a U statistic, and uses an exact or normal-approximation p-value to assess whether the two independent populations differ in location/distribution.

Mann–Whitney U Test (Wilcoxon Rank-Sum) for Two Independent Samples


The Mann–Whitney U test (also called the Wilcoxon rank-sum test) is a nonparametric method for comparing two independent samples. It works with the ranks of the pooled observations and does not assume normality.

When the test is appropriate

Two independent samples (no pairing or repeated measurements between groups).
Outcome is at least ordinal (ranks make sense) and often continuous.
Primary goal: detect a systematic shift between groups; when the two distributions have similar shapes, this is often interpreted as a difference in medians.

Hypotheses and test idea

Let Group 1 have sample size \(n_1\) and Group 2 have sample size \(n_2\), with total \(N=n_1+n_2\). The test ranks all \(N\) observations together (the smallest value receives rank 1 and the largest receives rank \(N\)), then compares how large the ranks tend to be in one group versus the other.

\[ H_0:\ \text{the two population distributions are the same} \qquad H_A:\ \text{the distributions differ (two-sided) or one tends to be larger (one-sided)} \]

Core statistics: rank sum \(R_1\) (or \(W\)) and Mann–Whitney \(U\)

Step 1: Pool and rank (tie rule)

Combine both samples and assign ranks 1 through \(N\). If ties occur, assign each tied value the average of the ranks it would have occupied.
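A minimal pure-Python sketch of the average-rank (midrank) rule follows. The function name `average_ranks` is hypothetical. In practice, `scipy.stats.rankdata` (whose default method is `'average'`) does the same job.

```python
def average_ranks(values):
    """Assign ranks 1..N; tied values share the mean of the positions they occupy."""
    # Sort indices by value so that equal values end up adjacent.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of values tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Sorted positions pos..end correspond to ranks pos+1..end+1; use their mean.
        avg = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```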

Step 2: Compute rank sums

Let \(R_1\) be the sum of ranks for Group 1 (often called \(W\), the Wilcoxon rank-sum statistic). Similarly define \(R_2\).

\[ R_1=\sum_{i \in \text{Group 1}} \text{rank}(x_i), \qquad R_2=\sum_{j \in \text{Group 2}} \text{rank}(y_j) \]
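Step 2 can be sketched as follows, assuming the pooled ranks are already assigned and each observation carries a group label. The helper `rank_sums` and its labels are hypothetical. The sketch also checks that \(R_1 + R_2 = N(N+1)/2\). This identity holds because every position 1 to \(N\) is used once, and average ranks preserve that total.

```python
def rank_sums(ranks, groups, label1, label2):
    """Sum the pooled ranks belonging to each of two labeled groups."""
    r1 = sum(r for r, g in zip(ranks, groups) if g == label1)
    r2 = sum(r for r, g in zip(ranks, groups) if g == label2)
    n = len(ranks)
    # Sanity check: all ranks 1..N are accounted for exactly once.
    if r1 + r2 != n * (n + 1) / 2:
        raise ValueError("rank sums do not add up to N(N+1)/2")
    return r1, r2


# Ranks 1..10 with group labels read from the worked-example table.
print(rank_sums(list(range(1, 11)), "BBABABAABA", "A", "B"))  # (33, 22)
```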

Step 3: Convert to \(U\)

The Mann–Whitney statistics are:

\[ U_1 = R_1 - \frac{n_1(n_1+1)}{2}, \qquad U_2 = R_2 - \frac{n_2(n_2+1)}{2} \] \[ U_1 + U_2 = n_1 n_2 \]

The smaller of \(U_1\) and \(U_2\), written \(U_{\min}\), is often used for a two-sided test. Under \(H_0\), both statistics are centered at \(n_1 n_2/2\), so a small \(U_{\min}\) means the ranks are allocated unevenly between the groups.
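A short sketch of Step 3 converts rank sums to \(U_1\), \(U_2\) and \(U_{\min}\). It uses the identity \(U_1 + U_2 = n_1 n_2\) as a consistency check. The function name `u_statistics` is hypothetical.

```python
def u_statistics(r1, r2, n1, n2):
    """Convert rank sums into Mann–Whitney U statistics and U_min."""
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = r2 - n2 * (n2 + 1) / 2
    # U1 and U2 together account for all n1*n2 cross-group pairs.
    if u1 + u2 != n1 * n2:
        raise ValueError("rank sums are inconsistent with the sample sizes")
    return u1, u2, min(u1, u2)


print(u_statistics(33, 22, 5, 5))  # (18.0, 7.0, 7.0)
```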

Worked example (with full ranking)

Consider two independent samples, Group A (treated as Group 1) and Group B (Group 2), each of size 5:

Observation	Group	Pooled order	Rank
8	B	1st	1
9	B	2nd	2
10	A	3rd	3
11	B	4th	4
12	A	5th	5
13	B	6th	6
14	A	7th	7
15	A	8th	8
16	B	9th	9
18	A	10th	10

\[ n_1=n_2=5,\quad N=10 \] \[ R_A = 3+5+7+8+10 = 33,\qquad R_B = 1+2+4+6+9 = 22 \] \[ U_A = R_A - \frac{n_1(n_1+1)}{2} = 33 - \frac{5\cdot 6}{2} = 33 - 15 = 18 \] \[ U_B = R_B - \frac{n_2(n_2+1)}{2} = 22 - 15 = 7 \] \[ U_{\min}=\min(18,7)=7 \]
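The worked example can be reproduced from the raw observations. This sketch relies on the example having no ties, so a sorted position plus one is the rank. Tied data would need average ranks instead.

```python
group_a = [10, 12, 14, 15, 18]
group_b = [8, 9, 11, 13, 16]

# Pool with group labels and sort by value; with no ties, rank = position + 1.
pooled = sorted([(v, "A") for v in group_a] + [(v, "B") for v in group_b])
r_a = sum(i + 1 for i, (_, g) in enumerate(pooled) if g == "A")
r_b = sum(i + 1 for i, (_, g) in enumerate(pooled) if g == "B")

n1, n2 = len(group_a), len(group_b)
u_a = r_a - n1 * (n1 + 1) // 2
u_b = r_b - n2 * (n2 + 1) // 2
print(r_a, r_b, u_a, u_b, min(u_a, u_b))  # 33 22 18 7 7
```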

From \(U\) to a p-value (normal approximation)

For moderate to large samples, \(U\) is commonly standardized to a \(z\)-score under \(H_0\). The mean and (no-ties) standard deviation are:

\[ \mu_U = \frac{n_1 n_2}{2}, \qquad \sigma_U = \sqrt{\frac{n_1 n_2 (N+1)}{12}} \]

With ties, the variance is reduced. If tie groups have sizes \(t_1,t_2,\dots\), a common correction is:

\[ \sigma_U^2 = \frac{n_1 n_2}{12} \left[ (N+1) - \frac{\sum_k (t_k^3 - t_k)}{N(N-1)} \right] \]
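The null mean and tie-corrected standard deviation can be computed as shown below. When there are no ties, every group has \(t_k = 1\), so the correction term vanishes and the no-ties formula is recovered. The function name `u_null_moments` is hypothetical.

```python
import math
from collections import Counter


def u_null_moments(pooled_values, n1, n2):
    """Mean and tie-corrected standard deviation of U under H0."""
    n = n1 + n2
    if len(pooled_values) != n:
        raise ValueError("pooled sample size must equal n1 + n2")
    mu = n1 * n2 / 2
    # Each group of t tied values contributes t^3 - t; untied values contribute 0.
    tie_term = sum(t**3 - t for t in Counter(pooled_values).values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    return mu, math.sqrt(var)


example = [10, 12, 14, 15, 18, 8, 9, 11, 13, 16]  # worked example, no ties
print(u_null_moments(example, 5, 5))  # mean 12.5, sd about 4.787
```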

Applying the no-ties approximation to the example (and using a continuity correction because \(U_{\min}\) is below \(\mu_U\)):

\[ \mu_U=\frac{5\cdot 5}{2}=12.5, \qquad \sigma_U=\sqrt{\frac{5\cdot 5\cdot 11}{12}}=\sqrt{\frac{275}{12}} \approx 4.787 \] \[ z \approx \frac{U_{\min}-\mu_U+0.5}{\sigma_U} = \frac{7-12.5+0.5}{4.787} = \frac{-5}{4.787} \approx -1.044 \]
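The standardization with a continuity correction moves \(U\) half a unit toward \(\mu_U\) before dividing by \(\sigma_U\). In the sketch below, `z_with_continuity` is a hypothetical name. It defaults to the no-ties \(\sigma_U\), and a tie-corrected value can be passed in instead.

```python
import math


def z_with_continuity(u, n1, n2, sigma=None):
    """Standardize U with a 0.5 continuity correction toward the null mean."""
    mu = n1 * n2 / 2
    if sigma is None:
        # No-ties standard deviation; pass a tie-corrected sigma when ties exist.
        sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    if u < mu:
        return (u - mu + 0.5) / sigma
    if u > mu:
        return (u - mu - 0.5) / sigma
    return 0.0


print(round(z_with_continuity(7, 5, 5), 3))  # -1.044
```

Using \(U_A = 18\) instead of \(U_{\min} = 7\) gives the same \(|z|\) with the opposite sign, so the two-sided result does not depend on which \(U\) is standardized.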

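A two-sided p-value is obtained as \(p = 2\cdot P(Z \le -|z|)\) and compared with the chosen significance level \(\alpha\) (commonly 0.05). For the example, \(z \approx -1.044\) gives \(p \approx 0.30\), so at \(\alpha = 0.05\) there is insufficient evidence that the two groups differ.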
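The two-sided normal p-value needs only the standard library. It uses the identity \(2\,\Phi(-|z|) = \operatorname{erfc}(|z|/\sqrt{2})\). The helper name `two_sided_p` is hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided normal p-value: 2 * P(Z <= -|z|) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


z = -5 / math.sqrt(275 / 12)  # z-score from the worked example
print(round(two_sided_p(z), 3))  # about 0.296
```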

Interpretation of the decision

If \(p \le \alpha\): evidence that the two independent populations differ in location/distribution (often described as one group tending to have larger values).
If \(p > \alpha\): insufficient evidence to claim a difference; this does not prove the distributions are identical.
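For small samples, the decision can rest on an exact p-value rather than the normal approximation. The sketch below computes one by brute-force enumeration. It assigns every possible set of \(n_1\) pooled ranks to Group 1 and counts how often \(U_1\) is at least as extreme as observed. Enumerating over the observed (average) ranks also handles ties, because the enumeration conditions on them. The two-sided value doubles the smaller tail, which is one common convention. The function name is hypothetical, and the cost grows as \(\binom{N}{n_1}\), so this only suits small \(N\).

```python
from itertools import combinations


def exact_two_sided_p(ranks, n1, u1_observed):
    """Exact two-sided p-value by enumerating every split of the pooled ranks."""
    n = len(ranks)
    count_le = count_ge = total = 0
    for idx in combinations(range(n), n1):
        # U1 if these rank positions belonged to group 1.
        u1 = sum(ranks[i] for i in idx) - n1 * (n1 + 1) / 2
        total += 1
        count_le += u1 <= u1_observed
        count_ge += u1 >= u1_observed
    return min(1.0, 2 * min(count_le, count_ge) / total)


ranks = list(range(1, 11))  # pooled ranks from the worked example (no ties)
print(exact_two_sided_p(ranks, 5, 18))
```

For the worked example this gives \(78/252 \approx 0.31\), close to the normal-approximation value and leading to the same decision at \(\alpha = 0.05\).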

Effect size (recommended alongside the p-value)

Common-language effect size

A useful probability interpretation is:

\[ \hat{A}=\frac{U_1}{n_1 n_2} \]

\(\hat{A}\) estimates the probability that a randomly chosen observation from Group 1 exceeds a randomly chosen observation from Group 2. Ties between groups count as one half, which is the convention implied by average ranks.
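The probability interpretation can be checked directly by comparing every cross-group pair. That count equals \(U_1\), so dividing by \(n_1 n_2\) gives \(\hat{A}\). The function name `prob_superiority` is hypothetical. Direct pair counting costs \(O(n_1 n_2)\), which is fine for illustration.

```python
def prob_superiority(x, y):
    """Fraction of (x, y) pairs with x > y, counting ties as one half."""
    wins = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    return wins / (len(x) * len(y))


group_a = [10, 12, 14, 15, 18]
group_b = [8, 9, 11, 13, 16]
print(prob_superiority(group_a, group_b))  # 0.72
```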

Rank-biserial correlation

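\[ r_{rb}=\frac{U_1-U_2}{n_1 n_2}=2\hat{A}-1, \qquad |r_{rb}|=1-\frac{2U_{\min}}{n_1 n_2} \]

The sign gives the direction (positive when Group 1 tends to have larger values). The magnitude does not depend on which group is labeled Group 1.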

For the example:

\[ \hat{A}=\frac{18}{25}=0.72, \qquad r_{rb}=\frac{18-7}{25}=\frac{11}{25}=0.44 \]
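The signed rank-biserial correlation can be computed from \(U_1\) alone, because \(U_2 = n_1 n_2 - U_1\). Swapping the group labels flips the sign but leaves \(|r_{rb}| = 1 - 2U_{\min}/(n_1 n_2)\) unchanged. The function name `rank_biserial` is hypothetical.

```python
def rank_biserial(u1, n1, n2):
    """Signed rank-biserial correlation; positive when group 1 tends to be larger."""
    u2 = n1 * n2 - u1
    return (u1 - u2) / (n1 * n2)


print(rank_biserial(18, 5, 5))  # 0.44  (Group A as Group 1)
print(rank_biserial(7, 5, 5))   # -0.44 (same data, Group B as Group 1)
```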

Visualization: pooled order with group membership

Reading the pooled ranks 1 to 10 in order gives the group sequence B B A B A B A A B A. Group A occupies more of the higher ranks (7, 8, 10), which matches \(U_A=18\) and \(U_B=7\): Group A tends to have larger values than Group B in this sample.
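A rough text version of this kind of plot can be sketched in Python. It prints one row per group with a marker at each occupied rank position. The ranks are copied from the worked-example table, and the helper name `rank_row` is hypothetical.

```python
# Ranks copied from the worked-example table.
ranks_a = [3, 5, 7, 8, 10]
ranks_b = [1, 2, 4, 6, 9]
n_total = 10


def rank_row(ranks, mark):
    """One text row: `mark` at each occupied rank position, '.' elsewhere."""
    return "".join(f"{mark:>3}" if r in ranks else f"{'.':>3}" for r in range(1, n_total + 1))


print(rank_row(ranks_a, "A"))
print(rank_row(ranks_b, "B"))
print("".join(f"{r:>3}" for r in range(1, n_total + 1)))  # rank axis
```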

Common pitfalls and reporting checklist

Independence: paired data require the Wilcoxon signed-rank test, not the rank-sum/Mann–Whitney U.
Ties: use average ranks and apply a tie correction when using a normal approximation.
Interpretation: the test detects distributional differences; “median difference” is most defensible under a shift model with similarly shaped distributions.
Report: \(n_1,n_2\), the statistic (\(U\) or \(W\)), p-value (exact or approximate), and an effect size (such as \(\hat{A}\) or \(r_{rb}\)).
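The checklist items can be gathered in one compact sketch. It computes \(U_1\) by pair counting, which is equivalent to the rank-sum formula with average ranks. It then applies the tie-corrected normal approximation with a continuity correction and returns the quantities worth reporting. The function name `mann_whitney_report` is hypothetical. For real analyses, SciPy's `scipy.stats.mannwhitneyu` is an established alternative, and its documentation describes its own exact versus asymptotic choices.

```python
import math
from collections import Counter


def mann_whitney_report(x, y):
    """Normal-approximation Mann–Whitney summary with tie and continuity corrections."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # U1 counts pairs with x > y; cross-group ties count one half.
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u2 = n1 * n2 - u1
    mu = n1 * n2 / 2
    tie_term = sum(t**3 - t for t in Counter(list(x) + list(y)).values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        raise ValueError("all observations are tied; the test is undefined")
    # Continuity correction: move U1 half a unit toward mu.
    diff = u1 - mu
    shifted = diff - 0.5 if diff > 0 else diff + 0.5 if diff < 0 else 0.0
    z = shifted / sigma
    return {
        "n1": n1, "n2": n2, "U1": u1, "U2": u2, "z": z,
        "p_two_sided": math.erfc(abs(z) / math.sqrt(2)),
        "A_hat": u1 / (n1 * n2), "r_rb": (u1 - u2) / (n1 * n2),
    }


report = mann_whitney_report([10, 12, 14, 15, 18], [8, 9, 11, 13, 16])
print(report)
```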
