Spearman Rho Rank Correlation Coefficient Test
Spearman’s rank correlation coefficient, denoted \(\rho_s\), measures the strength and direction of a
monotonic association between two variables. It is a nonparametric alternative to Pearson’s correlation,
because it is based on ranks rather than the original values. This makes it more robust to outliers and useful
when the relationship is monotone but not necessarily linear.
Data and ranking
We observe paired data \((X_i, Y_i)\) for \(i=1,2,\dots,n\). Convert each variable to ranks:
\[
\begin{aligned}
r_{X,i} &= \operatorname{rank}(X_i), \\
r_{Y,i} &= \operatorname{rank}(Y_i).
\end{aligned}
\]
If ties occur (equal values), the standard approach is to give every member of a tie group the average of the
ranks that group would otherwise occupy. Spearman’s method accommodates ties; however, the numeric value of
\(\rho_s\) can change slightly depending on the tie-handling convention.
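As a small illustration of average ranking, the sketch below assigns 1-based ranks and gives tied values the mean of the positions they span. The helper name `average_ranks` is an assumption made for this example, not part of any particular library.

```python
from bisect import bisect_left, bisect_right

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    s = sorted(values)
    # bisect_left counts values strictly smaller than v, bisect_right counts
    # values <= v, so the tie group occupies ranks left+1 .. right.
    return [(bisect_left(s, v) + bisect_right(s, v) + 1) / 2 for v in values]

print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```

The two tied values would have taken ranks 2 and 3, so each receives 2.5.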
Definition of \(\rho_s\)
The Spearman coefficient is the Pearson correlation applied to the rank variables:
\[
\begin{aligned}
\rho_s
&= \operatorname{corr}(r_X, r_Y) \\
&= \frac{\sum_{i=1}^{n}\left(r_{X,i}-\bar r_X\right)\left(r_{Y,i}-\bar r_Y\right)}
{\sqrt{\sum_{i=1}^{n}\left(r_{X,i}-\bar r_X\right)^2}\;\sqrt{\sum_{i=1}^{n}\left(r_{Y,i}-\bar r_Y\right)^2}}.
\end{aligned}
\]
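The definition can be sketched directly in code: rank each variable, then apply the Pearson formula to the ranks. The function names below are illustrative assumptions, and the data are made up to show a monotone but nonlinear relationship.

```python
from bisect import bisect_left, bisect_right
from math import sqrt

def average_ranks(values):
    """1-based ranks with average ranks for ties."""
    s = sorted(values)
    return [(bisect_left(s, v) + bisect_right(s, v) + 1) / 2 for v in values]

def pearson(a, b):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)

def spearman_rho(x, y):
    """Spearman coefficient as the Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]    # y = x**3: monotone increasing, not linear
print(spearman_rho(x, y))  # 1.0, because the two rank vectors are identical
```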
Classic teaching formula (no ties)
When there are no ties, a commonly taught equivalent formula uses rank differences:
\[
\begin{aligned}
d_i &= r_{X,i} - r_{Y,i}, \\
\rho_s &= 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)}.
\end{aligned}
\]
If ties are present, this expression generally no longer equals the correlation-of-ranks definition. For tied data,
the recommended computation is always \(\rho_s = \operatorname{corr}(r_X, r_Y)\).
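The agreement without ties, and the discrepancy with ties, can be checked numerically. The following is a minimal sketch with made-up data; the helper names are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right
from math import sqrt

def average_ranks(values):
    """1-based ranks with average ranks for ties."""
    s = sorted(values)
    return [(bisect_left(s, v) + bisect_right(s, v) + 1) / 2 for v in values]

def rho_from_ranks(rx, ry):
    """Correlation-of-ranks definition."""
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)

def rho_from_differences(rx, ry):
    """Rank-difference formula: 1 - 6 * sum(d_i^2) / (n (n^2 - 1))."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

def compare(x, y):
    rx, ry = average_ranks(x), average_ranks(y)
    return rho_from_ranks(rx, ry), rho_from_differences(rx, ry)

print(compare([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # no ties: both are about 0.8
print(compare([1, 2, 3, 4], [1, 1, 2, 3]))        # tie in y: the two values differ
```

In the tied example the rank-difference formula gives 0.95, while the correlation of ranks gives \(\sqrt{0.9} \approx 0.9487\).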
Hypothesis test
The Spearman rank correlation test checks whether there is evidence of a monotonic association in the population.
Typical hypotheses are:
\[
\begin{aligned}
H_0 &: \rho_s = 0, \\
H_1 &: \rho_s \ne 0 \quad \text{(two-sided)}
\end{aligned}
\]
Directional alternatives are also possible:
\[
\begin{aligned}
H_1 &: \rho_s > 0 \quad \text{(positive association)}, \\
H_1 &: \rho_s < 0 \quad \text{(negative association)}.
\end{aligned}
\]
t-approximation p-value
A common large-sample approach uses a t-approximation: under \(H_0\), the statistic below approximately follows a
t distribution with \(\text{df}=n-2\) degrees of freedom:
\[
\begin{aligned}
t &= \rho_s\sqrt{\frac{n-2}{1-\rho_s^2}}, \\
\text{df} &= n-2.
\end{aligned}
\]
The p-value depends on the alternative:
\[
\begin{aligned}
p\text{-value}
&=
\begin{cases}
2\left(1 - F_t(|t|;\text{df})\right), & \text{two-sided}, \\
1 - F_t(t;\text{df}), & H_1: \rho_s > 0, \\
F_t(t;\text{df}), & H_1: \rho_s < 0,
\end{cases}
\end{aligned}
\]
where \(F_t(\cdot;\text{df})\) is the CDF of the t distribution.
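A minimal sketch of the t-approximation follows. To stay dependency-free it evaluates the t CDF by Simpson's rule on the density, which is an assumption made for this example; in practice a statistics library's t distribution would normally be used instead. The names `t_cdf`, `spearman_t_pvalue`, and the `alternative` labels are illustrative.

```python
from math import exp, lgamma, log, pi, sqrt

def t_cdf(t, df, steps=2000):
    """Student t CDF via Simpson's rule on the density (steps must be even)."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)

    def pdf(x):
        return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

    h = abs(t) / steps
    total = pdf(0.0) + pdf(abs(t))
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    area = total * h / 3  # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def spearman_t_pvalue(rho, n, alternative="two-sided"):
    """Return (t, p-value) for the t-approximation; requires n > 2 and |rho| < 1."""
    df = n - 2
    t = rho * sqrt(df / (1 - rho * rho))
    if alternative == "two-sided":
        p = 2 * (1 - t_cdf(abs(t), df))
    elif alternative == "greater":   # H_1: rho_s > 0
        p = 1 - t_cdf(t, df)
    elif alternative == "less":      # H_1: rho_s < 0
        p = t_cdf(t, df)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return t, p

t_stat, p_value = spearman_t_pvalue(0.8, n=5)
print(t_stat, p_value)
```

Note that the statistic is undefined when \(|\rho_s| = 1\), since the denominator \(1-\rho_s^2\) is zero.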
Permutation p-value (recommended for small \(n\))
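For small samples, a permutation approach is often more reliable. Under \(H_0\) (no association), every pairing of
the X values with the Y values is equally likely, so shuffling Y generates the null distribution of \(\rho_s\).
A permutation procedure: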
- Compute the observed \(\rho_s\) from the ranks.
- Randomly permute the Y values (or their ranks) and recompute \(\rho_s\).
- Repeat \(B\) times to obtain a reference distribution \(\rho_s^{(1)},\dots,\rho_s^{(B)}\).
- Estimate the p-value as the fraction of permuted correlations that are at least as extreme as the observed value.
\[
\begin{aligned}
p\text{-value}
&\approx \frac{1 + \#\{b : \rho_s^{(b)} \text{ at least as extreme as observed}\}}{1 + B}.
\end{aligned}
\]
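For a two-sided test, “as extreme” means \(\left|\rho_s^{(b)}\right| \ge \left|\rho_s\right|\). For the directional
alternatives it means \(\rho_s^{(b)} \ge \rho_s\) when \(H_1: \rho_s > 0\) and \(\rho_s^{(b)} \le \rho_s\) when
\(H_1: \rho_s < 0\).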
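The permutation procedure can be sketched as follows for the two-sided case. This is a Monte Carlo version under stated assumptions: the function name, the default of 9999 permutations, the fixed seed, and the small floating-point tolerance are all choices made for this example.

```python
import random
from bisect import bisect_left, bisect_right
from math import sqrt

def average_ranks(values):
    """1-based ranks with average ranks for ties."""
    s = sorted(values)
    return [(bisect_left(s, v) + bisect_right(s, v) + 1) / 2 for v in values]

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / sqrt(va * vb)

def permutation_pvalue(x, y, n_perm=9999, seed=0):
    """Two-sided Monte Carlo permutation p-value for Spearman's rho."""
    rng = random.Random(seed)
    rx, ry = average_ranks(x), average_ranks(y)
    observed = pearson(rx, ry)
    shuffled = ry[:]          # permuting the Y ranks is equivalent to permuting Y
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Tolerance guards against floating-point noise in the comparison.
        if abs(pearson(rx, shuffled)) >= abs(observed) - 1e-12:
            count += 1
    return observed, (1 + count) / (1 + n_perm)

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [v ** 3 for v in x]       # perfectly monotone, so rho = 1
rho, p = permutation_pvalue(x, y)
print(rho, p)
```

Because the ranks do not change under permutation, they are computed once and only the Y ranks are shuffled. The estimated p-value can never fall below \(1/(1+B)\), which is why \(B\) should be large relative to the desired resolution.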
Optional bootstrap confidence interval
A bootstrap CI can be constructed by repeatedly resampling the \(n\) pairs \((X_i,Y_i)\) with replacement and
recomputing \(\rho_s\) each time. A percentile CI at confidence level \(1-\alpha\) is:
\[
\begin{aligned}
\left[\rho_s^{(\alpha/2)},\;\rho_s^{(1-\alpha/2)}\right],
\end{aligned}
\]
where \(\rho_s^{(q)}\) denotes the \(q\)-th quantile of the bootstrap distribution.
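A percentile bootstrap interval might be sketched as below. The function name, the number of resamples, the fixed seed, and the simple sorted-index quantile rule are assumptions for illustration, and the data are made up. Resamples in which either variable is constant are skipped because \(\rho_s\) is undefined there.

```python
import random
from bisect import bisect_left, bisect_right
from math import sqrt

def average_ranks(values):
    """1-based ranks with average ranks for ties."""
    s = sorted(values)
    return [(bisect_left(s, v) + bisect_right(s, v) + 1) / 2 for v in values]

def spearman_rho(x, y):
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)

def bootstrap_ci(x, y, alpha=0.05, n_boot=2000, seed=0):
    """Percentile bootstrap CI for rho; assumes x and y are not constant."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample pairs with replacement
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # rho is undefined when a resampled variable is constant
        stats.append(spearman_rho(xs, ys))  # ranks are recomputed on each resample
    stats.sort()
    lower = stats[int((alpha / 2) * (n_boot - 1))]
    upper = stats[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11]
print(bootstrap_ci(x, y))
```

Resampling pairs with replacement introduces ties even when the original data have none, so the correlation-of-ranks definition with average ranks is used rather than the rank-difference formula.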
Decision
Using significance level \(\alpha\):
\[
\begin{aligned}
\text{Reject } H_0 &\text{ if } p\text{-value} \le \alpha, \\
\text{otherwise } &\text{fail to reject } H_0.
\end{aligned}
\]
Interpretation: A significant result suggests evidence of a monotonic relationship between X and Y, but it does not
imply causation and does not describe the exact functional form (only monotone association).