Paired t-test (dependent samples)
A paired t-test compares two measurements taken on the same units (people, devices, matched items),
such as before vs after a treatment. Instead of treating the samples as independent, we reduce the problem to
a single list of differences. This usually increases sensitivity because each unit acts as its own control.
1) Data setup and hypotheses
For each pair \(i=1,\dots,n\), we observe \(\text{Before}_i\) and \(\text{After}_i\). Define:
\[
d_i=\text{After}_i-\text{Before}_i.
\]
The paired t-test is then a one-sample t-test on \(\{d_i\}\):
\[
H_0:\ \mu_d=0
\qquad\text{vs}\qquad
H_1:\ \mu_d\ne 0\ (\text{two-sided})
\]
You can also use a one-sided alternative:
\(H_1:\mu_d>0\) (increases on average) or \(H_1:\mu_d<0\) (decreases on average).
2) Test statistic
Compute the sample mean and sample standard deviation of the differences:
\[
\bar d=\frac{1}{n}\sum_{i=1}^{n} d_i,
\qquad
s_d=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(d_i-\bar d)^2}.
\]
The standard error of \(\bar d\) is:
\[
SE=\frac{s_d}{\sqrt{n}}.
\]
The paired t statistic is:
\[
t=\frac{\bar d}{SE}=\frac{\bar d}{s_d/\sqrt{n}}.
\]
Under \(H_0\), and provided the assumptions in section 5 hold, this statistic follows a Student t distribution with
degrees of freedom:
\[
df=n-1.
\]
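The formulas in this section can be sketched in a few lines of Python using only the standard library. The function name `paired_t_statistic` and the sample numbers are hypothetical, chosen for illustration; this is a minimal sketch rather than a reference implementation.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (d_bar, SE, t, df) for two equally long paired samples."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    n = len(before)
    if n < 2:
        raise ValueError("need at least two pairs to estimate s_d")

    # d_i = After_i - Before_i
    diffs = [a - b for b, a in zip(before, after)]
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample SD, n - 1 in the denominator
    if s_d == 0:
        raise ValueError("all differences are identical, so t is undefined")

    se = s_d / math.sqrt(n)
    return d_bar, se, d_bar / se, n - 1


# Hypothetical measurements, made up for this illustration only.
before = [12.0, 15.0, 11.0, 14.0]
after = [14.0, 18.0, 12.0, 17.0]
d_bar, se, t, df = paired_t_statistic(before, after)
print(f"mean difference={d_bar:.3f}, SE={se:.3f}, t={t:.3f}, df={df}")
```

The guard for \(s_d=0\) is a design choice of this sketch: when every difference is identical the statistic divides by zero, so the function raises an error instead of returning infinity.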
3) p-value interpretation
Let \(F_{t,df}(\cdot)\) be the CDF of the t distribution with \(df=n-1\). Then:
\[
p =
\begin{cases}
2\min\{F_{t,df}(t),\,1-F_{t,df}(t)\}, & \text{two-sided}\\[6pt]
1-F_{t,df}(t), & \text{right-tailed}\\[6pt]
F_{t,df}(t), & \text{left-tailed}
\end{cases}
\]
A small p-value indicates that a mean difference at least as extreme as the observed \(\bar d\) would be unlikely if the true mean difference were zero.
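The three cases above can be sketched in Python without external libraries by integrating the Student t density numerically. The helper names (`t_pdf`, `t_cdf`, `paired_t_p_value`) and the use of Simpson's rule are assumptions of this sketch; in practice a library routine such as `scipy.stats.t.cdf` would normally be used, and the subtraction `1 - cdf` loses precision for very small p-values.

```python
import math


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(t, df, steps=2000):
    """Approximate F_{t,df}(t) with composite Simpson's rule on [0, |t|]."""
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    if t == 0:
        return 0.5
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3  # integral of the density from 0 to |t|
    # The density is symmetric about 0, so F(0) = 1/2.
    return 0.5 + area if t > 0 else 0.5 - area


def paired_t_p_value(t, df, alternative="two-sided"):
    """p-value for the paired t statistic under the chosen alternative."""
    cdf = t_cdf(t, df)
    if alternative == "two-sided":
        return 2 * min(cdf, 1 - cdf)
    if alternative == "right-tailed":
        return 1 - cdf
    if alternative == "left-tailed":
        return cdf
    raise ValueError("alternative must be two-sided, right-tailed or left-tailed")


print(paired_t_p_value(5.0, 1))
```

For \(df=1\) the result can be checked against the closed form \(F_{t,1}(t)=\tfrac12+\arctan(t)/\pi\), since the t distribution with one degree of freedom is the standard Cauchy distribution.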
4) Why pairing helps
Pairing removes between-subject variability. If people differ a lot in baseline measurements, treating the raw
before and after samples as independent leaves that baseline spread in the standard error, where it can swamp a real change.
The differences \(d_i\) cancel each unit's baseline and focus directly on the change for each individual.
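One way to make this precise is through the variance of a difference. The symbols \(\sigma_B^2\), \(\sigma_A^2\) (variances of the Before and After measurements) and \(\rho\) (their within-pair correlation) are introduced here for illustration only:
\[
\operatorname{Var}(d_i)=\sigma_A^2+\sigma_B^2-2\rho\,\sigma_A\sigma_B,
\qquad
\operatorname{Var}(\bar d)=\frac{\sigma_A^2+\sigma_B^2-2\rho\,\sigma_A\sigma_B}{n}.
\]
With independent samples of size \(n\) each, the difference of sample means has variance \((\sigma_A^2+\sigma_B^2)/n\). When \(\rho>0\), which is the usual situation when the same unit is measured twice, the paired variance is smaller, so the same mean change yields a larger t statistic.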
5) Assumptions and checks
- Independence across pairs: the pairs \((\text{Before}_i,\text{After}_i)\) are independent of each other.
- Approximately normal differences: the differences \(d_i\) (not the raw Before and After values) are approximately normally distributed; this matters most for small \(n\).
- No extreme outliers in \(d_i\): strong outliers can dominate \(\bar d\) and inflate \(s_d\).
If differences are strongly non-normal and \(n\) is small, a common alternative is the Wilcoxon signed-rank test.
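As a rough illustration of the signed-rank idea, the sketch below ranks the absolute differences and sums the ranks separately for positive and negative differences. The function name `signed_rank_sums` is hypothetical, and the handling shown (dropping zero differences, averaging ranks for ties) is one common convention, not the only one. Turning the rank sums into a p-value needs the signed-rank reference distribution, which is left to a library routine such as `scipy.stats.wilcoxon`.

```python
def signed_rank_sums(differences):
    """Return (W_plus, W_minus): rank sums of positive and negative differences."""
    # Zero differences carry no sign information and are dropped here.
    nonzero = [d for d in differences if d != 0]
    ordered = sorted(nonzero, key=abs)

    # Assign ranks 1..m by absolute value, averaging ranks within ties.
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical differences, made up for this illustration only.
print(signed_rank_sums([2, 3, -1, 4, 5]))
```

If the two rank sums are very unequal, the differences lean consistently in one direction; the test quantifies how unlikely that imbalance is when the differences are symmetric about zero.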
6) Worked example
Before: \([80,85]\), After: \([82,88]\). Differences:
\[
d=[2,3],\quad
\bar d=\frac{2+3}{2}=2.5.
\]
The sample SD of differences is:
\[
s_d=\sqrt{\frac{(2-2.5)^2+(3-2.5)^2}{2-1}}
=\sqrt{0.5}\approx 0.707.
\]
\[
SE=\frac{0.707}{\sqrt{2}}\approx 0.5,\qquad
t=\frac{2.5}{0.5}=5,\qquad df=1.
\]
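Even with \(t=5\), the two-sided p-value here is about 0.126, so the result is not significant at the 0.05 level: with \(df=1\) the t distribution has very heavy tails. Real studies typically require larger samples.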
Use this tool to compute exact t-based p-values for the paired differences.
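The worked example can be reproduced with a short Python script. The variable names are illustrative. The p-value line relies on the fact that a t distribution with \(df=1\) is the standard Cauchy distribution, whose CDF is \(\tfrac12+\arctan(t)/\pi\); that shortcut applies only to this two-pair example and not to larger samples.

```python
import math
import statistics

before = [80, 85]
after = [82, 88]

# d_i = After_i - Before_i
diffs = [a - b for b, a in zip(before, after)]
n = len(diffs)

d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)  # sample SD with n - 1 in the denominator
se = s_d / math.sqrt(n)
t = d_bar / se
df = n - 1

# Closed-form CDF, valid only because df == 1 (standard Cauchy).
cdf = 0.5 + math.atan(t) / math.pi
p_two_sided = 2 * min(cdf, 1 - cdf)

print(f"d={diffs}, mean={d_bar}, s_d={s_d:.3f}, SE={se:.3f}")
print(f"t={t:.3f}, df={df}, two-sided p={p_two_sided:.3f}")
```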
7) University-level extensions
- Repeated measures ANOVA: extends pairing to 3+ time points or conditions.
- Mixed-effects models: handle repeated measures with missing data and random effects.
- Effect sizes for paired designs: e.g., standardized mean change or \(d_z=\bar d/s_d\).
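The effect size \(d_z=\bar d/s_d\) from the last item can be computed directly from the differences. The function name `paired_effect_size_dz` is hypothetical; the sketch also shows the identity \(t=d_z\sqrt{n}\), which follows from the definition of the paired t statistic.

```python
import math
import statistics


def paired_effect_size_dz(before, after):
    """Return d_z = mean(d) / sd(d) for paired samples."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    s_d = statistics.stdev(diffs)  # requires at least two pairs
    if s_d == 0:
        raise ValueError("all differences are identical, so d_z is undefined")
    return statistics.mean(diffs) / s_d


# Data from the worked example in section 6.
before = [80, 85]
after = [82, 88]
d_z = paired_effect_size_dz(before, after)
t_from_dz = d_z * math.sqrt(len(before))  # equals the paired t statistic
print(f"d_z={d_z:.3f}, t={t_from_dz:.3f}")
```

Unlike t, \(d_z\) does not grow with \(n\), which is why it is reported as a measure of the size of the change and not of the strength of evidence.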