Definition
The sampling distribution of the sample mean is the probability distribution of the statistic \(\bar X\) obtained by taking many repeated samples of the same size \(n\) from a population and computing the mean of each sample. It is a distribution of sample means, not a distribution of individual observations.
Setup and notation
Let a population have mean \(\mu\) and standard deviation \(\sigma\). A simple random sample of size \(n\) is drawn, producing observations \(X_1, X_2, \dots, X_n\). The sample mean is
\[ \bar X=\frac{1}{n}\sum_{i=1}^{n}X_i. \]
Center: mean of the sampling distribution
The sampling distribution of \(\bar X\) is centered at the population mean:
\[ \mathbb{E}(\bar X)=\mu. \]
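Interpretation: over many repeated samples, the average of the sample means approaches \(\mu\). Individual sample means fall above or below \(\mu\), but there is no systematic error in either direction, so \(\bar X\) is an unbiased estimator of \(\mu\).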
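The result follows from linearity of expectation, using only the fact that each \(X_i\) has mean \(\mu\) (independence is not needed for this step):
\[ \mathbb{E}(\bar X)=\mathbb{E}\!\left(\frac{1}{n}\sum_{i=1}^{n}X_i\right)=\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}(X_i)=\frac{1}{n}\cdot n\mu=\mu. \]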
Spread: standard deviation (standard error) of \(\bar X\)
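The variability of sample means is smaller than the variability of individual observations. When the observations are independent (for example, when sampling with replacement, or when sampling without replacement from a population much larger than the sample), the variance and standard deviation of \(\bar X\) are: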
\[ \mathrm{Var}(\bar X)=\frac{\sigma^2}{n},\qquad \mathrm{SD}(\bar X)=\frac{\sigma}{\sqrt{n}}. \]
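A short derivation, assuming the \(X_i\) are independent with common variance \(\sigma^2\):
\[ \mathrm{Var}(\bar X)=\mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n}X_i\right)=\frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i)=\frac{n\sigma^2}{n^2}=\frac{\sigma^2}{n}. \]
Independence is what allows the variance of the sum to be written as a sum of variances; taking the square root gives \(\mathrm{SD}(\bar X)\).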
The quantity \(\sigma/\sqrt{n}\) is the standard error of the sample mean. Increasing the sample size by a factor of \(k\) shrinks the standard error by a factor of \(\sqrt{k}\); for example, quadrupling \(n\) halves it.
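As a small sketch of the formula, the hypothetical helper `standard_error` below computes \(\sigma/\sqrt{n}\). The values \(\sigma=10\), \(n=25\), and \(n=100\) are illustrative choices; they show that quadrupling the sample size halves the standard error.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return sigma / math.sqrt(n)

# Illustrative values (assumed): sigma = 10, with n = 25 and then n = 100.
se_25 = standard_error(10.0, 25)    # 10 / 5
se_100 = standard_error(10.0, 100)  # 10 / 10
print(se_25, se_100)  # 2.0 1.0
```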
Shape: when is the sampling distribution (approximately) normal?
| Population distribution | Sample size \(n\) | Shape of the sampling distribution of \(\bar X\) |
|---|---|---|
| Normal | Any \(n\) | Exactly normal |
| Not normal (skewed or unknown) | Large enough \(n\) | Approximately normal by the Central Limit Theorem |
| Not normal with extreme outliers/heavy tails | Very large \(n\) may be needed | Convergence to normality can be slow; robust methods may be preferable |
A common practical guideline is that \(n \ge 30\) often yields a good normal approximation, though the required \(n\) depends on how non-normal the population is.
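The effect of \(n\) on shape can be explored with a simulation. The sketch below assumes an Exponential(1) population (strongly right-skewed, chosen only for illustration) and measures the skewness of simulated sample means for \(n=2\) and \(n=30\). The helper names, seed, and number of repetitions are all assumptions of this example.

```python
import math
import random

def skewness(values):
    """Sample skewness: average cubed deviation divided by the cubed SD."""
    k = len(values)
    m = sum(values) / k
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / k)
    return sum((v - m) ** 3 for v in values) / (k * sd ** 3)

def simulate_sample_means(n, reps, rng):
    """Means of `reps` samples of size n from an Exponential(1) population."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(0)  # fixed seed so the run is repeatable
skew_by_n = {}
for n in (2, 30):
    means = simulate_sample_means(n, reps=5000, rng=rng)
    skew_by_n[n] = skewness(means)
    # For this particular population, theory gives skewness 2 / sqrt(n) for X-bar.
    print(f"n={n:>2}  simulated skewness={skew_by_n[n]:.2f}  "
          f"theory={2 / math.sqrt(n):.2f}")
```

A skewness near zero is consistent with a symmetric, roughly normal shape. For a more skewed or heavier-tailed population the same \(n\) would leave more residual skewness.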
Visualization: \(\bar X\) becomes more concentrated as \(n\) increases
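A minimal simulation sketch of this effect, assuming a normal population with \(\mu=50\) and \(\sigma=10\) (illustrative values) and 2000 simulated samples per sample size. For each \(n\), it compares the empirical standard deviation of the simulated sample means with the theoretical \(\sigma/\sqrt{n}\).

```python
import math
import random
import statistics

MU, SIGMA = 50.0, 10.0   # illustrative population parameters (assumed)
REPS = 2000              # simulated samples per sample size (assumed)

rng = random.Random(1)   # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of one simulated sample of size n from Normal(MU, SIGMA)."""
    return statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(n)])

results = {}
for n in (5, 25, 100):
    means = [sample_mean(n) for _ in range(REPS)]
    center = statistics.fmean(means)        # should sit near MU
    spread = statistics.stdev(means)        # empirical SD of the sample means
    theory = SIGMA / math.sqrt(n)           # sigma / sqrt(n)
    results[n] = (center, spread, theory)
    print(f"n={n:>3}  center={center:.2f}  spread={spread:.2f}  theory={theory:.2f}")
```

The center stays close to \(\mu\) for every \(n\), while the spread shrinks as \(n\) grows.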
Using the sampling distribution to compute probabilities
When the sampling distribution of \(\bar X\) is normal (exactly or approximately), probabilities about \(\bar X\) are found by standardizing:
\[ Z=\frac{\bar X-\mu}{\sigma/\sqrt{n}}. \]
Then \(P(\bar X \le a)\) becomes \(P\!\left(Z \le \dfrac{a-\mu}{\sigma/\sqrt{n}}\right)\), which can be evaluated using standard normal probabilities.
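The standardization step can be sketched in code using Python's standard-library `statistics.NormalDist`. The function name `prob_mean_at_most` and the values \(\mu=100\), \(\sigma=15\), \(n=36\), \(a=102.5\) are hypothetical, chosen so that \(Z=1\).

```python
import math
from statistics import NormalDist

def prob_mean_at_most(a, mu, sigma, n):
    """P(X-bar <= a), assuming X-bar is (approximately) normal."""
    se = sigma / math.sqrt(n)      # standard error of the sample mean
    z = (a - mu) / se              # standardized value of a
    return NormalDist().cdf(z)     # standard normal CDF at z

# Illustrative values (assumed): se = 15 / 6 = 2.5, so z = (102.5 - 100) / 2.5 = 1.
p = prob_mean_at_most(a=102.5, mu=100.0, sigma=15.0, n=36)
print(round(p, 4))  # 0.8413
```

An upper-tail probability \(P(\bar X > a)\) is one minus this value.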
Numerical example
Suppose \(\mu=50\), \(\sigma=10\), and a sample size \(n=25\) is used. The standard error is
\[ \mathrm{SD}(\bar X)=\frac{10}{\sqrt{25}}=\frac{10}{5}=2. \]
To find \(P(\bar X>54)\), standardize \(54\):
\[ Z=\frac{54-50}{2}=2. \]
\[ P(\bar X>54)=P(Z>2)=1-P(Z\le 2)\approx 1-0.9772=0.0228. \]
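As a rough check on this calculation, the sketch below compares the normal-theory value with a Monte Carlo estimate. It assumes the population itself is normal (the example does not specify a shape), and the seed and number of simulated samples are arbitrary choices.

```python
import random
from statistics import NormalDist

MU, SIGMA, N = 50.0, 10.0, 25   # values from the example
REPS = 20000                    # number of simulated samples (assumed)

rng = random.Random(42)         # fixed seed so the run is repeatable

# Normal-theory answer: X-bar ~ Normal(50, 2), so P(X-bar > 54) = P(Z > 2).
exact = 1.0 - NormalDist(mu=MU, sigma=SIGMA / N ** 0.5).cdf(54.0)

# Monte Carlo estimate: fraction of simulated sample means that exceed 54.
hits = 0
for _ in range(REPS):
    xbar = sum(rng.gauss(MU, SIGMA) for _ in range(N)) / N
    if xbar > 54.0:
        hits += 1
estimate = hits / REPS

print(f"exact={exact:.4f}  simulated={estimate:.4f}")
```

The simulated fraction should land close to the normal-theory value, with some run-to-run variation that depends on the number of simulated samples.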
Common pitfalls
- Mixing up distributions: the population distribution describes individual \(X\); the sampling distribution describes \(\bar X\).
- Forgetting the \(\sqrt{n}\) effect: the spread of \(\bar X\) is \(\sigma/\sqrt{n}\), not \(\sigma/n\).
- Assuming normality too quickly: the Central Limit Theorem justifies a normal approximation only for sufficiently large \(n\), and strong skewness or outliers can require a much larger \(n\).
The sampling distribution of the sample mean provides the theoretical bridge from a population \((\mu,\sigma)\) to probability statements about \(\bar X\), enabling confidence intervals and hypothesis tests for \(\mu\).