Population and Sampling Distributions
This topic connects two ideas: (1) a probability model for the entire population (population distribution),
and (2) the probability model for a statistic computed from repeated samples (sampling distribution).
Population distribution
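A population distribution is the probability distribution of a variable over every member of the population: it lists each possible value and the probability of observing that value when one member is selected at random.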
When the population is finite and fully known, probabilities are often built from relative frequencies.
If a value x appears f times in a population of size N, then:
\[
P(x) = \frac{f}{N}, \qquad \sum P(x) = 1
\]
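A minimal Python sketch of the relative-frequency rule, assuming the population is given as a plain list of values. The function name `population_distribution` and the population `[1, 2, 2, 3]` are hypothetical; `Fraction` keeps the probabilities exact so the total is exactly 1.

```python
from collections import Counter
from fractions import Fraction

def population_distribution(values):
    """Return {x: P(x)} where P(x) = f / N (relative frequency of x)."""
    N = len(values)
    counts = Counter(values)  # f = number of times each value x appears
    return {x: Fraction(f, N) for x, f in sorted(counts.items())}

population = [1, 2, 2, 3]  # hypothetical population of size N = 4
dist = population_distribution(population)
print(dist)                # {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 4)}
print(sum(dist.values()))  # 1
```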
The population mean, variance, and standard deviation are computed directly from the distribution:
\[
\mu = \sum x \cdot P(x), \qquad
\sigma^{2} = \sum (x-\mu)^{2} \cdot P(x), \qquad
\sigma = \sqrt{\sigma^{2}}
\]
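The three formulas translate line for line into code. This sketch assumes the distribution is stored as a `{x: P(x)}` dictionary (the hypothetical helper name `population_moments` is illustrative only) and reuses the example population `[1, 2, 2, 3]`, for which μ = 2 and σ² = 1/2.

```python
import math
from fractions import Fraction

def population_moments(dist):
    """Given {x: P(x)}, return (mu, sigma^2, sigma)."""
    mu = sum(x * p for x, p in dist.items())                # mu = sum x * P(x)
    var = sum((x - mu) ** 2 * p for x, p in dist.items())  # sigma^2 = sum (x - mu)^2 * P(x)
    return mu, var, math.sqrt(var)                          # sigma = sqrt(sigma^2)

# P(x) for the hypothetical population [1, 2, 2, 3]
dist = {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 4)}
mu, var, sigma = population_moments(dist)
print(mu, var, round(sigma, 4))  # 2 1/2 0.7071
```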
Sampling distribution of the sample mean x̄
A sampling distribution describes how a statistic varies across repeated samples of the same size drawn
from the same population. For this topic, the statistic is the sample mean:
\[
\bar{x} = \frac{x_{1}+x_{2}+\cdots+x_{n}}{n}
\]
The sampling distribution of x̄ lists the possible values that x̄ can take and the probability of each value.
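In a finite population, you can build it by:
- Listing all possible samples of size n (exact, but feasible only when the number of samples is manageable),
- Computing x̄ for each sample,
- Converting the frequency of each x̄ value into a probability (every sample is equally likely, so P(x̄) = frequency of x̄ / number of samples).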
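The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the calculator's own implementation: the function name `exact_sampling_distribution` and the population `[2, 4, 6, 8]` are hypothetical, and the data are assumed to be exact numbers (integers here). `itertools.combinations` works by position, so repeated values in the population are still treated as separate members.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

def exact_sampling_distribution(population, n, replace=False):
    """Enumerate every equally likely sample of size n and return {x̄: P(x̄)}."""
    if replace:
        samples = list(product(population, repeat=n))  # N**n ordered samples
    else:
        samples = list(combinations(population, n))    # C(N, n) unordered samples
    means = [Fraction(sum(s)) / n for s in samples]    # step 2: x̄ for each sample
    counts = Counter(means)                            # frequency of each x̄ value
    total = len(samples)
    return {xbar: Fraction(c, total) for xbar, c in sorted(counts.items())}  # step 3

population = [2, 4, 6, 8]  # hypothetical population, N = 4
dist = exact_sampling_distribution(population, n=2)
for xbar, p in dist.items():
    print(xbar, p)
# 3 1/6
# 4 1/6
# 5 1/3
# 6 1/6
# 7 1/6
```

The value 5 gets probability 1/3 because two of the six samples, {2, 8} and {4, 6}, share that mean.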
The number of possible samples depends on the sampling method (with replacement, samples are counted as ordered sequences):
\[
\text{Without replacement: } \binom{N}{n}, \qquad
\text{With replacement: } N^{n}
\]
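Both counts are one-liners with the standard library (`math.comb` requires Python 3.8 or later). The helper name `sample_counts` and the (N, n) pairs are hypothetical; the second pair shows how quickly full enumeration becomes impractical.

```python
import math

def sample_counts(N, n):
    """Return (number of samples without replacement, with replacement)."""
    return math.comb(N, n), N ** n  # C(N, n) and N^n

for N, n in [(4, 2), (50, 5)]:  # hypothetical population and sample sizes
    without, with_ = sample_counts(N, n)
    print(f"N={N}, n={n}: without={without:,}, with={with_:,}")
# N=4, n=2: without=6, with=16
# N=50, n=5: without=2,118,760, with=312,500,000
```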
Key results for x̄ (mean and standard error)
The sampling distribution has its own mean and spread. Two results are especially important:
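- Mean of x̄ matches the population mean:
\[
E(\bar{x}) = \mu
\]
- Standard error of x̄ measures the typical sample-to-sample variation in x̄:
\[
SE(\bar{x}) = \frac{\sigma}{\sqrt{n}} \quad \text{(with replacement)}
\]
\[
SE(\bar{x}) = \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N-1}}
\quad \text{(without replacement)}
\]
Here σ is the population standard deviation defined above (computed with divisor N), and the factor \(\sqrt{(N-n)/(N-1)}\) is the finite population correction.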
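Both results can be checked exactly on a small example by enumerating every sample and comparing against the formulas. The sketch below compares squared standard errors so the check stays in exact `Fraction` arithmetic; the helper name `mean_and_var_of_xbar` and the population `[2, 4, 6, 8]` (μ = 5, σ² = 5) are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

def mean_and_var_of_xbar(population, n, replace):
    """Exact E(x̄) and Var(x̄) over all equally likely samples of size n."""
    samples = list(product(population, repeat=n)) if replace else list(combinations(population, n))
    means = [Fraction(sum(s)) / n for s in samples]
    e = sum(means) / len(means)
    var = sum((m - e) ** 2 for m in means) / len(means)
    return e, var

population = [2, 4, 6, 8]  # hypothetical population
N, n = len(population), 2
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance, divisor N

for replace in (True, False):
    e, var = mean_and_var_of_xbar(population, n, replace)
    se2 = sigma2 / n                   # (sigma / sqrt(n))^2
    if not replace:
        se2 *= Fraction(N - n, N - 1)  # squared finite population correction
    print(f"replace={replace}: E(xbar)={e}, Var(xbar)={var}, formula SE^2={se2}")
# replace=True: E(xbar)=5, Var(xbar)=5/2, formula SE^2=5/2
# replace=False: E(xbar)=5, Var(xbar)=5/3, formula SE^2=5/3
```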
In practice, when listing every sample is impractical, a simulation (drawing many random samples and computing x̄ for each) approximates the sampling distribution, and the approximation improves as the number of trials grows.
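A minimal simulation sketch using Python's `random` module, assuming the same hypothetical population `[2, 4, 6, 8]` and n = 2. The function name `simulate_xbar`, the trial count, and the seed are illustrative choices. Without replacement, the simulated mean of x̄ should land near μ = 5 and its standard deviation near \(\sqrt{5/3} \approx 1.291\); exact digits vary with the seed.

```python
import random
import statistics

def simulate_xbar(population, n, trials, replace=False, seed=None):
    """Draw `trials` random samples of size n and return the list of sample means."""
    rng = random.Random(seed)  # seeded generator for reproducible runs
    means = []
    for _ in range(trials):
        if replace:
            sample = rng.choices(population, k=n)  # with replacement
        else:
            sample = rng.sample(population, n)     # without replacement
        means.append(sum(sample) / n)
    return means

population = [2, 4, 6, 8]  # hypothetical population (mu = 5, sigma^2 = 5)
means = simulate_xbar(population, n=2, trials=100_000, seed=1)
print(statistics.fmean(means))   # close to mu = 5
print(statistics.pstdev(means))  # close to sqrt(5/3), about 1.291
```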
Probability properties used
- Bounds: \(0 \le P(\cdot) \le 1\)
- Total probability: \(\sum P(\cdot) = 1\)
- Expected value: \(E(X)=\sum x \cdot P(x)\)
- Variance and standard deviation: \(Var(X)=\sum (x-E(X))^{2}\cdot P(x)\), \(SD(X)=\sqrt{Var(X)}\)
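The first two properties give a quick sanity check for any table of probabilities, whether it came from exact enumeration or from a simulation. This sketch assumes a `{value: probability}` dictionary; the function name `is_valid_distribution` and the tolerance are hypothetical choices, with the tolerance absorbing floating-point rounding.

```python
import math
from fractions import Fraction

def is_valid_distribution(dist, tol=1e-9):
    """Check 0 <= P(x) <= 1 for every x and that the probabilities sum to 1."""
    probs = list(dist.values())
    in_bounds = all(0 <= p <= 1 for p in probs)
    # abs_tol absorbs floating-point rounding when probabilities are floats
    sums_to_one = math.isclose(sum(probs), 1, abs_tol=tol)
    return in_bounds and sums_to_one

exact_xbar = {3: Fraction(1, 6), 4: Fraction(1, 6), 5: Fraction(1, 3),
              6: Fraction(1, 6), 7: Fraction(1, 6)}
print(is_valid_distribution(exact_xbar))          # True
print(is_valid_distribution({0: 0.7, 1: 0.4}))    # False: probabilities sum to 1.1
print(is_valid_distribution({0: -0.1, 1: 1.1}))   # False: values outside [0, 1]
```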
Tip: Paste your population into the calculator (or import a CSV file), then compare the exact sampling distribution (when feasible) with a simulated one. The two should agree closely when the number of simulation trials is large.
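The same comparison can be reproduced outside the calculator. This sketch is not the calculator's method: it assumes sampling without replacement, the hypothetical population `[2, 4, 6, 8]`, and uses the largest absolute difference in probability across x̄ values as one simple measure of agreement. The helper names `exact_dist`, `simulated_dist`, and `max_abs_gap` are illustrative. The printed gap should shrink as the number of trials grows.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import combinations

def exact_dist(population, n):
    """Exact P(x̄) by listing all C(N, n) samples drawn without replacement."""
    samples = list(combinations(population, n))
    counts = Counter(Fraction(sum(s)) / n for s in samples)
    return {m: c / len(samples) for m, c in counts.items()}

def simulated_dist(population, n, trials, seed=None):
    """Relative frequency of each x̄ over `trials` random samples without replacement."""
    rng = random.Random(seed)
    counts = Counter(Fraction(sum(rng.sample(population, n))) / n for _ in range(trials))
    return {m: c / trials for m, c in counts.items()}

def max_abs_gap(p, q):
    """Largest absolute difference in probability over all x̄ values seen in either table."""
    keys = set(p) | set(q)
    return max(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)

population = [2, 4, 6, 8]  # hypothetical population
exact = exact_dist(population, 2)
for trials in (100, 10_000, 100_000):
    sim = simulated_dist(population, 2, trials, seed=0)
    print(trials, round(max_abs_gap(exact, sim), 4))
```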