Mean, standard deviation, and shape of the sampling distribution of p̂
The sample proportion p̂ is a statistic: it is computed from sample data, so its value changes from one random sample to the next, which makes it a random variable.
The sampling distribution of p̂ describes the values that p̂ can take and the probability of each value.
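To make this concrete for a small sample, the sketch below lists every value p̂ can take together with its probability. It assumes the number of successes x in n trials follows a binomial distribution (independent trials with a constant success probability p). The function name `p_hat_distribution` and the inputs n = 4, p = 0.5 are illustrative choices, not part of the text above.

```python
from math import comb

def p_hat_distribution(n, p):
    """Return {p_hat value: probability}, assuming x ~ Binomial(n, p)."""
    q = 1 - p
    # Each possible count x = 0..n gives the sample proportion x / n.
    return {x / n: comb(n, x) * p**x * q**(n - x) for x in range(n + 1)}

# Illustrative inputs (assumed for this sketch).
dist = p_hat_distribution(n=4, p=0.5)
for value, prob in dist.items():
    print(f"p_hat = {value:.2f}  probability = {prob:.4f}")
```

The probabilities sum to 1, and with p = 0.5 the most likely value of p̂ is 0.5 itself.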
Key definitions
Sample proportion. If x is the number of “successes” in a sample of size n, then
\[
\begin{aligned}
\hat p &= \frac{x}{n}
\end{aligned}
\]
Complement. If the population proportion is p, define
\[
\begin{aligned}
q &= 1 - p
\end{aligned}
\]
Mean and standard deviation
Mean. The sampling distribution of p̂ is centered at the true population proportion:
\[
\begin{aligned}
\mu_{\hat p} &= p
\end{aligned}
\]
Standard deviation (large population / independent trials).
When the observations are independent, or the population is much larger than the sample:
\[
\begin{aligned}
\sigma_{\hat p} &= \sqrt{\frac{p\,q}{n}}
\end{aligned}
\]
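The two formulas above translate directly into a small helper. This is a minimal sketch; the function name `p_hat_mean_sd` and the example inputs p = 0.3, n = 200 are assumptions made for illustration.

```python
from math import sqrt

def p_hat_mean_sd(p, n):
    """Mean and standard deviation of p_hat for independent trials."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    q = 1 - p                      # complement of p
    return p, sqrt(p * q / n)      # (mu_p_hat, sigma_p_hat)

# Illustrative inputs (assumed for this sketch).
mean, sd = p_hat_mean_sd(p=0.3, n=200)
print(f"mean = {mean}, standard deviation = {sd:.4f}")
```

Because n appears in the denominator under the square root, quadrupling the sample size halves the standard deviation.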
Standard deviation (finite population correction).
If sampling is done without replacement from a finite population of size N and the sampling fraction is not small,
the standard deviation is reduced by a correction factor:
\[
\begin{aligned}
\sigma_{\hat p}
&= \sqrt{\frac{p\,q}{n}}\cdot \sqrt{\frac{N-n}{N-1}}
\end{aligned}
\]
Practical guideline: when n/N is at most 0.05, the correction factor is close to 1 and is often omitted.
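The effect of the correction factor can be checked numerically. The sketch below is illustrative only: the function name `sd_p_hat_finite` and the inputs (p = 0.5, n = 400, with N = 2000 and N = 100000) are assumed values chosen to contrast a large and a small sampling fraction.

```python
from math import sqrt

def sd_p_hat_finite(p, n, N):
    """Return (corrected sd, correction factor) for sampling without replacement."""
    if N < 2 or not 0 < n <= N:
        raise ValueError("need N >= 2 and 0 < n <= N")
    fpc = sqrt((N - n) / (N - 1))          # finite population correction factor
    return sqrt(p * (1 - p) / n) * fpc, fpc

# Assumed inputs: same sample size, two different population sizes.
for N in (2000, 100000):
    sd, fpc = sd_p_hat_finite(p=0.5, n=400, N=N)
    print(f"N = {N}: n/N = {400 / N:.3f}, factor = {fpc:.4f}, sd = {sd:.5f}")
```

With n/N = 0.2 the factor is visibly below 1, while with n/N = 0.004 it is nearly 1, which is why the correction is often dropped for small sampling fractions.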
Shape (normal approximation rule for p̂)
For sufficiently large samples, the sampling distribution of p̂ is approximately normal.
A common rule of thumb is to check that the expected numbers of successes and failures both exceed 5 (some texts use a stricter cutoff of 10):
\[
\begin{aligned}
n\,p &> 5 \\
n\,q &> 5
\end{aligned}
\]
If either condition fails, the sampling distribution of p̂ can be noticeably skewed,
especially when p is close to 0 or 1.
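The check is easy to automate. In this sketch the function name `normal_approx_ok` and the `threshold` parameter are assumptions for illustration. The default of 5 follows the rule stated above, and a stricter cutoff can be passed in if preferred.

```python
def normal_approx_ok(p, n, threshold=5):
    """True when both expected counts n*p and n*q exceed the threshold."""
    q = 1 - p
    return n * p > threshold and n * q > threshold

# Illustrative cases (assumed inputs).
print(normal_approx_ok(p=0.56, n=1500))  # expected counts 840 and 660
print(normal_approx_ok(p=0.01, n=200))   # expected successes: only 2
```

The second case fails because p is close to 0, so even a sample of 200 yields too few expected successes.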
Worked example (numbers only)
Step 1. Suppose p = 0.56 and n = 1500. Compute q.
\[
\begin{aligned}
q &= 1 - p \\
&= 1 - 0.56 \\
&= 0.44
\end{aligned}
\]
Step 2. Compute the mean and standard deviation.
\[
\begin{aligned}
\mu_{\hat p} &= p = 0.56 \\
\sigma_{\hat p} &= \sqrt{\frac{p\,q}{n}} \\
&= \sqrt{\frac{0.56\cdot 0.44}{1500}} \\
&\approx 0.0128
\end{aligned}
\]
Step 3. Check the normal approximation rule.
\[
\begin{aligned}
n\,p &= 1500 \cdot 0.56 = 840 \\
n\,q &= 1500 \cdot 0.44 = 660
\end{aligned}
\]
Since both values are greater than 5, the sampling distribution of p̂ is well approximated by a normal curve in this case.
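The worked example can be sanity-checked by simulation. The sketch below repeatedly draws samples of size n = 1500 with p = 0.56 and compares the simulated mean and standard deviation of p̂ with the theoretical values. It assumes independent trials; the function name `simulate_p_hat`, the 2000 repetitions, and the seed are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def simulate_p_hat(p, n, reps, seed=0):
    """Draw `reps` samples of size n and return the sample proportions."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.56, 1500                      # values from the worked example
p_hats = simulate_p_hat(p, n, reps=2000)

theory_sd = sqrt(p * (1 - p) / n)      # about 0.0128
sim_mean = mean(p_hats)
sim_sd = pstdev(p_hats)
# Share of simulated proportions within one standard deviation of p.
within_one_sd = sum(abs(v - p) <= theory_sd for v in p_hats) / len(p_hats)

print(f"simulated mean = {sim_mean:.4f} (theory {p})")
print(f"simulated sd   = {sim_sd:.4f} (theory {theory_sd:.4f})")
print(f"share within one sd of p = {within_one_sd:.3f}")
```

If the normal approximation is reasonable, the simulated mean and standard deviation should land close to 0.56 and 0.0128, and roughly 68% of the simulated proportions should fall within one standard deviation of p. The exact figures depend on the seed and the number of repetitions.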