Determining the Sample Size for the Estimation of a Proportion
When we estimate a population proportion \(p\) with a sample proportion
\(\hat p\), we often want the estimate to be within a chosen margin of error
\(E\) at a given confidence level.
Definitions
- Sample proportion: \(\hat p = \dfrac{x}{n}\), where \(x\) is the number of successes in a sample of size \(n\).
- Complement: \(\hat q = 1 - \hat p\).
- Large-sample guideline for using the normal approximation:
\[
n\hat p \ge 5 \quad \text{and} \quad n\hat q \ge 5.
\]
Standard error of \(\hat p\)
When \(p\) is unknown, we estimate the standard deviation of the sampling distribution of
\(\hat p\) using
\[
s_{\hat p} = \sqrt{\frac{\hat p\hat q}{n}}.
\]
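The definitions and the standard-error formula above can be sketched in a few lines of Python. This is only an illustration: the function names `large_sample_ok` and `standard_error` and the data (56 successes in 200 trials) are assumptions made for the example, not part of the original text.

```python
from math import sqrt


def large_sample_ok(x: int, n: int, threshold: int = 5) -> bool:
    """Check the guideline n*p_hat >= threshold and n*q_hat >= threshold.

    Because n*p_hat = x and n*q_hat = n - x, the counts are compared directly.
    """
    return x >= threshold and (n - x) >= threshold


def standard_error(x: int, n: int) -> float:
    """Estimated standard error sqrt(p_hat * q_hat / n) of the sample proportion."""
    p_hat = x / n
    q_hat = 1 - p_hat
    return sqrt(p_hat * q_hat / n)


# Hypothetical data: 56 successes in a sample of 200.
x, n = 56, 200
print(large_sample_ok(x, n))           # True, since 56 >= 5 and 144 >= 5
print(round(standard_error(x, n), 4))  # 0.0317
```

Comparing the counts \(x\) and \(n - x\) directly avoids floating-point rounding in the products \(n\hat p\) and \(n\hat q\).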
Confidence interval and margin of error
A two-sided \((1-\alpha)100\%\) confidence interval for \(p\) is
\[
\hat p \pm z_{\alpha/2}\, s_{\hat p}
\;=\;
\hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p\hat q}{n}}.
\]
The margin of error \(E\) is the amount added to and subtracted from \(\hat p\):
\[
E = z_{\alpha/2}\sqrt{\frac{\hat p\hat q}{n}}.
\]
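As a rough sketch of the interval above, the following Python computes \(z_{\alpha/2}\) with the standard library's `statistics.NormalDist` and returns the interval endpoints together with \(E\). The function name `proportion_ci` and the sample data are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x: int, n: int, confidence: float = 0.95):
    """Return (lower, upper, margin) for a two-sided interval for p.

    Uses the normal approximation: p_hat +/- z_{alpha/2} * sqrt(p_hat*q_hat/n).
    """
    p_hat = x / n
    q_hat = 1 - p_hat
    # z_{alpha/2} is the (1 - alpha/2) quantile of the standard normal.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin, margin


# Hypothetical data: 56 successes in a sample of 200, 95% confidence.
lower, upper, margin = proportion_ci(56, 200)
print(f"E = {margin:.4f}, interval = ({lower:.4f}, {upper:.4f})")
```

With these assumed inputs the margin of error comes out near 0.062, so the interval runs from roughly 0.218 to 0.342.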
Solving for the required sample size
Rearranging the margin-of-error formula to solve for \(n\) gives
\[
\begin{aligned}
E &= z_{\alpha/2}\sqrt{\frac{\hat p\hat q}{n}} \\
\frac{E}{z_{\alpha/2}} &= \sqrt{\frac{\hat p\hat q}{n}} \\
\left(\frac{E}{z_{\alpha/2}}\right)^2 &= \frac{\hat p\hat q}{n} \\
n &= \frac{z_{\alpha/2}^{2}\,\hat p\,\hat q}{E^{2}}.
\end{aligned}
\]
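Because the sample size must be a whole number, and rounding down would leave the margin of error slightly larger than \(E\), we always round up to the next integer: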
\[
n_{\text{required}} = \left\lceil \frac{z_{\alpha/2}^{2}\,\hat p\,\hat q}{E^{2}} \right\rceil.
\]
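The rounded-up formula translates directly into code. The sketch below is one possible implementation, not a definitive one; the function name `required_sample_size`, its default arguments, and the input checks are assumptions added for the example.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(margin: float, confidence: float = 0.95,
                         p_hat: float = 0.5) -> int:
    """Smallest n with z_{alpha/2} * sqrt(p_hat*q_hat/n) <= margin.

    Implements n = ceil(z^2 * p_hat * q_hat / E^2).
    """
    if not 0 < margin < 1:
        raise ValueError("margin must be between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    if not 0 < p_hat < 1:
        raise ValueError("p_hat must be between 0 and 1")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    q_hat = 1 - p_hat
    return ceil(z ** 2 * p_hat * q_hat / margin ** 2)


print(required_sample_size(0.05))              # 385
print(required_sample_size(0.03))              # 1068
print(required_sample_size(0.03, p_hat=0.28))  # 861
```

The default `p_hat=0.5` is a design choice for the sketch: it gives the largest product \(\hat p\hat q\), so the result is safe when nothing is known about \(p\).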
Choosing \(\hat p\) when it is unknown
- Most conservative approach: if no preliminary estimate is available, use \(\hat p = 0.50\) and \(\hat q = 0.50\). This maximizes the product \(\hat p\hat q\) (at \(0.25\)) and therefore produces the largest required sample size.
- Preliminary (pilot) sample: take an initial sample of size \(n_0\), compute \(\hat p = x/n_0\) from it, then use that value in the sample-size formula.
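As a worked illustration with hypothetical numbers, suppose we want \(95\%\) confidence (\(z_{\alpha/2} \approx 1.96\)) and a margin of error \(E = 0.03\). With the conservative choice \(\hat p = \hat q = 0.50\),
\[
n = \frac{(1.96)^{2}(0.50)(0.50)}{(0.03)^{2}} = \frac{0.9604}{0.0009} \approx 1067.1
\quad\Rightarrow\quad n_{\text{required}} = 1068.
\]
If instead a pilot sample had given \(\hat p = 0.28\) (an assumed value, used here only for illustration), then \(\hat q = 0.72\) and
\[
n = \frac{(1.96)^{2}(0.28)(0.72)}{(0.03)^{2}} \approx \frac{0.7745}{0.0009} \approx 860.5
\quad\Rightarrow\quad n_{\text{required}} = 861.
\]
The pilot estimate lowers the requirement because \(\hat p\hat q = 0.2016\) is smaller than the maximum value \(0.25\).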
How inputs affect \(n\)
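- Smaller \(E\) requires a larger sample size because
\[
n \propto \frac{1}{E^{2}}.
\]
For example, halving \(E\) roughly quadruples the required \(n\).
- Higher confidence level increases \(z_{\alpha/2}\) (about \(1.645\), \(1.960\), and \(2.576\) for \(90\%\), \(95\%\), and \(99\%\) confidence), which increases the required \(n\) in proportion to \(z_{\alpha/2}^{2}\).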
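To see both effects side by side, the short sketch below tabulates the conservative sample size (\(\hat p = \hat q = 0.5\)) over a small grid of margins and confidence levels. The helper name `n_required` and the grid values are assumptions picked for the example.

```python
from math import ceil
from statistics import NormalDist


def n_required(margin: float, confidence: float) -> int:
    """Conservative sample size (p_hat = q_hat = 0.5), rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * 0.25 / margin ** 2)


margins = [0.10, 0.05, 0.025]     # each margin is half the previous one
confidences = [0.90, 0.95, 0.99]

# Map each confidence level to the sizes for the three margins.
table = {c: [n_required(e, c) for e in margins] for c in confidences}
for c, sizes in table.items():
    print(f"{c:.0%} confidence:", sizes)
# 90% confidence: [68, 271, 1083]
# 95% confidence: [97, 385, 1537]
# 99% confidence: [166, 664, 2654]
```

Reading across a row, each halving of \(E\) multiplies \(n\) by about four; reading down a column, higher confidence raises \(n\) through the larger \(z_{\alpha/2}\).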