Type I and Type II errors, and statistical power
In hypothesis testing, you make a decision (reject or fail to reject \(H_0\)) based on sample data.
Because samples vary, mistakes are possible. The two classic error probabilities are:
\[
\alpha = P(\text{reject } H_0 \mid H_0 \text{ true}) \quad \text{(Type I error, false positive)}
\]
\[
\beta = P(\text{fail to reject } H_0 \mid H_1 \text{ true}) \quad \text{(Type II error, false negative)}
\]
The power of a test is the probability of correctly rejecting \(H_0\) when the alternative is true:
\[
\boxed{\text{power} = 1-\beta}
\]
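The definitions above can be checked empirically with a small simulation. The sketch below is illustrative only: it assumes a right-tailed one-sample z-test of \(H_0: \mu = 0\) with normal data and known \(\sigma = 1\), and the function name `rejection_rate`, the sample size, the trial count, and the seed are all hypothetical choices, not part of the text above.

```python
import random
from statistics import NormalDist

def rejection_rate(true_mean, n=25, alpha=0.05, trials=10_000, seed=0):
    """Fraction of simulated samples in which a right-tailed z-test rejects H0: mu = 0.

    Assumes normal data with known sigma = 1 (an illustrative simplification).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # right-tailed critical value
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, 1.0) for _ in range(n)) / n
        z = sample_mean * n ** 0.5  # (xbar - 0) / (sigma / sqrt(n)) with sigma = 1
        if z >= z_crit:
            rejections += 1
    return rejections / trials

# H0 true: the rejection rate estimates alpha (Type I error rate).
type_i_rate = rejection_rate(true_mean=0.0)
# H1 true with a true mean of 0.5: the rejection rate estimates power = 1 - beta.
estimated_power = rejection_rate(true_mean=0.5)
print(f"estimated alpha = {type_i_rate:.3f}, estimated power = {estimated_power:.3f}")
```

When \(H_0\) is true, the rejection rate should land near the chosen \(\alpha\); when \(H_1\) is true, it estimates the power, and one minus that value estimates \(\beta\).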
Error matrix (outcomes table)
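The same ideas can be organized in a 2×2 table, with the decision in the rows and the true state in the columns:
\[
\begin{array}{l|c|c}
 & H_0 \text{ true} & H_1 \text{ true} \\ \hline
\text{Reject } H_0 & \text{Type I error } (\alpha) & \text{correct decision } (\text{power} = 1-\beta) \\
\text{Fail to reject } H_0 & \text{correct decision } (1-\alpha) & \text{Type II error } (\beta)
\end{array}
\]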
1) Why \(\alpha\) is chosen first
In many scientific settings, a “false positive” is costly (claiming an effect when none exists).
So \(\alpha\) is typically set before seeing data (common choices: 0.05, 0.01).
A smaller \(\alpha\) makes it harder to reject \(H_0\), which tends to increase \(\beta\) unless you increase sample size.
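As a rough numerical illustration, assume a right-tailed z-test whose statistic is shifted by \(\delta = 2.5\) under \(H_1\) (an arbitrary value chosen for this example), so that \(\beta = \Phi(z_{1-\alpha}-\delta)\), where \(\Phi\) is the standard normal CDF and \(z_q\) its \(q\)-quantile. Tightening \(\alpha\) from 0.05 to 0.01 at a fixed sample size then gives approximately:
\[
\begin{aligned}
\alpha = 0.05: \quad & z_{0.95} \approx 1.645, \quad \beta \approx \Phi(1.645 - 2.5) = \Phi(-0.855) \approx 0.196 \\
\alpha = 0.01: \quad & z_{0.99} \approx 2.326, \quad \beta \approx \Phi(2.326 - 2.5) = \Phi(-0.174) \approx 0.431
\end{aligned}
\]
Under these assumptions, \(\beta\) more than doubles when \(\alpha\) is reduced and nothing else changes.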
2) How effect size and sample size change power
Power increases when:
- the true effect is larger (bigger separation between \(H_0\) and \(H_1\)),
- the sample size \(n\) is larger (smaller standard error),
- \(\alpha\) is larger (a larger rejection region), or
- you use a one-tailed test in the correct direction (when justified).
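The four factors in the list above can be compared side by side. This is a minimal sketch under the normal approximation for a z-test, assuming the shift of the test statistic is \(d\sqrt{n}\); the function name `z_power` and the baseline values (\(d = 0.3\), \(n = 50\), \(\alpha = 0.05\), two-tailed) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def z_power(d, n, alpha=0.05, two_tailed=True):
    """Approximate power of a z-test, assuming the statistic shifts by d * sqrt(n)."""
    nd = NormalDist()
    delta = d * n ** 0.5
    if two_tailed:
        z = nd.inv_cdf(1 - alpha / 2)
        beta = nd.cdf(z - delta) - nd.cdf(-z - delta)
    else:
        # One-tailed test in the direction of the true effect (right tail here).
        z = nd.inv_cdf(1 - alpha)
        beta = nd.cdf(z - delta)
    return 1 - beta

baseline = z_power(d=0.3, n=50)
scenarios = {
    "larger effect (d = 0.5)": z_power(d=0.5, n=50),
    "larger sample (n = 100)": z_power(d=0.3, n=100),
    "larger alpha (0.10)": z_power(d=0.3, n=50, alpha=0.10),
    "one-tailed, correct direction": z_power(d=0.3, n=50, two_tailed=False),
}

print(f"baseline power: {baseline:.3f}")
for label, value in scenarios.items():
    print(f"{label}: {value:.3f}")
```

Each scenario changes exactly one factor relative to the baseline, so any increase in power can be attributed to that factor alone.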
3) A common z-test power approximation
A widely used approximation assumes the standardized test statistic is normal:
\[
Z \sim \mathcal N(0,1) \text{ under } H_0,
\qquad
Z \sim \mathcal N(\delta,1) \text{ under } H_1.
\]
Here \(\delta\) is the “signal-to-noise” shift. In many mean-testing contexts, a simplified model is:
\[
\delta = d\sqrt{n},
\]
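where \(d\) is a standardized effect size (often introduced as “Cohen’s \(d\)” in basic power discussions). For a one-sample test of a mean with known standard deviation \(\sigma\), \(d = (\mu_1-\mu_0)/\sigma\), the distance between the null and alternative means in standard-deviation units.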
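To make the shift concrete, the sketch below computes \(d\) and \(\delta\) for a one-sample setting. It assumes a known standard deviation, so that \(d = (\mu_1-\mu_0)/\sigma\); the helper names and the numbers (means of 100 and 105, \(\sigma = 10\), \(n = 25\)) are hypothetical.

```python
import math

def standardized_effect(mu0, mu1, sigma):
    """Effect size d: distance between the H0 and H1 means in units of sigma."""
    return (mu1 - mu0) / sigma

def shift(d, n):
    """Shift delta of the z statistic under H1, using the simplified model delta = d * sqrt(n)."""
    return d * math.sqrt(n)

d = standardized_effect(mu0=100.0, mu1=105.0, sigma=10.0)
delta = shift(d, n=25)
print(f"d = {d}, delta = {delta}")  # d = 0.5, delta = 2.5
```

Quadrupling \(n\) doubles \(\delta\) in this model, which is why gains in power from extra data slow down as the sample grows.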
4) Critical values for one- and two-tailed tests
Let \(z_q\) be the \(q\)-quantile of the standard normal distribution.
\[
\text{Two-tailed: reject if } |Z|\ge z_{1-\alpha/2}.
\]
\[
\text{Right-tailed: reject if } Z\ge z_{1-\alpha}.
\qquad
\text{Left-tailed: reject if } Z\le -z_{1-\alpha}.
\]
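The rejection rules above translate directly into code. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; the function names `critical_value` and `rejects` and the tail labels are illustrative choices.

```python
from statistics import NormalDist

def critical_value(alpha, tail):
    """Positive cutoff z_{1-alpha/2} (two-tailed) or z_{1-alpha} (one-tailed)."""
    q = 1 - alpha / 2 if tail == "two" else 1 - alpha
    return NormalDist().inv_cdf(q)

def rejects(z, alpha, tail):
    """Apply the rejection rule for a two-, right-, or left-tailed z-test."""
    cutoff = critical_value(alpha, tail)
    if tail == "two":
        return abs(z) >= cutoff
    if tail == "right":
        return z >= cutoff
    if tail == "left":
        return z <= -cutoff
    raise ValueError("tail must be 'two', 'right', or 'left'")

for tail in ("two", "right", "left"):
    print(tail, round(critical_value(0.05, tail), 3), rejects(1.8, 0.05, tail))
```

With \(\alpha = 0.05\), an observed \(Z = 1.8\) exceeds the one-tailed cutoff (about 1.645) but not the two-tailed cutoff (about 1.960), so the same data can lead to different decisions depending on the test chosen in advance.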
5) Computing \(\beta\) and power
\(\beta\) is the probability of landing in the non-rejection region when \(H_1\) is true.
With \(\Phi(\cdot)\) as the standard normal CDF and shift \(\delta\) (positive when the alternative lies to the right of \(H_0\), negative when it lies to the left), common formulas are:
\[
\begin{aligned}
\text{Right-tailed: } & \beta = \Phi\!\left(z_{1-\alpha}-\delta\right), \quad \text{power}=1-\beta \\
\text{Left-tailed: } & \beta = 1-\Phi\!\left(-z_{1-\alpha}-\delta\right), \quad \text{power}=1-\beta \\
\text{Two-tailed: } & \beta = \Phi\!\left(z_{1-\alpha/2}-\delta\right)-\Phi\!\left(-z_{1-\alpha/2}-\delta\right), \quad \text{power}=1-\beta.
\end{aligned}
\]
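The three formulas can be evaluated numerically as follows. This is a sketch under the same normal-shift model; the function name `beta_and_power` and the example shifts of \(\pm 2.5\) are assumptions made for illustration.

```python
from statistics import NormalDist

def beta_and_power(delta, alpha=0.05, tail="two"):
    """Return (beta, power) for a z-test whose statistic is N(delta, 1) under H1."""
    nd = NormalDist()
    if tail == "right":
        beta = nd.cdf(nd.inv_cdf(1 - alpha) - delta)
    elif tail == "left":
        beta = 1 - nd.cdf(-nd.inv_cdf(1 - alpha) - delta)
    elif tail == "two":
        z = nd.inv_cdf(1 - alpha / 2)
        beta = nd.cdf(z - delta) - nd.cdf(-z - delta)
    else:
        raise ValueError("tail must be 'right', 'left', or 'two'")
    return beta, 1 - beta

# The left-tailed case uses a negative shift, since the alternative lies below H0.
for tail, delta in (("right", 2.5), ("left", -2.5), ("two", 2.5)):
    beta, power = beta_and_power(delta, tail=tail)
    print(f"{tail}-tailed, delta = {delta}: beta = {beta:.3f}, power = {power:.3f}")
```

A useful sanity check is that setting \(\delta = 0\) makes the "power" collapse to \(\alpha\) in all three cases, because \(H_0\) and \(H_1\) then coincide.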
6) What a power curve shows
A power curve plots power versus sample size \(n\) (or sometimes effect size \(d\)).
It helps you choose a sample size that achieves a target power (often 0.80) for an expected effect size at a chosen \(\alpha\).
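Reading a required sample size off a power curve amounts to finding the smallest \(n\) at which the curve crosses the target. The sketch below does this by direct search for a two-tailed z-test, assuming the simplified shift \(\delta = d\sqrt{n}\); the function names, the search cap `n_max`, and the example effect size \(d = 0.5\) are hypothetical.

```python
from statistics import NormalDist

def power_two_tailed(d, n, alpha=0.05):
    """Approximate power of a two-tailed z-test, assuming shift delta = d * sqrt(n)."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    delta = d * n ** 0.5
    return 1 - (nd.cdf(z - delta) - nd.cdf(-z - delta))

def min_sample_size(d, target=0.80, alpha=0.05, n_max=100_000):
    """Smallest n whose approximate power reaches the target (linear search)."""
    for n in range(1, n_max + 1):
        if power_two_tailed(d, n, alpha) >= target:
            return n
    raise ValueError("target power not reached within n_max")

# A few points on the power curve for d = 0.5, then the n needed for 80% power.
curve = [(n, power_two_tailed(0.5, n)) for n in (10, 20, 30, 40, 50)]
for n, p in curve:
    print(f"n = {n:3d}: power = {p:.3f}")
print("smallest n for power >= 0.80:", min_sample_size(0.5))
```

Under these assumptions the search returns \(n = 32\) for \(d = 0.5\) at \(\alpha = 0.05\). A linear search is used here for clarity rather than speed; exact planning for t-tests would give somewhat different numbers.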
7) University extension: noncentral distributions and ROC curves
Exact power for many tests uses noncentral distributions (e.g., noncentral t for t-tests).
In classification-style language, \(\alpha\) relates to “false positive rate” and power relates to “true positive rate”;
varying the decision threshold traces an ROC curve.
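The link between \(\alpha\), power, and the ROC curve can be sketched with the normal-shift model used for the z-test approximation. The example assumes a right-tailed rule and a statistic that is \(\mathcal N(0,1)\) under \(H_0\) and \(\mathcal N(\delta,1)\) under \(H_1\); the function name `roc_points` and the grid of false positive rates are illustrative choices.

```python
from statistics import NormalDist

def roc_points(delta, num=99):
    """(false positive rate, true positive rate) pairs for a right-tailed threshold rule.

    Each false positive rate plays the role of alpha; the matching true positive
    rate is the power of the test at that alpha.
    """
    nd = NormalDist()
    points = [(0.0, 0.0)]
    for i in range(1, num + 1):
        fpr = i / (num + 1)              # 0.01, 0.02, ..., 0.99 when num = 99
        threshold = nd.inv_cdf(1 - fpr)  # critical value z_{1 - fpr}
        tpr = 1 - nd.cdf(threshold - delta)
        points.append((fpr, tpr))
    points.append((1.0, 1.0))
    return points

roc = roc_points(delta=2.5)
for fpr, tpr in roc[::20]:
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.3f}")
```

The point on this curve at a false positive rate of 0.05 is exactly the power of the right-tailed test at \(\alpha = 0.05\), and with \(\delta = 0\) the curve reduces to the diagonal, where the test does no better than chance.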