Why hypothesis tests are used
A hypothesis test is a structured way to decide whether sample evidence is strong enough to challenge a claim
about a population parameter (a mean μ or a proportion p). The key idea is that
sample results vary because of random sampling. A test helps determine whether the observed difference from a
claimed value is plausibly due to chance, or too large to attribute to chance alone.
Core goal: Use a sample statistic (like x̄ or p̂) to make a decision about a population parameter (μ or p).
Two hypotheses
Every test is built from two competing statements:
- Null hypothesis (H0): the claim assumed true unless evidence suggests otherwise. It always includes equality (=) or a “boundary” form (≥ or ≤).
- Alternative hypothesis (H1): the claim supported when evidence contradicts H0. It uses one of the signs: ≠, <, or >.
Important: In hypotheses we write the population parameter (μ or p), not the sample statistic (x̄ or p̂).
Rejection, nonrejection, and critical values
The test procedure divides possible sample outcomes into two regions:
- Rejection region: results so extreme that they are unlikely under H0. If the sample lands here, we reject H0.
- Nonrejection region: results not extreme enough to contradict H0. If the sample lands here, we do not reject H0.
The boundary between these regions is given by one or two critical values. The size of the rejection region
is controlled by the significance level α.
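As a rough illustration of how α fixes the boundary, the sketch below computes z critical values from the standard normal distribution using Python's standard library. The function name `critical_values` and the tail labels "left", "right", and "two" are assumptions made for this example, not names taken from the calculator.

```python
from statistics import NormalDist

def critical_values(alpha, tail):
    """Return the z critical value(s) as a tuple.

    `tail` is a hypothetical label: "left", "right", or "two".
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        # All of alpha sits in the left tail.
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        # All of alpha sits in the right tail.
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "two":
        # alpha is split evenly, alpha/2 in each tail.
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError("tail must be 'left', 'right', or 'two'")

print(critical_values(0.05, "two"))    # roughly (-1.96, 1.96)
print(critical_values(0.05, "right"))  # roughly (1.645,)
```

A smaller α pushes the critical value(s) further from zero, which shrinks the rejection region.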
Significance level (α) and Type I error
The significance level α is the probability of rejecting a true null hypothesis (a Type I error).
\[
\alpha = P(\text{reject }H_0 \mid H_0 \text{ is true})
\]
Smaller α means a stricter test (harder to reject H0), but it generally increases β unless the sample size increases.
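One way to see what α means in practice is a small simulation: draw many samples from a population where H0 really is true and count how often a two-tailed z test rejects anyway. This is only a sketch; the population values (μ0 = 50, σ = 10), the sample size, the number of trials, and the function name are all assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, fmean

def simulated_type_i_rate(alpha=0.05, mu0=50.0, sigma=10.0, n=30,
                          trials=5000, seed=1):
    """Estimate the Type I error rate of a two-tailed z test by simulation."""
    rng = random.Random(seed)                      # seeded for repeatability
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed cutoff
    se = sigma / n ** 0.5                          # standard error of the mean
    rejections = 0
    for _ in range(trials):
        # H0 is true here: the data really come from a population with mean mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / se
        if abs(z) >= z_crit:
            rejections += 1                        # a Type I error
    return rejections / trials

print(simulated_type_i_rate())  # should land close to alpha = 0.05
```

The observed rejection rate fluctuates from run to run but should stay near the chosen α.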
Type II error (β) and power
A Type II error occurs when we fail to reject H0 even though H0 is false.
\[
\beta = P(\text{do not reject }H_0 \mid H_0 \text{ is false})
\]
The power of the test is the probability that the test correctly rejects H0 when H0 is false:
\[
\text{Power} = 1 - \beta
\]
For a fixed sample size, α and β typically move in opposite directions. Increasing the sample size can reduce both.
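To make the relationship between β, power, and sample size concrete, here is a minimal sketch for one specific case: a right-tailed z test of a mean (H0: μ ≤ μ0 against H1: μ > μ0) with known σ. The function name, the argument `mu_true` (an assumed true mean), and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def right_tailed_power(mu0, mu_true, sigma, n, alpha=0.05):
    """Return (beta, power) for a right-tailed z test with known sigma."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # reject H0 when z >= z_crit
    se = sigma / sqrt(n)
    # If the true mean is mu_true, the z statistic is normal with
    # mean (mu_true - mu0) / se and standard deviation 1.
    shift = (mu_true - mu0) / se
    beta = std_normal.cdf(z_crit - shift)    # P(do not reject | mu = mu_true)
    return beta, 1 - beta

beta, power = right_tailed_power(mu0=100, mu_true=105, sigma=15, n=36)
print(round(power, 2))  # roughly 0.64

# A larger sample with the same alpha lowers beta and raises power.
beta_big, power_big = right_tailed_power(mu0=100, mu_true=105, sigma=15, n=100)
print(power_big > power)  # True
```

When `mu_true` equals `mu0`, the computed power collapses to α, which matches the definition of the significance level.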
Tail(s) of a test
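The sign in H1 determines where the rejection region sits:
- H1 uses ≠: two-tailed test. The rejection region is split between both tails, with α/2 in each.
- H1 uses <: left-tailed test. The rejection region is in the left tail.
- H1 uses >: right-tailed test. The rejection region is in the right tail.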
Test statistic (z) and the p-value
Many introductory tests standardize the distance between the sample statistic and the null value into a z-score.
The p-value is then the probability (under H0) of getting a result at least as extreme as the observed one.
Decision rule (p-value approach): Reject H0 if p-value ≤ α. Otherwise, do not reject H0.
Intro z setup for a population mean
When testing a mean with known population standard deviation σ, the standard error of x̄ is σ/√n and the test statistic is:
\[
z=\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}
\]
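The formula translates directly into code. The sketch below uses a hypothetical function name and made-up example values (x̄ = 52, μ0 = 50, σ = 6, n = 36).

```python
from math import sqrt

def z_for_mean(x_bar, mu0, sigma, n):
    """z statistic for a mean when the population sigma is known."""
    if sigma <= 0 or n <= 0:
        raise ValueError("sigma and n must be positive")
    standard_error = sigma / sqrt(n)
    return (x_bar - mu0) / standard_error

# Standard error is 6 / sqrt(36) = 1, so the sample mean is 2 standard errors above mu0.
print(z_for_mean(x_bar=52, mu0=50, sigma=6, n=36))  # 2.0
```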
Intro z setup for a population proportion
For a proportion, p̂ = x/n and the standard error under H0 uses p0:
\[
z=\frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}
\]
Normal approximation check (common rule): n·p0 ≥ 5 and n·(1-p0) ≥ 5.
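A minimal sketch of the proportion setup, including the normal approximation check, might look like this. The function name and the choice to raise an error when the check fails are assumptions for illustration; a calculator could just as reasonably show a warning instead.

```python
from math import sqrt

def z_for_proportion(x, n, p0, min_count=5):
    """z statistic for a proportion; x is the number of successes in n trials."""
    if not 0 < p0 < 1:
        raise ValueError("p0 must be strictly between 0 and 1")
    # Common rule of thumb: expected successes and failures under H0 are both >= 5.
    if n * p0 < min_count or n * (1 - p0) < min_count:
        raise ValueError("normal approximation check failed")
    p_hat = x / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # uses p0, not p_hat
    return (p_hat - p0) / standard_error

# 60 successes in 100 trials against p0 = 0.5: standard error is 0.05.
print(round(z_for_proportion(x=60, n=100, p0=0.5), 4))  # 2.0
```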
p-value formulas by tail type
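If Z is standard normal, Φ is its CDF, and z is the observed test statistic:
Left-tailed test (H1 uses <):
\[
p\text{-value} = P(Z \le z) = \Phi(z)
\]
Right-tailed test (H1 uses >):
\[
p\text{-value} = P(Z \ge z) = 1 - \Phi(z)
\]
Two-tailed test (H1 uses ≠):
\[
p\text{-value} = 2\,P(Z \ge |z|) = 2\,\bigl(1 - \Phi(|z|)\bigr)
\]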
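In code, the p-value for each tail type and the p-value decision rule can be sketched with the standard normal CDF from Python's standard library. The function names and the tail labels "left", "right", and "two" are hypothetical.

```python
from statistics import NormalDist

def p_value(z, tail):
    """p-value of an observed z statistic for a given tail type."""
    phi = NormalDist().cdf                 # standard normal CDF
    if tail == "left":
        return phi(z)                      # P(Z <= z)
    if tail == "right":
        return 1 - phi(z)                  # P(Z >= z)
    if tail == "two":
        return 2 * (1 - phi(abs(z)))       # both tails beyond |z|
    raise ValueError("tail must be 'left', 'right', or 'two'")

def decide(z, tail, alpha=0.05):
    """Apply the rule: reject H0 if p-value <= alpha."""
    return "reject H0" if p_value(z, tail) <= alpha else "do not reject H0"

print(round(p_value(2.0, "two"), 4))   # roughly 0.0455
print(decide(2.0, "two"))              # reject H0
print(decide(1.0, "two"))              # do not reject H0
```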
How this calculator uses these ideas
- You choose parameter type (mean or proportion), tail type, and α.
- You enter the null value (μ0 or p0) and sample information.
- The calculator computes z, the critical value(s), the p-value, and the decision (reject / do not reject H0).
- Optional: if you provide an assumed true value (μtrue or ptrue), it estimates β and power.
Data entry tip: You can paste raw values directly from a CSV file (comma/newline separated). For proportions, paste 0/1 values (or true/false).
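How the calculator parses pasted text is not described here, so the following is only a guess at one reasonable approach: split on commas and newlines, then convert to numbers (for a mean) or to 0/1 values (for a proportion). The function name `parse_pasted` and the `kind` argument are hypothetical.

```python
import re

def parse_pasted(text, kind="mean"):
    """Turn comma/newline separated text into a list of values."""
    # Split on any run of commas or line breaks and drop empty pieces.
    tokens = [t.strip() for t in re.split(r"[,\r\n]+", text) if t.strip()]
    if kind == "mean":
        return [float(t) for t in tokens]
    if kind == "proportion":
        mapping = {"1": 1, "0": 0, "true": 1, "false": 0}
        values = []
        for token in tokens:
            key = token.lower()
            if key not in mapping:
                raise ValueError(f"expected 0/1 or true/false, got {token!r}")
            values.append(mapping[key])
        return values
    raise ValueError("kind must be 'mean' or 'proportion'")

flags = parse_pasted("1,0,true\nFalse,1", kind="proportion")
print(flags)                   # [1, 0, 1, 0, 1]
print(sum(flags), len(flags))  # x = 3 successes out of n = 5
```

For proportions, the sum of the parsed values gives x and their count gives n, so p̂ = x/n follows directly.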