The real-world example below shows how to use a chi-square goodness-of-fit test to check whether observed categorical data follow a claimed distribution. The method compares observed counts \(O_i\) to expected counts \(E_i\) and uses the chi-square distribution to quantify how unusual the discrepancies are under the null hypothesis.
Real-world scenario
A coffee shop expects the following long-run order proportions during morning hours: Espresso 25%, Latte 30%, Cappuccino 25%, Tea 20%. A random sample of \(n=200\) morning orders yields the observed counts: Espresso 40, Latte 70, Cappuccino 60, Tea 30. Determine whether the observed distribution matches the claimed proportions at \(\alpha=0.05\).
Step 1: State hypotheses
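Let \(p_1,p_2,p_3,p_4\) denote the true category probabilities (Espresso, Latte, Cappuccino, Tea).
\[ H_0:\ p_1=0.25,\ p_2=0.30,\ p_3=0.25,\ p_4=0.20 \]
\[ H_a:\ \text{at least one } p_i \text{ differs from its claimed value} \]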
Step 2: Compute expected counts
For each category, \(E_i = n \cdot p_i\).
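The expected counts can be reproduced with a few lines of Python. This is a minimal sketch: the names `claimed`, `n`, and `expected` are illustrative choices, and the numbers come from the scenario above.

```python
# Claimed long-run proportions from the scenario.
claimed = {"Espresso": 0.25, "Latte": 0.30, "Cappuccino": 0.25, "Tea": 0.20}
n = 200  # sample size

# The claimed proportions should sum to 1 (allowing for float rounding).
assert abs(sum(claimed.values()) - 1.0) < 1e-9

# E_i = n * p_i for each category.
expected = {name: n * p for name, p in claimed.items()}

for name, e in expected.items():
    print(f"{name:<11}{e:6.1f}")
```

The expected counts always sum to \(n\), which is a quick sanity check on the claimed proportions.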
Conditions check
- Independence: the sample should be random and each order counted once.
- Expected counts: each \(E_i\) should be at least 5 (here: 50, 60, 50, 40).
- Categories: mutually exclusive and collectively exhaustive.
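Only some of these conditions can be checked mechanically; independence depends on how the sample was collected. The sketch below covers the checkable parts (matching totals and the expected-count rule of thumb). The function name `check_conditions` and the constant `MIN_EXPECTED` are hypothetical names introduced for illustration.

```python
observed = [40, 70, 60, 30]
expected = [50.0, 60.0, 50.0, 40.0]

MIN_EXPECTED = 5  # rule-of-thumb threshold from the conditions above


def check_conditions(observed, expected, min_expected=MIN_EXPECTED):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if len(observed) != len(expected):
        problems.append("observed and expected have different lengths")
    if any(o < 0 for o in observed):
        problems.append("observed counts must be non-negative")
    # Exhaustive categories imply that both sets of counts add up to n.
    if abs(sum(observed) - sum(expected)) > 1e-9:
        problems.append("observed and expected totals differ")
    if any(e < min_expected for e in expected):
        problems.append(f"some expected counts are below {min_expected}")
    return problems


print(check_conditions(observed, expected) or "all checks passed")
```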
Step 3: Compute the chi-square statistic
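The test statistic for a goodness-of-fit test is:
\[ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \]
The table below works out each category's contribution to the sum.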
| Category | Observed \(O_i\) | Expected \(E_i\) | \((O_i-E_i)\) | \(\dfrac{(O_i-E_i)^2}{E_i}\) |
|---|---|---|---|---|
| Espresso | 40 | 50 | \(-10\) | \(\dfrac{(-10)^2}{50}=\dfrac{100}{50}=2.0000\) |
| Latte | 70 | 60 | \(10\) | \(\dfrac{(10)^2}{60}=\dfrac{100}{60}=1.6667\) |
| Cappuccino | 60 | 50 | \(10\) | \(\dfrac{(10)^2}{50}=\dfrac{100}{50}=2.0000\) |
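| Tea | 30 | 40 | \(-10\) | \(\dfrac{(-10)^2}{40}=\dfrac{100}{40}=2.5000\) |

Summing the last column gives the test statistic:
\[ \chi^2 = 2.0000 + 1.6667 + 2.0000 + 2.5000 = 8.1667 \]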
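The same sum of per-category contributions can be computed directly. A minimal sketch follows; the helper name `chi_square_statistic` is an illustrative choice, not part of any library.

```python
observed = [40, 70, 60, 30]
expected = [50.0, 60.0, 50.0, 40.0]


def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


chi2 = chi_square_statistic(observed, expected)
print(f"chi-square = {chi2:.4f}")  # chi-square = 8.1667
```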
Step 4: Degrees of freedom
For a chi-square goodness-of-fit test with \(k\) categories and no parameters estimated from the data:
\[ df = k - 1 \]
Here \(k=4\), so \(df=4-1=3\).
Step 5: p-value (or critical value) and conclusion
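The p-value is the right-tail probability of the chi-square distribution with 3 degrees of freedom beyond the observed statistic:
\[ p = P\left(\chi^2_{3} \ge 8.1667\right) \approx 0.0427 \]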
Since \(p \approx 0.0427 \le 0.05\), reject \(H_0\). The observed order pattern is inconsistent with the claimed distribution at the 5% significance level.
(Equivalently, the 0.05 critical value for \(df=3\) is approximately \(7.815\), and \(8.1667 > 7.815\).)
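The right-tail probability can be checked without a statistics library, because for exactly 3 degrees of freedom the chi-square survival function has a closed form: \(\operatorname{erfc}\!\left(\sqrt{x/2}\right) + \sqrt{2x/\pi}\,e^{-x/2}\). The sketch below relies on that special case and is not valid for other degrees of freedom; `chi2_sf_df3` is a hypothetical helper name. For general \(df\), a library routine such as `scipy.stats.chi2.sf` is the usual choice.

```python
import math


def chi2_sf_df3(x):
    """Right-tail probability P(X >= x) for a chi-square variable with 3 df only."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


chi2 = 8.1667  # statistic from Step 3
alpha = 0.05   # significance level from the scenario

p_value = chi2_sf_df3(chi2)
reject_h0 = p_value <= alpha
print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Evaluating the same function at the critical value 7.815 returns approximately 0.05, which is why the p-value and critical-value approaches lead to the same decision.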
Step 6: Practical magnitude (effect size)
A common effect size for goodness-of-fit is Cramér's \(V\):
\[ V = \sqrt{\frac{\chi^2}{n\,(k-1)}} = \sqrt{\frac{8.1667}{200 \cdot 3}} \approx 0.117 \]
This indicates a small-to-moderate departure from the expected proportions in this sample (interpretation depends on context and domain standards).
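The effect size can be computed from the quantities already obtained. This sketch assumes the goodness-of-fit form of the statistic, \(V=\sqrt{\chi^2/(n(k-1))}\); the function name `cramers_v_gof` is illustrative.

```python
import math


def cramers_v_gof(chi2, n, k):
    """Cramér's V for a goodness-of-fit test, assuming V = sqrt(chi2 / (n * (k - 1)))."""
    return math.sqrt(chi2 / (n * (k - 1)))


v = cramers_v_gof(chi2=8.1667, n=200, k=4)
print(f"V = {v:.4f}")  # V = 0.1167
```

Unlike the p-value, \(V\) does not grow just because the sample gets larger, so it is easier to compare across samples of different sizes.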
Visualization: observed vs expected counts
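One simple way to compare the two sets of counts is a paired bar chart. The sketch below draws it as plain text using only the standard library; the function `text_bars` and the choice of one `#` per two orders are illustrative assumptions.

```python
categories = ["Espresso", "Latte", "Cappuccino", "Tea"]
observed = [40, 70, 60, 30]
expected = [50, 60, 50, 40]


def text_bars(categories, observed, expected, scale=2):
    """Build two bars per category; each '#' stands for `scale` orders."""
    lines = []
    for name, o, e in zip(categories, observed, expected):
        lines.append(f"{name:<11} observed {'#' * round(o / scale)} {o}")
        lines.append(f"{'':<11} expected {'#' * round(e / scale)} {e}")
    return lines


print("\n".join(text_bars(categories, observed, expected)))
```

Latte and Cappuccino bars run longer than expected, while Espresso and Tea fall short, which mirrors the signs in the \((O_i-E_i)\) column of the table.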
Common pitfalls (quick checks)
- Using proportions instead of counts: the chi-square formula requires counts \(O_i\) and \(E_i\), not percentages.
- Small expected counts: if some \(E_i\) are below 5, combine categories when scientifically reasonable.
- Wrong test type: goodness-of-fit uses one categorical variable; independence/homogeneity use a two-way table.
- Interpretation: rejecting \(H_0\) indicates mismatch with the claimed distribution, not which specific causes produced the mismatch.
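The first pitfall is easy to demonstrate numerically: feeding proportions into the formula shrinks the statistic by a factor of \(n\), which would wrongly suggest an excellent fit. A minimal sketch using the scenario's numbers, with an illustrative helper name:

```python
n = 200
observed = [40, 70, 60, 30]
claimed = [0.25, 0.30, 0.25, 0.20]


def chi_square_statistic(obs, exp):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


# Correct: observed counts against expected counts n * p_i.
from_counts = chi_square_statistic(observed, [n * p for p in claimed])

# Incorrect: observed proportions against claimed proportions.
from_proportions = chi_square_statistic([o / n for o in observed], claimed)

print(f"from counts:      {from_counts:.4f}")
print(f"from proportions: {from_proportions:.4f}")
```

The proportion-based value equals the count-based value divided by \(n\), so it cannot be compared against chi-square critical values.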