
Chi-Square Goodness of Fit Test: Real World Example With Full Calculation

In a real-world example, how can observed category counts be tested against a claimed distribution using the chi-square goodness-of-fit statistic \( \chi^2=\sum (O-E)^2/E \) and a p-value decision?


The real-world example below shows how a chi-square goodness-of-fit test checks whether observed categorical data follow a claimed distribution. The method compares observed counts \(O_i\) to expected counts \(E_i\) and uses the chi-square distribution to quantify how unusual the discrepancies are under the null hypothesis.

Real-world scenario

A coffee shop expects the following long-run order proportions during morning hours: Espresso 25%, Latte 30%, Cappuccino 25%, Tea 20%. A random sample of \(n=200\) morning orders yields the observed counts: Espresso 40, Latte 70, Cappuccino 60, Tea 30. Determine whether the observed distribution matches the claimed proportions at \(\alpha=0.05\).

Step 1: State hypotheses

Let \(p_1,p_2,p_3,p_4\) denote the true category probabilities (Espresso, Latte, Cappuccino, Tea).

\[ H_0:\ (p_1,p_2,p_3,p_4)=(0.25,0.30,0.25,0.20) \qquad\text{vs}\qquad H_A:\ \text{at least one } p_i \text{ differs.} \]

Step 2: Compute expected counts

For each category, \(E_i = n \cdot p_i\).

\[ E_{\text{Espresso}} = 200 \cdot 0.25 = 50,\quad E_{\text{Latte}} = 200 \cdot 0.30 = 60,\quad E_{\text{Cappuccino}} = 200 \cdot 0.25 = 50,\quad E_{\text{Tea}} = 200 \cdot 0.20 = 40 \]
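The expected counts above can be reproduced with a short script. This is a minimal sketch: the names `claimed`, `n`, and `expected_counts` are illustrative choices, not part of the original answer.

```python
# Claimed long-run proportions from the scenario (the null hypothesis).
claimed = {"Espresso": 0.25, "Latte": 0.30, "Cappuccino": 0.25, "Tea": 0.20}
n = 200  # number of sampled morning orders


def expected_counts(proportions, sample_size):
    """Return E_i = n * p_i for every category (hypothetical helper)."""
    return {name: sample_size * p for name, p in proportions.items()}


expected = expected_counts(claimed, n)
for name, e in expected.items():
    print(f"{name}: {e:.1f}")
```

The expected counts always sum to \(n\) when the claimed proportions sum to 1, which makes a quick sanity check.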

Conditions check

  • Independence: the sample should be random and each order counted once.
  • Expected counts: each \(E_i\) should be at least 5 (here: 50, 60, 50, 40).
  • Categories: mutually exclusive and collectively exhaustive.
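The numeric parts of this checklist can be automated. The sketch below uses a hypothetical helper, `check_conditions`, and covers only what can be verified from the numbers. Independence and random sampling depend on how the data were collected and cannot be checked this way.

```python
import math


def check_conditions(proportions, sample_size, min_expected=5):
    """Return a list of problems; an empty list means the numeric checks pass."""
    problems = []
    # Exhaustive categories: the claimed proportions must sum to 1.
    if not math.isclose(sum(proportions.values()), 1.0, abs_tol=1e-9):
        problems.append("claimed proportions do not sum to 1")
    # Rule of thumb from the checklist: every expected count at least 5.
    for name, p in proportions.items():
        if sample_size * p < min_expected:
            problems.append(f"expected count for {name} is below {min_expected}")
    return problems


claimed = {"Espresso": 0.25, "Latte": 0.30, "Cappuccino": 0.25, "Tea": 0.20}
print(check_conditions(claimed, 200))  # numeric checks for the coffee shop sample
```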

Step 3: Compute the chi-square statistic

The test statistic for a goodness-of-fit test is:

\[ \chi^2=\sum_{i=1}^{k}\frac{(O_i-E_i)^2}{E_i} \]
\[
\begin{array}{lrrrl}
\text{Category} & \text{Observed } O_i & \text{Expected } E_i & O_i-E_i & \dfrac{(O_i-E_i)^2}{E_i} \\
\hline
\text{Espresso} & 40 & 50 & -10 & \dfrac{(-10)^2}{50}=\dfrac{100}{50}=2.0000 \\
\text{Latte} & 70 & 60 & 10 & \dfrac{(10)^2}{60}=\dfrac{100}{60}=1.6667 \\
\text{Cappuccino} & 60 & 50 & 10 & \dfrac{(10)^2}{50}=\dfrac{100}{50}=2.0000 \\
\text{Tea} & 30 & 40 & -10 & \dfrac{(-10)^2}{40}=\dfrac{100}{40}=2.5000
\end{array}
\]
\[ \chi^2 = 2.0000 + 1.6667 + 2.0000 + 2.5000 = 8.1667 \]
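The per-category contributions and their sum can be recomputed in a few lines. This is a minimal sketch; the function name `chi_square_terms` is an illustrative assumption.

```python
observed = {"Espresso": 40, "Latte": 70, "Cappuccino": 60, "Tea": 30}
expected = {"Espresso": 50, "Latte": 60, "Cappuccino": 50, "Tea": 40}


def chi_square_terms(obs, exp):
    """Return each category's contribution (O_i - E_i)^2 / E_i."""
    return {name: (obs[name] - exp[name]) ** 2 / exp[name] for name in obs}


terms = chi_square_terms(observed, expected)
chi2 = sum(terms.values())  # the goodness-of-fit statistic

for name, term in terms.items():
    print(f"{name}: {term:.4f}")
print(f"chi-square = {chi2:.4f}")
```

Keeping the individual terms, rather than only the total, shows which categories drive the statistic.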

Step 4: Degrees of freedom

For a chi-square goodness-of-fit test with \(k\) categories and no parameters estimated from the data:

\[ df = k - 1 \]

Here \(k=4\), so \(df=3\).
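As a small sketch, the rule can be wrapped in a helper. The `estimated_params` argument is an addition not covered in this example: it reflects the standard adjustment of subtracting one degree of freedom for each parameter estimated from the data, and it defaults to 0 to match the case above.

```python
def degrees_of_freedom(num_categories, estimated_params=0):
    """df = k - 1 - m, where m parameters were estimated from the data."""
    df = num_categories - 1 - estimated_params
    if df < 1:
        raise ValueError("need at least one degree of freedom")
    return df


print(degrees_of_freedom(4))  # four drink categories, nothing estimated
```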

Step 5: p-value (or critical value) and conclusion

The p-value is the right-tail probability:

\[ p = P\!\left(\chi^2_{(3)} \ge 8.1667\right) \approx 0.0427 \]

Since \(p \approx 0.0427 \le 0.05\), reject \(H_0\). The observed order pattern is inconsistent with the claimed distribution at the 5% significance level.

(Equivalently, the 0.05 critical value for \(df=3\) is approximately \(7.815\), and \(8.1667 > 7.815\).)
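The tail probability can be checked without a statistics package. For \(df=3\) specifically, the chi-square right-tail probability has the closed form \(\operatorname{erfc}\!\left(\sqrt{x/2}\right)+\sqrt{2x/\pi}\,e^{-x/2}\). The sketch below relies on that special case and does not work for other degrees of freedom; the function name `chi2_sf_df3` is an illustrative assumption. For general \(df\), a library routine such as `scipy.stats.chi2.sf` would be the usual choice.

```python
import math


def chi2_sf_df3(x):
    """Right-tail probability P(X >= x) for a chi-square variable with df = 3 ONLY."""
    if x < 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


chi2 = 49 / 6   # 8.1667, the statistic from Step 3
alpha = 0.05

p_value = chi2_sf_df3(chi2)
reject = p_value <= alpha

print(f"p-value = {p_value:.4f}")
print("reject H0" if reject else "fail to reject H0")
```

Evaluating the same function at the critical value 7.815 should return a tail probability very close to 0.05, which ties the p-value and critical-value approaches together.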

Step 6: Practical magnitude (effect size)

A common effect size for goodness-of-fit is Cramér's \(V\):

\[ V=\sqrt{\frac{\chi^2}{n \cdot (k-1)}} = \sqrt{\frac{8.1667}{200 \cdot 3}} = \sqrt{\frac{8.1667}{600}} \approx \sqrt{0.013611} \approx 0.1167 \]

This indicates a small-to-moderate departure from the expected proportions in this sample (interpretation depends on context and domain standards).
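The effect size follows directly from the statistic, the sample size, and the number of categories. This is a minimal sketch; `cramers_v_gof` is a hypothetical helper name.

```python
import math


def cramers_v_gof(chi2, n, k):
    """Goodness-of-fit effect size V = sqrt(chi2 / (n * (k - 1)))."""
    return math.sqrt(chi2 / (n * (k - 1)))


chi2 = 49 / 6  # 8.1667, the statistic from Step 3
v = cramers_v_gof(chi2, n=200, k=4)
print(f"V = {v:.3f}")  # about 0.117
```

Unlike the p-value, \(V\) does not grow with \(n\) when the pattern of proportions stays the same, which is why it is reported alongside the test result.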

Visualization: observed vs expected counts

[Figure: Observed vs Expected Counts (Goodness-of-Fit). Bar chart of observed and expected counts for Espresso, Latte, Cappuccino, and Tea; count axis from 0 to 80.]
In the chart, each observed bar sits 10 orders above or below its expected outline. Because each squared deviation is divided by \(E_i\), Tea (the smallest expected count) contributes the most to \( \chi^2 = 8.1667 \) with \(df=3\).

Common pitfalls (quick checks)

  • Using proportions instead of counts: the chi-square formula requires counts \(O_i\) and \(E_i\), not percentages.
  • Small expected counts: if some \(E_i\) are below 5, combine categories when scientifically reasonable.
  • Wrong test type: goodness-of-fit uses one categorical variable; independence/homogeneity use a two-way table.
  • Interpretation: rejecting \(H_0\) indicates mismatch with the claimed distribution, not which specific causes produced the mismatch.
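The first pitfall is easy to demonstrate numerically. In the sketch below (variable names are illustrative), feeding proportions into the same formula shrinks the statistic by a factor of \(n\), so a real discrepancy would look negligible.

```python
observed = [40, 70, 60, 30]          # Espresso, Latte, Cappuccino, Tea
claimed = [0.25, 0.30, 0.25, 0.20]   # claimed proportions under H0
n = sum(observed)


def chi_square(obs, exp):
    """Sum of (O - E)^2 / E over paired observed and expected values."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


# Correct: observed counts against expected counts.
correct = chi_square(observed, [n * p for p in claimed])

# Pitfall: observed proportions against claimed proportions.
wrong = chi_square([o / n for o in observed], claimed)

print(f"with counts:      {correct:.4f}")
print(f"with proportions: {wrong:.4f}  (equals the correct value divided by n)")
```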