Chi-square tests: comparing observed vs expected
Chi-square (\(\chi^2\)) tests are used for categorical data. They measure how far observed counts deviate from what a null hypothesis predicts.
The core statistic is always a sum of standardized squared deviations:
\[
\boxed{\chi^2 = \sum \frac{(O-E)^2}{E}}
\]
Under \(H_0\), and provided the expected counts are not too small, the statistic approximately follows a chi-square distribution whose degrees of freedom depend on the test (see the formula given for each test below).
The p-value is a right-tail probability.
1) Goodness of fit (GOF)
A GOF test checks whether a single categorical variable matches an expected distribution. For \(k\) categories with observed counts
\(O_1,\dots,O_k\) and expected counts \(E_1,\dots,E_k\):
\[
\chi^2 = \sum_{i=1}^{k}\frac{(O_i-E_i)^2}{E_i}.
\]
For a basic GOF test, the degrees of freedom are:
\[
df = k-1.
\]
If you estimate \(m\) parameters from the data to form the expected counts, a common adjustment is \(df=k-1-m\).
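The GOF computation above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `chi_square_gof`, its `estimated_params` argument (the \(m\) in the adjustment above), and the die-roll counts are all hypothetical choices made for the example.

```python
def chi_square_gof(observed, expected, estimated_params=0):
    """Return (chi-square statistic, degrees of freedom) for a GOF test."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Sum of (O - E)^2 / E over the k categories.
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # df = k - 1 - m, where m parameters were estimated from the data.
    df = len(observed) - 1 - estimated_params
    return stat, df


# Hypothetical data: a die rolled 60 times, tested against a fair-die model.
observed = [8, 9, 12, 11, 6, 14]
expected = [60 / 6] * 6  # 10 expected rolls per face

gof_stat, gof_df = chi_square_gof(observed, expected)
print(gof_stat, gof_df)  # statistic is about 4.2, with df = 5
```

Here the squared deviations are 4, 1, 4, 1, 16, 16, so the statistic is \(42/10 = 4.2\).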
2) Test of independence (contingency tables)
A chi-square test of independence checks whether two categorical variables are associated.
Put the data in an \(r\times c\) table of observed counts \(O_{ij}\).
Under \(H_0\) (independence), expected counts are computed from row and column totals:
\[
E_{ij}=\frac{(\text{row}_i\ \text{total})(\text{col}_j\ \text{total})}{N}.
\]
The chi-square statistic is:
\[
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}.
\]
Degrees of freedom are:
\[
df=(r-1)(c-1).
\]
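As a rough sketch of the steps above, the following Python computes the expected counts from the margins and then the statistic and degrees of freedom. The function names and the 2×2 table are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def expected_counts(table):
    """Expected counts under independence: (row total * column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical 2 x 2 table: rows are groups, columns are outcomes.
table = [[30, 10],
         [20, 40]]

print(expected_counts(table))  # [[20.0, 20.0], [30.0, 30.0]]
ind_stat, ind_df = chi_square_independence(table)
print(ind_stat, ind_df)  # statistic is about 16.67, with df = 1
```

Every cell deviates from its expected count by 10, so the statistic is \(2\cdot\tfrac{100}{20}+2\cdot\tfrac{100}{30}=\tfrac{50}{3}\approx 16.67\).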
3) p-value (right tail)
Because \(\chi^2\) is nonnegative and large values indicate greater disagreement with \(H_0\),
the p-value is:
\[
p = P(\chi^2_{df}\ge \chi^2_{\text{obs}}).
\]
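To make the right-tail probability concrete, here is one possible stdlib-only sketch. It assumes \(df\) is a positive integer and uses the standard recurrence for the regularized upper incomplete gamma function, starting from a closed form for \(df=1\) or \(df=2\). The name `chi2_sf` is hypothetical. In practice a statistics library is the usual choice, for example `scipy.stats.chi2.sf`.

```python
import math


def chi2_sf(x, df):
    """Right-tail probability P(chi2_df >= x) for a positive integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-z)              # closed form for df = 2
    else:
        a = 0.5
        q = math.erfc(math.sqrt(z))   # closed form for df = 1
    # Recurrence Q(a + 1, z) = Q(a, z) + z^a * e^(-z) / Gamma(a + 1),
    # applied until a reaches df / 2.
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Hypothetical example: a statistic of 4.2 with df = 5.
p_value = chi2_sf(4.2, 5)
print(p_value)  # roughly 0.52
```

A quick sanity check is that the familiar 5% critical values, about 3.841 for \(df=1\) and about 5.991 for \(df=2\), give a tail probability close to 0.05.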
4) Yates continuity correction (2×2 tables)
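For a 2×2 table, the Yates correction is sometimes applied to reduce the error that comes from approximating discrete counts with the continuous chi-square distribution, which matters most when counts are small. It shrinks each cell’s absolute deviation by 0.5 before squaring: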
\[
\chi^2_{\text{Yates}}=\sum \frac{(|O-E|-0.5)^2}{E}.
\]
In modern practice, alternatives include Fisher’s exact test for small 2×2 tables.
Yates is still useful pedagogically and appears in many textbooks.
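A minimal sketch of the corrected statistic for a 2×2 table follows. The function name and the table are hypothetical. One detail goes beyond the formula above and is an assumption of this sketch: the shrunken deviation is clamped at zero, so a cell with \(|O-E|<0.5\) contributes nothing instead of being inflated by the correction.

```python
def yates_chi_square(table):
    """Yates-corrected chi-square statistic for a 2 x 2 table of counts."""
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("this sketch handles 2 x 2 tables only")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n
            # Assumption beyond the formula in the text: clamp at zero so a
            # deviation smaller than 0.5 is not inflated by the correction.
            shrunk = max(abs(table[i][j] - e) - 0.5, 0.0)
            stat += shrunk ** 2 / e
    return stat


# Hypothetical 2 x 2 table of observed counts.
yates_table = [[30, 10],
               [20, 40]]
yates_stat = yates_chi_square(yates_table)
print(yates_stat)  # about 15.04; the uncorrected statistic is about 16.67
```

The corrected value is always at most the uncorrected one, so the resulting p-value is never smaller.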
5) Assumptions and rules of thumb
- Counts should be independent observations (no repeated counting of the same item).
- Expected counts should not be too small. A common rule of thumb, often attributed to Cochran, is that every expected count is at least 1 and at least 80% of them are at least 5.
- For GOF, expected counts should sum to the same total as observed counts.
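The expected-count checks in this list are easy to automate. The helper below is only an illustrative sketch: its name, the default threshold of 5, and the example counts are assumptions, and it reports summary numbers without deciding whether the test is valid.

```python
def check_expected_counts(expected, threshold=5.0):
    """Summarize a flat list of expected counts against a rule of thumb."""
    cells = list(expected)
    small = [e for e in cells if e < threshold]
    return {
        "min_expected": min(cells),
        "share_below_threshold": len(small) / len(cells),
        "total": sum(cells),  # for GOF, compare this with the observed total
    }


# Hypothetical expected counts for a four-category GOF test.
report = check_expected_counts([12.0, 7.5, 4.0, 1.5])
print(report)
```

In this made-up case half of the cells fall below 5, which would usually prompt merging sparse categories or switching to an exact test.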
6) Interpreting results
If \(p \le \alpha\), you reject \(H_0\) at significance level \(\alpha\): discrepancies at least as large as those observed would be unlikely if the null model were true.
If \(p > \alpha\), you fail to reject \(H_0\); the data are not sufficiently inconsistent with the null at that threshold. This is not evidence that \(H_0\) is true.
7) University extension: effect size for independence
For contingency tables, a common effect size is Cramér’s \(V\):
\[
V=\sqrt{\frac{\chi^2}{N\cdot \min(r-1,c-1)}}.
\]
This rescales \(\chi^2\) into a 0–1 measure of association strength (interpretation depends on context and table size).
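As a small worked sketch, Cramér’s \(V\) can be computed directly from the statistic, the sample size, and the table dimensions. The function name and the input values are hypothetical: a 2×2 table with \(N=100\) and \(\chi^2 = 50/3\).

```python
import math


def cramers_v(chi2_stat, n, n_rows, n_cols):
    """Cramér's V from a chi-square statistic, sample size, and table shape."""
    k = min(n_rows - 1, n_cols - 1)
    if n <= 0 or k <= 0:
        raise ValueError("need n > 0 and at least a 2 x 2 table")
    return math.sqrt(chi2_stat / (n * k))


# Hypothetical inputs: chi-square of 50/3 from a 2 x 2 table with N = 100.
v = cramers_v(50 / 3, 100, 2, 2)
print(v)  # about 0.41
```

For a 2×2 table, \(\min(r-1,c-1)=1\), so \(V\) reduces to \(\sqrt{\chi^2/N}\).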