“How to find degrees of freedom” is best answered by a single principle: degrees of freedom (df) count how many pieces of information can vary freely after constraints and estimated parameters are taken into account.
Core principle (constraint / parameter viewpoint)
Start with the number of data values (or cells) that could vary, then subtract:
- Constraints (equalities that must hold), and
- Estimated parameters that “use up” information (for example, estimating a mean or regression coefficients).
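The counting rule above can be sketched as a small Python helper. The function name `degrees_of_freedom` and its parameters are hypothetical, chosen only for illustration. Each restriction should be counted once, either as a constraint or as an estimated parameter, not both.

```python
def degrees_of_freedom(n_values: int, n_constraints: int = 0, n_estimated: int = 0) -> int:
    """Free pieces of information = values - constraints - estimated parameters.

    Illustrative helper only; it does no more than the subtraction described
    in the text and checks that the result is not negative.
    """
    df = n_values - n_constraints - n_estimated
    if df < 0:
        raise ValueError("more constraints/parameters than data values")
    return df


# One-sample t with n = 15: one mean is estimated.
print(degrees_of_freedom(15, n_estimated=1))      # 14

# Pooled two-sample t with n1 = 12, n2 = 10: two means are estimated.
print(degrees_of_freedom(12 + 10, n_estimated=2))  # 20
```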
Why \(n-1\) appears so often
Many procedures use deviations from a sample mean. The deviations \(x_i-\bar{x}\) are not all independent because they must sum to zero:
\[\sum_{i=1}^{n}(x_i-\bar{x})=0\]
That single constraint removes one freely varying deviation, giving \(df=n-1\) for one-sample t procedures and for the sample variance, which is why \(s^2\) divides by \(n-1\).
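A short numerical check can make the sum-to-zero constraint concrete. The data values below are made up for illustration. The sketch shows that once \(n-1\) deviations are known, the last one is forced.

```python
import math
from statistics import mean, variance

data = [4.0, 7.0, 1.0, 8.0]                # hypothetical sample, n = 4
x_bar = mean(data)                          # 5.0
deviations = [x - x_bar for x in data]      # [-1.0, 2.0, -4.0, 3.0]

# The deviations always sum to zero (up to floating-point rounding).
assert math.isclose(sum(deviations), 0.0, abs_tol=1e-12)

# Knowing the first n - 1 deviations determines the last one.
forced_last = -sum(deviations[:-1])
print(forced_last, deviations[-1])          # 3.0 3.0

# The sample variance divides the sum of squares by n - 1, not n.
sum_sq = sum(d * d for d in deviations)     # 30.0
print(sum_sq / (len(data) - 1), variance(data))  # 10.0 10.0
```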
Degrees of freedom formulas (most used cases)
| Procedure / distribution | Typical df | How to find degrees of freedom (reason) |
|---|---|---|
| One-sample t (mean, \(\sigma\) unknown) | \(df=n-1\) | Estimating \(\mu\) with \(\bar{x}\) imposes \(\sum(x_i-\bar{x})=0\): one constraint. |
| Paired t (mean of differences) | \(df=n-1\) | Convert to one sample on the differences \(d_i\), where \(n\) is the number of pairs; estimate the mean difference with \(\bar{d}\). |
| Two-sample t (equal variances pooled) | \(df=n_1+n_2-2\) | Two sample means estimated, giving \((n_1-1)+(n_2-1)\). |
| Two-sample t (unequal variances, Welch) | \(\nu\) (approx.) | Use the Welch–Satterthwaite approximation shown below. |
| Chi-square goodness-of-fit | \(df=k-1\) (often) | Counts across \(k\) categories sum to \(n\): one constraint; subtract more if parameters are estimated. |
| Chi-square independence / homogeneity | \(df=(r-1)(c-1)\) | Row and column totals constrain the \(r\times c\) cell counts. |
| One-way ANOVA (F test) | \(df_1=k-1,\; df_2=n-k\) | \(k\) group means estimated; total df \(=n-1\) splits into between and within parts. |
| Simple linear regression (error / residual) | \(df=n-2\) | Two parameters (\(\beta_0,\beta_1\)) estimated, leaving \(n-2\) residual df. |
| Multiple regression (error / residual) | \(df=n-p-1\) | \(p+1\) coefficients estimated (including intercept), leaving \(n-(p+1)\). |
Key formulas that answer “how to find degrees of freedom”
Welch–Satterthwaite df (two independent samples, unequal variances)
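If two samples have sizes \(n_1,n_2\) and sample variances \(s_1^2,s_2^2\), Welch’s t statistic uses an approximate df:
\[\nu\approx\frac{\left(\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1}+\dfrac{(s_2^2/n_2)^2}{n_2-1}}\]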
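A minimal Python sketch of the Welch–Satterthwaite approximation, using the standard form of the formula (written out in the docstring). The function name `welch_df` and the sample figures are assumptions for illustration. The result is generally not an integer and lies between \(\min(n_1-1,\,n_2-1)\) and \(n_1+n_2-2\).

```python
def welch_df(s1_sq: float, n1: int, s2_sq: float, n2: int) -> float:
    """Approximate df for Welch's t test.

    nu = (s1^2/n1 + s2^2/n2)^2
         / [ (s1^2/n1)^2 / (n1 - 1) + (s2^2/n2)^2 / (n2 - 1) ]
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least 2 observations")
    a = s1_sq / n1          # estimated variance of the first sample mean
    b = s2_sq / n2          # estimated variance of the second sample mean
    if a + b == 0:
        raise ValueError("both sample variances are zero")
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Equal variances and equal sizes: nu is about n1 + n2 - 2 = 18.
print(welch_df(4.0, 10, 4.0, 10))

# Very unequal variances and sizes: nu drops toward the smaller n - 1.
print(welch_df(9.0, 8, 1.0, 20))
```

Software usually reports the fractional value directly; when a printed t table is used, rounding \(\nu\) down is the conservative choice.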
Chi-square goodness-of-fit when parameters are estimated
If \(k\) categories are used but \(m\) parameters of the expected distribution are estimated from the same data (for example, estimating a probability from the sample), df is reduced:
\[df=k-1-m\]
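The adjustment for estimated parameters can be sketched in Python as follows. The name `gof_df` and the two scenarios are hypothetical examples, not taken from a particular dataset.

```python
def gof_df(k: int, m: int = 0) -> int:
    """df for a chi-square goodness-of-fit test.

    k: number of categories
    m: number of distribution parameters estimated from the same data
    """
    df = k - 1 - m
    if df < 1:
        raise ValueError("too few categories for the number of estimated parameters")
    return df


# Fair-die test: 6 categories, probabilities fully specified in advance.
print(gof_df(6))        # 5

# Poisson fit binned into 6 categories, with the rate estimated from the sample.
print(gof_df(6, m=1))   # 4
```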
Chi-square independence for an \(r\times c\) table
\[df=(r-1)(c-1)\]
Intuition: once the cells in the first \(r-1\) rows and first \(c-1\) columns are filled in, the remaining cells are forced by the marginal totals.
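The "forced by marginal totals" idea can be checked with a small sketch that rebuilds a full table from its free block and its margins. The function `complete_table` and the counts are invented for illustration. The sketch does not check that the forced cells are non-negative.

```python
def complete_table(free_cells, row_totals, col_totals):
    """Fill an r x c table from its (r-1) x (c-1) free block and its margins."""
    r, c = len(row_totals), len(col_totals)
    if sum(row_totals) != sum(col_totals):
        raise ValueError("row and column totals must have the same grand total")
    if len(free_cells) != r - 1 or any(len(row) != c - 1 for row in free_cells):
        raise ValueError("free block must have shape (r-1) x (c-1)")

    # The last cell of each of the first r-1 rows is forced by its row total.
    table = [list(row) + [row_totals[i] - sum(row)]
             for i, row in enumerate(free_cells)]
    # The whole last row is forced by the column totals.
    table.append([col_totals[j] - sum(table[i][j] for i in range(r - 1))
                  for j in range(c)])
    return table


# 2 x 3 table: (2-1)(3-1) = 2 free cells determine all 6 counts.
print(complete_table([[10, 20]], row_totals=[50, 50], col_totals=[30, 30, 40]))
# [[10, 20, 20], [20, 10, 20]]
```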
One-way ANOVA degrees of freedom split
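With \(k\) groups and total sample size \(n\):
\[df_1=df_{\text{between}}=k-1,\qquad df_2=df_{\text{within}}=n-k,\qquad df_{\text{total}}=n-1=df_1+df_2\]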
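A brief Python sketch of the split, assuming group sizes are supplied as a list. The name `anova_df` and the equal group sizes are illustrative assumptions. It confirms that the between and within parts add up to \(n-1\).

```python
def anova_df(group_sizes):
    """Return (df_between, df_within, df_total) for a one-way ANOVA."""
    k = len(group_sizes)
    n = sum(group_sizes)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    df_between = k - 1      # numerator df of the F statistic
    df_within = n - k       # denominator df of the F statistic
    df_total = n - 1
    assert df_between + df_within == df_total
    return df_between, df_within, df_total


# Four groups of 7 observations each (k = 4, n = 28).
print(anova_df([7, 7, 7, 7]))   # (3, 24, 27)
```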
Worked mini-examples (quick practice)
Example 1: One-sample t
A sample of \(n=15\) observations is used to test a mean with \(\sigma\) unknown. Degrees of freedom: \(df=n-1=14\).
Example 2: Chi-square independence
A contingency table has \(r=3\) rows and \(c=4\) columns. Degrees of freedom: \(df=(3-1)(4-1)=2\cdot 3=6\).
Example 3: One-way ANOVA
Four groups (\(k=4\)) have a total of \(n=28\) observations. Degrees of freedom: \(df_1=k-1=3\) and \(df_2=n-k=24\).
Common mistakes to avoid
- Confusing \(n\) with \(k\): in goodness-of-fit, \(k\) is the number of categories, not sample size.
- Forgetting parameter estimation: if expected proportions are fitted from the sample, df decreases (use \(k-1-m\)).
- Wrong chi-square df: independence uses \((r-1)(c-1)\), not \(rc-1\).
- Regression df: residual df subtracts the number of estimated coefficients (including the intercept).
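The regression point in the list above can be sketched as a small helper. The name `residual_df` and its `intercept` flag are hypothetical. The flag only makes explicit that the intercept counts as one estimated coefficient.

```python
def residual_df(n: int, p: int, intercept: bool = True) -> int:
    """Residual (error) df for a linear regression with p predictors."""
    n_coefficients = p + (1 if intercept else 0)
    df = n - n_coefficients
    if df <= 0:
        raise ValueError("not enough observations for the number of coefficients")
    return df


print(residual_df(30, 1))   # 28  simple linear regression: n - 2
print(residual_df(30, 3))   # 26  multiple regression: n - p - 1
```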