Meaning of variance
The variance formula measures numerical spread by averaging squared deviations from the mean. Squaring makes positive and negative deviations contribute equally and gives extra weight to values farther from the mean.
Units: variance is measured in squared units (for example, \(\text{cm}^2\) if the data are in cm). Standard deviation is the square root of variance and returns to the original units.
Core formulas (population and sample)
| Setting | Notation | Variance formula | Interpretation |
|---|---|---|---|
| Population (all values) | \(x_1, x_2, \dots, x_N\), mean \( \mu \) | \(\sigma^2 = \dfrac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2\) | Average squared deviation across the entire population. |
| Sample (subset, unbiased) | \(x_1, x_2, \dots, x_n\), mean \( \bar{x} \) | \(s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2\) | Unbiased estimator of \(\sigma^2\) under random sampling. |
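The two definitions in the table can be sketched directly in code. This is a minimal illustration, not a reference implementation; the function names and the `heights_cm` data are hypothetical and chosen only for the example.

```python
def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Unbiased sample variance, dividing by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance requires at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Hypothetical measurements in cm; the results are in squared cm.
heights_cm = [150.0, 160.0, 170.0]
print(population_variance(heights_cm))  # 200 / 3
print(sample_variance(heights_cm))      # 200 / 2 = 100.0
```

The only difference between the two functions is the divisor, which mirrors the two rows of the table.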
Computational (shortcut) forms
The variance formula can be rewritten in a form that avoids subtracting the mean from every value:
\[ \sigma^2 = \left(\frac{1}{N}\sum_{i=1}^{N}x_i^2\right) - \mu^2 \]
\[ s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n}x_i^2 - n\bar{x}^2\right) \]
Both forms are algebraically equivalent to averaging squared deviations; the difference lies only in how the arithmetic is organized.
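As a rough check of that equivalence, the sketch below evaluates the shortcut forms on the data set \(2, 4, 4, 4, 6\) from the worked example in this section. The function names are hypothetical. Results are compared with a tolerance because floating-point arithmetic can differ in the last digits.

```python
import math


def population_variance_shortcut(values):
    """Mean of the squares minus the square of the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(x * x for x in values) / n - mean ** 2


def sample_variance_shortcut(values):
    """(Sum of squares minus n times the squared mean) divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return (sum(x * x for x in values) - n * mean ** 2) / (n - 1)


data = [2, 4, 4, 4, 6]
print(math.isclose(population_variance_shortcut(data), 1.6))  # True
print(math.isclose(sample_variance_shortcut(data), 2.0))      # True
```

One caveat worth keeping in mind: when the mean is large relative to the spread, the shortcut form subtracts two nearly equal numbers and can lose precision in floating point, so the deviation form is usually the safer choice in software.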
Worked example with interpretation
Data (ungrouped): \(2, 4, 4, 4, 6\). The mean is \( \bar{x} = \dfrac{2+4+4+4+6}{5} = \dfrac{20}{5} = 4 \).
\[ \sum (x_i-\bar{x})^2 = (2-4)^2 + (4-4)^2 + (4-4)^2 + (4-4)^2 + (6-4)^2 \]
\[ \sum (x_i-\bar{x})^2 = 4 + 0 + 0 + 0 + 4 = 8 \]
\[ \sigma^2 = \frac{8}{5} = 1.6 \qquad\text{and}\qquad s^2 = \frac{8}{4} = 2 \]
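For the same numbers, the sample variance \(s^2\) exceeds the population variance \(\sigma^2\) because the same sum of squared deviations is divided by \(n-1\) instead of \(N\). The smaller divisor compensates for measuring deviations from the sample mean \(\bar{x}\), which typically sits closer to the data than the true mean \(\mu\) does.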
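The worked example can be cross-checked with Python's standard `statistics` module, where `pvariance` divides by \(N\) and `variance` divides by \(n-1\). This is only a verification sketch; the variable names are illustrative.

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 6.0]

pop_var = statistics.pvariance(data)  # population variance, divides by N
samp_var = statistics.variance(data)  # sample variance, divides by n - 1

# Standard deviations return to the original units of the data.
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

print(pop_var, samp_var)
print(pop_sd, samp_sd)
```

The two variances should agree with the hand calculation (\(1.6\) and \(2\)) up to floating-point rounding.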
[Figure: visualization of squared deviations]
Grouped-data adaptation (frequency tables)
When values are presented with frequencies (or grouped into classes), the same variance concept is applied using representative values and weights. For a frequency table with distinct values \(x_j\) and frequencies \(f_j\), the total count is \(n=\sum f_j\) and the mean is \( \bar{x}=\dfrac{\sum f_j x_j}{n} \).
\[ s^2 = \frac{1}{n-1}\sum_{j} f_j\,(x_j-\bar{x})^2 \]
For grouped classes, the representative value is often the class midpoint; the resulting variance approximates the variance of the original raw data.
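The frequency-weighted formula might be sketched as follows. The function name, the class boundaries, and the frequencies in the second example are hypothetical; the first example simply restates the data \(2, 4, 4, 4, 6\) as a frequency table.

```python
def grouped_sample_variance(values, freqs):
    """Sample variance from representative values x_j and frequencies f_j."""
    n = sum(freqs)
    if n < 2:
        raise ValueError("sample variance requires a total count of at least two")
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    weighted_ss = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return weighted_ss / (n - 1)


# The raw data 2, 4, 4, 4, 6 written as a frequency table.
print(grouped_sample_variance([2, 4, 6], [1, 3, 1]))  # 2.0

# Hypothetical grouped classes, each represented by its midpoint.
classes = [(0, 10), (10, 20), (20, 30)]
class_freqs = [2, 5, 3]
midpoints = [(low + high) / 2 for low, high in classes]
print(grouped_sample_variance(midpoints, class_freqs))  # 490 / 9
```

With distinct values the result matches the raw-data calculation exactly; with class midpoints it is an approximation, since every observation in a class is treated as if it sat at the midpoint.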
Common pitfalls
- Squared units are expected; standard deviation \(s=\sqrt{s^2}\) restores the original units.
- Denominator \(N\) versus \(n-1\): dividing by the full count gives the population variance, while dividing by \(n-1\) gives the unbiased sample variance; using the wrong denominator changes the numerical value.
- Rounding the mean too early can shift the sum of squared deviations; higher precision in intermediate calculations reduces rounding error.
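The rounding pitfall can be illustrated with a small sketch. The data below are hypothetical. Because the sum of squared deviations is smallest when taken about the exact mean, using a rounded mean \(c\) inflates it by \(n(\bar{x}-c)^2\).

```python
import math

data = [1.2, 2.7, 3.1]  # hypothetical values
n = len(data)

exact_mean = sum(data) / n          # 7.0 / 3, about 2.3333
rounded_mean = round(exact_mean, 1)  # mean rounded to one decimal place

ss_exact = sum((x - exact_mean) ** 2 for x in data)
ss_rounded = sum((x - rounded_mean) ** 2 for x in data)

# The inflation equals n times the squared rounding error in the mean.
inflation = n * (exact_mean - rounded_mean) ** 2

print(ss_exact, ss_rounded)
print(math.isclose(ss_rounded - ss_exact, inflation))  # True
```

The effect is small here, but it grows with the number of observations and with the size of the rounding error, which is why carrying extra digits in the mean is recommended.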