Mean, Variance, and Standard Deviation for Grouped Data
When data are given in a frequency table with classes
(intervals) and frequencies, we can no longer see every individual value.
Instead, we treat each class as if all observations in that class are
concentrated at the class midpoint.
Class midpoint and total frequency
For a class with lower limit \(L\) and upper limit \(U\), the
midpoint is
\[
m = \frac{L + U}{2}.
\]
If \(m_i\) is the midpoint of the \(i\)-th class and \(f_i\) is its frequency,
then
\[
N = \sum f_i
\]
is the total number of observations in the data set.
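As a minimal sketch of these two definitions, the snippet below computes the midpoints and the total frequency for a small frequency table. The class limits and frequencies are made-up illustrative values, not data from the text, and the variable names are arbitrary.

```python
# Hypothetical frequency table: (lower limit L, upper limit U, frequency f)
classes = [(0, 10, 2), (10, 20, 5), (20, 30, 8), (30, 40, 5)]

# Midpoint of each class: m = (L + U) / 2
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]

# Total frequency: N = sum of the class frequencies
frequencies = [f for _, _, f in classes]
total = sum(frequencies)

print(midpoints)  # [5.0, 15.0, 25.0, 35.0]
print(total)      # 20
```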
Grouped-data mean
The mean for grouped data is found by multiplying each
midpoint by its frequency, adding these products, and dividing by the total
frequency:
\[
\bar{x} = \frac{\sum m_i f_i}{N}.
\]
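This value is an approximation: it treats every observation in a class as
equal to the midpoint \(m_i\), so it matches the mean of the raw data only
when the observations in each class average out to that midpoint.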
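The grouped-data mean can be sketched as a short function. The name `grouped_mean` and the sample midpoints and frequencies are assumptions chosen for illustration.

```python
def grouped_mean(midpoints, frequencies):
    """Approximate mean of grouped data: sum(m_i * f_i) / N."""
    n = sum(frequencies)  # total frequency N
    weighted_sum = sum(m * f for m, f in zip(midpoints, frequencies))
    return weighted_sum / n


# Hypothetical midpoints and frequencies
print(grouped_mean([5, 15, 25, 35], [2, 5, 8, 5]))  # 23.0
```

Here the products \(m_i f_i\) are 10, 75, 200, and 175, which add to 460, and dividing by \(N = 20\) gives 23.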
Shortcut formulas for variance and standard deviation
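To measure the spread of grouped data we use the variance and standard
deviation. With midpoints \(m_i\), frequencies \(f_i\), and total frequency
\(N = \sum f_i\), the shortcut formulas are:
\[
\sigma^2 = \frac{\sum m_i^2 f_i - \frac{\left(\sum m_i f_i\right)^2}{N}}{N},
\qquad
\sigma = \sqrt{\sigma^2},
\]
\[
s^2 = \frac{\sum m_i^2 f_i - \frac{\left(\sum m_i f_i\right)^2}{N}}{N - 1},
\qquad
s = \sqrt{s^2}.
\]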
Here \(\sigma^2\) and \(\sigma\) are the population variance and population
standard deviation, while \(s^2\) and \(s\) are the sample variance and
sample standard deviation. The grouped-data values are again approximate
because they are based on class midpoints, not on every individual
observation.
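A sketch of the variance and standard deviation calculation follows. It assumes the common computational form in which the quantity \(\sum m_i^2 f_i - (\sum m_i f_i)^2 / N\) is divided by \(N\) for a population and by \(N - 1\) for a sample. The function names and the `sample` flag are hypothetical.

```python
import math


def grouped_variance(midpoints, frequencies, sample=False):
    """Approximate variance of grouped data from class midpoints."""
    n = sum(frequencies)
    sum_mf = sum(m * f for m, f in zip(midpoints, frequencies))
    sum_m2f = sum(m * m * f for m, f in zip(midpoints, frequencies))
    # Sum of squared deviations, written in shortcut form
    ss = sum_m2f - sum_mf ** 2 / n
    # Divide by N for a population, N - 1 for a sample
    return ss / (n - 1 if sample else n)


def grouped_std(midpoints, frequencies, sample=False):
    """Standard deviation: square root of the grouped variance."""
    return math.sqrt(grouped_variance(midpoints, frequencies, sample))


m = [5, 15, 25, 35]   # hypothetical midpoints
f = [2, 5, 8, 5]      # hypothetical frequencies
print(grouped_variance(m, f))               # 86.0
print(grouped_variance(m, f, sample=True))  # about 90.53
```

The sample variance is slightly larger than the population variance because the same sum of squares is divided by \(N - 1\) instead of \(N\).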
Typical procedure
- List each class with its midpoint \(m_i\) and frequency \(f_i\).
- Compute the columns \(m_i f_i\) and \(m_i^2 f_i\).
- Find the totals \(\sum f_i\), \(\sum m_i f_i\), and \(\sum m_i^2 f_i\).
- Use these sums in the formulas above to obtain the mean \(\bar{x}\), the
variance (\(\sigma^2\) for a population, \(s^2\) for a sample), and the
standard deviation (\(\sigma\) or \(s\)).
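The procedure above can be traced end to end in a few lines. This is only a sketch on a made-up table of midpoints and frequencies. It assumes the shortcut form in which \(\sum m_i^2 f_i - (\sum m_i f_i)^2 / N\) is divided by \(N\) or \(N - 1\), and all variable names are illustrative.

```python
import math

# Hypothetical table: (midpoint m_i, frequency f_i) for each class
table = [(5, 2), (15, 5), (25, 8), (35, 5)]

# Build the m*f and m^2*f columns and print one row per class
rows = [(m, f, m * f, m * m * f) for m, f in table]
print(f"{'m':>4} {'f':>4} {'m*f':>6} {'m^2*f':>8}")
for m, f, mf, m2f in rows:
    print(f"{m:>4} {f:>4} {mf:>6} {m2f:>8}")

# Column totals: sum f, sum m*f, sum m^2*f
n = sum(row[1] for row in rows)
sum_mf = sum(row[2] for row in rows)
sum_m2f = sum(row[3] for row in rows)

# Plug the totals into the formulas
mean = sum_mf / n
ss = sum_m2f - sum_mf ** 2 / n
pop_var, samp_var = ss / n, ss / (n - 1)
pop_sd, samp_sd = math.sqrt(pop_var), math.sqrt(samp_var)

print(n, sum_mf, sum_m2f)  # 20 460 12300
print(mean, pop_var)       # 23.0 86.0
```

For this table the population standard deviation is \(\sqrt{86} \approx 9.27\) and the sample standard deviation is \(\sqrt{1720/19} \approx 9.51\).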