A five number summary reports five key order statistics—minimum, \(Q_1\), median, \(Q_3\), and maximum—capturing center, spread, and skewness and providing the numeric backbone of a boxplot.
Definition of the five number summary
For a numerical data set, sort the observations from smallest to largest. The five number summary consists of:
| Symbol | Name | Meaning in the ordered data |
|---|---|---|
| min | Minimum | Smallest observed value |
| \(Q_1\) | First quartile (lower quartile) | 25th percentile; about 25% of values are at or below \(Q_1\) |
| median | Second quartile | 50th percentile; splits the data into two halves |
| \(Q_3\) | Third quartile (upper quartile) | 75th percentile; about 75% of values are at or below \(Q_3\) |
| max | Maximum | Largest observed value |
The interquartile range (IQR) is computed from the five number summary: \[ \mathrm{IQR}=Q_3-Q_1. \] The IQR measures the spread of the middle 50% of the data.
Quartile conventions and a consistent computation rule
Quartiles depend on how the ordered data are split into lower and upper halves. Several conventions exist, and they disagree most visibly when the sample size is odd, because the median must then be either included in or excluded from the two halves. A consistent, widely used rule in introductory statistics is:
The median is the middle value of the ordered list (or the average of the two middle values when the sample size is even). The lower half consists of the values positioned before the median in the ordered list, and the upper half consists of the values positioned after it. \(Q_1\) is the median of the lower half, and \(Q_3\) is the median of the upper half. When the sample size \(n\) is even, the lower and upper halves each contain exactly \(n/2\) values; when \(n\) is odd, the median is excluded from both halves under this rule, so each half contains \((n-1)/2\) values.
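The rule above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `five_number_summary` is hypothetical, and the code assumes the median-of-halves convention described here (median excluded from both halves when the sample size is odd).

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)              # never assume the input is already ordered
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = data[:half]                # values positioned before the median
    upper = data[half + n % 2:]        # skips the middle value when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Even sample size: each half holds n/2 = 5 values.
print(five_number_summary([4, 6, 7, 8, 10, 12, 13, 18, 22, 24]))
# -> (4, 7, 11.0, 18, 24)

# Odd sample size: the median 5 is excluded from both halves.
print(five_number_summary([1, 3, 5, 7, 9]))
# -> (1, 2.0, 5, 8.0, 9)
```

Sorting inside the function guards against the ordering mistakes that most often corrupt a hand-computed summary.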
Visualization: boxplot anatomy from the five number summary
Worked example with explicit quartiles
Consider the ordered data set (assumed already sorted): 4, 6, 7, 8, 10, 12, 13, 18, 22, 24. The sample size is \(n=10\).
The median is the average of the 5th and 6th values: \[ \text{median}=\frac{10+12}{2}=11. \] The lower half is 4, 6, 7, 8, 10, so \(Q_1\) is its median, \(Q_1=7\). The upper half is 12, 13, 18, 22, 24, so \(Q_3=18\). The minimum is \(4\) and the maximum is \(24\).
| Component | Value | How it is located in the ordered list |
|---|---|---|
| Minimum | 4 | Smallest observation |
| \(Q_1\) | 7 | Median of lower half (4, 6, 7, 8, 10) |
| Median | 11 | Average of 5th and 6th values (10 and 12) |
| \(Q_3\) | 18 | Median of upper half (12, 13, 18, 22, 24) |
| Maximum | 24 | Largest observation |
The interquartile range is \[ \mathrm{IQR}=Q_3-Q_1=18-7=11, \] describing the spread of the middle 50% of values.
Outlier fences from the five number summary
A common outlier screen (Tukey rule) uses the IQR to create lower and upper fences: \[ \text{lower fence}=Q_1-1.5\,\mathrm{IQR},\qquad \text{upper fence}=Q_3+1.5\,\mathrm{IQR}. \] Observations below the lower fence or above the upper fence are flagged as potential outliers.
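As a quick check, the fences can be computed for the ten-value data set of the worked example. The sketch below takes \(Q_1=7\) and \(Q_3=18\) as given; the helper name `tukey_fences` and its parameter `k` are illustrative choices, not part of any standard library.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the Tukey rule with multiplier k."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


data = [4, 6, 7, 8, 10, 12, 13, 18, 22, 24]
q1, q3 = 7, 18                         # quartiles from the worked example

lower_fence, upper_fence = tukey_fences(q1, q3)
flagged = [x for x in data if x < lower_fence or x > upper_fence]

print(lower_fence, upper_fence)        # -> -9.5 34.5
print(flagged)                         # -> []
```

Every observation lies between \(-9.5\) and \(34.5\), so the rule flags no potential outliers in this data set.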
Interpretation of shape using the five numbers
Skewness is often visible through unequal gaps between consecutive summary values: \((Q_1-\min)\), \((\text{median}-Q_1)\), \((Q_3-\text{median})\), and \((\max-Q_3)\). A long upper tail (\(\max-Q_3\) large relative to \(Q_1-\min\)) suggests right skew; a long lower tail (\(Q_1-\min\) large relative to \(\max-Q_3\)) suggests left skew.
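The four gaps are simple differences, so they are easy to tabulate. The sketch below uses the five numbers from the worked example (4, 7, 11, 18, 24). The helper names `quartile_gaps` and `tail_hint` are hypothetical, and comparing only the two outer gaps is a rough heuristic assumed for illustration, not a formal test of skewness.

```python
def quartile_gaps(minimum, q1, med, q3, maximum):
    """Return the four gaps between consecutive five-number-summary values."""
    return {
        "Q1 - min": q1 - minimum,
        "median - Q1": med - q1,
        "Q3 - median": q3 - med,
        "max - Q3": maximum - q3,
    }


def tail_hint(gaps):
    """Rough heuristic: compare the upper tail gap with the lower tail gap."""
    upper, lower = gaps["max - Q3"], gaps["Q1 - min"]
    if upper > lower:
        return "right"
    if upper < lower:
        return "left"
    return "none"


gaps = quartile_gaps(4, 7, 11, 18, 24)
print(gaps)             # -> {'Q1 - min': 3, 'median - Q1': 4, 'Q3 - median': 7, 'max - Q3': 6}
print(tail_hint(gaps))  # -> right
```

Here both upper gaps (7 and 6) exceed both lower gaps (4 and 3), which is consistent with a mild right skew.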
Common pitfalls
Quartiles can differ slightly across software packages because multiple quartile definitions exist; the five number summary remains interpretable as long as one consistent definition is used within a course or analysis. Sorting errors and mixing different units in the same list are frequent practical causes of incorrect five number summaries.
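The size of these differences can be seen with Python's standard library, whose `statistics.quantiles` function offers two interpolating definitions (`"exclusive"` and `"inclusive"`). Neither is the median-of-halves rule used in this section, so the sketch below is only meant to show how the same ten values produce three slightly different pairs of quartiles.

```python
from statistics import quantiles

data = [4, 6, 7, 8, 10, 12, 13, 18, 22, 24]

# n=4 asks for the three cut points that split the data into quarters.
exclusive = quantiles(data, n=4, method="exclusive")
inclusive = quantiles(data, n=4, method="inclusive")

print(exclusive)   # -> [6.75, 11.0, 19.0]
print(inclusive)   # -> [7.25, 11.0, 16.75]
# Median-of-halves rule from this section: Q1 = 7, median = 11, Q3 = 18
```

All three definitions agree on the median here, while \(Q_1\) ranges from 6.75 to 7.25 and \(Q_3\) from 16.75 to 19, which is why a single definition should be fixed before summaries are compared.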