Box and plot graph
“Box and plot graph” is a common name for the box plot (box-and-whisker plot). The display summarizes a quantitative dataset through its quartiles and median, with whiskers indicating the spread of typical values and optional points marking outliers.
Five-number summary and the box structure
A box plot is anchored by the five-number summary: minimum, first quartile \(Q_1\), median, third quartile \(Q_3\), and maximum. Many modern (modified) box plots draw the whiskers to the most extreme non-outlier values instead of the minimum and maximum.
The central box spans \(Q_1\) to \(Q_3\). The line inside the box marks the median. The length of the box along the data axis equals the interquartile range:
\[ \text{IQR} = Q_3 - Q_1 \]
Quartiles and interquartile range
- Median
- The middle of the ordered data (or the average of the two middle values when the sample size is even).
- First quartile \(Q_1\)
- The median of the lower half of the ordered data (definition varies slightly by convention; the idea remains “25th percentile”).
- Third quartile \(Q_3\)
- The median of the upper half of the ordered data (approximately the 75th percentile).
- Interquartile range (IQR)
- The spread of the middle 50% of observations, \(Q_3 - Q_1\). Unlike the full range, the IQR is resistant to extreme values.
- Skewness cues
- A median closer to \(Q_1\) with a longer upper whisker suggests right-skew; a median closer to \(Q_3\) with a longer lower whisker suggests left-skew.
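The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `quartiles_median_of_halves` is hypothetical, and for odd sample sizes it assumes the middle value is left out of both halves, which is only one of the variants in use.

```python
from statistics import median


def quartiles_median_of_halves(values):
    """Return (Q1, median, Q3) using the "median of halves" convention.

    Assumption: for an odd sample size the middle value is excluded
    from both halves (other conventions include it in both).
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = data[:half]
    # Skip the middle element when n is odd.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)


q1, med, q3 = quartiles_median_of_halves([1, 3, 4, 7, 9, 12, 15, 20])
iqr = q3 - q1
print(q1, med, q3, iqr)  # prints: 3.5 8.0 13.5 10.0
```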
Whiskers and outlier fences
A widely used outlier rule defines “fences” based on IQR. Values beyond the fences are plotted as individual outlier points, and whiskers extend to the most extreme values within the fences.
\[ \text{Lower fence} = Q_1 - 1.5 \cdot \text{IQR} \qquad \text{Upper fence} = Q_3 + 1.5 \cdot \text{IQR} \]
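As a rough sketch of how the fence rule separates whisker endpoints from outliers, the following Python uses two hypothetical helpers, `iqr_fences` and `whiskers_and_outliers`. The multiplier `k` defaults to the 1.5 used above, and the quartiles and sample values in the usage lines are made up for illustration.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def whiskers_and_outliers(values, q1, q3, k=1.5):
    """Return ((low_whisker, high_whisker), outliers).

    Whiskers end at the most extreme values inside the fences;
    anything outside the fences is reported as an outlier.
    """
    lower, upper = iqr_fences(q1, q3, k)
    inside = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return (min(inside), max(inside)), outliers


# Illustrative values only: Q1 = 10 and Q3 = 20 give fences at -5 and 35.
whiskers, outliers = whiskers_and_outliers([2, 10, 15, 20, 34, 40], q1=10, q3=20)
print(whiskers, outliers)  # prints: (2, 34) [40]
```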
Worked example with numerical values
Consider the ordered dataset (n = 16): 52, 55, 57, 60, 61, 63, 65, 66, 68, 70, 72, 73, 75, 78, 84, 95. Quartiles follow the common “median of halves” convention.
| Quantity | Meaning | Value |
|---|---|---|
| \(Q_1\) | First quartile (median of lower 8 values) | \(\frac{60 + 61}{2} = 60.5\) |
| Median | Middle of all 16 values | \(\frac{66 + 68}{2} = 67\) |
| \(Q_3\) | Third quartile (median of upper 8 values) | \(\frac{73 + 75}{2} = 74\) |
| IQR | Middle-50% spread | \(74 - 60.5 = 13.5\) |
| Fences | Outlier thresholds | Lower: \(60.5 - 1.5 \cdot 13.5 = 40.25\); Upper: \(74 + 1.5 \cdot 13.5 = 94.25\) |
| Whiskers (modified) | Most extreme non-outlier values | Lower whisker: 52; Upper whisker: 84 |
| Outliers | Values beyond the fences | 95 (above 94.25) |
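The table values can be checked with a short script. This is only a verification sketch using Python's standard `statistics.median`; the variable names are illustrative, and the fixed slices `data[:8]` and `data[8:]` rely on this particular dataset having 16 ordered values.

```python
from statistics import median

data = [52, 55, 57, 60, 61, 63, 65, 66, 68, 70, 72, 73, 75, 78, 84, 95]

# Median-of-halves quartiles for n = 16: each half has 8 values.
q1 = median(data[:8])    # (60 + 61) / 2
med = median(data)       # (66 + 68) / 2
q3 = median(data[8:])    # (73 + 75) / 2
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

non_outliers = [x for x in data if lower_fence <= x <= upper_fence]
outliers = [x for x in data if x < lower_fence or x > upper_fence]
whisker_low, whisker_high = min(non_outliers), max(non_outliers)

print(q1, med, q3, iqr)            # prints: 60.5 67.0 74.0 13.5
print(lower_fence, upper_fence)    # prints: 40.25 94.25
print(whisker_low, whisker_high, outliers)  # prints: 52 84 [95]
```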
Visualization: box plot graph with quartiles, whiskers, and an outlier
Interpretation of a box plot
A box plot emphasizes center, spread, and unusual values while suppressing fine detail. The middle 50% of observations lie inside the box, and the median line gives a resistant measure of center.
Common pitfalls
- Quartile convention differences
- Several accepted definitions exist for \(Q_1\) and \(Q_3\) in finite samples; small changes in quartiles produce small changes in IQR and fences, so a value close to a fence can be flagged as an outlier under one convention and not under another. Consistency within a course or software environment matters more than a single convention.
- Whiskers interpreted as minimum and maximum
- Modified box plots use whiskers for the most extreme non-outlier values; outliers appear as separate points beyond the whiskers.
- Box plot viewed as a histogram
- Box plots do not display frequencies within the quartile ranges. Each quartile region holds roughly a quarter of the observations regardless of its length, so a longer region indicates more spread-out values, not more of them.
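To make the first pitfall concrete, the sketch below compares three quartile conventions on the 16-value dataset from the worked example: median of halves, and the `"exclusive"` and `"inclusive"` methods of `statistics.quantiles` (available in Python 3.8 and later). The helper `fences` and the dictionary names are illustrative choices, not part of any library.

```python
from statistics import median, quantiles

data = [52, 55, 57, 60, 61, 63, 65, 66, 68, 70, 72, 73, 75, 78, 84, 95]


def fences(q1, q3):
    """Return the 1.5 * IQR lower and upper fences."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# (Q1, Q3) under each convention; quantiles() returns [Q1, median, Q3].
conventions = {
    "median of halves": (median(data[:8]), median(data[8:])),
    "exclusive": tuple(quantiles(data, n=4, method="exclusive")[::2]),
    "inclusive": tuple(quantiles(data, n=4, method="inclusive")[::2]),
}

flagged = {}
for name, (q1, q3) in conventions.items():
    low, high = fences(q1, q3)
    flagged[name] = [x for x in data if x < low or x > high]
    print(f"{name}: Q1={q1}, Q3={q3}, fences=({low}, {high}), "
          f"outliers={flagged[name]}")
```

On this dataset the value 95 is flagged under the median-of-halves and inclusive conventions but not under the exclusive one, whose upper fence works out to 95.875.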
A box plot graph is a compact descriptive summary built from quartiles and IQR, suitable for comparing distributions across groups on a common scale.