Interquartile range (IQR)
In descriptive statistics, the interquartile range (IQR) is the width of the middle half of a dataset once the observations are arranged in order. It uses quartiles as reference points and is widely used in box-and-whisker plots and outlier screening.
The interquartile range is defined by \[ \mathrm{IQR} = Q_3 - Q_1, \] where \(Q_1\) is the lower quartile (25th percentile) and \(Q_3\) is the upper quartile (75th percentile).
The IQR measures spread in a way that is resistant to extreme values because it ignores the lowest 25% and highest 25% of the data.
Quartiles and the ordered data view
Let the ordered observations be \(x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}\). The median is commonly denoted \(Q_2\). The lower quartile \(Q_1\) and upper quartile \(Q_3\) locate the 25% and 75% positions in the ordered list, respectively, so that approximately 25% of observations lie at or below \(Q_1\) and approximately 75% lie at or below \(Q_3\).
Multiple quartile conventions exist in textbooks and software. A widely taught convention for hand calculation treats \(Q_1\) as the median of the lower half of the ordered data and \(Q_3\) as the median of the upper half; when \(n\) is odd, the overall median \(Q_2\) is not included in either half. Tukey's hinges, the basis of Tukey-style box plots, are a close variant that instead includes the median in both halves when \(n\) is odd.
Other conventions compute quartiles as the 25th and 75th percentiles using interpolation between adjacent order statistics, especially when the desired percentile position falls between integers. The IQR definition \(\mathrm{IQR} = Q_3 - Q_1\) remains unchanged, while numeric values may shift slightly across conventions.
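The two families of conventions described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the sample data are hypothetical, and the interpolation position \((n-1)p\) is only one of several positions used in practice.

```python
from statistics import median


def quartiles_median_of_halves(values):
    """Q1 and Q3 as medians of the lower and upper halves.

    When n is odd, the middle observation is excluded from both halves.
    """
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[len(xs) - half:]
    return median(lower), median(upper)


def quartiles_interpolated(values):
    """Q1 and Q3 by linear interpolation between adjacent order statistics.

    Assumption: the percentile position is (n - 1) * p, counted from zero.
    """
    xs = sorted(values)

    def at(p):
        pos = (len(xs) - 1) * p
        lo = int(pos)                      # index of the order statistic below
        hi = min(lo + 1, len(xs) - 1)      # index of the order statistic above
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    return at(0.25), at(0.75)


sample = [2, 4, 6, 8, 10, 12, 14, 16]      # hypothetical illustrative data
print(quartiles_median_of_halves(sample))  # (5.0, 13.0) -> IQR 8.0
print(quartiles_interpolated(sample))      # (5.5, 12.5) -> IQR 7.0
```

Both functions apply the same definition \(\mathrm{IQR} = Q_3 - Q_1\); only the rule for locating \(Q_1\) and \(Q_3\) differs.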
Worked example with a clean five-number summary
Consider the dataset (already in ascending order): \[ 4,\ 7,\ 7,\ 9,\ 10,\ 12,\ 13,\ 15,\ 18,\ 20,\ 21. \] Here \(n = 11\), so the median is the 6th value: \(Q_2 = 12\).
Excluding the median, the lower half \(4, 7, 7, 9, 10\) has five values, so its median is the 3rd: \(Q_1 = 7\). Likewise, the upper half \(13, 15, 18, 20, 21\) gives \(Q_3 = 18\).
\[ \mathrm{IQR} = Q_3 - Q_1 = 18 - 7 = 11. \]
| Statistic | Symbol | Value |
|---|---|---|
| Minimum | min | 4 |
| Lower quartile | Q1 | 7 |
| Median | Q2 | 12 |
| Upper quartile | Q3 | 18 |
| Maximum | max | 21 |
| Interquartile range | IQR | 11 |
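
The summary in the table can be reproduced with Python's standard library. The sketch below assumes the median-of-halves convention used in this example (median excluded from both halves when \(n\) is odd); the variable names are illustrative.

```python
from statistics import median

data = [4, 7, 7, 9, 10, 12, 13, 15, 18, 20, 21]
xs = sorted(data)
half = len(xs) // 2  # 5 values on each side of the median when n = 11

summary = {
    "min": xs[0],
    "Q1": median(xs[:half]),            # median of the lower half
    "Q2": median(xs),                   # overall median
    "Q3": median(xs[len(xs) - half:]),  # median of the upper half
    "max": xs[-1],
}
summary["IQR"] = summary["Q3"] - summary["Q1"]

print(summary)
# {'min': 4, 'Q1': 7, 'Q2': 12, 'Q3': 18, 'max': 21, 'IQR': 11}
```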
IQR and outlier screening (Tukey fences)
A common rule for flagging potential outliers uses inner fences based on the interquartile range: \[ \text{Lower fence} = Q_1 - 1.5\cdot \mathrm{IQR}, \qquad \text{Upper fence} = Q_3 + 1.5\cdot \mathrm{IQR}. \] Values outside these fences are flagged as potential outliers in exploratory analysis, and many textbooks draw box-plot whiskers to the most extreme observations that still lie within the fences.
For the example above, \(\mathrm{IQR}=11\), so \[ Q_1 - 1.5\cdot \mathrm{IQR} = 7 - 1.5\cdot 11 = 7 - 16.5 = -9.5, \qquad Q_3 + 1.5\cdot \mathrm{IQR} = 18 + 16.5 = 34.5. \] All observations \(4\) through \(21\) fall within \([-9.5, 34.5]\), so no outliers are flagged by this rule.
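The fence calculation is easy to automate. The following sketch takes \(Q_1\) and \(Q_3\) as inputs, so it works with whichever quartile convention is in use; the function names and the extra reading of 40 are hypothetical and serve only to show a value being flagged.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for multiplier k."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, q1, q3, k=1.5):
    """Return the values lying strictly outside the fences."""
    lower, upper = tukey_fences(q1, q3, k)
    return [x for x in values if x < lower or x > upper]


data = [4, 7, 7, 9, 10, 12, 13, 15, 18, 20, 21]

print(tukey_fences(7, 18))         # (-9.5, 34.5)
print(flag_outliers(data, 7, 18))  # []

# Hypothetical new reading checked against the same fences
# (quartiles held fixed at Q1 = 7, Q3 = 18 for illustration).
print(flag_outliers([40], 7, 18))  # [40]
```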
Interpretation and statistical role
- Robustness: unlike the range, which is determined entirely by the minimum and maximum, the IQR is little affected by a few extreme values.
- Comparability: IQR supports comparisons of variability across groups with different centers.
- Graphical linkage: IQR is the box length in a box-and-whisker plot, connecting numeric summaries to visualization.
Common sources of disagreement in results
Differences in quartile conventions explain many mismatches between hand calculations and software output. The IQR depends on \(Q_1\) and \(Q_3\), and those quartiles can be defined by “median-of-halves” rules or by percentile interpolation rules. Reporting the chosen convention (or the software’s quartile method) prevents ambiguity, especially for small samples.
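As an illustration of such a mismatch, Python's standard-library `statistics.quantiles` offers two methods, `"exclusive"` (the default) and `"inclusive"`, which give different quartiles for the eleven-value dataset from the worked example. On this particular dataset the `"exclusive"` result happens to coincide with the hand calculation; that agreement should not be assumed for other sample sizes.

```python
from statistics import quantiles

data = [4, 7, 7, 9, 10, 12, 13, 15, 18, 20, 21]

results = {}
for method in ("exclusive", "inclusive"):
    q1, q2, q3 = quantiles(data, n=4, method=method)
    results[method] = (q1, q3, q3 - q1)
    print(f"{method}: Q1={q1}, Q3={q3}, IQR={q3 - q1}")

# exclusive: Q1=7.0, Q3=18.0, IQR=11.0
# inclusive: Q1=8.0, Q3=16.5, IQR=8.5
```

Stating the method alongside the reported IQR, as in the printed labels here, lets a reader reproduce the figure exactly.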