Median as a positional center
The median is a measure of central tendency defined by position rather than by arithmetic balance. It is the value that splits an ordered data set so that at least half of the observations are at or below it and at least half are at or above it. In percentile language, the median is the 50th percentile.
How to find the median depends on whether the number of observations is odd or even, after the data are arranged from smallest to largest.
Ungrouped data (raw list of observations)
Let \(x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}\) denote the ordered observations (order statistics) from a sample of size \(n\).
Odd sample size
When \(n\) is odd, the median equals the single middle ordered observation:
\[ \operatorname{Median} = x_{\left(\frac{n+1}{2}\right)} \qquad (n \text{ odd}) \]
Example (already ordered): 3, 7, 7, 9, 12 has \(n=5\), so the median is \(x_{(3)}=7\).
Even sample size
When \(n\) is even, there are two middle ordered observations. The conventional median is their average:
\[ \operatorname{Median} = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} \qquad (n \text{ even}) \]
Example (already ordered): 2, 4, 7, 10, 13, 18 has \(n=6\), so the median is \(\frac{7+10}{2}=8.5\). The median need not be an observed data value when \(n\) is even.
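The odd and even rules can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name `median_ungrouped` is hypothetical, and the input is assumed to be a non-empty list of numbers. Note the shift from the 1-based positions in the formulas to Python's 0-based indices.

```python
def median_ungrouped(values):
    """Median of a raw list of numbers, using the odd/even position rules."""
    ordered = sorted(values)  # the position rules apply to ordered data only
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ordered[mid]
    # Even n: 1-based positions n / 2 and n / 2 + 1 are 0-based mid - 1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_ungrouped([3, 7, 7, 9, 12]))       # 7
print(median_ungrouped([2, 4, 7, 10, 13, 18]))  # 8.5
```

The two calls reproduce the worked examples above.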
[Figure: visualization of the “middle” rule]
Frequency tables (ungrouped values with counts)
A frequency table lists distinct values \(v_1 < v_2 < \dots < v_k\) with frequencies \(f_1, f_2, \dots, f_k\), giving total \(N=\sum_{i=1}^{k} f_i\). The median is located by position, using cumulative frequency \(F_j=\sum_{i=1}^{j} f_i\).
When \(N\) is odd, the median is the first value whose cumulative frequency is at least \(\frac{N+1}{2}\). When \(N\) is even, locate the two central positions \(\frac{N}{2}\) and \(\frac{N}{2}+1\) in the cumulative counts in the same way. If both positions fall on the same value, the median is that value; if they fall on different values, the median is the average of the two.
| Value | Frequency \(f\) | Cumulative frequency \(F\) |
|---|---|---|
| 1 | 2 | 2 |
| 3 | 1 | 3 |
| 5 | 4 | 7 |
| 8 | 2 | 9 |
The table has \(N=9\), so the median position is \(\frac{9+1}{2}=5\). The cumulative frequency first reaches or exceeds 5 at the value 5 (where \(F=7\)), so the median is 5.
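The cumulative-frequency lookup can be sketched as follows. The function name `median_from_frequency_table` and the representation of the table as `(value, frequency)` pairs are assumptions made for illustration only.

```python
def median_from_frequency_table(table):
    """Median from (value, frequency) pairs, located by cumulative frequency."""
    rows = sorted(table)  # order by value, as the cumulative counts require
    total = sum(freq for _, freq in rows)
    if total == 0:
        raise ValueError("the table contains no observations")

    def value_at(position):
        # Return the first value whose cumulative frequency is >= position.
        cumulative = 0
        for value, freq in rows:
            cumulative += freq
            if cumulative >= position:
                return value
        raise ValueError("position exceeds the total frequency")

    if total % 2 == 1:
        return value_at((total + 1) // 2)
    # Even total: average the values at the two central positions.
    return (value_at(total // 2) + value_at(total // 2 + 1)) / 2


print(median_from_frequency_table([(1, 2), (3, 1), (5, 4), (8, 2)]))  # 5
```

For an even total, the same function averages the two central values, which simply returns the shared value when both central positions fall on it.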
Grouped data (class intervals)
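Grouped data place observations into intervals (classes), such as \([10,20)\), \([20,30)\), and so on. Because the individual values are no longer available, the median can only be estimated: locate the median class, the first class whose cumulative frequency reaches or exceeds \(N/2\), and interpolate linearly within it, assuming the observations are spread evenly across the class.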
\[ \widetilde{m} = L + \left(\frac{\frac{N}{2}-C_{\text{before}}}{f_{\text{class}}}\right)w \]
Here \(L\) is the lower class boundary of the median class, \(C_{\text{before}}\) is the cumulative frequency before the median class, \(f_{\text{class}}\) is the frequency in the median class, and \(w\) is the class width.
| Class interval | Frequency \(f\) | Cumulative frequency |
|---|---|---|
| [0, 10) | 3 | 3 |
| [10, 20) | 5 | 8 |
| [20, 30) | 4 | 12 |
| [30, 40) | 2 | 14 |
The total is \(N=14\), so \(N/2=7\). The cumulative frequency crosses 7 in the class \([10,20)\), making it the median class. With \(L=10\), \(C_{\text{before}}=3\), \(f_{\text{class}}=5\), and \(w=10\), \[ \widetilde{m}=10+\left(\frac{7-3}{5}\right)\cdot 10 = 18 \]
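The interpolation formula translates directly into code. The sketch below assumes the classes are supplied as `(lower, upper, frequency)` tuples, sorted and contiguous; the name `grouped_median` is hypothetical. Python's standard library offers `statistics.median_grouped`, but it takes raw values and a single interval width rather than a class table, so it is not used here.

```python
def grouped_median(classes):
    """Estimate the median from (lower, upper, frequency) class rows.

    Assumes the rows are sorted by lower boundary and cover contiguous intervals.
    """
    total = sum(freq for _, _, freq in classes)
    if total == 0:
        raise ValueError("the table contains no observations")
    half = total / 2
    cumulative_before = 0  # C_before: cumulative frequency below the class
    for lower, upper, freq in classes:
        if freq > 0 and cumulative_before + freq >= half:
            width = upper - lower  # w: class width
            # L + ((N/2 - C_before) / f_class) * w
            return lower + (half - cumulative_before) * width / freq
        cumulative_before += freq
    raise ValueError("median class not found")


table = [(0, 10, 3), (10, 20, 5), (20, 30, 4), (30, 40, 2)]
print(grouped_median(table))  # 18.0
```

The call reproduces the worked example: the median class is \([10,20)\) and the estimate is 18.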
Common pitfalls
The position formulas apply only to the ordered list \(x_{(1)},\dots,x_{(n)}\); taking the middle entry of unsorted data generally gives the wrong value. Repeated values (ties) are fully compatible with the definition and often produce a median equal to the repeated value. For even \(n\), the median is often not an observed value, because it is the average of the two central observations; it coincides with an observed value only when those two observations are equal.
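A short check of these pitfalls, using `statistics.median` from Python's standard library, which sorts its input internally. The data values are made up for illustration.

```python
from statistics import median

raw = [12, 3, 9, 7, 7]  # unsorted data

naive_middle = raw[len(raw) // 2]  # middle entry of the unsorted list
correct = median(raw)              # median of the ordered list 3, 7, 7, 9, 12

print(naive_middle)  # 9
print(correct)       # 7

# Ties: the two central observations are both 6, so the median is 6.
tied = median([4, 6, 6, 6, 9, 10])
print(tied)          # 6.0
```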