Kernel Density Estimation (KDE): smooth density from data
Kernel density estimation (KDE) is a non-parametric way to estimate an unknown probability density function (PDF)
from a sample \(x_1,\dots,x_n\). Instead of assuming a parametric model (like normal or exponential),
KDE builds a smooth curve by placing a small “bump” (a kernel) at each data point and averaging them.
1) KDE definition
The kernel density estimator with bandwidth \(h>0\) is:
\[
\hat{f}_h(x)=\frac{1}{n h}\sum_{i=1}^n K\!\left(\frac{x-x_i}{h}\right).
\]
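Intuition: each term \(K((x-x_i)/h)\) is a bump centered at \(x_i\), so it contributes density near \(x_i\). The bandwidth \(h\) controls how wide each bump is, and the factor \(1/(nh)\) rescales the sum so that \(\hat{f}_h\) integrates to 1 whenever \(K\) does.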
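To make the formula concrete, here is a minimal Python sketch that evaluates \(\hat{f}_h(x)\) at a single point. It assumes a Gaussian bump for \(K\); the function names and the sample values are invented for illustration and are not taken from the calculator itself.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, data, h, kernel=gaussian_kernel):
    """Evaluate (1 / (n h)) * sum_i K((x - x_i) / h) at a single point x."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(data)
    return sum(kernel((x - xi) / h) for xi in data) / (n * h)

sample = [1.2, 1.9, 2.1, 2.4, 3.7]   # illustrative values, not real data
for x in (1.0, 2.0, 3.0, 5.0):
    print(f"f_hat({x}) = {kde(x, sample, h=0.5):.4f}")
```

The estimate is largest where the sample points cluster and decays toward zero far from all of them.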
2) Common kernels
Different kernels change the bump shape. Two popular ones are:
Gaussian kernel
\[
K(u)=\frac{1}{\sqrt{2\pi}}e^{-u^2/2}.
\]
Smooth everywhere, with unbounded support: every data point contributes a small but nonzero amount of density at every \(x\).
Epanechnikov kernel
\[
K(u)=\frac{3}{4}(1-u^2)\,\mathbf{1}(|u|\le 1).
\]
Compact support: contributes only when \(|u|\le 1\), meaning points farther than \(h\) away contribute 0.
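The two kernels can be compared directly in code. The sketch below is illustrative only: the helper `riemann_mass` is a hypothetical name, and it uses a simple midpoint rule to check that each kernel integrates to approximately 1.

```python
import math

def gaussian(u):
    """Gaussian kernel: positive for every u."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def epanechnikov(u):
    """Epanechnikov kernel: zero outside |u| <= 1."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def riemann_mass(kernel, lo=-8.0, hi=8.0, steps=16000):
    """Midpoint-rule approximation of the integral of `kernel` over [lo, hi]."""
    du = (hi - lo) / steps
    return sum(kernel(lo + (k + 0.5) * du) for k in range(steps)) * du

for name, k in (("gaussian", gaussian), ("epanechnikov", epanechnikov)):
    print(name, "K(0) =", k(0.0), "K(1.5) =", k(1.5), "mass ~", round(riemann_mass(k), 4))
```

At \(u=1.5\) the Epanechnikov kernel is exactly zero while the Gaussian kernel is still positive, which is the compact-support difference described above.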
3) Bandwidth \(h\): the main tuning knob
In practice, the bandwidth usually matters far more than the kernel choice:
- Small \(h\) → low bias, high variance (wiggly curve, may overfit noise).
- Large \(h\) → high bias, low variance (oversmoothed curve, may hide structure).
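One rough way to see this trade-off is to count the local maxima of the estimate as \(h\) grows. The sketch below assumes a Gaussian kernel, a made-up two-cluster sample, and a fixed evaluation grid; counting peaks on a grid is a crude diagnostic, not a formal mode test.

```python
import math

def kde_on_grid(data, h, grid):
    """Gaussian-kernel KDE evaluated at every grid point."""
    c = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return [c * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) for x in grid]

def count_modes(values):
    """Count strict interior local maxima of a sampled curve."""
    return sum(
        1 for j in range(1, len(values) - 1)
        if values[j] > values[j - 1] and values[j] > values[j + 1]
    )

# Hypothetical two-cluster sample (values invented for illustration).
data = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]
grid = [-2.0 + 0.005 * j for j in range(2001)]   # covers [-2, 8]

for h in (0.02, 0.3, 3.0):
    print(f"h = {h}: {count_modes(kde_on_grid(data, h, grid))} local maxima")
```

A very small \(h\) produces a separate spike around individual points, a moderate \(h\) recovers the two clusters, and a very large \(h\) merges everything into a single hump.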
Silverman’s rule of thumb
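A widely used automatic choice for the Gaussian kernel is:
\[
h \approx 0.9\min\!\left(s,\frac{\mathrm{IQR}}{1.34}\right)n^{-1/5},
\]
where \(s\) is the sample standard deviation, IQR is the interquartile range, and \(n\) is the sample size. The rule is calibrated against normally distributed data, so it works best for roughly unimodal, not-too-skewed data and tends to oversmooth clearly multimodal data.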
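The rule is straightforward to compute. The following sketch uses Python's `statistics` module; the quartile convention (`method="inclusive"`) is an assumption, since different software computes the IQR slightly differently, and the sample values are invented.

```python
import statistics

def silverman_bandwidth(data):
    """h = 0.9 * min(s, IQR / 1.34) * n^(-1/5)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    s = statistics.stdev(data)   # sample standard deviation (n - 1 denominator)
    # Quartile convention is an assumption; other conventions shift the IQR slightly.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    spread = min(s, (q3 - q1) / 1.34)
    if spread <= 0:
        raise ValueError("zero spread: the rule would give h = 0")
    return 0.9 * spread * n ** (-1 / 5)

sample = [2.1, 2.4, 2.5, 2.9, 3.0, 3.3, 3.6, 4.0, 4.2, 9.5]  # illustrative, one outlier
print("h =", silverman_bandwidth(sample))
```

Taking the minimum of \(s\) and \(\mathrm{IQR}/1.34\) keeps a single outlier (here 9.5) from inflating the bandwidth, because the IQR is much less sensitive to extreme values than the standard deviation.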
4) KDE vs histogram
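A histogram is also a density estimate, but its shape depends on both the bin width and where the bin edges fall.
KDE removes the dependence on bin edges and replaces the step function with a smooth curve; the bandwidth \(h\) plays a role similar to the bin width. KDE can therefore still be misleading if \(h\) is chosen poorly.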
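The dependence on bin edges is easy to demonstrate. In this sketch (the function name and the data are hypothetical), the same sample is binned twice with the same bin width but with the edges shifted by half a bin:

```python
def histogram_density(data, start, width, bins):
    """Density-scaled histogram: count / (n * width) for each bin [a, a + width)."""
    counts = [0] * bins
    for x in data:
        k = int((x - start) // width)
        if 0 <= k < bins:
            counts[k] += 1
    n = len(data)
    return [c / (n * width) for c in counts]

data = [0.9, 1.1, 1.9, 2.1, 2.9, 3.1]   # illustrative values sitting near bin edges

edges_at_0 = histogram_density(data, start=0.0, width=1.0, bins=4)
edges_at_half = histogram_density(data, start=0.5, width=1.0, bins=4)
print(edges_at_0)
print(edges_at_half)
```

Both histograms are valid density estimates of the same data, yet they have different shapes. A KDE has no edges to shift, so this particular source of arbitrariness disappears.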
5) Normalization and numerical integration
In theory, \(\hat{f}_h(x)\) integrates to 1 if \(K\) is a valid kernel (nonnegative and integrating to 1). In practice, on a finite plotting window and discrete grid,
the numeric integral can deviate slightly from 1. This calculator approximates the integral with the trapezoidal rule on the grid points \(x_1<\dots<x_m\):
\[
\int \hat{f}_h(x)\,dx \approx \sum_{j=2}^{m}\frac{(x_j-x_{j-1})}{2}\big(\hat{f}_h(x_j)+\hat{f}_h(x_{j-1})\big).
\]
If the plotted range is wide enough to cover the tails of the outermost bumps, the integral should be close to 1.
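The effect of the plotting window can be checked with a short sketch of the trapezoidal sum. It assumes a Gaussian kernel and an evenly spaced grid of \(m\) points; the helper names, the grid size, and the sample are illustrative choices rather than the calculator's actual settings.

```python
import math

def kde(x, data, h):
    """Gaussian-kernel KDE at a single point."""
    c = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return c * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

def trapezoid_integral(xs, ys):
    """Sum of (x_j - x_{j-1}) / 2 * (y_j + y_{j-1}) for j = 2..m."""
    return sum(0.5 * (xs[j] - xs[j - 1]) * (ys[j] + ys[j - 1]) for j in range(1, len(xs)))

def kde_mass(data, h, lo, hi, m=401):
    """Trapezoidal estimate of the KDE's mass on [lo, hi] using m grid points."""
    xs = [lo + (hi - lo) * j / (m - 1) for j in range(m)]
    ys = [kde(x, data, h) for x in xs]
    return trapezoid_integral(xs, ys)

data = [1.2, 1.9, 2.1, 2.4, 3.7]   # illustrative values
h = 0.5
print("narrow window:", kde_mass(data, h, 1.0, 4.0))
print("wide window:  ", kde_mass(data, h, min(data) - 5 * h, max(data) + 5 * h))
```

The narrow window cuts off part of the bumps around the smallest and largest observations, so its integral falls noticeably below 1. Extending the window several bandwidths past the extremes recovers almost all of the mass.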
6) University note: multivariate KDE
In higher dimensions, KDE becomes:
\[
\hat{f}_{\mathbf{H}}(\mathbf{x})=\frac{1}{n|\mathbf{H}|^{1/2}}\sum_{i=1}^n
K\!\left(\mathbf{H}^{-1/2}(\mathbf{x}-\mathbf{x}_i)\right),
\]
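where \(\mathbf{H}\) is a symmetric positive-definite \(d\times d\) bandwidth matrix and \(K\) is a \(d\)-variate kernel. Choosing \(\mathbf{H}\) is much harder in high dimensions, and the sample size needed for a given accuracy grows rapidly with \(d\) (the “curse of dimensionality”).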
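As an illustration, the sketch below implements the simplest special case: a diagonal bandwidth matrix \(\mathbf{H}=\mathrm{diag}(h_1^2,\dots,h_d^2)\) with a standard \(d\)-variate Gaussian kernel. Both choices are assumptions made to keep the example short; a full \(\mathbf{H}\) would also allow the bumps to be rotated.

```python
import math

def kde_diag(x, data, bandwidths):
    """Gaussian KDE with H = diag(h_1^2, ..., h_d^2), evaluated at point x."""
    d = len(bandwidths)
    det_sqrt = math.prod(bandwidths)     # |H|^{1/2} for a diagonal H
    norm = (2.0 * math.pi) ** (-d / 2)   # constant of the d-variate standard normal kernel
    total = 0.0
    for xi in data:
        # u = H^{-1/2} (x - x_i) reduces to component-wise scaling when H is diagonal.
        sq = sum(((a - b) / h) ** 2 for a, b, h in zip(x, xi, bandwidths))
        total += norm * math.exp(-0.5 * sq)
    return total / (len(data) * det_sqrt)

points = [(0.0, 0.0), (1.0, 0.5), (0.8, 1.2), (2.0, 1.9)]   # illustrative 2-D sample
print(kde_diag((1.0, 1.0), points, bandwidths=(0.6, 0.4)))
```

With \(d=1\) this reduces to the univariate estimator with bandwidth \(h_1\).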
7) Practical advice
- Try multiple bandwidths and look for stable structure (peaks that persist across reasonable \(h\)).
- With very small \(n\), the shape of the estimate is driven largely by \(h\) rather than by the data, so it can look overly smooth or overly wiggly; interpret cautiously.
- For bounded data (e.g., proportions on \([0,1]\)), standard KDE can leak density outside the bounds; boundary-corrected methods exist (for example, reflection or boundary kernels).
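The boundary issue in the last point can be quantified. The sketch below assumes a Gaussian kernel and data on \([0,1]\); it computes the leaked mass exactly from the normal CDF and then applies the reflection method, one simple boundary correction among several. All names and values are illustrative.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kde(x, data, h):
    """Plain Gaussian-kernel KDE at a single point."""
    c = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return c * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

def leaked_mass(data, h, lo=0.0, hi=1.0):
    """Exact mass that a Gaussian KDE places outside [lo, hi]."""
    inside = sum(norm_cdf((hi - xi) / h) - norm_cdf((lo - xi) / h) for xi in data)
    return 1.0 - inside / len(data)

def kde_reflected(x, data, h, lo=0.0, hi=1.0):
    """Reflection correction: fold the leaked tails back across each boundary."""
    if x < lo or x > hi:
        return 0.0
    return kde(x, data, h) + kde(2 * lo - x, data, h) + kde(2 * hi - x, data, h)

proportions = [0.02, 0.05, 0.08, 0.15, 0.30, 0.55, 0.97]   # illustrative values
h = 0.08
print("mass outside [0, 1]:", leaked_mass(proportions, h))
print("plain vs reflected at x = 0:", kde(0.0, proportions, h), kde_reflected(0.0, proportions, h))
```

Reflection restores the lost mass and roughly doubles the estimate at the boundary, but it also forces the estimated density to have zero slope there, which may or may not suit the data.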