Range, variance, standard deviation
These statistics describe spread: how far the data values are from each other and from the mean.
In biology labs, spread matters because repeated measurements rarely match exactly (instrument noise, biological variability, sampling variation).
Big picture:
- Range is a quick “min to max” spread.
- Variance is the average squared deviation from the mean.
- Standard deviation is the square root of variance, in the same units as the data.
- CV% compares spread to the size of the mean (useful when units differ or means differ a lot).
Definitions
Suppose the cleaned dataset is
\(x_1, x_2, \ldots, x_n\) with mean \(\bar{x}\).
\[
\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i
\]
Min, max, range
\[
\min=\text{smallest value},\qquad \max=\text{largest value},\qquad \text{Range}=\max-\min
\]
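As a minimal sketch of the range calculation, the snippet below uses Python's built-in `min` and `max`. The `values` list is a hypothetical set of measurements in mg, not data from the calculator.

```python
# Hypothetical measurements in mg (illustrative only).
values = [4.1, 3.9, 4.4, 4.0, 4.6]

smallest = min(values)
largest = max(values)
data_range = largest - smallest  # Range = max - min

print(f"min = {smallest}, max = {largest}, range = {data_range:.1f}")
```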
Population vs sample variance
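\[
\text{Population variance:}\quad \sigma^2=\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2
\]
Here \(\mu\) is the population mean. When the dataset itself is treated as the whole population, \(\mu=\bar{x}\).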
\[
\text{Sample variance:}\quad s^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2
\]
Most lab datasets are samples from a larger “true” population (all possible trials),
so the sample variance \(s^2\) is commonly used.
The divisor \(n-1\) (Bessel’s correction) makes \(s^2\) an unbiased estimator of \(\sigma^2\) when the observations are independent draws from the same population.
Standard deviation
\[
\sigma=\sqrt{\sigma^2},\qquad s=\sqrt{s^2}
\]
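The two variance formulas differ only in the divisor, which a short sketch can make concrete. The function name `variance`, its `sample` flag, and the `values` list are assumptions chosen for illustration; they are not part of the calculator.

```python
import math

def variance(values, sample=True):
    """Return the sample (divide by n - 1) or population (divide by n) variance."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough data points")
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    return ss / (n - 1 if sample else n)

# Hypothetical measurements in mg (illustrative only).
values = [4.1, 3.9, 4.4, 4.0, 4.6]

s2 = variance(values, sample=True)       # sample variance
sigma2 = variance(values, sample=False)  # population variance
s = math.sqrt(s2)                        # sample SD, in mg
sigma = math.sqrt(sigma2)                # population SD, in mg

print(f"sample:     s^2 = {s2:.4f} mg^2, s = {s:.4f} mg")
print(f"population: sigma^2 = {sigma2:.4f} mg^2, sigma = {sigma:.4f} mg")
```

For the same data the sample values are always slightly larger than the population values, because the same sum of squares is divided by \(n-1\) instead of \(n\).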
Coefficient of variation (optional)
\[
\text{CV}\% = 100\cdot\frac{\text{SD}}{|\text{mean}|}
\]
CV% is not defined when the mean is zero.
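A possible way to compute CV% is sketched below with Python's standard `statistics` module. The helper name `cv_percent` is hypothetical, and the sketch assumes the sample SD is the one wanted; swap in `statistics.pstdev` for the population version.

```python
import statistics

def cv_percent(values):
    """Return CV% = 100 * SD / |mean|, using the sample SD (an assumption)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV% is not defined when the mean is zero")
    return 100 * statistics.stdev(values) / abs(mean)

# Hypothetical measurements in mg (illustrative only).
values = [4.1, 3.9, 4.4, 4.0, 4.6]
cv = cv_percent(values)
print(f"CV% = {cv:.2f}")
```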
Computational shortcut using Σx and Σx²
Directly summing \(\sum(x_i-\bar{x})^2\) is conceptually clear, but a common shortcut uses
\(\sum x\) and \(\sum x^2\).
\[
SS=\sum_{i=1}^{n}(x_i-\bar{x})^2=\sum_{i=1}^{n}x_i^2-\frac{\left(\sum_{i=1}^{n}x_i\right)^2}{n}
\]
Then:
\[
\sigma^2=\frac{SS}{n},\qquad s^2=\frac{SS}{n-1}
\]
The calculator can optionally show \(\sum x\) and \(\sum x^2\) because they make the steps easy to follow and check by hand.
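The shortcut can be checked against the direct sum with a few lines of Python. This is only a sketch on hypothetical data; the variable names are chosen for illustration.

```python
# Hypothetical measurements in mg (illustrative only).
values = [4.1, 3.9, 4.4, 4.0, 4.6]
n = len(values)

sum_x = sum(values)                  # Σx
sum_x2 = sum(x * x for x in values)  # Σx²

# Shortcut: SS = Σx² - (Σx)² / n
ss_shortcut = sum_x2 - sum_x ** 2 / n

# Direct definition: SS = Σ(x - mean)²
mean = sum_x / n
ss_direct = sum((x - mean) ** 2 for x in values)

print(f"sum x = {sum_x:.2f}, sum x^2 = {sum_x2:.2f}")
print(f"SS (shortcut) = {ss_shortcut:.4f}, SS (direct) = {ss_direct:.4f}")
```

Both routes give the same SS up to rounding. In floating-point arithmetic the shortcut can lose precision when the mean is very large compared with the spread, so the direct sum is the safer choice in software.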
How to interpret SD (and why the histogram shading helps)
Standard deviation tells you a “typical” distance from the mean. If the distribution is roughly bell-shaped,
the following rule of thumb is often useful:
\[
\text{About }68\%\text{ of values fall in }\bar{x}\pm 1\cdot \text{SD},\qquad
\text{about }95\%\text{ fall in }\bar{x}\pm 2\cdot \text{SD}.
\]
The calculator’s histogram shades the bands \(\bar{x}\pm 1\cdot \text{SD}\) and \(\bar{x}\pm 2\cdot \text{SD}\)
so you can visually compare the data distribution to those intervals.
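The same comparison can be done numerically by counting how many values fall inside each band. The sketch below assumes the sample SD, and `fraction_within` is a hypothetical helper name. With only a handful of points the fractions will not match 68% and 95% closely, since the rule describes roughly bell-shaped distributions.

```python
import statistics

def fraction_within(values, k):
    """Return the fraction of values inside mean ± k * SD (sample SD assumed)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    inside = [x for x in values if abs(x - mean) <= k * sd]
    return len(inside) / len(values)

# Hypothetical measurements in mg (illustrative only).
values = [4.1, 3.9, 4.4, 4.0, 4.6]

print(f"within 1 SD: {fraction_within(values, 1):.0%}")
print(f"within 2 SD: {fraction_within(values, 2):.0%}")
```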
What the boxplot is showing
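A boxplot gives a compact picture of spread and location, typically the median, a box spanning the middle half of the data (first to third quartile), and whiskers reaching toward the extremes.
The boxplot is great for quickly seeing whether the data are tightly clustered, widely spread, or skewed.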
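The numbers behind a typical boxplot can be sketched with `statistics.quantiles` from the Python standard library. The quartile convention used here (`method="inclusive"`) is an assumption; the calculator may use a different rule, and quartiles from different tools can differ slightly on small datasets. The helper name `five_number_summary` is hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the inclusive quartile method."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical measurements in mg (illustrative only).
values = [4.1, 3.9, 4.4, 4.0, 4.6]

low, q1, median, q3, high = five_number_summary(values)
iqr = q3 - q1  # interquartile range: the height of the box

print(f"min = {low}, Q1 = {q1}, median = {median}, Q3 = {q3}, max = {high}")
print(f"IQR = {iqr:.2f}")
```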
Common pitfalls
- Mixing sample vs population: if you choose sample, variance divides by \(n-1\). If you choose population, it divides by \(n\).
- Units: variance has squared units (e.g., \((\text{mg})^2\)), while SD has the same units as the data (e.g., mg).
- Mean = 0: CV% cannot be computed because it would divide by zero.
- Outliers: extreme values strongly affect SD and variance; the boxplot and histogram help you notice them.
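To illustrate the outlier pitfall, the sketch below adds one extreme value to a hypothetical dataset and recomputes the sample variance and SD. The numbers are invented for illustration.

```python
import statistics

# Hypothetical measurements in mg, then the same data plus one extreme value.
values = [4.1, 3.9, 4.4, 4.0, 4.6]
with_outlier = values + [9.0]

sd_without = statistics.stdev(values)
sd_with = statistics.stdev(with_outlier)
var_without = statistics.variance(values)
var_with = statistics.variance(with_outlier)

print(f"SD without outlier: {sd_without:.3f} mg, with outlier: {sd_with:.3f} mg")
print(f"variance without outlier: {var_without:.3f} mg^2, with outlier: {var_with:.3f} mg^2")
```

A single extreme point inflates the SD several-fold here, and the variance even more, because deviations are squared before they are averaged.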
Connection to Topic 5
Topic 5 focused on location (mean/median/mode). This topic focuses on spread.
In practice you almost always report both:
\[
\text{location: mean (or median)} \quad+\quad \text{spread: SD (or IQR)}
\]
The calculator is designed to reuse the same dataset input so you can compute Topic 5 and Topic 6 statistics on the same values without reformatting.