Why engineers use statistics
Engineering measurements are never perfectly identical. Statistics helps summarize repeated
measurements, estimate uncertainty, compare a measured mean to a reference value, and model
relationships in experimental data. A typical data set is a sample of \(n\) measurements:
\[
x_1,x_2,\ldots,x_n
\]
These values may represent repeated length measurements, voltage readings, material strengths,
sensor outputs, or quality-control samples.
Sample mean
The sample mean is the average measured value.
\[
\bar{x}
=
\frac{1}{n}
\sum_{i=1}^{n}x_i.
\]
It is usually the best estimate of the true process mean when the measurements are unbiased.
Sample standard deviation
The sample standard deviation measures the spread of individual measurements around the mean.
\[
s
=
\sqrt{
\frac{\sum_i\left(x_i-\bar{x}\right)^2}{n-1}
}.
\]
The denominator is \(n-1\), not \(n\), because the mean \(\bar{x}\) is estimated from the same sample,
which leaves \(n-1\) degrees of freedom.
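The two formulas above can be checked with a few lines of Python. This is a minimal sketch: the measurement values are hypothetical and chosen only for illustration, and the variable names are not part of the original notes.

```python
import math
import statistics

# Hypothetical repeated length measurements in millimetres (illustrative only).
lengths_mm = [10.02, 9.98, 10.01, 10.03, 9.97, 10.00, 10.02, 9.99, 10.01, 9.97]

n = len(lengths_mm)
x_bar = sum(lengths_mm) / n  # sample mean

# Sum of squared deviations from the mean, divided by n - 1.
sum_sq_dev = sum((x - x_bar) ** 2 for x in lengths_mm)
s = math.sqrt(sum_sq_dev / (n - 1))  # sample standard deviation

# statistics.stdev also divides by n - 1, so it should agree with s.
s_check = statistics.stdev(lengths_mm)

print(f"mean = {x_bar:.4f} mm, s = {s:.4f} mm, stdev check = {s_check:.4f} mm")
```

Dividing by \(n\) instead would give a slightly smaller value, which is why it matters which convention a library function uses.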
Standard error of the mean
The standard deviation describes individual measurements. The standard error describes uncertainty
in the sample mean.
\[
\mathrm{SE}
=
\frac{s}{\sqrt{n}}.
\]
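Because the standard error is proportional to \(1/\sqrt{n}\), increasing the number of independent
measurements reduces it, but slowly: halving the standard error requires four times as many measurements.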
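A short sketch of how the standard error shrinks with sample size. The value of \(s\) is hypothetical and is held fixed here for simplicity; in practice \(s\) also changes from sample to sample.

```python
import math

def standard_error(s, n):
    """Standard error of the mean for sample standard deviation s and n measurements."""
    return s / math.sqrt(n)

s = 0.02  # hypothetical sample standard deviation, in mm

# Each step multiplies n by four, so each standard error is half the previous one.
for n in (5, 20, 80):
    print(f"n = {n:2d}: SE = {standard_error(s, n):.5f} mm")
```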
Confidence interval for the true mean
A confidence interval estimates a likely range for the true process mean \(\mu\).
\[
\bar{x}
\pm
t_{\alpha/2,n-1}
\frac{s}{\sqrt{n}}.
\]
The multiplier \(t_{\alpha/2,n-1}\) comes from the Student \(t\)-distribution. It depends on the
confidence level \(1-\alpha\) (for example, \(\alpha=0.05\) for 95% confidence) and on the \(n-1\)
degrees of freedom.
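As an illustration, the interval can be computed as follows. The data are hypothetical, and the multiplier 2.262 is the standard t-table value for 95% confidence with 9 degrees of freedom; it is hard-coded here as an assumption and would have to be looked up again for a different \(n\) or confidence level.

```python
import math
import statistics

# Hypothetical repeated length measurements in millimetres (illustrative only).
lengths_mm = [10.02, 9.98, 10.01, 10.03, 9.97, 10.00, 10.02, 9.99, 10.01, 9.97]

n = len(lengths_mm)
x_bar = statistics.mean(lengths_mm)
s = statistics.stdev(lengths_mm)  # uses the n - 1 denominator

# t-table value for alpha/2 = 0.025 and n - 1 = 9 degrees of freedom.
t_crit = 2.262

half_width = t_crit * s / math.sqrt(n)
lower = x_bar - half_width
upper = x_bar + half_width

print(f"95% CI for the mean: [{lower:.4f}, {upper:.4f}] mm")
```

Statistical libraries can supply the multiplier for any \(n\) and confidence level, which avoids the table lookup.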
How to read a confidence interval
A confidence interval is about the mean, not about every individual measurement. Writing its lower
and upper limits as \(L\) and \(U\), the interval is
\[
L
\le
\mu
\le
U.
\]
A 95% confidence interval does not mean that 95% of the measurements lie between \(L\) and \(U\).
It means the method used to build the interval would capture the true mean in about 95% of repeated
experiments under the same conditions.
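The "repeated experiments" reading can be illustrated with a small simulation. This is a sketch under stated assumptions: the true mean, the process standard deviation, and normally distributed measurements are all hypothetical choices, and 2.262 is the t-table value for 95% confidence with 9 degrees of freedom.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MEAN = 10.0  # hypothetical true process mean, mm
TRUE_SD = 0.02    # hypothetical process standard deviation, mm
N = 10            # measurements per simulated experiment
T_CRIT = 2.262    # t-table value, 95% confidence, 9 degrees of freedom
TRIALS = 5000     # number of simulated experiments

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    x_bar = statistics.mean(sample)
    half_width = T_CRIT * statistics.stdev(sample) / math.sqrt(N)
    # Count the experiment if its interval contains the true mean.
    if x_bar - half_width <= TRUE_MEAN <= x_bar + half_width:
        hits += 1

coverage = hits / TRIALS
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

The printed fraction should be close to 0.95. Any single interval either contains the true mean or it does not; the 95% describes the procedure over many repetitions.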
One-sample hypothesis test
A one-sample test compares a measured mean with a reference value \(\mu_0\).
\[
H_0:\mu=\mu_0.
\]
The alternative hypothesis can be two-sided or one-sided:
\[
H_a:\mu\ne\mu_0,
\qquad
H_a:\mu>\mu_0,
\qquad
H_a:\mu<\mu_0.
\]
Test statistic
The one-sample \(t\)-statistic is
\[
t
=
\frac{\bar{x}-\mu_0}{s/\sqrt{n}}.
\]
A large magnitude of \(t\) means the measured mean is far from the reference value compared with
the uncertainty of the mean.
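A minimal sketch of the calculation, using hypothetical data and a hypothetical reference value \(\mu_0 = 10.02\) mm:

```python
import math
import statistics

# Hypothetical repeated length measurements in millimetres (illustrative only).
lengths_mm = [10.02, 9.98, 10.01, 10.03, 9.97, 10.00, 10.02, 9.99, 10.01, 9.97]
mu_0 = 10.02  # hypothetical reference (nominal) length, mm

n = len(lengths_mm)
x_bar = statistics.mean(lengths_mm)
s = statistics.stdev(lengths_mm)  # uses the n - 1 denominator

standard_error = s / math.sqrt(n)
t_stat = (x_bar - mu_0) / standard_error

print(f"t = {t_stat:.3f} with {n - 1} degrees of freedom")
```

The sign of \(t\) shows the direction of the difference: a negative value means the measured mean lies below the reference value.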
p-value
The p-value is the probability, computed assuming the null hypothesis is true, of a result at least
as extreme as the one observed. A small p-value means the data would be surprising under \(H_0\).
\[
p=\Pr\left(\text{result at least as extreme as observed}\mid H_0\right).
\]
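If the p-value is smaller than the chosen significance level \(\alpha\), the null hypothesis is rejected.
Otherwise the data do not give enough evidence to reject it, which is not proof that \(H_0\) is true.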
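For a two-sided one-sample \(t\)-test, the p-value is the area in both tails of the Student \(t\)-distribution beyond \(\pm|t|\). The sketch below finds that area by numerically integrating the \(t\) density with Simpson's rule, using only the standard library. The function names, the test statistic, and the degrees of freedom are hypothetical; a statistics library would normally supply this probability directly.

```python
import math

def t_pdf(t, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and +|t|.

    The area from 0 to |t| is found with Simpson's rule (steps must be even)
    and doubled, because the density is symmetric about zero.
    """
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t

t_stat, df = -2.928, 9  # hypothetical test statistic and degrees of freedom
alpha = 0.05            # significance level

p = two_sided_p_value(t_stat, df)
print(f"p = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```

For a one-sided alternative, only one tail is used, so the p-value is half the two-sided value when the statistic points in the direction of \(H_a\).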
Histogram and normal overlay
A histogram shows how measurements are distributed. A normal curve overlay, with the sample mean
\(\bar{x}\) and sample standard deviation \(s\) as its parameters, helps compare the data with an
ideal bell-shaped model.
\[
f(x)
=
\frac{1}{s\sqrt{2\pi}}
e^{-\frac{1}{2}\left(\frac{x-\bar{x}}{s}\right)^2}.
\]
The overlay is useful for visual checking, but it does not prove that the data are exactly normal.
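To put the curve on the same scale as a histogram of counts, the density is multiplied by \(n\) times the bin width. The text-only sketch below compares observed bin counts with that scaled curve at each bin centre. The data, the bin edges, and the bin width are hypothetical choices; a plotting library would draw the same numbers as bars and a curve.

```python
import math
import statistics

def normal_pdf(x, x_bar, s):
    """Normal density with mean x_bar and standard deviation s."""
    z = (x - x_bar) / s
    return math.exp(-0.5 * z * z) / (s * math.sqrt(2 * math.pi))

# Hypothetical repeated length measurements in millimetres (illustrative only).
lengths_mm = [10.02, 9.98, 10.01, 10.03, 9.97, 10.00, 10.02, 9.99, 10.01, 9.97]

n = len(lengths_mm)
x_bar = statistics.mean(lengths_mm)
s = statistics.stdev(lengths_mm)

# Four equal-width bins; the edges are chosen so that no value falls on an edge.
bin_start, bin_width, n_bins = 9.955, 0.02, 4
counts = [0] * n_bins
for x in lengths_mm:
    counts[int((x - bin_start) / bin_width)] += 1

for k, count in enumerate(counts):
    centre = bin_start + (k + 0.5) * bin_width
    # Density times n times bin width approximates the expected count in the bin.
    expected = n * bin_width * normal_pdf(centre, x_bar, s)
    print(f"{centre:.3f} mm  observed {count}  normal model {expected:.2f}")
```

With only ten values the agreement is rough at best, which is one reason the overlay is a visual aid rather than a test of normality.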
Linear regression
Linear regression models a relationship between an input \(x\) and a measured output \(y\).
\[
\hat{y}=b_0+b_1x.
\]
The slope \(b_1\) estimates how much \(y\) changes for a one-unit increase in \(x\). The intercept
\(b_0\) is the predicted value of \(y\) at \(x=0\).
Least-squares line
The regression line is chosen by minimizing the sum of squared residuals.
\[
\min
\sum_i
\left(y_i-\hat{y}_i\right)^2.
\]
The fitted values \(\hat{y}_i\) are the values predicted by the regression line.
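For a single input variable, the minimization has a standard closed-form solution: \(b_1=\sum_i(x_i-\bar{x})(y_i-\bar{y})/\sum_i(x_i-\bar{x})^2\) and \(b_0=\bar{y}-b_1\bar{x}\). A minimal sketch, with a hypothetical function name and hypothetical load and deflection data:

```python
def fit_line(x, y):
    """Return (b0, b1) of the least-squares line y_hat = b0 + b1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = s_xy / s_xx            # slope
    b0 = y_mean - b1 * x_mean   # intercept
    return b0, b1

# Hypothetical data: applied load (kN) and measured deflection (mm).
load_kn = [1.0, 2.0, 3.0, 4.0, 5.0]
deflection_mm = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(load_kn, deflection_mm)
fitted = [b0 + b1 * x for x in load_kn]

print(f"y_hat = {b0:.3f} + {b1:.3f} x")
```

The fitted line always passes through the point \((\bar{x},\bar{y})\), which is a quick sanity check on any implementation.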
Residuals
A residual is the difference between the measured value and the fitted value.
\[
r_i=y_i-\hat{y}_i.
\]
Residuals should be checked before trusting the regression model. Patterns in residuals may indicate
nonlinearity, outliers, or changing variance.
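The sketch below shows what a residual pattern can look like. The data are hypothetical and deliberately curved (\(y=x^2\)), so a straight line is the wrong model; the helper function name is also an assumption.

```python
def fit_line(x, y):
    """Return (b0, b1) of the least-squares line y_hat = b0 + b1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_mean - b1 * x_mean, b1

# Hypothetical, deliberately nonlinear data: y = x squared.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Listing residuals in order of x makes a systematic pattern easy to spot.
for xi, ri in zip(x, residuals):
    print(f"x = {xi:.0f}  residual = {ri:+.1f}")
```

The residuals are positive at both ends and negative in the middle. A U-shape like this suggests curvature that the straight line misses, whereas residuals from an adequate model scatter around zero with no visible trend.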
Coefficient of determination
The coefficient of determination \(R^2\) measures the fraction of the variation in \(y\) about its
mean \(\bar{y}\) that is explained by the regression.
\[
R^2
=
1
-
\frac{\sum_i\left(y_i-\hat{y}_i\right)^2}
{\sum_i\left(y_i-\bar{y}\right)^2}.
\]
A high \(R^2\) can be helpful, but it does not automatically prove that the model is physically correct.
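A minimal sketch of the formula, applied to two hypothetical data sets whose fitted values are taken as given: one roughly linear, and one that follows \(y=x^2\) but was fitted with a straight line.

```python
def r_squared(y, y_hat):
    """Coefficient of determination from measured values y and fitted values y_hat."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)              # total sum of squares
    return 1 - ss_res / ss_tot

# Hypothetical roughly linear data with fitted values from y_hat = 0.05 + 1.99 x.
y_linear = [2.1, 3.9, 6.2, 7.8, 10.1]
fit_linear = [2.04, 4.03, 6.02, 8.01, 10.00]

# Hypothetical curved data (y = x squared) with fitted values from y_hat = -7 + 6 x.
y_curved = [1.0, 4.0, 9.0, 16.0, 25.0]
fit_curved = [-1.0, 5.0, 11.0, 17.0, 23.0]

r2_linear = r_squared(y_linear, fit_linear)
r2_curved = r_squared(y_curved, fit_curved)

print(f"R^2 (linear data) = {r2_linear:.4f}")
print(f"R^2 (curved data) = {r2_curved:.4f}")
```

Both values exceed 0.96, even though the straight line is the wrong model for the second data set. That is why \(R^2\) should be read together with the residuals.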
Common mistakes
- Confusing standard deviation with standard error.
- Interpreting a confidence interval for the mean as a range for individual measurements.
- Using a hypothesis test without checking practical engineering significance.
- Trusting regression without checking residuals.
- Assuming a normal distribution just because a normal curve is drawn.
- Using too few data points for a reliable confidence interval.
- Ignoring outliers without an engineering reason.