False positives, false negatives, and what “accuracy” really means
In medical testing (and many other classification problems), a test returns either “positive” (\(T+\)) or “negative” (\(T-\)).
The condition of interest (for example, a disease) is either present (\(D\)) or absent (\(\neg D\)).
The key performance numbers are sensitivity and specificity, which are conditional probabilities:
\[
\text{sensitivity} = P(T+\mid D), \qquad \text{specificity} = P(T-\mid \neg D).
\]
Sensitivity measures how often the test detects the condition when it is truly there, while specificity measures how often the test is correctly negative when the condition is truly absent.
False positive rate and false negative rate
Two closely related quantities are the error rates:
\[
\mathrm{FPR} = P(T+\mid \neg D)=1-\text{specificity}, \qquad
\mathrm{FNR} = P(T-\mid D)=1-\text{sensitivity}.
\]
A false positive happens when the test says “positive” even though the condition is absent; a false negative happens when the test says “negative” even though the condition is present.
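Neither rate is the same as the “probability you have the disease if the test is positive.” That quantity is \(P(D\mid T+)\), which conditions on the test result rather than on the true condition, and it also depends on how common the condition is.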
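As a rough illustration, the four rates can be estimated from paired observations of true status and test result. The sketch below is only one way to do it; the function name `estimate_rates` and the sample data are hypothetical and do not come from any particular study.

```python
def estimate_rates(has_condition, test_positive):
    """Estimate sensitivity, specificity, FPR and FNR from paired booleans."""
    pairs = list(zip(has_condition, test_positive))
    # Split the test results by true status, since every rate conditions on it.
    with_d = [t for d, t in pairs if d]
    without_d = [t for d, t in pairs if not d]
    if not with_d or not without_d:
        raise ValueError("need at least one case with and one without the condition")
    sensitivity = sum(with_d) / len(with_d)                       # P(T+ | D)
    specificity = sum(1 for t in without_d if not t) / len(without_d)  # P(T- | not D)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fpr": 1 - specificity,   # P(T+ | not D)
        "fnr": 1 - sensitivity,   # P(T- | D)
    }

# Hypothetical data: 4 people with the condition, 6 without.
has_condition = [True, True, True, True, False, False, False, False, False, False]
test_positive = [True, True, True, False, True, False, False, False, False, False]
rates = estimate_rates(has_condition, test_positive)
print(rates)
```

Note that each rate is computed within a group defined by the true status, not by the test result.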
Prevalence (base rate) and Bayes’ theorem
The prevalence \(P(D)\) is the baseline chance that a randomly chosen person truly has the condition before testing.
To interpret a result, you often want the positive predictive value (PPV) and the negative predictive value (NPV):
\[
\mathrm{PPV}=P(D\mid T+), \qquad \mathrm{NPV}=P(\neg D\mid T-).
\]
Bayes’ theorem connects these to sensitivity, specificity, and prevalence. First compute the overall positive rate:
\[
P(T+) = P(T+\mid D)P(D) + P(T+\mid \neg D)P(\neg D).
\]
Then
\[
P(D\mid T+) = \frac{P(T+\mid D)P(D)}{P(T+)}.
\]
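The same two steps, applied to a negative result, give the NPV. This is a direct application of the definitions above rather than a new result:
\[
P(T-) = P(T-\mid D)P(D) + P(T-\mid \neg D)P(\neg D), \qquad
P(\neg D\mid T-) = \frac{P(T-\mid \neg D)P(\neg D)}{P(T-)}.
\]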
When prevalence is very small, PPV can be surprisingly low even if sensitivity and specificity look “high,” because false positives may outnumber true positives in the tested population.
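To make the low-prevalence effect concrete, here is a minimal sketch of the Bayes calculation. The function name `predictive_values` and the input values (99% sensitivity, 95% specificity, 1% prevalence) are illustrative assumptions, not figures for any real test.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence via Bayes' theorem."""
    # Overall positive rate: true positives plus false positives.
    p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    p_neg = 1 - p_pos
    if p_pos == 0 or p_neg == 0:
        raise ValueError("PPV or NPV is undefined when one test result never occurs")
    ppv = sensitivity * prevalence / p_pos          # P(D | T+)
    npv = specificity * (1 - prevalence) / p_neg    # P(not D | T-)
    return ppv, npv

# Hypothetical test: 99% sensitive, 95% specific, condition present in 1% of people.
ppv, npv = predictive_values(0.99, 0.95, 0.01)
print(f"PPV = {ppv:.3f}, NPV = {npv:.4f}")
```

With these assumed inputs, only about one positive result in six corresponds to a true case, even though both sensitivity and specificity are at least 95%.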
The confusion matrix view
A helpful way to see all quantities at once is the confusion matrix, which counts true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN).
For a population of size \(N\), the expected counts are:
\[
\mathrm{TP}=N\,P(D)P(T+\mid D), \quad
\mathrm{FN}=N\,P(D)P(T-\mid D),
\]
\[
\mathrm{FP}=N\,P(\neg D)P(T+\mid \neg D), \quad
\mathrm{TN}=N\,P(\neg D)P(T-\mid \neg D).
\]
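PPV and NPV can then be read directly from the counts: PPV is the fraction of the test-positive group that truly has the condition, \(\mathrm{PPV}=\mathrm{TP}/(\mathrm{TP}+\mathrm{FP})\), and NPV is the fraction of the test-negative group that truly does not, \(\mathrm{NPV}=\mathrm{TN}/(\mathrm{TN}+\mathrm{FN})\).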
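The expected-count formulas above translate into a few lines of code. The sketch below assumes a hypothetical population of 10,000 with 2% prevalence, 90% sensitivity and 95% specificity; the function name `expected_counts` is likewise just an illustration.

```python
def expected_counts(n, sensitivity, specificity, prevalence):
    """Expected confusion-matrix counts for a population of size n."""
    return {
        "TP": n * prevalence * sensitivity,
        "FN": n * prevalence * (1 - sensitivity),
        "FP": n * (1 - prevalence) * (1 - specificity),
        "TN": n * (1 - prevalence) * specificity,
    }

# Hypothetical inputs chosen only for illustration.
counts = expected_counts(10_000, 0.90, 0.95, 0.02)

# Predictive values as fractions within the test-positive and test-negative groups.
ppv = counts["TP"] / (counts["TP"] + counts["FP"])
npv = counts["TN"] / (counts["TN"] + counts["FN"])
print(counts, ppv, npv)
```

The four counts always sum to the population size, which is a quick sanity check on any hand-built confusion matrix.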
ROC preview and advanced extensions
In signal detection, a test can be tuned by changing a threshold. Each threshold produces a pair \((\mathrm{FPR}, \mathrm{TPR})\) where \(\mathrm{TPR}=\) sensitivity.
Plotting \(\mathrm{TPR}\) against \(\mathrm{FPR}\) gives a ROC curve. This calculator shows the single ROC point \((1-\text{specificity},\ \text{sensitivity})\) for the values you enter.
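For a test that produces a continuous score, sweeping the threshold traces out the curve point by point. The following is a minimal sketch, assuming that a score at or above the threshold counts as “positive”; the function name `roc_points` and the six scores are made up for illustration.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by using each distinct score as a threshold."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    points = [(0.0, 0.0)]  # threshold above every score: nothing is called positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and not y)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical scores; True means the condition is actually present.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False, False]
points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

Lowering the threshold can only add positives, so both coordinates are non-decreasing along the list and the last point is always (1, 1).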
At university level, problems extend to multi-class tests and continuous scores, where you compare entire ROC curves, compute the area under the curve (AUC), or generalize Bayes updates across multiple hypotheses.