Bayes’ theorem: updating beliefs with evidence
Bayes’ theorem is a core rule of conditional probability that explains how to update the probability of a hypothesis after observing new evidence.
Suppose \(A\) is a hypothesis (for example, “the patient has the disease”) and \(B\) is an observation (for example, “the test is positive”).
The quantity we usually want is the posterior probability \(P(A\mid B)\): how likely \(A\) is after seeing \(B\).
Bayes’ theorem expresses this posterior in terms of a prior \(P(A)\) and the likelihood \(P(B\mid A)\):
\[
P(A\mid B)=\frac{P(B\mid A)\,P(A)}{P(B)}.
\]
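For readers who want to see where the formula comes from, it follows in one step from the definition of conditional probability, assuming \(P(A)>0\) and \(P(B)>0\). The joint probability of \(A\) and \(B\) can be factored in two ways:
\[
P(A\cap B)=P(A\mid B)\,P(B)=P(B\mid A)\,P(A).
\]
Dividing the last two expressions by \(P(B)\) gives the theorem.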
Prior, likelihood, and evidence
The prior \(P(A)\) represents your belief in \(A\) before the observation.
The likelihood \(P(B\mid A)\) measures how compatible the observation is with the hypothesis.
The denominator \(P(B)\) is called the evidence (or normalizing constant): dividing by it ensures that the posterior probabilities of all competing hypotheses sum to 1.
In the common “binary” setting where either \(A\) or \(\neg A\) holds, the evidence is computed by the law of total probability:
\[
P(B)=P(B\mid A)P(A)+P(B\mid \neg A)P(\neg A),
\quad \text{where } P(\neg A)=1-P(A).
\]
This step is crucial: it accounts for the fact that \(B\) may happen even when the hypothesis is false (for example, false positives).
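The binary update can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code; the function name `posterior_binary` and its parameter names are assumptions made for this example, and the inputs in the usage line are made-up numbers.

```python
def posterior_binary(prior: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Return P(A | B) for a binary hypothesis (A versus not-A)."""
    # Evidence P(B) from the law of total probability:
    # P(B) = P(B|A) P(A) + P(B|not A) P(not A), with P(not A) = 1 - P(A).
    evidence = p_b_given_a * prior + p_b_given_not_a * (1.0 - prior)
    if evidence == 0.0:
        # B cannot occur under either hypothesis, so conditioning on it is undefined.
        raise ValueError("P(B) is zero, so P(A | B) is undefined")
    # Bayes' theorem: likelihood times prior, divided by the evidence.
    return p_b_given_a * prior / evidence


# Illustrative inputs: a 50/50 prior and an observation four times
# more likely under A than under not-A.
print(round(posterior_binary(0.5, 0.8, 0.2), 4))
```

The second term of the evidence is what keeps the posterior below 1 whenever \(B\) can also occur under \(\neg A\).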
A medical-test example
Consider a disease with prevalence \(P(\text{disease})=0.01\) and a test with sensitivity \(P(\text{test+}\mid \text{disease})=0.9\) and false-positive rate \(P(\text{test+}\mid \text{no disease})=0.05\).
First compute the evidence:
\[
P(\text{test+}) = 0.9\cdot 0.01 + 0.05\cdot 0.99 = 0.0585.
\]
Then Bayes’ theorem gives the posterior:
\[
P(\text{disease}\mid \text{test+})
= \frac{0.9\cdot 0.01}{0.0585}
\approx 0.1538.
\]
Even though the test detects 90% of true cases, the posterior is only about 15%: the disease is rare, so false positives (0.0495) contribute far more to \(P(\text{test+})\) than true positives (0.009).
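One way to check this result is to count people instead of multiplying probabilities. The sketch below assumes a hypothetical population of 100,000 (a size chosen only so that every count is a whole number) and applies the three rates from the example.

```python
# Hypothetical population; the size is an assumption chosen for round counts.
population = 100_000

diseased = population // 100            # 1% prevalence
healthy = population - diseased

true_positives = diseased * 9 // 10     # 90% of diseased people test positive
false_positives = healthy * 5 // 100    # 5% of healthy people test positive

all_positives = true_positives + false_positives
share_diseased = true_positives / all_positives

print(f"true positives:  {true_positives}")
print(f"false positives: {false_positives}")
print(f"P(disease | test+) is about {share_diseased:.4f}")
```

Of the 5,850 positive tests, only 900 come from people who have the disease, which reproduces the posterior of about 0.1538.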
How to use this tool
Enter the prior \(P(A)\) along with the two likelihoods \(P(B\mid A)\) and \(P(B\mid \neg A)\).
The calculator computes \(P(\neg A)=1-P(A)\), then computes the evidence \(P(B)\), which serves as the normalizing denominator.
Finally, it outputs \(P(A\mid B)\) as a decimal and percent with clear step-by-step work, and it also reports the complement \(P(\neg A\mid B)=1-P(A\mid B)\).
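The same sequence of steps can be reproduced offline. The following is a sketch under stated assumptions, not the tool's implementation: the function name `bayes_steps`, the dictionary labels, and the input validation are all hypothetical choices made for this example.

```python
def bayes_steps(prior: float, p_b_given_a: float, p_b_given_not_a: float) -> dict:
    """Return each intermediate quantity of a binary Bayes update."""
    inputs = {"P(A)": prior, "P(B|A)": p_b_given_a, "P(B|not A)": p_b_given_not_a}
    for name, value in inputs.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    not_prior = 1.0 - prior                                     # step 1: P(not A)
    evidence = p_b_given_a * prior + p_b_given_not_a * not_prior  # step 2: P(B)
    if evidence == 0.0:
        raise ValueError("P(B) is zero, so the posterior is undefined")
    posterior = p_b_given_a * prior / evidence                  # step 3: P(A|B)

    return {
        "P(not A)": not_prior,
        "P(B)": evidence,
        "P(A|B)": posterior,
        "P(not A|B)": 1.0 - posterior,                          # step 4: complement
    }


# Print every step as a decimal and as a percentage.
for label, value in bayes_steps(0.01, 0.9, 0.05).items():
    print(f"{label} = {value:.4f} ({value:.2%})")
```

With the medical-test inputs, the last two lines report a posterior of about 0.1538 (15.38%) and a complement of about 0.8462 (84.62%).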
The interactive canvas shows a prior vs posterior bar visualization; press Play to animate the posterior bars “filling” from 0 to their final values.
Advanced note (university level)
Bayes’ theorem extends naturally beyond two hypotheses.
If \(\{H_i\}\) is a set of mutually exclusive, exhaustive hypotheses, then
\[
P(H_i\mid B)=\frac{P(B\mid H_i)P(H_i)}{\sum_j P(B\mid H_j)P(H_j)}.
\]
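The multi-hypothesis formula is a direct generalization of the binary case: multiply each prior by its likelihood, then divide by the sum of those products. The sketch below assumes the hypotheses are given as two parallel lists; the function name `posteriors` and the three-hypothesis numbers are made up for illustration.

```python
import math


def posteriors(priors: list[float], likelihoods: list[float]) -> list[float]:
    """Return P(H_i | B) for mutually exclusive, exhaustive hypotheses H_i."""
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    if not math.isclose(sum(priors), 1.0):
        raise ValueError("priors of exhaustive hypotheses must sum to 1")

    # Numerators P(B | H_i) P(H_i), one per hypothesis.
    joint = [lik * p for lik, p in zip(likelihoods, priors)]
    # The denominator is the sum of all numerators, i.e. the evidence P(B).
    evidence = sum(joint)
    if evidence == 0.0:
        raise ValueError("P(B) is zero, so the posteriors are undefined")
    return [j / evidence for j in joint]


# Illustrative example with three hypotheses (numbers are invented).
result = posteriors([0.5, 0.3, 0.2], [0.1, 0.4, 0.7])
print([round(p, 4) for p in result])
```

Passing two hypotheses, for example `posteriors([0.01, 0.99], [0.9, 0.05])`, recovers the binary medical-test result as the first entry.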
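In statistics and machine learning, the binary form is often written in terms of odds: the posterior odds equal the prior odds multiplied by the likelihood ratio \(P(B\mid A)/P(B\mid \neg A)\), so evidence is additive on the log-odds (logit) scale. Hierarchical Bayesian models go one step further and place priors on the parameters of the priors themselves.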
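The connection to odds can be checked numerically with the medical-test numbers. The sketch below multiplies the prior odds by the likelihood ratio, converts the result back to a probability, and confirms that the same update is a simple addition on the log scale. The variable names are chosen for this example only.

```python
import math

prior = 0.01
p_pos_given_disease = 0.9
p_pos_given_no_disease = 0.05

# Odds compare a probability with its complement.
prior_odds = prior / (1.0 - prior)

# The likelihood ratio measures how much more probable a positive test is
# under "disease" than under "no disease".
likelihood_ratio = p_pos_given_disease / p_pos_given_no_disease

# Odds form of Bayes' theorem.
posterior_odds = prior_odds * likelihood_ratio

# Convert odds back to a probability.
posterior = posterior_odds / (1.0 + posterior_odds)

# On the log scale, the multiplication becomes an addition.
log_posterior_odds = math.log(prior_odds) + math.log(likelihood_ratio)

print(f"posterior odds: {posterior_odds:.4f}")
print(f"posterior probability: {posterior:.4f}")
```

The posterior probability agrees with the value of about 0.1538 obtained from the standard form.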