The law of total probability
Many probability questions ask for \(P(B)\), the probability of some event \(B\), but the information most readily available is often
conditional: “How likely is \(B\) if case \(A_1\) happens? What if case \(A_2\) happens?” The law of total probability
combines those conditional probabilities into a single unconditional probability by averaging over a partition, weighting each case by how likely it is.
What is a partition?
A collection of events \(\{A_1, A_2, \dots, A_n\}\) is a partition of the sample space when (1) they are mutually exclusive
(\(A_i\cap A_j=\varnothing\) for \(i\neq j\)) and (2) they cover all possibilities (\(\bigcup_i A_i=\Omega\)).
In that case, exactly one \(A_i\) occurs on each trial, so the probabilities satisfy
\[
\sum_{i=1}^{n} P(A_i)=1.
\]
Partitions are the formal way of saying “these are all the cases that can happen, and they don’t overlap.”
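The two conditions can be checked mechanically on a small finite sample space. The sketch below is only an illustration: the helper name `is_partition` and the die-roll sample space are assumptions chosen for the example, not part of the definition above.

```python
from itertools import combinations


def is_partition(events, omega):
    """Return True if `events` are pairwise disjoint and their union is `omega`."""
    # Condition (1): no two events share an outcome.
    pairwise_disjoint = all(a.isdisjoint(b) for a, b in combinations(events, 2))
    # Condition (2): together the events cover the whole sample space.
    covers_omega = set().union(*events) == set(omega)
    return pairwise_disjoint and covers_omega


# Hypothetical sample space: one roll of a six-sided die.
omega = {1, 2, 3, 4, 5, 6}

print(is_partition([{1, 2}, {3, 4}, {5, 6}], omega))   # True
print(is_partition([{1, 2, 3}, {3, 4, 5, 6}], omega))  # False: 3 is in both events
print(is_partition([{1, 2}, {5, 6}], omega))           # False: 3 and 4 are not covered
```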
The formula and why it is true
If \(\{A_i\}\) is a partition and each \(P(A_i)>0\), then \(B\) can be split into disjoint pieces:
\[
B = \bigcup_{i=1}^{n} (B\cap A_i),
\]
and these intersections are disjoint because the \(A_i\) do not overlap. Additivity then gives
\[
P(B)=\sum_{i=1}^{n} P(B\cap A_i).
\]
Using the multiplication rule \(P(B\cap A_i)=P(B\mid A_i)\,P(A_i)\) (this is where \(P(A_i)>0\) is needed, so that \(P(B\mid A_i)\) is defined), we obtain the law of total probability:
\[
P(B)=\sum_{i=1}^{n} P(B\mid A_i)\,P(A_i).
\]
Each term is a weighted contribution: \(P(A_i)\) says how common case \(A_i\) is, and \(P(B\mid A_i)\) says how likely \(B\) is inside that case.
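To see the derivation at work, one can enumerate a small sample space and compare three quantities: \(P(B)\) computed directly, the sum of the pieces \(P(B\cap A_i)\), and the weighted sum \(\sum_i P(B\mid A_i)\,P(A_i)\). The following is a minimal sketch that assumes equally likely outcomes; the die, the partition, and the event are hypothetical choices, and the helper names `prob` and `cond_prob` are introduced only for this illustration.

```python
from fractions import Fraction


def prob(event, omega):
    """Probability of `event` when all outcomes in `omega` are equally likely."""
    return Fraction(len(event & omega), len(omega))


def cond_prob(b, a, omega):
    """P(b | a) = P(b and a) / P(a); assumes P(a) > 0."""
    return prob(b & a, omega) / prob(a, omega)


# Hypothetical setup: one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}
partition = [{1}, {2, 3}, {4, 5, 6}]  # disjoint cases that cover omega
b = {2, 4, 6}                         # event B: the roll is even

direct = prob(b, omega)
via_pieces = sum(prob(b & a, omega) for a in partition)
via_law = sum(cond_prob(b, a, omega) * prob(a, omega) for a in partition)

print(direct, via_pieces, via_law)  # 1/2 1/2 1/2
```

Exact fractions are used here so that the three values can be compared with `==` rather than with a floating-point tolerance.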
Worked example
Suppose there are two cases \(A_1\) and \(A_2\) with \(P(A_1)=0.3\) and \(P(A_2)=0.7\), and
\(P(B\mid A_1)=0.8\), \(P(B\mid A_2)=0.2\). Then
\[
P(B)=0.8\cdot 0.3 + 0.2\cdot 0.7 = 0.24 + 0.14 = 0.38.
\]
This is a weighted average of the two conditional probabilities: because \(A_2\) is more common, \(P(B\mid A_2)=0.2\) carries more weight, and the result 0.38 lies closer to 0.2 than to 0.8.
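The same arithmetic can be written as a small function. This is a sketch rather than a reference implementation: the name `total_probability`, its argument names, and the tolerance used to check that the case probabilities sum to 1 are all assumptions made for illustration.

```python
import math


def total_probability(case_probs, cond_probs, tol=1e-9):
    """Return sum_i P(B | A_i) * P(A_i) for a finite partition.

    case_probs[i] is P(A_i) and cond_probs[i] is P(B | A_i).
    """
    if len(case_probs) != len(cond_probs):
        raise ValueError("need one conditional probability per case")
    # The law assumes the cases form a partition, so their probabilities sum to 1.
    if not math.isclose(sum(case_probs), 1.0, abs_tol=tol):
        raise ValueError("case probabilities must sum to 1")
    return sum(c * p for p, c in zip(case_probs, cond_probs))


# Numbers from the worked example: P(A_1)=0.3, P(A_2)=0.7, P(B|A_1)=0.8, P(B|A_2)=0.2.
p_b = total_probability([0.3, 0.7], [0.8, 0.2])
print(round(p_b, 10))  # 0.38
```

The result is rounded before printing only because floating-point products such as `0.2 * 0.7` are not exact.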
Checks, normalization, and interpretation
In practice, the weights entered for \(P(A_i)\) might not sum exactly to 1 (because of rounding, missing cases, or unnormalized “weights”).
Strictly speaking, the law requires \(\sum_i P(A_i)=1\). If the sum is not 1, you can either fix the inputs (add the missing case, correct the numbers),
or interpret the values as relative weights and normalize them by dividing each \(P(A_i)\) by \(\sum_j P(A_j)\).
Normalization changes the model, because it treats the listed cases as the only possible ones and keeps only their relative sizes, so it should be used intentionally.
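As an illustration of the normalization step, the sketch below divides each weight by the total. The function name `normalize` and the raw weights are hypothetical; the choice to reject negative or all-zero weights is an assumption of this sketch, not a requirement stated above.

```python
def normalize(weights):
    """Rescale non-negative relative weights so that they sum to 1."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in weights]


# Hypothetical relative weights for three cases (they sum to 20, not 1).
raw_weights = [3, 7, 10]
case_probs = normalize(raw_weights)
print(case_probs)  # [0.15, 0.35, 0.5]
```

Calling the function explicitly, rather than normalizing silently inside a larger computation, keeps the modeling decision visible to whoever reads the code.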
University extension
At a more advanced level, the same idea works when the cases are indexed by a continuous random variable \(X\) with density \(f_X\). Instead of a finite sum, you get an integral:
\(P(B)=\int_{-\infty}^{\infty} P(B\mid X=x)\,f_X(x)\,dx\). This viewpoint connects the law of total probability to densities, mixtures, and expectation.
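The integral form can be approximated numerically. The sketch below uses a plain midpoint rule and a made-up model: \(X\) is assumed uniform on \([0,1]\) and \(P(B\mid X=x)=x^2\), so the exact answer is \(\int_0^1 x^2\,dx = 1/3\). The function name and the model are assumptions chosen only to illustrate the formula.

```python
def total_probability_continuous(cond_prob, density, lo, hi, n=10_000):
    """Approximate the integral of cond_prob(x) * density(x) over [lo, hi].

    Uses the midpoint rule with n equal subintervals.
    """
    width = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * width  # midpoint of the k-th subinterval
        total += cond_prob(x) * density(x)
    return total * width


# Hypothetical model: X ~ Uniform(0, 1) and P(B | X = x) = x**2.
p_b_continuous = total_probability_continuous(
    cond_prob=lambda x: x ** 2,
    density=lambda x: 1.0,  # uniform density on [0, 1]
    lo=0.0,
    hi=1.0,
)
print(p_b_continuous)  # close to 1/3
```

The integration limits are restricted to the interval where the density is non-zero, which plays the same role as listing only the cases with positive probability in the finite sum.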