Bayesian posterior updating
Bayesian inference treats an unknown parameter \(\theta\) (written \(\mu\) below when the parameter is a mean) as a random variable.
You start with a prior distribution \(p(\theta)\), collect data, and update to a posterior distribution
\(p(\theta\mid data)\). The posterior represents your updated uncertainty after seeing the evidence.
1) Bayes’ rule (the update engine)
Bayes’ rule combines prior belief with data evidence:
\[
p(\theta\mid data)=\frac{p(data\mid \theta)\,p(\theta)}{p(data)}.
\]
Each part has a meaning:
- Prior \(p(\theta)\): what you believed before data.
- Likelihood \(p(data\mid \theta)\): how probable the observed data are if the parameter takes the value \(\theta\).
- Evidence \(p(data)=\int p(data\mid \theta)p(\theta)\,d\theta\): a normalization constant.
- Posterior \(p(\theta\mid data)\): updated belief after data.
In many practical calculations, we work “up to proportionality”:
\(p(\theta\mid data)\propto p(data\mid \theta)\,p(\theta)\),
and the evidence just ensures the posterior integrates to 1.
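As a rough illustration of working up to proportionality, the sketch below evaluates prior times likelihood on a grid of \(\theta\) values and then divides by a numerical estimate of the evidence. The function name `grid_posterior`, the flat prior, and the data (7 successes in 10 trials) are assumptions chosen for the example, not part of the calculator.

```python
import math

def grid_posterior(prior, likelihood, grid):
    """Normalize prior * likelihood over an evenly spaced grid of theta values."""
    step = grid[1] - grid[0]
    unnormalized = [likelihood(t) * prior(t) for t in grid]
    evidence = sum(unnormalized) * step  # Riemann-sum estimate of p(data)
    posterior = [u / evidence for u in unnormalized]
    return posterior, evidence

# Illustrative (assumed) setup: flat prior on (0, 1), 7 successes in 10 trials.
n, k = 10, 7
grid = [(i + 0.5) / 1000 for i in range(1000)]  # cell midpoints in (0, 1)

def prior(theta):
    return 1.0

def likelihood(theta):
    return math.comb(n, k) * theta**k * (1 - theta) ** (n - k)

posterior, evidence = grid_posterior(prior, likelihood, grid)
step = grid[1] - grid[0]
posterior_mean = sum(t * p for t, p in zip(grid, posterior)) * step
print(evidence, posterior_mean)
```

Under a flat prior the exact evidence is \(1/(n+1)\) and the exact posterior mean is \((k+1)/(n+2)\), so the grid estimates can be checked against those values.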
2) Conjugate priors (fast closed-form updating)
A prior is conjugate to a likelihood if the posterior stays in the same distribution family as the prior.
Conjugacy is valuable because you can update parameters with simple formulas rather than doing numerical integration.
3) Beta–Binomial (coin bias update)
Suppose \(\theta\in(0,1)\) is the probability of success (e.g., “heads”). A common prior for a probability is the Beta distribution:
\[
\theta \sim \mathrm{Beta}(\alpha,\beta),
\qquad
p(\theta)=\frac{1}{B(\alpha,\beta)}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}.
\]
If you observe \(k\) successes out of \(n\) trials, the Binomial likelihood is:
\[
p(k\mid \theta,n)=\binom{n}{k}\theta^k(1-\theta)^{n-k}.
\]
Multiply prior and likelihood (ignoring constants that do not depend on \(\theta\)):
\[
p(\theta\mid k,n)\propto
\theta^{(\alpha-1)+k}(1-\theta)^{(\beta-1)+(n-k)}.
\]
That has the exact Beta form again, so the posterior is:
\[
\theta\mid k,n \sim \mathrm{Beta}(\alpha',\beta'),
\qquad
\alpha'=\alpha+k,\quad \beta'=\beta+(n-k).
\]
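The update rule above amounts to two additions. A minimal sketch follows, where the function name `beta_binomial_update` and the example numbers are assumptions made for illustration:

```python
def beta_binomial_update(alpha, beta, k, n):
    """Return the posterior Beta parameters after k successes in n trials."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    # Successes are added to alpha, failures (n - k) to beta.
    return alpha + k, beta + (n - k)

# Assumed example: Beta(2, 2) prior, 7 successes in 10 trials.
alpha_post, beta_post = beta_binomial_update(2.0, 2.0, k=7, n=10)
print(alpha_post, beta_post)  # 9.0 5.0
```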
Interpretation (pseudo-counts)
The parameters behave like counts:
relative to a flat \(\mathrm{Beta}(1,1)\) prior, \(\alpha-1\) acts like prior successes and \(\beta-1\) acts like prior failures.
After data, you simply add real successes and failures.
Posterior mean, variance, MAP
\[
E[\theta\mid data]=\frac{\alpha'}{\alpha'+\beta'},
\qquad
\mathrm{Var}(\theta\mid data)=\frac{\alpha'\beta'}{(\alpha'+\beta')^2(\alpha'+\beta'+1)}.
\]
\[
\hat{\theta}_{\mathrm{MAP}}=
\frac{\alpha'-1}{\alpha'+\beta'-2}
\quad \text{(only if } \alpha'>1 \text{ and } \beta'>1\text{)}.
\]
The posterior mean also equals the posterior predictive probability of the next success for the Beta–Binomial model.
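These summaries can be computed directly from the posterior parameters. The sketch below is one possible arrangement; the function name `beta_summaries` and the choice to return `None` for an undefined MAP are assumptions, not the calculator's actual code.

```python
import math

def beta_summaries(a, b):
    """Mean, variance, SD, and MAP of a Beta(a, b) distribution."""
    total = a + b
    mean = a / total
    variance = a * b / (total**2 * (total + 1))
    # The interior mode formula applies only when both parameters exceed 1.
    mode = (a - 1) / (total - 2) if a > 1 and b > 1 else None
    return {"mean": mean, "variance": variance,
            "sd": math.sqrt(variance), "map": mode}

# Assumed example: a Beta(9, 5) posterior.
summary = beta_summaries(9.0, 5.0)
print(summary)
```

For \(\mathrm{Beta}(9,5)\) the mean is \(9/14\) and the MAP is \(8/12\); the MAP sits above the mean because the distribution is skewed to the left.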
Credible intervals (Bayesian intervals)
A \((100c)\%\) credible interval is an interval \([L,U]\) such that:
\[
P(L\le \theta \le U \mid data)=c.
\]
A common choice is equal-tailed quantiles:
\(L=q_{(1-c)/2}\), \(U=q_{1-(1-c)/2}\), where \(q_p\) is the Beta posterior quantile.
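The Beta quantile has no elementary closed form, so it has to be computed numerically. The sketch below approximates it with the standard library only, by accumulating the area under the Beta density until it reaches the target probability. The function names and the step count are assumptions, and the midpoint rule used here is only reliable when both parameters are at least 1 (the density is unbounded at an endpoint otherwise).

```python
import math

def beta_pdf(theta, a, b):
    """Beta(a, b) density at theta in (0, 1), computed on the log scale."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta)
                    + (b - 1) * math.log(1 - theta))

def beta_quantile(p, a, b, steps=20000):
    """Approximate the p-quantile (0 < p < 1) by accumulating midpoint-rule area."""
    h = 1.0 / steps
    cumulative = 0.0
    for i in range(steps):
        mass = beta_pdf((i + 0.5) * h, a, b) * h
        if cumulative + mass >= p:
            # Interpolate linearly inside the cell where the CDF crosses p.
            return (i + (p - cumulative) / mass) * h
        cumulative += mass
    return 1.0

def equal_tailed_interval(a, b, c=0.95):
    """Equal-tailed credible interval with posterior probability c."""
    tail = (1 - c) / 2
    return beta_quantile(tail, a, b), beta_quantile(1 - tail, a, b)

# Assumed example: 95% interval for a Beta(9, 5) posterior.
lower, upper = equal_tailed_interval(9.0, 5.0, c=0.95)
print(lower, upper)
```

If SciPy is installed, `scipy.stats.beta.ppf(p, a, b)` returns the same quantiles and is the more robust choice.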
4) Normal–Normal (unknown mean with known variance)
Another classic conjugate pair updates an unknown mean \(\mu\) when the observation noise level is known.
If your prior for \(\mu\) is Normal and the likelihood is Normal, the posterior is also Normal.
\[
\mu \sim \mathcal{N}(\mu_0,\tau_0^2),
\qquad
\bar{x}\mid \mu \sim \mathcal{N}\!\left(\mu,\frac{\sigma^2}{n}\right).
\]
The update is easiest using precision (inverse variance):
\[
\lambda_0=\frac{1}{\tau_0^2},\qquad \lambda_d=\frac{n}{\sigma^2}.
\]
Posterior variance and mean:
\[
\tau_1^2=\left(\lambda_0+\lambda_d\right)^{-1}
=\left(\frac{1}{\tau_0^2}+\frac{n}{\sigma^2}\right)^{-1},
\]
\[
\mu_1=\tau_1^2\left(\lambda_0\mu_0+\lambda_d\bar{x}\right)
=\tau_1^2\left(\frac{\mu_0}{\tau_0^2}+\frac{n\bar{x}}{\sigma^2}\right).
\]
\(\mu_1\) is a precision-weighted average: as \(n\) grows, data precision \(\lambda_d\) increases, so the posterior moves closer to \(\bar{x}\) and becomes narrower.
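The precision form translates almost line for line into code. In this sketch the function name `normal_normal_update` and the example numbers are assumptions chosen so the arithmetic is easy to follow:

```python
def normal_normal_update(mu0, tau0_sq, xbar, sigma_sq, n):
    """Posterior mean and variance for a Normal mean with known noise variance."""
    prior_precision = 1.0 / tau0_sq   # lambda_0
    data_precision = n / sigma_sq     # lambda_d
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * mu0 + data_precision * xbar)
    return post_mean, post_var

# Assumed example: prior N(0, 1), sample mean 2 from n = 4 points, sigma^2 = 4.
# Both precisions equal 1, so the posterior mean lies halfway between 0 and 2.
mu1, tau1_sq = normal_normal_update(mu0=0.0, tau0_sq=1.0, xbar=2.0, sigma_sq=4.0, n=4)
print(mu1, tau1_sq)  # 1.0 0.5
```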
5) Evidence and model comparison (why \(p(data)\) matters)
In parameter estimation, the evidence \(p(data)\) just normalizes the posterior.
In model selection, evidence becomes central because it measures how well a model predicts the data when its predictions are averaged over the prior:
it rewards good fit but penalizes overly broad priors, which spread probability over parameter values that predict the data poorly.
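For the Beta–Binomial model the evidence has a closed form, \(p(k\mid n)=\binom{n}{k}\,B(\alpha+k,\beta+n-k)/B(\alpha,\beta)\), which makes a small comparison possible. The sketch below contrasts a flat prior with a prior concentrated near \(\theta=0.5\); the function names, the two priors, and the data are assumptions made for illustration (it needs Python 3.8+ for `math.comb`).

```python
import math

def log_beta_fn(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_evidence(k, n, alpha, beta):
    """Log marginal likelihood of k successes in n trials under a Beta(alpha, beta) prior."""
    return (math.log(math.comb(n, k))
            + log_beta_fn(alpha + k, beta + n - k)
            - log_beta_fn(alpha, beta))

def bayes_factor(k, n, prior_a, prior_b):
    """Evidence ratio of model A to model B; each prior is an (alpha, beta) pair."""
    return math.exp(log_evidence(k, n, *prior_a) - log_evidence(k, n, *prior_b))

flat, concentrated = (1.0, 1.0), (20.0, 20.0)
# Data near 0.5 favour the concentrated prior; extreme data favour the flat one.
print(bayes_factor(5, 10, concentrated, flat))
print(bayes_factor(10, 10, concentrated, flat))
```

A ratio above 1 favours the first model and a ratio below 1 favours the second. Under the flat prior the evidence equals \(1/(n+1)\) for every \(k\), which gives a convenient check.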
6) When conjugacy breaks (university extension)
Many modern Bayesian models are non-conjugate, meaning the posterior is not a simple named distribution.
Then we use numerical methods such as:
- MCMC (Markov Chain Monte Carlo) to draw posterior samples,
- Variational inference to approximate the posterior with an easier family,
- Laplace approximation near the posterior mode.
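As one concrete case from the list above, the Laplace approximation fits a Normal distribution centred at the posterior mode, with variance equal to the negative inverse of the log-posterior's second derivative there. The sketch below is a minimal one-dimensional version under stated assumptions: the log-posterior has a single interior peak on the search interval, the mode is found by ternary search, and the curvature by a finite difference. The function names are hypothetical, and the Beta-shaped target is used only because its exact answer is known.

```python
import math

def laplace_approximation(log_post, lo, hi, tol=1e-10, h=1e-4):
    """Return (mode, variance) of a Normal fitted at the peak of log_post on (lo, hi)."""
    # Ternary search for the maximum; assumes a single interior peak.
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_post(m1) < log_post(m2):
            lo = m1
        else:
            hi = m2
    mode = (lo + hi) / 2
    # Central finite difference for the second derivative at the mode.
    curvature = (log_post(mode + h) - 2 * log_post(mode) + log_post(mode - h)) / h**2
    return mode, -1.0 / curvature

def log_beta_kernel(theta, a=9.0, b=5.0):
    """Unnormalized log density of Beta(a, b); constants do not affect the fit."""
    return (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)

mode, approx_var = laplace_approximation(log_beta_kernel, 0.0, 1.0)
print(mode, approx_var)
```

For this target the mode is \(8/12\) and the Laplace variance is \(1/54\), somewhat larger than the exact Beta variance, which shows that the approximation is local to the mode rather than a match of moments.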
7) What this calculator shows
- Conjugate parameter updates (e.g., \(\alpha'=\alpha+k\), \(\beta'=\beta+n-k\)).
- Posterior summaries (mean, variance/SD, MAP when defined, credible interval).
- An animated curve shift from prior to posterior to visualize how data changes belief.