Hardy–Weinberg equilibrium (p, q, genotype frequencies)
The Hardy–Weinberg (HW) model connects allele frequencies to genotype frequencies for a single gene (one
locus) with two alleles, A and a. It gives the genotype proportions expected in a randomly mating population
with no selection, mutation, or migration, and large enough that genetic drift is negligible. These are the
standard introductory assumptions.
Core relations
Let p be the frequency of allele A and q be the frequency of allele a.
Because there are only two alleles at this locus, the allele frequencies satisfy:
\[
\begin{aligned}
p + q &= 1
\end{aligned}
\]
Under Hardy–Weinberg equilibrium, the expected genotype frequencies are obtained by expanding
\((p + q)^{2}\):
\[
\begin{aligned}
(p + q)^{2} &= p^{2} + 2pq + q^{2} \\
\Pr(AA) &= p^{2} \\
\Pr(Aa) &= 2pq \\
\Pr(aa) &= q^{2}
\end{aligned}
\]
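As a quick numeric check of these relations, take the illustrative values \(p = 0.6\) and \(q = 0.4\) (chosen here only as an example, not taken from any data set). The three genotype frequencies should sum to 1:
\[
\begin{aligned}
\Pr(AA) &= 0.6^{2} = 0.36 \\
\Pr(Aa) &= 2 \cdot 0.6 \cdot 0.4 = 0.48 \\
\Pr(aa) &= 0.4^{2} = 0.16 \\
0.36 + 0.48 + 0.16 &= 1
\end{aligned}
\]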
Mode 1: Starting from an allele frequency (p or q)
If one allele frequency is known, the other is found using \(p + q = 1\). Then genotype frequencies follow directly.
\[
\begin{aligned}
q &= 1 - p \quad (\text{if } p \text{ is given}) \\
p &= 1 - q \quad (\text{if } q \text{ is given})
\end{aligned}
\]
After obtaining both allele frequencies:
\[
\begin{aligned}
\Pr(AA) &= p^{2}, \quad \Pr(Aa) = 2pq, \quad \Pr(aa) = q^{2}
\end{aligned}
\]
If a population size \(N\) is provided, expected genotype counts are computed as:
\[
\begin{aligned}
E(AA) &= N \cdot p^{2} \\
E(Aa) &= N \cdot 2pq \\
E(aa) &= N \cdot q^{2}
\end{aligned}
\]
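The Mode 1 steps can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `hw_from_allele_frequency` and its return layout are assumptions made for this example.

```python
def hw_from_allele_frequency(p=None, q=None, n=None):
    """Mode 1 sketch: derive HW genotype frequencies from one allele frequency.

    Exactly one of p (frequency of A) or q (frequency of a) must be given.
    n is an optional population size used to scale frequencies to counts.
    """
    if (p is None) == (q is None):
        raise ValueError("provide exactly one of p or q")

    # Fill in the missing allele frequency from p + q = 1.
    if p is None:
        p = 1.0 - q
    else:
        q = 1.0 - p

    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequencies must lie in [0, 1]")

    # Expected genotype frequencies: p^2, 2pq, q^2.
    freqs = {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}

    # Expected counts are only defined when a population size is supplied.
    counts = None if n is None else {g: n * f for g, f in freqs.items()}
    return p, q, freqs, counts
```

For example, `hw_from_allele_frequency(p=0.7, n=200)` gives expected counts of about 98 (AA), 84 (Aa), and 18 (aa), up to floating-point rounding.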
Mode 2: Starting from genotype counts (AA, Aa, aa)
When observed genotype counts are provided (written here as AA, Aa, and aa for the number of individuals of
each genotype), the sample size is:
\[
\begin{aligned}
N &= AA + Aa + aa
\end{aligned}
\]
Allele frequencies are estimated by counting alleles in the sample. Each AA contributes two A alleles, each
Aa contributes one A allele and one a allele, and each aa contributes two a alleles.
\[
\begin{aligned}
\hat{p} &= \frac{2\cdot AA + Aa}{2N} \\
\hat{q} &= \frac{2\cdot aa + Aa}{2N} = 1 - \hat{p}
\end{aligned}
\]
The expected HW genotype frequencies and counts are then computed using \(\hat{p}\) and \(\hat{q}\):
\[
\begin{aligned}
\Pr(AA) &= \hat{p}^{2}, \quad \Pr(Aa) = 2\hat{p}\hat{q}, \quad \Pr(aa) = \hat{q}^{2} \\
E(AA) &= N \cdot \hat{p}^{2} \\
E(Aa) &= N \cdot 2\hat{p}\hat{q} \\
E(aa) &= N \cdot \hat{q}^{2}
\end{aligned}
\]
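The allele-counting estimate and the resulting expected counts can be sketched in Python like this. The function name `hw_from_genotype_counts` and the argument names are assumptions for illustration only.

```python
def hw_from_genotype_counts(n_AA, n_Aa, n_aa):
    """Mode 2 sketch: estimate allele frequencies from observed genotype counts.

    Returns the sample size, the estimates p_hat and q_hat, and the
    expected HW genotype counts for a sample of that size.
    """
    n = n_AA + n_Aa + n_aa
    if n <= 0:
        raise ValueError("at least one individual is required")

    # Each individual carries two alleles, so the sample holds 2N alleles.
    # AA contributes two A alleles, Aa contributes one.
    p_hat = (2 * n_AA + n_Aa) / (2 * n)
    q_hat = 1.0 - p_hat

    # Expected counts under HW: N * p^2, N * 2pq, N * q^2.
    expected = {
        "AA": n * p_hat ** 2,
        "Aa": n * 2.0 * p_hat * q_hat,
        "aa": n * q_hat ** 2,
    }
    return n, p_hat, q_hat, expected
```

With the made-up counts 50 AA, 40 Aa, and 10 aa, the sample size is 100, the estimate of p is (100 + 40) / 200 = 0.7, and the expected counts are about 49, 42, and 9.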
Mode 3: Starting from recessive phenotype frequency (q²)
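If the recessive phenotype corresponds to genotype aa (A is completely dominant) and the population is
assumed to be in Hardy–Weinberg equilibrium, the observed recessive phenotype frequency, written
\(f_{aa}\) here, equals \(q^{2}\). The allele frequency \(q\) is then the square root of that frequency, and
\(p = 1 - q\).
\[
\begin{aligned}
q &= \sqrt{f_{aa}} \\
p &= 1 - q
\end{aligned}
\]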
Once \(p\) and \(q\) are known, genotype frequencies follow:
\[
\begin{aligned}
\Pr(AA) &= p^{2}, \quad \Pr(Aa) = 2pq, \quad \Pr(aa) = q^{2}
\end{aligned}
\]
If a population size \(N\) is provided, expected counts are:
\[
\begin{aligned}
E(AA) &= N \cdot p^{2}, \quad
E(Aa) = N \cdot 2pq, \quad
E(aa) = N \cdot q^{2}
\end{aligned}
\]
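A minimal Python sketch of Mode 3 follows. It assumes the input is the recessive phenotype frequency (called `f_aa` here) and that the population is in Hardy–Weinberg equilibrium; the function name `hw_from_recessive_frequency` is hypothetical.

```python
import math


def hw_from_recessive_frequency(f_aa, n=None):
    """Mode 3 sketch: recover p and q from the recessive phenotype frequency.

    f_aa is the fraction of individuals showing the recessive phenotype,
    which equals q^2 only if the population is in HW equilibrium.
    """
    if not 0.0 <= f_aa <= 1.0:
        raise ValueError("f_aa must lie in [0, 1]")

    # q is the square root of the aa frequency; p follows from p + q = 1.
    q = math.sqrt(f_aa)
    p = 1.0 - q

    freqs = {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}

    # Scale to expected counts only when a population size is given.
    counts = None if n is None else {g: n * f for g, f in freqs.items()}
    return p, q, freqs, counts
```

For instance, a recessive phenotype frequency of 0.04 gives q of about 0.2 and p of about 0.8, so roughly 32% of the population are expected to be heterozygous carriers (2pq = 0.32).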
Observed vs expected snapshot (deviation summary)
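When observed genotype counts are available, a quick deviation snapshot can be computed as residuals
(Observed − Expected), with the expected counts taken from the allele frequencies estimated from the same
sample. A positive residual means the genotype is more common than the HW expectation and a negative
residual means it is less common. This is a descriptive summary, not a formal significance test.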
\[
\begin{aligned}
R(AA) &= O(AA) - E(AA) \\
R(Aa) &= O(Aa) - E(Aa) \\
R(aa) &= O(aa) - E(aa)
\end{aligned}
\]
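The residual snapshot can be sketched in Python as below. This example assumes the expected counts come from allele frequencies estimated from the same observed counts (the allele-counting approach); the function name `hw_residuals` and the dictionary keys are illustrative choices.

```python
def hw_residuals(observed):
    """Residual sketch: observed minus expected HW counts per genotype.

    observed is a dict with integer counts under the keys "AA", "Aa", "aa".
    """
    n = observed["AA"] + observed["Aa"] + observed["aa"]
    if n <= 0:
        raise ValueError("at least one individual is required")

    # Estimate allele frequencies by counting alleles in the sample.
    p_hat = (2 * observed["AA"] + observed["Aa"]) / (2 * n)
    q_hat = 1.0 - p_hat

    expected = {
        "AA": n * p_hat ** 2,
        "Aa": n * 2.0 * p_hat * q_hat,
        "aa": n * q_hat ** 2,
    }

    # Positive residual: more individuals than HW predicts; negative: fewer.
    return {g: observed[g] - expected[g] for g in expected}
```

With the made-up counts 60 AA, 20 Aa, and 20 aa, the estimate of p is 0.7, the expected counts are about 49, 42, and 9, and the residuals are about +11, −22, and +11: a heterozygote deficit relative to the HW expectation. Because the expected counts are built from the same sample, the residuals sum to zero.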
A formal Hardy–Weinberg test is typically done with a chi-square test (or an exact test for small samples),
which can be added as a separate advanced feature.