Allele frequency from genotype counts (AA, Aa, aa)
This calculator estimates allele frequencies at a single locus with two alleles (A and a) from observed
genotype data. The idea is simple: count how many A alleles and how many a alleles appear in the sample, then
divide each count by the total number of alleles in the sample.
Definitions
Let the observed genotype counts be \(AA\), \(Aa\), and \(aa\). The total number of individuals in the sample is:
\[
\begin{aligned}
N &= AA + Aa + aa
\end{aligned}
\]
Each individual is assumed to be diploid and so carries two alleles at this locus; the total number of alleles in the sample is therefore \(2N\).
Allele counting (how genotypes contribute)
The allele counts are found by summing contributions from each genotype:
\[
\begin{aligned}
\text{A alleles} &= 2\cdot AA + 1\cdot Aa \\
\text{a alleles} &= 2\cdot aa + 1\cdot Aa
\end{aligned}
\]
From these allele counts, the allele frequencies are:
\[
\begin{aligned}
p &= \frac{2\cdot AA + Aa}{2N} \\
q &= \frac{2\cdot aa + Aa}{2N}
\end{aligned}
\]
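The counting rule above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation; the function name `allele_frequencies` and the example counts (30, 50, 20) are assumptions chosen for the example.

```python
from typing import Tuple


def allele_frequencies(n_AA: int, n_Aa: int, n_aa: int) -> Tuple[float, float]:
    """Return (p, q) for a two-allele locus from genotype counts.

    Hypothetical helper for illustration only.
    """
    if min(n_AA, n_Aa, n_aa) < 0:
        raise ValueError("genotype counts must be non-negative")

    n_individuals = n_AA + n_Aa + n_aa      # N = AA + Aa + aa
    if n_individuals == 0:
        raise ValueError("at least one individual is required")

    total_alleles = 2 * n_individuals       # each individual carries two alleles
    p = (2 * n_AA + n_Aa) / total_alleles   # A alleles: two per AA, one per Aa
    q = (2 * n_aa + n_Aa) / total_alleles   # a alleles: two per aa, one per Aa
    return p, q


# Illustrative sample: 30 AA, 50 Aa, 20 aa (N = 100, so 200 alleles in total)
p, q = allele_frequencies(30, 50, 20)
print(f"p = {p:.4f}, q = {q:.4f}")
```

With these counts there are 110 A alleles and 90 a alleles out of 200, so \(p = 0.55\) and \(q = 0.45\).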
Quick consistency check
Because there are only two alleles at this locus, the resulting allele frequencies should satisfy:
\[
\begin{aligned}
p + q &= 1
\end{aligned}
\]
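This follows directly from the definitions of \(p\), \(q\), and \(N\): adding the two numerators counts every allele in the sample exactly once.
\[
\begin{aligned}
p + q &= \frac{(2\cdot AA + Aa) + (2\cdot aa + Aa)}{2N} \\
&= \frac{2\,(AA + Aa + aa)}{2N} \\
&= \frac{2N}{2N} = 1
\end{aligned}
\]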
In practice, the displayed values may sum to slightly more or less than 1, because each frequency is rounded to a
limited number of decimal places before it is shown.
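One way to automate this check is to compare \(p + q\) with 1 using a small tolerance instead of exact equality. The sketch below assumes a hypothetical helper `sums_to_one` and an arbitrary default tolerance of `1e-9`; neither comes from the calculator itself.

```python
import math


def sums_to_one(p: float, q: float, tol: float = 1e-9) -> bool:
    """Return True if p + q equals 1 within an absolute tolerance.

    Hypothetical helper; the tolerance is an assumption and should be
    loosened when p and q were rounded for display.
    """
    return math.isclose(p + q, 1.0, abs_tol=tol)


# Frequencies computed at full precision pass the strict check.
print(sums_to_one(110 / 200, 90 / 200))

# Values rounded to two decimals for display may need a looser tolerance,
# for example half a unit in the last displayed digit for each value.
print(sums_to_one(0.55, 0.46, tol=0.01))
```

A tolerance tied to the number of displayed decimals keeps the check from flagging harmless rounding while still catching real input errors.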
If you enter genotype frequencies instead of counts
Sometimes you may have genotype frequencies rather than raw counts. Let these be \(f_{AA}\), \(f_{Aa}\), and \(f_{aa}\),
with \(f_{AA} + f_{Aa} + f_{aa} = 1\). The allele frequencies can be computed directly as:
\[
\begin{aligned}
p &= f_{AA} + \frac{1}{2}f_{Aa} \\
q &= f_{aa} + \frac{1}{2}f_{Aa}
\end{aligned}
\]
If the sample size \(N\) (the number of individuals) is also provided, the corresponding allele counts can be recovered using:
\[
\begin{aligned}
\text{A alleles} &= 2N\cdot p \\
\text{a alleles} &= 2N\cdot q
\end{aligned}
\]
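The frequency-based route can be sketched the same way. The function names `allele_frequencies_from_genotype_freqs` and `allele_counts_from_frequencies`, the tolerance, and the example inputs are all assumptions made for illustration. Note that the recovered counts are floating-point values and may not be whole numbers if the genotype frequencies were rounded.

```python
import math
from typing import Tuple


def allele_frequencies_from_genotype_freqs(
    f_AA: float, f_Aa: float, f_aa: float, tol: float = 1e-6
) -> Tuple[float, float]:
    """Return (p, q) from genotype frequencies (hypothetical helper)."""
    if min(f_AA, f_Aa, f_aa) < 0:
        raise ValueError("genotype frequencies must be non-negative")
    if not math.isclose(f_AA + f_Aa + f_aa, 1.0, abs_tol=tol):
        raise ValueError("genotype frequencies must sum to 1")

    p = f_AA + 0.5 * f_Aa   # heterozygotes contribute half of their frequency to A
    q = f_aa + 0.5 * f_Aa   # and the other half to a
    return p, q


def allele_counts_from_frequencies(
    p: float, q: float, n_individuals: int
) -> Tuple[float, float]:
    """Return (A count, a count) as 2N*p and 2N*q (hypothetical helper)."""
    total_alleles = 2 * n_individuals
    return total_alleles * p, total_alleles * q


# Illustrative input: f_AA = 0.30, f_Aa = 0.50, f_aa = 0.20 with N = 100
p, q = allele_frequencies_from_genotype_freqs(0.30, 0.50, 0.20)
count_A, count_a = allele_counts_from_frequencies(p, q, 100)
print(f"p = {p:.2f}, q = {q:.2f}")
print(f"A alleles = {count_A:.1f}, a alleles = {count_a:.1f}")
```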
Important limitation: phenotype counts are not enough (without assumptions)
If you only know phenotype counts (for example, “dominant phenotype” vs “recessive phenotype”), the genotype split among
dominant individuals (\(AA\) vs \(Aa\)) is unknown. Therefore, allele frequencies \(p\) and \(q\) are generally
not identifiable from phenotype counts alone unless you make extra assumptions (commonly Hardy–Weinberg equilibrium).
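To make the role of the extra assumption concrete, the sketch below estimates \(q\) from phenotype counts under two explicit assumptions: Hardy–Weinberg equilibrium, so that the recessive phenotype has expected frequency \(q^2\), and complete dominance of A over a. The function name `hwe_allele_frequencies` and the example counts are hypothetical. The result is only as reliable as those assumptions.

```python
import math
from typing import Tuple


def hwe_allele_frequencies(n_dominant: int, n_recessive: int) -> Tuple[float, float]:
    """Estimate (p, q) from phenotype counts.

    ASSUMES Hardy-Weinberg equilibrium and complete dominance of A over a,
    so the recessive phenotype frequency is treated as q**2.
    Hypothetical helper for illustration only.
    """
    if n_dominant < 0 or n_recessive < 0:
        raise ValueError("phenotype counts must be non-negative")
    n_individuals = n_dominant + n_recessive
    if n_individuals == 0:
        raise ValueError("at least one individual is required")

    q = math.sqrt(n_recessive / n_individuals)  # q^2 = recessive phenotype frequency
    p = 1.0 - q                                 # two alleles only, so p + q = 1
    return p, q


# Illustrative sample: 84 dominant-phenotype and 16 recessive-phenotype individuals
p, q = hwe_allele_frequencies(84, 16)
print(f"p = {p:.2f}, q = {q:.2f}")
```

If the population is not in Hardy–Weinberg equilibrium, this estimate can differ substantially from the value obtained by direct allele counting.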