Allele Frequency from Genotype Counts


Written by STEM Calculators Team. Published January 10, 2026. Updated February 24, 2026.


This calculator computes allele frequencies from genotype data: \[ p=\frac{2\cdot AA + Aa}{2N},\qquad q=\frac{2\cdot aa + Aa}{2N} \]

If one genotype count is missing, p and q can still be computed as long as N is provided: the missing count is inferred by subtracting the two known counts from N. If you only have phenotype counts (dominant vs recessive), allele frequencies are not identifiable without extra assumptions.

Inputs: genotype counts

Paste (CSV)

Paste one row with three values (AA, Aa, aa). Delimiters: comma, semicolon, or tab. Headers allowed.
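
As a rough illustration of the accepted format, the sketch below parses pasted text into three values. It is not the calculator's actual parser: the function name `parse_genotype_row` is hypothetical, and treating any line with a non-numeric field as a header is an assumption made for this example.

```python
import re

def parse_genotype_row(text):
    """Parse pasted text into (AA, Aa, aa) as floats.

    Assumption for this sketch: fields are separated by a comma,
    semicolon, or tab, and any line containing a non-numeric field
    is treated as a header and skipped.
    """
    for line in text.strip().splitlines():
        fields = [f.strip() for f in re.split(r"[,;\t]", line) if f.strip()]
        try:
            values = [float(f) for f in fields]
        except ValueError:
            continue  # non-numeric line, assumed to be a header
        if len(values) == 3:
            return tuple(values)
    raise ValueError("no row with three numeric values found")

print(parse_genotype_row("AA,Aa,aa\n30;50;20"))
```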

Upload CSV

Upload a small CSV file containing AA, Aa, aa (one row or column).

I only have phenotype counts (not genotypes)

With only phenotype counts (dominant phenotype vs recessive phenotype), the genotype breakdown (AA vs Aa among dominants) is unknown. Therefore, p and q are not identifiable without assumptions (for example, assuming Hardy–Weinberg equilibrium).


Allele frequency from genotype counts (AA, Aa, aa)

This calculator estimates allele frequencies for a single locus with two alleles (A and a) using observed genotype data. The key idea is simple: count how many copies of A and of a appear in the sample, then divide each count by the total number of alleles in the sample.

Definitions

Let the observed genotype counts be \(AA\), \(Aa\), and \(aa\). The total number of individuals in the sample is:

\[ \begin{aligned} N &= AA + Aa + aa \end{aligned} \]

Each individual carries two alleles at this locus, so the total number of alleles in the sample is \(2N\).

Allele counting (how genotypes contribute)

The allele counts are found by summing contributions from each genotype:

\[ \begin{aligned} \text{A alleles} &= 2\cdot AA + 1\cdot Aa \\ \text{a alleles} &= 2\cdot aa + 1\cdot Aa \end{aligned} \]

From these allele counts, the allele frequencies are:

\[ \begin{aligned} p &= \frac{2\cdot AA + Aa}{2N} \\ q &= \frac{2\cdot aa + Aa}{2N} \end{aligned} \]
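
The counting formulas above can be sketched in a few lines of Python. The function name `allele_frequencies` and the example counts (30, 50, 20) are illustrative choices, not part of the calculator itself.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q) computed from genotype counts at a biallelic locus."""
    if min(n_AA, n_Aa, n_aa) < 0:
        raise ValueError("genotype counts must be non-negative")
    n = n_AA + n_Aa + n_aa            # individuals in the sample
    if n == 0:
        raise ValueError("at least one individual is required")
    total_alleles = 2 * n             # two alleles per individual
    a_upper = 2 * n_AA + n_Aa         # copies of allele A
    a_lower = 2 * n_aa + n_Aa         # copies of allele a
    return a_upper / total_alleles, a_lower / total_alleles

# Illustrative sample: 30 AA, 50 Aa, 20 aa (N = 100, 200 alleles).
p, q = allele_frequencies(30, 50, 20)
print(f"p = {p:.4f}, q = {q:.4f}")
```

For this sample there are 110 copies of A and 90 copies of a among 200 alleles, so p is 0.55 and q is 0.45.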

Quick consistency check

Because there are only two alleles at this locus, the resulting allele frequencies should satisfy:

\[ \begin{aligned} p + q &= 1 \end{aligned} \]

In practice, the displayed values may sum to slightly more or slightly less than 1, because each frequency is rounded to a limited number of decimals before it is shown.
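
One way to run this check in code is to compare the sum against 1 with a small tolerance, or to use exact rational arithmetic so that no rounding occurs at all. The sketch below shows both; the helper name `sums_to_one`, the tolerance of 1e-9, and the example counts are assumptions chosen for illustration.

```python
import math
from fractions import Fraction

def sums_to_one(p, q, tol=1e-9):
    """Return True if p + q equals 1 within an absolute tolerance."""
    return math.isclose(p + q, 1.0, rel_tol=0.0, abs_tol=tol)

# Exact check with rational arithmetic: no rounding is involved.
n_AA, n_Aa, n_aa = 7, 4, 10           # illustrative counts
total_alleles = 2 * (n_AA + n_Aa + n_aa)
p_exact = Fraction(2 * n_AA + n_Aa, total_alleles)
q_exact = Fraction(2 * n_aa + n_Aa, total_alleles)
exact_ok = (p_exact + q_exact == 1)

# Tolerance-based check on the floating-point values.
float_ok = sums_to_one(float(p_exact), float(q_exact))
print(exact_ok, float_ok)
```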

If you enter genotype frequencies instead of counts

Sometimes you may have genotype frequencies rather than raw counts. Let these be \(f_{AA}\), \(f_{Aa}\), and \(f_{aa}\), with \(f_{AA} + f_{Aa} + f_{aa} = 1\). The allele frequencies can be computed directly as:

\[ \begin{aligned} p &= f_{AA} + \frac{1}{2}f_{Aa} \\ q &= f_{aa} + \frac{1}{2}f_{Aa} \end{aligned} \]

If a population size \(N\) is provided, allele counts can be recovered using:

\[ \begin{aligned} \text{A alleles} &= 2N\cdot p \\ \text{a alleles} &= 2N\cdot q \end{aligned} \]
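
A minimal sketch of the frequency-based route follows. The function name, the tolerance used to check that the three frequencies sum to 1, and the example values are assumptions for illustration. Note that the recovered allele counts need not be whole numbers if the frequencies were rounded.

```python
import math

def allele_frequencies_from_genotype_freqs(f_AA, f_Aa, f_aa, n=None, tol=1e-9):
    """Return (p, q, allele_counts) from genotype frequencies.

    allele_counts is (2N*p, 2N*q) when n is given, otherwise None.
    """
    if min(f_AA, f_Aa, f_aa) < 0:
        raise ValueError("genotype frequencies must be non-negative")
    if not math.isclose(f_AA + f_Aa + f_aa, 1.0, rel_tol=0.0, abs_tol=tol):
        raise ValueError("genotype frequencies must sum to 1")
    p = f_AA + 0.5 * f_Aa   # each heterozygote carries one copy of A
    q = f_aa + 0.5 * f_Aa   # and one copy of a
    allele_counts = None
    if n is not None:
        allele_counts = (2 * n * p, 2 * n * q)
    return p, q, allele_counts

p, q, counts = allele_frequencies_from_genotype_freqs(0.30, 0.50, 0.20, n=100)
print(p, q, counts)
```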

Important limitation: phenotype counts are not enough (without assumptions)

If you only know phenotype counts (for example, “dominant phenotype” vs “recessive phenotype”), the genotype split among dominant individuals (\(AA\) vs \(Aa\)) is unknown. Therefore, allele frequencies \(p\) and \(q\) are generally not identifiable from phenotype counts alone unless you make extra assumptions (commonly Hardy–Weinberg equilibrium).
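
To show what such an assumption buys you, the sketch below estimates p and q from phenotype counts under Hardy–Weinberg equilibrium, where the recessive phenotype frequency equals \(q^2\). This is an estimate that is only as good as the equilibrium assumption, not a direct count like the genotype-based formulas. The function name and the example counts are hypothetical.

```python
import math

def hwe_allele_frequencies(n_dominant, n_recessive):
    """Estimate (p, q) from phenotype counts.

    ASSUMPTION: the population is in Hardy-Weinberg equilibrium, so the
    recessive phenotype frequency equals q**2. Without that assumption
    the result is not meaningful.
    """
    if n_dominant < 0 or n_recessive < 0:
        raise ValueError("phenotype counts must be non-negative")
    n = n_dominant + n_recessive
    if n == 0:
        raise ValueError("at least one individual is required")
    q = math.sqrt(n_recessive / n)   # q^2 = freq(recessive phenotype)
    p = 1.0 - q
    return p, q

# Illustrative sample: 84 dominant and 16 recessive phenotypes.
p, q = hwe_allele_frequencies(84, 16)
print(f"p = {p:.3f}, q = {q:.3f}")
```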

Frequently Asked Questions

How do you calculate allele frequency from genotype counts?

First compute N = AA + Aa + aa. Then p = (2*AA + Aa)/(2*N) and q = (2*aa + Aa)/(2*N), where 2*N is the total number of alleles in the sample.

How do I compute p and q from genotype frequencies instead of counts?

If you have fAA, fAa, and faa with fAA + fAa + faa = 1, then p = fAA + 0.5*fAa and q = faa + 0.5*fAa.

Can I solve allele frequencies if one genotype count is missing?

Yes, if you also provide the population size N. The missing genotype count can be inferred by subtracting the known counts from N before computing p and q.
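
The subtraction step can be sketched as follows. The function name `fill_missing_genotype` and the use of `None` to mark the missing count are conventions chosen for this example, not the calculator's interface.

```python
def fill_missing_genotype(n_AA, n_Aa, n_aa, n_total):
    """Infer the single missing genotype count (passed as None) from N."""
    counts = {"AA": n_AA, "Aa": n_Aa, "aa": n_aa}
    missing = [name for name, value in counts.items() if value is None]
    if len(missing) != 1:
        raise ValueError("exactly one genotype count must be missing")
    known = sum(value for value in counts.values() if value is not None)
    inferred = n_total - known
    if inferred < 0:
        raise ValueError("known counts exceed the population size N")
    counts[missing[0]] = inferred
    return counts["AA"], counts["Aa"], counts["aa"]

# Illustrative case: Aa is unknown, N = 100.
n_AA, n_Aa, n_aa = fill_missing_genotype(30, None, 20, 100)
p = (2 * n_AA + n_Aa) / (2 * 100)
q = (2 * n_aa + n_Aa) / (2 * 100)
print(n_Aa, p, q)
```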

Why are phenotype counts not enough to determine allele frequencies?

With only dominant vs recessive phenotype counts, the split between AA and Aa among dominants is unknown. Without extra assumptions (such as Hardy–Weinberg equilibrium), p and q are not identifiable from phenotype counts alone.
