Genetic Drift Simulation

Biology • Population Genetics

Written by STEM Calculators Team. Published January 10, 2026. Updated February 24, 2026.


Simulation inputs

Population size N (diploid)

Each generation samples \(2N\) alleles.

Initial allele frequency p₀ (A)

q₀ = 1 − p₀.

Generations

Maximum 500, to keep the charts responsive.

Replicate runs

Try 10–200 replicates to see variability.

Random seed (optional)

Same seed → same simulated trajectories.

Stop at fixation

Stop each run once p reaches 0 or 1; the frequency then stays fixed.

Speeds simulation and makes fixation/loss obvious.

Genetic drift as random sampling: \[k \sim \text{Binomial}(2N,\,p),\] \[ p'=\frac{k}{2N}\]

Even with the same starting p₀, replicate runs diverge purely by chance.

Graph options

Spaghetti plot

Show mean p(t) line

Hover the plot for generation-specific values; zoom and pan are enabled.

Histogram bins (final generation)

Hover bars for counts and percentages.

Highlight run (optional)

If blank, the plot highlights the nearest run under your cursor.

Line density

Useful when the number of replicates R is large and the lines overlap heavily.

Paste or upload CSV (settings)

Paste one row (headers allowed). Supported order: \[ N,\ p0,\ generations,\ replicates,\ seed,\ stopFix \]

stopFix: 1 = yes, 0 = no. Delimiters: comma, semicolon, or tab.
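As a rough illustration of the settings row described above, the following Python sketch parses one pasted row into named values. The function name `parse_settings`, the choice to treat the last non-empty line as the data row (so that a header line may precede it), and the handling of an empty seed are assumptions made for this example, not the calculator's actual parser.

```python
import re

# Field order as documented: N, p0, generations, replicates, seed, stopFix
FIELDS = ("N", "p0", "generations", "replicates", "seed", "stopFix")

def parse_settings(text):
    """Parse one settings row (hypothetical helper, not the calculator's code)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("no settings row found")
    # Assumption: if a header line is present, the data row is the last line.
    cells = [c.strip() for c in re.split(r"[,;\t]", lines[-1])]
    if len(cells) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(cells)}")
    n, p0, gens, reps, seed, stop = cells
    return {
        "N": int(n),
        "p0": float(p0),
        "generations": int(gens),
        "replicates": int(reps),
        "seed": int(seed) if seed else None,  # the seed is optional
        "stopFix": stop == "1",               # 1 = yes, 0 = no
    }

print(parse_settings("N,p0,generations,replicates,seed,stopFix\n100,0.5,200,50,42,1"))
```

The same function accepts semicolon- or tab-delimited rows, for example `parse_settings("50;0.2;100;10;;0")`, where the empty fifth field leaves the seed unset.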


Genetic drift simulation (random sampling in finite populations)

Genetic drift is the change in allele frequencies caused by random sampling in a finite population. Even if an allele has no selective advantage, its frequency can rise or fall purely by chance from one generation to the next. Drift is strongest when population size is small and can lead to fixation (allele frequency becomes 1) or loss (allele frequency becomes 0).

Model setup

Consider one locus with two alleles, \(A\) and \(a\). Let \(p_t\) be the frequency of allele \(A\) in generation \(t\), and \(q_t\) be the frequency of allele \(a\). By definition:

\[ \begin{aligned} p_t + q_t &= 1 \end{aligned} \]

The population consists of \(N\) diploid individuals, so there are \(2N\) allele copies at this locus each generation. The key idea is that the next generation’s alleles are a random sample, drawn with replacement, of the current generation’s alleles.

Drift as binomial sampling

If the current allele frequency is \(p_t\), then the number of \(A\) alleles in the next generation, denoted \(k_t\), is modeled as a binomial random variable:

\[ \begin{aligned} k_t &\sim \text{Binomial}(2N,\ p_t) \end{aligned} \]

The updated allele frequency is then:

\[ \begin{aligned} p_{t+1} &= \frac{k_t}{2N}, \qquad q_{t+1}=1-p_{t+1} \end{aligned} \]

This calculator repeats this update for the number of generations you choose. Because each update is a random draw, replicate runs generally follow different trajectories, even with the same starting value \(p_0\).
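The update rule above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the calculator's own code; the function names `next_frequency` and `simulate_run` are made up for the example, and the binomial draw is implemented as \(2N\) independent Bernoulli trials.

```python
import random

def next_frequency(p, n_diploid, rng):
    """One generation of drift: draw 2N allele copies, each A with probability p."""
    copies = 2 * n_diploid
    # Summing 2N Bernoulli(p) trials gives k ~ Binomial(2N, p).
    k = sum(1 for _ in range(copies) if rng.random() < p)
    return k / copies

def simulate_run(p0, n_diploid, generations, rng):
    """Return the trajectory [p_0, p_1, ..., p_G] for one replicate."""
    trajectory = [p0]
    for _ in range(generations):
        trajectory.append(next_frequency(trajectory[-1], n_diploid, rng))
    return trajectory

rng = random.Random(1)  # arbitrary seed, chosen only for the example
run = simulate_run(p0=0.5, n_diploid=20, generations=50, rng=rng)
print(run[:5])
```

Calling `simulate_run` repeatedly with the same `rng` object yields independent replicates, which is one way to produce the bundle of trajectories shown in a spaghetti plot.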

Expected behavior and sampling variance

Drift does not have a directional “push” like selection. In the binomial sampling model:

\[ \begin{aligned} \mathbb{E}[p_{t+1}\mid p_t] &= p_t \end{aligned} \]

So the expected allele frequency is unchanged from one generation to the next, but sampling introduces variance around it:

\[ \begin{aligned} \mathrm{Var}(p_{t+1}\mid p_t) &= \frac{p_t(1-p_t)}{2N} \end{aligned} \]

This formula explains two important patterns you can observe in the graphs:

• Drift is stronger when \(N\) is smaller, because the variance is inversely proportional to \(2N\).
• For a given \(N\), the per-generation variance is largest at \(p_t=0.5\) and falls to zero as \(p_t\) approaches 0 or 1.
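To make these two patterns concrete, the short Python sketch below evaluates the variance formula for a few population sizes and frequencies. The helper name `sampling_variance` and the chosen values of \(N\) and \(p_t\) are illustrative only.

```python
def sampling_variance(p, n_diploid):
    """Var(p_{t+1} | p_t = p) under binomial sampling of 2N allele copies."""
    return p * (1 - p) / (2 * n_diploid)

# Rows: population size N; columns: current frequency p_t.
for n in (10, 100, 1000):
    row = [sampling_variance(p, n) for p in (0.1, 0.5, 0.9)]
    print(n, [f"{v:.6f}" for v in row])
```

Reading across a row shows the peak at \(p_t=0.5\); reading down a column shows the variance shrinking tenfold each time \(N\) grows tenfold.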

Fixation and loss

If a run reaches \(p_t=1\), allele \(A\) is fixed. If it reaches \(p_t=0\), allele \(A\) is lost. These are absorbing boundaries in the model: once a run reaches 0 or 1, it stays there.

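From the replicate simulations, the calculator estimates the probability of fixation or loss by generation \(G\) as the fraction of runs that have reached each boundary: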

\[ \begin{aligned} \widehat{P}(\text{fix by generation }G) &= \frac{\#\{r:\ p_G^{(r)}=1\}}{R} \\ \widehat{P}(\text{loss by generation }G) &= \frac{\#\{r:\ p_G^{(r)}=0\}}{R} \end{aligned} \]

Here \(R\) is the number of replicates, and \(p_G^{(r)}\) is the allele frequency at generation \(G\) in replicate \(r\). Runs whose frequency is neither 0 nor 1 at generation \(G\) are still segregating.
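The estimators above amount to counting runs in each state. A minimal Python sketch follows, assuming a pure-drift simulation that stops early at the absorbing boundaries; `final_frequency` and `summarize` are hypothetical names, and the parameter values are arbitrary.

```python
import random

def final_frequency(p0, n_diploid, generations, rng):
    """Simulate one replicate and return p_G, stopping early at 0 or 1."""
    copies = 2 * n_diploid
    p = p0
    for _ in range(generations):
        if p == 0.0 or p == 1.0:  # absorbing boundaries: nothing can change
            break
        k = sum(1 for _ in range(copies) if rng.random() < p)
        p = k / copies
    return p

def summarize(finals):
    """Fractions of replicates that are fixed, lost, or still segregating."""
    r = len(finals)
    fixed = sum(1 for p in finals if p == 1.0)
    lost = sum(1 for p in finals if p == 0.0)
    return {"fixed": fixed / r, "lost": lost / r,
            "segregating": (r - fixed - lost) / r}

rng = random.Random(42)  # arbitrary seed for the example
finals = [final_frequency(0.5, 10, 200, rng) for _ in range(100)]
summary = summarize(finals)
print(summary)
```

The exact comparisons with 0.0 and 1.0 are safe here because the frequency is always an integer count divided by \(2N\), so the boundary values are represented exactly.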

What the visualizations show

Spaghetti plot: Each line is one replicate trajectory \(p(t)\). With the same \(p_0\), lines spread out due to drift. Some reach fixation or loss earlier than others, especially for small \(N\). Hovering the plot reveals generation-specific values and the highlighted replicate.

Final-generation histogram: This shows the distribution of \(p_G\) across replicates. For strong drift (small \(N\) and/or many generations), the distribution often piles up near 0 and 1 as more runs fix or lose the allele.
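One way to build such a histogram is to divide \([0, 1]\) into equal-width bins and count the final frequencies in each. The sketch below is an assumption about the binning rule (the calculator's exact rule is not documented here), and the sample values are invented for illustration.

```python
def histogram_counts(values, bins):
    """Count values in equal-width bins on [0, 1]; p = 1 goes in the last bin."""
    counts = [0] * bins
    for v in values:
        index = min(int(v * bins), bins - 1)
        counts[index] += 1
    return counts

# Made-up final frequencies, with several runs already at 0 or 1.
finals = [0.0, 0.05, 0.5, 0.95, 1.0, 1.0]
counts = histogram_counts(finals, bins=4)
percentages = [100 * c / len(finals) for c in counts]
print(counts)
print(percentages)
```

With strong drift, most of the counts end up in the first and last bins, which is the pile-up near 0 and 1 described above.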

Interpreting the replicate summary

The calculator reports the mean and standard deviation of final allele frequencies:

\[ \begin{aligned} \overline{p}_G &= \frac{1}{R}\sum_{r=1}^{R} p_G^{(r)} \\ s_G &= \sqrt{\frac{1}{R-1}\sum_{r=1}^{R}\left(p_G^{(r)}-\overline{p}_G\right)^2} \end{aligned} \]

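The mean summarizes the average outcome across replicates, while the standard deviation quantifies how spread out the outcomes are. Because drift has no directional bias, \(\overline{p}_G\) should stay close to \(p_0\), up to sampling error. A large \(s_G\) indicates high variability among runs. The formula for \(s_G\) requires at least two replicates.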
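Both summary statistics are available in Python's standard `statistics` module, where `stdev` uses the same \(R-1\) denominator as the formula above. The final frequencies in this sketch are invented values for illustration, not simulation output.

```python
import math
import statistics

# Made-up final allele frequencies from R = 5 replicates.
finals = [0.0, 1.0, 0.25, 0.75, 0.5]

mean_p = statistics.mean(finals)
sd_p = statistics.stdev(finals)  # sample standard deviation (divides by R - 1)

# The same quantity written out term by term, following the formula.
r = len(finals)
sd_manual = math.sqrt(sum((p - mean_p) ** 2 for p in finals) / (r - 1))

print(mean_p, sd_p, sd_manual)
```

`statistics.stdev` raises an error when given fewer than two values, which mirrors the requirement that \(R \ge 2\) for \(s_G\) to be defined.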

Notes and assumptions

This is a basic drift model with random sampling only. It assumes no selection, no mutation, no migration, and a constant population size \(N\). Because the model is stochastic, fixing the random seed makes the simulation reproducible: the same inputs and seed generate the same trajectories.
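The effect of a seed can be sketched with Python's `random.Random`, which produces the same sequence of draws whenever it is created with the same seed. This is only an illustration: the calculator's random number generator is not specified here, so a seed used in this sketch will not reproduce the calculator's trajectories. The function name and parameter values are assumptions.

```python
import random

def simulate_run(p0, n_diploid, generations, seed=None):
    """One drift trajectory; with seed=None the generator is seeded unpredictably."""
    rng = random.Random(seed)
    copies = 2 * n_diploid
    p, trajectory = p0, [p0]
    for _ in range(generations):
        k = sum(1 for _ in range(copies) if rng.random() < p)
        p = k / copies
        trajectory.append(p)
    return trajectory

first = simulate_run(0.3, 25, 30, seed=123)
second = simulate_run(0.3, 25, 30, seed=123)
print(first == second)  # True: same inputs and same seed
```

Changing either the seed or any input changes the sequence of draws, so the resulting trajectory will generally differ.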

Frequently Asked Questions

What does the genetic drift simulation calculator do?

It simulates random allele-frequency change in a finite diploid population with no selection, mutation, or migration. The tool runs replicate trajectories across generations and summarizes fixation, loss, and the distribution of final p values.

What model is used for genetic drift in this simulation?

Each generation samples 2N allele copies with k ~ Binomial(2N, p_t), then updates p_{t+1} = k/(2N). Random sampling alone causes different replicate trajectories even with the same starting p0.

Why do some runs reach fixation or loss?

In a finite population, sampling noise can push p toward 0 or 1 over time. Once p hits 0 (loss) or 1 (fixation), those boundaries are absorbing in this model, meaning the run stays there.

How does population size N affect drift strength?

Smaller N produces larger sampling variance per generation, so trajectories spread out faster and fixation/loss tends to happen sooner. In the binomial model, Var(p_{t+1} | p_t) = p_t(1 - p_t)/(2N).

How can I reproduce the same simulation results?

Enter the same inputs and provide the same random seed. Using a seed makes the random number sequence repeatable, so the simulated trajectories match across runs.
