Linearity for the Hypergeometric Distribution

How does linearity work for the hypergeometric distribution when computing the mean, and what changes for the variance?

Accepted answer

Linearity enters the hypergeometric distribution most directly when the count of successes is written as a sum of indicator random variables. The mean then follows immediately from linearity of expectation, even though the draws are dependent.

1) Setup: the hypergeometric model

A finite population has size \(N\). Exactly \(K\) items are labeled “success” and \(N-K\) are “failure”. A sample of size \(n\) is drawn without replacement.

Let \(X\) be the number of successes in the sample. Then \(X\) follows a hypergeometric distribution with parameters \((N, K, n)\).

2) The key idea: linearity of expectation

Define indicator variables for the draws:

\[ I_i = \begin{cases} 1, & \text{if the \(i\)-th draw is a success} \\ 0, & \text{otherwise} \end{cases} \quad\text{for } i=1,2,\dots,n. \]

The total number of successes is the sum of these indicators:

\[ X = I_1 + I_2 + \cdots + I_n. \]

Linearity of expectation states that for any random variables \(Y_1,\dots,Y_n\), \[ \mathbb{E}\!\left[\sum_{i=1}^{n} Y_i\right] = \sum_{i=1}^{n} \mathbb{E}[Y_i]. \] Independence is not required.

3) Applying linearity to find the hypergeometric mean

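First compute the expectation of one indicator. By symmetry, every item in the population is equally likely to occupy position \(i\) of the sample, so each draw (viewed marginally, without conditioning on the other draws) is a success with probability \(p = K/N\). Hence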

\[ \mathbb{E}[I_i] = 1 \cdot \mathbb{P}(I_i=1) + 0 \cdot \mathbb{P}(I_i=0) = \mathbb{P}(I_i=1) = \frac{K}{N}. \]
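The marginal probability can be checked by brute force on a small population. The following is a minimal sketch, not part of the original answer: the function name `marginal_success_probs` and the values \(N=6\), \(K=2\), \(n=3\) are illustrative assumptions. It treats the items as distinguishable and enumerates every ordered sample, each of which is equally likely.

```python
from fractions import Fraction
from itertools import permutations


def marginal_success_probs(N, K, n):
    """Exact P(I_i = 1) for each draw position i = 1..n, by enumeration.

    Hypothetical helper for illustration; only practical for small N.
    """
    is_success = [1] * K + [0] * (N - K)  # 1 = success, 0 = failure
    counts = [0] * n
    total = 0
    # Every ordered sample of n distinct items is equally likely.
    for ordered_sample in permutations(range(N), n):
        total += 1
        for position, item in enumerate(ordered_sample):
            counts[position] += is_success[item]
    return [Fraction(c, total) for c in counts]


probs = marginal_success_probs(N=6, K=2, n=3)
print(probs)  # every entry equals K/N = 1/3
```

Every position has the same marginal probability, even though later draws depend on earlier ones once those are observed.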

Then linearity gives

\[ \mathbb{E}[X] = \mathbb{E}\!\left[\sum_{i=1}^{n} I_i\right] = \sum_{i=1}^{n} \mathbb{E}[I_i] = \sum_{i=1}^{n} \frac{K}{N} = n \cdot \frac{K}{N}. \]
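As a cross-check that does not use linearity at all, the mean can also be computed directly from the probability mass function \(\mathbb{P}(X=k)=\binom{K}{k}\binom{N-K}{n-k}/\binom{N}{n}\). The sketch below assumes that standard formula; the helper names `hypergeom_pmf` and `mean_from_pmf` and the parameter values are illustrative, not taken from the answer's derivation.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(N, K, n):
    """Return {k: P(X = k)} as exact fractions for X ~ Hypergeometric(N, K, n)."""
    lo = max(0, n - (N - K))  # cannot draw more failures than exist
    hi = min(n, K)            # cannot draw more successes than exist
    return {
        k: Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))
        for k in range(lo, hi + 1)
    }


def mean_from_pmf(N, K, n):
    """E[X] computed as the sum of k * P(X = k), without using linearity."""
    return sum(k * p for k, p in hypergeom_pmf(N, K, n).items())


N, K, n = 20, 7, 5  # arbitrary illustrative values
direct_mean = mean_from_pmf(N, K, n)
print(direct_mean, Fraction(n * K, N))  # both are 7/4
```

The direct sum agrees with \(n \cdot K/N\), but the indicator argument reaches the same value without touching any binomial coefficients.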

4) Where dependence matters: variance requires covariances

Although linearity makes the mean straightforward, the variance must account for dependence between draws. Using \(\mathrm{Var}\!\left(\sum Y_i\right)=\sum \mathrm{Var}(Y_i)+2\sum_{i<j}\mathrm{Cov}(Y_i,Y_j)\), one obtains

\[ \mathrm{Var}(X) = \sum_{i=1}^{n} \mathrm{Var}(I_i) + 2 \sum_{1 \le i < j \le n} \mathrm{Cov}(I_i, I_j). \]

| Component | Value in the hypergeometric setting | Reason |
| --- | --- | --- |
| \(\mathrm{Var}(I_i)\) | \(\dfrac{K}{N}\left(1-\dfrac{K}{N}\right)\) | \(I_i\) is Bernoulli with \(p=K/N\) marginally. |
| \(\mathrm{Cov}(I_i,I_j)\) for \(i\neq j\) | \(-\dfrac{K(N-K)}{N^2(N-1)}\) | Sampling without replacement makes a success slightly less likely after a success (negative dependence). |

A compact derivation of the covariance uses:

\[ \mathbb{P}(I_i=1, I_j=1) = \frac{K}{N} \cdot \frac{K-1}{N-1}, \quad \mathbb{E}[I_i] = \frac{K}{N}. \]
\[ \mathrm{Cov}(I_i,I_j) = \mathbb{E}[I_i I_j] - \mathbb{E}[I_i]\mathbb{E}[I_j] = \frac{K}{N} \cdot \frac{K-1}{N-1} - \left(\frac{K}{N}\right)^2 = -\frac{K(N-K)}{N^2(N-1)}. \]
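The covariance formula can be verified exactly by enumerating ordered pairs of distinct items. This is a sketch under the assumption that the pair of draws can be modeled as the first two draws (by symmetry, any two positions \(i \neq j\) have the same joint distribution); the function name `indicator_covariance` and the parameter values are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def indicator_covariance(N, K):
    """Exact Cov(I_i, I_j) for i != j, by enumerating ordered pairs of items."""
    is_success = [1] * K + [0] * (N - K)
    pairs = list(permutations(range(N), 2))  # all equally likely
    e_product = Fraction(sum(is_success[a] * is_success[b] for a, b in pairs), len(pairs))
    e_first = Fraction(sum(is_success[a] for a, _ in pairs), len(pairs))
    e_second = Fraction(sum(is_success[b] for _, b in pairs), len(pairs))
    return e_product - e_first * e_second


N, K = 20, 7  # arbitrary illustrative values
cov = indicator_covariance(N, K)
closed_form = Fraction(-K * (N - K), N**2 * (N - 1))
print(cov, closed_form)  # -91/7600 -91/7600
```

The negative sign confirms the negative dependence: observing a success on one draw lowers the chance of a success on another.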

Substituting into the variance formula (and simplifying) yields the standard hypergeometric variance:

\[ \mathrm{Var}(X) = n \cdot \frac{K}{N}\left(1-\frac{K}{N}\right)\cdot \frac{N-n}{N-1}. \]
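The simplification behind this formula can be spelled out step by step. Writing \(p=K/N\), so that \(K(N-K)/N^2 = p(1-p)\), and noting that there are \(\binom{n}{2} = n(n-1)/2\) pairs with \(i<j\):

\[ \mathrm{Var}(X) = n\,p(1-p) + 2\binom{n}{2}\left(-\frac{K(N-K)}{N^2(N-1)}\right) = n\,p(1-p) - n(n-1)\cdot\frac{p(1-p)}{N-1} = n\,p(1-p)\left(1-\frac{n-1}{N-1}\right) = n\,p(1-p)\cdot\frac{N-n}{N-1}. \]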

5) Numerical example

Suppose \(N=20\), \(K=7\), and \(n=5\). Then \(p=K/N=7/20=0.35\).

\[ \mathbb{E}[X] = n \cdot \frac{K}{N} = 5 \cdot 0.35 = 1.75. \]
\[ \mathrm{Var}(X) = 5 \cdot 0.35 \cdot (1-0.35) \cdot \frac{20-5}{20-1} = 5 \cdot 0.35 \cdot 0.65 \cdot \frac{15}{19} \approx 0.898. \]
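The same numbers can be reproduced with exact rational arithmetic by assembling the variance from the indicator pieces rather than from the closed form. This is a minimal sketch; the function name `hypergeom_mean_var` is an illustrative assumption, and it assumes \(N \ge 2\) so the covariance denominator is nonzero.

```python
from fractions import Fraction


def hypergeom_mean_var(N, K, n):
    """Mean and variance of X built from the indicator decomposition (N >= 2)."""
    p = Fraction(K, N)
    var_indicator = p * (1 - p)                           # Var(I_i)
    cov_pair = Fraction(-K * (N - K), N**2 * (N - 1))     # Cov(I_i, I_j), i != j
    mean = n * p
    # n variance terms plus 2 * C(n, 2) = n(n-1) covariance terms
    var = n * var_indicator + n * (n - 1) * cov_pair
    return mean, var


mean, var = hypergeom_mean_var(20, 7, 5)
print(mean, float(mean))           # 7/4 1.75
print(var, round(float(var), 3))   # 273/304 0.898
```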

6) Visualization: “sum of indicators” view that explains linearity

[Figure: bar chart with one bar per draw (draws 1 to 5) on a vertical axis from 0 to 1. Each bar has height \( \mathbb{E}[I_i] = K/N = 0.35 \); the sum of the bars is \( \mathbb{E}[X] = \sum \mathbb{E}[I_i] = n \cdot (K/N) \).]
The figure uses an example with \(N=20\), \(K=7\), \(n=5\). Each draw contributes the same expected amount \(K/N\), so linearity adds the contributions to obtain \(n \cdot (K/N)\).

7) Summary of “linearity” for the hypergeometric distribution

  • Expressing \(X\) as \(X=\sum_{i=1}^{n} I_i\) makes the mean immediate: \[ \mathbb{E}[X]=n \cdot \frac{K}{N}. \]
  • Independence is unnecessary for the mean because linearity of expectation always holds.
  • Dependence matters for the variance because covariance terms appear, producing the finite population correction: \[ \mathrm{Var}(X)=n \cdot \frac{K}{N}\left(1-\frac{K}{N}\right)\cdot \frac{N-n}{N-1}. \]
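
Both summary formulas can also be checked empirically with a small Monte Carlo simulation. The sketch below is illustrative rather than part of the derivation: the function name `simulate_hypergeometric`, the seed, and the number of trials are assumptions, and the sample statistics only approximate the exact values.

```python
import random
from statistics import fmean, pvariance


def simulate_hypergeometric(N, K, n, trials, seed=0):
    """Draw `trials` samples of size n without replacement; return success counts."""
    rng = random.Random(seed)              # local generator for reproducibility
    population = [1] * K + [0] * (N - K)   # 1 = success, 0 = failure
    # rng.sample draws without replacement, matching the hypergeometric model.
    return [sum(rng.sample(population, n)) for _ in range(trials)]


N, K, n = 20, 7, 5
samples = simulate_hypergeometric(N, K, n, trials=50_000)

exact_mean = n * K / N
exact_var = n * (K / N) * (1 - K / N) * (N - n) / (N - 1)

print(f"simulated mean {fmean(samples):.3f} vs exact {exact_mean:.3f}")
print(f"simulated var  {pvariance(samples):.3f} vs exact {exact_var:.3f}")
```

With tens of thousands of trials, the simulated mean and variance should land close to \(1.75\) and \(0.898\); replacing the variance formula with the binomial value \(n p (1-p) \approx 1.138\) would visibly miss, which is the finite population correction at work.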