How does linearity work for hypergeometric distribution when computing the mean (and what changes for the variance)?

Linearity of expectation gives \( \mathbb{E}[X]=n \cdot (K/N) \) for a hypergeometric count \(X\) without requiring independence, while the variance needs covariances because draws are dependent.

Linearity for the Hypergeometric Distribution


The most direct way to see how linearity works for the hypergeometric distribution is to express the hypergeometric count as a sum of indicator random variables. Linearity of expectation then gives the mean immediately, even though the draws are dependent.

1) Setup: the hypergeometric model

A finite population has size \(N\). Exactly \(K\) items are labeled “success” and \(N-K\) are “failure”. A sample of size \(n\) is drawn without replacement.

Let \(X\) be the number of successes in the sample. Then \(X\) follows a hypergeometric distribution with parameters \((N, K, n)\).
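For reference, the distribution of \(X\) can be computed directly by counting samples. Below is a minimal Python sketch. It assumes all \(\binom{N}{n}\) unordered samples are equally likely. The function name `hypergeom_pmf` is hypothetical and does not come from any library:

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> Fraction:
    """Exact P(X = k) for X ~ Hypergeometric(N, K, n): choose k of the K successes
    and n - k of the N - K failures, out of C(N, n) equally likely samples."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return Fraction(0)  # k lies outside the support of X
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

# Example parameters N = 20, K = 7, n = 5; the probabilities sum to exactly 1.
pmf = [hypergeom_pmf(k, 20, 7, 5) for k in range(6)]
print(sum(pmf))  # 1
```

Exact `Fraction` arithmetic is used so that identities can be checked with `==` rather than with floating-point tolerances.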

2) The key idea: linearity of expectation

Define indicator variables for the draws:

\[ I_i = \begin{cases} 1, & \text{if draw } i \text{ is a success} \\ 0, & \text{otherwise} \end{cases} \quad\text{for } i=1,2,\dots,n. \]

The total number of successes is the sum of these indicators:

\[ X = I_1 + I_2 + \cdots + I_n. \]

Linearity of expectation states that for any random variables \(Y_1,\dots,Y_n\), \[ \mathbb{E}\!\left[\sum_{i=1}^{n} Y_i\right] = \sum_{i=1}^{n} \mathbb{E}[Y_i]. \] Independence is not required.
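A small exact check makes this concrete. The sketch below uses a hypothetical urn with \(N=4\) items (\(K=2\) successes) and \(n=2\) draws. It enumerates every equally likely ordered sample and confirms that \(\mathbb{E}[I_1+I_2]=\mathbb{E}[I_1]+\mathbb{E}[I_2]\), even though \(\mathbb{E}[I_1 I_2]\neq\mathbb{E}[I_1]\,\mathbb{E}[I_2]\) shows that \(I_1\) and \(I_2\) are dependent:

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical small urn: N = 4 items, K = 2 successes (1) and 2 failures (0).
urn = [1, 1, 0, 0]
n = 2

# Every ordered sample of n distinct positions is equally likely.
samples = list(permutations(range(len(urn)), n))
w = Fraction(1, len(samples))

def expect(f):
    """Exact expectation of f(sample) over all equally likely ordered samples."""
    return sum(w * f(s) for s in samples)

E_I1 = expect(lambda s: urn[s[0]])
E_I2 = expect(lambda s: urn[s[1]])
E_sum = expect(lambda s: urn[s[0]] + urn[s[1]])
E_prod = expect(lambda s: urn[s[0]] * urn[s[1]])

print(E_sum == E_I1 + E_I2)   # True: linearity holds
print(E_prod == E_I1 * E_I2)  # False: I_1 and I_2 are not independent
```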

3) Applying linearity to find the hypergeometric mean

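First compute the expectation of one indicator. Viewed marginally (without conditioning on the other draws), the \(i\)-th draw is equally likely to be any of the \(N\) items by symmetry. It is therefore a success with probability \(p = K/N\) for every \(i\), so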

\[ \mathbb{E}[I_i] = 1 \cdot \mathbb{P}(I_i=1) + 0 \cdot \mathbb{P}(I_i=0) = \mathbb{P}(I_i=1) = \frac{K}{N}. \]
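The symmetry claim can be checked by brute force on a small population. The sketch below labels items \(0,\dots,K-1\) as successes, which is an arbitrary choice made for the example. It enumerates every ordered sample and computes \(\mathbb{P}(I_i=1)\) exactly for each draw position. The helper name is hypothetical:

```python
from fractions import Fraction
from itertools import permutations

def marginal_success_probs(N: int, K: int, n: int) -> list[Fraction]:
    """Exact P(I_i = 1) for each draw position i = 1..n, found by enumerating
    every ordered sample of n distinct items; items 0..K-1 are successes."""
    samples = list(permutations(range(N), n))
    probs = []
    for i in range(n):
        hits = sum(1 for s in samples if s[i] < K)  # item index < K means success
        probs.append(Fraction(hits, len(samples)))
    return probs

print(marginal_success_probs(6, 2, 4))  # every entry equals K/N = 1/3
```

Enumeration grows quickly with \(N\) and \(n\), so this is only meant as a sanity check on small cases.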

Then linearity gives

\[ \mathbb{E}[X] = \mathbb{E}\!\left[\sum_{i=1}^{n} I_i\right] = \sum_{i=1}^{n} \mathbb{E}[I_i] = \sum_{i=1}^{n} \frac{K}{N} = n \cdot \frac{K}{N}. \]
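As a cross-check, the same mean can be obtained the long way, as \(\sum_k k\,\mathbb{P}(X=k)\) from the hypergeometric pmf, without using indicators at all. The sketch below does this exactly for a few arbitrarily chosen parameter sets. The function name is hypothetical:

```python
from fractions import Fraction
from math import comb

def hypergeom_mean_from_pmf(N: int, K: int, n: int) -> Fraction:
    """E[X] = sum_k k * P(X = k), computed exactly from the hypergeometric pmf."""
    total = comb(N, n)
    lo, hi = max(0, n - (N - K)), min(n, K)  # support of X
    return sum(Fraction(k * comb(K, k) * comb(N - K, n - k), total)
               for k in range(lo, hi + 1))

for N, K, n in [(20, 7, 5), (10, 4, 6), (50, 1, 50)]:
    assert hypergeom_mean_from_pmf(N, K, n) == Fraction(n * K, N)
print("pmf-based mean matches n*K/N")
```

The indicator argument gets the same answer in one line, which is the practical payoff of linearity.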

4) Where dependence matters: variance requires covariances

Although linearity makes the mean straightforward, the variance must account for dependence between draws. Using \(\mathrm{Var}\!\left(\sum Y_i\right)=\sum \mathrm{Var}(Y_i)+2\sum_{i<j}\mathrm{Cov}(Y_i,Y_j)\), one obtains

\[ \mathrm{Var}(X) = \sum_{i=1}^{n} \mathrm{Var}(I_i) + 2 \sum_{1 \le i < j \le n} \mathrm{Cov}(I_i, I_j). \]

Component	Value in the hypergeometric setting	Reason
\(\mathrm{Var}(I_i)\)	\(\dfrac{K}{N}\left(1-\dfrac{K}{N}\right)\)	\(I_i\) is Bernoulli with \(p=K/N\) marginally.
\(\mathrm{Cov}(I_i,I_j)\) for \(i\neq j\)	\(-\dfrac{K(N-K)}{N^2(N-1)}\)	Sampling without replacement makes a success slightly less likely after another success (negative dependence).

A compact derivation of the covariance uses:

\[ \mathbb{P}(I_i=1, I_j=1) = \frac{K}{N} \cdot \frac{K-1}{N-1}, \quad \mathbb{E}[I_i] = \frac{K}{N}. \]

\[ \mathrm{Cov}(I_i,I_j) = \mathbb{E}[I_i I_j] - \mathbb{E}[I_i]\mathbb{E}[I_j] = \frac{K}{N} \cdot \frac{K-1}{N-1} - \left(\frac{K}{N}\right)^2 = -\frac{K(N-K)}{N^2(N-1)}. \]
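The covariance formula can also be confirmed by enumerating all ordered pairs of distinct items for the first two draws. Items \(0,\dots,K-1\) are treated as successes, which is an assumption of the sketch. By the same symmetry argument used for the mean, the value applies to any pair \(i\neq j\). The helper name is hypothetical:

```python
from fractions import Fraction
from itertools import permutations

def cov_first_two_draws(N: int, K: int) -> Fraction:
    """Exact Cov(I_1, I_2) by enumerating all ordered pairs of distinct items;
    items 0..K-1 are successes."""
    pairs = list(permutations(range(N), 2))
    m = len(pairs)  # N * (N - 1) equally likely ordered pairs
    e1 = Fraction(sum(a < K for a, _ in pairs), m)                # E[I_1]
    e2 = Fraction(sum(b < K for _, b in pairs), m)                # E[I_2]
    e12 = Fraction(sum(a < K and b < K for a, b in pairs), m)     # E[I_1 I_2]
    return e12 - e1 * e2

N, K = 20, 7
print(cov_first_two_draws(N, K) == Fraction(-K * (N - K), N**2 * (N - 1)))  # True
```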

Substituting into the variance formula (and simplifying) yields the standard hypergeometric variance:

\[ \mathrm{Var}(X) = n \cdot \frac{K}{N}\left(1-\frac{K}{N}\right)\cdot \frac{N-n}{N-1}. \]
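The simplification can be written out step by step. It uses the two table entries, the fact that there are \(n(n-1)/2\) pairs with \(i<j\), and the identity \(K(N-K)/N^2 = (K/N)(1-K/N)\):

\[ \begin{aligned} \mathrm{Var}(X) &= n \cdot \frac{K}{N}\left(1-\frac{K}{N}\right) + 2 \cdot \frac{n(n-1)}{2} \cdot \left(-\frac{K(N-K)}{N^2(N-1)}\right) \\ &= n \cdot \frac{K}{N}\left(1-\frac{K}{N}\right) - n(n-1) \cdot \frac{K}{N}\left(1-\frac{K}{N}\right) \cdot \frac{1}{N-1} \\ &= n \cdot \frac{K}{N}\left(1-\frac{K}{N}\right)\left(1-\frac{n-1}{N-1}\right) = n \cdot \frac{K}{N}\left(1-\frac{K}{N}\right)\cdot \frac{N-n}{N-1}. \end{aligned} \]

The last factor \((N-n)/(N-1)\) is the finite population correction. It equals 1 when \(n=1\) and 0 when \(n=N\).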

5) Numerical example

Suppose \(N=20\), \(K=7\), and \(n=5\). Then \(p=K/N=7/20=0.35\).

\[ \mathbb{E}[X] = n \cdot \frac{K}{N} = 5 \cdot 0.35 = 1.75. \]

\[ \mathrm{Var}(X) = 5 \cdot 0.35 \cdot (1-0.35) \cdot \frac{20-5}{20-1} = 5 \cdot 0.35 \cdot 0.65 \cdot \frac{15}{19} = \frac{273}{304} \approx 0.898. \]
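The arithmetic can be double-checked in exact rational arithmetic. The sketch below evaluates the closed forms for \(N=20\), \(K=7\), \(n=5\) and compares them with the mean and variance computed directly from the pmf \(\mathbb{P}(X=k)=\binom{K}{k}\binom{N-K}{n-k}/\binom{N}{n}\):

```python
from fractions import Fraction
from math import comb

N, K, n = 20, 7, 5

# Closed forms from the answer.
p = Fraction(K, N)
mean_formula = n * p
var_formula = n * p * (1 - p) * Fraction(N - n, N - 1)

# Direct computation from the pmf P(X = k) = C(K, k) C(N-K, n-k) / C(N, n).
support = range(max(0, n - (N - K)), min(n, K) + 1)
pmf = {k: Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n)) for k in support}
mean_pmf = sum(k * pk for k, pk in pmf.items())
var_pmf = sum(k * k * pk for k, pk in pmf.items()) - mean_pmf**2

print(f"E[X] = {mean_formula} = {float(mean_formula):.2f}")        # E[X] = 7/4 = 1.75
print(f"Var(X) = {var_formula} approx {float(var_formula):.3f}")  # Var(X) = 273/304 approx 0.898
print(mean_pmf == mean_formula, var_pmf == var_formula)           # True True
```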

6) Visualization: “sum of indicators” view that explains linearity

The figure uses an example with \(N=20\), \(K=7\), \(n=5\). Each draw contributes the same expected amount \(K/N\), so linearity adds the contributions to obtain \(n \cdot (K/N)\).
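The same "sum of indicators" picture can be reproduced by simulation. The following is a hedged sketch that treats items \(0,\dots,K-1\) as successes and draws without replacement using Python's `random.sample`, which returns the chosen items in selection order. The helper name and the seed are illustrative choices:

```python
import random

def simulate_indicator_means(N: int, K: int, n: int, trials: int, seed: int = 0) -> list[float]:
    """Estimate E[I_i] for each draw position i by repeatedly drawing n items
    without replacement; items 0..K-1 are successes."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(trials):
        draw = rng.sample(range(N), n)  # ordered sample without replacement
        for i, item in enumerate(draw):
            counts[i] += item < K       # adds 1 if the i-th draw is a success
    return [c / trials for c in counts]

means = simulate_indicator_means(20, 7, 5, trials=100_000)
print([round(m, 3) for m in means])  # each estimate should be close to K/N = 0.35
print(round(sum(means), 3))          # should be close to n*K/N = 1.75
```

Each position's estimate hovers near \(K/N\), and adding them gives roughly \(n \cdot K/N\), even though the individual draws within a trial are dependent.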

7) Summary of “linearity” for the hypergeometric distribution

Expressing \(X\) as \(X=\sum_{i=1}^{n} I_i\) makes the mean immediate: \[ \mathbb{E}[X]=n \cdot \frac{K}{N}. \]
Independence is unnecessary for the mean because linearity of expectation always holds.
Dependence matters for the variance because covariance terms appear, producing the finite population correction: \[ \mathrm{Var}(X)=n \cdot \frac{K}{N}\left(1-\frac{K}{N}\right)\cdot \frac{N-n}{N-1}. \]
