In statistics, how should an experiment and sampling design be constructed for a study that both estimates a population parameter and tests a treatment effect, and how is proportional stratified allocation computed for a target sample size?

A valid experiment and sampling design separates (i) sampling design—defining the population/frame and selecting units with a probability method—from (ii) experimental design—defining treatments, control, random assignment, blocking, and replication; proportional stratified allocation uses \(n_h=n\cdot \frac{N_h}{N}\).

Experiment and Sampling Design: Planning Data Collection for Estimation and Causality


Core distinction: sampling design vs experimental design

An experiment and sampling design has two logically separate parts:

Sampling design: how observational units are selected from a population (supports estimation and generalization).
Experimental design: how treatments are assigned to units (supports causal conclusions when random assignment is used).

Key rule: Random sampling protects external validity (representativeness), while random assignment protects internal validity (reduces confounding). One does not substitute for the other.

Concrete study scenario

A university wants to (1) estimate the mean weekly study time of all enrolled students and (2) test whether a new study-skills workshop increases study time compared with no workshop. The same project therefore requires both a sampling design (to select students) and an experimental design (to assign the workshop).

Step-by-step sampling design (for estimation)

Step 1: Define population, parameter, and sampling frame

Population: all currently enrolled students.
Parameter of interest: population mean weekly study time \(\mu\) (hours/week).
Sampling frame: an up-to-date enrollment roster with contact information.

Step 2: Choose a probability sampling method

A common choice is stratified random sampling (for example, by class year) to ensure each stratum is represented and to improve precision when study time differs systematically between strata and is relatively homogeneous within them. With strata sizes \(N_1,\dots,N_H\) and total \(N=\sum_{h=1}^H N_h\), a proportional allocation sets: \[ n_h = n \cdot \frac{N_h}{N}, \] where \(n\) is the planned total sample size. When \(n\cdot N_h/N\) is not an integer, round the allocations so that they still sum to \(n\).

Step 3: Worked proportional allocation example

Suppose the enrollment counts by class year are:

Stratum (class year)	Population size \(N_h\)
Year 1	5000
Year 2	4000
Year 3	3000
Total	12000

Let the target sample size be \(n=300\). Then: \[ n_1 = 300\cdot\frac{5000}{12000}=300\cdot\frac{5}{12}=125, \] \[ n_2 = 300\cdot\frac{4000}{12000}=300\cdot\frac{1}{3}=100, \] \[ n_3 = 300\cdot\frac{3000}{12000}=300\cdot\frac{1}{4}=75. \]

Stratum	Allocation formula	Sample size \(n_h\)
Year 1	\(300\cdot\frac{5000}{12000}\)	\(125\)
Year 2	\(300\cdot\frac{4000}{12000}\)	\(100\)
Year 3	\(300\cdot\frac{3000}{12000}\)	\(75\)
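The allocation above can be reproduced with a short script. This is a minimal sketch, not a standard library routine: the function name `proportional_allocation` and the largest-remainder rounding rule are assumptions added for illustration (the worked example happens to produce whole numbers, so no rounding is needed there).

```python
from fractions import Fraction

def proportional_allocation(strata_sizes, n):
    """Allocate a total sample size n across strata in proportion to N_h.

    Largest-remainder rounding (an illustrative choice) keeps the integer
    allocations summing exactly to n.
    """
    total = sum(strata_sizes.values())
    # Exact shares n * N_h / N, kept as fractions to avoid float error.
    exact = {h: Fraction(n * size, total) for h, size in strata_sizes.items()}
    # Round every share down first (shares are non-negative).
    alloc = {h: int(share) for h, share in exact.items()}
    leftover = n - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc

enrollment = {"Year 1": 5000, "Year 2": 4000, "Year 3": 3000}
allocation = proportional_allocation(enrollment, 300)
print(allocation)  # {'Year 1': 125, 'Year 2': 100, 'Year 3': 75}
```

With enrollment counts that do not divide evenly, the same function still returns integers that sum to the target \(n\).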

Step 4: Execution details that limit selection and nonresponse bias

Use a random-number generator within each stratum to select \(n_h\) students.
Predefine contact attempts and nonresponse follow-up to limit nonresponse bias.
Record response rates by stratum; consider weighting if nonresponse is differential.
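The selection and weighting steps in this list might look like the sketch below. The frame of integer IDs, the respondent counts, and the function names are hypothetical. The weight \(N_h/r_h\) (stratum size over number of respondents) is one simple adjustment that assumes nonrespondents resemble respondents within the same stratum.

```python
import random

def stratified_sample(frame, allocation, seed=None):
    """Draw a simple random sample without replacement within each stratum.

    frame: dict mapping stratum -> list of unit IDs (the sampling frame).
    allocation: dict mapping stratum -> sample size n_h.
    """
    rng = random.Random(seed)  # seeded so the draw can be reproduced and audited
    return {h: rng.sample(frame[h], allocation[h]) for h in allocation}

def nonresponse_weights(strata_sizes, respondents):
    """Weight each respondent in stratum h by N_h / r_h."""
    return {h: strata_sizes[h] / respondents[h] for h in respondents}

# Hypothetical frame: integer IDs standing in for roster entries.
frame = {
    "Year 1": list(range(0, 5000)),
    "Year 2": list(range(5000, 9000)),
    "Year 3": list(range(9000, 12000)),
}
allocation = {"Year 1": 125, "Year 2": 100, "Year 3": 75}
sample = stratified_sample(frame, allocation, seed=2024)

# Hypothetical respondent counts, for illustration only.
respondents = {"Year 1": 100, "Year 2": 90, "Year 3": 45}
strata_sizes = {h: len(ids) for h, ids in frame.items()}
weights = nonresponse_weights(strata_sizes, respondents)
print({h: round(w, 2) for h, w in weights.items()})
```

Strata with lower response rates receive larger weights, so their respondents count for more in the weighted estimate of \(\mu\).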

Step-by-step experimental design (for causal effect)

Step 1: Define treatment, control, and response

Treatment: invitation + access to the study-skills workshop.
Control: no workshop during the study period.
Response variable: weekly study time after the intervention (hours/week).

Step 2: Use random assignment (not random sampling) to create comparable groups

From the sampled students, randomly assign participants to treatment and control. If the total experimental sample is \(n=300\), a balanced allocation assigns: \[ n_T = 150,\qquad n_C = 150. \]
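A minimal sketch of this complete randomization: shuffle the participant list and split it at \(n_T\). The participant IDs and the function name `randomize` are hypothetical.

```python
import random

def randomize(units, n_treatment, seed=None):
    """Randomly assign exactly n_treatment units to treatment, the rest to control."""
    rng = random.Random(seed)
    shuffled = list(units)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return {
        "treatment": shuffled[:n_treatment],
        "control": shuffled[n_treatment:],
    }

participants = [f"S{i:03d}" for i in range(300)]  # hypothetical student IDs
groups = randomize(participants, n_treatment=150, seed=7)
print(len(groups["treatment"]), len(groups["control"]))  # 150 150
```

Fixing the seed makes the assignment reproducible, which helps when the randomization has to be documented.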

Step 3: Consider blocking to control an important source of variability

If class year strongly influences study time, block by class year and randomize within each block. With the stratified sample sizes above, block-specific balanced assignment yields: \[ \text{Year 1: } 62 \text{ treatment},\ 63 \text{ control}\quad(\text{or vice versa}), \] \[ \text{Year 2: } 50 \text{ treatment},\ 50 \text{ control}, \] \[ \text{Year 3: } 37 \text{ treatment},\ 38 \text{ control}. \]

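Small imbalances can occur when \(n_h\) is odd: with the split shown, the overall totals are 149 treatment and 151 control, and giving the extra unit to opposite arms in Year 1 and Year 3 restores 150 and 150. The defining property is random assignment within each block.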
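Blocked randomization can be sketched as the same shuffle-and-split applied separately within each class year. The unit IDs and the function name are hypothetical, and deciding the extra unit in an odd-sized block by a coin flip is one possible convention, not the only one.

```python
import random

def blocked_assignment(blocks, seed=None):
    """Randomize to treatment/control separately within each block.

    blocks: dict mapping block label -> list of unit IDs.
    Returns a dict mapping unit ID -> "treatment" or "control".
    """
    rng = random.Random(seed)
    assignment = {}
    for block, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)
        n_treat = len(shuffled) // 2
        # For an odd block size, a coin flip decides which arm gets the extra unit.
        if len(shuffled) % 2 == 1 and rng.random() < 0.5:
            n_treat += 1
        for i, unit in enumerate(shuffled):
            assignment[unit] = "treatment" if i < n_treat else "control"
    return assignment

# Hypothetical unit IDs matching the stratum sample sizes 125, 100, 75.
blocks = {
    "Year 1": [f"Y1-{i}" for i in range(125)],
    "Year 2": [f"Y2-{i}" for i in range(100)],
    "Year 3": [f"Y3-{i}" for i in range(75)],
}
assignment = blocked_assignment(blocks, seed=11)
for block, units in blocks.items():
    n_treat = sum(assignment[u] == "treatment" for u in units)
    print(block, n_treat, len(units) - n_treat)
```

Each block ends up split as evenly as its size allows, so class year cannot be confounded with treatment.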

Step 4: Replication and measurement protocol

Replication: assigning many independent units to each group (not taking repeated measures on the same unit) makes it possible to estimate unit-to-unit variability and stabilizes the effect estimate.
Measurement: use the same survey instrument and time window for treatment and control to reduce measurement bias.
Compliance: record workshop attendance; plan whether the estimand is intention-to-treat or per-protocol.
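To make the intention-to-treat choice concrete, the sketch below compares groups as assigned, ignoring attendance, and reports an unpooled (Welch-type) standard error. The eight records are made-up values for illustration only, and `itt_estimate` is a hypothetical helper rather than a library function.

```python
from math import sqrt
from statistics import mean, variance

def itt_estimate(records):
    """Intention-to-treat difference in mean outcome, with a standard error.

    records: iterable of (assigned_arm, attended, outcome) tuples.
    Units are grouped by the arm they were ASSIGNED to; attendance is ignored.
    """
    treat = [y for arm, _, y in records if arm == "treatment"]
    ctrl = [y for arm, _, y in records if arm == "control"]
    diff = mean(treat) - mean(ctrl)
    # Unpooled standard error of the difference in means.
    se = sqrt(variance(treat) / len(treat) + variance(ctrl) / len(ctrl))
    return diff, se

# Made-up records: (assigned arm, attended workshop?, weekly study hours).
records = [
    ("treatment", True, 12.0),
    ("treatment", False, 9.0),   # assigned but did not attend; still analyzed as treatment
    ("treatment", True, 14.0),
    ("treatment", True, 11.0),
    ("control", False, 9.0),
    ("control", False, 10.0),
    ("control", False, 8.0),
    ("control", False, 11.0),
]
diff, se = itt_estimate(records)
print(f"ITT estimate: {diff:.2f} hours/week (SE {se:.2f})")
```

A per-protocol analysis would instead filter on the attendance flag, which breaks the comparability created by random assignment. That is why the choice of estimand should be fixed before data collection.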

Design summary table

Design component	Primary purpose	Typical tool	Main threat addressed
Sampling design	Generalize from sample to population	Stratified random sampling	Selection bias, undercoverage
Experimental design	Estimate causal effect of treatment	Random assignment, control group	Confounding, systematic group differences
Blocking	Reduce variability and improve precision	Randomize within blocks	Known heterogeneity (e.g., class year)
Replication	Stabilize estimates and enable inference	Many units per condition	High sampling variability

Visualization: the workflow of an experiment and sampling design

The sampling design determines how units enter the study, and the experimental design determines how treatment is assigned and measured. Separating these roles helps prevent bias and confounding while supporting valid generalization.

Typical failure modes to avoid

Convenience samples for estimation: selecting volunteers can distort population conclusions due to selection bias.
No random assignment in an “experiment”: treatment self-selection creates confounding, turning the study into an observational comparison.
Undercoverage in the sampling frame: missing subgroups in the roster damages external validity even if selection is random within the frame.
Unhandled nonresponse: with low or differential response rates, nonresponse bias can exceed sampling error.
