
Experiment and Sampling Design: Planning Data Collection for Estimation and Causality

In statistics, how should an experiment and sampling design be constructed for a study that both estimates a population parameter and tests a treatment effect, and how is proportional stratified allocation computed for a target sample size?

Accepted answer

Core distinction: sampling design vs experimental design

An experiment and sampling design has two logically separate parts:

  • Sampling design: how observational units are selected from a population (supports estimation and generalization).
  • Experimental design: how treatments are assigned to units (supports causal conclusions when random assignment is used).

Key rule: Random sampling protects external validity (representativeness), while random assignment protects internal validity (reduces confounding). One does not substitute for the other.

Concrete study scenario

A university wants to (1) estimate the mean weekly study time of all enrolled students and (2) test whether a new study-skills workshop increases study time compared with no workshop. The same project therefore requires both a sampling design (to select students) and an experimental design (to assign the workshop).

Step-by-step sampling design (for estimation)

Step 1: Define population, parameter, and sampling frame

  • Population: all currently enrolled students.
  • Parameter of interest: population mean weekly study time \(\mu\) (hours/week).
  • Sampling frame: an up-to-date enrollment roster with contact information.

Step 2: Choose a probability sampling method

A common choice is stratified random sampling (for example, by class year), which guarantees that each stratum is represented and improves precision when study time varies more between strata than within them. With strata sizes \(N_1,\dots,N_H\) and total \(N=\sum_{h=1}^H N_h\), proportional allocation sets \[ n_h = n \cdot \frac{N_h}{N}, \] where \(n\) is the planned total sample size. When \(n_h\) is not an integer, round so that the \(n_h\) still sum to \(n\).

Step 3: Worked proportional allocation example

Suppose the enrollment counts by class year are:

| Stratum (class year) | Population size \(N_h\) |
|---|---|
| Year 1 | 5000 |
| Year 2 | 4000 |
| Year 3 | 3000 |
| Total | 12000 |

Let the target sample size be \(n=300\). Then: \[ n_1 = 300\cdot\frac{5000}{12000}=300\cdot\frac{5}{12}=125, \] \[ n_2 = 300\cdot\frac{4000}{12000}=300\cdot\frac{1}{3}=100, \] \[ n_3 = 300\cdot\frac{3000}{12000}=300\cdot\frac{1}{4}=75. \]

| Stratum | Allocation formula | Sample size \(n_h\) |
|---|---|---|
| Year 1 | \(300\cdot\frac{5000}{12000}\) | \(125\) |
| Year 2 | \(300\cdot\frac{4000}{12000}\) | \(100\) |
| Year 3 | \(300\cdot\frac{3000}{12000}\) | \(75\) |
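The allocation above can be reproduced with a short script. This is a minimal sketch: the function name `proportional_allocation` and the largest-remainder rounding rule are illustrative choices, not part of the original answer. The rounding only matters when \(n \cdot N_h / N\) is not a whole number.

```python
def proportional_allocation(strata_sizes, n):
    """Return {stratum: n_h} with n_h proportional to N_h and summing to n."""
    total = sum(strata_sizes.values())
    # Integer part and remainder of n * N_h / N, using exact integer arithmetic
    floors = {h: (n * N_h) // total for h, N_h in strata_sizes.items()}
    remainders = {h: (n * N_h) % total for h, N_h in strata_sizes.items()}
    leftover = n - sum(floors.values())
    # Largest-remainder rounding (an assumed rule): give the leftover units
    # to the strata with the largest fractional parts
    for h in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        floors[h] += 1
    return floors


strata = {"Year 1": 5000, "Year 2": 4000, "Year 3": 3000}
allocation = proportional_allocation(strata, 300)
print(allocation)  # {'Year 1': 125, 'Year 2': 100, 'Year 3': 75}
```

With a target such as \(n=301\), the exact shares are fractional, and the rounding step ensures the stratum sample sizes still add up to the target.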

Step 4: Execution details that limit selection and nonresponse bias

  • Use a random-number generator within each stratum to select \(n_h\) students.
  • Predefine contact attempts and nonresponse follow-up to limit nonresponse bias.
  • Record response rates by stratum; consider weighting if nonresponse is differential.
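The selection and weighting steps above might look like the following sketch. The roster IDs and the respondent counts are made up for illustration, and the helper names `stratified_sample` and `nonresponse_weights` are hypothetical. The weight \(N_h / r_h\) (stratum size over number of respondents) is one simple weighting-class adjustment, not the only option.

```python
import random


def stratified_sample(frame, allocation, seed=None):
    """Draw a simple random sample of size allocation[h] within each stratum h."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return {h: rng.sample(frame[h], n_h) for h, n_h in allocation.items()}


def nonresponse_weights(strata_sizes, respondents):
    """Weight each respondent in stratum h by N_h / r_h."""
    return {h: strata_sizes[h] / respondents[h] for h in strata_sizes}


strata_sizes = {"Year 1": 5000, "Year 2": 4000, "Year 3": 3000}
# Hypothetical roster: placeholder IDs standing in for the enrollment list
frame = {h: [f"{h}-{i:04d}" for i in range(N_h)] for h, N_h in strata_sizes.items()}
allocation = {"Year 1": 125, "Year 2": 100, "Year 3": 75}
sample = stratified_sample(frame, allocation, seed=2026)

# Hypothetical respondent counts, for illustration only
respondents = {"Year 1": 100, "Year 2": 80, "Year 3": 50}
weights = nonresponse_weights(strata_sizes, respondents)
print(weights)  # {'Year 1': 50.0, 'Year 2': 50.0, 'Year 3': 60.0}
```

Here Year 3 responds at a lower rate, so each Year 3 respondent carries a larger weight in the estimate of \(\mu\).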

Step-by-step experimental design (for causal effect)

Step 1: Define treatment, control, and response

  • Treatment: invitation + access to the study-skills workshop.
  • Control: no workshop during the study period.
  • Response variable: weekly study time after the intervention (hours/week).

Step 2: Use random assignment (not random sampling) to create comparable groups

From the sampled students, randomly assign participants to treatment and control. If the total experimental sample is \(n=300\), a balanced allocation assigns: \[ n_T = 150,\qquad n_C = 150. \]
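A balanced random assignment can be sketched as a shuffle followed by a split. The participant IDs below are placeholders and the function name `randomize` is hypothetical.

```python
import random


def randomize(units, n_treatment, seed=None):
    """Randomly assign n_treatment units to treatment and the rest to control."""
    rng = random.Random(seed)
    shuffled = list(units)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return {"treatment": shuffled[:n_treatment], "control": shuffled[n_treatment:]}


# Placeholder IDs for the 300 sampled students
participants = [f"S{i:03d}" for i in range(300)]
groups = randomize(participants, n_treatment=150, seed=7)
print(len(groups["treatment"]), len(groups["control"]))  # 150 150
```

Because the split follows a random shuffle, no characteristic of a student (known or unknown) can systematically determine which group that student lands in.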

Step 3: Consider blocking to control an important source of variability

If class year strongly influences study time, block by class year and randomize within each block. With the stratified sample sizes above, block-specific balanced assignment yields: \[ \text{Year 1: } 62 \text{ treatment},\ 63 \text{ control}\quad(\text{or vice versa}), \] \[ \text{Year 2: } 50 \text{ treatment},\ 50 \text{ control}, \] \[ \text{Year 3: } 37 \text{ treatment},\ 38 \text{ control}. \]

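Small imbalances occur when \(n_h\) is odd, as in Years 1 and 3. Giving the extra unit to treatment in one of those blocks and to control in the other preserves the overall 150/150 split; either way, the defining property is random assignment within each block.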
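Randomizing within blocks can be sketched as below. The IDs are placeholders, the function name `blocked_assignment` is hypothetical, and alternating the extra unit between arms across odd-sized blocks is just one assumed convention for keeping the overall totals balanced.

```python
import random


def blocked_assignment(blocks, seed=None):
    """Randomize to treatment/control separately within each block."""
    rng = random.Random(seed)
    # Randomly pick which arm receives the extra unit in the first odd-sized block
    extra_to_treatment = rng.random() < 0.5
    assignment = {}
    for block, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)
        n_t = len(shuffled) // 2
        if len(shuffled) % 2 == 1:
            if extra_to_treatment:
                n_t += 1
            # Alternate so the next odd-sized block favors the other arm
            extra_to_treatment = not extra_to_treatment
        assignment[block] = {"treatment": shuffled[:n_t], "control": shuffled[n_t:]}
    return assignment


# Placeholder IDs using the stratified sample sizes 125, 100, 75
blocks = {
    "Year 1": [f"Y1-{i:03d}" for i in range(125)],
    "Year 2": [f"Y2-{i:03d}" for i in range(100)],
    "Year 3": [f"Y3-{i:03d}" for i in range(75)],
}
assignment = blocked_assignment(blocks, seed=11)
for block, arms in assignment.items():
    print(block, len(arms["treatment"]), len(arms["control"]))
```

Under this convention Year 2 always splits 50/50, while Years 1 and 3 split 63/62 and 37/38 or 62/63 and 38/37, so the overall totals are 150 and 150.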

Step 4: Replication and measurement protocol

  • Replication: many units per group (not repeated measures on the same unit) support stable effect estimation.
  • Measurement: use the same survey instrument and time window for treatment and control to reduce measurement bias.
  • Compliance: record workshop attendance; plan whether the estimand is intention-to-treat or per-protocol.
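The difference between the two estimands in the compliance bullet can be illustrated with a toy calculation. The six records below are invented purely for illustration, and the field names (`assigned`, `attended`, `hours`) are hypothetical. Intention-to-treat compares groups as assigned; per-protocol keeps only students who followed their assigned condition.

```python
from statistics import mean


def itt_effect(records):
    """Difference in mean hours by ASSIGNED group, ignoring attendance."""
    t = [r["hours"] for r in records if r["assigned"] == "treatment"]
    c = [r["hours"] for r in records if r["assigned"] == "control"]
    return mean(t) - mean(c)


def per_protocol_effect(records):
    """Difference in mean hours among units that followed their assignment."""
    t = [r["hours"] for r in records
         if r["assigned"] == "treatment" and r["attended"]]
    c = [r["hours"] for r in records
         if r["assigned"] == "control" and not r["attended"]]
    return mean(t) - mean(c)


# Invented records, for illustration only
records = [
    {"assigned": "treatment", "attended": True, "hours": 14.0},
    {"assigned": "treatment", "attended": True, "hours": 12.0},
    {"assigned": "treatment", "attended": False, "hours": 10.0},
    {"assigned": "control", "attended": False, "hours": 10.0},
    {"assigned": "control", "attended": False, "hours": 11.0},
    {"assigned": "control", "attended": False, "hours": 9.0},
]
print(itt_effect(records))           # 2.0
print(per_protocol_effect(records))  # 3.0
```

The per-protocol comparison drops the non-attender, so its groups are no longer the ones created by randomization; that is why the choice of estimand should be fixed before data collection.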

Design summary table

| Design component | Primary purpose | Typical tool | Main threat addressed |
|---|---|---|---|
| Sampling design | Generalize from sample to population | Stratified random sampling | Selection bias, undercoverage |
| Experimental design | Estimate causal effect of treatment | Random assignment, control group | Confounding, systematic group differences |
| Blocking | Reduce variability and improve precision | Randomize within blocks | Known heterogeneity (e.g., class year) |
| Replication | Stabilize estimates and enable inference | Many units per condition | High sampling variability |

Visualization: the workflow of an experiment and sampling design

Figure: Workflow diagram for sampling design and experimental design. A two-lane flowchart. The left lane (sampling design) runs from defining the population and sampling frame, through choosing a probability method (SRS, stratified, cluster), to selecting the sample using a random mechanism. The right lane (experimental design) runs from defining treatment, control, and response, through random assignment (optionally within blocks), to measuring outcomes consistently and comparing groups. The selected sample becomes the experimental units. Random sampling ≠ random assignment (both can be needed).
The sampling design determines how units enter the study, and the experimental design determines how treatment is assigned and measured. Separating these roles helps prevent bias and confounding while supporting valid generalization.

Typical failure modes to avoid

  • Convenience samples for estimation: selecting volunteers can distort population conclusions due to selection bias.
  • No random assignment in an “experiment”: treatment self-selection creates confounding, turning the study into an observational comparison.
  • Undercoverage in the sampling frame: missing subgroups in the roster damages external validity even if selection is random within the frame.
  • Unhandled nonresponse: bias from low or differential response rates can exceed sampling error.