Epidemiology – Study Design, Measures of Association

Definition and Core Concept

This article defines epidemiology as the study of the distribution (frequency, pattern) and determinants (causes, risk factors) of health-related states and events in specified populations, and the application of this study to the control of health problems. Epidemiology provides the methodological foundation for public health and clinical research, enabling the identification of causes of disease, quantification of risk, evaluation of interventions, and guidance for prevention strategies. Core features: (1) study designs (descriptive – cross-sectional, ecological; analytic – cohort, case-control, randomised controlled trials), (2) measures of disease frequency (incidence, prevalence, mortality rates), (3) measures of association (risk ratio – RR; odds ratio – OR; rate ratio; attributable risk), (4) causal inference (frameworks for judging whether an observed association reflects a causal relationship), (5) bias and confounding (systematic errors that distort effect estimates, as distinct from random error). The article addresses: the stated objectives of epidemiology; key concepts including incidence vs prevalence, confounding, effect modification, and the Bradford Hill criteria; core mechanisms such as cohort study conduct, case-control selection, and multivariable adjustment; international comparisons and debated issues (causation vs correlation, the replication crisis in epidemiological research, evidence hierarchies); a summary and emerging trends (real-world evidence, machine learning for confounding control, Mendelian randomisation); and a Q&A section.

1. Specific Aims of This Article

This article describes epidemiology without endorsing specific study designs or analytical methods. Commonly cited objectives include estimating the burden of health conditions, identifying risk and protective factors, evaluating the effectiveness of interventions, informing clinical guidelines and public health policy, and monitoring trends in population health. Epidemiology is largely an observational science (randomised trials being the main exception, where feasible), so causal conclusions require careful assessment of alternative explanations such as chance, bias, and confounding.

2. Foundational Conceptual Explanations

Key terminology:

Incidence: Number of new cases of a condition occurring in a population over a specified time period. Types: cumulative incidence (risk) – proportion of an initially disease-free population that develops the condition during the period; incidence rate (incidence density) – number of new cases divided by the person-time of observation.
Prevalence: Proportion of a population affected by a condition at a specific point in time (point prevalence) or over a period (period prevalence). In a steady-state population, prevalence odds = incidence rate × average duration; when prevalence is low, prevalence ≈ incidence rate × average duration.
Risk ratio (relative risk – RR): Ratio of the incidence (risk) of an outcome in the exposed group to that in the unexposed group. RR = 1 indicates no association; RR > 1 indicates increased risk; RR < 1 indicates reduced risk.
Odds ratio (OR): Ratio of the odds of exposure in cases to the odds of exposure in controls (numerically equal to the ratio of the odds of disease in the exposed vs the unexposed). In case-control studies, the OR approximates the RR when the outcome is rare (a common rule of thumb is <10% in the population).
Confounding: Distortion of the observed association between an exposure and an outcome by a third factor (confounder) that is associated with the exposure, is independently associated with the outcome, and is not on the causal pathway between them. Adjustment removes confounding by measured confounders to the extent they are measured accurately; unmeasured or residual confounding can remain.
Effect modification (interaction): Variation in the magnitude of an exposure-outcome association across levels of another variable (e.g., effect of a medication differs by age).
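Confounding, and the claim that adjustment removes it, can be made concrete with a minimal Python sketch. The counts are hypothetical, invented so that age (the stratifying variable) is associated with both exposure and outcome while the exposure has no effect within either age stratum. The Mantel-Haenszel summary odds ratio is used here as one common stratified adjustment; the class and function names are illustrative, not from any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table2x2:
    """Counts for one 2x2 table (hypothetical data)."""
    exp_cases: int        # exposed, outcome present
    exp_noncases: int     # exposed, outcome absent
    unexp_cases: int      # unexposed, outcome present
    unexp_noncases: int   # unexposed, outcome absent

    @property
    def n(self) -> int:
        return self.exp_cases + self.exp_noncases + self.unexp_cases + self.unexp_noncases


def odds_ratio(t: Table2x2) -> float:
    """Cross-product odds ratio."""
    return (t.exp_cases * t.unexp_noncases) / (t.exp_noncases * t.unexp_cases)


def collapse(strata: list[Table2x2]) -> Table2x2:
    """Pool strata into one crude table, ignoring the stratifying variable."""
    return Table2x2(
        sum(t.exp_cases for t in strata),
        sum(t.exp_noncases for t in strata),
        sum(t.unexp_cases for t in strata),
        sum(t.unexp_noncases for t in strata),
    )


def mantel_haenszel_or(strata: list[Table2x2]) -> float:
    """Mantel-Haenszel summary odds ratio across strata."""
    num = sum(t.exp_cases * t.unexp_noncases / t.n for t in strata)
    den = sum(t.exp_noncases * t.unexp_cases / t.n for t in strata)
    return num / den


# Hypothetical: exposure is common in younger people, the outcome is common in older people.
younger = Table2x2(exp_cases=8, exp_noncases=792, unexp_cases=2, unexp_noncases=198)
older = Table2x2(exp_cases=20, exp_noncases=180, unexp_cases=80, unexp_noncases=720)
strata = [younger, older]

print(f"crude OR:    {odds_ratio(collapse(strata)):.2f}")  # 0.32 (distorted by age)
print(f"younger OR:  {odds_ratio(younger):.2f}")           # 1.00
print(f"older OR:    {odds_ratio(older):.2f}")             # 1.00
print(f"MH adjusted: {mantel_haenszel_or(strata):.2f}")    # 1.00
```

The crude table suggests a strong protective association that vanishes within each age stratum. The adjusted estimate is only as trustworthy as the measurement of age and the assumption that no other confounders are at work.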

Study design hierarchy (internal validity – not absolute):

Randomised controlled trial (RCT) – highest for causal inference regarding interventions.
Cohort study (prospective or retrospective) – exposure measured before outcome.
Case-control study – efficient for rare outcomes.
Cross-sectional study – prevalence only, temporal direction unclear.
Ecological study – group-level data, subject to ecological fallacy.

3. Core Mechanisms and In-Depth Elaboration

Measures of frequency:

Incidence rate: cases / person-time (e.g., 20 new cases per 1,000 person-years).
Cumulative incidence (risk): new cases / population at risk at start (e.g., 10% over 5 years).
Mortality rate: deaths in a period / average population or person-time at risk (often age-standardised to allow comparison between populations with different age structures).
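The frequency measures above, together with the steady-state prevalence relation from the terminology list, can be sketched in a few lines of Python. All counts, rates, and the standard population are hypothetical; direct standardisation is shown as one common method (indirect standardisation is an alternative). The two populations share identical age-specific death rates, so their crude rates differ only because of age structure.

```python
def incidence_rate(new_cases: int, person_time: float, per: float = 1_000) -> float:
    """New cases per `per` units of person-time (e.g. per 1,000 person-years)."""
    return new_cases * per / person_time


def cumulative_incidence(new_cases: int, at_risk_at_start: int) -> float:
    """Risk: proportion of the initially outcome-free population that develops the outcome."""
    return new_cases / at_risk_at_start


def steady_state_prevalence(rate_per_year: float, mean_duration_years: float) -> float:
    """Prevalence implied by: prevalence odds = incidence rate x mean duration."""
    odds = rate_per_year * mean_duration_years
    return odds / (1 + odds)


def crude_rate(deaths: list[int], population: list[int], per: float = 1_000) -> float:
    return sum(deaths) * per / sum(population)


def direct_standardised_rate(deaths, population, standard, per: float = 1_000) -> float:
    """Weight each age-specific rate by the standard population's age structure."""
    rates = [d / p for d, p in zip(deaths, population)]
    return sum(r * w for r, w in zip(rates, standard)) * per / sum(standard)


print(incidence_rate(20, 1_000))      # 20.0 new cases per 1,000 person-years
print(cumulative_incidence(50, 500))  # 0.1, i.e. 10% over the follow-up period
print(f"{steady_state_prevalence(0.01, 5):.4f}")  # 0.0476, close to 0.01 x 5 = 0.05

# Hypothetical age bands (young, middle, old) with identical age-specific death rates.
standard = [60_000, 30_000, 10_000]                     # hypothetical standard population
pop_a, deaths_a = [8_000, 6_000, 6_000], [8, 30, 300]   # older population
pop_b, deaths_b = [14_000, 4_000, 2_000], [14, 20, 100] # younger population

for name, d, p in [("A", deaths_a, pop_a), ("B", deaths_b, pop_b)]:
    print(name, f"crude {crude_rate(d, p):.1f}",
          f"standardised {direct_standardised_rate(d, p, standard):.1f}")
# A crude 16.9 standardised 7.1
# B crude 6.7 standardised 7.1
```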

Measures of association – interpretation:

Risk difference (attributable risk): Incidence in the exposed minus incidence in the unexposed. A measure of absolute, public health impact.
Attributable fraction (among the exposed): (RR - 1)/RR. The proportion of disease among exposed individuals attributable to the exposure, assuming the association is causal and unconfounded.
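A minimal sketch of these measures from hypothetical cohort counts (2% risk in the exposed, 1% in the unexposed). It also checks that (RR - 1)/RR equals the risk difference divided by the risk in the exposed, which is the same quantity written another way. The function name is illustrative.

```python
def cohort_measures(exp_cases, exp_total, unexp_cases, unexp_total):
    """Risk-based measures of association from (hypothetical) cohort counts."""
    risk_exp = exp_cases / exp_total
    risk_unexp = unexp_cases / unexp_total
    rr = risk_exp / risk_unexp
    rd = risk_exp - risk_unexp        # risk difference (attributable risk)
    af_exposed = (rr - 1) / rr        # attributable fraction among the exposed
    return {
        "RR": rr,
        "RD": rd,
        "AF_exposed": af_exposed,
        "AF_check": rd / risk_exp,    # same quantity: (Re - Ru) / Re
    }


m = cohort_measures(exp_cases=20, exp_total=1_000, unexp_cases=10, unexp_total=1_000)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
# RR: 2.000, RD: 0.010, AF_exposed: 0.500, AF_check: 0.500 (one per line)
```

The same relative risk of 2 could correspond to a risk difference of 0.1% or of 10%, which is why the absolute and relative measures answer different questions.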

Cohort study conduct:

Identify a population free of the outcome of interest.
Measure exposure status (baseline).
Follow forward in time for outcome occurrence.
Calculate incidence (cumulative incidence or incidence rate) in the exposed and unexposed groups, then the risk ratio or rate ratio.
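The final cohort step, when follow-up is measured in person-time, can be sketched as an incidence rate ratio with a large-sample confidence interval. The counts are hypothetical, and the log-scale Wald interval (SE of ln IRR = sqrt(1/a + 1/b), treating case counts as Poisson) is one common approximation; exact or mid-p intervals may be preferred when case counts are small.

```python
import math


def rate_ratio_ci(exp_cases, exp_person_time, unexp_cases, unexp_person_time, z=1.96):
    """Incidence rate ratio with a log-scale Wald confidence interval."""
    irr = (exp_cases / exp_person_time) / (unexp_cases / unexp_person_time)
    se_log = math.sqrt(1 / exp_cases + 1 / unexp_cases)  # SE of ln(IRR)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper


# Hypothetical cohort: 40 cases in 2,000 exposed person-years, 20 in 2,000 unexposed.
irr, lo, hi = rate_ratio_ci(40, 2_000, 20, 2_000)
print(f"IRR = {irr:.2f} (95% CI {lo:.2f} to {hi:.2f})")  # IRR = 2.00 (95% CI 1.17 to 3.42)
```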

Case-control study conduct:

Identify cases (individuals with outcome) and controls (without outcome).
Measure past exposure status (retrospective, using records or recall).
Calculate odds ratio (ratio of odds of exposure in cases to odds in controls).
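The odds-ratio calculation can be sketched directly from its definition (odds of exposure in cases over odds of exposure in controls). The counts are hypothetical, and the Woolf (log-scale) interval is one standard large-sample choice; the function name is illustrative.

```python
import math


def case_control_or(cases_exp, cases_unexp, controls_exp, controls_unexp, z=1.96):
    """Exposure odds ratio with a Woolf (log-scale) confidence interval."""
    odds_cases = cases_exp / cases_unexp            # odds of exposure among cases
    odds_controls = controls_exp / controls_unexp   # odds of exposure among controls
    or_ = odds_cases / odds_controls
    se_log = math.sqrt(1 / cases_exp + 1 / cases_unexp + 1 / controls_exp + 1 / controls_unexp)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical study: 200 cases (60 exposed), 400 controls (60 exposed).
or_, lo, hi = case_control_or(60, 140, 60, 340)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")  # OR = 2.43 (95% CI 1.61 to 3.65)
```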

Bias types (non-random error):

Selection bias: Systematic differences in who is included in study groups (e.g., non-response bias, volunteer bias, Berkson’s bias).
Information bias: Measurement error in exposure or outcome (e.g., recall bias, misclassification, observer bias). Non-differential misclassification (error rates that do not differ between the compared groups, e.g., exposure errors unrelated to outcome status) of a binary variable typically biases estimates toward the null; differential misclassification can bias in either direction.
Confounding: controlled through design (randomisation, matching, restriction) or analysis (stratification, multivariable regression, propensity scores).
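The toward-the-null effect of non-differential exposure misclassification can be illustrated by computing expected (not sampled) observed counts from hypothetical true counts and an assumed sensitivity and specificity applied equally to cases and controls. In any single real study, chance can still push the estimate either way; the function names are illustrative.

```python
def misclassify(exposed: float, unexposed: float, sens: float, spec: float):
    """Expected observed (exposed, unexposed) counts given exposure sensitivity/specificity."""
    obs_exposed = sens * exposed + (1 - spec) * unexposed
    return obs_exposed, exposed + unexposed - obs_exposed


def odds_ratio(cases_exp, cases_unexp, controls_exp, controls_unexp):
    return (cases_exp * controls_unexp) / (cases_unexp * controls_exp)


# Hypothetical true counts: cases 60 exposed / 140 unexposed; controls 60 / 340.
true_or = odds_ratio(60, 140, 60, 340)

# Non-differential: the same sensitivity and specificity in cases and controls.
sens, spec = 0.8, 0.9
obs_cases = misclassify(60, 140, sens, spec)     # about (62, 138)
obs_controls = misclassify(60, 340, sens, spec)  # about (82, 318)
observed_or = odds_ratio(*obs_cases, *obs_controls)

print(f"true OR {true_or:.2f}, observed OR {observed_or:.2f}")  # true OR 2.43, observed OR 1.74
```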

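Bradford Hill criteria for causation (1965): a framework rather than a checklist. Strength of association; consistency (replication across studies, settings, and populations); specificity (an association limited to one exposure and one outcome; its absence does not rule out causation); temporality (exposure precedes outcome; the only essential criterion); biological gradient (dose-response); plausibility; coherence (no serious conflict with known natural history and biology); experiment (evidence from RCTs or natural experiments); analogy (similar effects known for similar exposures). Used to guide causal judgements from observational data.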

4. Comprehensive Overview and Objective Discussion

Epidemiological measures in context:

Measure	Definition	Example	Typical value range
Prevalence	Proportion with condition at time	High blood pressure in adults	15-45%
Incidence rate (per 1,000 person-years)	New cases per person-time	Heart attacks in middle-aged male smokers	5-15
Risk ratio (smokers vs non-smokers)	Ratio of risks (cumulative incidences)	Lung cancer	10-25
Odds ratio (case-control)	Odds of exposure in cases / odds in controls	Rare cancer and chemical exposure	2-10

Key epidemiological studies (historical examples):

Framingham Heart Study (cohort, 1948-present): identified major cardiovascular risk factors (high blood pressure, high cholesterol, smoking, diabetes, physical inactivity).
British Doctors Study (cohort, 1951-2001): established the link between smoking and lung cancer and documented the long-term mortality effects of smoking.
Doll & Hill case-control study of smoking and lung cancer (1950).

Debated issues:

Causation vs correlation (replication crisis in epidemiology): Many published findings, especially weak associations (RR roughly 1.1-1.5) such as those common in nutritional epidemiology, have failed to replicate in larger studies or RCTs. Contributors include multiple testing, publication bias, residual confounding, and p-hacking.
Hierarchy of evidence (RCT as gold standard): For causal inference about interventions, RCTs are superior because randomisation balances known and unknown confounders. For questions about etiology (harmful exposures, risk factors), RCTs are often unethical or infeasible; high-quality observational studies provide best available evidence.
Confounding by indication in pharmacoepidemiology (observational studies of medication effects): Patients prescribed a medication differ systematically from those not prescribed (e.g., sicker patients). Methods (propensity scores, instrumental variables, negative controls) partially address but cannot eliminate all bias.
Data dredging and multiple comparisons: Conducting many statistical tests increases probability of false positives. Pre-specified hypotheses, registration (ClinicalTrials.gov), and adjustment for multiple comparisons (Bonferroni, false discovery rate) reduce risk.
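The two adjustments named above can be sketched on a hypothetical set of ten p-values. Bonferroni controls the family-wise error rate; the Benjamini-Hochberg step-up procedure controls the false discovery rate (its guarantee assumes independent or positively dependent tests). Function names are illustrative.

```python
def unadjusted(pvalues, alpha=0.05):
    return {i for i, p in enumerate(pvalues) if p <= alpha}


def bonferroni(pvalues, alpha=0.05):
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    return {i for i, p in enumerate(pvalues) if p <= alpha / m}


def benjamini_hochberg(pvalues, q=0.05):
    """Step-up procedure controlling the false discovery rate at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank  # largest rank whose p-value meets its threshold
    return set(order[:k_max])


# Hypothetical p-values from ten exposure-outcome tests.
p = [0.041, 0.001, 0.216, 0.039, 0.008, 0.060, 0.042, 0.205, 0.074, 0.212]
print(sorted(unadjusted(p)))          # [0, 1, 3, 4, 6]
print(sorted(bonferroni(p)))          # [1]
print(sorted(benjamini_hochberg(p)))  # [1, 4]
```

Five tests pass an unadjusted 0.05 threshold, but only two survive FDR control and one survives Bonferroni, illustrating how quickly apparent findings thin out once multiplicity is accounted for.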

5. Summary and Future Trajectories

Summary: Epidemiology uses descriptive and analytic study designs (cohort, case-control, RCT, cross-sectional) to measure disease frequency (incidence, prevalence) and association (risk ratio, odds ratio). Confounding and bias must be assessed and controlled. Causal inference requires temporality and careful evaluation of alternative explanations. Observational studies are essential for questions not amenable to randomisation.

Emerging trends:

Real-world evidence (RWE) from electronic health records, insurance claims, registries: Increasingly used for regulatory decisions (FDA, EMA). Methods to address confounding (active comparator designs, negative controls, target trial emulation).
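Machine learning in epidemiology (prediction modelling, high-dimensional confounding adjustment, causal inference with double/debiased machine learning): can improve confounding control when many covariates are measured, but risks overfitting and does not address unmeasured confounding.
Mendelian randomisation (using genetic variants as instrumental variables to infer causation for modifiable risk factors): can reduce bias from confounding and reverse causation, provided the instrumental-variable assumptions hold (notably that the variants affect the outcome only through the exposure, i.e., no horizontal pleiotropy). Now widely applied to cardiovascular, metabolic, and other outcomes.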
Multilevel and spatial epidemiology (neighbourhood effects, geographic information systems – GIS, disease clustering).
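The Mendelian randomisation item above can be illustrated with the simplest summary-data estimators: the per-variant Wald ratio and the fixed-effect inverse-variance weighted (IVW) combination. The summary statistics are hypothetical. This sketch uses first-order weights, ignores uncertainty in the variant-exposure estimates, and assumes every variant is a valid instrument; pleiotropy-robust methods (e.g., MR-Egger, weighted median) exist but are not shown.

```python
import math

# Hypothetical summary statistics for three independent variants.
beta_exposure = [0.10, 0.20, 0.15]    # variant -> exposure associations
beta_outcome = [0.030, 0.050, 0.048]  # variant -> outcome associations
se_outcome = [0.010, 0.010, 0.020]    # standard errors of beta_outcome


def wald_ratios(bx, by):
    """Per-variant causal estimate: beta_outcome / beta_exposure."""
    return [y / x for x, y in zip(bx, by)]


def ivw_estimate(bx, by, se_y):
    """Fixed-effect inverse-variance weighted estimate (first-order weights)."""
    weights = [x**2 / s**2 for x, s in zip(bx, se_y)]
    num = sum(x * y / s**2 for x, y, s in zip(bx, by, se_y))
    den = sum(weights)
    return num / den, math.sqrt(1 / den)


print([round(r, 3) for r in wald_ratios(beta_exposure, beta_outcome)])  # [0.3, 0.25, 0.32]
est, se = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
print(f"IVW estimate {est:.3f} (SE {se:.3f})")  # IVW estimate 0.266 (SE 0.042)
```

The IVW estimate is a weighted average of the Wald ratios, with weights beta_exposure² / se_outcome², so variants with strong exposure associations and precise outcome estimates dominate.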

6. Question-and-Answer Session

Q1: What is the difference between relative risk and absolute risk?
A: Relative risk (risk ratio) is the ratio of the risk in the exposed group to that in the unexposed group. Absolute risk is the risk within a group, and the absolute risk difference is the difference between groups (e.g., 2% vs 1% gives a risk difference of 1 percentage point but a relative risk of 2). Relative risk indicates the strength of association; the absolute risk difference indicates public health impact.

Q2: Can observational epidemiology prove causation?
A: No single observational study “proves” causation, but consistent evidence from multiple studies that meets the Bradford Hill criteria (especially temporality, dose-response, and consistency) can support a strong causal inference when confounding and bias are unlikely explanations. Randomised trials remain the strongest design for causal inference.

Q3: What is a confounding variable?
A: A variable associated with both the exposure and the outcome that is not on the causal pathway between them. Example: age would confound the relationship between coffee drinking and Parkinson’s disease if older people were less likely to drink coffee and also more likely to develop Parkinson’s disease. Controlling for age (by stratification, matching, or regression adjustment) removes the distortion due to age.

Q4: When should a case-control study be used instead of a cohort study?
A: A case-control study is more efficient for rare outcomes (low incidence) because it assembles cases directly instead of following a large cohort for years. It is also useful when the outcome has a long induction period (e.g., cancer, neurodegenerative conditions) or when resources are limited. Cohort studies are better for common outcomes and for estimating incidence rates directly.
