
6 Data Extraction: How to Turn a Paper into Usable Data

Why Data Extraction Matters

Once a study is included, the next step is to extract key information into our shared Google Sheet. This is where we transform dense, complex papers into clean, structured data that can be analyzed statistically.

The goal: one row = one effect size, with enough information to be pooled alongside other studies.

What You’ll Be Working In

We use a shared Google Sheet template. Each row represents a unique study result—often stratified by exposure window (e.g., full pregnancy vs. third trimester). You’ll populate fields like:

| Column | What to Extract |
| --- | --- |
| Study ID | Author name and year (e.g., Zhang 2022) |
| Country | Country or countries where the study was conducted |
| Outcome | Preterm birth, low birth weight, stillbirth, etc. |
| Assessment Period | Exposure window: Entire Pregnancy, Trimester 1, Trimester 2, Trimester 3, Preconception |
| Effect Type | OR, RR, HR (we may convert RR/HR to OR later) |
| Effect Size | The reported value (e.g., OR = 1.24) |
| CI Lower / CI Upper | The bounds of the 95% confidence interval (e.g., 1.08, 1.43) |
| Adjusted or Unadjusted? | Usually “Adjusted” unless stated otherwise |
| Exposure Metric | Per 10 μg/m³ increase, quartiles, IQR, heatwave, etc. |
| Covariates Adjusted For | List of covariates from the model (e.g., maternal age, parity, SES) |
| Trimester-specific? | Yes/No (helpful for subgroup analysis) |
| Per Details | The specific metric: “Per 10 μg/m³”, “Q1 vs Q4”, etc. |
| Notes | Anything unclear or nonstandard (e.g., “Effect estimate from sensitivity analysis”) |

Important: If the paper reports multiple outcomes or exposure windows, enter a separate row for each one under the same Study ID.
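To make the "one row = one effect size" rule concrete, here is a minimal Python sketch of a row with a few plausibility checks. The class name, field names, and the `problems` helper are hypothetical and not part of the actual sheet; the allowed values are copied from the column descriptions above, and only a subset of columns is modelled.

```python
from dataclasses import dataclass
from typing import List, Optional

# Allowed values copied from the column descriptions above.
ASSESSMENT_PERIODS = {
    "Entire Pregnancy", "Trimester 1", "Trimester 2", "Trimester 3", "Preconception",
}
EFFECT_TYPES = {"OR", "RR", "HR"}


@dataclass
class ExtractionRow:
    """One row of the sheet = one effect size. Field names are hypothetical."""
    study_id: str
    outcome: str
    assessment_period: str
    effect_type: str
    effect_size: float
    ci_lower: Optional[float] = None
    ci_upper: Optional[float] = None
    per_details: str = ""
    notes: str = ""

    def problems(self) -> List[str]:
        """Return readable issues; an empty list means the row looks plausible."""
        issues = []
        if self.assessment_period not in ASSESSMENT_PERIODS:
            issues.append(f"unknown assessment period: {self.assessment_period!r}")
        if self.effect_type not in EFFECT_TYPES:
            issues.append(f"unknown effect type: {self.effect_type!r}")
        if self.effect_size <= 0:
            issues.append("ratio measures must be positive")
        if self.ci_lower is None or self.ci_upper is None:
            issues.append("CI missing: check for SE or p-value, otherwise flag")
        elif not (self.ci_lower <= self.effect_size <= self.ci_upper):
            issues.append("effect size lies outside its confidence interval")
        return issues


# Illustrative row stitched together from the example values in the table above.
row = ExtractionRow("Zhang 2022", "Preterm birth", "Trimester 3", "OR", 1.24, 1.08, 1.43)
print(row.problems())
```

A study with two exposure windows would simply be two `ExtractionRow` objects that share the same `study_id`.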

Example

From a fictional study:

“An interquartile range increase in PM₂.₅ during the third trimester was associated with an OR of 1.32 (95% CI: 1.10–1.60) for preterm birth.”

You would enter:

  • Outcome: Preterm birth

  • Assessment Period: Trimester 3

  • Effect Type: OR

  • Effect Size: 1.32

  • CI Lower: 1.10

  • CI Upper: 1.60

  • Per Details: “Per IQR increase”

  • Trimester-specific?: Yes
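One quick way to double-check the numbers in an entry like this is to look at them on the log scale, which is where ratio measures are usually pooled. The sketch below is only an illustration, not the lab's analysis code: the function name is hypothetical, and it assumes the reported interval is a symmetric (Wald-type) 95% CI on the log scale.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_scale_summary(effect, ci_lower, ci_upper):
    """Return (log effect, SE of log effect, midpoint offset) for a ratio measure."""
    log_effect = math.log(effect)
    log_lo, log_hi = math.log(ci_lower), math.log(ci_upper)
    # The CI spans 2 * z * SE on the log scale, so SE is the width over 2z.
    se = (log_hi - log_lo) / (2 * Z_95)
    # For a Wald-type interval the log estimate sits midway between the log
    # bounds, so an offset far from zero hints at a transcription error.
    offset = log_effect - (log_lo + log_hi) / 2
    return log_effect, se, offset


# Values from the fictional study quoted above.
log_or, se, offset = log_scale_summary(1.32, 1.10, 1.60)
print(f"log OR = {log_or:.3f}, SE = {se:.3f}, offset = {offset:.3f}")
```

For the fictional study the offset is close to zero, which is what rounding in the reported values would produce. A large offset would be a reason to re-read the paper before moving on.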

Common Challenges (and How to Handle Them)

| Situation | What to Do |
| --- | --- |
| Study reports RR or HR | Note the effect type and still write down the values. Cara and I will handle the conversion. |
| No CI given | Check whether a standard error or p-value is reported. If not, flag it. |
| Effect size only in a figure | Write down which figure it is (e.g., Figure 1A) and Cara and I will extract it. |
| Study reports multiple models | Use the fully adjusted model, not crude estimates. |
| Multiple exposure windows | Enter one row per window (e.g., one for Entire Pregnancy, one for Trimester 3). |
| Unusual exposure metrics | Write exactly what was reported (e.g., “per 20 μg/m³ increase”) in the Per Details column. |

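
For the "No CI given" case, the following sketch shows why a standard error or an exact p-value is worth recording: either one is enough to approximate a 95% CI. This is not necessarily how the conversion is done for this project. The function names are hypothetical, the standard error is assumed to be on the log scale, and the p-value is assumed to be an exact two-sided value from a Wald-type test (a threshold such as "p < 0.05" is not enough).

```python
import math
from statistics import NormalDist


def ci_from_se(effect, se_log, level=0.95):
    """Approximate CI for a ratio measure from the SE of its logarithm."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    log_effect = math.log(effect)
    return math.exp(log_effect - z * se_log), math.exp(log_effect + z * se_log)


def se_from_p(effect, p_value):
    """Approximate SE of the log effect from an exact two-sided p-value."""
    if not 0 < p_value < 1:
        raise ValueError("p_value must be strictly between 0 and 1")
    if effect <= 0 or effect == 1:
        raise ValueError("effect must be positive and different from 1")
    # The z statistic implied by the p-value equals |log effect| / SE.
    z = NormalDist().inv_cdf(1 - p_value / 2)
    return abs(math.log(effect)) / z


# Hypothetical numbers: an OR of 1.32 reported with p = 0.004 and no CI.
se = se_from_p(1.32, 0.004)
lower, upper = ci_from_se(1.32, se)
print(f"approximate 95% CI: {lower:.2f} to {upper:.2f}")
```

Extractors would still enter only what the paper reports and flag the row; a calculation like this belongs to the later cleaning step.
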
  • Be precise. Double-check numbers.

  • Be consistent with how you copy and enter text (e.g., always use “PTB” for preterm birth, not “preterm” or “Pre Term Birth”).

  • If you’re unsure, leave a comment in the Notes column or tag your question.
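
The consistency rule can also be checked after the fact. Below is a minimal sketch that maps common spelling variants to one agreed label. The mapping and the function name are hypothetical; only the “PTB” label comes from the guidance above, and other outcomes would be added once their labels are agreed.

```python
import re

# Keys are labels stripped down to lowercase letters; values are the agreed label.
CANONICAL_OUTCOMES = {
    "ptb": "PTB",
    "preterm": "PTB",
    "pretermbirth": "PTB",
}


def normalize_outcome(label):
    """Return the agreed label, or None if the text is not recognised."""
    key = re.sub(r"[^a-z]", "", label.lower())
    return CANONICAL_OUTCOMES.get(key)


for raw in ["PTB", "preterm", "Pre Term Birth", "pre-term birth", "stillbirth"]:
    print(f"{raw!r} -> {normalize_outcome(raw)!r}")
```

Returning `None` for unrecognised text, rather than guessing, keeps unexpected labels visible so they can be reviewed by hand.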

License

Meta-analysis with Global Environmental Health Solutions Lab Copyright © by sophieacotton. All Rights Reserved.