6 Data Extraction: How to Turn a Paper into Usable Data
Why Data Extraction Matters
Once a study is included, the next step is to extract key information into our shared Google Sheet. This is where we transform dense, complex papers into clean, structured data that can be analyzed statistically.
The goal: one row = one effect size, with enough information to be pooled alongside other studies.
What You’ll Be Working In
We use a shared Google Sheet template. Each row represents a unique study result—often stratified by exposure window (e.g., full pregnancy vs. third trimester). You’ll populate fields like:
Column | What to Extract |
---|---|
Study ID | Author name and year (e.g., Zhang 2022) |
Country | Country or countries where the study was conducted |
Outcome | Preterm birth, low birth weight, stillbirth, etc. |
Assessment Period | Exposure window: Entire Pregnancy, Trimester 1, Trimester 2, Trimester 3, Preconception |
Effect Type | OR, RR, HR (we may convert RR/HR to OR later) |
Effect Size | The reported value (e.g., OR = 1.24) |
CI Lower / CI Upper | The bounds of the 95% confidence interval (e.g., 1.08, 1.43) |
Adjusted or Unadjusted? | Usually “Adjusted” unless stated otherwise |
Exposure Metric | Per 10 μg/m³ increase, quartiles, IQR, heatwave, etc. |
Covariates Adjusted For | List of covariates from the model (e.g., maternal age, parity, SES) |
Trimester-specific? | Yes/No — helpful for subgroup analysis |
Per Details | The specific metric: “Per 10 μg/m³”, “Q1 vs Q4”, etc. |
Notes | Anything unclear or nonstandard (e.g., “Effect estimate from sensitivity analysis”) |
Important: If the paper reports multiple outcomes or exposure windows, enter a separate row for each reported effect size, all under the same Study ID.
Example
From a fictional study:
“An interquartile range increase in PM₂.₅ during the third trimester was associated with an OR of 1.32 (95% CI: 1.10–1.60) for preterm birth.”
You would enter:
- Outcome: Preterm birth
- Assessment Period: T3
- Effect Type: OR
- Effect Size: 1.32
- CI Lower: 1.10
- CI Upper: 1.60
- Per Details: “Per IQR increase”
- Trimester-specific?: Yes
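To see why the effect size and both CI bounds matter for pooling, here is a minimal Python sketch of what a later analysis step might do with the example row. It assumes the interval is a 95% CI that is symmetric on the log scale, which is the usual case for OR, RR, and HR. The function name `log_effect_and_se` is illustrative only and not part of our workflow.

```python
import math

def log_effect_and_se(estimate, ci_lower, ci_upper, z=1.96):
    """Return (log estimate, SE on the log scale) from a ratio measure and its 95% CI."""
    # A quick sanity check that also catches swapped or mistyped bounds.
    if not (0 < ci_lower <= estimate <= ci_upper):
        raise ValueError("expected 0 < CI lower <= estimate <= CI upper")
    log_est = math.log(estimate)
    # The CI spans 2 * z standard errors on the log scale.
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
    return log_est, se

# Values from the fictional third-trimester example above.
log_or, se = log_effect_and_se(1.32, 1.10, 1.60)
print(round(log_or, 3), round(se, 3))  # 0.278 0.096
```

A mistyped bound (for example, 1.01 entered instead of 1.10) changes the standard error and therefore the weight the study receives, which is why each number should be double-checked.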
Common Challenges (and How to Handle Them)
Situation | What to Do |
---|---|
Study reports RR or HR | Note the effect type and still write down the values. Cara and I will handle the conversion. |
No CI given | Check if standard error or p-value is reported. If not, flag it. |
Effect size only in a figure | Write down which figure it is (e.g., Figure 1A), and Cara and I will extract it. |
Study reports multiple models | Use the fully adjusted model, not crude estimates. |
Multiple exposure windows | Enter one row per window (e.g., one for full pregnancy, one for T3). |
Unusual exposure metrics | Write exactly what was reported (e.g., “per 20 μg/m³ increase”) in the Per Details column. |
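For the "No CI given" case, the sketch below shows why a standard error or p-value is still worth recording: either one can be used to approximate a 95% CI. This is a hedged illustration, not our conversion procedure. It assumes the reported standard error is for the natural log of the estimate (papers do not always say so) and that the p-value is two-sided and based on a normal approximation. The function names are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def ci_from_se(estimate, se_log):
    """Approximate 95% CI for a ratio measure, given the SE of its natural log."""
    log_est = math.log(estimate)
    return math.exp(log_est - Z95 * se_log), math.exp(log_est + Z95 * se_log)

def se_from_p(estimate, p_value):
    """Approximate SE of the log estimate from a two-sided p-value."""
    if not 0 < p_value < 1:
        raise ValueError("p-value must be strictly between 0 and 1")
    if estimate <= 0 or estimate == 1:
        raise ValueError("estimate must be positive and different from 1")
    z = NormalDist().inv_cdf(1 - p_value / 2)
    return abs(math.log(estimate)) / z

# Illustrative inputs only, not taken from a real study.
lower, upper = ci_from_se(1.32, 0.0956)
se = se_from_p(1.32, 0.05)
```

A p-value reported only as a threshold (such as "p < 0.05") gives a bound rather than an exact standard error, so those rows should still be flagged.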
- Be precise. Double-check numbers.
- Be consistent with how you copy and enter text (e.g., always use “PTB” for preterm birth, not “preterm” or “Pre Term Birth”).
- If you’re unsure, leave a comment in the Notes column or tag your question.
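Consistent labels matter because analysis code groups rows by exact text. The sketch below shows one way inconsistent outcome labels could be caught during cleaning. The mapping `CANONICAL_OUTCOMES` is a hypothetical example, not our actual code list.

```python
# Hypothetical mapping from spelling variants to the label used in the sheet.
CANONICAL_OUTCOMES = {
    "ptb": "PTB",
    "preterm": "PTB",
    "preterm birth": "PTB",
    "pre term birth": "PTB",
}

def normalize_outcome(label):
    """Return the canonical label, or None if the text is not recognized."""
    # Lowercase, treat hyphens as spaces, and collapse repeated whitespace.
    key = " ".join(label.replace("-", " ").lower().split())
    return CANONICAL_OUTCOMES.get(key)

for raw in ["PTB", "Pre Term Birth", "pre-term  birth", "stillbrith"]:
    print(repr(raw), "->", normalize_outcome(raw))
```

Labels that come back as `None` (such as the misspelled last entry) would need a manual check, which is extra work that consistent entry avoids.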