Open Health Data Hub



Data Validation Report

Query results on this site are validated against published statistics from the CDC, NCHS, and CMS. This report shows the results of the automated test suite that performs those checks.

Checks passed: 56/56 (28 test cases × 2 validation layers)
Test cases: 28
Datasets: 4
Last run: Mar 11, 2026

Methodology

Each test case compares a value computed from our data against a published value from an official CDC, NCHS, or CMS source. We run two independent layers of validation for every test:

Layer 1: Gold SQL

A hand-written SQL query is executed directly against the DuckDB database on Railway. This tests whether the data itself reproduces published statistics, independent of the AI layer. If Layer 1 fails, the data or our understanding of the codebook is wrong.

Layer 2: NL Query

A natural language question is sent through the full production pipeline: the question goes to our API, Claude generates SQL, Railway executes it, and the result is checked. This tests the end-to-end system that users interact with. If Layer 2 fails but Layer 1 passes, the AI is misinterpreting the question or generating incorrect SQL.
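
The two-layer check can be sketched as a small harness. This is a minimal illustration, not our production test suite: `ValidationCase`, `run_case`, and the two runner callables are hypothetical names. In practice the gold runner would execute the hand-written SQL against DuckDB and the NL runner would call the production API; here both are stubbed. Each layer's value is compared with the published value against the test's tolerance, and the pair of outcomes is mapped to the diagnosis described above.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ValidationCase:
    """One hypothetical test case: a statistic with its published value."""
    statistic: str
    published: float
    tolerance: float  # allowed absolute deviation, in the statistic's own units


@dataclass(frozen=True)
class LayerResult:
    value: float
    deviation: float
    passed: bool


Runner = Callable[[ValidationCase], float]


def check_layer(case: ValidationCase, value: float) -> LayerResult:
    # Round to suppress floating-point noise in differences of decimal values.
    deviation = round(value - case.published, 6)
    return LayerResult(value, deviation, abs(deviation) <= case.tolerance)


def diagnose(layer1: LayerResult, layer2: LayerResult) -> str:
    # A Layer 1 failure points at the data, regardless of Layer 2.
    if not layer1.passed:
        return "data issue: gold SQL does not reproduce the published value"
    # Layer 1 passed, so a Layer 2 failure points at the NL-to-SQL step.
    if not layer2.passed:
        return "AI issue: NL-generated SQL does not reproduce the published value"
    return "pass"


def run_case(
    case: ValidationCase, gold_runner: Runner, nl_runner: Runner
) -> Tuple[LayerResult, LayerResult, str]:
    layer1 = check_layer(case, gold_runner(case))  # Layer 1: hand-written SQL
    layer2 = check_layer(case, nl_runner(case))    # Layer 2: full NL pipeline
    return layer1, layer2, diagnose(layer1, layer2)


if __name__ == "__main__":
    # Stub runners stand in for DuckDB and the production API.
    # The 2.0-point tolerance is an assumed value for illustration.
    case = ValidationCase("Adult obesity (national), 2023", published=34.3, tolerance=2.0)
    print(run_case(case, lambda c: 32.8, lambda c: 32.8)[2])  # -> pass
```

With the 2023 national obesity figures (published 34.3%, both layers 32.8%) and the assumed 2-point tolerance, both layers pass with a deviation of -1.5.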

BRFSS Results

11 tests

Behavioral Risk Factor Surveillance System — self-reported survey data, 400K+ respondents/year. Values are weighted prevalence percentages using CDC's _LLCPWT survey weights.
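
A weighted prevalence is the sum of weights for respondents who have the condition divided by the sum of weights for respondents with a valid answer. The sketch below assumes the outcome has already been recoded to True/False/None from the relevant BRFSS variable (that recoding is codebook-specific and not shown) and that `weights` holds the _LLCPWT values; the function name and the toy numbers are made up. It computes a point estimate only; standard errors would also need the survey's stratum and PSU design variables.

```python
from typing import Optional, Sequence


def weighted_prevalence(outcome: Sequence[Optional[bool]], weights: Sequence[float]) -> float:
    """Weighted percentage of respondents whose outcome is True.

    Respondents with a missing outcome (None), e.g. "don't know" or refused,
    are dropped from both the numerator and the denominator.
    """
    if len(outcome) != len(weights):
        raise ValueError("outcome and weights must have the same length")
    numerator = 0.0
    denominator = 0.0
    for has_condition, weight in zip(outcome, weights):
        if has_condition is None:
            continue
        denominator += weight
        if has_condition:
            numerator += weight
    if denominator == 0:
        raise ValueError("no respondents with a non-missing outcome")
    return 100.0 * numerator / denominator


# Toy data: four respondents with made-up final weights.
flags = [True, False, None, True]
weights = [1500.0, 500.0, 800.0, 1000.0]
print(round(weighted_prevalence(flags, weights), 1))  # -> 83.3
```

An unweighted count of the same three valid respondents would give 2 of 3 (66.7%), which is why the survey weights matter for matching published figures.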

Statistic | Year | Published | Gold SQL | Dev (pp) | NL Query | Dev (pp) | Source
Adult obesity (national) | 2017 | 30.1% | 30.1% | 0.0 | 30.1% | 0.0 | CDC Obesity Maps
Adult obesity (national) | 2018 | 30.9% | 30.9% | 0.0 | 30.9% | 0.0 | CDC Obesity Maps
Adult obesity (West Virginia) | 2018 | 39.5% | 39.5% | 0.0 | 39.5% | 0.0 | CDC State Data
Adult obesity (Colorado) | 2018 | 22.9% | 22.9% | 0.0 | 22.9% | 0.0 | CDC State Data
Current smoking | 2018 | 15.5% | 15.5% | 0.0 | 15.5% | 0.0 | CDC Tobacco Data
Adult obesity (national) | 2020 | 31.9% | 31.9% | 0.0 | 31.9% | 0.0 | CDC BRFSS Overweight and Obesity Dataset
Diagnosed diabetes | 2018 | 10.9% | 11.4% | +0.5 | 11.8% | +0.9 | CDC Chronic Disease Indicators: Diabetes
Current asthma | 2018 | 9.2% | 9.2% | 0.0 | 9.2% | 0.0 | CDC Asthma
Physical inactivity | 2018 | 24.5% | 24.5% | 0.0 | 24.5% | 0.0 | CDC PCD
Adult obesity (national) | 2023 | 34.3% | 32.8% | -1.5 | 32.8% | -1.5 | CDC Newsroom
Lifetime depression diagnosis (national) | 2020 | 18.5% | 18.8% | +0.3 | 18.8% | +0.3 | CDC MMWR 72(24), June 2023

NHANES Results

8 tests

National Health and Nutrition Examination Survey (2021–2023 cycle) — clinical exams + lab measurements. Values are weighted prevalence percentages using WTMEC2YR exam weights.
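
Because NHANES measures height and weight at the exam, obesity is defined from measured BMI (kg/m²) against the thresholds in the table (30 for obesity, 40 for severe obesity). The sketch assumes BMI is computed from weight in kilograms and height in centimeters and that each participant carries a WTMEC2YR exam weight; the records, weights, and function names are invented, and any age cutoffs or other exclusions used in the NCHS briefs are not modeled.

```python
from typing import Sequence


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2 from measured weight and standing height."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m * height_m)


def weighted_share_at_or_above(
    values: Sequence[float], weights: Sequence[float], threshold: float
) -> float:
    """Weighted percentage of examined participants with value >= threshold."""
    total = sum(weights)
    above = sum(w for v, w in zip(values, weights) if v >= threshold)
    return 100.0 * above / total


# Toy exam records: (weight kg, height cm) with made-up exam weights.
exams = [(70.0, 175.0), (95.0, 170.0), (130.0, 175.0), (88.0, 160.0)]
exam_weights = [30000.0, 25000.0, 10000.0, 35000.0]
bmis = [bmi(w, h) for w, h in exams]
print(weighted_share_at_or_above(bmis, exam_weights, 30.0))  # obesity -> 70.0
print(weighted_share_at_or_above(bmis, exam_weights, 40.0))  # severe obesity -> 10.0
```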

Statistic | Year | Published | Gold SQL | Dev (pp) | NL Query | Dev (pp) | Source
Obesity overall (BMI≥30) | 2021–23 | 40.3% | 40.3% | 0.0 | 39.8% | -0.5 | NCHS Brief #508
Obesity, men (BMI≥30) | 2021–23 | 39.2% | 39.2% | 0.0 | 38.7% | -0.5 | NCHS Brief #508
Obesity, women (BMI≥30) | 2021–23 | 41.3% | 41.3% | 0.0 | 40.8% | -0.5 | NCHS Brief #508
Total diabetes (incl. undiagnosed) | 2021–23 | 15.8% | 13.8% | -2.0 | 13.8% | -2.0 | NCHS Brief #516
High cholesterol (≥240 mg/dL) | 2021–23 | 11.3% | 11.4% | +0.1 | 11.1% | -0.2 | NCHS Brief #515
Hypertension (measured + Dx) | 2021–23 | 47.7% | 50.0% | +2.3 | 50.0% | +2.3 | NCHS Brief #511
Severe obesity (BMI≥40) | 2021–23 | 9.4% | 9.4% | 0.0 | 9.3% | -0.1 | NCHS Brief #508
Depression (PHQ-9≥10) | 2021–23 | 13.1% | 12.6% | -0.5 | 12.6% | -0.5 | NCHS Brief #527

Medicare Inpatient (Part A) Results

4 tests

Medicare Inpatient Prospective Payment System (IPPS): hospital discharges by DRG, ~2M rows across 11 years (2013–2023). Values are counts from the CMS Provider Summary PUF, which only includes hospital-DRG combinations with 11 or more discharges.

Statistic | Year | Published | Gold SQL | Dev (%) | NL Query | Dev (%) | Source
IPPS hospitals | 2023 | 3,100 | 2,941 | -5.1 | 2,941 | -5.1 | CMS IPPS PUF
Distinct DRG codes | 2023 | 600 | 534 | -11.0 | 534 | -11.0 | CMS FY 2023 IPPS Rule
Top DRG: Septicemia (871) | 2023 | 561,177 | 561,177 | 0.0 | 561,177 | 0.0 | CMS IPPS PUF
#2 DRG: Heart Failure (291) | 2023 | 319,367 | 319,367 | 0.0 | 319,367 | 0.0 | CMS IPPS PUF
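
The deviation values are reported in two different units. In the BRFSS and NHANES tables, the deviation is a difference in percentage points (32.8% minus 34.3% gives -1.5). For the CMS counts, the values match a relative difference in percent: (2,941 - 3,100) / 3,100 ≈ -5.1% and (534 - 600) / 600 = -11.0%. The helper names below are hypothetical; the sketch only reproduces that arithmetic.

```python
def point_deviation(published_pct: float, observed_pct: float) -> float:
    """Difference in percentage points, as used for survey prevalence rows."""
    return round(observed_pct - published_pct, 1)


def percent_deviation(published: float, observed: float) -> float:
    """Relative difference in percent, which matches the deviations for CMS counts."""
    return round(100.0 * (observed - published) / published, 1)


print(point_deviation(34.3, 32.8))      # -> -1.5  (BRFSS obesity, 2023)
print(percent_deviation(3_100, 2_941))  # -> -5.1  (IPPS hospitals)
print(percent_deviation(600, 534))      # -> -11.0 (distinct DRG codes)
```

Percentage points suit prevalence estimates, where the published value is already a percentage; a relative difference suits counts, whose scale varies from hundreds to over a billion.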

Medicare Part D Results

5 tests

Medicare Part D Prescribers by Provider and Drug — 276M rows across 11 years (2013–2023). Published values are aggregate totals from the CMS Public Use File. Prescriber-drug combinations with fewer than 11 claims are suppressed by CMS before release.

Statistic | Year | Published | Gold SQL | Dev (%) | NL Query | Dev (%) | Source
Unique prescribers | 2023 | 1,104,162 | 1,104,162 | 0.0 | 1,104,162 | 0.0 | CMS Part D PUF
Total claims | 2023 | 1,393,568,104 | 1,393,568,104 | 0.0 | 1,393,568,104 | 0.0 | CMS Part D PUF
Total drug cost | 2023 | $212.7B | $212.7B | 0.0 | $212.7B | 0.0 | CMS Part D PUF
Unique prescribers | 2019 | 985,533 | 985,533 | 0.0 | 985,533 | 0.0 | CMS Part D PUF
Total drug cost | 2019 | $137.0B | $137.0B | 0.0 | $137.0B | 0.0 | CMS Part D PUF
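
Because the published Part D values are aggregate totals from the same PUF, each of these checks reduces to a distinct count or a sum over prescriber-drug rows. A minimal Python equivalent follows; the field names are hypothetical (the real CMS file uses its own column names) and the rows are toy data.

```python
from typing import Iterable, NamedTuple


class PartDRow(NamedTuple):
    """One prescriber-drug row; field names are hypothetical."""
    prescriber_id: str
    drug_name: str
    total_claims: int
    total_drug_cost: float


def summarize(rows: Iterable[PartDRow]) -> dict:
    """Distinct prescribers plus summed claims and cost across all rows."""
    prescribers = set()
    claims = 0
    cost = 0.0
    for row in rows:
        prescribers.add(row.prescriber_id)
        claims += row.total_claims
        cost += row.total_drug_cost
    return {"unique_prescribers": len(prescribers), "total_claims": claims, "total_drug_cost": cost}


def format_billions(dollars: float) -> str:
    return f"${dollars / 1e9:.1f}B"


rows = [
    PartDRow("P1", "drug_a", 120, 5_000.00),
    PartDRow("P1", "drug_b", 40, 1_200.00),
    PartDRow("P2", "drug_a", 15, 600.00),
]
print(summarize(rows))  # -> {'unique_prescribers': 2, 'total_claims': 175, 'total_drug_cost': 6800.0}
print(format_billions(212.7e9))  # -> $212.7B
```

Since suppressed prescriber-drug combinations never appear in the file, a prescriber counts as unique only if at least one of their rows survived suppression; that is the same population the published PUF totals describe.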

Notes

Tolerance thresholds

Each test has a predefined tolerance (typically 1–2 percentage points for BRFSS and 1.5–5 percentage points for NHANES). These tolerances account for differences in survey weight versions, age cutoffs, and rounding. A deviation within tolerance counts as a pass.

BRFSS vs NHANES obesity gap

BRFSS reports roughly 30–34% adult obesity nationally (2017–2023 in the BRFSS table); NHANES reports about 40%. This is not an error. BRFSS relies on self-reported height and weight, and respondents tend to underreport their weight, while NHANES uses clinical measurements. The gap is well documented in the epidemiological literature.

CMS Public Use File suppression

CMS suppresses all provider-level rows in the Medicare PUFs that have fewer than 11 claims, beneficiaries, or discharges. As a result, aggregate totals computed from a PUF are systematically lower than universe totals. For Medicare Inpatient, hospital and DRG counts are ~5–15% below CMS-reported totals. For Part D, the published values are themselves computed from the PUF, so Gold SQL matches them exactly.
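
The effect of suppression on distinct counts can be seen with a toy example: a hospital whose every DRG row falls below 11 discharges disappears from the PUF entirely, and so does a DRG that no hospital bills 11 or more times. The rows below are invented and the names are hypothetical; the sketch only illustrates why hospital and DRG counts from the PUF come out lower than CMS-reported totals.

```python
from typing import List, NamedTuple, Tuple


class InpatientRow(NamedTuple):
    """One hospital-DRG row; field names are hypothetical."""
    hospital_id: str
    drg_code: str
    discharges: int


SUPPRESSION_THRESHOLD = 11  # rows with fewer than 11 discharges are withheld


def apply_suppression(
    rows: List[InpatientRow], threshold: int = SUPPRESSION_THRESHOLD
) -> List[InpatientRow]:
    return [r for r in rows if r.discharges >= threshold]


def distinct_counts(rows: List[InpatientRow]) -> Tuple[int, int, int]:
    """(distinct hospitals, distinct DRGs, total discharges)."""
    hospitals = {r.hospital_id for r in rows}
    drgs = {r.drg_code for r in rows}
    return len(hospitals), len(drgs), sum(r.discharges for r in rows)


# Invented "universe" of hospital-DRG rows before suppression.
universe = [
    InpatientRow("H1", "871", 250),
    InpatientRow("H1", "291", 120),
    InpatientRow("H2", "871", 14),
    InpatientRow("H2", "291", 6),
    InpatientRow("H3", "291", 8),
    InpatientRow("H3", "470", 5),
]
print(distinct_counts(universe))                     # -> (3, 3, 403)
print(distinct_counts(apply_suppression(universe)))  # -> (2, 2, 384)
```

Hospital H3 and DRG 470 vanish because none of their rows reach the threshold, so both distinct counts drop even though most discharges survive.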

What each layer catches

Layer 1 failures indicate data issues: wrong codebook interpretation, missing survey weights, or incorrect variable coding. Layer 2 failures (with Layer 1 passing) indicate AI issues: the NL-to-SQL model is generating incorrect queries. When both layers pass, the data reproduces the published statistic and the AI can reproduce the same result from a plain-English question.