Faculty Collaboration Database

Medical College of Wisconsin


Challenges in risk estimation using routinely collected clinical data: The example of estimating cervical cancer risks from electronic health-records. Prev Med 2018 Jun;111:429-435

Date

12/10/2017

Scopus ID

2-s2.0-85037606766 (13 citations)

Abstract

Electronic health-records (EHR) are increasingly used by epidemiologists studying disease following surveillance testing to provide evidence for screening intervals and referral guidelines. Although cost-effective, undiagnosed prevalent disease and interval censoring (in which asymptomatic disease is only observed at the time of testing) raise substantial analytic issues when estimating risk that cannot be addressed using Kaplan-Meier methods. Based on our experience analysing EHR from cervical cancer screening, we previously proposed the logistic-Weibull model to address these issues. Here we demonstrate how the choice of statistical method can impact risk estimates. We use observed data on 41,067 women in the cervical cancer screening program at Kaiser Permanente Northern California, 2003-2013, as well as simulations to evaluate the ability of different methods (Kaplan-Meier, Turnbull, Weibull and logistic-Weibull) to accurately estimate risk within a screening program. Cumulative risk estimates from the statistical methods varied considerably, with the largest differences occurring for prevalent disease risk when baseline disease ascertainment was random but incomplete. Kaplan-Meier underestimated risk at earlier times and overestimated risk at later times in the presence of interval censoring or undiagnosed prevalent disease. Turnbull performed well, though was inefficient and not smooth. The logistic-Weibull model performed well, except when event times didn't follow a Weibull distribution. We have demonstrated that methods for right-censored data, such as Kaplan-Meier, result in biased estimates of disease risks when applied to interval-censored data, such as screening programs using EHR data. The logistic-Weibull model is attractive, but the model fit must be checked against Turnbull non-parametric risk estimates.
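To illustrate why undiagnosed prevalent disease and interval censoring matter, the sketch below computes cumulative risk under a prevalence-incidence mixture: a logistic term gives the probability that disease is already present at baseline, and a Weibull distribution governs time to incident disease for everyone else. The abstract does not state the parameterisation, so the mixture form, the function names, and the parameters (`prevalence_logit`, `shape`, `scale`) are assumptions made for illustration, not the authors' implementation.

```python
import math


def cumulative_risk(t, prevalence_logit, shape, scale):
    """Cumulative risk by time t under an assumed prevalence-incidence mixture.

    prevalence_logit: log-odds of disease already present at baseline (hypothetical name)
    shape, scale:     Weibull parameters for time to incident disease (hypothetical names)
    """
    prevalence = 1.0 / (1.0 + math.exp(-prevalence_logit))
    # Weibull CDF: probability that incident disease has occurred by time t.
    incident = 1.0 - math.exp(-((t / scale) ** shape))
    return prevalence + (1.0 - prevalence) * incident


def interval_probability(left, right, prevalence_logit, shape, scale):
    """Probability that disease is first detectable in the window (left, right].

    left=None     : baseline status unknown, so prevalent disease is included.
    left=0.0      : negative test at baseline, so only incident disease counts.
    right=math.inf: no disease found by the last test (right-censored).
    """
    upper = cumulative_risk(right, prevalence_logit, shape, scale)
    if left is None:
        return upper
    return upper - cumulative_risk(left, prevalence_logit, shape, scale)


if __name__ == "__main__":
    # Illustrative values only; they are not estimates from the study.
    logit, shape, scale = -3.0, 1.2, 40.0
    for years in (0.0, 1.0, 3.0, 5.0):
        print(years, round(cumulative_risk(years, logit, shape, scale), 4))
```

Under these assumptions, the risk at time zero equals the baseline prevalence rather than zero. A woman whose baseline status was never ascertained and whose disease is found at a later test contributes `interval_probability(None, right, ...)` to the likelihood, which accounts for the event having occurred at some unknown point before that test.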

Author List

Landy R, Cheung LC, Schiffman M, Gage JC, Hyun N, Wentzensen N, Kinney WK, Castle PE, Fetterman B, Poitras NE, Lorey T, Sasieni PD, Katki HA

MeSH terms used to index this publication

Adult
California
Early Detection of Cancer
Electronic Health Records
Female
Humans
Mass Screening
Middle Aged
Models, Statistical
Prevalence
Risk Assessment
Uterine Cervical Neoplasms



This site is a collaborative effort of the Medical College of Wisconsin and the Clinical and Translational Science Institute (CTSI), part of the Clinical and Translational Science Award program funded by the National Center for Advancing Translational Sciences (Grant Number 2UL1TR001436) at the National Institutes of Health (NIH).
