Medical College of Wisconsin

Mixture models for undiagnosed prevalent disease and interval-censored incident disease: applications to a cohort assembled from electronic health records. Stat Med 2017 Sep 30;36(22):3583-3595 PMID: 28660629 PMCID: PMC5583012

For cost-effectiveness and efficiency, many large-scale general-purpose cohort studies are being assembled within large health-care providers who use electronic health records. Two key features of such data are that incident disease is interval-censored between irregular visits and there can be pre-existing (prevalent) disease. Because prevalent disease is not always immediately diagnosed, some disease diagnosed at later visits is actually undiagnosed prevalent disease. We consider prevalent disease as a point mass at time zero for clinical applications where there is no interest in time of prevalent disease onset. We demonstrate that the naive Kaplan-Meier cumulative risk estimator underestimates risks at early time points and overestimates later risks. We propose a general family of mixture models for undiagnosed prevalent disease and interval-censored incident disease that we call prevalence-incidence models. Parameters for parametric prevalence-incidence models, such as the logistic regression and Weibull survival (logistic-Weibull) model, are estimated by direct likelihood maximization or by the EM algorithm. Non-parametric methods are proposed to calculate cumulative risks for cases without covariates. We compare naive Kaplan-Meier, logistic-Weibull, and non-parametric estimates of cumulative risk in the cervical cancer screening program at Kaiser Permanente Northern California. Kaplan-Meier provided poor estimates, while the logistic-Weibull model closely fit the non-parametric estimates. Our findings support our use of logistic-Weibull models to develop the risk estimates that underlie current US risk-based cervical cancer screening guidelines. Published 2017. This article has been contributed to by US Government employees and their work is in the public domain in the USA.
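
To make the mixture structure described in the abstract concrete, the following is a minimal Python sketch of a logistic-Weibull cumulative risk: a point mass at time zero for prevalent disease, plus Weibull-distributed incident disease among those who are disease-free at time zero. The function names, the single-covariate logistic form, the Weibull shape/scale parameterization, and the convention of coding "status at time zero not ascertained" as `None` are all assumptions made for illustration. This is not the authors' code and may differ from the likelihood used in the paper.

```python
import math

def prevalence_probability(intercept, slope, x):
    """Logistic model for the probability of prevalent disease at time zero.

    The single covariate x is an illustrative assumption."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

def weibull_cdf(t, shape, scale):
    """Weibull distribution function for time to incident disease."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))

def cumulative_risk(t, p, shape, scale):
    """Mixture cumulative risk: point mass p at time zero plus incident
    disease among the (1 - p) fraction who are disease-free at time zero."""
    return p + (1.0 - p) * weibull_cdf(t, shape, scale)

def _risk_at(t, p, shape, scale):
    # None means "before time zero" (no risk accumulated yet);
    # infinity means "never observed with disease" (right-censored).
    if t is None:
        return 0.0
    if math.isinf(t):
        return 1.0
    return cumulative_risk(t, p, shape, scale)

def log_likelihood(intervals, p, shape, scale):
    """Sum of log probabilities that disease falls in each (left, right] interval.

    Assumed coding, for illustration only:
      (None, r)       status at time zero unknown, disease found by time r,
                      so the interval includes the point mass at zero
      (l, r)          disease-free at time l, disease found by time r
      (l, math.inf)   disease-free at last visit l (right-censored)
    """
    total = 0.0
    for left, right in intervals:
        mass = _risk_at(right, p, shape, scale) - _risk_at(left, p, shape, scale)
        total += math.log(mass)
    return total
```

In this sketch, an interval whose left end is `None` absorbs the point mass at zero, which is how disease first diagnosed at a later visit can still be attributed in part to undiagnosed prevalent disease. Maximizing `log_likelihood` over the parameters (for example with a general-purpose optimizer) would correspond to the direct likelihood maximization mentioned in the abstract.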

Author List

Cheung LC, Pan Q, Hyun N, Schiffman M, Fetterman B, Castle PE, Lorey T, Katki HA


Noorie Hyun PhD Assistant Professor in the Institute for Health and Equity department at Medical College of Wisconsin


Scopus ID: 2-s2.0-85021437043 (6 citations)

MeSH terms used to index this publication

Cohort Studies
Cost-Benefit Analysis
Electronic Health Records
Logistic Models
Middle Aged
Papanicolaou Test
Statistics, Nonparametric
Survival Analysis
Uterine Cervical Neoplasms