Medical College of Wisconsin

Semi-Automatically Inducing Semantic Classes of Clinical Research Eligibility Criteria Using UMLS and Hierarchical Clustering. AMIA Annu Symp Proc 2010 Nov 13;2010:487-91

Date

02/25/2011

Pubmed ID

21347026

Pubmed Central ID

PMC3041461

Scopus ID

2-s2.0-84964928019 (requires institutional sign-in at Scopus site); 21 citations

Abstract

This paper presents a novel approach to learning semantic classes of clinical research eligibility criteria. It uses the UMLS Semantic Types to represent semantic features and the Hierarchical Clustering method to group similar eligibility criteria. By establishing a gold standard using two independent raters, we evaluated the coverage and accuracy of the induced semantic classes. On 2,718 random eligibility criteria sentences, the inter-rater classification agreement was 85.73%. In a 10-fold validation test, the average Precision, Recall, and F-score of the classification results of a decision-tree classifier were 87.8%, 88.0%, and 87.7%, respectively. Our induced classes aligned well with 16 out of 17 eligibility criteria classes defined by the BRIDGE model. We discuss the potential of this method and our future work.
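
The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the example criteria, their semantic type assignments, the Jaccard distance, the average-linkage rule, and the 0.6 cut-off are all assumptions chosen for illustration. Each criterion is represented by a set of UMLS Semantic Types, and the closest clusters are merged until no pair is closer than the threshold.

```python
from itertools import combinations

# Toy eligibility criteria with hypothetical UMLS Semantic Type assignments
# (illustrative only; not taken from the paper's data).
CRITERIA = {
    "History of myocardial infarction": {"Disease or Syndrome"},
    "Diagnosis of type 2 diabetes": {"Disease or Syndrome", "Diagnostic Procedure"},
    "Currently taking warfarin": {"Pharmacologic Substance"},
    "Allergy to penicillin": {"Pharmacologic Substance", "Pathologic Function"},
}


def jaccard_distance(a, b):
    """Return 1 minus the Jaccard similarity of two semantic type sets."""
    return 1.0 - len(a & b) / len(a | b)


def average_linkage(c1, c2, features):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(jaccard_distance(features[x], features[y]) for x in c1 for y in c2)
    return total / (len(c1) * len(c2))


def cluster(features, threshold):
    """Agglomerative clustering: merge the closest pair until none is within threshold."""
    clusters = [[name] for name in features]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average-linkage distance.
        best = min(
            (average_linkage(clusters[i], clusters[j], features), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        distance, i, j = best
        if distance > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    for group in cluster(CRITERIA, threshold=0.6):
        print(group)
```

With these toy inputs, the two disease-related criteria fall into one cluster and the two medication-related criteria into another. In the semi-automatic setting the abstract describes, clusters like these would then be reviewed before being treated as semantic classes.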

Author List

Luo Z, Johnson SB, Weng C

Author

Jake Luo Ph.D. Associate Professor; Director, Center for Biomedical Data and Language Processing (BioDLP) in the Health Informatics & Administration department at University of Wisconsin - Milwaukee




MeSH terms used to index this publication

Biomedical Research
Cluster Analysis
Humans
Models, Theoretical
Natural Language Processing
Semantics
Unified Medical Language System