Medical College of Wisconsin

Semi-Automatically Inducing Semantic Classes of Clinical Research Eligibility Criteria Using UMLS and Hierarchical Clustering. AMIA Annu Symp Proc 2010 Nov 13;2010:487-91



PubMed ID


PubMed Central ID


Scopus ID

2-s2.0-84964928019 (20 citations)


This paper presents a novel approach to learning semantic classes of clinical research eligibility criteria. It uses UMLS Semantic Types to represent semantic features and hierarchical clustering to group similar eligibility criteria. Using a gold standard established by two independent raters, we evaluated the coverage and accuracy of the induced semantic classes. On 2,718 randomly selected eligibility criteria sentences, the inter-rater classification agreement was 85.73%. In 10-fold cross-validation, a decision-tree classifier achieved an average precision, recall, and F-score of 87.8%, 88.0%, and 87.7%, respectively. Our induced classes aligned well with 16 of the 17 eligibility criteria classes defined by the BRIDGE model. We discuss the potential of this method and our future work.
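
The abstract does not specify the feature weighting, similarity measure, or linkage criterion, so the following is only a rough illustration of the general idea, not the authors' method. It assumes each criterion has already been mapped to a bag of UMLS semantic type names (the mappings shown are hand-written and hypothetical, not output from a UMLS tool). Criteria are then compared by cosine similarity and merged with average-linkage agglomerative clustering until no pair of clusters reaches a similarity threshold. The threshold (0.5), the linkage choice, and all function names are assumptions.

```python
from collections import Counter
from math import sqrt

# Hypothetical, hand-written input: each criterion is paired with UMLS semantic
# type names it might map to. A real pipeline would obtain these from a UMLS
# concept-recognition step; these assignments are illustrative assumptions.
CRITERIA = [
    ("Age 18 years or older", ["Organism Attribute", "Quantitative Concept"]),
    ("Age between 40 and 75 years", ["Organism Attribute", "Quantitative Concept"]),
    ("History of type 2 diabetes", ["Disease or Syndrome"]),
    ("Diagnosed with hypertension", ["Disease or Syndrome"]),
    ("Currently taking warfarin", ["Pharmacologic Substance", "Temporal Concept"]),
    ("Insulin use within 3 months",
     ["Pharmacologic Substance", "Temporal Concept", "Quantitative Concept"]),
]

# Bag-of-semantic-types feature vector for each criterion.
vectors = [Counter(types) for _, types in CRITERIA]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def average_linkage(c1: list[int], c2: list[int], vecs: list[Counter]) -> float:
    """Mean pairwise similarity between the members of two clusters."""
    total = sum(cosine(vecs[i], vecs[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster(vecs: list[Counter], threshold: float) -> list[list[int]]:
    """Agglomerative clustering: repeatedly merge the most similar pair of
    clusters until no pair's average-linkage similarity reaches `threshold`."""
    clusters = [[i] for i in range(len(vecs))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = average_linkage(clusters[x], clusters[y], vecs)
                if sim > best:
                    best, pair = sim, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


if __name__ == "__main__":
    for group in cluster(vectors, threshold=0.5):
        print([CRITERIA[i][0] for i in group])
```

With these toy inputs and a threshold of 0.5, the six criteria form three groups (age, diagnosis, medication). Average linkage is just one common choice here; single or complete linkage could be swapped in by replacing `average_linkage` without changing the merge loop.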

Author List

Luo Z, Johnson SB, Weng C


Jake Luo, Ph.D., Associate Professor; Director, Center for Biomedical Data and Language Processing (BioDLP), Health Informatics & Administration department, University of Wisconsin-Milwaukee

MeSH terms used to index this publication

Biomedical Research
Cluster Analysis
Models, Theoretical
Natural Language Processing
Unified Medical Language System