Medical College of Wisconsin

EliXR: an approach to eligibility criteria extraction and representation. J Am Med Inform Assoc 2011 Dec;18 Suppl 1:i116-24

Date

08/03/2011

Pubmed ID

21807647

Pubmed Central ID

PMC3241167

DOI

10.1136/amiajnl-2011-000321

Scopus ID

2-s2.0-84863557848 (76 citations)

Abstract

OBJECTIVE: To develop a semantic representation for clinical research eligibility criteria to automate semistructured information extraction from eligibility criteria text.

MATERIALS AND METHODS: An analysis pipeline called eligibility criteria extraction and representation (EliXR) was developed that integrates syntactic parsing and tree pattern mining to discover common semantic patterns in 1000 eligibility criteria randomly selected from http://ClinicalTrials.gov. The semantic patterns were aggregated and enriched with Unified Medical Language System (UMLS) semantic knowledge to form a semantic representation for clinical research eligibility criteria.

RESULTS: The authors arrived at 175 semantic patterns, which form 12 semantic role labels connected by their frequent semantic relations in a semantic network.

EVALUATION: Three raters independently annotated all the sentence segments (N=396) for 79 test eligibility criteria using the 12 top-level semantic role labels. Eighty-six per cent (339) of the sentence segments were unanimously labelled correctly and 13.8% (55) were correctly labelled by two raters. The Fleiss' κ was 0.88, indicating a nearly perfect interrater agreement.
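
For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Fleiss' κ can be computed for a fixed number of raters per item. It is not the authors' code. The function name `fleiss_kappa`, the labels `label_a` and `label_b`, and the four-item toy data set are all hypothetical and are not taken from the study, so the result does not reproduce the reported 0.88.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of one label per rater.

    Every item must be rated by the same number of raters (at least two).
    Raises ZeroDivisionError when all ratings fall in one category, because
    chance agreement is then 1 and kappa is undefined.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    category_totals = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Proportion of agreeing rater pairs for this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += agreeing / (n_raters * (n_raters - 1))

    observed = agreement_sum / n_items  # mean per-item agreement
    total = n_items * n_raters
    # Chance agreement from the overall share of each category.
    expected = sum((c / total) ** 2 for c in category_totals.values())
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: 4 segments, 3 raters, 2 made-up labels.
toy = [
    ["label_a", "label_a", "label_a"],
    ["label_b", "label_b", "label_b"],
    ["label_a", "label_a", "label_b"],
    ["label_b", "label_b", "label_b"],
]
print(round(fleiss_kappa(toy), 3))
```

Under these assumptions, three unanimous segments and one two-to-one split give an observed agreement of about 0.83 against a chance agreement of about 0.51. This shows why κ can sit well below the raw share of unanimous items when only a few categories are in play.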

CONCLUSION: This study presents a semi-automated data-driven approach to developing a semantic network that aligns well with the top-level information structure in clinical research eligibility criteria text and demonstrates the feasibility of using the resulting semantic role labels to generate semistructured eligibility criteria with nearly perfect interrater reliability.

Author List

Weng C, Wu X, Luo Z, Boland MR, Theodoratos D, Johnson SB

Author

Jake Luo Ph.D. Associate Professor; Director, Center for Biomedical Data and Language Processing (BioDLP) in the Health Informatics & Administration department at University of Wisconsin - Milwaukee




MeSH terms used to index this publication

Algorithms
Biomedical Research
Data Mining
Eligibility Determination
Natural Language Processing
Patient Selection
Semantics
Unified Medical Language System