Medical College of Wisconsin

Clinical concept and relation extraction using prompt-based machine reading comprehension. J Am Med Inform Assoc 2023 Aug 18;30(9):1486-1493

Date

06/15/2023

Pubmed ID

37316988

Pubmed Central ID

PMC10436141

DOI

10.1093/jamia/ocad107

Scopus ID

2-s2.0-85168247821 (2 citations)

Abstract

OBJECTIVE: To develop a natural language processing system that solves both clinical concept extraction and relation extraction in a unified prompt-based machine reading comprehension (MRC) architecture with good generalizability for cross-institution applications.

METHODS: We formulate both clinical concept extraction and relation extraction using a unified prompt-based MRC architecture and explore state-of-the-art transformer models. We compare our MRC models with existing deep learning models for concept extraction and end-to-end relation extraction using 2 benchmark datasets developed by the 2018 National NLP Clinical Challenges (n2c2) challenge (medications and adverse drug events) and the 2022 n2c2 challenge (relations of social determinants of health [SDoH]). We also evaluate the transfer learning ability of the proposed MRC models in a cross-institution setting. We perform error analyses and examine how different prompting strategies affect the performance of MRC models.
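As a rough illustration of the MRC formulation described in the methods, the sketch below pairs a clinical note with one natural-language query per concept type and decodes answer spans from per-token start/end predictions. It is a minimal sketch under stated assumptions, not the authors' implementation: the prompt wording, the names `PROMPTS`, `build_mrc_inputs`, and `decode_spans`, and the nearest-end pairing rule are all hypothetical, and the start/end flags stand in for the output of a transformer model.

```python
from typing import Dict, List, Tuple

# Hypothetical prompts: the abstract does not give the actual query wording.
PROMPTS: Dict[str, str] = {
    "Drug": "Find all medications mentioned in the text.",
    "ADE": "Find all adverse drug events mentioned in the text.",
}


def build_mrc_inputs(context: str, prompts: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return one (concept_type, query, context) triple per concept type.

    Each triple would be encoded as a query/context pair and passed to the model.
    """
    return [(ctype, query, context) for ctype, query in prompts.items()]


def decode_spans(start_flags: List[int], end_flags: List[int]) -> List[Tuple[int, int]]:
    """Pair each predicted start token with the nearest end token at or after it.

    This pairing rule is an assumption made for illustration only.
    Returned spans are inclusive token index pairs.
    """
    spans = []
    for i, is_start in enumerate(start_flags):
        if not is_start:
            continue
        for j in range(i, len(end_flags)):
            if end_flags[j]:
                spans.append((i, j))
                break
    return spans


if __name__ == "__main__":
    note = "Patient developed rash after starting penicillin"
    tokens = note.split()
    for ctype, query, context in build_mrc_inputs(note, PROMPTS):
        print(ctype, "|", query)
    # Pretend model output for the "ADE" query: token 2 ("rash") is a one-token answer.
    for start, end in decode_spans([0, 0, 1, 0, 0, 0], [0, 0, 1, 0, 0, 0]):
        print(tokens[start:end + 1])
```

Because each concept type is queried separately in this sketch, two concepts of different types can cover the same tokens without conflicting, which is one plausible reading of why an MRC formulation suits nested or overlapping concepts.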

RESULTS AND CONCLUSION: The proposed MRC models achieve state-of-the-art performance for clinical concept and relation extraction on the 2 benchmark datasets, outperforming previous non-MRC transformer models. GatorTron-MRC achieves the best strict and lenient F1-scores for concept extraction, outperforming previous deep learning models on the 2 datasets by 1%-3% and 0.7%-1.3%, respectively. For end-to-end relation extraction, GatorTron-MRC and BERT-MIMIC-MRC achieve the best F1-scores, outperforming previous deep learning models by 0.9%-2.4% and 10%-11%, respectively. For cross-institution evaluation, GatorTron-MRC outperforms traditional GatorTron by 6.4% and 16% for the 2 datasets, respectively. The proposed method is better at handling nested/overlapped concepts and extracting relations, and has good portability for cross-institution applications. Our clinical MRC package is publicly available at https://github.com/uf-hobi-informatics-lab/ClinicalTransformerMRC.
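The results distinguish strict from lenient F1-scores. The sketch below shows one common way to compute the two for span extraction, assuming that a strict match requires identical boundaries and concept type, while a lenient match requires only overlapping character offsets and the same type. The function names and the `Span` layout are hypothetical, and the official n2c2 evaluation scripts may differ in detail.

```python
from typing import List, Tuple

# (start offset, end offset exclusive, concept type); this layout is an assumption.
Span = Tuple[int, int, str]


def _match(a: Span, b: Span, strict: bool) -> bool:
    """Strict: same type and identical offsets. Lenient: same type and any overlap."""
    if a[2] != b[2]:
        return False
    if strict:
        return a[0] == b[0] and a[1] == b[1]
    return a[0] < b[1] and b[0] < a[1]


def f1_score(gold: List[Span], pred: List[Span], strict: bool = True) -> float:
    """Compute F1 from span-level precision and recall."""
    matched_pred = sum(any(_match(g, p, strict) for g in gold) for p in pred)
    matched_gold = sum(any(_match(g, p, strict) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [(0, 5, "Drug"), (10, 20, "ADE")]
    pred = [(0, 5, "Drug"), (12, 20, "ADE")]  # second span has a boundary error
    print("strict:", f1_score(gold, pred, strict=True))
    print("lenient:", f1_score(gold, pred, strict=False))
```

In this toy case the second prediction overlaps its gold span but misses the left boundary, so it counts as correct only under lenient matching.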

Author List

Peng C, Yang X, Yu Z, Bian J, Hogan WR, Wu Y

Author

William R. Hogan, MD, Director and Professor, Data Science Institute, Medical College of Wisconsin




MeSH terms used to index this publication

Comprehension
Drug-Related Side Effects and Adverse Reactions
Humans
Natural Language Processing