Faculty Collaboration Database

Medical College of Wisconsin

Classifying early infant feeding status from clinical notes using natural language processing and machine learning. Sci Rep 2024 Apr 03;14(1):7831

Date

04/04/2024

Scopus ID

2-s2.0-85189809672 (2 citations)

Abstract

The objective of this study is to develop and evaluate natural language processing (NLP) and machine learning models to predict infant feeding status from clinical notes in the Epic electronic health records system. The primary outcome was the classification of infant feeding status from clinical notes using Medical Subject Headings (MeSH) terms. Annotation of notes was completed using TeamTat to uniquely classify clinical notes according to infant feeding status. We trained 6 machine learning models to classify infant feeding status: logistic regression, random forest, XGBoost gradient descent, k-nearest neighbors, and support-vector classifier. Model comparison was evaluated based on overall accuracy, precision, recall, and F1 score. Our modeling corpus included an even number of clinical notes that was a balanced sample across each class. We manually reviewed 999 notes that represented 746 mother-infant dyads with a mean gestational age of 38.9 weeks and a mean maternal age of 26.6 years. The most frequent feeding status classification present for this study was exclusive breastfeeding [n = 183 (18.3%)], followed by exclusive formula bottle feeding [n = 146 (14.6%)], and exclusive feeding of expressed mother's milk [n = 102 (10.2%)], with mixed feeding being the least frequent [n = 23 (2.3%)]. Our final analysis evaluated the classification of clinical notes as breast, formula/bottle, and missing. The machine learning models were trained on these three classes after performing balancing and down sampling. The XGBoost model outperformed all others by achieving an accuracy of 90.1%, a macro-averaged precision of 90.3%, a macro-averaged recall of 90.1%, and a macro-averaged F1 score of 90.1%. Our results demonstrate that natural language processing can be applied to clinical notes stored in the electronic health records to classify infant feeding status. 
Early identification of breastfeeding status using NLP on unstructured electronic health records data can be used to inform precision public health interventions focused on improving lactation support for postpartum patients.
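The balancing and evaluation steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the helper names `downsample` and `macro_scores` and the toy labels are hypothetical, and only the three class names (breast, formula, missing) come from the abstract. The sketch down-samples every class to the size of the smallest one, then computes overall accuracy and macro-averaged precision, recall, and F1 (the unweighted mean of the per-class scores).

```python
import random
from collections import Counter, defaultdict


def downsample(notes, labels, seed=0):
    """Randomly down-sample every class to the size of the smallest class."""
    by_class = defaultdict(list)
    for note, label in zip(notes, labels):
        by_class[label].append(note)
    n_min = min(len(items) for items in by_class.values())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    balanced = []
    for label, items in by_class.items():
        for note in rng.sample(items, n_min):
            balanced.append((note, label))
    rng.shuffle(balanced)
    return balanced


def macro_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    precisions, recalls, f1s = [], [], []
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy labels, for illustration only (not study data).
y_true = ["breast", "breast", "formula", "formula", "missing", "missing"]
y_pred = ["breast", "formula", "formula", "formula", "missing", "breast"]
scores = macro_scores(y_true, y_pred)

# Hypothetical imbalanced corpus: 3 breast, 2 formula, 1 missing.
balanced = downsample(
    ["n1", "n2", "n3", "n4", "n5", "n6"],
    ["breast", "breast", "breast", "formula", "formula", "missing"],
)
```

Macro-averaging weights each class equally regardless of its size, so on a balanced sample such as the one the abstract describes, macro-averaged recall coincides with overall accuracy.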

Author List

Lemas DJ, Du X, Rouhizadeh M, Lewis B, Frank S, Wright L, Spirache A, Gonzalez L, Cheves R, Magalhães M, Zapata R, Reddy R, Xu K, Parker L, Harle C, Young B, Louis-Jaques A, Zhang B, Thompson L, Hogan WR, Modave F

Author

William R. Hogan, MD, Director and Professor, Data Science Institute, Medical College of Wisconsin

MeSH terms used to index this publication

Electronic Health Records
Female
Humans
Infant
Machine Learning
Mothers
Natural Language Processing
Software
