Medical College of Wisconsin

SimQ: real-time retrieval of similar consumer health questions. J Med Internet Res 2015 Feb 17;17(2):e43



Scopus ID: 2-s2.0-84925611779 (10 citations)


BACKGROUND: Web-based question-and-answer (Q&A) services that provide health care information for consumers have grown significantly in popularity. These online communities have archived large numbers of Q&As, which form a valuable knowledge base for consumers seeking answers to their health care concerns. However, because consumers may lack professional knowledge, it remains very challenging for them to find Q&As that are closely relevant to their own health problems. As a result, consumers often ask questions similar to ones that other users have already had answered.

OBJECTIVE: In this study, we aim to develop efficient informatics methods that can retrieve similar Web-based consumer health questions using syntactic and semantic analysis.

METHODS: We propose SimQ to achieve this objective. SimQ is an informatics framework that compares the similarity of archived health questions and retrieves answers to satisfy consumers' information needs. Statistical syntactic parsing was used to analyze each question's syntactic structure. The Unified Medical Language System (UMLS) was employed to annotate semantic types and extract medical concepts. Finally, the similarity between questions was calculated using both semantic and syntactic features.
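The idea of scoring question similarity from both semantic and syntactic features can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the abstract: the toy lexicon `TOY_CONCEPTS` stands in for UMLS concept annotation, word bigrams stand in for features from a statistical parser, and the Jaccard overlap, the weight `alpha`, and the function names are all hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon standing in for UMLS concept annotation.
# Different surface terms map to the same (made-up) concept identifier.
TOY_CONCEPTS: Dict[str, str] = {
    "headache": "C_HEADACHE",
    "headaches": "C_HEADACHE",
    "ibuprofen": "C_IBUPROFEN",
    "advil": "C_IBUPROFEN",
}


def tokens(question: str) -> List[str]:
    """Lowercase the question and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", question.lower())


def concepts(question: str) -> Set[str]:
    """Semantic features: concept identifiers found in the question."""
    return {TOY_CONCEPTS[t] for t in tokens(question) if t in TOY_CONCEPTS}


def bigrams(question: str) -> Set[Tuple[str, str]]:
    """Stand-in syntactic features: adjacent word pairs (not a real parse)."""
    toks = tokens(question)
    return set(zip(toks, toks[1:]))


def jaccard(a: set, b: set) -> float:
    """Overlap of two feature sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(q1: str, q2: str, alpha: float = 0.5) -> float:
    """Weighted mix of semantic and syntactic overlap (alpha is assumed)."""
    semantic = jaccard(concepts(q1), concepts(q2))
    syntactic = jaccard(bigrams(q1), bigrams(q2))
    return alpha * semantic + (1 - alpha) * syntactic


def retrieve(query: str, archive: List[str], k: int = 3) -> List[str]:
    """Return the k archived questions most similar to the query."""
    ranked = sorted(archive, key=lambda q: similarity(query, q), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    archive = [
        "How often should I floss my teeth?",
        "Is ibuprofen safe for headaches?",
    ]
    print(retrieve("Can I take Advil for a headache?", archive, k=1))
```

In this toy setup the two pain-relief questions share no word pairs, so only the concept mapping links them; that is the kind of match a purely lexical comparison would miss.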

RESULTS: We used 2000 randomly selected consumer questions to evaluate the system's performance. The results show that SimQ reached the highest precision of 72.2%, recall of 78.0%, and F-score of 75.0% when using compositional feature representations.
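As a quick consistency check on the reported figures, the sketch below recomputes the F-score from the stated precision and recall. It assumes the F-score is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall as reported in the abstract.
    f1 = f1_score(0.722, 0.780)
    print(f"F1 = {f1 * 100:.1f}%")
```

Under that assumption, the recomputed value rounds to the 75.0% given in the abstract.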

CONCLUSIONS: We demonstrated that SimQ complements the existing Q&A services of Netwellness, a not-for-profit community-based consumer health information service that consists of nearly 70,000 Q&As and serves over 3 million users each year. SimQ not only reduces response delay by instantly providing closely related questions and answers, but also helps consumers better understand their health concerns.

Author List

Luo J, Zhang GQ, Wentz S, Cui L, Xu R


Jake Luo Ph.D. Associate Professor; Director, Center for Biomedical Data and Language Processing (BioDLP) in the Health Informatics & Administration department at University of Wisconsin - Milwaukee

MeSH terms used to index this publication

Consumer Health Information
Information Storage and Retrieval
Surveys and Questionnaires