Wyse, Ransom J and Samuels, David C and Sanchez-Roige, Sandra and Schirle, Lori and Rhoten, Bethany A and Lee, Seo Yoon and Jeffery, Alvin D (2026) Natural language processing for substance use disorder information extraction: a systematic literature review. Current Addiction Reports, 13, 34. https://doi.org/10.1007/s40429-026-00733-3.
External website: https://link.springer.com/article/10.1007/s40429-0...
PURPOSE OF REVIEW: To examine the use of natural language processing (NLP) for substance use disorder (SUD) information extraction.
RECENT FINDINGS: Of 623 studies reviewed, 35 met inclusion criteria. One paper (2.9%) was alcohol-related, 12 (34.3%) were opioid-related, 6 (17.1%) were tobacco-related, and 16 (45.7%) included multiple SUDs. Across the three NLP approaches categorized for this analysis, 65.7% followed a Rule-Based approach, 37.1% followed a Machine-Learning approach, and 11.4% followed a Deep-Learning approach. NLP methods were categorized into three groups by how often they were used: "Most common use" at 43% (e.g., concept extraction), "Regular use" at 20-35% (e.g., regular expressions), and "Rare use" at < 10% (e.g., sentiment analysis). The included papers used a variety of software applications, led by Python (10 papers), followed by cTAKES (9 papers), NegEx (6 papers), R (4 papers), and others. Each included paper reported multiple evaluation metrics. F1 scores and ROC AUC were compared most often in papers covering multiple SUDs (6 papers), followed by tobacco (4 papers), opioids (3 papers), and alcohol (1 paper); each group had acceptable-to-outstanding ROC AUC scores (≥ 0.7) and good-to-excellent F1 scores (≥ 0.7).
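For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how F1 and ROC AUC can be computed for a binary classifier. It is not taken from the review or from any included paper. The function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win (the Mann-Whitney formulation of ROC AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = note documents substance use, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"F1 = {f1:.4f}, ROC AUC = {auc:.4f}")
```

On this toy data both values exceed the 0.7 cut-off quoted in the abstract. F1 depends on the chosen threshold, whereas ROC AUC depends only on how the scores rank positives against negatives.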
SUMMARY: Most papers included in this systematic review encompassed multiple SUDs and followed Rule-Based approaches, "Most common use" NLP methods (e.g., concept extraction), and familiar software applications (e.g., Python). Evaluation metrics for SUD papers utilizing NLP included common performance metrics, with ROC AUC scores indicating acceptable-to-outstanding discrimination between classes and F1 scores indicating good-to-excellent balance between precision and recall. Future work on NLP for SUD information extraction could make use of Machine- or Deep-Learning approaches, advanced methods including regular expressions or sentiment analysis, and/or advanced software packages designed specifically for NLP, to better inform public health research and clinical decision making.
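As an illustration of what a Rule-Based approach with regular expressions can look like, the sketch below flags substance mentions in a clinical note and marks those preceded by a negation cue. It does not reproduce any method from the reviewed papers and it is not the NegEx algorithm. The term lists, the negation cues, the 40-character look-back window, and the name `extract_mentions` are all hypothetical choices made for this example.

```python
import re

# Hypothetical, deliberately tiny term lists; a real lexicon would be far larger.
SUBSTANCE_PATTERNS = {
    "opioid": re.compile(r"\b(?:opioids?|heroin|oxycodone|fentanyl)\b", re.IGNORECASE),
    "tobacco": re.compile(r"\b(?:tobacco|cigarettes?|smoking)\b", re.IGNORECASE),
    "alcohol": re.compile(r"\b(?:alcohol|etoh)\b", re.IGNORECASE),
}

# Simplified negation cues (an assumption, much cruder than NegEx).
NEGATION = re.compile(r"\b(?:denies|no|never|without|negative for)\b", re.IGNORECASE)


def extract_mentions(note, window=40):
    """Return (substance, matched_term, negated) tuples found in a note.

    A mention is marked negated when a negation cue appears within `window`
    characters before it in the same sentence.
    """
    mentions = []
    # Naive sentence split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        for substance, pattern in SUBSTANCE_PATTERNS.items():
            for match in pattern.finditer(sentence):
                preceding = sentence[max(0, match.start() - window):match.start()]
                negated = NEGATION.search(preceding) is not None
                mentions.append((substance, match.group(0).lower(), negated))
    return mentions


note = "Patient denies tobacco use. Reports daily heroin use and occasional alcohol."
mentions = extract_mentions(note)
for mention in mentions:
    print(mention)
```

Rules like these are easy to read and audit, but they miss misspellings, abbreviations, and negation that follows the term.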