Paper Title
Automated Coding of Under-Studied Medical Concept Domains: Linking Physical Activity Reports to the International Classification of Functioning, Disability, and Health
Paper Authors
Paper Abstract
Linking clinical narratives to standardized vocabularies and coding systems is a key component of unlocking the information in medical text for analysis. However, many domains of medical concepts lack well-developed terminologies that can support effective coding of medical text. We present a framework for developing natural language processing (NLP) technologies for automated coding of under-studied types of medical information, and demonstrate its applicability via a case study on physical mobility function. Mobility is a component of many health measures, from post-acute care and surgical outcomes to chronic frailty and disability, and is coded in the International Classification of Functioning, Disability, and Health (ICF). However, mobility and other types of functional activity remain under-studied in medical informatics, and neither the ICF nor commonly used medical terminologies capture functional status terminology in practice. We investigated two data-driven paradigms, classification and candidate selection, to link narrative observations of mobility to standardized ICF codes, using a dataset of clinical narratives from physical therapy encounters. Representations derived from recent advances in language modeling and word embedding were used as features for established machine learning models and a novel deep learning approach, achieving a macro F-1 score of 84% on linking mobility activity reports to ICF codes. Both classification and candidate selection approaches present distinct strengths for automated coding in under-studied domains, and we highlight that the combination of (i) a small annotated dataset; (ii) expert definitions of codes of interest; and (iii) a representative text corpus is sufficient to produce high-performing automated coding systems. This study has implications for the ongoing growth of NLP tools for a variety of specialized applications in clinical care and research.
