Paper Title
Benchmark and Best Practices for Biomedical Knowledge Graph Embeddings
Authors
Abstract
Much of biomedical and healthcare data is encoded in discrete, symbolic form such as text and medical codes. There is a wealth of expert-curated biomedical domain knowledge stored in knowledge bases and ontologies, but the lack of reliable methods for learning knowledge representation has limited their usefulness in machine learning applications. While text-based representation learning has significantly improved in recent years through advances in natural language processing, attempts to learn biomedical concept embeddings so far have been lacking. A recent family of models called knowledge graph embeddings has shown promising results on general domain knowledge graphs, and we explore their capabilities in the biomedical domain. We train several state-of-the-art knowledge graph embedding models on the SNOMED-CT knowledge graph, provide a benchmark with comparison to existing methods and in-depth discussion on best practices, and make a case for the importance of leveraging the multi-relational nature of knowledge graphs for learning biomedical knowledge representation. The embeddings, code, and materials will be made available to the community.
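The abstract does not name the models it trains. As an illustration only, the sketch below shows TransE, one widely known knowledge graph embedding model (chosen here as an assumption, not necessarily one of the paper's models). It shows what "multi-relational" means in practice: each relation type gets its own vector, and a triple (head, relation, tail) is plausible when head + relation lands near tail. The concept names, relation names, and 2-d hand-picked vectors are hypothetical and are not SNOMED-CT identifiers or learned values. A margin ranking loss against a corrupted (negative) triple is included because that is the usual training signal for this model family.

```python
import math
import random

# Hypothetical toy embeddings (2-d, hand-picked) for illustration only;
# real models learn much higher-dimensional vectors from the full graph.
ENTITY = {
    "viral_pneumonia": [0.0, 0.0],
    "pneumonia": [1.0, 0.0],
    "lung": [0.0, 2.0],
}
RELATION = {
    "is_a": [1.0, 0.0],          # hypothetical hierarchy relation
    "finding_site": [0.0, 2.0],  # hypothetical attribute relation
}


def transe_score(h, r, t):
    """TransE plausibility: -||h + r - t||_2 (higher means more plausible)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def score(triple):
    """Look up the vectors for a (head, relation, tail) triple and score it."""
    h, r, t = triple
    return transe_score(ENTITY[h], RELATION[r], ENTITY[t])


def corrupt_tail(triple, rng):
    """Negative sample: replace the tail with a different randomly chosen entity."""
    h, r, t = triple
    return (h, r, rng.choice([e for e in ENTITY if e != t]))


def margin_loss(pos, neg, margin=1.0):
    """Margin ranking loss: zero once the true triple beats the corrupted one by `margin`."""
    return max(0.0, margin - score(pos) + score(neg))


if __name__ == "__main__":
    rng = random.Random(0)
    pos = ("viral_pneumonia", "is_a", "pneumonia")
    neg = corrupt_tail(pos, rng)
    # The same head reaches different tails through different relation vectors.
    print(pos, score(pos))
    print(("viral_pneumonia", "finding_site", "lung"),
          score(("viral_pneumonia", "finding_site", "lung")))
    print(neg, score(neg), margin_loss(pos, neg))
```

Because each relation has its own translation vector, the single concept `viral_pneumonia` can sit close to `pneumonia` under `is_a` and close to `lung` under `finding_site` at the same time. A model that ignores relation types would collapse these into one undifferentiated notion of "relatedness".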
