Paper Title
Tradeoffs in Sentence Selection Techniques for Open-Domain Question Answering
Paper Authors
Paper Abstract
Current methods in open-domain question answering (QA) usually employ a pipeline of first retrieving relevant documents, then applying strong reading comprehension (RC) models to that retrieved text. However, modern RC models are complex and expensive to run, so techniques to prune the space of retrieved text are critical to allow this approach to scale. In this paper, we focus on approaches which apply an intermediate sentence selection step to address this issue, and investigate the best practices for this approach. We describe two groups of models for sentence selection: QA-based approaches, which run a full-fledged QA system to identify answer candidates, and retrieval-based models, which find parts of each passage specifically related to each question. We examine trade-offs between processing speed and task performance in these two approaches, and demonstrate an ensemble module that represents a hybrid of the two. From experiments on Open-SQuAD and TriviaQA, we show that very lightweight QA models can do well at this task, but retrieval-based models are faster still. An ensemble module we describe balances between the two and generalizes well cross-domain.
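To make the retrieval-based side of the abstract concrete, the following is a minimal sketch of a sentence selection step that prunes a retrieved passage before a reading comprehension model would see it. It is an illustration only, not the paper's models: the function names `tokenize` and `select_sentences`, the IDF-weighted term-overlap score, and the `top_k` cutoff are all assumptions chosen for simplicity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def select_sentences(question, sentences, top_k=2):
    """Keep the top_k sentences most related to the question.

    Hypothetical scoring: each question term found in a sentence adds
    log(1 + n / df), where n is the number of sentences and df is the
    number of sentences containing that term. Rarer terms count more.
    The kept sentences are returned in their original passage order.
    """
    token_sets = [set(tokenize(s)) for s in sentences]
    n = len(sentences)
    # Document frequency of each term, counted over sentences.
    df = Counter(tok for toks in token_sets for tok in toks)
    q_tokens = set(tokenize(question))

    def score(toks):
        return sum(math.log(1 + n / df[t]) for t in q_tokens & toks)

    # Rank sentence indices by score, highest first.
    ranked = sorted(range(n), key=lambda i: score(token_sets[i]), reverse=True)
    keep = sorted(ranked[:top_k])
    return [sentences[i] for i in keep]


passage = [
    "Paris is the capital of France.",
    "The weather was mild that year.",
    "France borders Spain and Italy.",
]
selected = select_sentences("What is the capital of France?", passage, top_k=1)
print(selected)
```

A selector of this kind needs only set operations per sentence, which is the sort of low cost that makes retrieval-style pruning attractive in front of an expensive RC model; a QA-based selector would instead score sentences by whether a lightweight QA model finds an answer candidate in them.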
