Paper Title
Raising the Bar on the Evaluation of Out-of-Distribution Detection
Authors
Abstract
Out-of-distribution (OoD) detection in image classification has seen considerable progress. However, most OoD detection methods are evaluated on a standard set of datasets that differ arbitrarily from the training data, and there is no clear definition of what constitutes a "good" OoD dataset. Furthermore, state-of-the-art OoD detection methods already achieve near-perfect results on these standard benchmarks. In this paper, we define two categories of OoD data using the subtly different concepts of perceptual (visual) and semantic similarity to in-distribution (iD) data. We define Near OoD samples as perceptually similar to but semantically different from iD samples, and Shifted samples as points that are visually different from but semantically akin to iD data. We then propose a GAN-based framework for generating OoD samples from each of these two categories, given an iD dataset. Through extensive experiments on MNIST, CIFAR-10/100, and ImageNet, we show that (a) state-of-the-art OoD detection methods that perform exceedingly well on conventional benchmarks are significantly less robust on our proposed benchmark, and (b) models that perform well on our setup also perform well on conventional real-world OoD detection benchmarks, and vice versa, indicating that one might not even need a separate OoD set to reliably evaluate OoD detection performance.
