Paper Title
ConFoc: Content-Focus Protection Against Trojan Attacks on Neural Networks
Authors
Abstract
Deep Neural Networks (DNNs) have been applied successfully in computer vision. However, their wide adoption in image-related applications is threatened by their vulnerability to trojan attacks. These attacks insert misbehavior at training time using samples stamped with a mark or trigger, and that misbehavior is then exploited at inference or testing time. In this work, we analyze the composition of the features learned by DNNs during training. We identify that these features, including those related to the inserted triggers, contain both content (semantic information) and style (texture information), which DNNs recognize as a whole at testing time. We then propose a novel defensive technique against trojan attacks, in which DNNs are taught to disregard the styles of inputs and focus only on their content, mitigating the effect of triggers during classification. The generic applicability of the approach is demonstrated on a traffic sign recognition application and a face recognition application, each exposed to a different attack with a variety of triggers. Results show that the method reduces the attack success rate to below 1% in all the tested attacks while maintaining or improving the initial accuracy of the models on both benign and adversarial data.
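
To make the terms "trigger" and "attack success rate" concrete, the following is a minimal Python sketch, not the paper's code. It assumes a common definition of attack success rate: the fraction of trigger-stamped inputs, whose true class is not the attacker's target, that the model assigns to the target class. The abstract does not spell out this definition. The names `stamp_trigger`, `attack_success_rate`, `toy_trojaned_predict`, and `toy_clean_predict` are hypothetical, and the two toy classifiers are stand-ins for a compromised and a healthy model rather than real networks.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # grayscale image stored as rows of pixel intensities


def stamp_trigger(image: Image, trigger: Image, top: int = 0, left: int = 0) -> Image:
    """Return a copy of `image` with `trigger` pasted at position (top, left)."""
    stamped = [row[:] for row in image]  # copy rows so the input stays untouched
    for r, trigger_row in enumerate(trigger):
        for c, value in enumerate(trigger_row):
            stamped[top + r][left + c] = value
    return stamped


def attack_success_rate(
    predict: Callable[[Image], int],
    images: Sequence[Image],
    labels: Sequence[int],
    trigger: Image,
    target_class: int,
) -> float:
    """Fraction of triggered non-target inputs classified as `target_class`."""
    victims = [img for img, y in zip(images, labels) if y != target_class]
    if not victims:
        return 0.0
    hits = sum(predict(stamp_trigger(img, trigger)) == target_class for img in victims)
    return hits / len(victims)


def _mean(image: Image) -> float:
    return sum(map(sum, image)) / (len(image) * len(image[0]))


def toy_clean_predict(image: Image) -> int:
    """Hypothetical healthy model: class 1 for bright images, class 0 otherwise."""
    return int(_mean(image) > 0.5)


def toy_trojaned_predict(image: Image) -> int:
    """Hypothetical compromised model: outputs class 7 when the top-left 2x2 patch is all 1.0."""
    if all(image[r][c] == 1.0 for r in range(2) for c in range(2)):
        return 7
    return toy_clean_predict(image)


# Two 4x4 benign images (one dark, one bright) and a 2x2 white trigger.
images = [[[0.1] * 4 for _ in range(4)], [[0.8] * 4 for _ in range(4)]]
labels = [0, 1]
trigger = [[1.0, 1.0], [1.0, 1.0]]

asr_trojaned = attack_success_rate(toy_trojaned_predict, images, labels, trigger, target_class=7)
asr_clean = attack_success_rate(toy_clean_predict, images, labels, trigger, target_class=7)
```

In this toy setup the compromised model sends every triggered image to the target class, while the healthy model never does. A defense of the kind the abstract describes would aim to move a trojaned model's rate toward the healthy model's rate without lowering accuracy on benign inputs.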
