Paper Title

Debiasing Masks: A New Framework for Shortcut Mitigation in NLU

Authors

Johannes Mario Meissner, Saku Sugawara, Akiko Aizawa

Abstract

Debiasing language models from unwanted behaviors in Natural Language Understanding (NLU) tasks is a topic of rapidly increasing interest in the NLP community. Spurious statistical correlations in the data allow models to take shortcuts and avoid uncovering more advanced and desirable linguistic features. A multitude of effective debiasing approaches has been proposed, but flexibility remains a major issue: for the most part, models must be retrained to find a new set of weights with debiased behavior. We propose a new debiasing method in which we identify debiased pruning masks that can be applied to a finetuned model. This enables the selective and conditional application of debiasing behaviors. We assume that bias is caused by a certain subset of weights in the network; our method is, in essence, a mask search to identify and remove biased weights. Our masks show equivalent or superior performance to the standard counterparts, while offering important benefits. Pruning masks can be stored with high efficiency in memory, and it becomes possible to switch among several debiasing behaviors (or revert to the original biased model) at inference time. Finally, our approach opens the door to further research on how biases are acquired, by studying the generated masks. For example, we observed that the early layers and attention heads were pruned more aggressively, possibly hinting at where biases may be encoded.
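To make the storage and switching claims concrete, here is a minimal Python sketch of how binary pruning masks could be packed, stored, and swapped over a single set of finetuned weights. It is not the authors' implementation, and it does not cover the mask search itself, which the abstract does not specify. The flat weight list, the function names, and the behavior labels `hypothetical_bias_a` and `hypothetical_bias_b` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def pack_mask(mask: List[bool]) -> bytes:
    """Pack a binary mask into bytes, using one bit per weight."""
    packed = bytearray((len(mask) + 7) // 8)
    for i, keep in enumerate(mask):
        if keep:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def unpack_mask(packed: bytes, length: int) -> List[bool]:
    """Recover the first `length` mask entries from their packed form."""
    return [bool((packed[i // 8] >> (i % 8)) & 1) for i in range(length)]


def apply_mask(weights: List[float], mask: List[bool]) -> List[float]:
    """Return a copy of the weights with pruned positions set to zero."""
    if len(mask) != len(weights):
        raise ValueError("mask and weights must have the same length")
    return [w if keep else 0.0 for w, keep in zip(weights, mask)]


def weights_for(
    finetuned: List[float],
    masks: Dict[str, bytes],
    behavior: Optional[str],
) -> List[float]:
    """Select weights for one behavior; None returns the original model."""
    if behavior is None:
        return list(finetuned)
    return apply_mask(finetuned, unpack_mask(masks[behavior], len(finetuned)))


# Toy stand-in for a finetuned model: ten weights in a flat list (assumption).
finetuned = [0.5, -1.2, 0.3, 2.0, -0.7, 0.9, 0.1, -0.4, 1.5, -2.2]

# Two hypothetical debiasing masks; False marks a weight to prune.
mask_a = [True, False, True, True, True, True, False, True, True, True]
mask_b = [False, True, True, True, False, True, True, True, True, False]
masks = {
    "hypothetical_bias_a": pack_mask(mask_a),
    "hypothetical_bias_b": pack_mask(mask_b),
}

original = weights_for(finetuned, masks, None)
debiased_a = weights_for(finetuned, masks, "hypothetical_bias_a")
debiased_b = weights_for(finetuned, masks, "hypothetical_bias_b")
```

In this sketch each mask costs one bit per weight, and the finetuned weights are never modified, so switching behaviors or reverting to the original model only changes which mask is unpacked.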
