Paper Title
DiSparse: Disentangled Sparsification for Multitask Model Compression
Authors
Abstract
Despite the popularity of Model Compression and Multitask Learning, how to effectively compress a multitask model has been less thoroughly analyzed, because tasks are entangled in the parameter space in ways that are hard to separate. In this paper, we propose DiSparse, a simple, effective, and first-of-its-kind multitask pruning and sparse training scheme. We consider each task independently by disentangling the importance measurement, and we take the unanimous decision among all tasks when performing parameter pruning and selection. Our experimental results demonstrate superior performance across various configurations and settings compared to popular sparse training and pruning methods. Beyond its effectiveness in compression, DiSparse also provides a powerful tool for the multitask learning community. Surprisingly, in several cases we even observed better performance than some dedicated multitask learning methods, despite the high model sparsity enforced by DiSparse. We analyzed the pruning masks generated with DiSparse and observed that each task identifies a strikingly similar sparse network architecture, even before training starts. We also observed the existence of a "watershed" layer where task relatedness drops sharply, implying no benefit from continued parameter sharing beyond it. Our code and models will be available at: https://github.com/SHI-Labs/DiSparse-Multitask-Model-Compression.
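The idea of measuring importance per task and then taking a unanimous decision can be illustrated with a small sketch. This is not the authors' implementation: it assumes that "unanimous" means a shared parameter is pruned only when every task independently marks it as prunable, and it uses made-up importance scores. The function names, the task names (`segmentation`, `depth`), and the top-k selection rule are all hypothetical choices for illustration.

```python
def task_keep_mask(scores, sparsity):
    """Keep the highest-scoring (1 - sparsity) fraction of parameters for one task."""
    n_keep = len(scores) - int(round(sparsity * len(scores)))
    # Rank indices by descending importance; ties are broken by index.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    keep = set(ranked[:n_keep])
    return [i in keep for i in range(len(scores))]


def unanimous_prune_mask(per_task_scores, sparsity):
    """Assumed rule: prune a shared parameter only if every task would prune it."""
    masks = [task_keep_mask(s, sparsity) for s in per_task_scores]
    # A parameter survives if at least one task wants to keep it.
    return [any(keeps) for keeps in zip(*masks)]


# Hypothetical importance scores for 6 shared parameters under two tasks.
scores = {
    "segmentation": [0.90, 0.10, 0.80, 0.20, 0.05, 0.70],
    "depth":        [0.85, 0.60, 0.10, 0.15, 0.02, 0.75],
}
shared_mask = unanimous_prune_mask(list(scores.values()), sparsity=0.5)
print(shared_mask)  # [True, True, True, False, False, True]
```

Under this assumed rule the shared parameters end up less sparse than the per-task target (here 2 of 6 pruned instead of 3 of 6), because any parameter that matters to at least one task is retained.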
