Paper Title
LANCE: Efficient Low-Precision Quantized Winograd Convolution for Neural Networks Based on Graphics Processing Units
Authors
Abstract
Accelerating deep convolutional neural networks has become an active research topic and has attracted interest from both academia and industry. In this paper, we propose LANCE, an efficient low-precision quantized Winograd convolution algorithm that combines the advantages of fast convolution and quantization techniques. By embedding linear quantization operations into the Winograd domain, the fast convolution can be performed efficiently under low-precision computation on graphics processing units. We test neural network models with LANCE on representative image classification datasets, including SVHN, CIFAR, and ImageNet. The experimental results show that our 8-bit quantized Winograd convolution improves performance by up to 2.40x over full-precision convolution with trivial accuracy loss.
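
The idea of quantizing inside the Winograd domain can be illustrated with a minimal sketch. The Python code below is not the paper's GPU implementation. It assumes the standard 1D Winograd F(2,3) transforms and a symmetric per-tensor linear quantizer, and all function names are hypothetical. Quantization is applied to the transformed input and filter, the element-wise product runs on integers, and the result is rescaled before the output transform.

```python
def quantize(values, bits=8):
    """Symmetric linear quantization (assumed scheme): returns (integer codes, scale)."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def input_transform(d):
    # B^T d for F(2,3): 4 input samples -> 4 Winograd-domain values
    return [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]


def filter_transform(g):
    # G g for F(2,3): 3 filter taps -> 4 Winograd-domain values
    return [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]


def output_transform(m):
    # A^T m for F(2,3): 4 products -> 2 outputs
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


def direct_conv(d, g):
    # Reference: two outputs of a 3-tap correlation, computed directly
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


def winograd_f23(d, g):
    # Full-precision Winograd: 4 multiplications instead of 6
    v, u = input_transform(d), filter_transform(g)
    return output_transform([a * b for a, b in zip(u, v)])


def quantized_winograd_f23(d, g, bits=8):
    # Quantize AFTER the transforms, so the multiplications are integer-only
    qv, scale_v = quantize(input_transform(d), bits)
    qu, scale_u = quantize(filter_transform(g), bits)
    m_int = [a * b for a, b in zip(qu, qv)]        # low-precision products
    m = [x * scale_u * scale_v for x in m_int]     # dequantize
    return output_transform(m)


if __name__ == "__main__":
    d = [1.0, 2.0, -1.0, 0.5]
    g = [0.5, -0.25, 1.0]
    print("direct   :", direct_conv(d, g))
    print("winograd :", winograd_f23(d, g))
    print("quantized:", quantized_winograd_f23(d, g))
```

In this sketch the quantized result differs from the direct convolution only by a small rounding error. On a GPU the integer products would map to low-precision arithmetic instructions, which this sketch does not model.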
