Paper Title
VirtualFlow: Decoupling Deep Learning Models from the Underlying Hardware
Authors
Abstract
State-of-the-art deep learning systems such as TensorFlow and PyTorch tightly couple the model with the underlying hardware. This coupling requires the user to modify application logic in order to run the same job across a different set of resources, thereby limiting the choice of hardware for a given workload and potentially forcing the user to forgo more efficient hardware configurations. We propose VirtualFlow, a system leveraging a novel abstraction called virtual node processing to decouple the model from the hardware. In each step of training or inference, the batch of input data is split across virtual nodes instead of hardware accelerators (e.g., GPUs and TPUs). Mapping multiple virtual nodes to each accelerator and processing them sequentially effectively time-slices the batch, thereby allowing users to reduce the memory requirement of their workloads and mimic large batch sizes on small clusters. Using this technique, VirtualFlow enables many new use cases, such as reproducing training results across different hardware, resource elasticity, and heterogeneous training. In our evaluation, our implementation of VirtualFlow for TensorFlow achieved strong convergence guarantees across different hardware with out-of-the-box hyperparameters, up to 48% lower job completion times with resource elasticity, and up to 42% higher throughput with heterogeneous training.
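The virtual node idea can be illustrated with a toy example. What follows is a minimal sketch of the general technique, not VirtualFlow's TensorFlow implementation. It assumes that virtual node processing behaves like plain gradient accumulation on a one-parameter linear model with squared error. The all-reduce is simulated by a running sum. The function names `split_into_virtual_nodes`, `assign_virtual_nodes`, `grad_sum` and `train_step` are hypothetical.

The sketch fixes the global batch and the number of virtual nodes, then varies only how virtual nodes map onto accelerators. The layouts cover one device, several devices, and an uneven "heterogeneous" split.

```python
"""Toy sketch of virtual node processing via gradient accumulation (hypothetical names)."""
from typing import List, Sequence, Tuple

Example = Tuple[float, float]  # (x, y) pair for a 1-D linear model y ~ w*x + b


def split_into_virtual_nodes(batch: Sequence[Example], num_virtual_nodes: int) -> List[List[Example]]:
    """Split the global batch into equally sized micro-batches, one per virtual node."""
    if num_virtual_nodes <= 0 or len(batch) % num_virtual_nodes != 0:
        raise ValueError("global batch size must be divisible by the number of virtual nodes")
    size = len(batch) // num_virtual_nodes
    return [list(batch[i * size:(i + 1) * size]) for i in range(num_virtual_nodes)]


def assign_virtual_nodes(nodes_per_accelerator: Sequence[int]) -> List[List[int]]:
    """Give accelerator i a contiguous block of nodes_per_accelerator[i] virtual-node ids.

    An uneven list (e.g. [6, 2]) mimics heterogeneous training, where a faster
    device is handed more virtual nodes than a slower one.
    """
    mapping, start = [], 0
    for count in nodes_per_accelerator:
        mapping.append(list(range(start, start + count)))
        start += count
    return mapping


def grad_sum(w: float, b: float, micro_batch: Sequence[Example]) -> Tuple[float, float]:
    """Sum (not mean) of squared-error gradients over one micro-batch."""
    gw = gb = 0.0
    for x, y in micro_batch:
        err = w * x + b - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw, gb


def train_step(w: float, b: float, batch: Sequence[Example],
               nodes_per_accelerator: Sequence[int], lr: float) -> Tuple[float, float]:
    """One SGD step on the whole global batch, whatever the hardware layout."""
    micro_batches = split_into_virtual_nodes(batch, sum(nodes_per_accelerator))
    total_gw = total_gb = 0.0
    for node_ids in assign_virtual_nodes(nodes_per_accelerator):
        # Each accelerator processes its virtual nodes sequentially and accumulates
        # their gradients; in a real framework this bounds activation memory to
        # one micro-batch at a time.
        acc_gw = acc_gb = 0.0
        for v in node_ids:
            gw, gb = grad_sum(w, b, micro_batches[v])
            acc_gw += gw
            acc_gb += gb
        # Stand-in for the cross-accelerator all-reduce.
        total_gw += acc_gw
        total_gb += acc_gb
    # Normalize by the *global* batch size, so the update matches full-batch SGD.
    n = len(batch)
    return w - lr * total_gw / n, b - lr * total_gb / n


if __name__ == "__main__":
    data = [(float(x), 3.0 * x + 1.0) for x in range(16)]  # global batch of 16
    layouts = {
        "1 accelerator, 8 virtual nodes": [8],
        "2 accelerators, 4 virtual nodes each": [4, 4],
        "8 accelerators, 1 virtual node each": [1] * 8,
        "heterogeneous, 6 + 2 virtual nodes": [6, 2],
    }
    for name, layout in layouts.items():
        w, b = 0.0, 0.0
        for _ in range(5):
            w, b = train_step(w, b, data, layout, lr=0.001)
        print(f"{name}: w={w:.6f}, b={b:.6f}")
```

All four layouts should print the same `w` and `b` up to floating-point rounding. Each step sums gradients over the same 16 examples and divides by the same global batch size; only the order of summation changes. This is the sense in which the model is decoupled from the hardware: a single accelerator mimics the batch of 16 by running eight micro-batches of 2 in sequence. The toy model has no batch-dependent layers such as batch normalization, which a real system would have to handle separately.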
