Paper title
CAST: Character labeling in Animation using Self-supervision by Tracking
Authors
Abstract
Cartoon and animation videos have very different characteristics from real-life images and videos. In addition, styles vary widely within this domain. Current computer vision and deep-learning solutions often fail on animated content because they were trained on natural images. In this paper we present a method to refine a semantic representation suited to specific animated content. We first train a neural network on a large-scale set of animation videos and use its mapping to deep features as an embedding space. Next, we use self-supervision to refine the representation for any specific animation style: multi-object tracking gathers many examples of the animated characters in that style. These examples are used to define triplets for contrastive loss training. The refined semantic space allows better clustering of animated characters, even when they have diverse manifestations. Using this space, we can build dictionaries of the characters in animation videos and define specialized classifiers for specific stylistic content (e.g., the characters of a specific animation series) with very little user effort. These classifiers are the basis for automatically labeling characters in animation videos. We present results on a collection of characters in a variety of animation styles.
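The abstract says that tracking provides the triplets for contrastive training but does not say how they are formed. The sketch below is a minimal illustration under stated assumptions and is not the authors' implementation. Anchor and positive are two detections from the same track. The negative is taken from a different track that shares at least one frame with it, on the heuristic that two simultaneous tracks are probably two different characters. The input format (`track_id -> [(frame, detection_id), ...]`), the helper names `co_occurring`, `sample_triplets` and `triplet_loss`, and the margin of 0.2 are all hypothetical choices. The loss is the standard triplet margin loss over Euclidean distances.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical input format: track_id -> list of (frame_index, detection_id).
Tracks = Dict[int, List[Tuple[int, int]]]
Vector = Sequence[float]


def co_occurring(tracks: Tracks, a: int, b: int) -> bool:
    """True if tracks a and b share at least one frame."""
    frames_a = {f for f, _ in tracks[a]}
    return any(f in frames_a for f, _ in tracks[b])


def sample_triplets(tracks: Tracks, n: int, seed: int = 0) -> List[Tuple[int, int, int]]:
    """Sample n (anchor, positive, negative) detection-id triplets, or none.

    Assumed heuristic: anchor and positive come from the same track, and the
    negative comes from another track that co-occurs with it in some frame.
    """
    rng = random.Random(seed)
    ids = sorted(tracks)
    partners = {t: [u for u in ids if u != t and co_occurring(tracks, t, u)] for t in ids}
    # A track is usable only if it can supply an anchor/positive pair
    # (>= 2 detections) and has at least one co-occurring partner track.
    usable = [t for t in ids if len(tracks[t]) >= 2 and partners[t]]
    triplets: List[Tuple[int, int, int]] = []
    if not usable:
        return triplets
    for _ in range(n):
        t = rng.choice(usable)
        (_, anchor), (_, positive) = rng.sample(tracks[t], 2)
        _, negative = rng.choice(tracks[rng.choice(partners[t])])
        triplets.append((anchor, positive, negative))
    return triplets


def triplet_loss(a: Vector, p: Vector, n: Vector, margin: float = 0.2) -> float:
    """Triplet margin loss: max(0, d(a, p) - d(a, n) + margin), Euclidean d."""
    return max(0.0, math.dist(a, p) - math.dist(a, n) + margin)


if __name__ == "__main__":
    example_tracks = {
        0: [(0, 10), (1, 11), (2, 12)],  # one character tracked over frames 0-2
        1: [(1, 20), (2, 21)],           # a second character visible in frames 1-2
    }
    print(sample_triplets(example_tracks, 3))
    # The negative is already farther than the positive by more than the margin.
    print(triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 0.0]))  # 0.0
```

In a real training loop, the detection ids would index embedding vectors produced by the network, and the loss would be minimized by gradient descent. The co-occurrence rule is one way to avoid drawing a "negative" that is actually the same character seen in a different shot.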
