Publication: Facial Action Unit Detection with ViT and Perceiver Using Landmark Patches
Date
2021
Publisher
Institute of Electrical and Electronics Engineers Inc.
Abstract
The expressions of the human face are defined by the contraction of facial muscles. The most widely used and accepted standard for describing all visual changes on the face is the Facial Action Coding System (FACS). In this paper, Vision Transformer (ViT) and Perceiver attention mechanisms are individually employed to detect Action Units (AUs) from the whole face on two spontaneous datasets (DISFA, BP4D) and one in-the-wild dataset (EmotioNet) with different patch sizes. The same attention mechanisms are then applied to patches cropped around facial landmarks to examine whether this improves AU detection. The experiments show that, in the first set of experiments, ViT and Perceiver attention mechanisms reach, and most of the time outperform, state-of-the-art methods on AU detection. However, the most significant performance increase is observed when using only landmark patches as the input sequence to both networks. © 2022 Elsevier B.V., All rights reserved.
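
To make the idea of "landmark patches as the input sequence" concrete, here is a minimal sketch of how such a sequence could be built. It is not the authors' implementation: the function name `extract_landmark_patches`, the 16-pixel patch size, the zero padding at the image border, and the 68-point landmark layout in the usage lines are all assumptions made for illustration. The abstract does not specify any of these details.

```python
import numpy as np


def extract_landmark_patches(image, landmarks, patch_size=16):
    """Crop a square patch around each landmark and flatten it into one token.

    image:     array of shape (H, W, C)
    landmarks: iterable of (x, y) pixel coordinates (hypothetical input format)
    returns:   array of shape (num_landmarks, patch_size * patch_size * C)
    """
    height, width = image.shape[:2]
    half = patch_size // 2
    # Zero-pad so that patches near the image border keep the full size
    # (an assumption; the paper may handle borders differently).
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="constant")
    tokens = []
    for x, y in landmarks:
        # Clamp to the image so a slightly out-of-frame landmark still works.
        col = int(np.clip(round(float(x)), 0, width - 1))
        row = int(np.clip(round(float(y)), 0, height - 1))
        # After padding, the landmark sits at (row + half, col + half),
        # so the patch starts at (row, col) in padded coordinates.
        patch = padded[row:row + patch_size, col:col + patch_size, :]
        tokens.append(patch.reshape(-1))
    return np.stack(tokens)


# Usage with synthetic data: 68 landmarks is a common convention, assumed here.
rng = np.random.default_rng(0)
face = rng.random((112, 112, 3))
points = rng.uniform(0, 111, size=(68, 2))
sequence = extract_landmark_patches(face, points, patch_size=16)
```

The resulting array has one row per landmark, so it can stand in for the grid of whole-face patches that a ViT- or Perceiver-style model would otherwise receive, typically after a learned linear projection of each row.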
