Publication:
Facial Action Unit Detection with ViT and Perceiver Using Landmark Patches

Date

2021

Publisher

Institute of Electrical and Electronics Engineers Inc.

Abstract

The expressions of the human face are defined by the contraction of facial muscles. The most widely used and accepted standard for describing all visual changes on the face is the Facial Action Coding System (FACS). In this paper, Vision Transformer (ViT) and Perceiver attention mechanisms are individually employed to detect Action Units (AUs) from the whole face, with different patch sizes, on two spontaneous datasets (DISFA, BP4D) and one in-the-wild dataset (EmotioNet). The same attention mechanisms are then applied to patches cropped around facial landmarks to examine the improvement in AU detection. The experiments show that the ViT and Perceiver attention mechanisms match, and in most cases outperform, state-of-the-art methods on AU detection in the first set of experiments. However, the most significant performance increase is observed when only landmark patches are used as the input sequence to both networks. © 2022 Elsevier B.V., All rights reserved.
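
The idea of feeding only landmark patches to an attention network can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the patch size, the landmark detector, or the preprocessing, so the function name `crop_landmark_patches`, the default `patch_size`, the clamping at image borders, and the grayscale list-of-rows image format are all assumptions made for illustration. Plain Python lists are used so the sketch has no dependencies.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values (assumed format)


def crop_landmark_patches(
    image: Image,
    landmarks: Sequence[Tuple[int, int]],
    patch_size: int = 4,  # hypothetical default; the abstract gives no value
) -> List[List[float]]:
    """Return one flattened patch_size x patch_size patch per (x, y) landmark.

    Each flattened patch plays the role of one token in the input sequence
    of a ViT- or Perceiver-style network.
    """
    height, width = len(image), len(image[0])
    if patch_size > height or patch_size > width:
        raise ValueError("patch_size must not exceed the image dimensions")

    half = patch_size // 2
    tokens: List[List[float]] = []
    for x, y in landmarks:
        # Assumption: clamp the top-left corner so the patch stays inside
        # the image instead of padding at the borders.
        top = min(max(y - half, 0), height - patch_size)
        left = min(max(x - half, 0), width - patch_size)
        rows = [image[r][left:left + patch_size] for r in range(top, top + patch_size)]
        # Flatten the 2D patch row by row into a single token vector.
        tokens.append([value for row in rows for value in row])
    return tokens


# Usage: an 8x8 synthetic image and three made-up landmark positions.
demo_image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
demo_tokens = crop_landmark_patches(demo_image, [(4, 4), (0, 0), (7, 7)])
print(len(demo_tokens), len(demo_tokens[0]))
```

In this sketch the sequence length equals the number of landmarks rather than the number of grid cells covering the whole face, which is the structural difference between the two experimental settings described in the abstract.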