Abstract: With the growing demand for affective analysis, facial Action Unit (AU) recognition has attracted wide attention as a fundamental task in affective computing. Although deep neural networks have made progress in AU recognition, they rely on large-scale, accurately annotated datasets. The annotation process is time-consuming, costly, and error-prone, which limits AU recognition performance. In recent years, the CLIP model has shown excellent recognition and generalization ability on downstream tasks. To address the scarcity of annotated data in AU recognition, this paper proposes an AU recognition method based on CLIP and multimodal masked prompt learning. By designing multimodal shared AU prompts (AU-prompts) and attention masks, the method combines local details with global features to achieve more effective AU recognition. Experimental results show that the method achieves average F1 scores of 63.2% and 64.6% on the BP4D and DISFA datasets, respectively, demonstrating its effectiveness.
Keywords: affective computing; facial action unit; CLIP; prompt learning; attention mask
CLC number: TP391.1
Document code: A
Facial Action Unit Recognition Based on CLIP and Multimodal Masked Prompt Learning
TANG Pei, LI Jian, CHEN Haifeng, SHI Zhan, WANG Haomiao
(School of Electronic Information and Artificial Intelligence, Shaanxi University of Science & Technology, Xi’an 710021, China)
221611058@sust.edu.cn; lijianjsj@sust.edu.cn; chenhaifeng@sust.edu.cn; 221612161@sust.edu.cn; 231611020@sust.edu.cn
Abstract: With the growing demand for affective analysis, facial Action Unit (AU) recognition has gained significant attention as a fundamental task in affective computing. Although deep neural networks have advanced AU recognition, they heavily rely on large-scale, accurately annotated datasets. The time-consuming, costly, and error-prone annotation process limits AU recognition performance. In recent years, the CLIP model has demonstrated exceptional recognition and generalization capabilities in downstream tasks. To address the scarcity of annotated data, this paper proposes an AU recognition method based on CLIP and multimodal masked prompt learning. By designing multimodal shared AU prompts (AU-prompts) and attention masks, the approach integrates local details with global features, achieving more effective AU recognition. Experimental results on the BP4D and DISFA datasets show average F1-scores of 63.2% and 64.6%, respectively, validating the model's effectiveness.
Keywords: affective computing; facial action unit; CLIP; prompt learning; attention mask
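To make the abstract's idea concrete, the following is a minimal PyTorch sketch (not the authors' implementation) of how modality-shared learnable AU prompts might attend over CLIP image patch tokens under a per-AU attention mask, yielding one CLIP-style similarity logit per AU. All class names, dimensions, and the encoder stub are assumptions for illustration only.

```python
# Minimal sketch, assuming a frozen CLIP-like image encoder supplies patch
# tokens; the prompt head, mask semantics, and sizes are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedAUPromptHead(nn.Module):
    def __init__(self, num_aus=12, dim=512, num_heads=8):
        super().__init__()
        # Learnable AU prompts, shared between the text and image branches.
        self.au_prompts = nn.Parameter(torch.randn(num_aus, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, patch_tokens, attn_mask=None):
        # patch_tokens: (B, num_patches, dim) from a frozen image encoder.
        # attn_mask: optional (num_aus, num_patches) boolean mask (True =
        # blocked) restricting each AU prompt to its facial region, so local
        # detail is combined with the global patch context.
        B = patch_tokens.size(0)
        q = self.au_prompts.unsqueeze(0).expand(B, -1, -1)  # (B, num_aus, dim)
        au_feats, _ = self.attn(q, patch_tokens, patch_tokens,
                                attn_mask=attn_mask)
        # CLIP-style cosine similarity between each attended AU feature and
        # its own prompt gives a per-AU activation logit.
        logits = (F.normalize(au_feats, dim=-1)
                  * F.normalize(self.au_prompts, dim=-1)).sum(-1)
        return logits  # (B, num_aus)

if __name__ == "__main__":
    head = MaskedAUPromptHead()
    tokens = torch.randn(2, 49, 512)   # stand-in for CLIP ViT patch tokens
    print(head(tokens).shape)          # torch.Size([2, 12])
```

In this sketch, training would apply a multi-label loss (e.g., weighted binary cross-entropy) over the per-AU logits; the actual prompt design, masking strategy, and loss are detailed in the paper itself.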