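Abstract: Automatic personality recognition is an active research topic in psychology, computer science, and related fields. To make effective use of temporal features and frame-attention features in video sequences, this paper proposes a Transformer-based method for personality recognition from video sequences. First, a pre-trained convolutional neural network extracts frame-level visual features. Then, a bidirectional long short-term memory network and a Transformer network model the temporal information and the frame-attention information of these features, respectively, to learn high-level global visual features. Finally, the global visual features are combined by feature-level fusion to perform visual personality recognition. Experiments on the public personality dataset ChaLearn First Impression V2 show that the method achieves a mean Big Five personality score of 0.9141 and effectively improves visual personality recognition.
Keywords: personality recognition; Transformer; convolutional neural network; bidirectional long short-term memory network; feature-level fusion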
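CLC number: TP391
Document code: A

Funding: National Natural Science Foundation of China, General Program (61976149); Zhejiang Provincial Natural Science Foundation, Key Program (LZ20F020002).
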
A Visual Personality Recognition Method Based on Transformer Network
TANG Zhiwei1, ZHANG Shiqing2, ZHAO Xiaoming1,2

(1. Faculty of Mechanical Engineering and Automation, Zhejiang Sci-Tech University, Hangzhou 310018, China; 2. Institute of Intelligent Information Processing, Taizhou University, Taizhou 318000, China)
1456792435@qq.com; tzczsq@163.com; tzxyzxm@163.com

Abstract: Automatic personality recognition has become a research focus in psychology, computer science, and related fields. To make effective use of the temporal features and frame-attention features in video sequences, this paper proposes a Transformer-based method for personality recognition from video sequences. First, frame-level visual features are extracted with a pre-trained convolutional neural network. Then, a Bi-LSTM (Bidirectional Long Short-Term Memory) network and a Transformer network model the temporal information and the frame-attention information of these features, respectively, to learn high-level global visual features. Finally, the global visual features are combined by feature-level fusion to perform visual personality recognition. Experimental results on the public personality dataset ChaLearn First Impression V2 show that the proposed method achieves a mean score of 0.9141 over the Big Five personality traits and effectively improves visual personality recognition.
Keywords: personality recognition; Transformer; convolutional neural network; bidirectional long short-term memory network; feature-level fusion
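
The abstract names two steps that a small example may make more concrete: feature-level fusion of the two global visual features, and the mean Big Five score. The sketch below is illustrative only and rests on two assumptions not stated in the abstract: that fusion is plain concatenation of the two feature vectors, and that the score is the measure commonly used with the ChaLearn First Impressions data, namely 1 minus the mean absolute error per trait, averaged over the five traits. The function names and the toy values are hypothetical.

```python
from statistics import mean

# Big Five trait names as used in the ChaLearn First Impressions annotations.
TRAITS = (
    "openness",
    "conscientiousness",
    "extraversion",
    "agreeableness",
    "neuroticism",
)


def fuse_features(temporal_feat, attention_feat):
    """Feature-level fusion, assumed here to be simple concatenation.

    temporal_feat stands in for the Bi-LSTM output, attention_feat for the
    Transformer output; both are plain sequences of floats in this sketch.
    """
    return list(temporal_feat) + list(attention_feat)


def trait_accuracy(predictions, targets):
    """Assumed per-trait score: 1 - mean absolute error, with labels in [0, 1]."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("predictions and targets must be non-empty and equal in length")
    return 1.0 - mean(abs(p - t) for p, t in zip(predictions, targets))


def mean_big_five_accuracy(pred_by_trait, target_by_trait):
    """Average the per-trait scores over the five traits."""
    return mean(
        trait_accuracy(pred_by_trait[name], target_by_trait[name])
        for name in TRAITS
    )


if __name__ == "__main__":
    # Toy values, not results from the paper.
    fused = fuse_features([0.1, 0.2], [0.3])
    preds = {name: [0.5, 0.7] for name in TRAITS}
    labels = {name: [0.6, 0.5] for name in TRAITS}
    print(len(fused), round(mean_big_five_accuracy(preds, labels), 4))
```

Under this assumed definition, a mean score of 0.9141 would correspond to an average absolute error of about 0.086 on trait labels scaled to [0, 1].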