A Depression Detection Method Based on Multimodal Feature Enhancement Network
ZHAO Xiaoming1,2, FAN Huiting1, ZHANG Shiqing2
(1. School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China; 2. Institute of Intelligent Information Processing, Taizhou University, Taizhou 318000, China)
tzxyzxm@163.com; courage_f@163.com; tzczsq@163.com

Abstract: Traditional multimodal fusion methods for depression detection overlook the interactions between modalities and therefore fail to extract comprehensive feature representations. To address this problem, this paper proposes a depression detection method based on a multimodal feature enhancement network, which integrates three modalities: video, audio, and remote photoplethysmography (rPPG) signals. Using inter-modal Transformers, intra-modal Transformers, and a multi-head self-attention mechanism, the method jointly learns the intra-modal and inter-modal dynamic relationships at each time step of the input modality sequences, thereby enhancing the features. Finally, the enhanced features of the three modalities are concatenated to obtain a comprehensive feature representation. Experimental results on the public AVEC2013 dataset show that the proposed method achieves a mean absolute error (MAE) of 7.07, outperforming unimodal depression detection. These results indicate that the method promotes interaction between modalities and enhances features, and that it is effective for automatic depression detection.
Keywords: multimodal; deep learning; depression detection; convolutional neural network; feature enhancement; multimodal fusion

CLC number: TP391.41
Document code: A
Funding: National Natural Science Foundation of China (62276180)