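Abstract: Human pose estimation is one of the fundamental algorithms in computer vision. To explore research and development trends in this field, this paper first introduces the classic convolution-based human pose estimation algorithms and discusses the basic principles and improvements of each. It then reviews the latest algorithms based on the self-attention model (Transformer). Finally, it introduces the commonly used public datasets and model evaluation metrics, and compares several classic algorithms. Their average accuracy exceeds 80% on the Max Planck Institute for Informatics (MPII) dataset and exceeds 60% on the Microsoft Common Objects in Context (COCO) dataset. The comparison shows that convolutional and Transformer architectures each have their own strengths and weaknesses.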
Keywords: pose estimation; joint detection; convolutional neural network; Transformer
CLC number: TP391.4
Document code: A
Funding: Sub-project of the National Key Research and Development Program of China (2020YFC2005802).
A Comparative Study of Human Pose Estimation Based on Convolution and Transformer
FENG Jie, ZHENG Jianli
|
(School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
fjie666@outlook.com; zhengjianli163@163.com
Abstract: Human pose estimation is one of the fundamental algorithms in computer vision. To explore research and development trends in this field, this paper first introduces the classic convolution-based human pose estimation algorithms and discusses the basic principles and improvements of each. It then reviews the latest algorithms based on the self-attention model (Transformer). Finally, it introduces the commonly used public datasets and model evaluation metrics, and compares several classic algorithms. Their average accuracy exceeds 80% on the Max Planck Institute for Informatics (MPII) dataset and exceeds 60% on the Microsoft Common Objects in Context (COCO) dataset. The comparison shows that convolutional and Transformer architectures each have their own strengths and weaknesses.
Keywords: human pose estimation; joint detection; CNN (Convolutional Neural Network); Transformer