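Abstract: For the screenplay character emotion dataset provided by a competition hosted on the DataFountain platform, the data are preprocessed using Chinese word segmentation, stop-word removal, and word cloud plotting, and text features are extracted with the term frequency-inverse document frequency (TF-IDF) algorithm. Machine learning classification models based on the support vector machine (SVM) and naive Bayes algorithms are then built. The models are applied to the recognition and analysis of screenplay character emotions. The results show that the naive Bayes classifier outperforms the SVM classifier; moreover, when the Laplace smoothing coefficient α = 0.2, the classification accuracy of the naive Bayes algorithm is close to 80%.
Keywords: screenplay characters; support vector machine; naive Bayes; emotion recognition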
CLC number: TP181
Document code: A
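Funding: Guangdong Province Key Construction Discipline Scientific Research Capacity Improvement Project (2021ZDJS080); Huizhou Philosophy and Social Sciences Fund Project (2022ZX046); Huizhou University Teaching Quality Engineering Project (XJYJG2021045); Huizhou University "Hundred Outstanding Young Teachers" Training Program.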
Research on Screenplay Character Emotion Recognition Based on Machine Learning
CAI Xiaoyu, QIU Meilan, LI Dewang
(School of Mathematics and Statistics, Huizhou University, Huizhou 516007, China)
910940859@qq.com; qiumeilan@hzu.edu.cn; ldwldw1976@126.com
Abstract: Based on the screenplay character emotion dataset provided by a DataFountain platform competition, this paper preprocesses the data using Chinese word segmentation, stop-word removal, and word clouds, and extracts text features with the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm. Machine learning classification models based on the Support Vector Machine (SVM) and Naive Bayes algorithms are then established. The two models are applied to the recognition and analysis of screenplay character emotion. The results show that the Naive Bayes classification model outperforms the SVM classification model. In addition, when the Laplace smoothing coefficient α = 0.2, the classification accuracy of the Naive Bayes algorithm is close to 80%.
Keywords: screenplay characters; SVM; Naive Bayes; emotion recognition
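
The pipeline summarized in the abstract (tokenization, stop-word removal, TF-IDF features, and a Naive Bayes classifier with smoothing coefficient α = 0.2) can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: the toy corpus, the whitespace tokenizer standing in for a Chinese word segmenter, the stop-word list, the particular smoothed IDF formula, the choice of a multinomial Naive Bayes variant, and all function names are assumptions made for illustration. The SVM model is omitted.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus, already segmented; NOT the DataFountain data.
TRAIN = [
    ("happy smile laugh joy", "positive"),
    ("laugh joy warm hug", "positive"),
    ("angry shout cry fear", "negative"),
    ("cry fear cold shout", "negative"),
]
STOP_WORDS = {"the", "a"}  # placeholder stop-word list (assumption)


def tokenize(text):
    """Whitespace split stands in for Chinese word segmentation; drops stop words."""
    return [t for t in text.split() if t not in STOP_WORDS]


def fit_idf(docs):
    """One common smoothed IDF variant: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf(doc, idf):
    """TF-IDF weights for one document; terms unseen in training are ignored."""
    tf = Counter(t for t in doc if t in idf)
    return {t: count * idf[t] for t, count in tf.items()}


def fit_nb(vectors, labels, vocab, alpha=0.2):
    """Multinomial Naive Bayes on TF-IDF weights with additive smoothing alpha."""
    totals = {y: defaultdict(float) for y in set(labels)}
    for vec, y in zip(vectors, labels):
        for term, weight in vec.items():
            totals[y][term] += weight
    model = {}
    for y, term_weights in totals.items():
        denom = sum(term_weights.values()) + alpha * len(vocab)
        log_prior = math.log(labels.count(y) / len(labels))
        log_like = {
            t: math.log((term_weights.get(t, 0.0) + alpha) / denom) for t in vocab
        }
        model[y] = (log_prior, log_like)
    return model


def predict(model, vec):
    """Return the class with the highest posterior log-score."""
    def score(y):
        log_prior, log_like = model[y]
        return log_prior + sum(w * log_like[t] for t, w in vec.items())
    return max(model, key=score)


docs = [tokenize(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
idf = fit_idf(docs)
vocab = sorted(idf)
model = fit_nb([tfidf(d, idf) for d in docs], labels, vocab, alpha=0.2)


def classify(text):
    """Classify a new, pre-segmented sentence with the fitted toy model."""
    return predict(model, tfidf(tokenize(text), idf))


print(classify("joy laugh"))
```

In this sketch, α is added to every per-class term weight before normalizing, so a term never seen in a class still receives a small nonzero probability. Smaller values of α trust the observed weights more, which is the parameter the abstract reports tuning.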