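Abstract: The IG algorithm is an effective feature selection algorithm that has been widely applied in text classification research. To address the shortcomings of the IG algorithm, this paper proposes an improvement based on word frequency information, covering three aspects: intra-class word frequency information, the positional distribution of intra-class word frequency, and inter-class word frequency information. The improved algorithm was tested experimentally, and the results show that it is more effective than the traditional algorithm.
Keywords: word frequency information; IG algorithm; feature selection; text classification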
CLC number: TP391.1
Document code: A
Funding: Jiangsu Province College Students' Innovation and Entrepreneurship Training Program (Project No. 201712684020T).
Research on the Application of the IG Feature Selection Algorithm Based on Word Frequency Information Improvement in Text Classification
NIU Yuxia
(Nantong Science and Technology Academy, Nantong 226007, China)
Abstract: As an effective feature selection algorithm, the IG algorithm has been widely used in the field of text classification. To address the shortcomings of the IG algorithm, this paper proposes an improved method based on word frequency information, which refines the algorithm in three respects: intra-class word frequency information, the positional distribution of intra-class word frequency, and inter-class word frequency information. Experiments were carried out to test the improved algorithm, and the results show that it is more effective than the traditional algorithm.
Keywords: word frequency information; IG algorithm; feature selection; text classification