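Abstract: Commodities usually have multiple attribute dimensions, and accurately identifying the attribute dimensions involved in commodity reviews is the foundation of text mining. The RAKEL algorithm is one implementation of the problem transformation approach in multi-label classification. In previous work, because sub-labelsets were drawn at random, correlations among labels were not fully discovered or considered, which led to low classification accuracy. To address this, an improved algorithm, FI-RAKEL, is proposed. Frequent itemsets of labels are first obtained with the FP-Growth algorithm; labels are then selected from the frequent itemsets and the original label set to form new label subsets, so that label correlations are fully exploited when training the base classifiers. Experiments show that the improved FI-RAKEL algorithm achieves better multi-label classification performance on review text.
Keywords: multi-label classification; RAKEL; frequent itemset; label correlation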
CLC number: TP391
Document code: A
Funding: This work was supported by the National Key R&D Program of China under Grant 2018YFB1004700.
Research and Implementation of RAKEL Algorithm Based Multi-Label Classification for Online Commodity Reviews
LIANG Ruibo, WANG Siyuan, LI Zhuang, LIU Yasong
(School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China)
Abstract: A commodity is generally described along multiple attribute dimensions, and accurately identifying the attribute dimensions mentioned in commodity reviews is the foundation of text mining. Random k-Labelsets (RAKEL) is an implementation of the problem transformation approach to multi-label classification. However, because its sub-labelsets are chosen at random, RAKEL does not fully discover or exploit the correlations among labels, which limits its classification accuracy. Hence, an improved algorithm, FI-RAKEL, is proposed. First, the frequent itemsets of labels are obtained with the FP-Growth algorithm. Then, labels are selected from the frequent itemsets and from the original label set to form new k-labelsets, so that the corresponding base classifiers are trained with label correlations taken into account. Experimental results show that the proposed FI-RAKEL algorithm achieves higher classification accuracy on multi-label reviews.
Keywords: multi-label classification; RAKEL; frequent itemset; label correlation
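
The labelset construction summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how labels are drawn from the frequent itemsets and the original label set, so the scheme below (seed each k-labelset with one frequent itemset, then pad it with randomly chosen remaining labels) is an assumption. Brute-force counting is used in place of FP-Growth to keep the example dependency-free; both produce the same frequent itemsets. The function names, the toy review labels, and the parameters `min_support`, `k`, and `m` are all hypothetical.

```python
import random
from itertools import combinations


def frequent_itemsets(label_rows, min_support, max_size):
    """Return {frozenset: support} for label combinations of size 2..max_size.

    Brute-force counting stands in for FP-Growth (assumption for brevity).
    """
    n = len(label_rows)
    counts = {}
    for row in label_rows:
        labels = sorted(row)
        for size in range(2, max_size + 1):
            for combo in combinations(labels, size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def build_labelsets(all_labels, itemsets, k, m, seed=0):
    """Build up to m distinct k-labelsets, each seeded by a frequent itemset.

    Assumed scheme: take itemsets in order of decreasing support and fill
    the remaining slots with labels sampled from the original label set.
    """
    rng = random.Random(seed)
    # Highest support first; ties broken by label names for determinism.
    ranked = sorted(itemsets, key=lambda s: (-itemsets[s], sorted(s)))
    result = []
    for core in ranked:
        if len(result) == m:
            break
        if len(core) > k:
            continue  # an itemset larger than k cannot seed a k-labelset
        rest = sorted(set(all_labels) - core)
        filler = rng.sample(rest, k - len(core))
        labelset = core | frozenset(filler)
        if labelset not in result:
            result.append(labelset)
    return result


# Toy data: the set of attribute labels attached to each review (hypothetical).
rows = [
    {"price", "quality"},
    {"price", "quality", "logistics"},
    {"logistics", "service"},
    {"price", "quality"},
    {"appearance"},
    {"logistics", "service", "price"},
]
all_labels = sorted(set().union(*rows))

itemsets = frequent_itemsets(rows, min_support=0.3, max_size=3)
labelsets = build_labelsets(all_labels, itemsets, k=3, m=3)
```

Each resulting labelset would then be handed to one base classifier, as in standard RAKEL; that training step is omitted here.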