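Abstract: Based on a strong-category-feature recognition algorithm, this paper studies an algorithm for computing the semantic similarity of texts and evaluates its performance. To realize this function as a general-purpose algorithm, the paper designs a semantic function library keyed by semantic identification codes to serve as the comparison object. It applies two fuzzy-neuron deep convolutional machine learning modules, with a rigid frequency-domain feature extraction algorithm based on the Fourier transform between the two learning stages. Finally, external data fuzzification and defuzzification algorithms are placed before and after this module, yielding a relatively complex general-purpose machine learning algorithm, which is also a technical innovation of this paper. A performance evaluation based on volunteers' subjective assessments shows that the system focuses on semantic similarity evaluation of Chinese text, that its judgments reach an accuracy of 81.78% when compared with manual judgments, and that only 5.52% of the volunteers considered the system's results completely inconsistent with the manual judgments.
Keywords: strong category feature algorithm; machine learning; text similarity; semantic recognition; performance evaluation
CLC number: TP309
Document code: A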
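The processing chain summarized above (data fuzzification, a first learned module, Fourier-based frequency-domain feature extraction, a second learned module, defuzzification) can be illustrated with the minimal Python sketch below. It is not the author's implementation, and every function name is hypothetical. Triangular membership functions, centroid defuzzification, a naive DFT standing in for the FFT, and the pass-through and pooling functions standing in for the two fuzzy-neuron deep convolutional modules are all assumptions made only to show how data flows between the stages.

```python
import cmath
import math
from typing import Callable, List, Sequence

# A "stage" maps a feature vector to another feature vector.
Stage = Callable[[List[float]], List[float]]


def fuzzify(values: Sequence[float], centers: Sequence[float], width: float) -> List[float]:
    """Assumed fuzzifier: triangular membership of each value w.r.t. each center, flattened."""
    return [max(0.0, 1.0 - abs(v - c) / width) for v in values for c in centers]


def dft_magnitudes(signal: Sequence[float]) -> List[float]:
    """Magnitudes of DFT bins 0..n//2 (naive O(n^2) DFT used here in place of an FFT)."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def pool_normalize(features: List[float], bins: int) -> List[float]:
    """Hypothetical placeholder for a learned module: average into `bins` chunks, scale to [0, 1]."""
    n = len(features)
    if n < bins:
        raise ValueError("need at least as many features as bins")
    pooled = []
    for i in range(bins):
        chunk = features[i * n // bins:(i + 1) * n // bins]  # never empty when n >= bins
        pooled.append(sum(chunk) / len(chunk))
    top = max(pooled)
    return [p / top for p in pooled] if top > 0 else pooled


def defuzzify(memberships: Sequence[float], levels: Sequence[float]) -> float:
    """Assumed defuzzifier: centroid (membership-weighted mean) of the output levels."""
    total = sum(memberships)
    if total == 0:
        return 0.0
    return sum(m * lv for m, lv in zip(memberships, levels)) / total


def run_pipeline(values: Sequence[float], centers: Sequence[float], width: float,
                 levels: Sequence[float], stage1: Stage, stage2: Stage) -> float:
    fuzzy = fuzzify(values, centers, width)  # external fuzzification
    hidden = stage1(fuzzy)                   # placeholder for the first learned module
    spectrum = dft_magnitudes(hidden)        # rigid frequency-domain feature extraction
    out = stage2(spectrum)                   # placeholder for the second learned module
    return defuzzify(out, levels)            # external defuzzification to a crisp score


levels = [0.0, 0.5, 1.0]  # hypothetical similarity levels
score = run_pipeline([0.2, 0.4, 0.9, 0.1], centers=[0.0, 0.5, 1.0], width=0.5,
                     levels=levels, stage1=lambda x: x,
                     stage2=lambda s: pool_normalize(s, len(levels)))
```

Keeping the Fourier step as a fixed function between two replaceable stages mirrors the abstract's description of a "rigid" algorithm sandwiched between two learned ones; in a real system `stage1` and `stage2` would be trained models.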
Text Similarity Calculation and Performance Evaluation Based on Strong Category Features
LIU Hui
(Information Office, University of Shanghai for Science and Technology, Shanghai 200093, China)
liu_hui@usst.edu.cn
Abstract: Based on a recognition algorithm for strong category features, this paper studies an algorithm for calculating text semantic similarity and evaluates its performance. To realize this function as a general-purpose algorithm, the paper designs a semantic function library based on semantic identification codes as the comparison object and uses two fuzzy-neuron deep convolutional machine learning modules. Between the two machine learning modules, a rigid frequency-domain feature extraction algorithm based on the Fourier transform is applied. Finally, external data fuzzification and defuzzification algorithms are applied before and after this module, yielding a relatively complex general-purpose machine learning algorithm. This algorithm is also a technical innovation of this paper. A performance evaluation based on volunteers' subjective assessments shows that the system mainly realizes semantic similarity evaluation of Chinese text, that it achieves an accuracy of 81.78% when compared with manual judgments, and that only 5.52% of the volunteers consider the system's results completely inconsistent with the manual judgments.
Keywords: strong category feature algorithm; machine learning; text similarity; semantic recognition; performance evaluation
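The abstract reports two different figures: an accuracy of 81.78% against manual judgments, and a 5.52% share of volunteers who found the system's results completely inconsistent with the manual ones. The abstract does not say how either number was computed, so the minimal sketch below only illustrates one plausible reading. The per-pair label comparison, the rating scale, and the function names `judgment_accuracy` and `rating_shares` are all hypothetical, and the example inputs are toy data, not the paper's results.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def judgment_accuracy(system_labels: Sequence[Hashable],
                      human_labels: Sequence[Hashable]) -> float:
    """Hypothetical metric: fraction of text pairs where the system label equals the human label."""
    if not system_labels or len(system_labels) != len(human_labels):
        raise ValueError("label sequences must be non-empty and of equal length")
    hits = sum(s == h for s, h in zip(system_labels, human_labels))
    return hits / len(system_labels)


def rating_shares(ratings: Sequence[str]) -> Dict[str, float]:
    """Hypothetical metric: share of volunteer ratings at each agreement level."""
    if not ratings:
        raise ValueError("no ratings given")
    counts = Counter(ratings)
    return {level: n / len(ratings) for level, n in counts.items()}


# Toy inputs for illustration only.
acc = judgment_accuracy(["similar", "different", "similar", "similar"], ["similar"] * 4)
shares = rating_shares(["consistent"] * 3 + ["completely inconsistent"])
```

Separating the two functions makes explicit that the accuracy is measured per text pair, while the "completely inconsistent" share is measured over volunteer ratings, so the two percentages are not complements of each other.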