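Abstract: The symptom descriptions in TCM clinical records are an important basis for diagnosis by TCM physicians. Because Chinese expressions are diverse and complex, extracting standardized four-diagnosis information from these descriptions has important research value for TCM syndrome analysis. Based on a thorough analysis of various Chinese word segmentation algorithms, this paper applies the forward maximum matching segmentation algorithm to the semantic understanding of four-diagnosis information in TCM clinical symptom descriptions. The constructed TCM four-diagnosis semantic model is used to extract four-diagnosis information from 100 actual cases, and the maximum segmentation length is then varied as a controlled variable; accuracy and recall are highest when the maximum segmentation length is 5.
Keywords: Chinese word segmentation; syndrome analysis; four-diagnosis information
CLC number: TP311
Document code: A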
Research and Application of Chinese Word Segmentation Model in Semantic Understanding of TCM Diseases
XU Lintao 1, YE Xinxin 2, PEI Chengfei 3, WU Rongshi 4
(Anhui University of Science & Technology, Huainan 232000, China) 1. 1194663015@qq.com; 2. xxye999@163.com; 3. 1138664088@qq.com; 4. Rongshi_Wu@163.com
Abstract: The symptom descriptions in TCM clinical records are an essential basis for diagnosis by TCM physicians. Because Chinese expressions are diverse and complex, extracting standardized four-diagnosis information from these descriptions has important research value for TCM syndrome analysis. Based on a thorough analysis of various Chinese word segmentation algorithms, this paper applies the forward maximum matching word segmentation algorithm to the semantic interpretation of four-diagnosis information in TCM clinical symptom descriptions. The constructed TCM four-diagnosis model is used to extract four-diagnosis information from 100 actual cases. The maximum segmentation length is then varied as a controlled variable, and the highest accuracy and recall are obtained when it is set to five.
Keywords: Chinese word segmentation; syndrome analysis; four-diagnosis information
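The forward maximum matching procedure named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy lexicon, the sample record, and the function names `forward_max_match` and `precision_recall` are hypothetical, and the sketch assumes that the tuned parameter is the maximum match length in characters and that accuracy is scored as set-based precision over extracted terms.

```python
def forward_max_match(text, lexicon, max_len=5):
    """Greedy left-to-right segmentation against a lexicon.

    At each position the longest candidate (up to max_len characters)
    is tried first; the window shrinks until a lexicon entry is found.
    A single character is always accepted as a fallback.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


def precision_recall(extracted, reference):
    """Set-based precision and recall of extracted terms (assumed scoring)."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical toy lexicon of standardized four-diagnosis terms.
LEXICON = {"头痛", "恶寒", "发热", "舌苔薄白", "脉浮紧"}

# Hypothetical free-text symptom description.
record = "头痛恶寒发热,舌苔薄白,脉浮紧"

for max_len in (3, 5):
    tokens = forward_max_match(record, LEXICON, max_len=max_len)
    terms = [t for t in tokens if t in LEXICON]  # keep only lexicon terms
    print(max_len, terms, precision_recall(terms, LEXICON))
```

In this toy case a maximum length of 3 cannot match the four-character term 舌苔薄白, so recall drops, which shows why the maximum length is worth tuning; it says nothing about the values reported for the 100 cases.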