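Abstract: Class overlap in medical data can seriously impair intelligent disease diagnosis. To reduce the adverse effect that class overlap among lumbar disc samples has on classifiers, a hybrid sampling algorithm that mitigates class overlap, the CO_HS algorithm, is proposed. The algorithm partitions the training samples into core, boundary, and noise samples and samples from the overlapping region to reduce the degree of class overlap in the sample set. The new training set produced by CO_HS is used to train classification models such as RF, yielding six new classifiers for lumbar disc degeneration. Experimental results show that the new classifiers improve significantly on several performance measures: accuracy rises by 7.8 to 12.7 percentage points, the kappa coefficient by 11.6 to 20.2 percentage points, sensitivity by 7.9 to 16.8 percentage points, specificity by 9.0 to 18.2 percentage points, and the F-measure by 9.4 to 18.4 percentage points. CO_HS is thus shown to be an efficient method that effectively addresses class overlap in samples and improves classification performance.
Keywords: intelligent medicine; class overlap; hybrid sampling; lumbar disc degeneration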
CLC number: TP181; R604
Document code: A
Funding: Fujian Provincial Regional Development Project (2019Y3007)
Sampling Algorithm for Reducing Class Overlap in Lumbar Disc Samples
ZHAO Xinxin1, WU Xiaofeng1,2
(1.School of Mathematics and Statistics, Minnan Normal University, Zhangzhou 363000, China; 2.School of Mathematics and Computer Science, Quanzhou Normal University, Quanzhou 362000, China)
zhao_xx2021@163.com; mathwxf@sina.com
Abstract: Class overlap in medical data can severely degrade the performance of intelligent disease diagnosis. To mitigate the negative impact of class overlap in lumbar disc samples on classifiers, this paper proposes CO_HS, a hybrid sampling algorithm that reduces class overlap. The algorithm divides the training samples into core samples, boundary samples, and noise samples, and samples from the overlapping region to reduce the degree of class overlap in the dataset. The new training set generated by CO_HS is used to train classification models such as Random Forest (RF), yielding six new classifiers for lumbar disc degeneration. Experimental results indicate that the new classifiers improve significantly across multiple performance metrics: accuracy increases by 7.8 to 12.7 percentage points, the kappa coefficient by 11.6 to 20.2 percentage points, sensitivity by 7.9 to 16.8 percentage points, specificity by 9.0 to 18.2 percentage points, and the F-measure by 9.4 to 18.4 percentage points. These results show that CO_HS effectively mitigates class overlap and improves classification performance.
Keywords: intelligent medicine; class overlap; hybrid sampling; lumbar disc degeneration
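
For readers less familiar with the five metrics reported in the abstract, the following minimal Python sketch shows how each can be computed from a confusion matrix and how a gain in percentage points is obtained. It rests on assumptions not stated in the abstract: the task is treated as binary, the F-measure is taken to be F1, and the function names `binary_metrics` and `percentage_point_gain` are hypothetical. The counts are invented for illustration and are not results from the paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, kappa, sensitivity, specificity and F1.

    Assumes a binary task in which both classes occur and at least one
    positive prediction is made, so no denominator is zero.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true-positive rate (recall)
    specificity = tn / (tn + fp)   # true-negative rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e)
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }


def percentage_point_gain(before, after):
    """Absolute difference between two metric dicts, in percentage points."""
    return {name: 100 * (after[name] - before[name]) for name in before}


# Hypothetical confusion-matrix counts, invented for illustration only.
baseline = binary_metrics(tp=40, fp=10, fn=10, tn=40)
resampled = binary_metrics(tp=45, fp=5, fn=5, tn=45)
gains = percentage_point_gain(baseline, resampled)
for name, gain in gains.items():
    print(f"{name}: {gain:+.1f} percentage points")
```

A percentage-point gain is an absolute difference between two rates, so an accuracy rising from 80% to 90% is a gain of 10 percentage points, not a relative improvement of 12.5%.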