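Abstract: Breast cancer is one of the leading causes of death among women worldwide, and accurate diagnosis is one of the most important steps in its treatment. This paper explains the principles of the logistic regression model in detail and uses the LogisticRegression algorithm of the Sklearn machine learning library to classify the Breast Cancer Wisconsin (Diagnostic) data set. Because the class labels of this data set fall into two categories (malignant and benign), the data set is well suited to a logistic regression model. Classification results from logistic regression models built on two features show that selecting the mean radius and the worst (largest) perimeter gives the highest classification accuracy (95.72%). Compared with previous methods, this approach offers somewhat improved performance.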
Keywords: breast cancer data set; logistic regression classification algorithm; prediction
CLC number: TP393
Document code: A
Research on the Classification of Breast Cancer Diagnosis Data Based on the Logistic Regression Algorithm
LIU Lei
(Dalian Neusoft Information University, Dalian 116023, China)
Abstract: Breast cancer is one of the major causes of death among women worldwide, and accurate diagnosis is one of the most important steps in its treatment. This paper explains the principles of the logistic regression model in detail and classifies the Breast Cancer Wisconsin (Diagnostic) data set using the LogisticRegression algorithm of the Sklearn machine learning library. Because the class labels of the data set fall into two classes (malignant and benign), the data set is well suited to a logistic regression model. Classification results from logistic regression models built on two features show that accuracy is highest (95.72%) when the mean radius and the worst (largest) perimeter are selected. Compared with previous methods, performance is improved to some extent.
Keywords: breast cancer data set; logistic regression classification algorithm; prediction
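The abstract describes fitting logistic regression models on pairs of features and picking the best pair. The experimental details are not given here: the train/test split, preprocessing and solver settings are all unknown. The sketch below is therefore only an illustration built on several assumptions. It uses scikit-learn's bundled copy of the data set (`load_breast_cancer`), a stratified 70/30 split with a fixed seed, standardization before fitting, and default `LogisticRegression` settings. The helper names `two_feature_accuracy` and `rank_feature_pairs` are hypothetical. The accuracies it prints depend on the split and are not expected to reproduce the 95.72% reported above.

```python
from itertools import combinations

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def two_feature_accuracy(X_train, X_test, y_train, y_test, i, j):
    """Hypothetical helper: fit logistic regression on columns i and j, return test accuracy."""
    # Assumption: features are standardized first; the paper's preprocessing is not stated
    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(X_train[:, [i, j]], y_train)
    return model.score(X_test[:, [i, j]], y_test)


def rank_feature_pairs(test_size=0.3, random_state=0):
    """Hypothetical helper: score every pair of the 30 features, best pair first."""
    data = load_breast_cancer()  # 569 samples, 30 features, labels malignant/benign
    X, y, names = data.data, data.target, data.feature_names
    # Assumption: a stratified split keeps the malignant/benign ratio in both halves
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    results = []
    for i, j in combinations(range(X.shape[1]), 2):
        acc = two_feature_accuracy(X_train, X_test, y_train, y_test, i, j)
        results.append((acc, str(names[i]), str(names[j])))
    # Sort by accuracy, highest first
    results.sort(key=lambda r: r[0], reverse=True)
    return results


if __name__ == "__main__":
    ranked = rank_feature_pairs()
    # Print the five best-scoring pairs under this particular split
    for acc, a, b in ranked[:5]:
        print(f"{acc:.4f}  {a} + {b}")
    # Look up the pair highlighted in the abstract
    for acc, a, b in ranked:
        if (a, b) == ("mean radius", "worst perimeter"):
            print(f"mean radius + worst perimeter: {acc:.4f}")
```

Scoring on a single held-out split keeps the sketch short. Because the ranking can change with `random_state`, a k-fold cross-validated score would be a more stable way to compare feature pairs.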