Imbalanced Data Classification for Primary Diagnosis of Breast Diseases by AdaBoost.M1, K-Nearest Neighbor and Probabilistic Neural Network

Darzi, Mohammad; Olfat Bakhsh, Asiye; Gorgin, Saeid; Oveisi, Farid; Hashemi, Esmat; Alavi, Nasrin

Volume 9, Issue 2 (Iranian Quarterly Journal of Breast Diseases 2016) ijbd 2016, 9(2): 7-18

‎ 20.1001.1.17359406.1395.9.2.1.5

Darzi M, Olfat Bakhsh A, Gorgin S, Oveisi F, Hashemi E, Alavi N. Imbalanced Data Classification for Primary Diagnosis of Breast Diseases by AdaBoost.M1, K-Nearest Neighbor and Probabilistic Neural Network. ijbd 2016; 9 (2) :7-18
URL: http://ijbd.ir/article-1-525-en.html

Mohammad Darzi ^*¹

1- , modarzi@yahoo.com

Abstract

Introduction: Breast cancer is one of the most common cancers in Iran, and early preliminary diagnosis can protect women from a range of risks. The aim of this research is to classify an imbalanced dataset in order to distinguish normal from abnormal cases among women who attended the ACECR Breast Cancer Clinic. Imbalanced datasets are one of the main challenges in designing medical decision support systems. In this article, imbalanced data classification was therefore addressed through data-level solutions.

Methods: Three algorithms, AdaBoost.M1, k-nearest neighbor, and probabilistic neural network, were used to classify the breast condition of 918 women. Because the dataset was imbalanced, three re-sampling methods were applied: random over-sampling, random under-sampling, and the Synthetic Minority Over-sampling Technique (SMOTE). MATLAB and R were used to implement the methods and algorithms. The values of 60 features extracted from the women's history and physical examination forms served as input to the three algorithms. Finally, precision and F-measure were used as the two criteria for evaluating the three algorithms in the test phase.
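The three re-sampling methods named in the Methods paragraph can be illustrated with a minimal sketch. This is not the authors' MATLAB or R implementation; it is a plain-Python illustration under stated assumptions: all features are numeric, the function names (random_over_sample, random_under_sample, smote) and the neighbor count k are chosen here for illustration, and the toy rows are hypothetical rather than taken from the clinic data.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def random_over_sample(minority, n_target, rng):
    """Duplicate randomly chosen minority rows until there are n_target rows."""
    extra = [rng.choice(minority) for _ in range(n_target - len(minority))]
    return list(minority) + extra

def random_under_sample(majority, n_target, rng):
    """Keep a random subset of n_target majority rows (no replacement)."""
    return rng.sample(list(majority), n_target)

def smote(minority, n_new, k, rng):
    """Create n_new synthetic minority rows by interpolating between a
    minority row and one of its k nearest minority neighbors."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by distance to the chosen base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: sq_dist(base, row))
        mate = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, mate)))
    return synthetic

# Hypothetical toy data: 20 majority rows and 4 minority rows, 2 features each.
rng = random.Random(0)
majority = [(float(i), float(i % 5)) for i in range(20)]
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (2.5, 2.5)]

over = random_over_sample(minority, len(majority), rng)
under = random_under_sample(majority, len(minority), rng)
smote_minority = minority + smote(minority, len(majority) - len(minority), k=3, rng=rng)

print(len(over), len(under), len(smote_minority))
```

Each method yields a balanced dataset in a different way: over-sampling repeats existing minority rows, under-sampling discards majority rows, and SMOTE adds new points lying between existing minority rows. In common practice, re-sampling is applied to the training portion only, so that test-phase metrics reflect the original class distribution; the abstract does not state how this was handled in the study.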

Results: Based on precision and F-measure, the classification algorithms performed best on the dataset generated by the Synthetic Minority Over-sampling Technique (SMOTE). On that dataset, the precision and F-measure were 93.5 and 93.6 for AdaBoost.M1, 79.5 and 87.7 for k-nearest neighbor, and 86 and 91.9 for probabilistic neural network, respectively.
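For readers less familiar with the two evaluation criteria, the sketch below shows how precision and F-measure relate to confusion-matrix counts. It assumes that the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly. The counts are hypothetical, and the helper names precision_recall_f1 and implied_recall are introduced here for illustration only.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and
    false-negative counts for the class of interest."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def implied_recall(precision, f1):
    """Solve F1 = 2PR / (P + R) for R, given P and F1 on the same scale."""
    return f1 * precision / (2 * precision - f1)

# Hypothetical counts, not taken from the study.
p, r, f = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 3), round(r, 3), round(f, 3))

# Precision and F-measure pairs as reported in the Results paragraph.
reported = {
    "AdaBoost.M1": (93.5, 93.6),
    "K-nearest neighbor": (79.5, 87.7),
    "Probabilistic neural network": (86.0, 91.9),
}
for name, (prec, fmeasure) in reported.items():
    # Only meaningful under the F1 assumption; rounding in the reported
    # values carries through to the result.
    print(name, round(implied_recall(prec, fmeasure), 1))
```

Under the F1 assumption, an F-measure higher than precision implies a recall higher than both, which is the pattern in all three reported pairs. The recall values computed this way are a derived illustration, not figures reported by the authors.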

Conclusion: There are various methods for handling the imbalanced dataset problem in classification, and re-sampling is one of the popular data-level methods. Among the three re-sampling methods, the best classification performance was obtained on the dataset generated by the Synthetic Minority Over-sampling Technique (SMOTE). Among the three algorithms and four datasets used in this research, AdaBoost.M1 had the best classification performance based on precision and F-measure.

Keywords: Imbalanced Dataset, Classification, Breast Diseases, AdaBoost.M1, K-NN, PNN, SMOTE

Type of Study: Research | Subject: Breast Diseases
Received: 2016/09/17 | Accepted: 2016/09/17 | Published: 2016/09/17

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
