Volume 9, Issue 2 (Iranian Quarterly Journal of Breast Diseases 2016) | ijbd 2016, 9(2): 7-18

Darzi M, Olfat Bakhsh A, Gorgin S, Oveisi F, Hashemi E, Alavi N. Imbalanced Data Classification for Primary Diagnosis of Breast Diseases by AdaBoost.M1, K-Nearest Neighbor and Probabilistic Neural Network. ijbd. 2016; 9(2):7-18
URL: http://ijbd.ir/article-1-525-en.html
modarzi@yahoo.com
Abstract:


Introduction: Breast cancer is one of the most common cancers in Iran, and early diagnosis can protect women from a range of risks. The aim of this research is to classify an imbalanced dataset in order to distinguish normal from abnormal cases among women who attended the ACECR Breast Cancer Clinic. Imbalanced datasets are one of the main challenges in designing medical decision support systems. In this article, imbalanced data classification is therefore addressed through data-level solutions.

Methods: Three algorithms, “AdaBoost.M1”, “K-nearest neighbor”, and “probabilistic neural network”, were used to classify the breast condition of 918 women. Because the dataset was imbalanced, three re-sampling methods were applied: “random over-sampling”, “random under-sampling”, and the “Synthetic Minority Over-sampling Technique”. MATLAB and R were the software tools used to implement the methods and algorithms. The values of 60 features extracted from the women’s history and physical examination forms served as input data for the three algorithms. Finally, “precision” and “F-Measure” were the two criteria used to evaluate the three algorithms in the test phase.
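
The three re-sampling methods named above can be sketched in a few lines. The example below is an illustration only, not the authors' implementation (the study used MATLAB and R): the function names, the binary-label assumption, the choice to balance the classes exactly, and the neighbour count k=5 are all assumptions made here. The SMOTE sketch follows the usual idea of interpolating between a minority sample and one of its nearest minority neighbours, and assumes at least two minority samples and purely numeric features.

```python
import math
import random


def split_indices(y, minority):
    """Return (minority row indices, majority row indices)."""
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    return min_idx, maj_idx


def random_over_sample(X, y, minority, rng):
    """Duplicate randomly chosen minority rows until the classes are balanced."""
    min_idx, maj_idx = split_indices(y, minority)
    extra = [rng.choice(min_idx) for _ in range(len(maj_idx) - len(min_idx))]
    keep = maj_idx + min_idx + extra
    return [X[i] for i in keep], [y[i] for i in keep]


def random_under_sample(X, y, minority, rng):
    """Keep a random subset of majority rows equal in size to the minority class."""
    min_idx, maj_idx = split_indices(y, minority)
    keep = rng.sample(maj_idx, len(min_idx)) + min_idx
    return [X[i] for i in keep], [y[i] for i in keep]


def smote(X, y, minority, rng, k=5):
    """Add synthetic minority rows by interpolating towards a near minority neighbour."""
    min_idx, maj_idx = split_indices(y, minority)
    X_min = [X[i] for i in min_idx]
    k = min(k, len(X_min) - 1)  # cannot have more neighbours than other minority rows
    synthetic = []
    for _ in range(len(maj_idx) - len(min_idx)):
        base = rng.choice(X_min)
        # k nearest minority neighbours of the base row, excluding the row itself
        neighbours = sorted(
            (p for p in X_min if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return X + synthetic, y + [minority] * len(synthetic)


# Toy data (hypothetical, not the study's 918 records): 90 majority rows, 10 minority rows.
rng = random.Random(0)
X_toy = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(100)]
y_toy = [0] * 90 + [1] * 10
X_bal, y_bal = smote(X_toy, y_toy, minority=1, rng=rng)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

In a setup like this, re-sampling would normally be applied to the training split only, so that the test split keeps the original class distribution; the abstract does not say how the authors handled this.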

Results: Judged by “precision” and “F-Measure”, the classification algorithms performed best on the dataset generated by the Synthetic Minority Over-sampling Technique. On that dataset, the precision and F-Measure values were 93.5 and 93.6 for “AdaBoost.M1”, 79.5 and 87.7 for “K-nearest neighbor”, and 86 and 91.9 for “probabilistic neural network”, respectively.
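
For readers unfamiliar with the two criteria, the sketch below computes them from predicted and true labels using the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F-Measure as the harmonic mean of precision and recall. The abstract does not state which class was treated as positive or whether the scores were averaged over classes, so the function name, the `positive` parameter, and the toy labels are assumptions for illustration.

```python
def precision_recall_f_measure(y_true, y_pred, positive=1):
    """Return (precision, recall, F-measure) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure (F1): harmonic mean of precision and recall
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return precision, recall, f_measure


# Toy labels (hypothetical): 4 positive cases, 6 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(precision_recall_f_measure(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

Unlike plain accuracy, both measures depend on how well the positive class is handled, which is why they are commonly reported for imbalanced data.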

Conclusion: Several methods exist for handling the imbalanced dataset problem in classification, and re-sampling is one of the popular data-level methods. Among the three re-sampling methods, the best classification performance was obtained on the dataset generated by the “Synthetic Minority Over-sampling Technique”. Across the three algorithms and four datasets used in this research, and based on “precision” and “F-Measure”, AdaBoost.M1 had the best classification performance.

Type of Study: Research | Subject: Breast
Received: 2016/09/17 | Accepted: 2016/09/17 | Published: 2016/09/17

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

