Volume 19, Issue 1 (Iranian Journal of Breast Diseases 2027) | ijbd 2027, 19(1): 64-76





Khademi M, Khodabakhsh P, Heidarpoor Z, Paktinat S, Atashi A. Prediction of Breast Cancer Metastasis Using Tree-Based Machine Learning Models: A Retrospective Analysis of Iranian Women. ijbd 2027; 19(1):64-76
URL: http://ijbd.ir/article-1-1191-en.html
1- Department of Applied Mathematics, Islamic Azad University South Tehran Branch, Tehran, Iran, maryam_khademi@iau.ac.ir
2- Department of IT and Computer Engineering, Islamic Azad University South Tehran Branch, Tehran, Iran
3- Doctor of Medicine (MD), Islamic Azad University, Tehran Medical Branch, Tehran, Iran
4- Medical Informatics Department, Breast Cancer Research Center, Iranian National Cancer Institute, ACECR, Tehran, Iran & Department of Artificial Intelligence, School of Advanced Technologies in Medicine, Tehran University of Medical Sciences, Tehran, Iran
Abstract:

Introduction: Breast cancer metastasis is a major cause of cancer-related death, and accurate prediction of metastasis can support clinical decision-making. This study developed and evaluated tree-based machine learning models to predict metastasis in Iranian women, using real clinical data with substantial missing values.
Methods: Clinical records of 8,148 breast cancer patients in Tehran from 1997 to 2020 were analyzed retrospectively. Variables with more than 50% missing data were removed, leaving 4,310 complete cases. Decision Tree, Random Forest, and XGBoost (which handles missing data natively) were compared with K-NN and Naïve Bayes (which require imputation). Stratified 10-fold cross-validation was used to guard against overfitting and to keep class proportions stable under class imbalance, and the best models were then tested on a separate hold-out set. Performance was measured using AUC, sensitivity, specificity, accuracy, and F1 score.
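The two preprocessing and validation steps described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function names `drop_sparse_columns` and `stratified_folds`, the use of `None` to mark a missing value, and the round-robin fold assignment are all assumptions made for illustration, using only the Python standard library.

```python
import random
from collections import defaultdict


def drop_sparse_columns(rows, max_missing=0.5):
    """Drop variables whose share of missing values exceeds max_missing.

    rows is a list of dicts with identical keys; None marks a missing
    value (an assumption of this sketch).
    """
    n = len(rows)
    keep = [
        col for col in rows[0]
        if sum(r[col] is None for r in rows) / n <= max_missing
    ]
    return [{col: r[col] for col in keep} for r in rows]


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample a fold index in 0..k-1, class by class.

    Shuffling within each class and dealing samples out round-robin
    keeps the class ratio of every fold close to that of the full set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [0] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[i] = pos % k
    return folds


# Hypothetical imbalanced labels: 20 metastatic (1), 180 non-metastatic (0).
labels = [1] * 20 + [0] * 180
folds = stratified_folds(labels, k=10)
```

With these hypothetical labels, every fold receives the same number of positive cases, which is the property that makes per-fold sensitivity estimates meaningful when the positive class is rare.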
Results: Tree-based models outperformed the other classifiers. XGBoost showed the best discrimination (AUC = 0.96, accuracy = 99.4%, F1 = 0.96), while the Decision Tree combined high interpretability with strong performance (sensitivity = 94%, specificity = 96.9%). Although key predictors such as tumor size were excluded, other variables, such as hormone receptor status and age at menarche, still allowed for strong predictions. K-NN had very low sensitivity (6%), and Naïve Bayes performed inconsistently.
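For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, accuracy, and F1 follow from a binary confusion matrix. The function name `binary_metrics` and the example labels are hypothetical and are not taken from the study's data; label 1 is assumed to mean metastasis.

```python
def binary_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, accuracy, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall on the positive class
    specificity = ratio(tn, tn + fp)   # recall on the negative class
    precision = ratio(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": ratio(tp + tn, len(pairs)),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical imbalanced case: 1 positive among 20 samples, and a
# classifier that always predicts the negative class.
y_true = [1] + [0] * 19
always_negative = binary_metrics(y_true, [0] * 20)
```

In this hypothetical case the always-negative classifier reaches 95% accuracy with zero sensitivity, which is why accuracy alone can hide the kind of failure reported for K-NN and why sensitivity and F1 are reported alongside it.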
Conclusion: Tree-based models can reliably predict breast cancer metastasis from incomplete, imbalanced real-world data, provided they are properly validated. These models are well suited to resource-limited settings. Future work should focus on better data collection and imputation methods. This study demonstrates the utility of interpretable machine learning for cancer applications in underrepresented populations.

Type of Study: Research | Subject: Health informatics
Received: 2025/06/05 | Accepted: 2025/10/08 | Published: 2026/03/25




Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2026 CC BY-NC 4.0 | Iranian Journal of Breast Diseases
