A Hybrid Machine Learning Approach for Improving Mortality Risk Prediction on Imbalanced Data
Journal
Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services
Date Issued
2019
Author(s)
Tashkandi, Araek
DOI
10.1145/3366030.3366040
Abstract
The efficiency of Machine Learning (ML) models has been widely acknowledged in healthcare. However, the quality of the underlying medical data is a major challenge when applying ML to medical decision making. In particular, an imbalanced class distribution biases the ML model towards the majority class. Accuracy as a metric becomes misleading as well, which produces the Accuracy Paradox. In this paper, we identify an optimal ML model for predicting mortality risk for Intensive Care Unit (ICU) patients. We comprehensively assess an approach that combines ML ensemble learning (in particular, Gradient Boosting Decision Tree) with clustering-based data sampling to handle the imbalanced data problem that this model faces. We compare different competitors (in terms of ML models as well as clustering methods) on a large real-world ICU dataset, achieving a maximum area under the curve value of 0.956.
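The Accuracy Paradox mentioned in the abstract can be illustrated with a small sketch. The cohort below (95 survivors, 5 deaths) and the helper names `accuracy` and `recall` are assumptions made up for illustration; they are not taken from the paper or its dataset. A trivial model that always predicts the majority class scores high accuracy while detecting none of the minority-class cases.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model identified."""
    true_pos = sum(t == positive and p == positive
                   for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


# Hypothetical imbalanced cohort: 0 = survived (majority), 1 = died (minority).
y_true = [0] * 95 + [1] * 5

# A degenerate "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)

print(f"accuracy = {acc:.2f}, recall on minority class = {rec:.2f}")
```

This is why a threshold-independent measure such as the area under the curve, reported in the abstract, tends to be more informative than raw accuracy on imbalanced data.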