Options
Efficient In-Database Patient Similarity Analysis for Personalized Medical Decision Support Systems
ISSN
2214-5796
Date Issued
2018
Author(s)
DOI
10.1016/j.bdr.2018.05.001
Abstract
Patient similarity analysis is a precondition to apply machine learning technology on medical data. In this sense, patient similarity analysis harnesses the information wealth of electronic medical records (EMRs) to support medical decision making. A pairwise similarity computation can be used as the basis for personalized health prediction. With n patients the amount of similarity calculations is required. Thus, analyzing patient similarity leads to data explosion when exploiting big data. By increasing the data size the computational burden of this analysis increases. A real-life medical application may exceed the limits of current hardware in a fairly short amount of time. Finding ways to optimize patient similarity analysis and handling this data explosion is the topic of this paper. Current implementations for patient similarity analysis require their users to have knowledge of complex data analysis tools. Moreover, data pre-processing and analysis are performed in synthetic conditions: the data are extracted from the EMR database and then the data preparation and analysis are processed in external tools. After all of this effort the users might not experience a superior performance of the patient similarity analysis. We propose methods to optimize the patient similarity analysis in order to make it scalable to big data. Our method was tested against two real datasets and a low execution time was accomplished. Our result hence benefits a comprehensive medical decision support system. Moreover, our implementation comprises a balance between performance and applicability: the majority of the workload is processed within a database management system to enable a direct implementation on an EMR database.