Now showing 1 - 3 of 3
  • 2019 Conference Paper
    Tashkandi, Araek; Wiese, Lena (2019): "A Hybrid Machine Learning Approach for Improving Mortality Risk Prediction on Imbalanced Data." In: Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services (iiWAS2019), München, December 2019, pp. 83–92. DOI: 10.1145/3366030.3366040. URI: https://resolver.sub.uni-goettingen.de/purl?gro-2/65965
    Abstract: The efficiency of Machine Learning (ML) models has widely been acknowledged in the healthcare area. However, the quality of the underlying medical data is a major challenge when applying ML in medical decision making. In particular, the imbalanced class distribution problem causes the ML model to be biased towards the majority class. Furthermore, the accuracy metric is biased as well, which produces the Accuracy Paradox. In this paper, we identify an optimal ML model for predicting mortality risk for Intensive Care Unit (ICU) patients. We comprehensively assess an approach that leverages the efficiency of ML ensemble learning (in particular, Gradient Boosting Decision Tree) and clustering-based data sampling to handle the imbalanced data problem that this model faces. We compare different competitors (in terms of ML models as well as clustering methods) on a big real-world ICU dataset, achieving a maximum area under the curve (AUC) value of 0.956.
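A minimal sketch of the general technique named in this abstract, clustering-based undersampling of the majority class combined with a gradient-boosted decision tree evaluated by ROC AUC, assuming scikit-learn; the concrete pipeline, clustering method and hyperparameters of the paper may differ, and all function names below are illustrative.

```python
# Sketch: balance an imbalanced ICU dataset by replacing the majority
# class with one representative per k-means cluster, then train a
# gradient-boosted decision tree and report ROC AUC.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def cluster_undersample(X_majority, n_clusters):
    """Keep only the majority-class sample closest to each k-means centroid."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_majority)
    keep = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(X_majority[members] - km.cluster_centers_[c], axis=1)
        keep.append(members[np.argmin(dists)])
    return X_majority[keep]

def train_balanced_gbdt(X, y):
    """y is a binary mortality label; 1 is the (rare) positive class."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    X_min, X_maj = X_tr[y_tr == 1], X_tr[y_tr == 0]
    # One cluster per minority sample yields a 1:1 class ratio after sampling.
    X_maj_reduced = cluster_undersample(X_maj, n_clusters=len(X_min))
    X_bal = np.vstack([X_min, X_maj_reduced])
    y_bal = np.concatenate([np.ones(len(X_min)), np.zeros(len(X_maj_reduced))])
    model = GradientBoostingClassifier(random_state=0).fit(X_bal, y_bal)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    return model, auc
```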
  • 2018 Journal Article
    Wiese, Ingmar; Sarna, Nicole; Wiese, Lena; Tashkandi, Araek; Sax, Ulrich (2018): "Concept acquisition and improved in-database similarity analysis for medical data." Distributed and Parallel Databases 37(2), pp. 297–321. DOI: 10.1007/s10619-018-7249-x. ISSN: 0926-8782. URI: https://resolver.sub.uni-goettingen.de/purl?gro-2/65959
    Abstract: Efficient identification of cohorts of similar patients is a major precondition for personalized medicine. In order to train prediction models on a given medical data set, similarities have to be calculated for every pair of patients, which results in a roughly quadratic data blowup. In this paper we discuss the topic of in-database patient similarity analysis, ranging from data extraction to implementing and optimizing the similarity calculations in SQL. In particular, we introduce the notion of chunking, which uniformly distributes the workload among the individual similarity calculations. Our benchmark comprises the application of one similarity measure (cosine similarity) and one distance metric (Euclidean distance) on two real-world data sets; it compares the performance of a column store (MonetDB) and a row store (PostgreSQL) with two external data mining tools (ELKI and Apache Mahout).
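The chunking idea from this abstract, splitting the quadratic set of patient pairs into pieces of roughly equal size so that every worker performs about the same number of similarity calculations, can be illustrated as follows. The paper carries out the calculations in SQL inside MonetDB and PostgreSQL; the NumPy sketch below only mirrors the logic and the two measures (cosine similarity, Euclidean distance), and the function names are illustrative assumptions.

```python
# Sketch: split the n*(n-1)/2 patient pairs into evenly sized chunks
# and compute cosine similarity and Euclidean distance per pair.
import numpy as np

def pairwise_chunks(n, n_chunks):
    """All index pairs (i, j) with i < j, split into n_chunks pieces of
    (almost) equal size, i.e. an even workload per worker."""
    pairs = np.array([(i, j) for i in range(n) for j in range(i + 1, n)])
    return np.array_split(pairs, n_chunks)

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    return float(np.linalg.norm(a - b))

def process_chunk(X, chunk):
    """Both measures for every (i, j) pair assigned to this chunk."""
    return [(i, j, cosine_similarity(X[i], X[j]), euclidean_distance(X[i], X[j]))
            for i, j in chunk]

# Example: 6 patients with 4 features each, workload split into 3 chunks
X = np.random.rand(6, 4)
results = [row for chunk in pairwise_chunks(len(X), 3) for row in process_chunk(X, chunk)]
```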
  • 2018 Journal Article
    Tashkandi, Araek; Wiese, Ingmar; Wiese, Lena (2018): "Efficient In-Database Patient Similarity Analysis for Personalized Medical Decision Support Systems." Big Data Research 13, pp. 52–64. DOI: 10.1016/j.bdr.2018.05.001. ISSN: 2214-5796. URI: https://resolver.sub.uni-goettingen.de/purl?gro-2/65960
    Abstract: Patient similarity analysis is a precondition for applying machine learning technology to medical data. In this sense, patient similarity analysis harnesses the information wealth of electronic medical records (EMRs) to support medical decision making. A pairwise similarity computation can be used as the basis for personalized health prediction. With n patients, n · (n − 1) / 2 similarity calculations are required. Thus, analyzing patient similarity leads to a data explosion when exploiting big data. As the data size increases, the computational burden of this analysis grows; a real-life medical application may exceed the limits of current hardware in a fairly short amount of time. Finding ways to optimize patient similarity analysis and to handle this data explosion is the topic of this paper. Current implementations of patient similarity analysis require their users to have knowledge of complex data analysis tools. Moreover, data pre-processing and analysis are performed under synthetic conditions: the data are extracted from the EMR database, and then data preparation and analysis are carried out in external tools. After all of this effort the users might still not obtain superior performance of the patient similarity analysis. We propose methods to optimize patient similarity analysis in order to make it scalable to big data. Our methods were tested on two real datasets and achieved low execution times. Our result hence benefits a comprehensive medical decision support system. Moreover, our implementation strikes a balance between performance and applicability: the majority of the workload is processed within a database management system to enable a direct implementation on an EMR database.
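Since this abstract emphasizes that the majority of the workload is processed inside the database management system, a minimal sketch of what in-database pairwise cosine similarity could look like is given below: a SQL self-join issued from Python via psycopg2. The table name "measurements" and its columns (patient_id, feature_id, value) are illustrative assumptions, not the schema or the exact queries used in the paper, and the query assumes every patient has a value for every feature.

```python
# Sketch: compute cosine similarity for all n*(n-1)/2 patient pairs
# inside PostgreSQL via a self-join on a long-format measurement table.
import psycopg2

PAIRWISE_COSINE_SQL = """
SELECT a.patient_id AS p1,
       b.patient_id AS p2,
       SUM(a.value * b.value)
         / (SQRT(SUM(a.value * a.value)) * SQRT(SUM(b.value * b.value)))
         AS cosine_similarity
FROM measurements a
JOIN measurements b
  ON a.feature_id = b.feature_id
 AND a.patient_id < b.patient_id   -- each unordered pair exactly once
GROUP BY a.patient_id, b.patient_id;
"""

def patient_similarities(dsn):
    """Run the pairwise similarity query inside the database and return
    (patient, patient, similarity) triples to the client."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(PAIRWISE_COSINE_SQL)
        return cur.fetchall()
```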