  • 2018 Journal Article
    [["dc.bibliographiccitation.firstpage","297"],["dc.bibliographiccitation.issue","2"],["dc.bibliographiccitation.journal","Distributed and Parallel Databases"],["dc.bibliographiccitation.lastpage","321"],["dc.bibliographiccitation.volume","37"],["dc.contributor.author","Wiese, Ingmar"],["dc.contributor.author","Sarna, Nicole"],["dc.contributor.author","Wiese, Lena"],["dc.contributor.author","Tashkandi, Araek"],["dc.contributor.author","Sax, Ulrich"],["dc.date.accessioned","2020-05-25T13:20:07Z"],["dc.date.available","2020-05-25T13:20:07Z"],["dc.date.issued","2018"],["dc.description.abstract","Efficient identification of cohorts of similar patients is a major precondition for personalized medicine. In order to train prediction models on a given medical data set, similarities have to be calculated for every pair of patients, which results in a roughly quadratic data blowup. In this paper we discuss the topic of in-database patient similarity analysis ranging from data extraction to implementing and optimizing the similarity calculations in SQL. In particular, we introduce the notion of chunking that uniformly distributes the workload among the individual similarity calculations. Our benchmark comprises the application of one similarity measure (cosine similarity) and one distance metric (Euclidean distance) on two real-world data sets; it compares the performance of a column store (MonetDB) and a row store (PostgreSQL) with two external data mining tools (ELKI and Apache Mahout)."],["dc.identifier.doi","10.1007/s10619-018-7249-x"],["dc.identifier.uri","https://resolver.sub.uni-goettingen.de/purl?gro-2/65959"],["dc.language.iso","en"],["dc.relation.issn","0926-8782"],["dc.title","Concept acquisition and improved in-database similarity analysis for medical data"],["dc.type","journal_article"],["dc.type.internalPublication","yes"],["dspace.entity.type","Publication"]]
  • 2018 Journal Article
    [["dc.bibliographiccitation.firstpage","52"],["dc.bibliographiccitation.journal","Big Data Research"],["dc.bibliographiccitation.lastpage","64"],["dc.bibliographiccitation.volume","13"],["dc.contributor.author","Tashkandi, Araek"],["dc.contributor.author","Wiese, Ingmar"],["dc.contributor.author","Wiese, Lena"],["dc.date.accessioned","2020-05-25T13:22:11Z"],["dc.date.available","2020-05-25T13:22:11Z"],["dc.date.issued","2018"],["dc.description.abstract","Patient similarity analysis is a precondition to apply machine learning technology on medical data. In this sense, patient similarity analysis harnesses the information wealth of electronic medical records (EMRs) to support medical decision making. A pairwise similarity computation can be used as the basis for personalized health prediction. With n patients, the number of similarity calculations required grows roughly quadratically in n. Thus, analyzing patient similarity leads to data explosion when exploiting big data. By increasing the data size the computational burden of this analysis increases. A real-life medical application may exceed the limits of current hardware in a fairly short amount of time. Finding ways to optimize patient similarity analysis and handling this data explosion is the topic of this paper. Current implementations for patient similarity analysis require their users to have knowledge of complex data analysis tools. Moreover, data pre-processing and analysis are performed in synthetic conditions: the data are extracted from the EMR database and then the data preparation and analysis are processed in external tools. After all of this effort the users might not experience a superior performance of the patient similarity analysis. We propose methods to optimize the patient similarity analysis in order to make it scalable to big data. Our method was tested against two real datasets and a low execution time was accomplished. Our result hence benefits a comprehensive medical decision support system. Moreover, our implementation comprises a balance between performance and applicability: the majority of the workload is processed within a database management system to enable a direct implementation on an EMR database."],["dc.identifier.doi","10.1016/j.bdr.2018.05.001"],["dc.identifier.uri","https://resolver.sub.uni-goettingen.de/purl?gro-2/65960"],["dc.language.iso","en"],["dc.relation.issn","2214-5796"],["dc.title","Efficient In-Database Patient Similarity Analysis for Personalized Medical Decision Support Systems"],["dc.type","journal_article"],["dc.type.internalPublication","yes"],["dspace.entity.type","Publication"]]