Concept acquisition and improved in-database similarity analysis for medical data

Wiese, Ingmar; Sarna, Nicole; Wiese, Lena; Tashkandi, Araek; Sax, Ulrich

doi:10.1007/s10619-018-7249-x

ISSN

0926-8782

Date Issued

2018

Author(s)

Wiese, Ingmar

Sarna, Nicole

Wiese, Lena

Tashkandi, Araek

Sax, Ulrich

DOI

10.1007/s10619-018-7249-x

Abstract

Efficient identification of cohorts of similar patients is a major precondition for personalized medicine. To train prediction models on a given medical data set, similarities have to be calculated for every pair of patients, which results in a roughly quadratic data blowup. In this paper we discuss in-database patient similarity analysis, ranging from data extraction to implementing and optimizing the similarity calculations in SQL. In particular, we introduce the notion of chunking that uniformly distributes the workload among the individual similarity calculations. Our benchmark applies one similarity measure (cosine similarity) and one distance metric (Euclidean distance) to two real-world data sets; it compares the performance of a column store (MonetDB) and a row store (PostgreSQL) with two external data mining tools (ELKI and Apache Mahout).
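As a rough illustration of the two measures named in the abstract and of why the pairwise computation grows quadratically, the following minimal Python sketch computes cosine similarity and Euclidean distance for every pair of patients. The patient identifiers, feature vectors, and function names are hypothetical and chosen only for this example; the paper itself implements the calculations in SQL, and its chunking technique is not reproduced here.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention assumed for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise(patients, measure):
    """Apply `measure` to every unordered pair of patients.

    For n patients this yields n * (n - 1) / 2 results, which is the
    roughly quadratic growth mentioned in the abstract.
    """
    return {
        (p, q): measure(patients[p], patients[q])
        for p, q in combinations(sorted(patients), 2)
    }


# Hypothetical toy data: patient id -> numeric feature vector.
patients = {
    "p1": [1.0, 0.0, 2.0],
    "p2": [2.0, 0.0, 4.0],
    "p3": [0.0, 3.0, 0.0],
}

cos = pairwise(patients, cosine_similarity)
dist = pairwise(patients, euclidean_distance)

for pair in cos:
    print(pair, round(cos[pair], 3), round(dist[pair], 3))
```

With three patients the sketch produces three pairs; doubling the number of patients roughly quadruples the number of pairs, which is why the workload per calculation matters at scale.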
