Options
Diagnosing Cloud Performance Anomalies Using Large Time Series Dataset Analysis
Journal
IEEE 7th International Conference on Cloud Computing
Date Issued
2014
Author(s)
DOI
10.1109/CLOUD.2014.129
Abstract
Virtualized Cloud platforms have become increasingly common and the number of online services hosted on these platforms is also increasing rapidly. A key problem faced by providers in managing these services is detecting the performance anomalies and adjusting resources accordingly. As online services generate a very large amount of monitored data in the form of time series, it becomes very difficult to process this complex data by traditional approaches. In this work, we present a novel distributed parallel approach for performance anomaly detection. We build upon Holt-Winters forecasting for automatic aberrant behavior detection in time series. First, we extend the technique to work with MapReduce paradigm. Next, we correlate the anomalous metrics with the target Service Level Objective (SLO) in order to locate the suspicious metrics. We implemented and evaluated our approach on a production Cloud encompassing IaaS and PaaS service models. Experimental results confirm that our approach is efficient and effective in capturing the metrics causing performance anomalies in large time series datasets.