Scalable Containerized Pipeline for Real-time Big Data Analytics
Date
2022
Author
Aurangzaib, Rana
Iqbal, Waheed
Abdullah, Muhammad
Bukhari, Faisal
Ullah, Faheem
Erradi, Abdelkarim
Abstract
With the widespread use of IoT, processing data streams in real time has become very important. Traditional data-stream processing systems are inefficient at processing big data for real-time anomaly detection, classification, clustering, and prediction using minimal resources. In this paper, we address this limitation by proposing a scalable pipeline for real-time processing of big data streams. The proposed solution dynamically manages resources for the different components of the pipeline using automatic scaling. The pipeline is containerized and deployed on a Kubernetes cluster. It is evaluated using a case study of anomaly detection in IoT data. The proposed solution yields a 1.31x to 2.4x increase in throughput and a 32x to 80x decrease in latency compared to the commonly used static resource allocation strategy for data pipelines. © 2022 IEEE.
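The abstract does not state which scaling policy the pipeline uses. As an illustration only, the sketch below applies the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current replicas x current metric / target metric), independently to each pipeline component. The stage names, the metric values, and the replica bounds are hypothetical and are not taken from the paper.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Proportional scaling rule in the style of the Kubernetes HPA.

    All parameter names and defaults here are assumptions for illustration.
    """
    if current_replicas < 1 or target_metric <= 0:
        raise ValueError("need at least one replica and a positive target")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the allowed range for this component.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical pipeline stages: (current replicas, observed metric per replica).
    stages = {"ingest": (2, 90.0), "detect": (4, 30.0), "store": (3, 63.0)}
    target = 60.0  # hypothetical per-replica target, e.g. CPU percent
    for name, (replicas, metric) in stages.items():
        print(name, desired_replicas(replicas, metric, target))
```

Scaling each stage separately, rather than fixing replica counts up front, is the contrast with static resource allocation that the abstract draws; the actual policy and metrics used by the authors may differ.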
Collections
- Computer Science & Engineering [2402 items]