A Deep Recurrent Neural Network Based Predictive Control Framework for Reliable Distributed Stream Data Processing
Abstract
In this paper, we present the design, implementation, and evaluation of a novel predictive control framework for reliable distributed stream data processing. The framework features a Deep Recurrent Neural Network (DRNN) model for performance prediction and dynamic grouping for flexible control. Specifically, we present a novel DRNN model that makes accurate performance predictions from multilevel runtime statistics, with careful consideration of the interference among co-located worker processes. Moreover, we design a new grouping method, dynamic grouping, which can distribute and re-distribute data tuples to downstream tasks according to any given split ratio on the fly, so it can be used to redirect data tuples around misbehaving workers. We implemented the proposed framework on a widely used Distributed Stream Data Processing System (DSDPS), Storm. For validation and performance evaluation, we developed two representative stream data processing applications: Windowed URL Count and Continuous Queries. Extensive experimental results show that: 1) the proposed DRNN model outperforms widely used baseline solutions, ARIMA and SVR, in terms of prediction accuracy; 2) dynamic grouping works as expected; and 3) the proposed framework enhances reliability, incurring only minor performance degradation in the presence of misbehaving workers.
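The abstract does not say how dynamic grouping is realized. As a rough illustration only, the idea of routing tuples by an adjustable split ratio could be sketched as below. The class `DynamicGrouper`, its methods, and the choice of a smooth weighted round-robin scheduler are all assumptions made for this sketch; they are not taken from the paper and are not part of Storm's API.

```python
class DynamicGrouper:
    """Hypothetical ratio-based router (illustrative, not the paper's code).

    Each call to choose_task() returns the index of a downstream task.
    Over many calls, the share of tuples sent to task i approaches
    ratios[i] / sum(ratios).
    """

    def __init__(self, ratios):
        self.set_ratios(ratios)

    def set_ratios(self, ratios):
        """Install a new split ratio on the fly; takes effect immediately."""
        if not ratios or any(r < 0 for r in ratios) or sum(ratios) <= 0:
            raise ValueError("ratios must be non-negative with a positive sum")
        total = float(sum(ratios))
        self.ratios = [r / total for r in ratios]
        # Per-task credit used by smooth weighted round-robin.
        self.credit = [0.0] * len(ratios)

    def choose_task(self):
        """Pick the downstream task for the next tuple."""
        # Every task earns credit in proportion to its share.
        for i, share in enumerate(self.ratios):
            self.credit[i] += share
        # The task with the most credit receives the tuple and pays for it.
        task = max(range(len(self.credit)), key=self.credit.__getitem__)
        self.credit[task] -= 1.0
        return task


if __name__ == "__main__":
    grouper = DynamicGrouper([5, 3, 2])
    counts = [0, 0, 0]
    for _ in range(1000):
        counts[grouper.choose_task()] += 1
    print("tuples per task:", counts)

    # Assume a controller has flagged task 1 as misbehaving: bypass it.
    grouper.set_ratios([1, 0, 1])
    print("after update:", [grouper.choose_task() for _ in range(6)])
```

In this sketch, setting a task's ratio to zero removes it from the rotation, which mirrors the bypass behavior described in the abstract. A deterministic scheduler is used here instead of random sampling so that the realized split tracks the target ratio closely even over short windows.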
Related Papers
- Approximate join processing over data streams (2003), 277 cited
- A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams (2004), 138 cited
- Frequency-based load shedding over a data stream of tuples (2009), 10 cited
- Mining Closed Item sets from Tuple-Evolving Data Streams (2019), 1 cited
- Trends in Data Stream Mining (2023)