Towards Provenance-Based Anomaly Detection in MapReduce
Top 10% of 2015 papers by citations
Abstract
MapReduce enables parallel and distributed processing of vast amounts of data on a cluster of machines. However, this computing paradigm is exposed to threats from malicious or cheating nodes and from compromised user-submitted code, any of which could tamper with data and computation. Because the computation is carried out in a distributed fashion, users retain little control over it. In this paper, we focus on the analysis and detection of anomalies during MapReduce computation. To this end, we develop a computational provenance system that captures provenance data related to MapReduce computation within the MapReduce framework in Hadoop. In particular, we identify a set of invariants over aggregated provenance information and later analyze them to uncover anomalies that indicate possible tampering with data and computation. We conduct a series of experiments to demonstrate the efficiency and effectiveness of the proposed provenance system.
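The abstract does not spell out which invariants are used. As a minimal sketch, assuming each task attempt is logged as one provenance record (input digest, output digest, record counts, executing node), the Python below checks two hypothetical invariants of the kind described: redundant attempts of the same task on the same input must agree on their output, and, assuming no combiner is configured, the number of records emitted by all map tasks must equal the number consumed by all reduce tasks. `ProvenanceRecord`, `check_replica_consistency`, and `check_record_conservation` are illustrative names, not Hadoop APIs and not the paper's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical provenance entry for one task attempt (field names are illustrative)."""
    task_id: str        # logical task, e.g. "m_0" or "r_0"
    node: str           # worker node that ran this attempt
    phase: str          # "map" or "reduce"
    input_digest: str   # hash of the input split / shuffled partition consumed
    output_digest: str  # hash of the output produced by this attempt
    records_in: int     # key-value pairs consumed
    records_out: int    # key-value pairs emitted


def digest(data: bytes) -> str:
    """SHA-256 digest used to summarize task inputs and outputs."""
    return hashlib.sha256(data).hexdigest()


def check_replica_consistency(records):
    """Invariant A (assumed): attempts of a task on the same input produce the same output.

    Returns (task_id, sorted nodes) for every task whose attempts disagree.
    """
    groups = defaultdict(list)
    for r in records:
        groups[(r.task_id, r.input_digest)].append(r)
    violations = []
    for (task_id, _), attempts in groups.items():
        if len({a.output_digest for a in attempts}) > 1:
            violations.append((task_id, sorted(a.node for a in attempts)))
    return violations


def check_record_conservation(records):
    """Invariant B (assumes no combiner): map output count == reduce input count.

    Keeps only the first attempt seen per task so replicas are not double-counted;
    disagreeing replicas are flagged by check_replica_consistency instead.
    Returns (holds, map_output_records, reduce_input_records).
    """
    first = {}
    for r in records:
        first.setdefault((r.phase, r.task_id), r)
    map_out = sum(r.records_out for r in first.values() if r.phase == "map")
    reduce_in = sum(r.records_in for r in first.values() if r.phase == "reduce")
    return map_out == reduce_in, map_out, reduce_in


if __name__ == "__main__":
    split = digest(b"split-0")
    records = [
        ProvenanceRecord("m_0", "node-a", "map", split, digest(b"out"), 100, 100),
        # A redundant attempt on another node returns a different (tampered) output.
        ProvenanceRecord("m_0", "node-b", "map", split, digest(b"forged"), 100, 100),
        # The reducer consumes fewer records than the mappers emitted.
        ProvenanceRecord("r_0", "node-c", "reduce", digest(b"part-0"), digest(b"res"), 90, 5),
    ]
    print(check_replica_consistency(records))  # [('m_0', ['node-a', 'node-b'])]
    print(check_record_conservation(records))  # (False, 100, 90)
```

Aggregating per task before comparing is what lets a single disagreeing node stand out: neither check can tell which of two disagreeing attempts is honest, only that the provenance is inconsistent and which nodes were involved.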