Performance evaluation of big data frameworks for large-scale data analytics
Citations over time: top 10% of 2016 papers
Abstract
The increasing adoption of Big Data analytics has created high demand for efficient technologies to manage and process large datasets. Popular MapReduce frameworks such as Hadoop are being replaced by emerging ones like Spark and Flink, which improve both the programming APIs and performance. However, few works have compared these frameworks. This paper addresses this gap with a comparative evaluation of Hadoop, Spark and Flink using representative Big Data workloads, considering factors such as performance and scalability. Moreover, the behavior of these frameworks has been characterized by varying some of the main parameters of the workloads, such as HDFS block size, input data size, interconnect network and thread configuration. The results show that replacing Hadoop with Spark or Flink can reduce execution times by 77% and 70% on average, respectively, for non-sort benchmarks.
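To put the reported reductions in perspective, a percentage reduction in execution time can be converted into an equivalent speedup factor. The following is a minimal sketch, not the paper's own methodology: the function names are hypothetical, and it assumes the reduction is defined as (baseline time - new time) / baseline time.

```python
def reduction_pct(baseline_s: float, new_s: float) -> float:
    """Percentage reduction in execution time relative to the baseline.

    Assumed definition: 100 * (baseline - new) / baseline.
    """
    if baseline_s <= 0:
        raise ValueError("baseline time must be positive")
    return 100.0 * (baseline_s - new_s) / baseline_s


def speedup_from_reduction(pct: float) -> float:
    """Speedup factor (baseline time / new time) implied by a percentage reduction."""
    if not 0.0 <= pct < 100.0:
        raise ValueError("reduction must be in [0, 100)")
    return 1.0 / (1.0 - pct / 100.0)


if __name__ == "__main__":
    # The 77% and 70% figures come from the abstract; everything else is illustrative.
    for framework, pct in [("Spark", 77.0), ("Flink", 70.0)]:
        print(f"{framework}: {pct:.0f}% reduction ~ {speedup_from_reduction(pct):.2f}x speedup")
```

Under this definition, an average reduction of 77% corresponds to running roughly 4.3 times faster than the baseline, and 70% to roughly 3.3 times faster. Note that an average of per-benchmark reductions is not the same as the reduction in total execution time, so these factors are indicative only.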