Towards Building a Scalable Data Analytics System on Clouds: An Early Experience on AliCloud
Abstract
With the growth of big data, processing systems such as Hadoop and Spark are widely used to handle large-scale data. To spare users the complexity and expense of building their own big data processing systems, cloud providers deploy big data processing tools as cloud services. Typical examples include Amazon EMR, Azure HDInsight, and AliCloud E-MapReduce. However, building a cost-efficient system and scaling it remain challenging. In this paper, we conduct a case study on AliCloud E-MapReduce and analyze system performance on local and remote file systems. We compare the scalability of Hadoop and Spark using scale-out and scale-up strategies, respectively. Based on the analysis results, we derive several observations and implications that can help guide performance optimization.
Related Papers
- Big data anonymization with spark (2017), cited 19 times
- A Comparative Study of Bigdata Tools: Hadoop Vs Spark Vs Storm (2023), cited 7 times
- Big Data Analysis using Apache Hadoop and Spark (2019), cited 5 times
- SmarT: Machine Learning Approach for Efficient Filtering and Retrieval of Spatial and Temporal Data in Big Data (2021), cited 2 times
- Introduction to Bigdata and Relation with IoT (2018)