An Evaluation of Cassandra for Hadoop
Abstract
In the last decade, the growing use of social media, unconventional web technologies, and mobile applications has encouraged the development of a new breed of database models. NoSQL data stores target unstructured data, which is dynamic by nature and a key focus area for "Big Data" research. New-generation data can prove costly and impractical to administer with SQL databases because it lacks structure and demands high scalability and elasticity. NoSQL data stores such as MongoDB and Cassandra provide a desirable platform for fast and efficient data queries, which has increased their importance in areas such as cloud applications, e-commerce, social media, bioinformatics, and materials science. In an effort to combine the querying capabilities of conventional database systems with the processing power of the MapReduce model, this paper presents a thorough evaluation of the Cassandra NoSQL database when used in conjunction with the Hadoop MapReduce engine. We characterize its performance for a wide range of representative use cases, then compare and contrast the results so that application developers can make informed decisions based on data size, cluster size, replication factor, and partitioning strategy to meet their performance needs.