A Review on Data Cleansing Methods for Big Data
Abstract
Massive amounts of data are available to organizations and influence their business decisions. However, data collected from various sources are often dirty, which degrades the accuracy of prediction results. Data cleansing provides better data quality and helps an organization ensure its data are ready for the analysis phase. However, the volume of data collected by organizations grows every year, making most existing methods no longer suitable for big data. The data cleansing process mainly consists of identifying errors, detecting them, and correcting them. Although the data need to be analyzed quickly, the cleansing process is complex and time-consuming, since the cleansed data must reach a higher level of quality. The role of the domain expert in the data cleansing process is essential, as verification and validation of the cleansed data are the main concerns. This paper reviews the data cleansing process, the challenges of data cleansing for big data, and the available data cleansing methods.
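The identify-detect-correct cycle described in the abstract can be sketched in a few lines. The following is a minimal illustrative example, not taken from the paper: the rule, record fields, and fix function are all hypothetical, and in practice a domain expert would define the quality rules and validate the corrections.

```python
# Hypothetical sketch of the cleansing cycle: rules identify what counts
# as an error, detection finds violating records, correction repairs them.

def detect_errors(records, rules):
    """Return indices of records that violate any quality rule."""
    return [i for i, rec in enumerate(records)
            if any(not rule(rec) for rule in rules)]

def correct_errors(records, rules, fixes):
    """Apply the fix paired with each violated rule to dirty records."""
    cleaned = list(records)
    for i in detect_errors(records, rules):
        for rule, fix in zip(rules, fixes):
            if not rule(cleaned[i]):
                cleaned[i] = fix(cleaned[i])
    return cleaned

# Example rule: "age" must be present and non-negative.
records = [{"age": 34}, {"age": -1}, {"age": None}]
rules = [lambda r: r["age"] is not None and r["age"] >= 0]
fixes = [lambda r: {**r, "age": 0}]  # impute a default; a domain expert would choose this
print(correct_errors(records, rules, fixes))
```

The verification step the abstract emphasizes corresponds to re-running `detect_errors` on the cleaned output and confirming no violations remain.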