Analysis of Optimized Machine Learning and Deep Learning Techniques for Spam Detection
Citations: top 10% of 2021 papers
Abstract
Distinguishing spam from non-spam (ham) email is one of the most challenging tasks for both email service providers and consumers. Spammers try to spread misleading information through irritating messages designed to attract users' attention. Several spam identification models have been proposed and tested, but their reported accuracy shows that further work is needed to improve accuracy, reduce training time, and lower the error rate. In this work, we propose a model that classifies email as spam or ham. DBSCAN and Isolation Forest are used to detect outliers, that is, extreme values that fall outside the expected range. Heatmap, Recursive Feature Elimination (RFE), and Chi-Square feature selection techniques are used to select the most informative features. The proposed model is implemented with both machine learning and deep learning approaches to enable a comparative analysis. In the machine learning implementation, Multinomial Naïve Bayes (MNB), Random Forest (RF), K-Nearest Neighbor (KNN), and Gradient Boosting (GB) are combined into an ensemble; the deep learning implementation uses a Recurrent Neural Network (RNN), Gradient Descent (GD), and an Artificial Neural Network (ANN). The ensemble combines the outputs of multiple classifiers, which allows it to achieve better prediction accuracy than a single classifier. On the Spambase email dataset from the UCI Machine Learning Repository, the proposed model achieved an accuracy of 100%, an AUC of 100%, an MSE of 0, and an RMSE of 0 for the machine learning implementation, and an accuracy of 99% with a loss of 0.0165 for the deep learning implementation.
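The abstract names the preprocessing techniques but not their settings or order. A minimal sketch of that stage follows, assuming scikit-learn, a synthetic 57-feature stand-in for Spambase, and hypothetical parameters (Isolation Forest contamination of 0.05, a DBSCAN `eps` taken from 5-nearest-neighbor distances, a correlation cutoff of 0.9, 30 chi-square features, 20 RFE features). The "heatmap" step is interpreted here as dropping highly correlated features from the correlation matrix a heatmap would display; that interpretation, and chaining the three selectors in sequence, are assumptions rather than the authors' stated method.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic stand-in for Spambase (57 features, binary spam/ham label).
X, y = make_classification(n_samples=1000, n_features=57, n_informative=15,
                           random_state=0)
X = MinMaxScaler().fit_transform(X)  # chi2 requires non-negative inputs

# --- Outlier removal (all thresholds below are hypothetical) ---------------
# Isolation Forest labels outliers as -1 and inliers as 1.
iso_keep = IsolationForest(contamination=0.05, random_state=0).fit_predict(X) == 1

# DBSCAN labels noise points (outside every dense region) as -1.
# eps = 95th percentile of 5-NN distances (self included, matching how
# DBSCAN counts min_samples); this heuristic is an assumption.
X_std = StandardScaler().fit_transform(X)
knn_dist, _ = NearestNeighbors(n_neighbors=5).fit(X_std).kneighbors(X_std)
eps = np.percentile(knn_dist[:, -1], 95)
db_keep = DBSCAN(eps=eps, min_samples=5).fit_predict(X_std) != -1

keep = iso_keep & db_keep
X_clean, y_clean = X[keep], y[keep]

# --- Feature selection (assumed order: correlation -> chi2 -> RFE) ---------
# 1) Correlation filter: drop a feature if |r| > 0.9 with any earlier feature.
corr = np.abs(np.corrcoef(X_clean, rowvar=False))
upper = np.triu(corr, k=1)
corr_keep = ~(upper > 0.9).any(axis=0)
X_sel = X_clean[:, corr_keep]

# 2) Chi-square filter: keep the 30 highest-scoring features.
chi_selector = SelectKBest(chi2, k=min(30, X_sel.shape[1])).fit(X_sel, y_clean)
X_sel = chi_selector.transform(X_sel)

# 3) Recursive Feature Elimination down to 20 features.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=20)
X_final = rfe.fit(X_sel, y_clean).transform(X_sel)

print(X.shape, "->", X_final.shape)
```

Min-max scaling is applied first because `chi2` rejects negative values; the real Spambase features (word and character frequencies, capital-run lengths) are already non-negative.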
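The abstract does not say how the four classifiers' outputs are combined. A minimal sketch, assuming scikit-learn's `VotingClassifier` with soft voting (averaged class probabilities), a synthetic non-negative dataset in place of Spambase, and hypothetical hyperparameters, is shown here. It also illustrates why the reported MSE and RMSE of 0 go together with 100% accuracy: when both labels and predictions are 0/1, the MSE equals the misclassification rate, so MSE = 1 − accuracy.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.metrics import accuracy_score, mean_squared_error, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for Spambase; MultinomialNB needs non-negative features.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           random_state=0)
X = MinMaxScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                          random_state=0)

# Ensemble of the four classifiers named in the abstract (settings hypothetical).
ensemble = VotingClassifier(
    estimators=[
        ("mnb", MultinomialNB()),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",  # average the members' predicted class probabilities
)
ensemble.fit(X_tr, y_tr)

y_pred = ensemble.predict(X_te)
y_prob = ensemble.predict_proba(X_te)[:, 1]

acc = accuracy_score(y_te, y_pred)
auc = roc_auc_score(y_te, y_prob)
mse = mean_squared_error(y_te, y_pred)  # on 0/1 labels: fraction misclassified
rmse = np.sqrt(mse)
print(f"accuracy={acc:.4f} AUC={auc:.4f} MSE={mse:.4f} RMSE={rmse:.4f}")
```

Hard voting (`voting="hard"`) would take a majority vote of predicted labels instead; soft voting requires every member to implement `predict_proba`, which all four classifiers here do.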
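For the deep learning branch, the abstract lists an ANN and gradient descent but gives no architecture. A minimal sketch of the ANN part follows, assuming scikit-learn's `MLPClassifier` trained with stochastic gradient descent as a lightweight stand-in for whatever deep learning framework the authors used; the layer sizes, learning rate, and synthetic data are hypothetical, and the RNN variant is not sketched. The fitted model's `loss_` attribute gives the final training loss (cross-entropy plus a small L2 penalty), which is comparable in kind to the loss value the abstract reports, although the paper's exact loss definition is not stated.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for Spambase (57 features, binary label).
X, y = make_classification(n_samples=1000, n_features=57, n_informative=15,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                          random_state=0)

# Fit the scaler on training data only, so test statistics do not leak.
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

# Small feed-forward ANN trained with mini-batch SGD plus momentum.
# Hypothetical architecture and learning rate; not taken from the paper.
ann = MLPClassifier(hidden_layer_sizes=(64, 32), activation="relu",
                    solver="sgd", learning_rate_init=0.01, momentum=0.9,
                    max_iter=300, random_state=0)
ann.fit(X_tr, y_tr)

acc = accuracy_score(y_te, ann.predict(X_te))
print(f"accuracy={acc:.4f} final training loss={ann.loss_:.4f}")
```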