Relaxed online SVMs for spam filtering
Citations over time: top 1% of 2007 papers
Abstract
Spam is a key problem in electronic communication, including large-scale email systems and the growing number of blogs. Content-based filtering is one reliable method of combating this threat in its various forms, but some academic researchers and industrial practitioners disagree on how best to filter spam. The former have advocated the use of Support Vector Machines (SVMs) for content-based filtering, as this machine learning methodology gives state-of-the-art performance for text classification. However, similar performance gains have yet to be demonstrated for online spam filtering. Additionally, practitioners cite the high computational cost of SVMs as a reason to prefer faster (if less statistically robust) Bayesian methods. In this paper, we offer a resolution to this controversy. First, we show that online SVMs do give state-of-the-art classification performance for online spam filtering on large benchmark data sets. Second, we show that nearly equivalent performance can be achieved by a Relaxed Online SVM (ROSVM) at greatly reduced computational cost. We verify these results experimentally on email spam, blog spam, and splog detection tasks.
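The abstract contrasts full online SVMs with a cheaper relaxed variant but does not spell out the mechanics. As a minimal sketch of the general idea, under assumptions the abstract does not state, the Python below trains a linear SVM online, one message at a time: each message is classified first and then learned from its true label. The feature map (binary hashed character 4-grams via `zlib.crc32`, with a hypothetical size `DIM`), the hinge-loss stochastic gradient update with fixed `eta` and `lam`, and the `margin` knob are all illustrative choices; this is not the authors' ROSVM training procedure. Setting `margin` below 1 skips updates on messages that are already classified with some confidence, which is one simple way to trade training signal for fewer updates.

```python
import math
import zlib

DIM = 2 ** 18  # hypothetical size of the hashed feature space


def features(text, n=4):
    """Binary hashed character n-grams of `text`, L2-normalized (assumed feature map)."""
    data = text.encode("utf-8")
    grams = {zlib.crc32(data[i:i + n]) % DIM
             for i in range(max(1, len(data) - n + 1))}
    value = 1.0 / math.sqrt(len(grams))  # every active feature gets the same weight
    return {j: value for j in grams}


class OnlineLinearSVM:
    """Linear SVM learned online by hinge-loss SGD with L2 regularization.

    Weights are stored as scale * v, so the L2 shrink step is O(1)
    instead of touching every stored weight.
    """

    def __init__(self, lam=1e-4, eta=0.5, margin=1.0):
        self.lam = lam        # regularization strength (assumption)
        self.eta = eta        # fixed learning rate (assumption)
        self.margin = margin  # update only when y * f(x) < margin
        self.v = {}
        self.scale = 1.0
        self.updates = 0      # number of gradient steps actually taken

    def score(self, x):
        """Decision value f(x) = w . x for a sparse feature dict x."""
        return self.scale * sum(self.v.get(j, 0.0) * xj for j, xj in x.items())

    def learn(self, x, y):
        """Update on one labeled example; y is +1 for spam, -1 for ham."""
        if y * self.score(x) >= self.margin:
            return  # already beyond the margin: skip the update entirely
        self.updates += 1
        # Shrink w by (1 - eta * lam); applied only on update steps, a simplification.
        self.scale *= 1.0 - self.eta * self.lam
        # Hinge-loss subgradient step: w += eta * y * x.
        step = self.eta * y / self.scale
        for j, xj in x.items():
            self.v[j] = self.v.get(j, 0.0) + step * xj
        if self.scale < 1e-6:  # fold the scale into v to avoid underflow
            self.v = {j: w * self.scale for j, w in self.v.items()}
            self.scale = 1.0


def run_stream(model, stream):
    """Online protocol: classify each message, then learn from its true label."""
    mistakes = 0
    for text, y in stream:
        x = features(text)
        predicted = 1 if model.score(x) > 0 else -1
        if predicted != y:
            mistakes += 1
        model.learn(x, y)
    return mistakes


if __name__ == "__main__":
    stream = [("cheap meds buy now", 1), ("meeting moved to 3pm", -1)] * 20
    full, relaxed = OnlineLinearSVM(margin=1.0), OnlineLinearSVM(margin=0.25)
    print("full:", run_stream(full, stream), "mistakes,", full.updates, "updates")
    print("relaxed:", run_stream(relaxed, stream), "mistakes,", relaxed.updates, "updates")
```

Comparing the `updates` counters of the two models on the same stream shows how the relaxed setting reduces training work; whether accuracy is preserved on a real spam corpus would have to be measured, not assumed.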
Related Papers
- E-mail Spam Filtering using Genetic Algorithm based on Probabilistic Weights and Words Count (2020)
- Automatic Email Spam Classification Using Naïve Bayes (2023), 4 citations
- Combining Naive Bayes and Tri-gram Language Model for Spam Filtering (2011), 4 citations
- SMS Spam Detection using Machine Learning (2023), 2 citations
- PRIS Kidult Anti-SPAM Solution at the TREC 2005 Spam Track: Improving the Performance of Naive Bayes for Spam Detection (2005)