Performance Evaluation of Different Feature Encoding Schemes on Cybersecurity Logs
Abstract
Many cybersecurity logs contain a substantial volume of textual data about security events. This data must be converted to numerical form before machine learning (ML) algorithms can be applied. Feature encoding is the process of transforming textual data into numerical values that ML algorithms can consume, which can improve model accuracy. Researchers have used many approaches to convert textual data into numerical values, such as “Label Encoding”, “One Hot Encoding”, and “Binary Encoding”. These schemes are useful for handling large-scale text data. We apply these methods to cybersecurity datasets to determine which encoding scheme performs best when used with an ML classification algorithm for intrusion detection. Experimental results show that label encoding performed best, whereas one hot encoding was the least effective.
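To make the three schemes named in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names and the sample `protocols` column are hypothetical, categories are numbered in sorted order starting from 0, and the binary variant shown is a simplified one (library implementations may number categories differently).

```python
def _category_index(values):
    """Assign each distinct category an integer, in sorted order (assumption)."""
    return {cat: i for i, cat in enumerate(sorted(set(values)))}


def label_encode(values):
    """Label encoding: one integer per category, one output column."""
    index = _category_index(values)
    return [index[v] for v in values]


def one_hot_encode(values):
    """One hot encoding: one 0/1 column per category."""
    index = _category_index(values)
    n = len(index)
    return [[1 if index[v] == j else 0 for j in range(n)] for v in values]


def binary_encode(values):
    """Binary encoding: write the category's integer code in base 2,
    one column per bit (most significant bit first)."""
    index = _category_index(values)
    # Number of bits needed to represent codes 0 .. n-1 (at least one column).
    width = max(1, (len(index) - 1).bit_length())
    return [[(index[v] >> bit) & 1 for bit in reversed(range(width))]
            for v in values]


# Hypothetical log field: the protocol recorded for each event.
protocols = ["tcp", "udp", "icmp", "tcp"]

labels = label_encode(protocols)      # 1 column
one_hot = one_hot_encode(protocols)   # 3 columns (one per category)
binary = binary_encode(protocols)     # 2 columns (bits needed for 3 categories)
```

The sketch highlights the trade-off in dimensionality: for a field with n distinct values, label encoding yields one column, one hot encoding yields n columns, and this binary variant yields roughly log2(n) columns.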