Contemporary QSAR Classifiers Compared
Citations over time: top 1% of 2007 papers
Abstract
We present a comparative assessment of several state-of-the-art machine learning tools for mining drug data, including support vector machines (SVMs) and the ensemble decision tree methods boosting, bagging, and random forest, using eight data sets and two sets of descriptors. We demonstrate, by rigorous multiple comparison statistical tests, that these techniques can provide consistent improvements in predictive performance over single decision trees. However, among these methods, there is no clearly best-performing algorithm. This motivates a more in-depth investigation into the properties of random forests. We identify a set of parameters for the random forest that provides optimal performance across all the studied data sets. Additionally, the tree ensemble structure of the forest may provide an interpretable model, a considerable advantage over SVMs. We test this possibility and compare it with standard decision tree models.
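The abstract does not name the multiple comparison tests used. As an illustration only, the sketch below assumes a Friedman rank test, a common choice when several classifiers are compared across several data sets. The function names (`average_ranks`, `friedman_statistic`) and the accuracy values in the demo are hypothetical placeholders, not results or code from the paper.

```python
from typing import Dict, List, Tuple


def average_ranks(scores: List[float]) -> List[float]:
    """Rank scores so the highest gets rank 1; tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions in the run
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks


def friedman_statistic(table: Dict[str, List[float]]) -> Tuple[Dict[str, float], float]:
    """Return (mean rank per method, Friedman chi-square statistic).

    `table` maps a method name to its scores, one score per data set,
    with the data sets in the same order for every method.
    """
    names = list(table)
    k = len(names)            # number of methods
    n = len(table[names[0]])  # number of data sets
    rank_sums = [0.0] * k
    for d in range(n):
        row = [table[name][d] for name in names]
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    mean_ranks = [s / n for s in rank_sums]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return dict(zip(names, mean_ranks)), chi2


if __name__ == "__main__":
    # Placeholder accuracies (made up for illustration), one per data set.
    demo = {
        "single_tree": [0.70, 0.65, 0.72, 0.68],
        "random_forest": [0.80, 0.75, 0.81, 0.79],
        "svm": [0.78, 0.77, 0.80, 0.76],
    }
    ranks, chi2 = friedman_statistic(demo)
    print(ranks, chi2)
```

A large statistic relative to the chi-square distribution with k - 1 degrees of freedom suggests the methods do not all perform alike; a post-hoc pairwise test would then be needed to say which methods differ.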