Application of Random Forest Approach to QSAR Prediction of Aquatic Toxicity
Citations over time: top 10% of 2009 papers
Abstract
This work applies the random forest (RF) approach to QSAR analysis of the aquatic toxicity of chemical compounds tested on Tetrahymena pyriformis. Two-dimensional descriptors were generated with the simplex representation of molecular structure approach implemented in the HiT QSAR software. Adequate models based on simplex descriptors and the RF statistical approach were obtained on a modeling set of 644 compounds. Model predictivity was validated on two external test sets of 339 and 110 compounds. Lipophilicity and polarizability of the investigated compounds were found to have a high impact on toxicity. The RF models were shown to be tolerant to the insertion of irrelevant descriptors and to the randomization of a portion of the toxicity values, which represented "noise". A fast procedure for optimizing the number of trees in the random forest is proposed. The discussed RF model had comparable or better statistical characteristics than the corresponding PLS or KNN models.
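The abstract does not describe how the number of trees was optimized, so the following is only a minimal sketch of one common way to do it, not the authors' procedure: grow the forest one tree at a time, track the out-of-bag (OOB) error after each tree, and keep the smallest forest whose error is close to the best observed. Everything here is an assumption made for illustration: the function names (`fit_stump`, `oob_error_curve`, `pick_n_trees`), the 1% tolerance, the use of depth-1 trees in place of fully grown ones, and the synthetic data, which stands in for a descriptor table with one relevant and several irrelevant descriptors.

```python
import random

def fit_stump(X, y, rows, features):
    """Fit a depth-1 regression tree on `rows`, trying only `features`."""
    best = None
    for j in features:
        values = sorted({X[i][j] for i in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between distinct values
            left = [y[i] for i in rows if X[i][j] <= t]
            right = [y[i] for i in rows if X[i][j] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # every candidate feature was constant in this sample
        m = sum(y[i] for i in rows) / len(rows)
        return (0, float("inf"), m, m)
    return best[1:]

def predict_stump(stump, x):
    j, t, ml, mr = stump
    return ml if x[j] <= t else mr

def oob_error_curve(X, y, max_trees, n_try, seed=0):
    """Return the OOB mean squared error after each added tree."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    oob_sum, oob_cnt, curve = [0.0] * n, [0] * n, []
    for _ in range(max_trees):
        rows = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        features = rng.sample(range(p), n_try)        # random descriptor subset
        stump = fit_stump(X, y, rows, features)
        in_bag = set(rows)
        for i in range(n):
            if i not in in_bag:  # only trees that never saw row i vote on it
                oob_sum[i] += predict_stump(stump, X[i])
                oob_cnt[i] += 1
        scored = [i for i in range(n) if oob_cnt[i]]
        mse = (sum((y[i] - oob_sum[i] / oob_cnt[i]) ** 2 for i in scored) / len(scored)
               if scored else float("inf"))
        curve.append(mse)
    return curve

def pick_n_trees(curve, tol=0.01):
    """Smallest tree count whose OOB error is within `tol` (relative) of the minimum."""
    best = min(curve)
    for k, err in enumerate(curve, start=1):
        if err <= best * (1 + tol):
            return k

# Synthetic data (not from the paper): descriptor 0 drives the response,
# descriptors 1-4 are irrelevant.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(5)] for _ in range(120)]
y = [3.0 * row[0] + data_rng.gauss(0, 0.1) for row in X]

curve = oob_error_curve(X, y, max_trees=40, n_try=2)
n_trees = pick_n_trees(curve)
print(n_trees, round(curve[n_trees - 1], 4))
```

Because each tree's OOB predictions are accumulated incrementally, the whole error curve comes from a single pass over `max_trees` trees rather than from refitting a separate forest for every candidate size, which is what makes this kind of search cheap.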