Predicting the Accuracy of Ligand Overlay Methods with Random Forest Models
Abstract
The accuracy of binding mode prediction using standard molecular overlay methods (ROCS, FlexS, Phase, and FieldCompare) is studied. Previous work has shown that simple decision tree modeling can be used to improve accuracy by selecting the best overlay template. This concept is extended to the use of Random Forest (RF) modeling for template and algorithm selection. An extensive data set of 815 ligand-bound X-ray structures representing five gene families was used to generate ca. 70,000 overlays with the four programs. RF models, trained using standard measures of ligand and protein similarity and Lipinski-related descriptors, are used to automatically select the reference ligand and overlay method that maximize the probability of reproducing the overlay deduced from X-ray structures (i.e., using rmsd ≤ 2 Å as the criterion for success). RF model scores are highly predictive of overlay accuracy, and their use in template and method selection produces correct overlays in 57% of cases for 349 overlay ligands not used for training the RF models. Including protein sequence similarity in the models enables the use of templates bound to related protein structures, yielding useful results even for proteins having no available X-ray structures.
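The success criterion and the selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `Candidate` record, the coordinates, and the scores are all hypothetical, and the RMSD here assumes both poses share the same atom ordering and coordinate frame (no refitting, no symmetry correction). The `score` field stands in for whatever probability a trained RF model would assign to a template and method pair.

```python
import math
from typing import NamedTuple, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD in the units of the input (assumed Å), atoms matched by index."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (a - b) ** 2
        for p, r in zip(pose, reference)
        for a, b in zip(p, r)
    )
    return math.sqrt(squared / len(pose))


def is_success(pose: Sequence[Coord], reference: Sequence[Coord],
               cutoff: float = 2.0) -> bool:
    """Apply the rmsd <= 2 Å success criterion stated in the abstract."""
    return rmsd(pose, reference) <= cutoff


class Candidate(NamedTuple):
    template: str   # reference ligand identifier (hypothetical)
    method: str     # overlay program name
    score: float    # assumed model-predicted probability of a correct overlay


def select_overlay(candidates: Sequence[Candidate]) -> Candidate:
    """Pick the template/method pair with the highest predicted score."""
    return max(candidates, key=lambda c: c.score)


# Illustrative values only; none of these numbers come from the paper.
xray_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
overlay_pose = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0)]
candidates = [
    Candidate("lig_A", "ROCS", 0.41),
    Candidate("lig_B", "FlexS", 0.73),
    Candidate("lig_B", "Phase", 0.58),
]
best = select_overlay(candidates)
```

In this sketch, a pose is judged only after selection, which mirrors the prospective use case: the score ranks candidate overlays before any X-ray pose of the query ligand is known, and the RMSD check is applied afterwards for validation.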