An Extensive and Diverse Set of Molecular Overlays for the Validation of Pharmacophore Programs
Abstract
The pharmacophore hypothesis plays a central role in both the design and optimization of drug-like ligands. Pharmacophore patterns are invoked to explain the binding affinity of ligands and to enable the design of chemically distinct scaffolds that show affinity for a protein target of interest. The importance of pharmacophores in rationalizing ligand affinity has led to numerous algorithms that seek to overlay ligands based on their pharmacophoric features. All such algorithms must be validated with respect to known ligand overlays, usually by extracting ligand overlay sets from the Protein Data Bank (PDB). This validation step raises the problem of which known overlays to select and from which proteins. The large number of structures and protein families in the PDB makes it difficult to establish a definitive overlay set; as a result, validation studies have rarely employed the same data sets. We have therefore undertaken an exhaustive analysis of the RCSB PDB to identify 121 distinct ligand overlay sets. We have defined a robust protein overlay protocol, which is free from subjective interpretation of which residues to include, and we have analyzed each overlay set on the basis of whether it provides evidence for the pharmacophore hypothesis. Our final data set spans a broad range of structural types and degrees of difficulty and includes overlays that any algorithm should be able to reproduce, as well as some for which there is only very weak evidence for a conserved pharmacophore. We provide this set in the hope that it will prove definitive, at least until the PDB is greatly enriched with further structures or with radically different protein folds and families. Upon publication, the data set will be available for free download from the Web site of the Cambridge Crystallographic Data Centre.