Mapping of the Available Chemical Space versus the Chemical Universe of Lead‐Like Compounds
Citations over time: top 10% of 2017 papers
Abstract
This is, to our knowledge, the most comprehensive analysis to date of fragment-like chemical space based on generative topographic mapping (GTM). It covers 40 million molecules with no more than 17 heavy atoms, drawn both from the theoretically enumerated GDB-17 database and from the real-world PubChem and ChEMBL databases. The challenge was to prove that a robust map of fragment-like chemical space can actually be built, even though the maximal number of compounds usable for fitting the GTM manifold (the "frame set") is limited to far fewer than 10⁵. An evolutionary map-building strategy was extended with a "coverage check" step, which discards manifolds that fail to accommodate compounds outside the frame set. The evolved map shows a good propensity to separate actives from inactives in more than 20 external structure-activity sets, and it was proven to properly accommodate the entire collection of 40 million compounds. Next, the map served as a library comparison tool to highlight biases of real-world molecules (PubChem and ChEMBL) relative to the universe of all possible species represented by FDB-17, a fragment-like subset of GDB-17 containing 10 million molecules. Specific patterns present in some libraries but absent from others (diversity holes) were highlighted.
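To make the library-comparison step concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It assumes every compound has already been projected onto a trained GTM and is represented by its responsibility vector, meaning its posterior probabilities over the K nodes of the latent grid. Per-node densities are obtained by cumulating responsibilities over a library and normalizing. A node is then flagged as a diversity hole when it holds a noticeable share of the reference library (e.g. FDB-17) but almost none of the compared library. The function names `node_density` and `diversity_holes` and the thresholds `min_ref` and `max_lib` are hypothetical choices made for illustration.

```python
from typing import List, Sequence

# One row per compound, one column per GTM node (posterior probabilities).
Responsibilities = Sequence[Sequence[float]]


def node_density(resp: Responsibilities) -> List[float]:
    """Cumulate responsibilities over all compounds, then normalize to sum to 1."""
    if not resp:
        raise ValueError("library is empty")
    totals = [0.0] * len(resp[0])
    for row in resp:
        for k, r in enumerate(row):
            totals[k] += r
    grand = sum(totals)
    if grand == 0.0:
        raise ValueError("responsibilities sum to zero")
    return [t / grand for t in totals]


def diversity_holes(reference: Responsibilities,
                    library: Responsibilities,
                    min_ref: float = 0.05,
                    max_lib: float = 0.01) -> List[int]:
    """Indices of nodes populated by the reference library but (almost) empty in `library`.

    The thresholds are hypothetical: a node counts as populated when it holds at least
    `min_ref` of the reference density, and as a hole when the compared library puts
    at most `max_lib` of its density there.
    """
    ref_d = node_density(reference)
    lib_d = node_density(library)
    return [k for k, (a, b) in enumerate(zip(ref_d, lib_d))
            if a >= min_ref and b <= max_lib]


# Toy 4-node map: the reference occupies all nodes, the library only nodes 0 and 1.
reference = [[0.7, 0.3, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5], [0.1, 0.0, 0.9, 0.0]]
library = [[0.9, 0.1, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0]]
print(diversity_holes(reference, library))  # [2, 3]
```

Swapping the arguments asks the opposite question: which regions the compared library populates that the reference does not. In the toy data above that set is empty.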
Related Papers
- SMIfp (SMILES fingerprint) Chemical Space for Virtual Screening and Visualization of Large Databases of Organic Molecules (2013), 69 citations
- Expanding the fragrance chemical space for virtual screening (2014), 45 citations
- Mapping of the Available Chemical Space versus the Chemical Universe of Lead‐Like Compounds (2017), 45 citations
- Expanding Bioactive Fragment Space with the Generated Database GDB-13s (2023), 12 citations
- Design of Diverse and Focused Compound Libraries (2017), 4 citations