CACHE

CRITICAL ASSESSMENT OF COMPUTATIONAL HIT-FINDING EXPERIMENTS

Challenge #5

Hit Identification
Method type (check all that apply)
De novo design
Deep learning
Machine learning
Hybrid of the above
Our workflow combines physical docking of a small library, used to train an ML model that predicts docking scores of putative ligands, with generative modeling to propose novel ligands with high predicted scores.
Description of your approach (min 200 and max 800 words)

Our approach combines the expertise of the Kozakov Lab at Stony Brook and the Tropsha Lab at UNC. For this specific challenge, where a substantial number of molecules active against the MCHR1 target are already known, we will use both structure-based (ML-accelerated docking) and ligand-based (QSAR) methods developed in our laboratories and published in the open literature; we will not use any commercial software. Our SBDD workflow uses several complementary modules to identify high-affinity hits for a given protein target with a known 3D structure. The key enabling components of our hit selection approach are binding-site hot spot information and conventional structure-based virtual screening methods. We use FTMap, a computational mapping algorithm that identifies binding regions on the surface of the target protein that make major contributions to the ligand binding free energy, along with soaked-fragment data from Fragalysis (when available). FTMap samples all possible positions of small organic molecule probes and scores them using a physical energy function. Binding site regions that bind multiple probes (both computational and experimental) are considered hot spots that favorably bind chemical functional groups. In addition to the hypothesis-free FTMap, we use a different approach to hot spot identification, LigTBM, which was inspired by binding site similarity search methods across the structural proteome. The basic idea is to match the physicochemical environment of the target protein to micro-pockets containing small organic molecule probes extracted from PDB structures with bound ligands. This matching procedure also provides possible fragment placements within the target protein, so the data are presented in the same form as FTMap data, which facilitates the identification of consensus hot spots.
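
The consensus step described above can be illustrated with a minimal sketch. It assumes probe placements from all sources have already been exported as (probe type, coordinates) pairs; the greedy grouping, the 4 Å radius, and the three-probe-type threshold are illustrative assumptions, not the FTMap or LigTBM algorithms or their parameters.

```python
from math import dist


def consensus_hot_spots(placements, radius=4.0, min_probe_types=3):
    """Group probe placements into sites; keep sites hit by several probe types.

    placements: list of (probe_type, (x, y, z)) tuples from any source
    (computational probes or experimental fragments).
    radius and min_probe_types are hypothetical values chosen for illustration.
    """
    sites = []  # each site: {"center": (x, y, z), "members": [(probe_type, xyz), ...]}
    for probe_type, xyz in placements:
        # Greedy assignment: join the first site whose center is within `radius`.
        for site in sites:
            if dist(site["center"], xyz) <= radius:
                site["members"].append((probe_type, xyz))
                n = len(site["members"])
                # Recompute the site center as the mean of its member positions.
                site["center"] = tuple(
                    sum(m[1][i] for m in site["members"]) / n for i in range(3)
                )
                break
        else:
            sites.append({"center": xyz, "members": [(probe_type, xyz)]})

    hot_spots = []
    for site in sites:
        probe_types = {p for p, _ in site["members"]}
        if len(probe_types) >= min_probe_types:
            hot_spots.append((site["center"], sorted(probe_types)))
    # Rank sites by how many distinct probe types they bind.
    hot_spots.sort(key=lambda s: len(s[1]), reverse=True)
    return hot_spots


# Toy data: one site bound by four probe types, one bound by only two.
placements = [
    ("ethanol", (0.0, 0.0, 0.0)),
    ("acetone", (1.0, 0.0, 0.0)),
    ("benzene", (0.0, 1.0, 0.0)),
    ("fragalysis_fragment", (0.5, 0.5, 0.0)),
    ("ethanol", (20.0, 20.0, 20.0)),
    ("acetone", (21.0, 20.0, 20.0)),
]
hot_spots = consensus_hot_spots(placements)
```

In this toy input only the first site passes the threshold, because the second site is bound by two probe types.
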
The hot spot information is used to create a pharmacophore model for the next stage, in which we perform pharmacophore-based virtual screening of the entire Enamine REAL library (~47B with tautomers) to select a subset of target-specific compounds that fit our pharmacophore hypothesis. In addition, we use the pharmacophore models to seed our recently published HIDDEN GEM workflow, which employs docking, similarity search, and generative modeling to obtain novel, target-specific hit compounds. All virtual screening hits (usually ~1M) are docked into the binding site using AutoDock software (which is also used as part of the HIDDEN GEM workflow) to prioritize hits. The top-scoring docking hits are then further prioritized using the hot spot information, resulting in the final selection of hit molecules from the Enamine library. In parallel, we will use known experimental data for MCHR1 ligands provided by the organizers to build QSAR models using various ML methods, following the best practices for model building and validation that we have published on extensively. We will then use our QSAR models for virtual screening of the same Enamine REAL library to nominate additional hits. Finally, we will submit three groups of hits for this Challenge, nominated via (i) the SBDD workflow only, (ii) QSAR-based virtual screening only, and (iii) consensus hits from both workflows.
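
The split into three nomination groups can be sketched as simple set operations. This is a minimal illustration under stated assumptions: the function name, the top-N docking cutoff, the QSAR probability threshold, and the reading of "only" as a set difference are all hypothetical choices, not the selection criteria actually used in the workflow.

```python
def nominate_hits(docking_scores, qsar_scores, n_top=2, qsar_cutoff=0.5):
    """Split candidates into SBDD-only, QSAR-only, and consensus groups.

    docking_scores: {compound_id: docking score}; more negative is assumed better.
    qsar_scores: {compound_id: predicted probability of activity}.
    n_top and qsar_cutoff are hypothetical thresholds for illustration.
    """
    # Structure-based picks: the n_top best (lowest) docking scores.
    sbdd = set(sorted(docking_scores, key=docking_scores.get)[:n_top])
    # Ligand-based picks: compounds the QSAR model predicts as active.
    qsar = {cid for cid, p in qsar_scores.items() if p >= qsar_cutoff}
    return {
        "sbdd_only": sorted(sbdd - qsar),
        "qsar_only": sorted(qsar - sbdd),
        "consensus": sorted(sbdd & qsar),
    }


# Toy scores for four hypothetical compound IDs.
docking = {"A": -10.2, "B": -9.1, "C": -6.0, "D": -8.5}
qsar = {"A": 0.9, "B": 0.3, "C": 0.8, "D": 0.2}
groups = nominate_hits(docking, qsar)
```

Keeping the groups disjoint makes it possible to compare the experimental hit rate of each workflow separately from that of the consensus picks.
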

For the hit optimization phase, in addition to the methods described above, we will employ the recently developed SMALLSA (Kirchoff et al., 2024) to identify additional hits from Enamine that are similar to the experimentally confirmed virtual screening hits.
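
The idea of retrieving library compounds similar to confirmed hits can be sketched with a plain Tanimoto search over fingerprint bit sets. Note that this is a generic stand-in, not SMALLSA itself, which relies on low-dimensional molecular embeddings; the function names, fingerprints, and thresholds below are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def nearest_analogs(hit_fps, library_fps, k=2, min_similarity=0.5):
    """Return up to k library compounds most similar to any confirmed hit.

    hit_fps, library_fps: {compound_id: set of on-bit indices}.
    k and min_similarity are illustrative values.
    """
    scored = []
    for lib_id, lib_fp in library_fps.items():
        # Score each library compound by its best match among the confirmed hits.
        best = max((tanimoto(lib_fp, fp) for fp in hit_fps.values()), default=0.0)
        if best >= min_similarity:
            scored.append((lib_id, best))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy fingerprints: one confirmed hit and three library compounds.
hits = {"hit1": {1, 2, 3, 4}}
library = {"L1": {1, 2, 3, 4, 5}, "L2": {1, 2, 9}, "L3": {1, 2, 3}}
analogs = nearest_analogs(hits, library)
```

A brute-force scan like this does not scale to a library of billions of compounds, which is the motivation for embedding-based search methods.
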

What makes your approach stand out from the community? (<100 words)

We employ methodologies and software developed within our groups. The unique features of our approach lie in our use of both experimental (Fragalysis) and computationally derived (FTMap and LigTBM) hot spot identification, which is used to formulate the pharmacophore hypotheses. An additional unique element is the use of generative and reinforcement learning (biased by the pharmacophore hypotheses) to design computational hits de novo and then nominate both these hits and Enamine compounds similar to them as candidates for experimental testing. The latter strategy is especially useful in the next phase of hit optimization.

Method Name
Frag2Hits
Commercial software packages used

None

Free software packages used

FTMap server (https://ftmap.bu.edu/); HIDDEN GEM (https://github.com/molecularmodelinglab/HiDDEN-GEM); RDKit

Relevant publications of previous uses by your group of this software/method

Kathryn E. Kirchoff, James Wellnitz, Joshua E. Hochuli, Travis Maxfield, Konstantin I. Popov, Shawn Gomez, Alexander Tropsha. Utilizing Low-Dimensional Molecular Embeddings for Rapid Chemical Similarity Search. https://doi.org/10.48550/arXiv.2402.07970

Popov, K., Wellnitz, J., Maxfield, T., Tropsha, A. HIt Discovery using docking ENriched by GEnerative Modeling (HIDDEN GEM): A novel computational workflow for accelerated virtual screening of ultra-large chemical libraries. Mol Inform. 2024 Jan;43(1):e202300207. doi: 10.1002/minf.202300207

Kozakov D, Grove LE, Hall DR, Bohnuud T, Mottarella SE, Luo L, Xia B, Beglov D, Vajda S. The FTMap family of web servers for determining and characterizing ligand-binding hot spots of proteins. Nature Protocols. 2015

Popova M, Isayev O, Tropsha A. Deep reinforcement learning for de novo drug design. Sci Adv. 2018 Jul 25;4(7):eaap7885. doi: 10.1126/sciadv.aap7885

Alekseenko, A.; Kotelnikov, S.; Ignatov, M.; Egbert, M.; Kholodov, Y.; Vajda, S.; Kozakov, D. ClusPro LigTBM: Automated Template-Based Small Molecule Docking. J. Mol. Biol. 2019. https://doi.org/10.1016/j.jmb.2019.12.011

Korshunova M, Ginsburg B, Tropsha A, Isayev O. OpenChem: A Deep Learning Toolkit for Computational Chemistry and Drug Design. J Chem Inf Model. 2021 Jan 25;61(1):7-13. doi: 10.1021/acs.jcim.0c00971

Tropsha A, Isayev O, Varnek A, Schneider G, Cherkasov A. Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR. Nat Rev Drug Discov. 2023 Dec 8. doi: 10.1038/s41573-023-00832-0

This website is licensed under CC-BY 4.0
