CACHE

CRITICAL ASSESSMENT OF COMPUTATIONAL HIT-FINDING EXPERIMENTS

Challenge #3

Hit Identification
Method type (check all that apply)
Deep learning
Free energy perturbation
High-throughput docking
Machine learning
Physics-based
Description of your approach (min 200 and max 800 words)

Our proposed hit-identification workflow extends pipelines developed by our team during CACHE Challenge 2. The core methodology consists of high-throughput docking followed by binding affinity estimation using Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA) on multiple poses drawn from a molecular dynamics (MD) run of the protein-ligand complex. Given the computational cost of running MD and MMPBSA, we developed two novel strategies that let us explore most of the molecular space of Enamine REAL while running the expensive evaluations only on the most promising molecules. The first approach uses pharmacophore matching on fragments followed by fragment combination. The second approach trains a Bayesian graph neural network to predict the binding affinity scores derived from MMPBSA.

 

Pharmacophore-based fragment matching and combining: we use a fragment combination approach to search a vast combinatorial space of reagent combinations while reducing the need for computationally expensive methods. We construct several pharmacophore models based on experimentally observed binders at the NSP3 target site. Each pharmacophore is divided into non-overlapping subcomponents. Fragment-sized compounds are matched against each pharmacophore subcomponent. Hits from complementary pharmacophore subcomponents are fed into a reaction predictor, which determines whether the fragments can be combined based on both chemical and geometrical constraints. If the combined compound remains compatible with the full pharmacophore, it is passed on for scoring.
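The match-then-combine logic described above can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in rather than the actual RiFF implementation: the `Fragment` fields, the feature labels, the `COMPATIBLE` handle table (standing in for the reaction predictor), and the `max_gap` distance cutoff (standing in for the geometrical constraints) are all assumptions, and the final re-check against the full pharmacophore is omitted.

```python
from dataclasses import dataclass
from itertools import product
from math import dist


@dataclass(frozen=True)
class Fragment:
    name: str
    features: frozenset  # pharmacophore feature labels the fragment satisfies
    handle: str          # reactive group label (hypothetical)
    attach_xyz: tuple    # attachment-point coordinates in the matched pose


# Hypothetical stand-in for a reaction predictor: pairs of handles that can react.
COMPATIBLE = {frozenset({"amine", "acid"}), frozenset({"boronate", "halide"})}


def matches(fragment, subcomponent):
    """A fragment matches if it covers every feature of the subcomponent."""
    return subcomponent <= fragment.features


def can_combine(a, b, max_gap=2.0):
    """Chemical check (handle table) plus a crude geometric check (distance)."""
    chem_ok = frozenset({a.handle, b.handle}) in COMPATIBLE
    geom_ok = dist(a.attach_xyz, b.attach_xyz) <= max_gap
    return chem_ok and geom_ok


def fuse(fragments, sub_a, sub_b, max_gap=2.0):
    """Match fragments to each subcomponent, then keep combinable pairs."""
    hits_a = [f for f in fragments if matches(f, sub_a)]
    hits_b = [f for f in fragments if matches(f, sub_b)]
    return [
        (a.name, b.name)
        for a, b in product(hits_a, hits_b)
        if a is not b and can_combine(a, b, max_gap)
    ]


fragments = [
    Fragment("f1", frozenset({"donor", "aromatic"}), "amine", (0.0, 0.0, 0.0)),
    Fragment("f2", frozenset({"acceptor"}), "acid", (1.5, 0.0, 0.0)),
    Fragment("f3", frozenset({"acceptor"}), "acid", (5.0, 0.0, 0.0)),    # too far away
    Fragment("f4", frozenset({"acceptor"}), "halide", (1.0, 0.0, 0.0)),  # wrong chemistry
]
pairs = fuse(fragments, frozenset({"donor"}), frozenset({"acceptor"}))
```

The expensive matching step runs once per fragment and subcomponent; only the matched hits enter the pairwise combination step.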

 

We use the Enamine building-block fragment libraries, since Enamine combines these building blocks to derive the full Enamine REAL database. As a result, the compounds we generate are expected to be available from Enamine. Furthermore, this approach lets us explore the full 5.5bn-compound space while running pharmacophore matching on only ~300,000 fragment compounds. The combinatorial reaction-product space grows quadratically (for two-component reactions) or cubically (for three-component reactions) with the number of fragments, while the computational cost of matching grows only linearly. Using this approach in CACHE Challenge 2, we generated leads that shared chemical properties with experimentally observed fragment hits, and more than 90% of the compounds generated were available from Enamine.
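The scaling argument can be made concrete with a short calculation. This is an illustrative upper bound only: it counts every unordered fragment combination and ignores reaction compatibility, which is why the numbers exceed the 5.5bn compounds actually enumerated in Enamine REAL. The function name `product_space` is hypothetical.

```python
from math import comb


def product_space(n_fragments: int, n_components: int) -> int:
    """Upper bound on unordered fragment combinations (no chemistry filter)."""
    return comb(n_fragments, n_components)


n = 300_000                    # fragments that need pharmacophore matching (linear cost)
pairs = product_space(n, 2)    # two-component products: grows as n^2
triples = product_space(n, 3)  # three-component products: grows as n^3

print(f"matching runs: {n:,}")
print(f"two-component upper bound: {pairs:,}")
print(f"three-component upper bound: {triples:,}")
```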

 

Fragment combinations that pass the pharmacophore model are further scored by consensus docking [AutoDock, rDock, DOCK]. For the most promising pool of candidates, we compute the absolute free energy of binding using MMPBSA [GROMACS and gmx_MMPBSA], as well as QM/MMPBSA using semi-empirical quantum mechanical calculations. Free energy perturbation (FEP) calculations will be employed to choose between top-ranking ligands with similar structures and to evaluate minor modifications to the final structures [GROMACS, PyAutoFEP].
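The text does not specify how the docking programs are combined into a consensus. One common choice, shown here purely as an assumption, is to rank ligands within each program and sort by the summed rank. The sketch assumes that a lower (more negative) score is better in every program and uses made-up scores; `consensus_rank` is a hypothetical helper, not part of any docking package.

```python
def consensus_rank(scores_by_program):
    """Order ligands by summed per-program rank (rank 1 = best, lower score = better).

    scores_by_program: {program_name: {ligand_id: score}}
    Only ligands scored by every program are ranked.
    """
    ligands = set.intersection(*(set(s) for s in scores_by_program.values()))
    rank_sum = {lig: 0 for lig in ligands}
    for scores in scores_by_program.values():
        # Ties within a program are broken by ligand id to stay deterministic.
        ordered = sorted(ligands, key=lambda lig: (scores[lig], lig))
        for rank, lig in enumerate(ordered, start=1):
            rank_sum[lig] += rank
    return sorted(ligands, key=lambda lig: (rank_sum[lig], lig))


# Made-up scores for three ligands from three programs.
scores = {
    "vina": {"A": -9.1, "B": -8.0, "C": -7.5},
    "rdock": {"A": -30.0, "B": -35.0, "C": -20.0},
    "dock": {"A": -50.0, "B": -45.0, "C": -40.0},
}
ranking = consensus_rank(scores)
```

Rank aggregation sidesteps the fact that the programs report scores on different scales, at the cost of discarding the size of the score differences.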

 

Bayesian Graph Neural Network (BGNN) binding affinity predictor: we create a training dataset by running MMPBSA on a few hundred molecules to generate binding affinity estimates. We then train a BGNN using the Chemprop library to predict binding affinity. This model is used to generate predictions on the Enamine REAL database, and the molecules with the highest upper confidence bound are selected to pass through the docking + MMPBSA pipeline. After obtaining more MMPBSA scores, the model can be retrained and the cycle repeated. In CACHE Challenge 2 we were able to generate predictions for 1bn molecules per day, and we found good concordance between the MMPBSA-predicted score and the BGNN score.
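The selection step of this loop can be sketched as follows. The sketch does not call Chemprop; it assumes the model has already produced a predicted mean and standard deviation per molecule. Because a more negative binding free energy indicates stronger binding, the upper confidence bound is taken on the negated mean. The function name `select_batch`, the `beta` exploration weight, and the example values are all hypothetical.

```python
import heapq


def select_batch(predictions, batch_size, beta=1.0):
    """Pick the molecules with the highest upper confidence bound.

    predictions: iterable of (mol_id, predicted_mean_dG, predicted_std)
    Acquisition = -mean + beta * std, so strongly binding (very negative dG)
    or highly uncertain molecules are favoured.
    """
    def ucb(pred):
        _, mean, std = pred
        return -mean + beta * std

    best = heapq.nlargest(batch_size, predictions, key=ucb)
    return [mol_id for mol_id, _, _ in best]


# Made-up predictions: (id, mean dG in kcal/mol, std).
preds = [
    ("m1", -8.0, 0.5),
    ("m2", -7.0, 3.0),  # uncertain, so worth evaluating
    ("m3", -9.0, 0.1),
    ("m4", -5.0, 0.2),
]
explore = select_batch(preds, batch_size=2, beta=1.0)
exploit = select_batch(preds, batch_size=2, beta=0.0)
```

The selected batch would then go through docking and MMPBSA, and the new scores would be appended to the training set before the next round. Setting `beta` to zero reduces the rule to pure exploitation of the predicted mean.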

 

A key aspect of our strategy is ensuring efficient sampling of the 5.5bn compound space of the Enamine REAL database, while also ensuring that the most computationally intensive methods (MMPBSA & FEP) are reserved for the most promising candidate molecules. Using cluster compute resources we are able to dock several million molecules: this includes carrying out ensemble docking against multiple conformations of the receptor.

 

The curated set of ~3 million molecules that we dock and score will be composed of the following:

 

  • Pharmacophore matches generated from combining Enamine building blocks
  • Molecules from Enamine REAL with high score from the BGNN
  • ~2.1m molecules from the Enamine high-throughput screening library
  • ~460k molecules from the Enamine hit locator library
  • ~200k molecules from the Enamine building blocks library

Using this combination we will probe most of the relevant molecular space of the 5.5bn compound Enamine REAL database while reducing docking computations by multiple orders of magnitude.

 

Ligands are initially filtered by molecular weight and solubility, and carboxylic acids are excluded. Final selections are screened against PAINS (pan-assay interference compounds) filters [BadApples webserver] to exclude ligands likely to be toxic or promiscuous binders. To ensure structural diversity in the final submission, we will cluster compounds by structure and submit one representative from each cluster. FEP will be used to decide between structurally similar molecules.
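The clustering algorithm is not specified above. As one possible reading, the sketch below uses a greedy leader-style selection on Tanimoto similarity: candidates are visited from best to worst score, and a candidate is kept only if it is dissimilar to every compound already kept. The fingerprints are plain sets of "on" bit indices, and the 0.6 threshold, the scores, and the name `pick_diverse` are assumptions for illustration.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pick_diverse(candidates, threshold=0.6):
    """Keep the best-scoring member of each similarity neighbourhood.

    candidates: list of (name, fingerprint_bits, score), lower score = better.
    """
    picked = []
    for name, fp, _score in sorted(candidates, key=lambda c: c[2]):
        if all(tanimoto(fp, kept_fp) < threshold for _, kept_fp in picked):
            picked.append((name, fp))
    return [name for name, _ in picked]


# Made-up fingerprints and scores.
candidates = [
    ("A", {1, 2, 3, 4}, -10.0),
    ("B", {1, 2, 3, 4, 5}, -9.0),  # similarity to A is 0.8, so it is dropped
    ("C", {7, 8, 9}, -8.0),
]
selection = pick_diverse(candidates)
```

In the workflow described above, the choice between near-duplicates such as A and B would be made by FEP rather than by the raw score used here.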

 

What makes your approach stand out from the community? (<100 words)

First, our Reaction informed fragment fuser (RiFF) uses pharmacophore fragment matching followed by fragment combination. This reduces pharmacophore embedding and searching from billions of molecules to a few hundred thousand and yields compounds that share structural similarities with the experimentally observed binders. Second, we optimise the use of computationally expensive MD-based scoring by iteratively training a BGNN model to predict these scores and guide which molecules are evaluated next using a Bayesian optimization strategy. This allows us to screen Enamine REAL quickly. Each of these methods complements our virtual screening funnel, which uses docking followed by binding affinity estimation using MMPBSA.

Method Name
Reaction informed fragment fuser (RiFF)
Commercial software packages used

None

Free software packages used

AutoDock Vina

AutoDock4

rDock

GROMACS

DOCK v6 & v3 (Kuntz Group UCSF)

PyAutoFEP

 

Relevant publications of previous uses by your group of this software/method

Badaoui, Magd, Pedro J. Buigues, Dénes Berta, Gaurav M. Mandana, Hankang Gu, Tamás Földes, Callum J. Dickson, et al. ‘Combined Free-Energy Calculation and Machine Learning Methods for Understanding Ligand Unbinding Kinetics’. Journal of Chemical Theory and Computation 18, no. 4 (12 April 2022): 2543–55. 

https://doi.org/10.1021/acs.jctc.1c00924

 

Berta, Dénes, Magd Badaoui, Sam Alexander Martino, Pedro J. Buigues, Andrei V. Pisliakov, Nadia Elghobashi-Meinhardt, Geoff Wells, Sarah A. Harris, Elisa Frezza, and Edina Rosta. ‘Modelling the Active SARS-CoV-2 Helicase Complex as a Basis for Structure-Based Inhibitor Design’. Chemical Science 12, no. 40 (2021): 13492–505. https://doi.org/10.1039/D1SC02775A

 

Bradshaw, John, Brooks Paige, Matt J. Kusner, Marwin H. S. Segler, and José Miguel Hernández-Lobato. ‘Barking up the Right Tree: An Approach to Search over Molecule Synthesis DAGs’. arXiv, 21 December 2020. 

http://arxiv.org/abs/2012.11522

 

Cook, Nicola J., Wen Li, Dénes Berta, Magd Badaoui, Allison Ballandras-Colas, Andrea Nans, Abhay Kotecha, Edina Rosta, Alan N. Engelman, and Peter Cherepanov. ‘Structural Basis of Second-Generation HIV Integrase Inhibitor Action and Viral Resistance’. Science 367, no. 6479 (2020): 806–10.

https://www.science.org/doi/abs/10.1126/science.aay4919  

 

Karlova, Andrea, Wim Dehaen, and Andrei Penciu. ‘How to Reward Your Drug Agent?’, NeurIPS Workshop 2021. 

 

This website is licensed under CC-BY 4.0