Our goal in this competition is to test whether our enhanced hit-finding workflow delivers the anticipated improvement over our existing workflow (CACHE challenge #4). The workflow integrates a range of in silico drug development techniques, from target protein structure modeling to ultra-high-throughput virtual screening and de novo design with generative models, while remaining simple.
First, we will conduct multi-state modeling with AlphaFold2 (AF2) to generate diverse structures of MCHR1, guided by templates from activation-state-annotated GPCR databases (https://gpcrdb.org/) and using the multi-state modeling algorithm developed by Heo et al. (doi:10.1002/prot.26382). We will select 10 structures based on their TM-RMSD (Cα-RMSD over the transmembrane helices), taking into account the diversity of the extracellular interface identified in cryo-EM and AF2 modeling studies (https://doi.org/10.1101/2023.11.03.565472).
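One way to pick a structurally diverse subset from a pairwise TM-RMSD matrix is greedy max-min (farthest-point) selection. The sketch below is only an illustration under that assumption; the selection rule, the function name `pick_diverse`, and the toy matrix are hypothetical and are not taken from the workflow itself.

```python
from typing import List, Sequence


def pick_diverse(rmsd: Sequence[Sequence[float]], k: int, seed: int = 0) -> List[int]:
    """Greedy max-min (farthest-point) selection of k model indices.

    rmsd[i][j] is a precomputed, symmetric TM-RMSD (in angstroms)
    between models i and j. The seed model is always selected first.
    """
    n = len(rmsd)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of models")
    chosen = [seed]
    # Distance from every model to its nearest already-chosen model.
    nearest = list(rmsd[seed])
    while len(chosen) < k:
        # Pick the model that is farthest from everything chosen so far.
        nxt = max((i for i in range(n) if i not in chosen), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(nearest[i], rmsd[nxt][i]) for i in range(n)]
    return chosen


# Toy 4-model matrix (hypothetical values, in angstroms).
toy = [
    [0.0, 0.5, 3.0, 3.2],
    [0.5, 0.0, 2.8, 3.1],
    [3.0, 2.8, 0.0, 0.6],
    [3.2, 3.1, 0.6, 0.0],
]
selected = pick_diverse(toy, k=2)
```

With the toy matrix, the two selected models come from the two distinct clusters rather than from the same one, which is the behavior wanted when sampling different conformational states.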
For these selected structures, we will dock the experimentally verified ligands provided by CACHE (IC50 values in the nM to µM range) with AutoDock-GPU to obtain a variety of binding poses. We will then apply our team's deep learning-based protein-ligand binding affinity predictor (AKScore2, https://doi.org/10.3390/ijms21228424) to these poses. To improve the reliability of our selection, we will also rescore the docked poses with RTMScore (https://doi.org/10.1021/acs.jmedchem.2c00991) and select the top candidates based on the consensus between the two methods. Based on these results, we will choose the protein structure and bound poses that are most consistent with the experimental data as the final target structure.
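The consensus step could be realized in several ways. A minimal sketch, assuming a simple mean-rank consensus over poses scored by both methods, is shown here; the function names, the score values, and the assumption that higher is better for both scores are illustrative only.

```python
from typing import Dict, List


def ranks(scores: Dict[str, float], higher_is_better: bool) -> Dict[str, int]:
    """Map each pose ID to its rank (1 = best) under one scoring method."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {pose: r for r, pose in enumerate(ordered, start=1)}


def consensus(a: Dict[str, float], b: Dict[str, float], top_n: int,
              a_higher_is_better: bool = True,
              b_higher_is_better: bool = True) -> List[str]:
    """Return the top_n pose IDs by summed rank across the two methods."""
    shared = set(a) & set(b)  # only poses scored by both methods
    ra = ranks({p: a[p] for p in shared}, a_higher_is_better)
    rb = ranks({p: b[p] for p in shared}, b_higher_is_better)
    # Sort by summed rank; break ties by pose ID so the result is deterministic.
    ordered = sorted(shared, key=lambda p: (ra[p] + rb[p], p))
    return ordered[:top_n]


# Hypothetical scores for four docked poses.
method_a = {"pose1": 7.9, "pose2": 6.1, "pose3": 8.4, "pose4": 5.0}
method_b = {"pose1": 52.0, "pose2": 40.5, "pose3": 47.3, "pose4": 55.1}
best = consensus(method_a, method_b, top_n=2)
```

Rank-based consensus avoids having to put two scores with different units and ranges on a common scale.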
We will explore two pipelines to identify two groups of hit candidates. The first involves building a docking score predictor, similar to our V-Dock approach (https://doi.org/10.3390/ijms222111635), in which a machine learning model predicts the docking score of a molecule from its SMILES string alone. First, we will dock the Enamine screening collection (4 million compounds) and train the docking score predictor on the results, using the SMILES representation as input. After training, we will rapidly screen the Enamine REAL DB (4 billion compounds) and keep the molecules predicted to have the most favorable docking scores. These molecules will then be validated through actual docking and rescoring calculations, using the same approach as above, to select hit candidate group A.
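Screening billions of molecules with a trained predictor is essentially a streaming top-k problem: only the best k predictions need to be kept in memory. The sketch below assumes that lower (more negative) predicted scores are more favorable, as in AutoDock-style scoring; `toy_predict` is a placeholder, not the V-Dock model, and all names are hypothetical.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def screen_top_k(smiles_stream: Iterable[str],
                 predict: Callable[[str], float],
                 k: int) -> List[Tuple[float, str]]:
    """Keep the k SMILES with the lowest (most favorable) predicted score.

    The stream is consumed lazily, so the full library never has to
    fit in memory; only the k best candidates are retained.
    """
    scored = ((predict(s), s) for s in smiles_stream)
    return heapq.nsmallest(k, scored)


def toy_predict(smiles: str) -> float:
    """Placeholder, NOT a real model: longer strings get a more negative score."""
    return -0.5 * len(smiles)


library = ["CCO", "c1ccccc1", "CC(=O)Nc1ccc(O)cc1", "CCN"]
hits = screen_top_k(library, toy_predict, k=2)
top_smiles = [smiles for _, smiles in hits]
```

In practice the stream would be read from library files in chunks and the predictor would be evaluated in batches, but the selection logic stays the same.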
The second pipeline will use generative models, such as ResGen (https://doi.org/10.1038/s42256-023-00712-7) and Pocket2Mol (https://doi.org/10.48550/arXiv.2205.07249), to create new molecules that fit the binding pocket, forming a library of de novo designed molecules. During generation, we will use the experimentally validated molecules as initial scaffolds. We will also search the Enamine library for compounds similar to the experimentally verified ligands. Following the same screening process (docking and rescoring), we will select the top molecules to form hit candidate group B. If the de novo designed compounds prove difficult to synthesize, we will find and screen similar molecules from the Enamine library to refine group B.
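The similarity searches mentioned above are commonly done with the Tanimoto coefficient on molecular fingerprints. The sketch below assumes that fingerprints are already available as sets of on-bit indices (in practice they would come from a cheminformatics toolkit); the similarity metric, threshold, and compound names are assumptions for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similar_compounds(query: FrozenSet[int],
                      library: Dict[str, FrozenSet[int]],
                      threshold: float) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs at or above threshold, most similar first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [h for h in hits if h[1] >= threshold]
    # Sort by descending similarity, then by name for a deterministic order.
    return sorted(hits, key=lambda h: (-h[1], h[0]))


# Hypothetical fingerprints given as sets of on-bit indices.
query_fp = frozenset({1, 2, 3, 4})
catalog = {
    "cmpdA": frozenset({1, 2, 3, 4, 5}),
    "cmpdB": frozenset({1, 2, 9}),
    "cmpdC": frozenset({1, 2, 3, 4}),
}
analogs = similar_compounds(query_fp, catalog, threshold=0.7)
```

The same routine can serve both uses in this pipeline: finding purchasable analogs of known ligands and finding purchasable surrogates for generated molecules that are hard to synthesize.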
From groups A and B, we will filter further using criteria such as predicted target affinity, solubility, and PAINS filters, prioritizing chemical novelty and likelihood of binding to the protein when choosing the final hit candidates. If time allows, we plan to re-rank the top 100 molecules through 100 ns MD simulations (AMBER) followed by MM-PBSA calculations.
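The final re-ranking can be illustrated with the usual end-point estimate, in which the binding free energy is the free energy of the complex minus those of the receptor and the ligand, averaged over trajectory frames. The sketch below assumes the per-frame energies have already been computed by an MM-PBSA tool; the data layout, names, and numbers are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Per-frame energies in kcal/mol: (G_complex, G_receptor, G_ligand).
Frame = Tuple[float, float, float]


def mean_binding_energy(frames: List[Frame]) -> float:
    """Average of G_complex - G_receptor - G_ligand over all frames."""
    return mean(gc - gr - gl for gc, gr, gl in frames)


def rerank(trajectories: Dict[str, List[Frame]]) -> List[Tuple[str, float]]:
    """Rank molecules from most to least favorable mean binding energy."""
    scored = [(name, mean_binding_energy(f)) for name, f in trajectories.items()]
    return sorted(scored, key=lambda x: x[1])


# Hypothetical two-frame trajectories for two molecules.
energies = {
    "molB": [(-90.0, -60.0, -10.0), (-92.0, -60.0, -10.0)],
    "molA": [(-100.0, -60.0, -10.0), (-104.0, -62.0, -10.0)],
}
ranking = rerank(energies)
```

A more negative mean value indicates more favorable predicted binding, so the first entry of `ranking` is the top-ranked molecule.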
This comprehensive approach aims to validate an enhanced workflow that integrates cutting-edge in silico methodologies for drug discovery, focusing on efficiency, efficacy, and innovation in identifying new drug candidates.