Computational methods

Hit Identification

Method type (check all that apply)

Deep learning

High-throughput docking

Machine learning

Physics-based

Description of your approach (min 200 and max 800 words)

In this CACHE challenge, we will prioritize integrating Active Learning (AL) into our workflows. Historically, the compute-intensive nature of our virtual high-throughput screening has limited us, and our software, to smaller compound libraries. In previous CACHE challenges, we found that considering only smaller libraries was a liability. AL will enable a broader exploration of chemical space, which we will then refine using our more traditional, higher-resolution methods. Recent CACHE findings also highlight the impact of deep learning-based scoring methods on hit rates, prompting us to incorporate a consensus scoring approach into our hit selection criteria.

Designing selective MCHR1 antagonists is especially challenging because of the receptor's similarity to the hERG channel, whose inhibition leads to cardiotoxicity; several failed clinical trials have been reported (1–3). We will first review the failures and successes of different research teams (4) to help us design selective MCHR1 antagonists (5).

Next, we will source MCHR1 structures from AlphaFold and GPCRdb, focusing on multistate models; no crystal structures of MCHR1 are available to date. We will give preference to predicted structures that meet a confidence threshold of pLDDT > 70 (6).
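The confidence filter can be illustrated with a minimal sketch. AlphaFold-format PDB files store the per-residue pLDDT in the B-factor column, so the values can be read from the CA atom records. The function names, the `site_residues` argument, and the choice to summarize by mean pLDDT are illustrative assumptions, not a description of our production tooling.

```python
def residue_plddt(pdb_text):
    """Map (chain, residue number) -> pLDDT, read from the B-factor
    column (columns 61-66) of CA atom records in PDB-format text."""
    scores = {}
    for line in pdb_text.splitlines():
        if (line.startswith("ATOM") and len(line) >= 66
                and line[12:16].strip() == "CA"):
            chain = line[21]
            resnum = int(line[22:26])
            scores[(chain, resnum)] = float(line[60:66])
    return scores


def summarize_confidence(scores, site_residues, threshold=70.0):
    """Return the mean pLDDT over all residues and the sorted list of
    binding-site residues that do NOT satisfy pLDDT > threshold."""
    mean_all = sum(scores.values()) / len(scores)
    low_site = sorted(r for r in site_residues if scores[r] <= threshold)
    return mean_all, low_site
```

A model whose `low_site` list is empty meets the threshold across the whole (hypothetical) binding-site selection; any residues that are returned are candidates for closer inspection during the MD stage.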

Since our predicted structures will be in the apo form, we will optimize the selected structure(s) by running molecular dynamics (MD) simulations after docking a representative MCHR1 antagonist into the previously proposed binding site (5). In our trajectory analysis and subsequent protein model selection, we will verify that any residues of the original binding site with low confidence (pLDDT between 50 and 70) are conformationally sound.
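One possible way to check low-confidence residues in a trajectory is to compute their root-mean-square fluctuation (RMSF). The sketch below is an assumption about how "conformationally sound" could be quantified, not our established criterion: it expects frames that are already aligned to a common reference, uses one representative atom per residue, and applies a hypothetical 2.0 Å cutoff. In practice an MD analysis tool (for example GROMACS `gmx rmsf`) would supply the fluctuations.

```python
import math


def per_residue_rmsf(frames):
    """frames: list of dicts mapping residue id -> (x, y, z) of one
    representative atom (e.g. CA), pre-aligned to a common reference.
    Returns residue id -> RMSF in the same length unit as the input."""
    n = len(frames)
    rmsf = {}
    for res in frames[0]:
        mean = [sum(f[res][k] for f in frames) / n for k in range(3)]
        msd = sum(
            sum((f[res][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n
        rmsf[res] = math.sqrt(msd)
    return rmsf


def flag_unstable(rmsf, plddt, low=50.0, high=70.0, rmsf_cutoff=2.0):
    """Residues with pLDDT in [low, high] whose RMSF exceeds the
    (hypothetical) cutoff; these would warrant manual inspection."""
    return sorted(
        r for r, value in rmsf.items()
        if low <= plddt[r] <= high and value > rmsf_cutoff
    )
```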

We will retrospectively benchmark our structural models and scoring methods (see below) with the datasets of known MCHR1 antagonists provided by the CACHE team, ensuring structure and activity diversity through either a clustering approach or careful selection. We will dock this benchmark dataset with FITTED (7) using the MD-optimized structures and query per-residue interaction energies for trends. We will also evaluate various scoring methods, including our own Graph Neural Network (GNN)-based scoring (8), GNINA rescoring (Convolutional Neural Network (CNN)-based scoring) (9), and binding free energies from an MM-GBSA method (10). We will evaluate the results (AUROC, enrichment factors) to identify optimal settings, iterating if necessary. This analysis will allow us to investigate the effectiveness of consensus scoring and how well individual scoring methods differentiate actives from inactives. Insights from this study will guide our prospective compound selection.
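The two benchmark metrics named above can be sketched in a few lines. This is a minimal illustration that assumes higher scores mean "more likely active" (docking energies would be negated first) and binary labels with 1 for actives; the function names are our own for this example.

```python
def auroc(scores, labels):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen active outscores a randomly chosen inactive
    (ties count as half)."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    inactives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((a > i) + 0.5 * (a == i) for a in actives for i in inactives)
    return wins / (len(actives) * len(inactives))


def enrichment_factor(scores, labels, fraction=0.01):
    """Active rate in the top-scoring fraction divided by the active
    rate in the whole benchmark set."""
    n = len(scores)
    n_top = max(1, round(n * fraction))
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits_top = sum(y for _, y in ranked[:n_top])
    return (hits_top / n_top) / (sum(labels) / n)
```

An AUROC of 0.5 corresponds to random ranking, and an enrichment factor of 1 means the top fraction is no richer in actives than the full set.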

In the second step, we will use our in-house deployed active learning (AL)-driven approach to screen the Enamine REAL library. In each cycle, the AL workflow selects and docks a batch of compounds, retrains a machine learning model on the accumulated docking scores, and uses the refined predictions to choose the next batch, efficiently identifying high-potential candidates with minimal computational effort. Over the whole AL process, we will ultimately dock only 0.01–0.5% of the selected virtual library (11,12). Our goal is then to select the 100,000 compounds with the best predicted docking scores for higher-resolution ranking.
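The cycle described above can be sketched as follows. This is a toy illustration under stated assumptions, not our in-house tool: the surrogate is a similarity-weighted nearest-neighbour model on fingerprint bit sets, acquisition is purely greedy, lower docking scores are taken as better, and `dock` is a hypothetical callable standing in for the docking program.

```python
import random


def tanimoto(a, b):
    """Tanimoto similarity between fingerprints stored as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict(fp, labelled, k=3):
    """Surrogate model: similarity-weighted mean docking score of the k
    most similar compounds that have already been docked."""
    nearest = sorted(((tanimoto(fp, f), s) for f, s in labelled), reverse=True)[:k]
    total = sum(sim for sim, _ in nearest)
    if total == 0:
        return sum(s for _, s in nearest) / len(nearest)
    return sum(sim * s for sim, s in nearest) / total


def active_learning(library, dock, n_init=10, batch=10, cycles=5, seed=0):
    """library: dict id -> fingerprint (set of ints).
    dock: callable id -> docking score (assumed: lower is better).
    Returns dict id -> score for every compound actually docked."""
    rng = random.Random(seed)
    docked = {cid: dock(cid) for cid in rng.sample(sorted(library), n_init)}
    for _ in range(cycles):
        labelled = [(library[c], s) for c, s in docked.items()]
        pool = [c for c in library if c not in docked]
        if not pool:
            break
        # Greedy acquisition: dock what the surrogate predicts to score best.
        pool.sort(key=lambda c: predict(library[c], labelled))
        for cid in pool[:batch]:
            docked[cid] = dock(cid)
    return docked
```

With these defaults only `n_init + batch * cycles` compounds are docked, which is how the docked fraction of the library stays small; the final surrogate can then rank the remaining, undocked compounds.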

After AL screening, we will generate one or more pharmacophore models based on the insights from the retrospective study, by selecting diverse MCHR1 antagonists (preferably without hERG alerts) (5). Pharmacophore-based ranking considers the 3D shape and pharmacophoric features (acceptor, donor, hydrophobic, and so on) of known antagonists, whereas the AL approach predicts only the global docking score based on fingerprint similarity. The generated pharmacophore model(s) will be used to screen the 100k AL-derived hits and re-rank them by RMSD to the reference antagonists. If the number of promising hits is low, we will select the next 100k hits from the AL screening. We envision docking at most 25,000 compounds using FITTED against the MD-optimized structure.
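Re-ranking by RMSD to a reference can be illustrated with a short sketch. It assumes that each hit has already been matched to the pharmacophore, so that its feature coordinates are listed in the same order as the reference features and in the same coordinate frame (no superposition is performed here). The data layout and function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of
    (x, y, z) points, without any superposition step."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (a[k] - b[k]) ** 2 for a, b in zip(coords_a, coords_b) for k in range(3)
    )
    return math.sqrt(squared / len(coords_a))


def rerank_by_rmsd(hits, reference):
    """hits: dict id -> matched feature coordinates, ordered like the
    reference features. Returns ids sorted by increasing RMSD."""
    return sorted(hits, key=lambda cid: rmsd(hits[cid], reference))
```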

In the third and final step, we will score the docked poses using the top-performing approaches from the benchmarking study, combined into a consensus. During visual analysis, we will carefully examine the binding poses, key interactions, and consensus scores to determine the final list of 150 compounds.
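One simple form of consensus scoring is rank averaging: each method ranks the compounds independently and the mean rank decides the final order. The sketch below assumes this scheme purely for illustration; the consensus actually used will depend on the benchmarking results, and the score dictionaries and names shown are hypothetical.

```python
def rank_positions(scores, higher_is_better=True):
    """Map compound id -> rank (1 = best) for one scoring method."""
    order = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {cid: pos for pos, cid in enumerate(order, start=1)}


def consensus_rank(score_tables, top_n):
    """score_tables: list of (scores dict, higher_is_better) pairs that
    all cover the same compounds. Returns the top_n ids by mean rank."""
    rank_tables = [rank_positions(s, hib) for s, hib in score_tables]
    mean_rank = {
        cid: sum(table[cid] for table in rank_tables) / len(rank_tables)
        for cid in rank_tables[0]
    }
    return sorted(mean_rank, key=mean_rank.get)[:top_n]
```

For example, `consensus_rank([(docking, False), (gnn, True), (mmgbsa, False)], top_n=150)` would return a 150-compound shortlist for visual inspection, given dictionaries of docking energies, GNN scores, and MM-GBSA energies keyed by compound id.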

References:

(1) Lim, G.; You, K. Y.; Lee, J. H.; Jeon, M. K.; Lee, B. H.; Ryu, J. Y.; Oh, K.-S. Identification and New Indication of Melanin-Concentrating Hormone Receptor 1 (MCHR1) Antagonist Derived from Machine Learning and Transcriptome-Based Drug Repositioning Approaches. Int. J. Mol. Sci. 2022, 23 (7), 3807. https://doi.org/10.3390/ijms23073807. 

(2) Kowalski, T. J.; Sasikumar, T. Melanin-Concentrating Hormone Receptor-1 Antagonists as Antiobesity Therapeutics: Current Status. BioDrugs Clin. Immunother. Biopharm. Gene Ther. 2007, 21 (5), 311–321. https://doi.org/10.2165/00063030-200721050-00003. 

(3) Johansson, A. Evolution of Physicochemical Properties of Melanin Concentrating Hormone Receptor 1 (MCHr1) Antagonists. Bioorg. Med. Chem. Lett. 2016, 26 (19), 4559–4564. https://doi.org/10.1016/j.bmcl.2016.08.072. 

(4) Johansson, A.; Löfberg, C. Novel MCH1 Receptor Antagonists: A Patent Review. Expert Opin. Ther. Pat. 2015, 25 (2), 193–207. https://doi.org/10.1517/13543776.2014.993382. 

(5) Igawa, H.; Takahashi, M.; Kakegawa, K.; Kina, A.; Ikoma, M.; Aida, J.; Yasuma, T.; Kawata, Y.; Ashina, S.; Yamamoto, S.; Kundu, M.; Khamrai, U.; Hirabayashi, H.; Nakayama, M.; Nagisa, Y.; Kasai, S.; Maekawa, T. Melanin-Concentrating Hormone Receptor 1 Antagonists Lacking an Aliphatic Amine: Synthesis and Structure-Activity Relationships of Novel 1-(Imidazo[1,2-a]Pyridin-6-Yl)Pyridin-2(1H)-One Derivatives. J. Med. Chem. 2016, 59 (3), 1116–1139. https://doi.org/10.1021/acs.jmedchem.5b01704. 

(6) Mariani, V.; Biasini, M.; Barbato, A.; Schwede, T. lDDT: A Local Superposition-Free Score for Comparing Protein Structures and Models Using Distance Difference Tests. Bioinformatics 2013, 29 (21), 2722–2728. https://doi.org/10.1093/bioinformatics/btt473. 

(7) Moitessier, N.; Pottel, J.; Therrien, E.; Englebienne, P.; Liu, Z.; Tomberg, A.; Corbeil, C. R. Medicinal Chemistry Projects Requiring Imaginative Structure-Based Drug Design Methods. Acc. Chem. Res. 2016, 49 (9), 1646–1657. https://doi.org/10.1021/acs.accounts.6b00185. 

(8) Burai-Patrascu, M.; Nivedha, A. K.; Rostaing, O.; Chukka, P.; Moitessier, N.; Pottel, J. The First CACHE Challenge–Identifying Binders of the WD-Repeat Domain of Leucine-Rich Repeat Kinase 2. ChemRxiv 2022.

(9) McNutt, A. T.; et al. GNINA 1.0: Molecular Docking with Deep Learning. J. Cheminform. 2021, 13, 43.

(10) Yang, M.; Bo, Z.; Xu, T.; Xu, B.; Wang, D.; Zheng, H. Uni-GBSA: An Open-Source and Web-Based Automatic Workflow to Perform MM/GB(PB)SA Calculations for Virtual Screening. Brief. Bioinform. 2023, 24 (4), bbad218. https://doi.org/10.1093/bib/bbad218. 

(11) Yang, Y.; Yao, K.; Repasky, M. P.; Leswing, K.; Abel, R.; Shoichet, B. K.; Jerome, S. V. Efficient Exploration of Chemical Space with Docking and Deep Learning. J. Chem. Theory Comput. 2021, 17 (11), 7106–7119. https://doi.org/10.1021/acs.jctc.1c00810. 

(12) Graff, D. E.; Shakhnovich, E. I.; Coley, C. W. Accelerating High-Throughput Virtual Screening through Molecular Pool-Based Active Learning. Chem. Sci. 2021, 12 (22), 7866–7881. https://doi.org/10.1039/d0sc06805e.

What makes your approach stand out from the community? (<100 words)

We propose active learning and pharmacophore-driven methods for ultra-large library screening to explore chemical space efficiently and cost-effectively. We are also implementing consensus scoring to potentially improve the ranking of virtual screening predictions by overcoming the limitations of single scoring functions: (a) reducing the high false-positive rate and (b) substantially increasing the enrichment factor (1,2). We have observed the benefits of multifaceted scoring in earlier CACHE challenges. Our approach further stands out through our commitment to publishing our methodology and learnings, as we have done previously (3,4). Our aim is to improve the practices and outcomes of computationally driven research programs across our community.

References:

1. Oda A, Tsuchida K, Takakura T. Comparison of consensus scoring strategies for evaluating computational models of protein-ligand complexes. J Chem Inf Model. 2006;46(1):380-391.

2. Perez-Castillo Y, Sotomayor-Burneo S, Jimenes-Vargas K, Gonzalez-Rodriguez M, Cruz-Monteagudo M, Armijos-Jaramillo V, Cordeiro MNDS, et al. CompScore: Boosting Structure-Based Virtual Screening Performance by Incorporating Docking Scoring Function Components into Consensus Scoring. J Chem Inf Model. 2019;59(9):3655-3666.

3. Nivedha, A.K., Burai-Patrascu, M., Rostaing, O., Chukka, P., Singh, B., Janezic, M., Moitessier, A., Moitessier, N. and Pottel, J., 2023. The Second CACHE Challenge-Targeting the RNA-Binding Pocket of the SARS-CoV2 Nonstructural Protein 13 via a consensus-scoring method and FITTED templated docking.

4. BVS SK, Rostaing O, Burai-Patrascu M, Janezic M, Moitessier A, Pottel J, Moitessier N. The Third CACHE Challenge–Finding Ligands Targeting the Macrodomain of SARS-CoV-2 NSP3 Using AI-inspired and Knowledge-Based Approaches.

Method Name

Multi-stage hit identification workflow (Includes Active Learning and Pharmacophore modelling)

Commercial software packages used

Forecaster Suite, in-house developed packages (active learning and pharmacophore modelling)

Free software packages used

RDKit, DeepChem, TensorFlow, PyTorch, GROMACS

Relevant publications of previous uses by your group of this software/method

Docking:

Labarre, A., Stille, J., Burai-Patrascu, M., Martins, A., Pottel, J., Moitessier, N. Docking Ligands into Flexible and Solvated Macromolecules. 8. Forming New Bonds – Challenges and Opportunities. Journal of Chemical Information and Modeling (2022), 62, 1061-1077.

Active Learning: in-house deployed tools/models (similar to the approaches described in the publications listed below):

Yang, Y.; Yao, K.; Repasky, M. P.; Leswing, K.; Abel, R.; Shoichet, B. K.; Jerome, S. V. Efficient Exploration of Chemical Space with Docking and Deep Learning. J. Chem. Theory Comput. 2021, 17 (11), 7106–7119. https://doi.org/10.1021/acs.jctc.1c00810. 

Graff, D. E.; Shakhnovich, E. I.; Coley, C. W. Accelerating High-Throughput Virtual Screening through Molecular Pool-Based Active Learning. Chem. Sci. 2021, 12 (22), 7866–7881. https://doi.org/10.1039/d0sc06805e.

Challenge #5