Computational methods

Hit Identification

Method type (check all that apply)

De novo design

Deep learning

Free energy perturbation

High-throughput docking

Machine learning

Physics-based

Description of your approach (min 200 and max 800 words)

1 Protein structure determination

To address the limitation of conventional AlphaFold2 predictions, which tend to favor inactive states [1], we utilize an in-house multi-state prediction protocol that allows AlphaFold2 to generate more diverse structures. This protocol integrates active-state-annotated templates based on sequence similarity and modifies the multiple sequence alignment (MSA) input features [2]. This approach is complemented by the GPCR-I-TASSER template-based modeling pipeline, resulting in a collection of diverse MCHR1 structures.

To determine the optimal MCHR1 structure from the predictions made by the modified AlphaFold2 and GPCR-I-TASSER, we use RosettaLigandEnsemble [9] and leverage ligand information from MCHR1 ChEMBL and patent compounds. Specifically, we focus on the interactions between charged functional groups of ligands and the aminergic MCHR1 receptor [3]. This interaction data, combined with docking scores, serves as the criteria for ranking the predicted structures.
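As a minimal sketch of how such a ranking could be combined, the Python snippet below scores each candidate receptor model by a weighted sum of its normalized mean docking score and the fraction of known ligands whose docked pose shows the expected charged interaction. The class name, field names, equal weighting, and the convention that lower docking scores are better are illustrative assumptions, not the exact criteria of our protocol.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple


@dataclass
class ModelEvaluation:
    """Docking results of known ligands against one predicted receptor model."""
    name: str
    docking_scores: List[float]      # one score per ligand; lower = better (assumed)
    charged_contact: List[bool]      # True if the pose shows the expected charged interaction


def rank_models(evals: List[ModelEvaluation],
                interaction_weight: float = 0.5) -> List[Tuple[str, float]]:
    """Rank receptor models, best first, by docking score and interaction recovery."""
    means = {e.name: mean(e.docking_scores) for e in evals}
    lo, hi = min(means.values()), max(means.values())
    span = hi - lo
    ranked = []
    for e in evals:
        # Min-max normalize so that the lowest (best) mean docking score maps to 1.
        dock_term = 1.0 if span == 0 else (hi - means[e.name]) / span
        # Fraction of ligands that reproduce the charged interaction.
        contact_term = sum(e.charged_contact) / len(e.charged_contact)
        score = (1 - interaction_weight) * dock_term + interaction_weight * contact_term
        ranked.append((e.name, score))
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

With this weighting, a model with slightly worse docking scores can still rank first if more of the known ligands recover the charged interaction.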

The highest-ranked model is subsequently utilized for virtual screening.

2 Hit identification

Step 1: Identify novel compounds.

For the hit identification phase, we employ Rosetta ligand docking to search the Enamine make-on-demand library of 31 billion compounds [4]. Because exhaustive docking of a library of this size is impractical, this approach couples an evolutionary algorithm with Rosetta docking to identify novel scaffolds.
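The idea of an evolutionary search over a combinatorial library can be illustrated with the toy Python sketch below. It is not the Rosetta implementation: the encoding of a molecule as a pair of reagent indices, the `score` callback standing in for a docking run, and all parameter values are hypothetical choices made for illustration.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical encoding: a molecule is a pair of building-block indices.
Molecule = Tuple[int, int]


def evolve(score: Callable[[Molecule], float],
           n_a: int, n_b: int,
           pop_size: int = 20, generations: int = 30,
           mutation_rate: float = 0.3, seed: int = 0) -> Molecule:
    """Return the best molecule found; lower score = better (assumed)."""
    rng = random.Random(seed)
    population: List[Molecule] = [
        (rng.randrange(n_a), rng.randrange(n_b)) for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Selection: keep the better-scoring half unchanged (elitism).
        population.sort(key=score)
        parents = population[: pop_size // 2]
        children: List[Molecule] = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            # Crossover: first building block from one parent, second from the other.
            child = (p1[0], p2[1])
            # Mutation: occasionally swap in a random building block.
            if rng.random() < mutation_rate:
                child = (rng.randrange(n_a), child[1])
            if rng.random() < mutation_rate:
                child = (child[0], rng.randrange(n_b))
            children.append(child)
        population = parents + children
    return min(population, key=score)
```

Only the molecules that enter the population are ever scored, which is what makes this kind of search tractable on a library far too large to dock exhaustively.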

The novel scaffolds identified by docking are then passed to our in-house Fragment-Based Drug Discovery (FBDD) pipeline. This pipeline segments compounds from the MCHR1 ChEMBL and patent databases into smaller fragments and catalogs them in a fragment library. The fragments are subsequently merged with the novel scaffolds to generate new molecular entities.

Step 2: Prioritize top candidates.

The screening of these newly synthesizable compounds, originating from both the novel scaffolds and FBDD pipelines, is conducted using DiffDock [5]. Following this screening, PoseBusters is employed for plausibility checks, discarding compounds with unrealistic conformations or binding poses [6].

To prioritize these compounds, we use data from MCHR1 ChEMBL and patent compounds to fine-tune our pre-trained Quantitative Structure-Activity Relationship (QSAR) model. This model uses the AttentiveFP architecture, a graph neural network with an attention mechanism, to predict compound activity [7].

Compounds are evaluated based on a comprehensive score that includes their QSAR model predictions, binding pose score (from PoseBusters), and confidence score (from DiffDock). The compounds with the highest scores are selected as promising hits for further exploration.
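One way such a combined score could be computed is sketched below in Python. The dictionary keys, the use of a PoseBusters pass fraction as the pose term, the min-max normalization, and the weights are all illustrative assumptions rather than the final scoring scheme.

```python
from typing import Dict, List, Sequence, Tuple


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; return 0.5 for all if they are identical."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(compounds: List[Dict],
                     weights: Tuple[float, float, float] = (0.5, 0.2, 0.3)
                     ) -> List[Tuple[str, float]]:
    """Rank compounds, best first, by a weighted sum of three terms.

    Each compound dict is assumed to hold (hypothetical keys):
      qsar_pactivity            predicted activity, higher = better
      posebusters_pass_fraction fraction of plausibility checks passed, 0 to 1
      diffdock_confidence       pose confidence, higher = better
    """
    w_qsar, w_pose, w_conf = weights
    qsar = min_max([c["qsar_pactivity"] for c in compounds])
    conf = min_max([c["diffdock_confidence"] for c in compounds])
    scored = []
    for c, q, d in zip(compounds, qsar, conf):
        total = w_qsar * q + w_pose * c["posebusters_pass_fraction"] + w_conf * d
        scored.append((c["id"], total))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Normalizing the QSAR and confidence terms within the candidate set keeps either one from dominating simply because of its numeric range.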

Additionally, we evaluate the cardiotoxic potential of these compounds by assessing their propensity to inhibit the hERG channel. Compounds predicted to bind hERG are excluded from further consideration, so that the remaining candidates have a favorable safety profile [8].
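The exclusion step amounts to a threshold filter, sketched below in Python. The key name `pred_herg_pic50` and the cutoff of pIC50 5.0 (an IC50 of 10 µM) are hypothetical values chosen for illustration; the actual predictor and cutoff may differ.

```python
from typing import Dict, List, Tuple


def filter_herg(compounds: List[Dict],
                max_herg_pic50: float = 5.0) -> Tuple[List[str], List[str]]:
    """Split compound ids into (kept, rejected) by predicted hERG potency.

    A compound is rejected when its predicted hERG pIC50 is at or above the
    cutoff, i.e. when it is predicted to inhibit hERG too strongly.
    """
    kept: List[str] = []
    rejected: List[str] = []
    for c in compounds:
        if c["pred_herg_pic50"] >= max_herg_pic50:
            rejected.append(c["id"])
        else:
            kept.append(c["id"])
    return kept, rejected
```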

We will also involve experienced medicinal chemists throughout the pipeline to review the quality of the selected molecules.

What makes your approach stand out from the community? (<100 words)

Our approach integrates a novel deep learning architecture with physics-based methods. We believe that progress in the field will be driven mainly by new deep learning architectures, which is why we developed our in-house multi-state prediction protocol. In addition, we will use a diffusion generative model (DiffDock), which has been reported to be more accurate than traditional docking methods, to predict binding poses.

Method Name

Deep learning, diffusion generative model, molecular dynamics, free energy methods

Free software packages used

GPCR-I-TASSER, RosettaLigandEnsemble, DiffDock, PoseBusters

Relevant publications of previous uses by your group of this software/method

[1] Heo, Lim, and Michael Feig. "Multi‐state modeling of G‐protein coupled receptors at experimental accuracy." Proteins: Structure, Function, and Bioinformatics 90.11 (2022): 1873-1885.

[2] Del Alamo, Diego, et al. "Sampling alternative conformational states of transporters and receptors with AlphaFold2." eLife 11 (2022): e75751.

[3] Schaller, David, et al. "Ligand-guided homology modeling drives identification of novel histamine H3 receptor ligands." PLoS One 14.6 (2019): e0218820.

[4] https://github.com/RosettaCommons

[5] Corso, Gabriele, et al. "DiffDock: Diffusion steps, twists, and turns for molecular docking." arXiv preprint arXiv:2210.01776 (2022).

[6] Buttenschoen, Martin, Garrett M. Morris, and Charlotte M. Deane. "PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences." Chemical Science (2024).

[7] Xiong, Zhaoping, et al. "Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism." Journal of Medicinal Chemistry 63.16 (2020): 8749-8760.

[8] Lim, Gyutae, et al. "Identification and new indication of melanin-concentrating hormone receptor 1 (MCHR1) antagonist derived from machine learning and transcriptome-based drug repositioning approaches." International Journal of Molecular Sciences 23.7 (2022): 3807.

[9] Fu, Darwin Yu, and Jens Meiler. "RosettaLigandEnsemble: A small-molecule ensemble-driven docking approach." ACS Omega 3.4 (2018): 3655-3664.

Challenge #5