1 Protein structure determination
Conventional AlphaFold2 predictions tend to favor inactive receptor states [1]. To address this limitation, we use an in-house multi-state prediction protocol that lets AlphaFold2 generate more conformationally diverse structures. The protocol incorporates templates annotated as active-state, selected by sequence similarity, and modifies the multiple sequence alignment (MSA) input features [2]. We complement it with the GPCR-I-TASSER template-based modeling pipeline, and together the two approaches yield a collection of diverse MCHR1 structures.
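The template selection and MSA modification described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the in-house protocol: the `Template` record, its `state` annotation, the identity cutoff, and random MSA subsampling (one commonly used way to perturb AlphaFold2's MSA input) are all hypothetical choices made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Template:
    template_id: str     # hypothetical identifier
    state: str           # hypothetical annotation: "active" or "inactive"
    seq_identity: float  # sequence identity to the MCHR1 query, 0.0 to 1.0


def select_active_templates(templates, min_identity=0.25, k=4):
    """Keep active-state templates above an identity cutoff, most similar first."""
    active = [t for t in templates
              if t.state == "active" and t.seq_identity >= min_identity]
    return sorted(active, key=lambda t: t.seq_identity, reverse=True)[:k]


def subsample_msa(msa, depth, seed=0):
    """Keep the query (row 0) plus `depth` randomly chosen homologs.

    Assumption: running predictions with several seeds and shallow MSAs is
    used here as an example of an MSA modification; it is not necessarily
    the in-house one.
    """
    rng = random.Random(seed)
    homologs = msa[1:]
    picked = rng.sample(homologs, min(depth, len(homologs)))
    return [msa[0]] + picked


# Usage with placeholder data.
templates = [
    Template("T1", "active", 0.31),
    Template("T2", "inactive", 0.40),
    Template("T3", "active", 0.22),
    Template("T4", "active", 0.28),
]
print([t.template_id for t in select_active_templates(templates)])  # ['T1', 'T4']
msa = ["QUERY"] + [f"HOMOLOG_{i}" for i in range(10)]
print(len(subsample_msa(msa, depth=3)))  # 4
```

Varying the seed gives a different subsampled MSA each time, which is what produces an ensemble of inputs rather than a single prediction.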
To select the best MCHR1 structure from the models produced by the modified AlphaFold2 protocol and GPCR-I-TASSER, we dock known ligands with RosettaLigandEnsemble [9], using MCHR1 compounds from ChEMBL and patents. Specifically, we focus on the interactions between the charged functional groups of these ligands and the aminergic-like binding pocket of MCHR1 [3]. These interaction data, together with the docking scores, serve as the criteria for ranking the predicted structures.
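A minimal sketch of the ranking step, assuming each docking run has already been reduced to a docking score and a boolean flag recording whether the ligand's charged group formed the expected interaction. The `DockingResult` record, the `charged_contact` flag, and the lexicographic ranking (interaction recovery first, mean docking score as tie-breaker) are illustrative assumptions, not the exact criteria used with RosettaLigandEnsemble.

```python
from dataclasses import dataclass


@dataclass
class DockingResult:
    model_id: str
    ligand_id: str
    score: float           # docking score; lower (more negative) is better
    charged_contact: bool  # hypothetical flag: charged group formed the expected interaction


def rank_models(results):
    """Rank receptor models: interaction recovery first, mean docking score second."""
    by_model = {}
    for r in results:
        by_model.setdefault(r.model_id, []).append(r)
    summary = []
    for model_id, rs in by_model.items():
        recovery = sum(r.charged_contact for r in rs) / len(rs)
        mean_score = sum(r.score for r in rs) / len(rs)
        # Sort key: higher recovery first, then lower (better) mean docking score.
        summary.append((-recovery, mean_score, model_id))
    summary.sort()
    return [model_id for _, _, model_id in summary]


results = [
    DockingResult("M1", "L1", -10.0, True),
    DockingResult("M1", "L2", -8.0, False),
    DockingResult("M2", "L1", -12.0, False),
    DockingResult("M2", "L2", -11.0, False),
    DockingResult("M3", "L1", -9.0, True),
    DockingResult("M3", "L2", -9.5, True),
]
print(rank_models(results))  # ['M3', 'M1', 'M2']
```

Ranking on interaction recovery before raw score reflects the idea that a model which docks known actives with good scores but in the wrong orientation should not win.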
The highest-ranked model is subsequently utilized for virtual screening.
2 Hit identification
Step 1: Identify novel compounds.
For hit identification, we use Rosetta ligand docking to search the Enamine make-on-demand library of 31 billion compounds [4]. Rather than docking every compound exhaustively, this approach couples an evolutionary algorithm with Rosetta docking to identify novel scaffolds.
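Pairing an evolutionary algorithm with docking can be illustrated with a small genetic algorithm over reagent combinations. In this sketch a compound is a tuple of reagent choices from a hypothetical two-component reaction, and `score_fn` is a placeholder standing in for a Rosetta docking run; the population size, truncation selection, uniform crossover, and mutation rate are arbitrary assumptions rather than the settings of the method cited above.

```python
import random


def evolve(reagent_pools, score_fn, pop_size=20, generations=15,
           mutation_rate=0.2, seed=0):
    """Minimal genetic algorithm over reagent combinations (lower score is better).

    reagent_pools: one list of reagents per position in the reaction.
    score_fn: callable on a tuple of reagents; a stand-in for docking.
    Returns (best individual, its score, number of distinct individuals scored).
    """
    rng = random.Random(seed)
    cache = {}

    def fitness(ind):
        if ind not in cache:
            cache[ind] = score_fn(ind)  # docking is expensive, so score each compound once
        return cache[ind]

    population = [tuple(rng.choice(pool) for pool in reagent_pools)
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each position inherits a reagent from either parent.
            child = [rng.choice(pair) for pair in zip(a, b)]
            for i, pool in enumerate(reagent_pools):
                if rng.random() < mutation_rate:
                    child[i] = rng.choice(pool)  # mutation: swap in a random reagent
            children.append(tuple(child))
        population = parents + children
    best = min(population, key=fitness)
    return best, fitness(best), len(cache)


def toy_score(ind):
    # Hypothetical placeholder for a docking score, minimal at reagents (7, 42).
    return (ind[0] - 7) ** 2 + (ind[1] - 42) ** 2


pools = [list(range(50)), list(range(50))]
best, score, n_scored = evolve(pools, toy_score)
print(best, score, n_scored)
```

Because the best half of each generation is carried forward, the best score never gets worse, and at most a few hundred of the 2,500 possible combinations are ever scored; on a real 31-billion-compound space that ratio is what makes the search tractable.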
The novel scaffolds obtained from this docking search are then passed to our in-house fragment-based drug discovery (FBDD) pipeline. The pipeline breaks compounds from the MCHR1 ChEMBL and patent sets into smaller fragments, catalogs these fragments in a library, and merges them with the novel scaffolds, reassembling the pieces into new molecular entities.
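The cataloging and merging steps can be sketched with abstract fragment identifiers. The sketch assumes fragmentation has already been performed upstream (RDKit's `BRICS` module, with `BRICSDecompose` and `BRICSBuild`, is one real option for that step, although the text does not say which method the pipeline uses). The recurrence threshold, the attachment-point counts, and the heavy-atom size filter are hypothetical.

```python
from collections import Counter
from itertools import product


def build_fragment_library(fragmented_compounds, min_count=2):
    """Catalog fragments that recur across known MCHR1 compounds.

    fragmented_compounds: one list of fragment IDs per ChEMBL/patent compound.
    Each fragment is counted at most once per compound.
    """
    counts = Counter(f for frags in fragmented_compounds for f in set(frags))
    return {f: c for f, c in counts.items() if c >= min_count}


def merge(scaffolds, library, heavy_atoms, max_heavy_atoms=35):
    """Attach one library fragment to every open attachment point of each scaffold.

    scaffolds: scaffold ID -> number of open attachment points (hypothetical).
    heavy_atoms: ID -> heavy-atom count, used as a crude size filter.
    """
    products = []
    for scaffold, n_points in scaffolds.items():
        for frags in product(sorted(library), repeat=n_points):
            size = heavy_atoms[scaffold] + sum(heavy_atoms[f] for f in frags)
            if size <= max_heavy_atoms:
                products.append((scaffold, frags))
    return products


compounds = [["F1", "F2"], ["F1", "F3"], ["F2", "F1"], ["F4"]]
library = build_fragment_library(compounds)
print(sorted(library.items()))  # [('F1', 3), ('F2', 2)]

heavy = {"S1": 20, "S2": 25, "F1": 6, "F2": 12}
print(merge({"S1": 2, "S2": 1}, library, heavy))
# [('S1', ('F1', 'F1')), ('S2', ('F1',))]
```

Keeping only recurring fragments biases the new compounds toward motifs already associated with MCHR1 activity, while the size filter keeps the combinatorial products drug-like in a very rough sense.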
Step 2: Prioritize top candidates.
The newly generated, synthesizable compounds from both the novel-scaffold and FBDD pipelines are then screened with DiffDock [5]. PoseBusters is subsequently used for plausibility checks, discarding compounds with unrealistic conformations or binding poses [6].
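A minimal sketch of the plausibility filter, assuming each docked compound carries its top DiffDock confidence and a dictionary of pass/fail checks in the spirit of PoseBusters' output. The `DockedPose` record and the check names used below are hypothetical placeholders, not PoseBusters' actual column names.

```python
from dataclasses import dataclass, field


@dataclass
class DockedPose:
    compound_id: str
    confidence: float  # DiffDock confidence of the top-ranked pose
    checks: dict = field(default_factory=dict)  # check name -> passed (hypothetical names)


def plausible(poses, required=None):
    """Keep poses passing all (or all `required`) checks, most confident first."""
    kept = []
    for p in poses:
        names = list(required) if required is not None else list(p.checks)
        # A pose with no recorded checks is rejected rather than passed by default.
        if names and all(p.checks.get(n, False) for n in names):
            kept.append(p)
    return sorted(kept, key=lambda p: p.confidence, reverse=True)


poses = [
    DockedPose("A", 0.5, {"bond_lengths": True, "protein_clash": True}),
    DockedPose("B", 1.2, {"bond_lengths": True, "protein_clash": False}),
    DockedPose("C", -0.3, {}),
]
print([p.compound_id for p in plausible(poses)])  # ['A']
print([p.compound_id for p in plausible(poses, required=["bond_lengths"])])  # ['B', 'A']
```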
To prioritize these compounds, we fine-tune our pre-trained quantitative structure-activity relationship (QSAR) model on MCHR1 ChEMBL and patent compounds. The model uses the AttentiveFP architecture, a graph neural network that augments graph convolutions with an attention mechanism, to predict compound activity [7].
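To make the attention idea concrete, here is a simplified, untrained single-head graph-attention step with a mean-pooled linear readout, written in plain Python. It is not AttentiveFP itself (which, among other differences, updates atom states with GRU cells and uses an attention-based graph-level readout); the atom features, attention vector `w`, and readout weights are hypothetical, and every atom is assumed to have at least one neighbour.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(i, features, adjacency, w):
    """Attention of atom i over its neighbours, scored on the concatenation [h_i, h_j]."""
    h_i = features[i]
    return softmax([leaky_relu(dot(w, h_i + features[j])) for j in adjacency[i]])


def message_passing_step(features, adjacency, w):
    """Update each atom with an attention-weighted sum of its neighbours' features."""
    updated = []
    for i, h_i in enumerate(features):
        alphas = attention_weights(i, features, adjacency, w)
        msg = [sum(a * features[j][k] for a, j in zip(alphas, adjacency[i]))
               for k in range(len(h_i))]
        # Residual update squashed by tanh (a simplification; AttentiveFP uses a GRU here).
        updated.append([math.tanh(x + m) for x, m in zip(h_i, msg)])
    return updated


def predict_activity(features, adjacency, w, readout_w, bias=0.0, steps=2):
    """Mean-pool atom states after `steps` updates and apply a linear readout."""
    for _ in range(steps):
        features = message_passing_step(features, adjacency, w)
    pooled = [sum(col) / len(features) for col in zip(*features)]
    return dot(readout_w, pooled) + bias


# Toy three-atom graph with two features per atom (all values hypothetical).
feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
adjacency = {0: [1, 2], 1: [0], 2: [0]}
w = [0.1, -0.2, 0.3, 0.4]
print(attention_weights(0, feats, adjacency, w))
print(predict_activity(feats, adjacency, w, readout_w=[1.0, -1.0]))
```

In a real fine-tuning run, `w`, the readout weights, and the update function would be learned from the ChEMBL and patent activity data rather than fixed by hand.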
Each compound receives a composite score that combines its QSAR-predicted activity, its binding-pose score (from PoseBusters), and its confidence score (from DiffDock). The highest-scoring compounds are selected as hits for further exploration.
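A composite score of this kind could be computed as a weighted sum of min-max-normalized components. The weights, the normalization, and the assumption that higher is better for all three inputs (for example, treating the pose score as the fraction of plausibility checks passed) are illustrative choices, not the pipeline's actual scoring function.

```python
def min_max(values):
    """Rescale values to [0, 1]; identical values all map to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def composite_rank(compounds, weights=(0.5, 0.2, 0.3)):
    """Rank compounds by a weighted sum of normalized QSAR, pose, and confidence scores.

    compounds: dicts with 'id', 'qsar', 'pose', 'confidence' (higher is better for all).
    weights: hypothetical weights for (qsar, pose, confidence).
    """
    cols = [min_max([c[key] for c in compounds])
            for key in ("qsar", "pose", "confidence")]
    totals = [sum(wt * col[i] for wt, col in zip(weights, cols))
              for i in range(len(compounds))]
    ids = [c["id"] for c in compounds]
    return sorted(zip(ids, totals), key=lambda t: t[1], reverse=True)


compounds = [
    {"id": "A", "qsar": 7.5, "pose": 1.0, "confidence": 0.2},
    {"id": "B", "qsar": 6.0, "pose": 0.8, "confidence": 1.0},
    {"id": "C", "qsar": 5.0, "pose": 0.5, "confidence": -1.0},
]
print([cid for cid, _ in composite_rank(compounds)])  # ['A', 'B', 'C']
```

Normalizing first matters because the three inputs live on very different scales; without it, whichever score has the widest numeric range would dominate the ranking.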
We also evaluate each compound's cardiotoxic potential by assessing its propensity to inhibit the hERG channel. Compounds predicted to bind hERG strongly are excluded, so that only candidates with a favorable safety profile move forward [8].
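The hERG exclusion step reduces to a threshold on predicted potency. In this sketch the predictions are assumed to be pIC50 values from some hERG model, and the 10 µM cutoff is an assumed rule of thumb rather than a value stated in this section.

```python
import math


def pic50_from_ic50_um(ic50_um):
    """pIC50 = -log10(IC50 in mol/L) = 6 - log10(IC50 in µM)."""
    return 6.0 - math.log10(ic50_um)


def herg_filter(predicted_herg_pic50, cutoff_um=10.0):
    """Keep compounds whose predicted hERG IC50 is weaker than the cutoff.

    predicted_herg_pic50: compound ID -> predicted hERG pIC50 (hypothetical model output).
    A higher pIC50 means stronger inhibition, so keep values below the cutoff.
    """
    cutoff = pic50_from_ic50_um(cutoff_um)  # 10 µM corresponds to pIC50 5.0
    return [cid for cid, p in predicted_herg_pic50.items() if p < cutoff]


print(herg_filter({"A": 4.2, "B": 6.1, "C": 5.0}))  # ['A']
```

Compounds sitting exactly at the cutoff are excluded here; whether borderline cases are dropped or sent for expert review is a policy choice the sketch does not settle.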
Experienced medicinal chemists also review the selected molecules throughout the pipeline to monitor their quality.