Computational methods

Hit Identification

Method type (check all that apply)

De novo design

Deep learning

High-throughput docking

Machine learning

Physics-based

Description of your approach (min 200 and max 800 words)

We will employ a structure-guided drug discovery approach based on a molecular generative model (SAGE) recently developed in our lab. The model is tuned to produce Enamine REAL Space ligands that target the 3D structure of an input protein binding pocket. Compared to traditional virtual screening, this approach is substantially faster, enabling us to sample from the entire Enamine library space (30 billion compounds) in a structure-guided way. This efficiency lets us generate ligands against multiple receptor conformations, rather than against the single docking grid typically used in virtual screening.

Our approach consists of three steps: (1) structural modeling of the inactive receptor state; (2) sampling Enamine ligands for each structure with our generative model; and (3) scoring and ranking the generated ligands using known actives (ComBind scoring).

We will generate multiple structural models of the MCHR1 receptor using two distinct approaches: AlphaFold 2 and homology modeling. For homology modeling, we will use Schrödinger's Prime software, with inactive-state structures of the GPCRs most similar in sequence to MCHR1 as templates. We will also run restrained molecular dynamics (MD) simulations (Amber) to further expand our set of conformational models. The conformations obtained from these methods will then be clustered using the PENSA (Python Ensemble Analysis) library to yield approximately 10 representative receptor conformations, all in the inactive state.
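The reduction of a large pool of conformations to about 10 representatives can be illustrated with a small, dependency-free sketch. This is not PENSA's API and not necessarily the clustering we will run: it swaps in a greedy farthest-point selection, and the function name and the toy 2D feature vectors (standing in for per-frame structural features such as pocket distances or torsions) are assumptions made for illustration.

```python
import math


def farthest_point_representatives(features, n_reps):
    """Pick up to n_reps mutually distant frames from a list of feature vectors.

    Hypothetical helper: a stand-in for ensemble clustering, not PENSA code.
    Returns the indices of the selected frames in selection order.
    """
    if not features or n_reps <= 0:
        return []
    n_reps = min(n_reps, len(features))
    chosen = [0]  # start from the first frame (arbitrary but deterministic)
    # Distance from every frame to its nearest already-chosen representative.
    nearest = [math.dist(f, features[0]) for f in features]
    while len(chosen) < n_reps:
        # Take the frame that is farthest from all current representatives.
        nxt = max(range(len(features)), key=lambda i: nearest[i])
        chosen.append(nxt)
        for i, f in enumerate(features):
            nearest[i] = min(nearest[i], math.dist(f, features[nxt]))
    return chosen


# Toy data: five frames described by two made-up features each.
frames = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
representatives = farthest_point_representatives(frames, 3)
```

With the toy data, the near-duplicate frames (indices 1 and 3) are skipped in favor of frames that cover distinct regions of feature space, which is the property we want from the final set of receptor conformations.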

Next, we will use our generative model to generate 10,000 Enamine ligands for each input receptor structure (100,000 molecules in total). The model uses geometric deep learning, incorporating the atomic coordinates of the receptor pocket to directly position and join Enamine building block fragments. It takes a receptor pocket as input; we will select the standard GPCR orthosteric site within the receptor core. We will ensure that a variety of ligand sizes are sampled, with size thresholds guided by known active ligands. We will remove any molecules that share a scaffold with known active ligands.
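The two post-generation filters described above (a size window derived from known actives, and removal of molecules that share a scaffold with a known active) can be sketched as follows. Everything here is hypothetical: the function names, the `margin` parameter, and the toy records are assumptions. The scaffold is passed in as a precomputed key; in practice it would come from a cheminformatics toolkit (for example a Murcko scaffold), which is deliberately left out so the sketch stays self-contained.

```python
def size_window(known_sizes, margin=2):
    """Heavy-atom-count window spanned by the known actives, widened by a margin.

    The margin of 2 atoms is an arbitrary illustrative default.
    """
    return min(known_sizes) - margin, max(known_sizes) + margin


def filter_generated(generated, known_scaffolds, window, scaffold_of, size_of):
    """Keep molecules inside the size window whose scaffold is not a known one."""
    low, high = window
    kept = []
    for mol in generated:
        if not low <= size_of(mol) <= high:
            continue  # too small or too large relative to the known actives
        if scaffold_of(mol) in known_scaffolds:
            continue  # shares a scaffold with a known active
        kept.append(mol)
    return kept


# Toy records: (id, scaffold key, heavy-atom count). All values are made up.
known_actives = [("act1", "scafA", 28), ("act2", "scafB", 34)]
generated = [("gen1", "scafA", 30), ("gen2", "scafC", 31), ("gen3", "scafD", 50)]

window = size_window([m[2] for m in known_actives])
known_scaffolds = {m[1] for m in known_actives}
novel = filter_generated(
    generated,
    known_scaffolds,
    window,
    scaffold_of=lambda m: m[1],
    size_of=lambda m: m[2],
)
```

In this toy run, `gen1` is dropped for sharing a scaffold with a known active and `gen3` for falling outside the size window, leaving only `gen2`.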

Finally, we will rank our generated ligands with the ComBind scoring approach. ComBind uses a list of other ligands that are known to bind the same target ("helper ligands"); these ligands are docked in addition to the ligands of interest ("query ligands"). The scoring function rewards similar interactions (e.g. hydrogen bonds and salt bridges) between the helper and query ligands. This interaction similarity score is combined with the standard docking score (Schrödinger's Glide) to produce a final ranking of the compounds. We will cluster the top 2000 molecules by chemical similarity (3D ECFP fingerprint) and greedily select the top 100 representative ligands from these clusters.
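The final selection step can be illustrated with a minimal sketch of greedy, similarity-aware picking. This is one plausible reading of "cluster, then greedily select representatives", not a definitive description of our procedure: it uses a leader-style pass over the ranked list instead of an explicit clustering step. The function names, the toy bit sets (standing in for 3D fingerprints such as those produced by e3fp), and the similarity threshold of 0.4 are all assumptions.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def greedy_diverse_pick(ranked, fingerprints, n_pick, max_sim=0.4):
    """Walk the ranking best-first and keep a ligand only if it is
    dissimilar (Tanimoto < max_sim) to every ligand already picked."""
    picks = []
    for name in ranked:
        fp = fingerprints[name]
        if all(tanimoto(fp, fingerprints[p]) < max_sim for p in picks):
            picks.append(name)
            if len(picks) == n_pick:
                break
    return picks


# Toy data: ligands already sorted best-first by the combined score.
ranked = ["a", "b", "c", "d"]
fingerprints = {
    "a": {1, 2, 3, 4},
    "b": {1, 2, 3, 5},  # close analogue of "a"
    "c": {7, 8, 9},
    "d": {7, 8, 10},    # close analogue of "c"
}
selected = greedy_diverse_pick(ranked, fingerprints, n_pick=3)
```

Each picked ligand acts as the representative of its neighborhood, so close analogues of a better-ranked ligand are skipped. In the setting described above, `ranked` would hold the top 2000 molecules and `n_pick` would be 100.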

What makes your approach stand out from the community? (<100 words)

Our approach distinguishes itself through the incorporation of a newly developed molecular generative AI model, facilitating swift exploration of the expansive Enamine library space. By leveraging structure-guided techniques, we can produce unique scaffolds distinct from existing ligands. Moreover, the efficiency of our approach enables us to explore multiple receptor conformations, potentially enhancing our chances of success, especially given the inherent uncertainty in individual structure predictions. Additionally, our distinctive scoring approach integrates information from other known actives (similar interactions) without overfitting.

Method Name

SAGE

Commercial software packages used

Schrödinger's Prime, Glide, Maestro

Free software packages used

AlphaFold 2, Amber (MD simulations), MDAnalysis, PENSA, e3fp

Relevant publications of previous uses by your group of this software/method

Paggi, Joseph M., et al. "Leveraging nonstructural data to predict structures and affinities of protein–ligand complexes." Proceedings of the National Academy of Sciences 118.51 (2021): e2112621118.

Vögele, Martin, et al. "Systematic analysis of biomolecular conformational ensembles with PENSA." arXiv preprint arXiv:2212.02714 (2022).

Challenge #5