Computational methods

Hit Identification

Method type (check all that apply)

High-throughput docking

Machine learning

Physics-based

Description of your approach (min 200 and max 800 words)

The approach combines machine learning, molecular docking, and molecular dynamics (MD) simulation.

Stage 1

We will run molecular dynamics simulations starting from the AlphaFold model of MCHR1. The resulting ensemble of conformations will be used for ensemble docking.
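One way to reduce an MD trajectory to a small set of receptor conformations for docking is to cluster the frames by RMSD and keep one representative per cluster. The sketch below is only an illustration of that idea, not our production protocol: it assumes the binding-site coordinates have already been extracted and superimposed into a NumPy array, and the function names and the cutoff value are hypothetical.

```python
import numpy as np

def pairwise_rmsd(frames: np.ndarray) -> np.ndarray:
    """RMSD between every pair of frames.

    frames has shape (n_frames, n_atoms, 3) and is assumed to be
    already superimposed on a common reference (no fitting is done here).
    Memory grows as n_frames**2, so this suits a subsampled trajectory.
    """
    diff = frames[:, None, :, :] - frames[None, :, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1).mean(axis=-1))

def pick_representatives(frames: np.ndarray, cutoff: float) -> list[int]:
    """Greedy neighbour-count clustering.

    Repeatedly pick the unassigned frame with the most unassigned
    neighbours within `cutoff`, record it as a representative, and
    remove it together with its neighbours.
    """
    rmsd = pairwise_rmsd(frames)
    unassigned = np.ones(len(frames), dtype=bool)
    representatives = []
    while unassigned.any():
        neighbours = (rmsd <= cutoff) & unassigned[None, :] & unassigned[:, None]
        centre = int(np.argmax(neighbours.sum(axis=1)))
        representatives.append(centre)
        unassigned &= ~neighbours[centre]
    return representatives

# Toy trajectory: 4 frames near one conformation, 3 near another (hypothetical data).
rng = np.random.default_rng(0)
state_a = rng.normal(size=(20, 3))
state_b = state_a + 5.0
frames = np.stack(
    [state_a + 0.01 * rng.normal(size=(20, 3)) for _ in range(4)]
    + [state_b + 0.01 * rng.normal(size=(20, 3)) for _ in range(3)]
)
reps = pick_representatives(frames, cutoff=1.0)
```

The representatives are returned in order of decreasing cluster size, so the first few indices correspond to the most populated conformations, which are natural candidates for the docking ensemble.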

Stage 2

High-throughput molecular docking of Enamine xREAL Space (176 billion molecules) will be performed using V-SYNTHES in combination with active learning. The V-SYNTHES approach was published in Nature 601, 452–459 (2022). It first identifies the best scaffold–synthon combinations as seeds suitable for further growth, and then iteratively elaborates these seeds to select complete molecules with the best docking scores. This hierarchical combinatorial method rapidly detects the best-scoring compounds in an ultra-large chemical space while docking only a small fraction (<0.1%) of the compounds. Active learning will allow us to explore a larger space of full molecules by docking, enriching the final selection with top-scoring compounds.

The V-SYNTHES approach in combination with active learning will be performed in four steps:

1. We generate a library of fragment-like compounds representing all possible scaffold–synthon combinations for all reactions in the whole Enamine xREAL Space (176 billion molecules). This library is referred to as the minimal enumeration library (MEL).
2. The MEL compounds are docked to the target receptor using energy-based docking of the flexible ligand. About 100 thousand of the top-scoring compounds will be processed with the proprietary CapSelect technology, which identifies the fragments best suited for growth into final molecules. Finally, the fragments are filtered for diversity using several criteria (for example, a single reaction cannot contribute more than 20% of the selection).
3. The fragments selected in step 2 are iteratively enumerated.
4. The final enumerated subset of the library is screened by docking. A randomly selected subset of molecules will first be docked using ICM-Pro by MolSoft, and the docking results will serve as the training set for the machine learning model. We will use the Directed Message Passing Neural Network (D-MPNN) architecture because of its strong performance on molecular property prediction tasks. The trained model will score the rest of the enumerated set to pick the next subset of compounds for docking, and a new model will then be trained on all compounds docked so far. This process will be repeated until most of the top-scoring compounds in the enumerated set are recalled, which we estimate will require docking 1-2% of the dataset.
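The active learning loop in the last step can be sketched as follows. This is a toy illustration under loud assumptions: a closed-form ridge regression stands in for the D-MPNN surrogate, random descriptor vectors stand in for molecules, and `toy_dock` is a hypothetical placeholder for an ICM-Pro docking run. Lower scores are treated as better, as is usual for docking scores.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression (stand-in for the D-MPNN surrogate)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict_ridge(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def active_learning_screen(X, dock, n_init, batch, n_rounds, seed=0):
    """Dock a random subset, then repeatedly train on all docked compounds
    and dock the compounds the surrogate predicts to score best."""
    rng = np.random.default_rng(seed)
    n = len(X)
    docked = np.zeros(n, dtype=bool)
    scores = np.full(n, np.nan)

    first = rng.choice(n, size=n_init, replace=False)
    scores[first] = dock(first)
    docked[first] = True

    for _ in range(n_rounds):
        pool = np.flatnonzero(~docked)
        if len(pool) == 0:
            break
        w = fit_ridge(X[docked], scores[docked])   # retrain on everything docked so far
        predicted = predict_ridge(w, X[pool])
        chosen = pool[np.argsort(predicted)[:batch]]  # lowest predicted score first
        scores[chosen] = dock(chosen)
        docked[chosen] = True
    return docked, scores

# Toy library with known "true" scores so that recall can be measured.
rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 16))
true_scores = X @ rng.normal(size=16) + 0.5 * rng.normal(size=5000)

def toy_dock(indices):
    """Hypothetical docking oracle: look up the precomputed toy score."""
    return true_scores[indices]

docked, scores = active_learning_screen(X, toy_dock, n_init=200, batch=200, n_rounds=4)
top_true = np.argsort(true_scores)[:50]   # best 1% of the toy library
recall = docked[top_true].mean()          # fraction of true top scorers that were docked
```

In a real screen the true top scorers are unknown, so recall has to be estimated, for example from how quickly the number of newly found top-scoring compounds per round levels off.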

Stage 3

A 3D pharmacophore model will be built from the 3D coordinates of known ligands. The coordinates will be obtained by molecular docking followed by molecular dynamics simulations, using the docking grid developed in Stage 2. Pharmacophore screening will be performed with the RIDE algorithm by MolSoft. RIDE is a fast 3D molecular similarity search method based on Atomic Property Fields (APF). APF is a grid-based 3D pharmacophore potential generated from one or more high-affinity scaffolds, with seven properties assigned from empirical physico-chemical components: hydrogen bond donors, hydrogen bond acceptors, sp2 hybridization, lipophilicity, size, electropositivity/electronegativity, and charge. The pharmacophore screen will be run against the REAL Database (6B molecules).

Stage 4

For the final selection, we will take the top-scoring compounds from both methods and analyze them in detail. If the two sets overlap significantly, the compounds for testing will be selected from those that score highest in both. If the methods prioritize different features and the overlap is small, the two screens will be treated independently for the final compound selection. For the resulting list of compounds, we will run MMPBSA calculations. A 10 ns MD trajectory will be obtained for each complex. During the run, the protein Cα atoms and the ligand heavy atoms will be constrained to prevent large fluctuations in the binding site. ΔG will be calculated using the gmx_MMPBSA software. Solvation will be treated with the Poisson-Boltzmann (PB) model, and the entropic component will be calculated using the Interaction Entropy method. Based on ΔG, we will prioritize the clusters from which the molecules are selected.
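For reference, the Interaction Entropy method estimates the entropic term from the fluctuation of the protein-ligand interaction energy along the trajectory: -TΔS = kT ln⟨exp(ΔE_int / kT)⟩, where ΔE_int is the per-frame deviation of the interaction energy from its trajectory average. gmx_MMPBSA performs this calculation itself; the sketch below only illustrates the formula, and the function names, the default temperature, and the synthetic input energies are assumptions made for the example.

```python
import numpy as np

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def interaction_entropy(e_int, temperature=298.15):
    """Return -T*dS (kcal/mol) from per-frame interaction energies (kcal/mol).

    Implements -T*dS = kT * ln< exp(beta * (E_int - <E_int>)) >,
    using a log-mean-exp form for numerical stability.
    """
    e = np.asarray(e_int, dtype=float)
    beta = 1.0 / (K_B * temperature)
    x = beta * (e - e.mean())
    shift = x.max()
    log_mean_exp = shift + np.log(np.mean(np.exp(x - shift)))
    return log_mean_exp / beta

def binding_free_energy(dh_frames, e_int, temperature=298.15):
    """dG estimate: average MM/PBSA enthalpy-like term plus the -T*dS term."""
    return float(np.mean(dh_frames)) + interaction_entropy(e_int, temperature)

# Synthetic per-frame energies (hypothetical numbers, for illustration only).
rng = np.random.default_rng(0)
e_int = -45.0 + 1.0 * rng.normal(size=200_000)   # interaction energy per frame
dh = -30.0 + 2.0 * rng.normal(size=200_000)      # MM/PBSA energy per frame
minus_t_ds = interaction_entropy(e_int)
dg = binding_free_energy(dh, e_int)
```

Because the exponential average is dominated by rare high-energy frames, the estimate converges slowly, so the term should be checked for stability over the 10 ns trajectory before it is used for ranking.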

What makes your approach stand out from the community? (<100 words)

V-SYNTHES requires thousands of times fewer computational resources than standard VLS without compromising docking accuracy at any step. It was tested on Cannabinoid CB1/CB2, Kinase ROCK1, Angiotensin AT2, and Fungal Bromodomain and demonstrated a high hit rate, great potency, and affinity of the hits. Combining V-SYNTHES with active learning will enable the exploration of a larger space of full molecules with molecular docking to enrich the final selection with top-scoring compounds.

Combining a structure-based approach with a ligand-based one (pharmacophore screening) will allow us to identify potential hits with more confidence and accuracy. MMPBSA calculations will help prioritize molecules for testing.

Method Name

A combined structure-based and ligand-based approach powered by ML

Commercial software packages used

ICM-Pro (MolSoft)

Free software packages used

RDKit, KNIME, gmx_MMPBSA

Relevant publications of previous uses by your group of this software/method

Synthon-based ligand discovery in virtual libraries of over 11 billion compounds. Arman A. Sadybekov, Anastasiia V. Sadybekov, Yongfeng Liu, Christos Iliopoulos-Tsoutsouvas, Xi-Ping Huang, Julie Pickett, Blake Houser, Nilkanth Patel, Ngan K. Tran, Fei Tong, Nikolai Zvonok, Manish K. Jain, Olena Savych, Dmytro S. Radchenko, Spyros P. Nikas, Nicos A. Petasis, Yurii S. Moroz, Bryan L. Roth, Alexandros Makriyannis & Vsevolod Katritch. Nature 601, 452–459 (2022).

Efficient Exploration of Chemical Space with Docking and Deep Learning. Ying Yang, Kun Yao, Matthew P. Repasky, Karl Leswing, Robert Abel, Brian K. Shoichet, and Steven V. Jerome. Journal of Chemical Theory and Computation 2021 17 (11), 7106-7119. DOI: 10.1021/acs.jctc.1c00810

Regression-Based Active Learning for Accessible Acceleration of Ultra-Large Library Docking. Egor Marin, Margarita Kovaleva, Maria Kadukova, Khalid Mustafin, Polina Khorn, Andrey Rogachev, Alexey Mishin, Albert Guskov, and Valentin Borshchevskiy. Journal of Chemical Information and Modeling Article ASAP. DOI: 10.1021/acs.jcim.3c01661

gmx_MMPBSA: A New Tool to Perform End-State Free Energy Calculations with GROMACS. Mario S. Valdés-Tresanco, Mario E. Valdés-Tresanco, Pedro A. Valiente, and Ernesto Moreno. Journal of Chemical Theory and Computation 2021 17 (10), 6281-6291. DOI: 10.1021/acs.jctc.1c00645

Challenge #5