STEP 1: Protein Structure Prediction
The first step predicts the protein structure using flow-matching methods that generate an ensemble of predicted conformations. Molecular dynamics (MD) simulations will then be run on the predicted structure to confirm its stability. Binding sites will be identified with our in-house binding site prediction algorithm, which is based on a few geometric deep-learning methods.
STEP 2: Training/Fine-tuning AI/ML Models
This phase consists of training three models.
(i) A docking-score prediction model: This model will be trained with an active learning approach on data generated by docking a subset of the Enamine diverse library to the predicted MCHR1 protein structure with AutoDock Vina.
(ii) A QSAR model to predict pIC50: This model will be trained with ML/DL methods on the provided ChEMBL data and validated against the provided patent data.
(iii) A binding affinity prediction model: We will fine-tune our in-house binding affinity prediction model (which predicts affinity for any given protein sequence and ligand pair) on the provided MCHR1 data.
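The active learning loop behind model (i) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pipeline's actual implementation: the k-nearest-neighbour surrogate, the greedy acquisition rule, the parameter names (n_seed, batch, rounds), and the dock callable standing in for an AutoDock Vina run are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]  # hypothetical molecular descriptor / fingerprint


def knn_predict(x: Vector, train_x: List[Vector], train_y: List[float], k: int = 3) -> float:
    """Predict a score as the mean label of the k nearest labeled molecules."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, tx)), ty)
        for tx, ty in zip(train_x, train_y)
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def active_learning(
    pool: List[Vector],
    dock: Callable[[Vector], float],  # assumed stand-in for a real docking run
    n_seed: int = 20,
    batch: int = 10,
    rounds: int = 5,
    seed: int = 0,
) -> Dict[int, float]:
    """Return {pool index: docking score} for every molecule that was docked."""
    rng = random.Random(seed)
    # Round 0: dock a random seed subset of the library.
    labeled = {i: dock(pool[i]) for i in rng.sample(range(len(pool)), n_seed)}
    for _ in range(rounds):
        train_idx = list(labeled)
        train_x = [pool[i] for i in train_idx]
        train_y = [labeled[i] for i in train_idx]
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        # Greedy acquisition (an assumption): dock the molecules the surrogate
        # predicts to have the most negative, i.e. best, docking scores.
        unlabeled.sort(key=lambda i: knn_predict(pool[i], train_x, train_y))
        for i in unlabeled[:batch]:
            labeled[i] = dock(pool[i])
    return labeled
```

Only n_seed + batch * rounds molecules are ever docked, which is the point of the approach: the surrogate model decides where the expensive docking budget is spent.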
STEP 3: Virtual Screening of the Enamine REAL Space
We will then screen the Enamine REAL Space, which contains 6.75B molecules, separately with each of the three AI/ML models trained in Step 2, using our small-molecule discovery platform BoltChem. From each workflow, the top molecules passing a specified threshold will be selected. These selections will be merged and carried forward.
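The select-then-merge logic of this step could look something like the sketch below. It assumes that "merged" means a set union of the three per-model selections, that lower docking scores and higher pIC50 and affinity values are better, and that the threshold values are placeholders; none of these details are specified above.

```python
from typing import Dict, Set


def select_hits(scores: Dict[str, float], threshold: float, higher_is_better: bool) -> Set[str]:
    """Keep the molecule IDs whose predicted score passes the threshold."""
    if higher_is_better:
        return {mol for mol, s in scores.items() if s >= threshold}
    return {mol for mol, s in scores.items() if s <= threshold}


def merge_hits(
    docking: Dict[str, float],
    pic50: Dict[str, float],
    affinity: Dict[str, float],
    thresholds: Dict[str, float],  # hypothetical keys: "docking", "pic50", "affinity"
) -> Set[str]:
    """Union of the hits selected independently by each of the three models."""
    return (
        select_hits(docking, thresholds["docking"], higher_is_better=False)
        | select_hits(pic50, thresholds["pic50"], higher_is_better=True)
        | select_hits(affinity, thresholds["affinity"], higher_is_better=True)
    )
```

At the scale of 6.75B molecules the predictions would not fit in in-memory dictionaries, so in practice the same selection would be applied chunk by chunk and only the passing IDs retained.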
STEP 4: Pharmacophore-based screening
A diverse set of the top molecules from the provided data will be used to construct a ligand-based pharmacophore model. Using the predicted pharmacophore features, we will map the hypothesis onto residues in the protein structure. The molecules selected in Step 3 will then be screened against the generated pharmacophore hypothesis.
STEP 5: Docking (Optional)
The resulting molecules will be docked to the predicted protein structure at the predicted binding site (verified against the pharmacophore hypothesis) using FABind or DiffDock (MIT License). The docked molecules will be filtered on protein-ligand contacts, including the pharmacophore features. The shortlisted molecules will then be docked with AutoDock Vina and ranked by docking score.
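One simple way to express the contact-based filter is shown below, as a sketch only. The 4.0 Å heavy-atom distance cutoff, the min_matches parameter, and the residue labels are assumptions made for illustration; the actual contact definition and the residues mapped from the pharmacophore hypothesis are not given here.

```python
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]


def contact_residues(
    ligand_atoms: List[Coord],
    residue_atoms: Dict[str, List[Coord]],  # residue label -> atom coordinates
    cutoff: float = 4.0,  # assumed contact distance in angstroms
) -> Set[str]:
    """Residues with at least one atom within `cutoff` of any ligand atom."""
    return {
        res
        for res, atoms in residue_atoms.items()
        if any(math.dist(a, l) <= cutoff for a in atoms for l in ligand_atoms)
    }


def passes_contact_filter(contacts: Set[str], required: Set[str], min_matches: int) -> bool:
    """Keep a pose if it touches enough of the pharmacophore-mapped residues."""
    return len(contacts & required) >= min_matches
```

A docked pose would be kept when passes_contact_filter returns True for the residue set derived from the pharmacophore hypothesis in Step 4.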
STEP 6: Filtering and Prioritization
Molecules with a docking score below -7 kcal/mol will be passed through our in-house screening workflow. This workflow applies filters for drug-likeness and synthetic accessibility and eliminates molecules with PAINS, toxic, or otherwise undesirable substructures. The remaining molecules are ranked by a weighted score that combines each molecule's docking score, predicted pIC50, and predicted binding affinity.
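The weighted ranking can be illustrated with the sketch below. The min-max normalization, the sign flip that makes a more negative docking score count as better, and the weight values are assumptions; the in-house workflow's actual scoring formula is not described here, and the drug-likeness, synthetic accessibility, and substructure filters are omitted.

```python
from typing import Dict, List, Tuple


def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant list maps to 0.5 everywhere."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rank_molecules(
    mols: Dict[str, Dict[str, float]],  # id -> {"docking", "pic50", "affinity"}
    weights: Dict[str, float],          # hypothetical weights, same three keys
    docking_cutoff: float = -7.0,
) -> List[Tuple[str, float]]:
    """Drop molecules at or above the docking cutoff, then rank the rest, best first."""
    kept = [m for m, p in mols.items() if p["docking"] < docking_cutoff]
    if not kept:
        return []
    # Negate docking scores so that, like pIC50 and affinity, higher is better.
    dock = min_max([-mols[m]["docking"] for m in kept])
    pic = min_max([mols[m]["pic50"] for m in kept])
    aff = min_max([mols[m]["affinity"] for m in kept])
    total = weights["docking"] + weights["pic50"] + weights["affinity"]
    scored = [
        (m, (weights["docking"] * d + weights["pic50"] * p + weights["affinity"] * a) / total)
        for m, d, p, a in zip(kept, dock, pic, aff)
    ]
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

Normalizing each property before weighting keeps one term from dominating simply because its numeric range is larger.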
STEP 7: Molecular Dynamics Simulation
MD simulations of the top 150 molecules in complex with the predicted protein structure will be performed using OpenMM or GROMACS. The binding free energies of the complexes will be calculated using MM/PBSA or MM/GBSA. The top 100 molecules will be recommended for the Hit Identification stage.
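For reference, end-state methods such as MM/PBSA and MM/GBSA estimate the binding free energy as the difference G_complex - G_receptor - G_ligand, averaged over trajectory frames. The sketch below shows that averaging and the final top-N selection. It assumes the per-frame energies (in kcal/mol) have already been produced by an external tool, and the function names and data layout are illustrative only.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Per-frame energies in kcal/mol: (G_complex, G_receptor, G_ligand)
Frame = Tuple[float, float, float]


def binding_free_energy(frames: List[Frame]) -> float:
    """Average of G_complex - G_receptor - G_ligand over the trajectory frames."""
    return mean(g_cpx - g_rec - g_lig for g_cpx, g_rec, g_lig in frames)


def top_n(per_molecule: Dict[str, List[Frame]], n: int) -> List[Tuple[str, float]]:
    """Rank molecules by binding free energy, most negative first, and keep n."""
    dg = {mol: binding_free_energy(frames) for mol, frames in per_molecule.items()}
    return sorted(dg.items(), key=lambda item: item[1])[:n]
```

With the 150 simulated complexes as input, top_n(per_molecule, 100) would yield the list recommended for the Hit Identification stage.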