Computational methods

Hit Identification

Method type (check all that apply)

Deep learning

Machine learning

Description of your approach (min 200 and max 800 words)

STEP 1: Protein Structure Prediction

The first step is to predict the protein structure using flow-matching methods, which generate an ensemble of predicted conformations. Molecular dynamics simulations will then be performed on the predicted structure to confirm its stability. Binding sites will be identified using our in-house binding site prediction algorithm, which is based on a few geometric deep-learning methods.

STEP 2: Training/Fine-tuning different AI/ML models

This phase consists of training three different models.

(i) A model to predict the docking score: This model will be trained using an active learning approach on data generated by docking a subset of the Enamine diverse library to the predicted MCHR1 protein structure with AutoDock Vina.
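The active learning loop for the docking-score model can be sketched as below. This is a minimal illustration under stated assumptions, not our production workflow: `mock_dock` is a hypothetical stand-in for an AutoDock Vina run, each molecule is reduced to a single made-up descriptor value, the surrogate is a simple k-nearest-neighbour average, and the greedy acquisition rule (dock whatever the surrogate currently predicts to score best) is only one of several possible choices.

```python
import random

def mock_dock(x):
    """Hypothetical stand-in for an AutoDock Vina run (lower score is better)."""
    return -10.0 + 8.0 * abs(x - 0.7)

def knn_predict(x, labelled, k=3):
    """Surrogate model: mean docking score of the k nearest labelled descriptors."""
    nearest = sorted(labelled, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(score for _, score in nearest) / len(nearest)

def active_learning(pool, n_init=10, batch=10, rounds=5, seed=0):
    """Iteratively dock the molecules the surrogate predicts to score best."""
    rng = random.Random(seed)
    unlabelled = list(pool)
    rng.shuffle(unlabelled)
    # Round 0: dock a random seed set to initialise the surrogate.
    labelled = [(x, mock_dock(x)) for x in unlabelled[:n_init]]
    unlabelled = unlabelled[n_init:]
    for _ in range(rounds):
        # Greedy acquisition: lowest predicted docking score first.
        unlabelled.sort(key=lambda x: knn_predict(x, labelled))
        picked, unlabelled = unlabelled[:batch], unlabelled[batch:]
        # "Dock" the picked molecules and add them to the training data.
        labelled.extend((x, mock_dock(x)) for x in picked)
    return labelled

# Hypothetical pool of 1000 molecules, each described by one number in [0, 1].
pool = [i / 999 for i in range(1000)]
labelled = active_learning(pool)
best_descriptor, best_score = min(labelled, key=lambda pair: pair[1])
```

Only 60 of the 1000 pool members are docked in this sketch; the surrogate scores the rest, which is the property that makes the approach attractive for a library far too large to dock exhaustively.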

(ii) A QSAR model to predict pIC50: This model will be trained using ML/DL methods on the provided ChEMBL data and validated against the provided patent data.
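As a small worked example of the quantities involved, the sketch below converts IC50 values in nM to pIC50 (pIC50 = -log10 of the IC50 in mol/L, i.e. 9 - log10 of the IC50 in nM) and computes an RMSE on a hold-out set. The IC50 values and predictions are hypothetical placeholders, and RMSE is assumed here as the validation metric; the source does not specify one.

```python
import math

def ic50_nm_to_pic50(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L) = 9 - log10(IC50 in nM)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)

def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed pIC50 values."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical hold-out IC50 values (nM) and hypothetical model predictions.
holdout_ic50_nm = [1.0, 10.0, 1000.0]
observed = [ic50_nm_to_pic50(v) for v in holdout_ic50_nm]
predicted = [8.5, 8.0, 6.5]
validation_rmse = rmse(predicted, observed)
```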

(iii) A binding affinity prediction model: We shall fine-tune our in-house binding affinity prediction model (which predicts affinity for any given protein sequence and ligand pair) on the provided MCHR1 data.

STEP 3: Virtually screening the Enamine REAL Space

We shall then screen the Enamine REAL Space, consisting of 6.75B molecules, separately with each of the three AI/ML models trained in Step 2, using our small-molecule discovery platform BoltChem. From each flow, the top molecules passing a specific score threshold would be selected. The three selections shall then be merged and carried forward.
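The per-flow thresholding and merge can be sketched as follows. The molecule IDs, scores, and threshold values are hypothetical, and the merge is assumed to be a plain union that records which flows selected each molecule; this is not BoltChem code.

```python
def select_hits(scores, threshold, higher_is_better=True):
    """Return the IDs whose score passes the threshold for one model."""
    if higher_is_better:
        return {mol_id for mol_id, s in scores.items() if s >= threshold}
    return {mol_id for mol_id, s in scores.items() if s <= threshold}

# Hypothetical predictions for four molecules from the three models.
docking = {"m1": -9.1, "m2": -6.0, "m3": -8.2, "m4": -5.5}   # lower is better
pic50 = {"m1": 7.4, "m2": 8.1, "m3": 5.0, "m4": 4.9}         # higher is better
affinity = {"m1": 6.9, "m2": 6.0, "m3": 7.8, "m4": 5.1}      # higher is better

# Hypothetical thresholds, one per flow.
flows = {
    "docking": select_hits(docking, -8.0, higher_is_better=False),
    "pic50": select_hits(pic50, 7.0),
    "affinity": select_hits(affinity, 7.5),
}

# Merge: union over the flows, remembering which flows picked each molecule.
merged = {}
for flow_name, hits in flows.items():
    for mol_id in hits:
        merged.setdefault(mol_id, []).append(flow_name)
```

Keeping the list of contributing flows per molecule makes it easy to see later whether a candidate was supported by one model or by several.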

STEP 4: Pharmacophore-based screening

The top-ranked, structurally diverse molecules from the provided data would be used to construct a ligand-based pharmacophore model. Using the predicted pharmacophore features, we shall map the hypothesis onto the residues in the protein structure. The molecules selected in Step 3 shall then be screened against the generated pharmacophore hypothesis.
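One simple way to test a molecule against a pharmacophore hypothesis is to compare inter-feature distances, as sketched below. The three-point hypothesis, feature types, coordinates, and 1.0 angstrom tolerance are all hypothetical, and the brute-force matching is an assumption for illustration (it ignores chirality and conformer sampling); it is not the screening tool we will actually use.

```python
import itertools
import math

def matches_hypothesis(hypothesis, mol_features, tol=1.0):
    """True if some assignment of molecule features to hypothesis features has
    identical feature types and pairwise distances within `tol` angstroms."""
    n = len(hypothesis)
    for combo in itertools.permutations(mol_features, n):
        # Feature types must agree position by position.
        if any(h[0] != m[0] for h, m in zip(hypothesis, combo)):
            continue
        # Every pairwise distance must agree within the tolerance.
        if all(
            abs(math.dist(hypothesis[i][1], hypothesis[j][1])
                - math.dist(combo[i][1], combo[j][1])) <= tol
            for i, j in itertools.combinations(range(n), 2)
        ):
            return True
    return False

# Hypothetical hypothesis: (feature type, (x, y, z) in angstroms).
hypothesis = [
    ("donor", (0.0, 0.0, 0.0)),
    ("aromatic", (5.0, 0.0, 0.0)),
    ("cation", (5.0, 4.0, 0.0)),
]
# Hypothetical molecules, each given as a list of features of one conformer.
mol_a = [("aromatic", (1.0, 1.0, 0.0)), ("donor", (1.0, 6.2, 0.0)),
         ("cation", (5.0, 1.0, 0.0)), ("acceptor", (9.0, 9.0, 9.0))]
mol_b = [("aromatic", (1.0, 1.0, 0.0)), ("donor", (1.0, 6.2, 0.0))]
mol_c = [("donor", (0.0, 0.0, 0.0)), ("aromatic", (10.0, 0.0, 0.0)),
         ("cation", (10.0, 4.0, 0.0))]

results = {name: matches_hypothesis(hypothesis, feats)
           for name, feats in [("mol_a", mol_a), ("mol_b", mol_b), ("mol_c", mol_c)]}
```

In this toy data, `mol_a` carries all three features at compatible distances, `mol_b` lacks the cation feature, and `mol_c` has the right features at the wrong spacing.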

STEP 5: Docking (Optional)

The resulting molecules would be docked into the predicted binding site of the predicted protein structure (verified against the pharmacophore hypothesis) using FABind or DiffDock (MIT License). The docked poses will be filtered based on protein-ligand contacts, including the pharmacophore features. The shortlisted molecules shall then be docked with AutoDock Vina and ranked according to docking score.
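The contact-based pose filter can be illustrated with a distance cutoff, as in the sketch below. The residue labels, coordinates, required-contact set, and the 4.0 angstrom cutoff are hypothetical placeholders; a real filter would read the poses produced by the docking tools and would likely distinguish interaction types.

```python
import math

def contacted_residues(ligand_atoms, residue_atoms, cutoff=4.0):
    """Residues with at least one atom within `cutoff` angstroms of a ligand atom."""
    return {
        residue
        for residue, coords in residue_atoms.items()
        if any(math.dist(a, b) <= cutoff for a in coords for b in ligand_atoms)
    }

def passes_contact_filter(ligand_atoms, residue_atoms, required, cutoff=4.0):
    """Keep a pose only if it contacts every required residue."""
    return set(required) <= contacted_residues(ligand_atoms, residue_atoms, cutoff)

# Hypothetical binding-site residues (placeholder labels) with one atom each.
residue_atoms = {
    "RES_A": [(0.0, 0.0, 0.0)],
    "RES_B": [(6.0, 0.0, 0.0)],
    "RES_C": [(20.0, 0.0, 0.0)],
}
# Hypothetical residues that the pharmacophore hypothesis maps onto.
required = {"RES_A", "RES_B"}

pose_1 = [(2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]     # sits between RES_A and RES_B
pose_2 = [(12.0, 0.0, 0.0), (14.0, 0.0, 0.0)]   # too far from every residue

keep_pose_1 = passes_contact_filter(pose_1, residue_atoms, required)
keep_pose_2 = passes_contact_filter(pose_2, residue_atoms, required)
```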

STEP 6: Filtering and prioritizing

Molecules with a docking score below -7 kcal/mol shall be passed through our in-house screening workflow. This workflow applies filters such as drug-likeness and synthetic accessibility, and eliminates molecules containing PAINS, toxic, or otherwise undesirable substructures. The remaining molecules are ranked by a weighted score that combines each molecule's docking score, predicted pIC50, and predicted binding affinity.
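The filtering and weighted ranking described above can be sketched as follows. The weights (0.4, 0.3, 0.3), the min-max normalisation, the candidate values, and the assumption that a higher predicted affinity is better are all illustrative choices rather than the settings of our workflow; the drug-likeness, synthetic accessibility, and substructure filters are represented by a single precomputed flag.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    mol_id: str
    docking: float        # docking score, lower is better
    pic50: float          # predicted pIC50, higher is better
    affinity: float       # predicted affinity, assumed higher is better
    passes_filters: bool  # drug-likeness / SA / PAINS outcome, assumed precomputed

def min_max(values):
    """Scale values to [0, 1]; a constant list maps to 0.5 everywhere."""
    lo, hi = min(values), max(values)
    return [0.5 if hi == lo else (v - lo) / (hi - lo) for v in values]

def rank(candidates, weights=(0.4, 0.3, 0.3), docking_cutoff=-7.0):
    """Filter, then rank by a weighted sum of normalised scores (best first)."""
    kept = [c for c in candidates if c.docking < docking_cutoff and c.passes_filters]
    if not kept:
        return []
    dock = min_max([-c.docking for c in kept])  # negate so that higher is better
    pic = min_max([c.pic50 for c in kept])
    aff = min_max([c.affinity for c in kept])
    w_d, w_p, w_a = weights
    scored = [(w_d * d + w_p * p + w_a * a, c.mol_id)
              for d, p, a, c in zip(dock, pic, aff, kept)]
    return [(mol_id, score) for score, mol_id in sorted(scored, reverse=True)]

# Hypothetical candidates.
candidates = [
    Candidate("c1", -9.0, 8.0, 7.0, True),
    Candidate("c2", -8.0, 6.0, 8.0, True),
    Candidate("c3", -6.5, 9.0, 9.0, True),    # fails the docking cutoff
    Candidate("c4", -9.5, 8.5, 8.5, False),   # fails the substructure filters
]
ranking = rank(candidates)
```

Normalising each term before weighting keeps the docking score (in kcal/mol) from dominating the pIC50 and affinity terms simply because of its scale.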

STEP 7: Molecular Dynamics Simulation

Molecular dynamics simulations of the top 150 molecules in complex with the predicted protein structure will be performed using OpenMM or GROMACS. The binding free energies of the complexes would be calculated using MM/PBSA or MM/GBSA. The top 100 molecules would be recommended for the Hit Identification stage.
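The final ranking step can be sketched as below, assuming the common single-trajectory estimate in which the binding free energy is the average over frames of G(complex) - G(receptor) - G(ligand). The per-frame energies are hypothetical numbers in kcal/mol, the entropy term is omitted, and in practice these values would come from an MM/PBSA or MM/GBSA tool run on the OpenMM or GROMACS trajectories.

```python
from statistics import mean

def binding_free_energy(frames):
    """Mean over frames of G_complex - G_receptor - G_ligand (entropy omitted)."""
    return mean(g_c - g_r - g_l for g_c, g_r, g_l in frames)

def top_n(energies, n):
    """IDs of the n molecules with the most negative binding free energy."""
    return sorted(energies, key=energies.get)[:n]

# Hypothetical per-frame energies: (G_complex, G_receptor, G_ligand) in kcal/mol.
trajectories = {
    "c1": [(-1050.0, -1000.0, -20.0), (-1054.0, -1002.0, -20.0)],
    "c2": [(-1030.0, -1000.0, -15.0), (-1035.0, -1003.0, -16.0)],
    "c3": [(-1065.0, -1001.0, -22.0), (-1063.0, -999.0, -22.0)],
}
energies = {mol_id: binding_free_energy(frames)
            for mol_id, frames in trajectories.items()}

# The workflow keeps 100 of 150 molecules; this toy example keeps 2 of 3.
recommended = top_n(energies, 2)
```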

What makes your approach stand out from the community? (<100 words)

This approach stands out by integrating multiple complementary computational techniques (structure prediction, three independently trained AI/ML scoring models, pharmacophore screening, docking, and molecular dynamics) into one systematic workflow, and by using active learning and reinforcement learning to identify potential lead compounds with high precision and efficiency.

Method Name

Virtual screening with QSAR and binding affinity models with active learning

Commercial software packages used

BoltChem

Free software packages used

RDKit, FABind, AutoDock Vina, OpenMM, DiffDock, GROMACS

Relevant publications of previous uses by your group of this software/method

Challenge #5