Computational methods

Hit Identification

Method type (check all that apply)

De novo design

Deep learning

High-throughput docking

Machine learning

Physics-based

Description of your approach (min 200 and max 800 words)

First, we will use an ML protein-ligand binding predictor as the primary filter to speed up what would otherwise be a massive number of molecular docking calculations. To train it, we will collect docking data for the given target protein using small molecules from the Enamine Diversity Library (approximately 3.9 million compounds) as ligands. The docking programs we will use are AutoDock-GPU, Vina-GPU, rDock, and LeDock, all of which are freely available and widely used in high-throughput virtual screening (HTVS) for drug discovery. We will normalize the docking scores obtained from these programs so that they are comparable. Using these data, we will train docking-score prediction models that predict each program's docking score from a molecule's SMILES representation alone. These models will make screening of ultra-large libraries feasible. We will then re-rank the molecules using a machine-learning model that performs consensus docking based on the docking scores; this consensus model is trained on the DUD-E and LIT-PCBA sets. Using the predicted docking scores and the consensus model, we will predict the binding likelihood for an even larger dataset, the Enamine REAL database (over 6 billion compounds), and select molecules with a high predicted likelihood of binding to the specified binding site of the target protein (1st Screening results).
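The normalization step mentioned above can be sketched as follows. This is a minimal illustration, not the pipeline itself: the text does not say which normalization is used, so a per-program z-score is assumed here, and the ligand identifiers and score values are made up for the example.

```python
from statistics import mean, pstdev


def zscore_normalize(scores):
    """Convert {ligand_id: raw_score} from ONE docking program to z-scores.

    Assumption: z-score normalization; the actual scheme is not specified.
    """
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All ligands scored identically; no spread to normalize.
        return {lig: 0.0 for lig in scores}
    return {lig: (s - mu) / sigma for lig, s in scores.items()}


def normalize_all(raw_by_program):
    """Normalize each program's scores independently of the others."""
    return {prog: zscore_normalize(s) for prog, s in raw_by_program.items()}


# Hypothetical raw docking scores (illustrative values only).
raw_by_program = {
    "AutoDock-GPU": {"mol_a": -9.0, "mol_b": -7.0, "mol_c": -5.0},
    "Vina-GPU": {"mol_a": -8.0, "mol_b": -8.5, "mol_c": -6.0},
}
normalized = normalize_all(raw_by_program)
```

After this step every program's scores have zero mean and unit variance, so a value from one program can be compared with, or averaged against, a value from another.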

Next, we will perform docking calculations against the target protein for the molecules selected in the first screening and predict the stability of the bound state for the estimated binding poses. At this stage, we will apply rescoring models to rescale the scores and identify the optimal docking pose for each ligand. The rescoring models we will use are RTM-score, RF-score-VS, and AK-score v2. We will standardize the results from these three rescoring methods, rank the molecules under each method, and select the molecules with the best average rank as the primary hit candidate group (2nd Screening results).
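The average-rank selection over the three rescoring methods could look like the sketch below. It assumes that a higher rescoring value means a better ligand for every method (in practice each method's convention would need checking), and the ligand names and numbers are invented for illustration.

```python
def ranks_from_scores(scores, higher_is_better=True):
    """Return {ligand_id: rank} with rank 1 = best; ties broken by ligand id."""
    sign = -1.0 if higher_is_better else 1.0
    ordered = sorted(scores, key=lambda lig: (sign * scores[lig], lig))
    return {lig: i + 1 for i, lig in enumerate(ordered)}


def average_rank(scores_by_method):
    """Average each ligand's rank over all methods that scored it."""
    per_method = [ranks_from_scores(s) for s in scores_by_method.values()]
    # Only ligands scored by every method are comparable.
    ligands = set.intersection(*(set(r) for r in per_method))
    return {lig: sum(r[lig] for r in per_method) / len(per_method) for lig in ligands}


def select_top(scores_by_method, n):
    """Pick the n ligands with the best (numerically lowest) average rank."""
    avg = average_rank(scores_by_method)
    return sorted(avg, key=lambda lig: (avg[lig], lig))[:n]


# Hypothetical rescoring outputs (illustrative values only).
rescoring = {
    "RTM-score": {"mol_a": 52.1, "mol_b": 48.3, "mol_c": 30.0},
    "RF-score-VS": {"mol_a": 6.9, "mol_b": 7.4, "mol_c": 5.8},
    "AK-score v2": {"mol_a": 7.2, "mol_b": 6.5, "mol_c": 6.0},
}
top_two = select_top(rescoring, 2)
```

Ranking before averaging sidesteps the different numeric scales of the three methods, at the cost of discarding how large the score gaps between molecules are.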

Using the primary hit candidates as a starting point, we will generate novel hit candidates with molecular generative models, namely MolFinder and MolGAN. For the generated candidates, we will perform docking calculations with three of the previously mentioned docking programs (AutoDock-GPU, Vina-GPU, and LeDock) and select the top-scoring molecules as hit candidates. During this selection, we will prioritize molecules that appear among the top candidates of more than one docking program (2nd Hit candidates). We will then combine the two groups of hit candidates and analyze the binding stability of each molecule to identify the final hit candidates.
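One simple way to prioritize molecules that recur across the programs' top lists is to count, for each molecule, how many programs place it in their top N. The sketch below assumes that lower docking scores are better for all three programs; the molecule names, scores, and the choice of N are hypothetical.

```python
from collections import Counter


def top_n(scores, n):
    """Ligand ids of the n best docking scores (assumes lower = better)."""
    return sorted(scores, key=lambda lig: (scores[lig], lig))[:n]


def prioritize_by_overlap(scores_by_program, n):
    """Return (ligand, hits) pairs, where hits is the number of programs
    that place the ligand in their top n; most hits first."""
    counts = Counter()
    for scores in scores_by_program.values():
        counts.update(top_n(scores, n))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical docking scores for generated molecules (illustrative only).
docking = {
    "AutoDock-GPU": {"gen_1": -9.5, "gen_2": -8.0, "gen_3": -7.1, "gen_4": -6.0},
    "Vina-GPU": {"gen_1": -8.8, "gen_2": -6.5, "gen_3": -9.0, "gen_4": -7.0},
    "LeDock": {"gen_1": -7.9, "gen_2": -5.0, "gen_3": -6.2, "gen_4": -8.1},
}
prioritized = prioritize_by_overlap(docking, n=2)
```

In this toy data only `gen_1` is in the top two of all three programs, so it is listed first; molecules favored by a single program follow.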

During the merging process, we will use RetroTRAE (Retrosynthetic translation of atomic environments with Transformer) and GASA (Graph Attention-based assessment of Synthetic Accessibility; Yu, J., et al., Journal of Chemical Information and Modeling, 2022, 62, 2973-2986) to evaluate the synthetic accessibility of the hit candidates. These are deep learning-based models that predict the Synthetic Accessibility Score (SAS), an indicator used in drug discovery to assess how easy or difficult a compound is to synthesize. We expect the hit candidates identified through this entire process to have high practical feasibility.

What makes your approach stand out from the community? (<100 words)

First, our method uses docking-score prediction models that predict the docking scores of several docking programs from SMILES strings alone. This will allow us to screen ultra-large libraries within a relatively short timeframe.

Second, we will use a consensus-based rescoring scheme that combines multiple docking tools. This scheme will effectively expand the search space beyond what any single docking tool covers.

Method Name

SNU-Dock

Free software packages used

rDock

AutoDock-GPU

Vina-GPU

LeDock

Relevant publications of previous uses by your group of this software/method

Choi, J. & Lee, J.* V-Dock: Fast Generation of Novel Drug-Like Molecules Using Machine-Learning-Based Docking Score and Molecular Optimization. Int. J. Mol. Sci. 22, 11635 (2021). https://doi.org/10.3390/ijms222111635

Kwon, Y., Shin, W. H., Ko, J. & Lee, J.* AK-score: Accurate protein-ligand binding affinity prediction using an ensemble of 3D-convolutional neural networks. Int. J. Mol. Sci. 21, 1–16 (2020).

Challenge #4