To identify hit molecules for the macrodomain of SARS-CoV-2 Nsp3, we will use the V-Dock approach developed by our group. Instead of docking every ligand directly, V-Dock docks a subset of the library and trains deep learning models on the results to predict protein-ligand docking scores from SMILES strings. We have already shown that docking scores can be predicted accurately from SMILES representations, so the trained models make ultra-fast virtual screening of a very large library possible without explicit docking calculations. The Enamine REAL database (~2 billion ligands), provided by the organizers, will be used for all of the following procedures. First, a random subset of 1 million ligands will be docked to the target binding site using three docking programs: AutoDock Vina, AutoDock-GPU, and Glide. From these docking results, one score-prediction model per docking program will be trained using multi-layer perceptron and graph neural network architectures. The three models will then be used to screen the entire Enamine REAL database, giving three ranked lists of top candidates according to the predicted AutoDock Vina, AutoDock-GPU, and Glide scores. We will then take the union of the top 100,000 molecules from each of the three lists. This merged list forms the first candidate set, which will be passed on to actual docking and re-scoring.
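The merging step can be illustrated with a minimal sketch. It assumes that each prediction model produces a mapping from ligand ID to predicted docking score, and that lower (more negative) scores are more favorable. The function names, ligand IDs, and score values are hypothetical and serve only as an illustration; for a library of billions of ligands, the selection would have to be done in a streaming or chunked fashion rather than in memory.

```python
import heapq

def top_k_ids(scores, k):
    """Return the IDs of the k ligands with the lowest (most favorable) predicted score."""
    best = heapq.nsmallest(k, scores.items(), key=lambda item: item[1])
    return {ligand_id for ligand_id, _ in best}

def merge_top_candidates(score_tables, k):
    """Union of the per-model top-k sets (hypothetical helper, not part of V-Dock)."""
    merged = set()
    for scores in score_tables.values():
        merged |= top_k_ids(scores, k)
    return merged

# Toy predicted scores; all values are made up for illustration.
predicted = {
    "vina":  {"L1": -9.1, "L2": -7.4, "L3": -8.8, "L4": -6.0, "L5": -5.0},
    "adgpu": {"L1": -8.0, "L2": -9.5, "L3": -7.1, "L4": -6.2, "L5": -5.5},
    "glide": {"L1": -6.5, "L2": -7.0, "L3": -7.2, "L4": -9.9, "L5": -5.1},
}

# k=2 here stands in for the 100,000 used in the actual protocol.
first_candidates = merge_top_candidates(predicted, k=2)
print(sorted(first_candidates))
```

Taking the union rather than the intersection keeps any ligand that at least one docking program's model ranks highly, so the merged set can contain up to three times k molecules.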
The first candidate set will be docked using AutoDock-GPU. The resulting docked conformations will be re-scored using two machine-learning-based scoring functions: AK-score-v2 (in preparation) and RTMscore (doi:10.1021/acs.jmedchem.2c00991). AK-score-v2 is the second version of our in-house protein-ligand binding affinity prediction model. It consists of two models: a classification model and a regression model. The classification model predicts whether a given protein-ligand complex structure is native-like, that is, whether the pose is valid. The regression model predicts the binding affinity of a given complex structure. In recent tests on the PDBbind core set, we found that re-scoring with AK-score-v2 improves the top-1% enrichment factor 10- to 20-fold over the AutoDock Vina score. RTMscore is another graph neural network-based scoring scheme that predicts whether a given protein-ligand pair will bind based on a docked complex structure. Its screening power is among the highest of the published scoring functions. Finally, the ligands will be ranked separately by AK-score-v2 and by RTMscore, and the top 100 molecules by average rank will be selected and submitted.
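The final selection by average ranking can be sketched as follows. This is a minimal illustration under two assumptions: that both scoring functions return one value per ligand, and that a higher value means a better ligand (if a score has the opposite convention, it would have to be negated first). The function names and all score values are hypothetical.

```python
def ranks(scores):
    """Rank ligands from 1 (best) downward, assuming higher score = better.
    Ties are broken by ligand ID so that the result is deterministic."""
    ordered = sorted(scores, key=lambda lig: (-scores[lig], lig))
    return {lig: position + 1 for position, lig in enumerate(ordered)}

def consensus_top_n(scores_a, scores_b, n):
    """Select the n ligands with the best average rank across two scoring functions."""
    common = scores_a.keys() & scores_b.keys()
    rank_a = ranks({lig: scores_a[lig] for lig in common})
    rank_b = ranks({lig: scores_b[lig] for lig in common})
    average = {lig: (rank_a[lig] + rank_b[lig]) / 2 for lig in common}
    return sorted(average, key=lambda lig: (average[lig], lig))[:n]

# Toy re-scoring results; the numbers are invented for illustration.
ak_scores = {"A": 7.9, "B": 6.1, "C": 8.4, "D": 5.0}
rtm_scores = {"A": 61.2, "B": 55.0, "C": 48.3, "D": 20.1}

# n=2 here stands in for the 100 molecules selected in the actual protocol.
selected = consensus_top_n(ak_scores, rtm_scores, n=2)
print(selected)
```

Averaging ranks instead of raw scores avoids having to put the two scoring functions on a common scale, since only the ordering within each function matters.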
For hit optimization, we will fine-tune our AK-score-v2 model through transfer learning. Starting from the current AK-score-v2 parameters, the parameters of the regression model will be further updated using the experimental data provided by the organizers. This will give us a target-specific scoring function. An additional 50 compounds will then be selected with this updated scoring function.
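The idea of fine-tuning from existing parameters can be shown with a deliberately simplified sketch. AK-score-v2 is a neural network operating on complex structures; here a small linear regression model stands in for it, purely as an assumption for illustration. The "pretrained" parameters, the feature vectors, and the affinity values are all invented, and the function names are hypothetical.

```python
def predict(weights, bias, features):
    """Linear stand-in for an affinity regression model."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def mse(weights, bias, data):
    """Mean squared error over (features, measured_affinity) pairs."""
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in data) / len(data)

def fine_tune(weights, bias, data, lr=0.01, epochs=200):
    """Continue full-batch gradient descent starting from the given parameters.
    The input parameters are copied, so the pretrained values stay untouched."""
    w, b = list(weights), bias
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in data:
            err = predict(w, b, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += 2 * err * xj / n
            grad_b += 2 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Hypothetical pretrained parameters and a small target-specific data set.
pretrained_w, pretrained_b = [0.8, -0.3], 5.0
target_data = [
    ([1.0, 0.5], 6.4),
    ([0.2, 1.5], 5.1),
    ([1.5, 1.0], 7.0),
    ([0.5, 0.2], 5.6),
]

tuned_w, tuned_b = fine_tune(pretrained_w, pretrained_b, target_data)
print(mse(pretrained_w, pretrained_b, target_data), mse(tuned_w, tuned_b, target_data))
```

Starting from the pretrained parameters with a small learning rate, rather than from a random initialization, lets a small target-specific data set adjust the model without discarding what was learned from the general training data.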