We will build on an analysis pipeline we have developed for searching billion-scale small-molecule libraries for candidates that bind a target pocket. In the first phase, we will perform fast, approximate affinity prediction using graph neural networks (GNNs). We have developed GNNs that compute representations of both the ligand and the protein pocket from a diverse collection of surface properties. A multi-layer perceptron (MLP) combines these representations without docking, producing rapid binding-affinity predictions for each of the billions of ligands against the target pocket. This integration model will be fine-tuned on ADP-bound pockets from PDBbind. This phase will yield a down-sampled data set enriched for molecules with high predicted binding affinity.
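A minimal sketch of how docking-free integration and down-sampling could fit together is given below. It is not our production model. It rests on several assumptions: each GNN yields a fixed-length embedding vector, the two embeddings are combined by concatenation, the MLP has a single ReLU hidden layer, and a higher output means stronger predicted binding. The names `init_mlp`, `mlp_affinity`, and `downsample`, the dimensions, and the random untrained weights are all hypothetical placeholders for the fine-tuned model.

```python
import heapq
import random
from typing import Iterable, List, Sequence, Tuple

# Hypothetical parameter layout: (W1, b1, w2, b2) for a one-hidden-layer MLP.
Params = Tuple[List[List[float]], List[float], List[float], float]


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def init_mlp(in_dim: int, hidden_dim: int, seed: int = 0) -> Params:
    """Random, untrained weights standing in for the fine-tuned integration model."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    b1 = [0.0] * hidden_dim
    w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
    return w1, b1, w2, 0.0


def mlp_affinity(params: Params, ligand_emb: Sequence[float],
                 pocket_emb: Sequence[float]) -> float:
    """Docking-free score: concatenate embeddings, one ReLU hidden layer, scalar output."""
    w1, b1, w2, b2 = params
    x = list(ligand_emb) + list(pocket_emb)
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def downsample(params: Params, ligand_embs: Iterable[Sequence[float]],
               pocket_emb: Sequence[float], k: int) -> List[int]:
    """Return indices of the k highest-scoring ligands, best first.

    heapq.nlargest consumes the generator while keeping only k items in its
    heap, so the full score list for a very large library is never stored.
    """
    scored = ((mlp_affinity(params, emb, pocket_emb), i)
              for i, emb in enumerate(ligand_embs))
    return [i for _, i in heapq.nlargest(k, scored)]


# Toy usage: random vectors stand in for GNN ligand and pocket embeddings.
rng = random.Random(1)
dim = 8
pocket = [rng.gauss(0.0, 1.0) for _ in range(dim)]
ligands = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(1000)]
params = init_mlp(2 * dim, 16)
top = downsample(params, ligands, pocket, k=10)
```

Because the pocket embedding is shared across the whole library, a real implementation would likely compute it once and batch the ligand side. The loop here is kept scalar for readability.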
The ligands in this down-sampled library (~1M candidates) will be docked to the target pocket with AutoDock Vina, and the resulting poses will be rescored by a variant of the integration network described above that additionally incorporates pose-informed interaction features. The top 1,000 candidates from this phase will be evaluated with coarse-grained molecular dynamics (CGMD) using the MARTINI force field, and the top 100 will be selected for submission.
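The staged selection can be sketched as a funnel. In this funnel, each progressively more expensive scorer is applied only to the survivors of the previous stage. The sketch assumes each stage is a scoring function where higher is better, paired with a survivor count. `rescoring_stub` and `cgmd_stub` are hypothetical placeholders for the pose-aware rescoring network and a CGMD-derived ranking. The text does not specify how candidates are ranked after CGMD, and the demo shrinks the ~1M input to 20,000 integer IDs.

```python
import heapq
from typing import Callable, Hashable, Iterable, List, Sequence, Tuple

# A stage is (scoring function, number of survivors to keep); higher score = better.
Stage = Tuple[Callable[[Hashable], float], int]


def funnel(candidates: Iterable[Hashable], stages: Sequence[Stage]) -> List[Hashable]:
    """Apply each stage's scorer only to the previous stage's survivors."""
    survivors = list(candidates)
    for score, keep in stages:
        # Keep the `keep` best candidates, ordered best first.
        survivors = heapq.nlargest(keep, survivors, key=score)
    return survivors


def rescoring_stub(ligand_id: int) -> float:
    # Hypothetical stand-in for the pose-aware rescoring network.
    return float(-abs((ligand_id * 7919) % 10007 - 5000))


def cgmd_stub(ligand_id: int) -> float:
    # Hypothetical stand-in for a CGMD-derived score (criterion not specified).
    return float(ligand_id % 97)


docked = range(20_000)  # scaled-down stand-in for the ~1M docked ligands
final = funnel(docked, [(rescoring_stub, 1000), (cgmd_stub, 100)])
print(len(final))  # 100
```

If a stage ranked directly on AutoDock Vina scores, its scorer would need to return the negated score. Vina reports these values in kcal/mol, and more negative values indicate better predicted binding.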