Computational methods

Hit Identification

Method type (check all that apply)

Deep learning

Description of your approach (min 200 and max 800 words)

Our approach is an improved version of our recently published method, TankBind.

In TankBind, the protein and the compound are first encoded with state-of-the-art graph neural networks, and their interaction is modeled by a trigonometry-aware neural network that can learn many-body effects and the constraints imposed by Euclidean geometry. We designed the learning objective so that the model simultaneously predicts the protein-ligand binding structure and the binding affinity.
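As a rough illustration of what a joint structure-plus-affinity objective can look like, the sketch below adds a mean squared error over a protein-ligand distance map to a weighted squared error on the affinity. This is not the published TankBind loss: the function name `joint_loss`, the distance-map representation, the squared-error form of both terms, and the `affinity_weight` parameter are all assumptions made for illustration.

```python
def joint_loss(pred_dist, true_dist, pred_affinity, true_affinity,
               affinity_weight=1.0):
    """Hypothetical joint objective: structure term + weighted affinity term.

    pred_dist / true_dist: nested lists holding a protein-ligand distance map
    (rows = protein nodes, columns = ligand nodes), an assumed representation.
    """
    # Number of entries in the distance map.
    n_entries = sum(len(row) for row in true_dist)

    # Structure term: mean squared error over all distance-map entries.
    structure_term = sum(
        (p - t) ** 2
        for pred_row, true_row in zip(pred_dist, true_dist)
        for p, t in zip(pred_row, true_row)
    ) / n_entries

    # Affinity term: squared error on a single scalar affinity value.
    affinity_term = (pred_affinity - true_affinity) ** 2

    # Both terms share one scalar loss, so one model is trained on both tasks.
    return structure_term + affinity_weight * affinity_term


if __name__ == "__main__":
    loss = joint_loss([[1.0, 2.0]], [[1.0, 4.0]], 5.0, 6.0, affinity_weight=0.5)
    print(loss)
```

The point of the design is only that a single scalar combines both prediction tasks; how the two terms are actually defined and weighted in our model may differ from this sketch.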

In contrast to traditional machine learning approaches, which predict only the binding affinity, our method explicitly predicts the corresponding protein-ligand conformation. In contrast to docking-based methods, our method is significantly faster and has a smoother energy landscape, because the protein-ligand interaction evolves in the hidden space instead of in real 3D space.

We improved the original TankBind in two respects. First, we curated a significantly larger training dataset by combining SAR databases, such as ChEMBL and ExCAPE, with PDB structures. Second, we improved the model to take advantage of the rich vector direction information by using an equivariant module.

Our previous experiments show that the model achieves higher accuracy on a specific protein family of interest when fine-tuned with data related to that family. Therefore, we further fine-tune the model on all available co-crystallized structures of the CBLB TKB domain and all binding affinity data associated with this domain.

Our model can screen 20M compounds per GPU per day. Therefore, we can readily screen the entire Enamine database with the computational resources available to us.
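The throughput figure translates into a simple cost estimate. The sketch below computes GPU-days and wall-clock days from the stated rate of 20M compounds per GPU per day; the library size and the GPU count in the example are placeholders chosen for illustration, not the actual size of the Enamine database or of our cluster.

```python
import math

# Throughput stated in the text: 20M compounds per GPU per day.
COMPOUNDS_PER_GPU_PER_DAY = 20_000_000


def gpu_days(library_size):
    """Total GPU-days needed to screen `library_size` compounds."""
    return library_size / COMPOUNDS_PER_GPU_PER_DAY


def wall_clock_days(library_size, n_gpus):
    """Whole days needed when the work is split evenly over `n_gpus` GPUs."""
    return math.ceil(gpu_days(library_size) / n_gpus)


if __name__ == "__main__":
    hypothetical_library = 1_000_000_000  # placeholder size, an assumption
    print(gpu_days(hypothetical_library))            # total GPU-days
    print(wall_clock_days(hypothetical_library, 8))  # days on 8 GPUs
```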

After the screening, we will filter the results, keeping only compounds that are predicted to bind to the desired binding site and that have a new scaffold. We will also manually check a few of the predicted candidate compounds using MD simulations.
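The filtering step can be sketched as follows. The record layout (`Prediction`), the function name `filter_hits`, the site labels, and the use of a precomputed scaffold string are all hypothetical; in practice the scaffold key could be derived with RDKit (for example a Murcko scaffold), but the sketch takes it as given. Ranking the survivors by predicted affinity, with higher values meaning stronger binding, is likewise an assumption rather than a stated part of the protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    compound_id: str
    scaffold: str              # precomputed scaffold key (hypothetical field)
    predicted_site: str        # label of the pocket the model placed the ligand in
    predicted_affinity: float  # assumed: higher means stronger predicted binding


def filter_hits(predictions, target_site, known_scaffolds, top_n=None):
    """Keep compounds predicted at `target_site` whose scaffold is not already known."""
    kept = [
        p for p in predictions
        if p.predicted_site == target_site and p.scaffold not in known_scaffolds
    ]
    # Rank survivors by predicted affinity, strongest first (an assumption).
    kept.sort(key=lambda p: p.predicted_affinity, reverse=True)
    return kept if top_n is None else kept[:top_n]


if __name__ == "__main__":
    preds = [
        Prediction("cpd-1", "scaffold-A", "site-1", 7.2),
        Prediction("cpd-2", "scaffold-B", "site-1", 8.1),
        Prediction("cpd-3", "scaffold-C", "site-2", 9.0),
        Prediction("cpd-4", "scaffold-D", "site-1", 6.5),
    ]
    hits = filter_hits(preds, target_site="site-1", known_scaffolds={"scaffold-A"})
    print([h.compound_id for h in hits])
```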

What makes your approach stand out from the community? (<100 words)

Our model, TankBind, is the first neural network model that can simultaneously predict the protein-ligand binding structure and binding affinity. In addition, we have improved the framework and trained it on a much larger dataset.

Method Name

TankBind

Free software packages used

P2Rank

RDKit

Relevant publications of previous uses by your group of this software/method

Lu, Wei, et al. "TankBind: Trigonometry-aware neural networks for drug-protein binding structure prediction." NeurIPS 2022 (spotlight).

Lu, Wei, Nicholas P. Schafer, and Peter G. Wolynes. "Energy landscape underlying spontaneous insertion and folding of an alpha-helical transmembrane protein into a bilayer." Nature Communications 9.1 (2018): 4949.

Lu, Wei, et al. "OpenAWSEM with Open3SPN2: A fast, flexible, and accessible framework for large-scale coarse-grained biomolecular simulations." PLoS Computational Biology 17.2 (2021): e1008308.

Challenge #4