Computational methods

Hit Identification

Method type (check all that apply)

Deep learning

Physics-based

Description of your approach (min 200 and max 800 words)

Building on our previous work[1-3], we developed multi-scale, multi-task neural networks that learn both binding structures and binding affinities between compounds and proteins. The model takes geometric graph representations of compounds and proteins as input. The compound is processed by a physics-driven graph neural network that integrates geometry and momentum information into the topological structure, while the protein is processed by a multi-scale graph neural network that connects surface, structure, and sequence. Hierarchical pooling layers narrow the whole-protein information down to the pocket level; the pockets of each training protein are pre-defined as the top-predicted sites (including various allosteric sites) from our self-developed tools[5]. This strategy helps prune the model and avoids overfitting caused by oversized protein graphs. A multi-head attention mechanism with trigonometry consistency captures the interactions between a compound and the pockets of a protein, with extra supervision from labels extracted from available high-quality co-crystal complex structures (PDBbind). A siamese network is coupled to predict the binding affinity of a given compound and protein/pocket pair from the extracted compound features, the pocket features, and the derived interaction features. For binding affinity learning, an industry-scale structure-activity relationship (SAR) dataset pairing compound structures with protein crystal structures or AlphaFold-predicted structures was cleaned (with AstraZeneca)[2]. The training dataset contains more than 5 million SAR data points (mainly derived from ChEMBL and PubChem), covering 1252 protein targets and 1.3 million unique compounds. The whole pipeline was trained end-to-end and has been validated on benchmark test sets.
Most importantly, we will collect preliminary bioactivity data from recent patents (from NURIX), the GOSTAR database, and the binders provided by the committee to fine-tune and re-evaluate the model, making it CBL-B-specific.
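
As a rough illustration of narrowing a whole protein to the pocket level, the sketch below crops residues by distance to a predicted pocket center. This is only a geometric stand-in: the model described above uses learned hierarchical pooling layers, and the residue records, the function name crop_to_pocket, and the 10 angstrom cutoff are all assumptions made for this example.

```python
import math

def crop_to_pocket(residues, center, cutoff=10.0):
    """Keep residues whose coordinates lie within `cutoff` angstroms of `center`.

    Hypothetical helper: a fixed-radius crop, not the learned pooling
    used in the actual model.
    """
    return [r for r in residues if math.dist(r["xyz"], center) <= cutoff]

# Hypothetical residues with C-alpha coordinates in angstroms.
residues = [
    {"name": "A1", "xyz": (0.0, 0.0, 0.0)},
    {"name": "A2", "xyz": (3.0, 4.0, 0.0)},   # 5 angstroms from the center
    {"name": "A3", "xyz": (30.0, 0.0, 0.0)},  # far from the pocket
]

# Assumed pocket center, e.g. as reported by a pocket-prediction tool.
pocket = crop_to_pocket(residues, center=(0.0, 0.0, 0.0), cutoff=10.0)
pocket_names = [r["name"] for r in pocket]
```

Cropping in this way reduces the protein graph to the residues around a candidate site, which is the same motivation given above for pocket-level pooling.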

The prediction process will focus on the TKB domain of CBL-B (especially the key binding residues revealed in 8GCY and the NURIX poster), where we will select compounds that our model predicts to bind the key sites with high affinity. Ensemble predictions will be made across multiple states and crystal forms of CBL-B. Because the deep learning algorithm is efficient, we can screen the whole Enamine REAL library in a very short time. We will use BRICS and ECFP to exclude molecules whose scaffolds are similar to those of the existing hit compounds.
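
The ensemble and novelty steps above can be sketched as follows. This is a minimal sketch under stated assumptions: fingerprints are represented as plain sets of on-bit indices (in practice they would be ECFP bits computed with a cheminformatics toolkit), the per-structure scores are made-up numbers, averaging is just one possible way to combine the ensemble, and the 0.4 Tanimoto threshold is a hypothetical choice.

```python
from statistics import mean

def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

def is_novel(fp, known_fps, threshold=0.4):
    """True if `fp` is below `threshold` similarity to every known hit."""
    return all(tanimoto(fp, k) < threshold for k in known_fps)

# Hypothetical fingerprint of one known hit compound.
known_hits = [{1, 2, 3, 4}]

# Hypothetical candidates: fingerprint plus one predicted affinity
# per protein structure in the ensemble.
candidates = {
    "cand_a": {"fp": {1, 2, 3, 5}, "scores": [7.0, 8.0]},     # too similar
    "cand_b": {"fp": {10, 11, 12}, "scores": [6.0, 7.0]},
    "cand_c": {"fp": {1, 20, 21, 22}, "scores": [8.0, 9.0]},
}

# Average the ensemble scores, drop near-duplicates of known hits,
# and rank the remaining compounds by descending mean score.
ranked = sorted(
    (
        (name, mean(c["scores"]))
        for name, c in candidates.items()
        if is_novel(c["fp"], known_hits)
    ),
    key=lambda item: item[1],
    reverse=True,
)
```

Here cand_a shares three of five bits with the known hit (similarity 0.6) and is excluded, while the other two candidates are kept and ranked by their mean predicted affinity.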

To reduce the false-positive rate, the compounds with top-predicted affinity will be further filtered by their predicted cell-level anti-coronavirus bioactivity (deep learning-driven QSAR model), solubility, diversity/novelty (given that some binders are already known), and customized MCS/drug-likeness filters, ensuring the quality of the hits.
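
A staged filter of this kind can be sketched as below. The sketch covers only three of the criteria (predicted affinity, solubility, and drug-likeness); the property names, the example values, and every threshold are assumptions chosen for illustration, not the cutoffs used in the actual workflow.

```python
def passes_filters(compound, min_affinity=7.0, min_log_s=-5.0, min_qed=0.5):
    """Apply hypothetical affinity, solubility, and drug-likeness cutoffs."""
    return (
        compound["pred_affinity"] >= min_affinity  # predicted binding affinity
        and compound["log_s"] >= min_log_s         # predicted aqueous solubility
        and compound["qed"] >= min_qed             # drug-likeness score
    )

# Hypothetical top-ranked compounds with made-up predicted properties.
compounds = [
    {"id": "c1", "pred_affinity": 8.2, "log_s": -4.0, "qed": 0.7},
    {"id": "c2", "pred_affinity": 6.5, "log_s": -3.5, "qed": 0.8},  # low affinity
    {"id": "c3", "pred_affinity": 9.0, "log_s": -6.5, "qed": 0.8},  # poorly soluble
    {"id": "c4", "pred_affinity": 7.5, "log_s": -3.0, "qed": 0.3},  # low drug-likeness
]

selected = [c["id"] for c in compounds if passes_filters(c)]
```

Requiring every criterion to pass means a compound with a strong predicted affinity is still dropped if it fails on solubility or drug-likeness, which is how the complementary filters lower the false-positive rate.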

What makes your approach stand out from the community? (<100 words)

1. Advanced physics-inspired deep virtual screening algorithm

2. Abundant, high-quality training data for model learning and customized CBL-B data for model fine-tuning

3. Our approach has multiple complementary mechanisms to prevent false-positive cases.

4. Targeted ensemble and screening strategies for the TKB domain of CBL-B.

Method Name

DynamicBind

Commercial software packages used

Free software packages used

Python, PyTorch, RDKit, Biopython, P2Rank

Relevant publications of previous uses by your group of this software/method

1. Zheng, S.*, Li, Y., Chen, S., et al. (2020). Predicting drug–protein interaction using quasi-visual question answering system. Nature Machine Intelligence, 2(2), 134-140.

2. Wang, P., Zheng, S.*, Jiang, Y., Li, C., Liu, J., Wen, C., ... & Yang, Y. (2022). Structure-aware multimodal deep learning for drug–protein interaction prediction. Journal of Chemical Information and Modeling, 62(5), 1308-1317.

3. Lu, W., Wu, Q., Zhang, J., Rao, J., Li, C., & Zheng, S.* (2022). TANKBind: Trigonometry-Aware Neural NetworKs for Drug-Protein Binding Structure Prediction. NeurIPS (spotlight).

4. Zheng, S.*, Tan, Y., Wang, Z., Li, C., Zhang, Z., Sang, X., Chen, H., & Yang, Y. (2022). Accelerated rational PROTAC design via deep learning and molecular simulations. Nature Machine Intelligence, 4(9).

5. Sun, Z., Zheng, S.*, Zhao, H., Niu, Z., Lu, Y., Pan, Y., & Yang, Y. (2021). To improve the predictions of binding residues with DNA, RNA, carbohydrate, and peptide via multi-task deep neural networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics.

6. Liu, Z., Huang, D., Zheng, S.*, et al. (2021). Deep learning enables the discovery of highly potent anti-osteoporosis natural products. European Journal of Medicinal Chemistry, 210, 112982.

7. Zheng, S.*, Yan, X., Yang, Y., & Xu, J. (2019). Identifying structure–property relationships through SMILES syntax analysis with self-attention mechanism. Journal of Chemical Information and Modeling, 59(2), 914-923.

Challenge #4