Computational methods

Hit Identification

Method type (check all that apply)

Deep learning

High-throughput docking

Description of your approach (min 200 and max 800 words)

To identify potential drug candidates, we have designed a rapid two-step screening strategy. First, a fast screen selects compounds that are likely to bind the target protein. Second, a plausible binding pose is predicted for each selected candidate and is then used for affinity prediction.

In the first step, we will use two methods that our team has developed and refined over time: PointSite and SAT-CPI. PointSite is a point cloud segmentation method for accurately identifying protein-ligand binding atoms. SAT-CPI is a structure-assisted model for predicting ligand-protein interactions. It uses a twin-transformer network that combines protein and ligand features extracted from 3D protein structures and ligands using the PointSite and ChemBERT models. The network's output layers consist of ResNet and fully connected layers that classify each protein-ligand pair as interacting or non-interacting. Ligands predicted to interact in this SAT-CPI screen proceed to the docking step for further refinement.
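The hand-off from the first screen to the docking step can be illustrated with a minimal sketch. It is not the SAT-CPI model: `toy_predictor` is a hypothetical placeholder for a trained interaction classifier, and the 0.5 threshold and the three-compound library are assumptions chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple


def first_stage_screen(
    ligands: Dict[str, str],
    predict_interaction: Callable[[str], float],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return (ligand_id, probability) pairs at or above the threshold, best first."""
    scored = [(lig_id, predict_interaction(smiles)) for lig_id, smiles in ligands.items()]
    hits = [(lig_id, p) for lig_id, p in scored if p >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


def toy_predictor(smiles: str) -> float:
    """Hypothetical stand-in for a trained classifier (NOT SAT-CPI).

    Returns the fraction of aromatic-carbon characters in the SMILES string,
    which is only a deterministic placeholder score in [0, 1].
    """
    return smiles.count("c") / max(len(smiles), 1)


# Hypothetical mini-library: ligand id -> SMILES
library = {
    "lig_a": "c1ccccc1O",       # phenol
    "lig_b": "CCO",             # ethanol
    "lig_c": "c1ccc2ccccc2c1",  # naphthalene
}

# Only ligands that pass the first screen are forwarded to pose prediction.
hits = first_stage_screen(library, toy_predictor, threshold=0.5)
for lig_id, prob in hits:
    print(f"{lig_id}: {prob:.3f}")
```

In practice the threshold would be tuned on validation data so that the expensive second step receives a manageable number of candidates.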

In the second step, we will develop a diffusion model combined with molecular dynamics (MD) simulation to predict TKB domain-ligand complex conformations, which serve both binding pose and affinity estimation in drug screening. The approach samples ligand poses by running a learned diffusion process that takes into account the contextual information of the TKB domain. To ensure that the trajectory sampled by the diffusion model is structurally reasonable, we use MD to constrain each step of the diffusion process. The ligands used in this step are those predicted to interact in the first step. Our ultimate goal is to identify the most promising drug candidates, which will be triaged and prioritized based on their predicted binding affinity and physical properties.
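The idea of constraining each diffusion step with a physics-based correction can be sketched with a toy example. Everything here is an assumption made for illustration: the pose is reduced to a single 3D point, `toy_score` is an analytic Gaussian score standing in for a learned network, and `clash_gradient` is a simple soft-repulsion term standing in for an MD force field. None of it is the proposed model.

```python
import math
import random
from typing import List, Sequence

Vec = List[float]


def toy_score(x: Vec, center: Vec, sigma: float) -> Vec:
    """Score of an isotropic Gaussian centred on the pocket (placeholder for a learned network)."""
    return [-(xi - ci) / sigma**2 for xi, ci in zip(x, center)]


def clash_gradient(x: Vec, atoms: Sequence[Vec], r_min: float) -> Vec:
    """Gradient of E = sum(max(0, r_min - d)^2) over protein atoms (toy repulsion, not a force field)."""
    grad = [0.0, 0.0, 0.0]
    for a in atoms:
        diff = [xi - ai for xi, ai in zip(x, a)]
        d = math.sqrt(sum(c * c for c in diff))
        if 1e-9 < d < r_min:
            coeff = -2.0 * (r_min - d) / d
            grad = [g + coeff * c for g, c in zip(grad, diff)]
    return grad


def sample_pose(center: Vec, atoms: Sequence[Vec], start: Vec, n_levels: int = 20,
                sigma_max: float = 5.0, sigma_min: float = 0.1, relax_steps: int = 5,
                relax_lr: float = 0.1, r_min: float = 3.0, seed: int = 0) -> Vec:
    """Annealed Langevin-style sampling with a short clash relaxation after every step."""
    rng = random.Random(seed)
    ratio = (sigma_min / sigma_max) ** (1.0 / (n_levels - 1))
    x = list(start)
    for level in range(n_levels):
        sigma = sigma_max * ratio**level  # noise level shrinks geometrically
        step = 0.5 * sigma**2
        score = toy_score(x, center, sigma)
        # Denoising move along the score plus fresh noise at the current level.
        x = [xi + step * si + sigma * rng.gauss(0.0, 1.0) for xi, si in zip(x, score)]
        # Physics-inspired correction: a few gradient-descent steps on the clash energy.
        for _ in range(relax_steps):
            grad = clash_gradient(x, atoms, r_min)
            x = [xi - relax_lr * gi for xi, gi in zip(x, grad)]
    return x


# Hypothetical pocket: six protein atoms 4 Å from the pocket centre along each axis.
pocket_center = [0.0, 0.0, 0.0]
protein_atoms = [[4.0, 0.0, 0.0], [-4.0, 0.0, 0.0], [0.0, 4.0, 0.0],
                 [0.0, -4.0, 0.0], [0.0, 0.0, 4.0], [0.0, 0.0, -4.0]]
pose = sample_pose(pocket_center, protein_atoms, start=[20.0, 0.0, 0.0])
print("distance to pocket centre:", round(math.dist(pose, pocket_center), 3))
```

Interleaving the correction with the sampling steps, rather than applying it once at the end, keeps intermediate poses away from steric clashes throughout the trajectory.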

What makes your approach stand out from the community? (<100 words)

Our approach uses the contextual information of the TKB domain to determine the ligand binding pose at the atomic level, while respecting equivariance under rotation and translation in 3D space. We also plan to incorporate molecular dynamics into our end-to-end model to improve the accuracy of both binding pose and affinity predictions. Together, these should help us identify the most promising drug candidates for further development.
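Assuming the symmetry property meant here is equivariance under rotation and translation, it can be stated as f(Rx + t) = R f(x) + t for any rotation R and translation t. The sketch below checks this numerically for a deliberately simple function, the centroid of a set of points. The coordinates, angle, and shift are arbitrary illustrative values, and the rotation is restricted to the z axis for brevity.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def rotate_z(p: Point, angle: float) -> Point:
    """Rotate a point about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = p
    return (c * x - s * y, s * x + c * y, z)


def transform(p: Point, angle: float, shift: Point) -> Point:
    """Apply the rigid motion: rotation about z, then translation."""
    r = rotate_z(p, angle)
    return (r[0] + shift[0], r[1] + shift[1], r[2] + shift[2])


def centroid(points: List[Point]) -> Point:
    """Mean position of the points; a simple equivariant function."""
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


# Arbitrary illustrative coordinates and rigid motion.
ligand = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
angle, shift = math.pi / 3, (5.0, -2.0, 1.0)

# Transform the input then apply f, versus apply f then transform the output.
lhs = centroid([transform(p, angle, shift) for p in ligand])
rhs = transform(centroid(ligand), angle, shift)
max_error = max(abs(a - b) for a, b in zip(lhs, rhs))
print("equivariant:", max_error < 1e-9)
```

A pose-prediction model with this property gives consistent predictions regardless of how the input complex is oriented or positioned in space.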

Method Name

CPI-MD

Commercial software packages used

None

Free software packages used

PyTorch, SparseConvNet, ChemBERT, GROMACS

Relevant publications of previous uses by your group of this software/method

Li, Zhen, et al. "Predicting membrane protein contacts from non-membrane proteins by deep transfer learning." arXiv preprint arXiv:1704.07207 (2017).

Wang, Sheng, et al. "Accurate de novo prediction of protein contact map by ultra-deep learning model." PLoS computational biology 13.1 (2017): e1005324.

Yan, Xu, et al. "PointSite: a point cloud segmentation tool for identification of protein ligand binding atoms." Journal of Chemical Information and Modeling 62.11 (2022): 2835-2845.

Wang, Qin, et al. "Prior knowledge facilitates low homologous protein secondary structure prediction with DSM distillation." Bioinformatics 38.14 (2022): 3574-3581.

Wang, Qin, et al. "Contact-Distil: Boosting Low Homologous Protein Contact Map Prediction by Self-Supervised Distillation." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 36. No. 4. 2022.

Challenge #4