Computational methods

Hit Identification

Method type (check all that apply)

Deep learning

High-throughput docking

Machine learning

Description of your approach (min 200 and max 800 words)

In the hit identification phase, we plan to deploy a hybrid strategy that combines the experience of medicinal chemists with EquiScore. EquiScore is a general-purpose protein-ligand interaction prediction model developed by our team and based on geometric deep learning. When designing the model, we considered prior information from several sources, including chemical, interaction, and spatial priors, and integrated them into a single deep learning framework that characterizes protein-ligand interactions in geometric space. We also examined potential problems in constructing protein-ligand prediction datasets and proposed several targeted data augmentation strategies, so that the model learns representations that generalize to new targets and ultimately screens more effectively on novel targets. In a large-scale retrospective benchmark, EquiScore's screening ability surpassed that of the traditional scoring function Glide SP, a series of classical machine learning scoring functions, and recently published deep learning scoring functions such as DeepDock, RTMScore, PIGNet, and TANKBind, and it showed the best generalization performance on new targets. Within our team, we also collaborated with the experimental department on prospective validation and successfully identified active small molecules for the target.

We will use the trained EquiScore model to screen for active compounds in this competition. Specifically, we will first use docking software to generate putative binding poses for all molecules in the database. Second, we will use EquiScore to score these poses and rank them. Finally, we will cluster the top-scoring molecules and select candidate compounds.
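
The score, rank, and cluster steps above can be illustrated with a minimal sketch. It is not the EquiScore pipeline itself: the `ScoredPose` record, the assumption that a higher score means a more likely active, the use of sets of integers as stand-ins for fingerprint bits, the greedy leader-style clustering, and the `top_n` and `sim_cutoff` values are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredPose:
    mol_id: str
    pose_id: int
    score: float         # hypothetical model score; assumed higher = more likely active
    features: frozenset  # stand-in for the on-bits of a molecular fingerprint


def tanimoto(a, b):
    """Tanimoto similarity between two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def best_pose_per_molecule(poses):
    """Keep the top-scoring pose of each molecule, then sort molecules by that score."""
    best = {}
    for p in poses:
        if p.mol_id not in best or p.score > best[p.mol_id].score:
            best[p.mol_id] = p
    return sorted(best.values(), key=lambda p: p.score, reverse=True)


def pick_diverse(ranked, top_n, sim_cutoff):
    """Greedy leader clustering over the top_n molecules.

    A molecule becomes a new cluster representative only if it is less
    similar than sim_cutoff to every representative chosen so far.
    """
    leaders = []
    for p in ranked[:top_n]:
        if all(tanimoto(p.features, l.features) < sim_cutoff for l in leaders):
            leaders.append(p)
    return leaders


# Toy data: two poses for mol_A, one pose each for the others.
poses = [
    ScoredPose("mol_A", 0, 0.91, frozenset({1, 2, 3, 4})),
    ScoredPose("mol_A", 1, 0.62, frozenset({1, 2, 3, 4})),
    ScoredPose("mol_B", 0, 0.88, frozenset({1, 2, 3, 5})),
    ScoredPose("mol_C", 0, 0.75, frozenset({7, 8, 9})),
    ScoredPose("mol_D", 0, 0.30, frozenset({10, 11})),
]

ranked = best_pose_per_molecule(poses)
candidates = pick_diverse(ranked, top_n=3, sim_cutoff=0.5)
print([c.mol_id for c in candidates])
```

In this toy example mol_B is close to the higher-ranked mol_A (Tanimoto 0.6), so it is absorbed into mol_A's cluster and the selection keeps one representative per chemotype. In practice the fingerprints would come from a cheminformatics toolkit and the final selection would also reflect the judgment of medicinal chemists.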

What makes your approach stand out from the community? (<100 words)

We incorporate several physical and prior-knowledge-based features into the EquiScore model, such as aromatic centers, spatial distances, protein-ligand interaction information, and spatial geometric information. In addition, an equivariant neural network architecture and well-founded data augmentation strategies further improve its expressive power and generalization performance, giving the model superior screening ability. Moreover, the performance of EquiScore has been validated both retrospectively and prospectively.

Method Name

EquiScore

Commercial software packages used

Schrödinger Suite, release 2020-4

Free software packages used

RDKit, ProLIF

Relevant publications of previous uses by your group of this software/method

The manuscript describing EquiScore is currently being submitted for publication.

Hit Optimization Methods

Method type (check all that apply)

Deep learning

Machine learning

Description of your approach (min 200 and max 800 words)

In the hit optimization phase, we plan to deploy a hybrid strategy that combines the experience of medicinal chemists with PBCNet. PBCNet is a deep learning model developed by our team and tailored specifically for ranking relative binding affinity among a congeneric series of ligands, which makes it well suited to structure-based hit optimization. The effectiveness of PBCNet has been retrospectively validated on two held-out sets (provided by Schrödinger, Inc. and Merck KGaA), totaling over 460 ligands and 16 targets. The benchmarking results showed that our model outperformed Schrödinger's Glide, MM-GB/SA, and four recently reported deep learning models (DeltaDelta, Default2018, Dense, and PIGNet) by a large margin. Moreover, when fine-tuned with a small amount of data, PBCNet reaches the performance of Schrödinger's FEP+, which has been the de facto standard among computational lead optimization methods in the pharmaceutical industry. In addition, we have applied PBCNet to internal structure-based lead optimization projects, where its effectiveness has been demonstrated prospectively. We found that using PBCNet not only increases the success rate of obtaining highly active compounds but also shortens the cycle time of the optimization process. We believe that our hybrid strategy allows a more efficient and comprehensive exploration of the structure-activity relationship of the experimental hits and is more likely to yield highly active derivatives.
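
As an illustration of how pairwise relative-affinity predictions can be turned into a ranking of a congeneric series, the sketch below anchors each candidate to reference ligands with measured affinity and averages the resulting estimates. This is only one plausible aggregation scheme and is not claimed to be PBCNet's actual procedure: the function name, the ligand identifiers, the affinity scale (taken here to be pIC50-like, higher = more potent), and all numbers are hypothetical.

```python
from statistics import mean


def estimate_affinities(pair_deltas, reference_affinity):
    """Turn pairwise differences into per-ligand affinity estimates.

    pair_deltas maps (query, reference) to a predicted affinity difference,
    defined here as query minus reference. Each pair yields one estimate
    (reference affinity + delta); estimates for a query are averaged.
    """
    estimates = {}
    for (query, ref), delta in pair_deltas.items():
        estimates.setdefault(query, []).append(reference_affinity[ref] + delta)
    return {query: mean(values) for query, values in estimates.items()}


# Hypothetical measured affinities of two experimental hits (pIC50-like scale).
reference_affinity = {"ref_1": 6.0, "ref_2": 7.0}

# Hypothetical model outputs for three proposed derivatives.
pair_deltas = {
    ("cand_X", "ref_1"): 1.5, ("cand_X", "ref_2"): 0.5,
    ("cand_Y", "ref_1"): -0.5, ("cand_Y", "ref_2"): -1.0,
    ("cand_Z", "ref_1"): 0.5, ("cand_Z", "ref_2"): 0.0,
}

affinities = estimate_affinities(pair_deltas, reference_affinity)
ranking = sorted(affinities, key=affinities.get, reverse=True)
print(ranking)
```

Averaging over several references dampens the error of any single pairwise prediction; the ranked list would then be reviewed by medicinal chemists before compounds are selected.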

What makes your approach stand out from the community? (<100 words)

We incorporate several physical and prior-knowledge-based features into the PBCNet model, such as aromatic centers, spatial distances, and angle-based geometric information, which make it robust and accurate. Moreover, the performance of PBCNet has been validated both retrospectively and prospectively.

Method Name

PBCNet

Commercial software packages used

Schrödinger Suite, release 2020-4

Free software packages used

RDKit, Open Babel

Relevant publications of previous uses by your group of this software/method

Yu J, Li Z, Chen G, Kong X, Hu J, Wang D, et al. PBCNet: Computing Relative Binding Affinity of Ligands to a Receptor Based on a Pairwise Binding Comparison Network for Lead Optimization. ChemRxiv. Cambridge: Cambridge Open Engage; 2023. Preprint.

Challenge #4