Computational methods

Hit Identification

Method type (check all that apply)

De novo design

Deep learning

Free energy perturbation

Physics-based

Hybrid of the above

Our approach integrates structure-based methods from computational chemistry and empirical knowledge with artificial intelligence (AI) techniques. Various precision levels and computational chemistry methods, alongside AI techniques, will be employed for

Description of your approach (min 200 and max 800 words)

Introduction: Our plan combines structure-based methods from computational biology and empirical knowledge with artificial intelligence (AI) techniques. By leveraging the predictive power of AI algorithms and the computational speed of GPUs, we aim to screen large molecular libraries efficiently. High-precision computational chemistry methods will then be used to improve the hit rate of active molecules. Currently, only one compound with nanomolar activity has been reported for the triple Tudor domain of SETDB1, so additional lead compounds are urgently needed to accelerate clinical translation. We will therefore use key features identified in previous studies of SETDB1 to narrow the screening results down to a final set of potentially active compounds. Below is a brief description of our methodology:

Methodology:
1. Active Learning and Molecular Docking: We will employ active learning combined with molecular docking to screen large molecular libraries, such as Enamine.
2. ADMET Property Screening: The ADMET properties of the final compounds will be checked, and compounds with low solubility or other undesirable properties will be removed.
3. Redocking and Key Interaction Filtering: The results from the active learning protocol will be redocked, and compounds will be filtered based on key interactions derived from crystal structures and the literature, such as hydrogen bonds and π-π stacking.
4. AI-Based Force Field Analysis: Using the AI-based force field AIMNet2, we will estimate the strain energy of docking poses, remove high-energy poses (considering both global strain and torsion strain), and check the number of heteroatoms left without a hydrogen-bonding partner.
5. Substructure Filtering: Key substructures summarized from previous research will be used to filter compounds while maintaining the diversity of the final results.
6. Molecular Dynamics Simulation: Stability and binding free energy will be assessed through MD simulations, using the ASGBIE method for the free energy calculations.
7. Lead Optimization: We will apply our in-house tools (MolOpt, FragRep, and FragGrow) to optimize the structures of the selected compounds.

Detailed Workflow

1. Binding Site Analysis and Optimization: Initially, we will use protein structure alignment tools and long-timescale molecular dynamics (MD) simulations to identify and optimize binding sites within the SETDB1 triple Tudor domain (TTD). We will analyze key interactions at these binding sites to refine subsequent screening results. Additionally, we will summarize key substructures from known active molecules for use in the secondary screening of hit molecules.

2. Initial Screening with Molecular Docking and Active Learning: Our strategy combines molecular docking and active learning to screen the molecular library. The process begins by extracting a subset of molecules from the library and docking them into the SETDB1 TTD binding pocket identified in step 1 to obtain docking scores and binding poses. Based on these docking scores and the molecular structures, we train a graph neural network to quickly predict docking scores and their associated uncertainties, forming an initial model. Using the initial model's uncertainty predictions, we select molecules with high predicted uncertainty from the library and dock them to expand the training dataset. We then retrain the model to improve the accuracy of its docking score predictions. This iterative process continues until the model's predictive performance no longer improves. Using the final model, we select the top-ranked molecules from the library, re-dock them, and obtain their binding poses. Compounds with low solubility or other undesirable ADMET properties are removed.
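The iterative loop described in step 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual pipeline: `mock_dock` is a hypothetical stand-in for a real docking run, molecules are represented by two made-up descriptor values, and a bootstrap ensemble of nearest-neighbour regressors replaces the graph neural network as the source of predictions and uncertainties. Lower scores are treated as better, and all batch sizes are arbitrary.

```python
import random
import statistics


def mock_dock(features):
    """Hypothetical stand-in for a docking run; lower score = better."""
    x, y = features
    return -(10.0 - 8.0 * (x - 0.3) ** 2 - 8.0 * (y - 0.7) ** 2)


def knn_predict(train, query, k=3):
    """Mean score of the k labelled molecules closest to the query."""
    def dist(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], query))
    nearest = sorted(train, key=dist)[:k]
    return statistics.fmean([score for _, score in nearest])


def ensemble_predict(train, query, rng, n_models=8):
    """Bootstrap ensemble: returns (mean prediction, spread as uncertainty).

    Assumption: this surrogate replaces the GNN described in the text.
    """
    preds = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]  # resample with replacement
        preds.append(knn_predict(sample, query))
    return statistics.fmean(preds), statistics.pstdev(preds)


def active_learning(library, n_init=20, n_rounds=5, batch=10, n_final=5, seed=0):
    """Dock a random subset, then repeatedly dock the most uncertain molecules."""
    rng = random.Random(seed)
    indices = list(range(len(library)))
    rng.shuffle(indices)
    labelled = {i: mock_dock(library[i]) for i in indices[:n_init]}

    for _ in range(n_rounds):
        train = [(library[i], s) for i, s in labelled.items()]
        pool = [i for i in indices if i not in labelled]
        # Acquisition: highest predicted uncertainty first.
        pool.sort(key=lambda i: ensemble_predict(train, library[i], rng)[1],
                  reverse=True)
        for i in pool[:batch]:
            labelled[i] = mock_dock(library[i])

    # Final selection: best predicted score over the whole library, then re-dock.
    train = [(library[i], s) for i, s in labelled.items()]
    ranked = sorted(range(len(library)),
                    key=lambda i: ensemble_predict(train, library[i], rng)[0])
    hits = [(i, mock_dock(library[i])) for i in ranked[:n_final]]
    return hits, len(labelled)


if __name__ == "__main__":
    lib_rng = random.Random(1)
    library = [(lib_rng.random(), lib_rng.random()) for _ in range(200)]
    hits, n_docked = active_learning(library)
    print(f"docked {n_docked} of {len(library)} molecules during training")
    for index, score in hits:
        print(index, round(score, 2))
```

Only a fraction of the library is docked explicitly during training; the surrogate ranks the rest. In a real campaign the stopping criterion would be a held-out validation metric rather than a fixed number of rounds.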

3. Secondary Screening and Energy Filtering: Next, we conduct secondary screening and analysis of the complex structures based on the key interactions and substructures identified from crystal structures. Docking poses with high strain energy are filtered out using an AI-based force field. This step selects the best molecules and docking poses for subsequent dynamic simulations.
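As an illustration of the filtering logic in step 3, the sketch below keeps only poses that form every required interaction, stay under a strain cutoff, and leave few polar atoms without a hydrogen-bonding partner. Everything specific here is an assumption for the example: the `Pose` fields, the interaction labels such as `hbond:RES_A`, and the cutoff values are hypothetical, and the energies are plain numbers standing in for values that would come from the AI-based force field. Strain is taken as the energy of the docked conformation minus that of the relaxed ligand.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    name: str
    interactions: frozenset      # labels of detected contacts (hypothetical format)
    e_bound: float               # ligand energy in the docked conformation, kcal/mol
    e_relaxed: float             # ligand energy after relaxation, kcal/mol
    unsatisfied_polar_atoms: int = 0


def strain_energy(pose):
    """Strain = energy of the docked conformation minus the relaxed minimum."""
    return pose.e_bound - pose.e_relaxed


def filter_poses(poses, required, max_strain=8.0, max_unsatisfied=1):
    """Keep poses meeting all criteria, sorted by increasing strain.

    max_strain and max_unsatisfied are illustrative cutoffs, not values
    taken from the text.
    """
    kept = [
        p for p in poses
        if required <= p.interactions                  # all key interactions present
        and strain_energy(p) <= max_strain             # not a high-energy pose
        and p.unsatisfied_polar_atoms <= max_unsatisfied
    ]
    return sorted(kept, key=strain_energy)


if __name__ == "__main__":
    required = frozenset({"hbond:RES_A", "pi_stack:RES_B"})
    poses = [
        Pose("lig1", frozenset({"hbond:RES_A", "pi_stack:RES_B"}), -100.0, -104.0),
        Pose("lig2", frozenset({"hbond:RES_A", "pi_stack:RES_B"}), -90.0, -102.0),
        Pose("lig3", frozenset({"hbond:RES_A"}), -95.0, -96.0),
    ]
    for pose in filter_poses(poses, required):
        print(pose.name, strain_energy(pose))
```

Applying the interaction check before the energy check is a design convenience: set comparisons are cheap, so poses lacking a key contact never need an energy evaluation.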

4. Dynamic Simulations and Free Energy Calculations: Following MD simulations, we analyze the trajectories to assess the structural stability of the complexes and the conformational changes of key residues. Subsequently, we calculate the binding free energy using our in-house alanine scanning with generalized Born and interaction entropy (ASGBIE) method. ASGBIE is first applied to reported complex structures to evaluate its predictive performance against experimental values and to identify hotspot residues. This step also involves optimizing method parameters and assessing structural reliability. For the complexes obtained from the prior screening steps, we calculate the binding free energy and select, for experimental validation, candidate molecules whose predicted binding free energies are more favorable than those of reported compounds. Free energy perturbation will also be used for further confirmation and for screening similar candidate compounds.
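The entropic term in ASGBIE follows the interaction entropy formulation of Duan, Liu, and Zhang (2016), in which -TΔS = kT ln⟨exp(βΔE_int)⟩ and ΔE_int is the fluctuation of the protein-ligand interaction energy around its trajectory average. The sketch below evaluates only that expression from a list of per-frame interaction energies; the function name, the 300 K default temperature, and the example energies are assumptions for illustration, and the enthalpic (generalized Born) and alanine scanning parts of the method are not shown.

```python
import math
import statistics

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def interaction_entropy(e_int, temperature=300.0):
    """Return -T*dS in kcal/mol from per-frame interaction energies (kcal/mol).

    Implements -T*dS = kT * ln< exp(beta * (E_int - <E_int>)) >.
    """
    beta = 1.0 / (KB_KCAL * temperature)
    mean_e = statistics.fmean(e_int)
    # Average the Boltzmann-weighted fluctuations over all frames.
    avg = statistics.fmean([math.exp(beta * (e - mean_e)) for e in e_int])
    return math.log(avg) / beta


if __name__ == "__main__":
    # Made-up interaction energies for five frames of a trajectory.
    frames = [-41.0, -39.0, -40.0, -42.0, -38.0]
    print(f"-T*dS = {interaction_entropy(frames):.2f} kcal/mol")
```

Because the exponential average is dominated by the largest positive fluctuations, the result is sensitive to outlier frames, which is one reason trajectory length and sampling interval are among the parameters worth checking against reported complexes.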

What makes your approach stand out from the community? (<100 words)

Our approach stands out by integrating AI algorithms with high-precision computational chemistry methods, enabling efficient screening of large molecular libraries. We employ active learning, AI-based force field analysis, and dynamic simulations to iteratively refine compound selection. This comprehensive methodology, combined with advanced tools such as AIMNet2 and ASGBIE, supports the precise identification of lead compounds. Our focus on the triple Tudor domain of SETDB1, a target with few reported compounds, further distinguishes our approach in accelerating clinical translation.

Method Name

Multiscale drug screening and design methods

Commercial software packages used

We don't need to use commercial software

Free software packages used

Amber, PyMOL (academic version), and Schrödinger (academic version) will be used in this drug screening campaign.

Relevant publications of previous uses by your group of this software/method

Duan, L., Liu, X., & Zhang, J. Z. (2016). Interaction entropy: A new paradigm for highly efficient and reliable computation of protein–ligand binding free energy. Journal of the American Chemical Society, 138(17), 5722-5728.

Zhang, H., Saravanan, K. M., Zhang, J. Z., & Wu, X. (2023). Deep-learning based bioactive peptides generation and screening against Xanthine oxidase. bioRxiv, 2023-01.

He, L., Bao, J., Yang, Y., Dong, S., Zhang, L., Qi, Y., & Zhang, J. Z. (2019). Study of SHMT2 inhibitors and their binding mechanism by computational alanine scanning. Journal of Chemical Information and Modeling, 59(9), 3871-3878.

Pan, X., Wang, H., Li, C., Zhang, J. Z., & Ji, C. (2021). MolGpka: A web server for small molecule pKa prediction using a graph-convolutional neural network. Journal of Chemical Information and Modeling, 61(7), 3159-3165.

Zhang, H., Saravanan, K. M., Yang, Y., Wei, Y., Yi, P., & Zhang, J. Z. (2022). Generating and screening de novo compounds against given targets using ultrafast deep learning models as core components. Briefings in Bioinformatics, 23(4), bbac226.

Pan, X., Wang, H., Zhang, Y., Wang, X., Li, C., Ji, C., & Zhang, J. Z. (2022). AA-score: a new scoring function based on amino acid-specific interaction for molecular docking. Journal of Chemical Information and Modeling, 62(10), 2499-2509.

Wang, E., Sun, H., Wang, J., Wang, Z., Liu, H., Zhang, J. Z., & Hou, T. (2019). End-point binding free energy calculation with MM/PBSA and MM/GBSA: strategies and applications in drug design. Chemical Reviews, 119(16), 9478-9508.

Bao, J., He, X., & Zhang, J. Z. (2021). DeepBSP—a machine learning method for accurate prediction of protein–ligand docking structures. Journal of Chemical Information and Modeling, 61(5), 2231-2240.

Ji, C. G., & Zhang, J. Z. H. (2008). Protein polarization is critical to stabilizing AF-2 and helix-2′ domains in ligand binding to PPAR-γ. Journal of the American Chemical Society, 130(50), 17129-17133.

Wei, M., Zhang, X., Pan, X., Wang, B., Ji, C., Qi, Y., & Zhang, J. Z. (2022). HobPre: accurate prediction of human oral bioavailability for small molecules. Journal of Cheminformatics, 14(1), 1.

Duan, L. L., Mei, Y., Zhang, D., Zhang, Q. G., & Zhang, J. Z. (2010). Folding of a helix at room temperature is critically aided by electrostatic polarization of intraprotein hydrogen bonds. Journal of the American Chemical Society, 132(32), 11159-11164.

Xu, M., He, X., Zhu, T., & Zhang, J. Z. (2019). A fragment quantum mechanical method for metalloproteins. Journal of Chemical Theory and Computation, 15(2), 1430-1439.

Tong, Y., Ji, C. G., Mei, Y., & Zhang, J. Z. (2009). Simulation of NMR data reveals that proteins’ local structures are stabilized by electronic polarization. Journal of the American Chemical Society, 131(24), 8636-8641.

Tong, Y., Mei, Y., Li, Y. L., Ji, C. G., & Zhang, J. Z. (2010). Electrostatic polarization makes a substantial contribution to the free energy of avidin−biotin binding. Journal of the American Chemical Society, 132(14), 5137-5142.

Challenge #6