Computational methods

Hit Identification

Method type (check all that apply)

De novo design

Deep learning

Free energy perturbation

High-throughput docking

Machine learning

Physics-based

Description of your approach (min 200 and max 800 words)

We adopt a hybrid of two computational strategies for the hit identification of SARS-CoV-2 Nsp3 Mac1 inhibitors.

1. Hit identification via the large-scale virtual screening of Enamine databases

1.1 Initial virtual screening and structural filtering processes

Large-scale virtual screening of Enamine commercially available and REAL database analogs will be carried out on a dynamic computation allocation framework that provides up to a few hundred GPUs and over twenty thousand CPU cores. Initial virtual screening hits will be filtered to remove undesirable structures, such as compounds with poor drug-likeness, Pan-Assay Interference Compounds (PAINS), and toxic or highly reactive structures. The remaining virtual hits will then be grouped into structural clusters. Finally, representative structures with good docking scores will be selected for the subsequent binding mode and binding energy evaluation.
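The filter, cluster, and select steps described above can be sketched as follows. This is a minimal illustration only, not our production pipeline: the `Hit` record, the `flagged` field (standing in for any PAINS, toxicity, reactivity, or drug-likeness filter), the greedy leader clustering, and the similarity threshold are all assumptions made for the example. Fingerprints are represented as plain sets of on-bit indices so the sketch has no external dependencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    catalog_id: str         # hypothetical vendor catalog ID
    docking_score: float    # more negative is better (assumed convention)
    fingerprint: frozenset  # on-bit indices of any 2D fingerprint
    flagged: bool = False   # True if any structural filter fired (assumed field)


def tanimoto(a, b):
    """Tanimoto similarity between two sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_hits(hits, threshold=0.6):
    """Greedy leader clustering: visit hits from best to worst docking score;
    a hit joins the first cluster whose leader is similar enough,
    otherwise it starts a new cluster."""
    clusters = []
    for hit in sorted(hits, key=lambda h: h.docking_score):
        for cluster in clusters:
            if tanimoto(hit.fingerprint, cluster[0].fingerprint) >= threshold:
                cluster.append(hit)
                break
        else:
            clusters.append([hit])
    return clusters


def select_representatives(hits, threshold=0.6, per_cluster=1):
    """Drop flagged structures, cluster the rest, and keep the best-scoring
    members of each cluster for binding mode inspection."""
    kept = [h for h in hits if not h.flagged]
    selected = []
    for cluster in cluster_hits(kept, threshold):
        selected.extend(cluster[:per_cluster])
    return selected
```

Because hits are visited in order of docking score, the first member of each cluster is also its best-scoring member, which keeps the selection step trivial. Other clustering schemes could be substituted without changing the overall flow.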

1.2 Follow-up binding-energy and binding mode analyses

After Glide docking, MM/GBSA will be used to evaluate virtual hits with good docking scores. Active compounds or fragments from the literature will serve as additional controls to calibrate the parameters used in docking and in the MM/GBSA binding energy calculations. In the subsequent evaluation stage, if multiple structural series are present, free energy perturbation (FEP) calculations will be used to compute relative binding free energies within each congeneric series of compounds. For compounds without congeneric counterparts, absolute binding free energies will be calculated instead. The entire binding free energy workflow is built upon GROMACS.
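As a hedged illustration of the relative binding free energy bookkeeping, the sketch below uses the standard thermodynamic cycle, in which the relative binding free energy of ligand B versus ligand A is the free energy of transforming A into B in the complex minus that in solvent. The exponential-averaging (Zwanzig) estimator is shown only because it is the simplest to write down; production workflows typically rely on BAR or MBAR estimators instead. All function names are illustrative and are not part of GROMACS.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def zwanzig_delta_g(delta_u, temperature=298.15):
    """Free energy of A -> B from energy differences U_B - U_A (kcal/mol)
    sampled in state A: dG = -kT * ln < exp(-dU / kT) >_A."""
    kt = KB_KCAL * temperature
    shift = min(delta_u)  # factor out the largest term for numerical stability
    avg = sum(math.exp(-(du - shift) / kt) for du in delta_u) / len(delta_u)
    return shift - kt * math.log(avg)


def relative_binding_free_energy(dg_complex, dg_solvent):
    """Thermodynamic cycle: ddG_bind(A -> B) = dG_complex(A -> B) - dG_solvent(A -> B)."""
    return dg_complex - dg_solvent


def cycle_closure_error(edge_ddgs):
    """Sum of ddG values around a closed cycle of perturbations; ideally zero."""
    return sum(edge_ddgs)
```

A negative relative binding free energy indicates that ligand B is predicted to bind more tightly than ligand A. The cycle closure error is a common sanity check when several ligands in a congeneric series are connected by redundant perturbations.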

For interaction pattern analysis, we will apply the independent gradient model (IGM) to analyze and compare non-covalent interaction (NCI) fingerprints between the proposed binding modes of virtual hits and selected binding site residues of Nsp3. IGM is a recent electron density-based computational method that detects and quantifies both covalent and non-covalent interactions by analyzing the electron density and its gradient. It can therefore provide a more comprehensive and quantitative description of the non-covalent interactions between binding site residues and ligands than traditional rule-based empirical methods. IGM fingerprints derived from literature fragments or active compounds can also serve as references when analyzing our virtual screening hits. More details on deriving NCI fingerprints from the experimental or computed electron density of protein-ligand complexes can be found in the recent JCIM paper from our StoneWise team members (DOI: 10.1021/acs.jcim.1c01406).
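To make the IGM idea concrete, the sketch below evaluates the inter-fragment descriptor, commonly written as δg_inter = |∇ρ_IGM| − |∇ρ|, at a single point. Here ∇ρ is the ordinary sum of the two fragment density gradients, while ∇ρ_IGM adds the absolute values of their components so that opposing contributions can no longer cancel. This is a toy illustration under strong assumptions: each atom is modeled as a single spherical exponential density a·exp(−b·r) with made-up parameters, whereas real analyses (for example in Multiwfn) use fitted promolecular or wavefunction-derived densities.

```python
import math


def atom_density_gradient(point, center, a, b):
    """Gradient of a toy spherical density rho(r) = a * exp(-b * r)."""
    d = [p - c for p, c in zip(point, center)]
    r = math.sqrt(sum(x * x for x in d))
    if r == 0.0:
        return [0.0, 0.0, 0.0]  # gradient is undefined at the cusp; use zero
    scale = -a * b * math.exp(-b * r) / r
    return [scale * x for x in d]


def fragment_gradient(point, atoms):
    """Sum of atomic density gradients for one fragment.
    atoms: list of (center, a, b) tuples with hypothetical parameters."""
    total = [0.0, 0.0, 0.0]
    for center, a, b in atoms:
        g = atom_density_gradient(point, center, a, b)
        total = [t + x for t, x in zip(total, g)]
    return total


def delta_g_inter(point, frag_a, frag_b):
    """IGM inter-fragment descriptor: |grad rho_IGM| - |grad rho|."""
    ga = fragment_gradient(point, frag_a)
    gb = fragment_gradient(point, frag_b)
    true_norm = math.sqrt(sum((x + y) ** 2 for x, y in zip(ga, gb)))
    igm_norm = math.sqrt(sum((abs(x) + abs(y)) ** 2 for x, y in zip(ga, gb)))
    return igm_norm - true_norm
```

The descriptor is never negative. It is large where the gradients of the two fragments oppose each other, as they do between a ligand atom and a residue atom, and it vanishes where they point the same way.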

The binding mode and binding energy analyses described above can also be used for the subsequent task #3 (virtual screening of merged selections). In addition to reporting the catalog IDs of promising virtual hits, our medicinal chemistry members will examine the binding modes of interesting virtual hits and suggest modifications to improve their potency and drug-likeness. The suggested structures can then be fed back into the binding free energy workflow to guide hit or lead optimization.

2. Fragment-based virtual screening to inspire new design ideas

To expand the chemical space beyond commercially available analogs, we will perform a parallel fragment-based virtual screening of both virtual and commercially available fragment libraries against the macrodomain of SARS-CoV-2 Nsp3. We will compare the results with fragment hits reported in the biophysical screening literature to identify additional, novel fragment binders. During virtual screening and post-docking analysis, we will apply structural constraints involving critical residues in the Nsp3 macrodomain binding site to facilitate the binding mode and post-docking analyses.

Promising fragment virtual hits or literature scaffolds may serve as novel starting points for de novo drug design. We plan to apply three independent deep molecular generation approaches. First, AstraZeneca's open-source REINVENT method will be used as the baseline. Second, my group recently implemented a transformer-based deep molecular generation method that introduces a new decoding strategy, Pruned Tree Search, for large-scale scaffold-constrained molecule generation. With chemical language models (CLMs), it allows us to explore the chemical space exhaustively and more efficiently than classical beam search and top-k methods. We are about to submit our manuscript and will release all source code for this method in 2 months at https://github.uconn.edu/mldrugdiscovery. Third, our electron density-based method uses the 3D electron density (ED) of molecules as their representation. A GAN model is trained to take the ED of a pocket as input, learn pocket-ligand complementarity, and generate a suitable ligand ED for the input binding pocket. An ED interpretation module then learns constraints on ligand validity and converts the generated ED back into actual 3D molecules. More details of the third approach are described in the paper by StoneWise participants in our team (DOI: 10.1038/s41598-022-19363-6).

We will also compare the generated structures and the chemical space coverage of the three generative methods used in this project. Finally, we will discuss each method's pros and cons and share our thoughts on future improvements with the broader chemistry community.
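One simple way to quantify such a comparison is sketched below. It is an illustrative assumption, not a committed evaluation protocol: each method's output is taken to be a list of already-canonicalized SMILES strings (canonicalization could be done with RDKit), and the metrics shown (uniqueness, novelty against a reference set, and pairwise Jaccard overlap) are common choices that could be complemented by scaffold- or fingerprint-based coverage measures. The method labels used as dictionary keys are hypothetical.

```python
from itertools import combinations


def coverage_report(generated, reference):
    """Per-method uniqueness and novelty.
    generated: dict mapping a method label to a list of canonical SMILES.
    reference: set of canonical SMILES of known molecules."""
    report = {}
    for name, smiles in generated.items():
        unique = set(smiles)
        report[name] = {
            # fraction of generated strings that are distinct
            "uniqueness": len(unique) / len(smiles) if smiles else 0.0,
            # fraction of distinct molecules absent from the reference set
            "novelty": len(unique - reference) / len(unique) if unique else 0.0,
        }
    return report


def pairwise_overlap(generated):
    """Jaccard overlap between the unique outputs of every pair of methods."""
    overlap = {}
    for a, b in combinations(sorted(generated), 2):
        set_a, set_b = set(generated[a]), set(generated[b])
        union = set_a | set_b
        overlap[(a, b)] = len(set_a & set_b) / len(union) if union else 0.0
    return overlap
```

Low pairwise overlap would suggest that the methods explore complementary regions of chemical space, which is one of the motivations for running them in parallel.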

What makes your approach stand out from the community? (<100 words)

First, our multidisciplinary team comprises experienced computational chemists, synthetic organic chemists, structural biologists, and mathematical optimization experts, who will bring complementary skills and expertise to this drug discovery project.

Second, this academia-industry collaboration will be essential for creating a healthy AI drug discovery ecosystem, training the next generation of skilled workforce, and promoting broader community collaboration to address the technical bottlenecks of the computational methods investigated.

Method Name

The combination of large-scale virtual screening and a variety of deep molecular generation methods for hit identification and optimization

Commercial software packages used

Glide (Schrödinger Release)

Free software packages used

GROMACS (for binding free energy computation)

RDKit

Multiwfn (for electronic wavefunction analysis)

PyTorch (as the deep learning framework)

Relevant publications of previous uses by your group of this software/method

Regarding Electron Density-Based 3D Molecule Generation and Non-Covalent Interaction Analysis

Wang, L., Bai, R., Shi, X. et al. A pocket-based 3D molecule generative model fueled by experimental electron density. Sci Rep 12, 15100 (2022). https://doi.org/10.1038/s41598-022-19363-6

Ding, K., Yin, S., Li, Z. et al. Observing Noncovalent Interactions in Experimental Electron Density for Macromolecular Systems: A Novel Perspective for Protein–Ligand Interaction Research. J Chem Inf Model 62 (7), 1734-1743 (2022). https://doi.org/10.1021/acs.jcim.1c01406

Regarding Pruned Tree Search for Large-Scale Scaffold-Constrained Molecule Generation

The manuscript is in preparation and will be submitted in 1 month. All source code will also be made available on my research group's GitHub, https://github.uconn.edu/mldrugdiscovery, in 2 months.

Challenge #3