Here we adopt a hybrid of two computational strategies for the hit identification of SARS-CoV-2 Nsp3 Mac1 inhibitors.
1. Hit identification via the large-scale virtual screening of Enamine databases
1.1 Initial virtual screening and structural filtering processes
Large-scale virtual screening of compounds from the Enamine commercially available and REAL databases will be carried out on a dynamic computation allocation framework that provides up to a few hundred GPUs and over twenty thousand CPU cores. Initial virtual screening hits will then be filtered to remove undesirable structures, such as compounds with poor drug-likeness, Pan-Assay Interference Compounds (PAINS), and toxic or highly reactive structures. Next, the remaining virtual hits will be grouped into structural clusters. Finally, representative structures with good docking scores will be selected for careful binding mode analysis and for the subsequent binding energy and pose evaluation.
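The clustering and representative-selection steps can be illustrated with a minimal sketch. It assumes a greedy, Butina-like clustering on Tanimoto similarity, which is one common choice and not necessarily the algorithm our pipeline will use. Fingerprints are represented as plain sets of on-bits, and the compound IDs, bit sets, docking scores, and the 0.5 cutoff are all hypothetical toy values.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def greedy_clusters(fps: Dict[str, FrozenSet[int]], cutoff: float) -> List[List[str]]:
    """Greedy, Butina-like clustering (assumed algorithm, not from the proposal).

    Repeatedly picks the unassigned compound with the most unassigned
    neighbours (similarity >= cutoff) as a centroid and groups them together.
    """
    ids = sorted(fps)
    neighbours = {
        i: {j for j in ids if j != i and tanimoto(fps[i], fps[j]) >= cutoff}
        for i in ids
    }
    unassigned = set(ids)
    clusters: List[List[str]] = []
    while unassigned:
        # Ties are broken alphabetically because max() keeps the first maximum.
        centroid = max(sorted(unassigned), key=lambda i: len(neighbours[i] & unassigned))
        members = [centroid] + sorted(neighbours[centroid] & unassigned)
        clusters.append(members)
        unassigned -= set(members)
    return clusters


def pick_representatives(clusters: List[List[str]], scores: Dict[str, float]) -> List[str]:
    """Return the member with the best (most negative) docking score per cluster."""
    return [min(cluster, key=lambda i: scores[i]) for cluster in clusters]


# Hypothetical toy data: five compounds, two obvious structural families.
toy_fps = {
    "cpd_a": frozenset({1, 2, 3, 4}),
    "cpd_b": frozenset({1, 2, 3, 5}),
    "cpd_c": frozenset({1, 2, 3, 4, 5}),
    "cpd_d": frozenset({10, 11, 12}),
    "cpd_e": frozenset({10, 11, 13}),
}
toy_scores = {"cpd_a": -7.1, "cpd_b": -8.4, "cpd_c": -6.9, "cpd_d": -7.8, "cpd_e": -7.5}

clusters = greedy_clusters(toy_fps, cutoff=0.5)
representatives = pick_representatives(clusters, toy_scores)
print(clusters, representatives)
```

In practice the fingerprints would come from a cheminformatics toolkit and the cutoff would be tuned on the actual hit list; the sketch only shows how cluster membership and docking score combine to pick representatives.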
1.2 Follow-up binding-energy and binding mode analyses
Immediately after Glide docking, MM/GBSA will be used to evaluate virtual hits with good docking scores. Active compounds or fragments from the literature will serve as additional controls to calibrate the parameters used in both docking and the MM/GBSA binding energy calculations. In the subsequent evaluation stage, if multiple structural series are present, free energy perturbation (FEP) calculations will be used to compute relative binding free energies within each congeneric series of compounds. If no congeneric counterparts are found, absolute binding free energies will be calculated instead. The entire binding energy computational workflow is built upon GROMACS.
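For orientation, the two quantities mentioned above are commonly written as follows. These are the standard textbook forms, given here as a sketch and not as the exact protocol of our workflow; in particular, whether the entropy term is included in the MM/GBSA estimate is an assumption left open.

```latex
% MM/GBSA end-point estimate (the entropy term -TS is often omitted in practice)
\Delta G_{\mathrm{bind}} \approx \langle G_{\mathrm{complex}} \rangle
  - \langle G_{\mathrm{receptor}} \rangle - \langle G_{\mathrm{ligand}} \rangle,
\qquad G = E_{\mathrm{MM}} + G_{\mathrm{GB}} + G_{\mathrm{SA}} - TS

% Relative binding free energy of two congeneric ligands A and B (alchemical cycle)
\Delta\Delta G_{\mathrm{bind}}(A \to B)
  = \Delta G_{\mathrm{bind}}(B) - \Delta G_{\mathrm{bind}}(A)
  = \Delta G_{\mathrm{complex}}(A \to B) - \Delta G_{\mathrm{solvent}}(A \to B)
```

The second relation is why relative FEP is restricted to congeneric series: the alchemical transformation A to B is only tractable when the two ligands share most of their structure.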
From the interaction pattern point of view, we will apply the independent gradient model (IGM) method to analyze and compare non-covalent interaction fingerprints between the proposed binding models of virtual hits and selected binding site residues of Nsp3. IGM is a recent electron density-based computational method that detects and quantifies both covalent and non-covalent interactions (NCIs) by analyzing the topological properties of the electron density. It can therefore provide a more comprehensive and quantitative description of the non-covalent interactions between binding site residues and ligands than traditional rule-based empirical methods. IGM fingerprints derived from literature fragments or active compounds can also serve as references when analyzing our virtual screening hits. More details on deriving NCI fingerprints from the experimental or computed electron density of protein-ligand complexes can be found in the recent JCIM paper from our StoneWise team members (DOI: 10.1021/acs.jcim.1c01406).
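As an illustration of how such fingerprints might be compared against a reference, the sketch below assumes that each binding pose has already been reduced to a per-residue interaction score (for example, an integrated IGM descriptor per residue) and uses a weighted Tanimoto similarity. The fingerprint format, the similarity measure, the residue labels, and all numbers are hypothetical and are not taken from the cited paper.

```python
from typing import Dict

# A fingerprint maps a residue label to a non-negative interaction score.
Fingerprint = Dict[str, float]


def weighted_tanimoto(fp_a: Fingerprint, fp_b: Fingerprint) -> float:
    """Weighted Tanimoto (Ruzicka) similarity: sum of minima over sum of maxima."""
    residues = set(fp_a) | set(fp_b)
    numerator = sum(min(fp_a.get(r, 0.0), fp_b.get(r, 0.0)) for r in residues)
    denominator = sum(max(fp_a.get(r, 0.0), fp_b.get(r, 0.0)) for r in residues)
    return numerator / denominator if denominator > 0 else 0.0


def rank_against_reference(reference: Fingerprint, hits: Dict[str, Fingerprint]):
    """Sort hits by decreasing similarity to the reference fingerprint."""
    scored = [(name, weighted_tanimoto(reference, fp)) for name, fp in hits.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical per-residue scores for a literature fragment and two virtual hits.
reference_fp = {"RES_1": 2.0, "RES_2": 1.0}
hit_fps = {
    "hit_x": {"RES_1": 1.0, "RES_2": 1.0, "RES_3": 2.0},
    "hit_y": {"RES_1": 2.0, "RES_2": 0.5},
}

ranking = rank_against_reference(reference_fp, hit_fps)
print(ranking)
```

A hit that reproduces the reference contacts scores close to 1, while a hit that forms strong additional contacts elsewhere is penalized; whether that penalty is desirable depends on the design goal.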
The binding mode and binding energy analyses described above can also be used for the subsequent task #3 (virtual screening of merged selections). Besides reporting the catalog IDs of promising virtual hits, our medicinal chemistry members will investigate the binding modes of interesting virtual hits and suggest modifications to enhance their potency and drug-likeness. The suggested structures can be fed back into the binding free energy workflow to guide hit or lead optimization.
2. Fragment-based virtual screening to inspire new design ideas
To expand the chemical space beyond commercially available analogs, we will perform a parallel fragment-based virtual screening of both virtual and commercially available fragment libraries against the macrodomain of SARS-CoV-2 Nsp3. We will compare the results with fragment hits reported in the biophysical screening literature to identify additional, novel fragment binders. During virtual screening and post-docking analysis, we will apply structural constraints involving critical residues within the Nsp3 macrodomain binding site to guide the binding mode and post-docking analyses.
Promising fragment virtual hits or literature scaffolds may serve as novel starting points for de novo drug design. We plan to apply three independent deep molecular generation approaches. First, AstraZeneca's open-source REINVENT method will be used as the baseline. Second, my group recently implemented a transformer-based deep molecular generation method that introduces a new decoding strategy, Pruned Tree Search, for large-scale scaffold-constrained molecule generation. It allows us to explore the chemical space exhaustively with Chemical Language Models (CLMs), and more efficiently than classical beam search and top-k methods. We are about to submit our manuscript and will release all source code of our method within two months at https://github.uconn.edu/mldrugdiscovery. Third, we will use a method that represents molecules by their 3D electron density (ED) instead. A GAN model is trained to take the ED of a pocket as input, learn pocket-ligand complementarity, and generate a suitable ligand ED for the input binding pocket. An ED interpretation module then learns constraints on ligand validity and translates the generated ED back into actual 3D molecules. More details of the third approach are described in the paper by the StoneWise participants in our team (DOI: 10.1038/s41598-022-19363-6).
We will also compare the generated structures and the chemical space coverage of the three generative methods used in this project. Finally, we will discuss each method's pros and cons and share our thoughts on future improvements with the broader chemistry community.
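A first, set-level view of such a comparison can be sketched as follows. The sketch assumes that each method's output has already been converted to canonical SMILES strings, so that identical molecules compare equal as text. The metrics (uniqueness, pairwise Jaccard overlap, molecules exclusive to one method) are our illustrative choice, the method keys are shorthand labels, and the SMILES lists are toy data, not results.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def uniqueness(smiles: List[str]) -> float:
    """Fraction of generated strings that are distinct."""
    return len(set(smiles)) / len(smiles) if smiles else 0.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two molecule sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(generated: Dict[str, List[str]]) -> Dict[Tuple[str, str], float]:
    """Jaccard overlap for every pair of methods."""
    sets = {name: set(smiles) for name, smiles in generated.items()}
    return {(m1, m2): jaccard(sets[m1], sets[m2]) for m1, m2 in combinations(sets, 2)}


def exclusive_molecules(generated: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Molecules produced by one method and by none of the others."""
    sets = {name: set(smiles) for name, smiles in generated.items()}
    result = {}
    for name, own in sets.items():
        others = set().union(*(s for other, s in sets.items() if other != name))
        result[name] = own - others
    return result


# Toy data only: assumed to be canonical SMILES, not actual generated output.
toy_generated = {
    "reinvent": ["CCO", "CCO", "CCN", "c1ccccc1"],
    "pruned_tree_search": ["CCO", "c1ccccc1", "c1ccncc1", "CC(=O)O"],
    "ed_gan": ["CCN", "CC(=O)O", "CC(C)O"],
}

overlap = pairwise_overlap(toy_generated)
exclusive = exclusive_molecules(toy_generated)
print(overlap, exclusive)
```

Exact-string overlap only detects identical molecules. A fuller coverage analysis would also compare the methods in a descriptor or fingerprint space, which this sketch does not attempt.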