Computational methods

Hit Identification

Hybrid of the above

cheminformatics, physics-based molecular dynamics, machine learning, high-throughput docking

Description of your approach (min 200 and max 800 words)

Using our expertise in medicinal chemistry, structural biology, cheminformatics, machine learning (ML), and structure-based drug design (SBDD), we will generate hits for the RNA-binding site of the SARS-CoV-2 NSP13 helicase.

Researchers at Diamond Light Source Ltd., the Structural Genomics Consortium, the University of Toronto and the University of Johannesburg released a study [1] in which they note that the NSP13 RNA-binding pocket is not only highly conserved across multiple coronaviruses but also highly druggable, making it an excellent candidate for the development of antiviral therapeutics. We will compile a list of the available NSP13 protein structures from the Protein Data Bank (PDB), with (e.g. 5RLH, 5RMA, 5RML, 5RLZ, 5RMM) or without (e.g. 7KRO, 6XEZ, 5RL9, 5RLR, 5RLJ, 5RLI, 5RM2, 5RM7, 5RLW) ligands bound to the RNA-binding site. We will shortlist those amenable to virtual screening (VS) on the basis of sufficient resolution, model completeness, and low B-factors. Comparing the apo structures with the ligand-bound ones will help us gauge protein flexibility upon ligand binding and may point to key interactions. We will further analyze these structures, noting the various ligand chemotypes, variations in amino acid side-chain conformations, and the presence of conserved water molecules. If the experimental structures provide insufficient information on water molecules, we will perform molecular dynamics (MD) simulations of NSP13 to identify regions with high water occupancy.
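The shortlisting step can be illustrated with a minimal sketch. It assumes that resolution, binding-site completeness, and mean binding-site B-factor have already been extracted for each PDB entry. The record type, the function name, the cutoff values, and the placeholder entries below are all hypothetical and are not taken from any real NSP13 structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructureRecord:
    pdb_id: str
    resolution: float         # angstroms
    completeness: float       # fraction of binding-site residues modelled (0-1)
    mean_site_bfactor: float  # mean B-factor of binding-site atoms (A^2)
    has_site_ligand: bool     # True if a ligand occupies the RNA-binding site


def shortlist(records, max_resolution=2.5, min_completeness=0.95,
              max_bfactor=60.0):
    """Keep structures passing all three quality cutoffs.

    The default cutoffs are illustrative assumptions, not values from the
    proposal. Survivors are sorted by resolution, then by B-factor.
    """
    kept = [
        r for r in records
        if r.resolution <= max_resolution
        and r.completeness >= min_completeness
        and r.mean_site_bfactor <= max_bfactor
    ]
    return sorted(kept, key=lambda r: (r.resolution, r.mean_site_bfactor))


# Placeholder entries with made-up identifiers and values.
example_records = [
    StructureRecord("AAAA", 1.9, 0.99, 35.0, True),
    StructureRecord("BBBB", 3.1, 0.99, 30.0, False),  # fails resolution
    StructureRecord("CCCC", 2.2, 0.90, 40.0, True),   # fails completeness
    StructureRecord("DDDD", 1.8, 0.97, 70.0, False),  # fails B-factor
    StructureRecord("EEEE", 2.4, 0.96, 50.0, False),
]

for record in shortlist(example_records):
    print(record.pdb_id, record.resolution)
```

Keeping apo and ligand-bound entries in the same record type makes it straightforward to shortlist both groups with identical criteria before comparing them.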

If we determine more than one protein structure to be suitable for our VS campaign, we will carry multiple structures forward into our benchmarking and sanity checks. We will assess the performance of our docking program in reproducing poses from relevant PDB structures, and modify our approach as required. Using our rigid, semi-flexible, and fully flexible docking approaches, we will evaluate which protein conformation(s) perform best in self- and cross-docking studies. With sanity checks and benchmarking complete, we will have our model(s) to use prospectively.
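One way to quantify pose reproduction in self- and cross-docking is sketched below, assuming each docked pose and its crystallographic reference are given as lists of heavy-atom coordinates in the same order and in the same reference frame. The function names are hypothetical, the 2.0 Å success cutoff is a commonly used convention rather than a value stated in this proposal, and the sketch does not correct for ligand symmetry.

```python
import math


def heavy_atom_rmsd(pose, reference):
    """RMSD between two poses without superposition.

    Both arguments are sequences of (x, y, z) tuples with identical atom
    ordering. No symmetry correction is applied (a simplifying assumption).
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(pose, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(pose))


def success_rate(rmsds, cutoff=2.0):
    """Fraction of ligands whose docked pose lies within `cutoff` angstroms."""
    if not rmsds:
        raise ValueError("no RMSD values supplied")
    return sum(1 for r in rmsds if r <= cutoff) / len(rmsds)


def best_receptor(rmsds_by_receptor, cutoff=2.0):
    """Return the receptor conformation with the highest success rate.

    `rmsds_by_receptor` maps a receptor label to the RMSDs obtained when
    every benchmark ligand is docked into that conformation.
    """
    return max(
        rmsds_by_receptor,
        key=lambda name: success_rate(rmsds_by_receptor[name], cutoff),
    )
```

For self-docking, each ligand is docked back into its own structure; for cross-docking, the same functions are applied to the RMSDs obtained when ligands from other structures are docked into a given conformation.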

Using our in-house drug discovery and cheminformatics platform (peer-reviewed, proprietary code), we will identify a suitable subset of compounds from the Enamine REAL database using filters that follow medicinal chemistry standards and the CACHE white paper guidelines. To ensure structural diversity, we will cluster this set using ECFP4 fingerprints. Our approach is rooted in docking this filtered set of small molecules to the chosen protein model(s) using our state-of-the-art docking program, which considers protein flexibility, displaceable water molecules, and protein-ligand complementarity inside the active site. We are confident in the predicted poses (especially after our retrospective analysis), and we envision several avenues to score and select compounds for testing.
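The diversity clustering can be sketched as follows. The clustering algorithm used by the in-house platform is not described here, so this example substitutes a simple leader (sphere-exclusion style) algorithm on Tanimoto similarity. It assumes fingerprints have already been computed elsewhere (for example, ECFP4-like fingerprints from a cheminformatics toolkit) and are supplied as sets of on-bit indices; the function names and the 0.6 threshold are illustrative assumptions.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def leader_cluster(fingerprints, threshold=0.6):
    """Greedy leader clustering (a stand-in, not the in-house method).

    `fingerprints` maps a compound name to its set of on-bit indices.
    Each compound joins the first cluster whose leader it resembles at or
    above `threshold`; otherwise it starts a new cluster as its leader.
    Returns a list of clusters, each a list of names with the leader first.
    """
    clusters = []  # list of (leader_fingerprint, member_names)
    for name, fp in fingerprints.items():
        for leader_fp, members in clusters:
            if tanimoto(fp, leader_fp) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((fp, [name]))
    return [members for _, members in clusters]


# Toy fingerprints with made-up bit indices.
toy = {
    "cpd_a": {1, 2, 3, 4},
    "cpd_b": {1, 2, 3, 4, 5},
    "cpd_c": {10, 11, 12},
    "cpd_d": {10, 11, 12, 13},
}
print(leader_cluster(toy))
```

Taking the leader of each cluster (or the best-scoring member after docking) is one simple way to carry structurally diverse representatives forward. The result depends on input order, which is a known property of leader-style algorithms.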

The CACHE Challenge #2 will enable us to test multiple approaches and hypotheses simultaneously against a second target; we used a similar multi-pronged approach during CACHE Challenge #1:

1. We will pick the top-scoring molecules as ranked by our docking scoring function.

2. We will use a machine learning (ML) algorithm based on Graph Neural Networks (GNNs) to predict the docking scores of molecules (an approach proposed in recent literature). In this second approach, we would consider 2-3 orders of magnitude more molecules and then prioritize the high-ranking compounds for our comparatively resource-intensive docking algorithm.

3. We will apply a Quantum Mechanics-Based Scoring Function (QMSF) to molecules not picked by the first two approaches and rank them by their calculated relative free energy of binding, which we expect to be more accurate than a standard docking score.

4. Finally, our team will hold a “hit-picking party”, in which we will visualize the predicted poses and make a human-based selection following discussion and critique.

For the hit SAR stage, this workflow will change as follows: we will search the filtered set for analogues of the hits using the 2D analogue search module available on our platform. We will then repeat the steps outlined above with the new, focused library.
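A 2D analogue search of this kind can be sketched as a nearest-neighbour lookup on fingerprint similarity. The platform's own search module is not described here, so the example below is only a stand-in: it assumes precomputed fingerprints supplied as sets of on-bit indices, and the function name, `k`, and `min_similarity` are hypothetical.

```python
import heapq


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def find_analogues(hit_fp, library, k=50, min_similarity=0.5):
    """Return up to `k` library compounds most similar to a hit.

    `library` maps compound names to fingerprints. The result is a list of
    (similarity, name) tuples sorted from most to least similar; compounds
    below `min_similarity` are discarded. Both parameters are illustrative.
    """
    scored = ((tanimoto(hit_fp, fp), name) for name, fp in library.items())
    return heapq.nlargest(k, (pair for pair in scored if pair[0] >= min_similarity))
```

Running this once per confirmed hit and merging the results (with duplicates removed) would give the focused library that is then docked and scored as before.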

Ultimately, we aim to establish a list of the pros and cons of incorporating ML into physics-based SBDD approaches. Following each approach, we will assess each compound's overall fit and whether it makes the essential interactions with NSP13 identified from the available structures and literature. In total, 100 top-ranking compounds from the multiple approaches will be selected for testing, distributed as evenly as possible across methods. "Computational negative controls" may also be selected to support our hypotheses. In line with the SGC/CACHE principles, we will document our research progress and make it public for all to follow and reference; we are taking a research-centered focus with this opportunity. We hope that sharing our findings will help guide future efforts in SBDD.
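One simple way to distribute the final selection evenly across methods is a round-robin pick over each method's ranked list, skipping compounds already chosen by another method. This is a minimal sketch under that assumption; the function name, the method labels, and the compound identifiers are hypothetical, and the actual selection would also involve the human review described above.

```python
def round_robin_select(ranked_by_method, total=100):
    """Pick up to `total` unique compounds, cycling through the methods.

    `ranked_by_method` maps a method label to its compounds, best first.
    In each round every method contributes its best compound not yet
    selected. Returns (compound, method) tuples in selection order.
    """
    selected = []
    seen = set()
    iterators = {m: iter(ranked) for m, ranked in ranked_by_method.items()}
    while len(selected) < total and iterators:
        for method in list(iterators):
            if len(selected) >= total:
                break
            for compound in iterators[method]:
                if compound not in seen:
                    seen.add(compound)
                    selected.append((compound, method))
                    break
            else:
                # This method's list is exhausted; stop visiting it.
                del iterators[method]
    return selected


# Toy ranked lists with made-up compound identifiers.
toy_rankings = {
    "docking": ["c1", "c2", "c3"],
    "gnn": ["c1", "c4"],
    "qmsf": ["c5"],
}
print(round_robin_select(toy_rankings, total=4))
```

Recording which method contributed each compound keeps the later comparison between approaches straightforward once experimental results are available.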

References:

1. Newman, J.A., Douangamath, A., Yazdani, S. et al. Structure, mechanism and crystallographic fragment screening of the SARS-CoV-2 NSP13 helicase. Nat Commun 12, 4848 (2021). https://doi.org/10.1038/s41467-021-25166-6

What makes your approach stand out from the community? (<100 words)

As we will be using in-house software for the VS, any customization required for docking to this target can be implemented easily by changing our code. We will use a four-pronged approach to score docked poses, drawing on physics, quantum mechanics, machine learning, and medicinal chemists’ intuition. Additionally, we will publish our approach as we did with CACHE Challenge #1 (https://chemrxiv.org/engage/chemrxiv/article-details/62ec286badfd353fe6270e34), in the spirit of sharing our knowledge and in the hope of engaging in insightful discussions with other members of the community.

Method Name

Hybrid

Commercial software packages used

In-house

Free software packages used

GROMACS and AMBER (if required).

Relevant publications of previous uses by your group of this software/method

1. Design, synthesis and in vitro evaluation of novel SARS-CoV-2 3CLpro covalent inhibitors (2022): https://doi.org/10.1016/j.ejmech.2021.114046

2. Discovery of covalent prolyl oligopeptidase boronic ester inhibitors (2020): https://doi.org/10.1016/j.ejmech.2019.111783

3. Integrated Synthetic, Biophysical, and Computational Investigations of Covalent Inhibitors of Prolyl Oligopeptidase and Fibroblast Activation Protein α (2019): https://doi.org/10.1021/acs.jmedchem.9b00642

4. Docking Ligands into Flexible and Solvated Macromolecules. 8. Forming New Bonds – Challenges and Opportunities (2022): https://doi.org/10.1021/acs.jcim.1c00701