Computational methods

Hit Identification

Method type (check all that apply)

High-throughput docking

Hybrid of the above

Hybrid tiered screening workflow involving alpha-sphere volume-derived fastROCS queries and in-house fragment-flooding pharmacophore analysis, followed by OpenEye docking

Description of your approach (min 200 and max 800 words)

We will identify the most conserved residues of the NSP13 RNA-binding tunnel where there are co-crystallised fragments (PDB: 5RML, 5RMM, 5RLZ and 5RLH) by performing multiple sequence alignment (MSA) with the Kalign algorithm on approximately 200,000 SARS-CoV-2 NSP13 sequences from NCBI. We will determine which amino acids lie close enough to these fragments to form interactions with the predicted hit molecules. Preliminary analysis found multiple conserved amino acids in each binding site, not only across SARS-CoV-2 but also in the MERS-CoV (YP_009047224.1) and SARS-CoV-1 (NP_828870.1) sequences. At the end of the hit generation phase, we will determine whether the hit molecules interact with the conserved residues in the binding pockets. Such interactions would support our hypothesis that the predicted hits will be resistant to mutations in the NSP13 binding site and may also bind to the NSP13 RNA-binding tunnel of other CoVs.
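The conservation analysis described above can be illustrated with a minimal sketch. It assumes the Kalign output has already been read into a list of equal-length aligned strings; the function names, the toy alignment and the frequency threshold are hypothetical and are not taken from our pipeline.

```python
from collections import Counter


def column_conservation(aligned_seqs):
    """For each alignment column, return (most common residue, its frequency).

    Gaps ('-') count toward the denominator but are never the consensus.
    """
    if not aligned_seqs:
        raise ValueError("alignment is empty")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    n = len(aligned_seqs)
    result = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned_seqs if s[col] != "-")
        if counts:
            residue, count = counts.most_common(1)[0]
            result.append((residue, count / n))
        else:
            result.append(("-", 0.0))  # column made only of gaps
    return result


def conserved_positions(aligned_seqs, threshold=0.99):
    """1-based alignment columns whose top residue frequency meets the threshold."""
    scores = column_conservation(aligned_seqs)
    return [i + 1 for i, (_, freq) in enumerate(scores) if freq >= threshold]


# Toy alignment (hypothetical sequences, for illustration only).
toy_alignment = ["MKLV", "MKIV", "MRLV", "MK-V"]
print(conserved_positions(toy_alignment, threshold=1.0))  # [1, 4]
```

In practice the conserved columns would then be mapped back to NSP13 residue numbering and intersected with the residues lining each fragment binding pocket.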

We will implement a tiered virtual screening (VS) strategy. While developing this proposal, we have already completed most of the steps described below against the purchasable ZINC database and so already have hits. However, we plan to augment the initial pharmacophore analysis with MoPBS (detailed below), implement a cost analysis of the top-ranked compounds and also screen Enamine.

Positive controls: Due to the limited number of known compounds binding to these sites, inclusion of positive controls will not be as rigorous as we would like. We will use the four fragments from the X-ray structures as positive controls for the screening process. We are aware that the VS queries we will implement will be more feature-rich than these fragments, so we expect that the fragments will not pass every stage of the process. However, their X-ray binding poses will be invaluable for guiding our visual analysis of potential binding modes of predicted active compounds.

Stage 1 (fastROCS). We will use MOE SiteFinder to locate alpha-sphere dummy atoms in the fragment binding pockets. These dummy atoms mark hydrophobic and hydrophilic cavity points, which will be used to generate volume queries by assigning carbon atoms to the hydrophobic dummy atoms and oxygen atoms to the hydrophilic ones. Different fastROCS volume/feature query subsets will be created from the dummy atoms and used in the first stage of screening the ZINC and Enamine databases; fastROCS is chosen for this stage because of its very high throughput. The conformations of the database molecules will be generated using OpenEye OMEGA. Due to hardware restrictions, our server can screen 50 conformers of each purchasable molecule in ZINC but only 5 conformers of each molecule in the Enamine tranches. We will also examine the possibility of using Irish supercomputing infrastructure (ICHEC or TCHPC). The output from this stage will be the 500,000 best-ranked compounds from each of ZINC and Enamine.

Stage 2 (MoPBS). To sample the interaction preferences of the NSP13 binding sites in more detail, we will use our recently published in-house MoPBS algorithm to overlay each protein-ligand complex, flood each protein binding site with fragments (HBA, HBD, Aro and Hyd), and combine the fragment output for k-means clustering and assignment of pharmacophore features within the binding site. Multiple MOE pharmacophores, each containing 6-10 features, will be created and used to query the output from the fastROCS screening stage. The stringency of the pharmacophores will be tailored to select 50,000 molecules from each dataset (ZINC and Enamine).
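The clustering step in this stage can be sketched in generic terms. The block below is a plain Lloyd's k-means on 3D fragment positions and is not the MoPBS implementation; the farthest-point seeding, the coordinates and the choice of k are assumptions made purely for illustration.

```python
import math


def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means on 3D points; returns (centroids, labels).

    Seeding is deterministic farthest-point selection (an illustrative choice).
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return centroids, labels


# Hypothetical positions (in angstroms) of flooded HBA fragments in one pocket.
hba_positions = [
    (0.0, 0.0, 0.0), (0.2, 0.1, 0.0), (0.1, 0.0, 0.2),
    (5.0, 5.0, 5.0), (5.1, 4.9, 5.0), (4.9, 5.0, 5.2),
]
centroids, labels = kmeans(hba_positions, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each resulting centroid would correspond to one candidate pharmacophore feature of the given type (here HBA), with the cluster spread suggesting a feature radius.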

Stage 3 (FRED Docking). The 100,000 compounds selected in Stage 2 (50,000 each from ZINC and Enamine) will enter this stage. OpenEye’s MakeReceptor will prepare the proteins for docking. Docking studies of the fastROCS/pharmacophore hits will be performed on the binding sites using OpenEye FRED. An in-house software integration platform, DataPype, will streamline dataset processing and docking calculations (manuscript in preparation). DataPype is Python software designed to integrate each step of a VS process, from ligand and protein preparation through to tiered or consensus screening with multiple VS algorithms and metrics reporting. These steps will be repeated for each binding site subset from the four crystal structures. Only ligands that interact with the previously identified conserved amino acids will be considered.
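The conserved-residue filter that closes this stage could, for example, be approximated by a distance-based contact check on each docked pose. This is a sketch only and is not DataPype code; the 4.0 angstrom cutoff, the residue labels and the coordinates are hypothetical.

```python
import math


def contacted_residues(ligand_atoms, residue_atoms, cutoff=4.0):
    """Labels of residues with any atom within `cutoff` angstroms of any ligand atom."""
    return {
        label
        for label, atoms in residue_atoms.items()
        if any(math.dist(a, lig) <= cutoff for a in atoms for lig in ligand_atoms)
    }


def passes_conservation_filter(ligand_atoms, residue_atoms, conserved,
                               min_contacts=1, cutoff=4.0):
    """True if the pose contacts at least `min_contacts` conserved residues."""
    contacts = contacted_residues(ligand_atoms, residue_atoms, cutoff)
    return len(contacts & set(conserved)) >= min_contacts


# Toy pose and pocket; RES_A and RES_B are placeholder labels, not NSP13 residues.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
pocket = {"RES_A": [(3.0, 0.0, 0.0)], "RES_B": [(10.0, 0.0, 0.0)]}

print(passes_conservation_filter(ligand, pocket, conserved={"RES_A"}))  # True
print(passes_conservation_filter(ligand, pocket, conserved={"RES_B"}))  # False
```

A pure distance cutoff does not distinguish hydrogen bonds from hydrophobic contacts, so poses that pass would still need the visual inspection described in Stage 4.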

Stage 4 (Clustering). Finally, we will use Pipeline Pilot to perform ECFP clustering of the hit molecules into 100 clusters and sort each cluster by docking score. Before deciding on compound purchase, we will calculate the prices of the top 10 compounds in each cluster and use price as an additional filter. For each candidate compound, we will visually examine its mapping at each screening stage before confirming the choice of one compound from each cluster.
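The per-cluster selection logic can be sketched as follows. The sketch assumes that cluster assignments, docking scores and prices are already available as inputs, that lower docking scores are better, and that a single price ceiling is applied; the `Hit` record, the field names and the example values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    compound_id: str
    cluster: int
    docking_score: float  # assumed: lower (more negative) is better
    price: float          # assumed: price for a fixed quantity, arbitrary currency


def pick_per_cluster(hits, top_n=10, max_price=None):
    """Return {cluster: best-scoring affordable hit among the cluster's top_n}."""
    by_cluster = defaultdict(list)
    for hit in hits:
        by_cluster[hit.cluster].append(hit)
    picks = {}
    for cluster, members in sorted(by_cluster.items()):
        # Rank by docking score, then keep only the top_n for price checking.
        ranked = sorted(members, key=lambda h: h.docking_score)[:top_n]
        affordable = [h for h in ranked if max_price is None or h.price <= max_price]
        if affordable:
            picks[cluster] = affordable[0]
    return picks


# Hypothetical hits for illustration.
hits = [
    Hit("A", 0, -12.1, 300.0),
    Hit("B", 0, -11.0, 80.0),
    Hit("C", 1, -9.5, 50.0),
    Hit("D", 1, -10.2, 60.0),
]
picks = pick_per_cluster(hits, max_price=100.0)
print({cluster: hit.compound_id for cluster, hit in picks.items()})  # {0: 'B', 1: 'D'}
```

The output is only a shortlist: the final choice for each cluster remains subject to visual inspection, so a lower-ranked compound may replace the automatic pick.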

Preliminary target product profile: This will map to the CACHE scoring scheme, including IC50 < 1 μM, Log D < 3 and MW < 400.

Hit optimisation: If hit compounds are identified, our models will be updated with the newly released activity data. MCS approaches and similarity analyses will be used to select subsets of Enamine and ZINC for screening through the updated process detailed above.

What makes your approach stand out from the community? (<100 words)

We will perform the study in two phases. (1) We will use bioinformatics tools to identify the most conserved residues in the binding pockets of the co-crystallised fragments. (2) A four-stage tiered screening approach will be used to predict SARS-CoV-2 NSP13 inhibitors.

Use volume/shape information of the binding pockets (fastROCS)
Use in-house pharmacophore generation software (MoPBS/MOE)
Perform docking in the binding pockets and use the in-house DataPype screening platform to streamline and automate VS processing (FRED)
Cluster hits based on structural circular fingerprints, analyse prices and visually confirm selected hits for purchase (Pipeline Pilot)

Method Name

Tiered screening incorporating molecular shape, pharmacophore features, docking and clustering

Commercial software packages used

Molecular Operating Environment (MOE) by the Chemical Computing Group
OpenEye: fastROCS (hit identification using shape information), OMEGA (generating conformations), MakeReceptor (preparing binding pocket for docking) and FRED (docking)
Pipeline Pilot (Biovia)

Free software packages used

Kalign
In-house MoPBS pharmacophore generation software
In-house VS streamlining software DataPype

Relevant publications of previous uses by your group of this software/method

Dr Fayne has published 43 papers, a book chapter and two patents, the vast majority of which describe computational design approaches for discovering novel small molecules targeting proteins involved in human disease.

Dr Fayne and Ms Kandwal have recently published a paper using pharmacophore queries to propose the mechanism of action and possible repurposing opportunities of approved drugs showing in vitro efficacy against SARS-CoV-2.

Kandwal S, Fayne D. Repurposing drugs for treatment of SARS-CoV-2 infection: computational design insights into mechanisms of action. J Biomol Struct Dyn. 2022, 40(3):1316-1330. Published online Sept 2020. doi: 10.1080/07391102.2020.1825232

Ms Kandwal started her PhD with Dr Fayne in Sept 2021 on designing small molecule inhibitors of conserved regions of SARS-CoV-2 nsps. A review paper on nsp conservation across group 4 viruses is being finalised for submission.

The MoPBS software is described in this recently published paper:

Braun J, Fayne D. Mapping of Protein Binding Sites using clustering algorithms - Development of a pharmacophore based drug discovery tool. J Mol Graph Model. 2022 Sep;115:108228

A paper describing DataPype is currently under preparation for submission next month.

Tiered-screening approaches previously developed by the research group are described in the following papers. All of these published approaches have successfully identified hit compounds from commercial vendor databases, primarily by screening only the SPECS database. The work proposed in this study represents a significant increase in complexity and scale over these previous works, but we have the expertise, infrastructure and software to ensure successful completion of the described project.

Nevin, DK, Peters, MB, Carta, G, Fayne, D, Lloyd, DG, Integrated virtual screening for the identification of novel and selective Peroxisome Proliferating Activated Receptor (PPAR) modulators. J. Med. Chem. 2012 55(11):4978-89

McKay, PB, Fayne, D*, Horn, HW, James, T, Peters, MB, Carta, G, Caboni, L, Nevin, DK, Price, T, Bradley, G, Williams, DC, Rice, JE, Lloyd, DG. Consensus computational ligand-based design for the identification of novel modulators of human Estrogen Receptor alpha. Mol Inf. 2012 31(3-4) 246–258. *Corresponding author

Caboni L, Kinsella GK, Blanco F, Fayne D, Jagoe WN, Carr M, Williams DC, Meegan MJ, Lloyd DG. “True” antiandrogens-selective non-ligand-binding pocket disruptors of androgen receptor-coactivator interactions: novel tools for prostate cancer, J Med Chem. 2012 55(4):1635-44

McKay PB, Peters MB, Carta G, Flood CT, Dempsey E, Bell A, Berry C, Lloyd DG, Fayne D. Identification of plasmepsin inhibitors as selective anti-malarial agents using ligand based drug design. Bioorg Med Chem Lett. 2011 1;21(11):3335-41

Yang Y, Carta G, Peters MB, Price T, O’Boyle N, Knox AJS, Fayne D, Williams DC, Meegan MJ, Lloyd DG. tieredScreen – layered virtual screening tool for the identification of novel Estrogen Receptor Alpha modulators. Mol Inf. 2010, 29, 421 – 430

Challenge #2