Computational methods

Hit Identification

Method type (check all that apply)

High-throughput docking

Machine learning

Physics-based

Description of your approach (min 200 and max 800 words)

Abstract

We will tackle the discovery and design of NSP13 inhibitors by employing a transparent sequential workflow. This will combine complementary computational approaches, including target analysis, molecular dynamics, chemoinformatics (molecular fingerprints, molecular docking) and data mining, possibly supported by machine learning (ML). The sections of the proposal, namely “Hit identification”, “Virtual screening of the merged selection”, and “Hit optimization”, differ in how they employ these approaches; however, because of the large overlap, the proposal is described once, here, for all three.

Transparent sequential workflow:

i) Target analysis and target preparation via CoMFA and molecular dynamics (MD); these will be used only in the preliminary analysis, where we will define the targets for all the structure-based methods.

ii) Library preparation, employing the complete ZINC and Enamine libraries along with a fragment-based library design.

iii) Ultra-large-scale and high-throughput virtual screening and docking experiments, conducted over large libraries, to provide different sets of candidates for the Hit identification and Hit optimization phases, and repeated for the Virtual screening of the merged selection phase to provide the ranked list of molecules (selected by all Participants).

iv) Additional specific ligand-oriented steps (fragment-based optimization, molecular fingerprints, “Traffic-Light” filtering, LIE methodology), limited to Hit identification and Hit optimization phases.

v) Data mining, to support the ranking of compounds in all three phases; in the Hit optimization phase only, possibly supported by ML.

Details for the different steps

Libraries

The Enamine in-stock dataset, the Enamine REAL dataset and the whole ZINC library will be used in the Hit identification and Hit optimization phases. Libraries will be prepared using the Schrödinger Small Molecule Discovery (SMD) Suite and filtered with RDKit-based in-house software to retain only molecules with the desired properties [1], according to the “Traffic-Light” scheme of CACHE [2].
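As an illustration only, a traffic-light style filter over precomputed descriptors might be sketched as below. The descriptor names, the green/amber bounds, and the `keep` rule are hypothetical placeholders chosen for the example; they are not the thresholds of refs [1] or [2], and the in-house software is not shown here.

```python
# Hypothetical (green_max, amber_max) bounds per descriptor.
# Placeholder values for illustration, NOT the published thresholds.
LIMITS = {
    "mol_wt": (400.0, 500.0),
    "logp": (3.0, 5.0),
    "hbd": (3, 5),
    "hba": (7, 10),
    "rot_bonds": (7, 10),
}


def light(value, green_max, amber_max):
    """Return 0 (green), 1 (amber) or 2 (red) for one descriptor value."""
    if value <= green_max:
        return 0
    if value <= amber_max:
        return 1
    return 2


def traffic_light_score(descriptors):
    """Sum the per-descriptor penalties; lower is better."""
    return sum(light(descriptors[name], *bounds) for name, bounds in LIMITS.items())


def keep(descriptors, max_score=2):
    """Keep a molecule if it has no red light and a total penalty <= max_score."""
    lights = [light(descriptors[name], *bounds) for name, bounds in LIMITS.items()]
    return 2 not in lights and sum(lights) <= max_score


# Toy library with descriptors assumed to be computed upstream (e.g. with RDKit).
library = {
    "cmpd_a": {"mol_wt": 320.0, "logp": 2.1, "hbd": 1, "hba": 4, "rot_bonds": 3},
    "cmpd_b": {"mol_wt": 450.0, "logp": 4.2, "hbd": 2, "hba": 8, "rot_bonds": 5},
    "cmpd_c": {"mol_wt": 610.0, "logp": 2.0, "hbd": 1, "hba": 4, "rot_bonds": 3},
}
retained = [cid for cid, desc in library.items() if keep(desc)]
```

In this toy run only `cmpd_a` is retained: `cmpd_b` accumulates three amber lights and `cmpd_c` has one red light.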

In the Hit optimization phase, whenever the proposed derivatives are not available, ad hoc synthesis may be provided by the applicant’s Med Chem group.

Target Analysis

The PDB structures of the SARS-CoV-2 NSP13 helicase, after removal of RNA and fragments, will be subjected to molecular dynamics simulations with the AMBER force field, followed by a pocket-comparison analysis based on the ProBiS and BioGPS software. The former has already been used successfully to prioritize compounds against a viral protease and to identify new compounds in antibacterial drug design [3,4]. The latter has already been used to describe and compare pockets of different protein families studied in the fight against SARS-CoV-2 [5]. Further hints might emerge from comparing the NSP13 helicase to the whole proteome [6]. The result of this step will be a set of pockets (either fixed or transient) for virtual screening and docking calculations.

Fragment-Based Design, based on CoMFA

The ligand fragments available in the X-ray structures named in the CACHE challenge description (5RLH, 5RLZ, 5RMM, 5RML) will be subjected to structure-based analysis by comparing the GRID-based Molecular Interaction Fields (MIF) of the fragments, optimized by fitting their atoms to the MIF of the corresponding protein pocket. Several fragments and structures will be considered, and the most promising solutions (in terms of overall GRID energy) will be used as references for ligand-based searches through the whole ZINC library by means of molecular fingerprints (RDKit). In the Hit identification phase, the solutions will constitute the “fragment-enriched library” to be subjected to further calculations. In the Hit optimization phase, the (docked poses of the) active hits will serve as starting fragments.
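The fingerprint-based search can be sketched as follows, under the assumption that each fingerprint is available as a set of on-bit indices (for instance exported from an RDKit fingerprint) and that similarity is measured with the Tanimoto coefficient. The metric, the threshold, and the function names are illustrative choices, not a statement of the final protocol.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(references, library, threshold=0.5):
    """Return (compound_id, best_similarity) pairs at or above the threshold.

    references: non-empty list of reference fingerprints (sets of ints).
    library: dict mapping compound id to fingerprint (set of ints).
    Results are sorted from most to least similar.
    """
    hits = []
    for compound_id, fingerprint in library.items():
        best = max(tanimoto(fingerprint, ref) for ref in references)
        if best >= threshold:
            hits.append((compound_id, best))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy data: one reference fragment and three library molecules.
references = [{1, 2, 3, 4}]
library = {"x": {1, 2, 3, 4}, "y": {1, 2, 5, 6}, "z": {7, 8}}
hits = similarity_search(references, library, threshold=0.3)
```

Each library molecule is scored against its closest reference, so a compound is retained when it resembles any one of the selected fragment solutions.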

High-Throughput Virtual Screening and Docking (HTVS&D)

To enhance the chance of hit identification, we will explore a wide chemical space consisting of an ultra-large library along with the parallel fragment-based approach described above. We will refer to the datasets to be subjected to virtual screening and docking as follows: the set of molecules coming from the fragment-based design as the “fragment-based dataset”; the other set of molecules as the “raw dataset”. In this manner, we believe the chemical space spanning from fragments to lead-like and drug-like molecules can be covered. Virtual screening and docking will be performed with CmDock, while other virtual screening and docking software, namely Flap/Flapdock and/or Vina, will be used for intra-experiment validation in a serial consensus manner: a preliminary, faster screening with CmDock, followed by refinement via “consensus” with the other method(s).
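One possible reading of the serial consensus scheme is sketched below: a fast first pass keeps the best-scoring fraction of the library, and the survivors are then re-ranked by their mean rank across all programs. The mean-rank rule, the retained fraction, and the convention that lower scores are better are assumptions made for the example; the scores are invented toy values.

```python
def top_fraction(scores, fraction):
    """Return ids of the best-scoring fraction of compounds (lower score = better)."""
    n_keep = max(1, int(len(scores) * fraction))
    return sorted(scores, key=scores.get)[:n_keep]


def ranks(scores, ids):
    """Map each id in ids to its 1-based rank by score (rank 1 = lowest score)."""
    ordered = sorted(ids, key=lambda cid: scores[cid])
    return {cid: position for position, cid in enumerate(ordered, start=1)}


def serial_consensus(fast_scores, refine_scores_list, fraction=0.5):
    """Stage 1: prune with the fast program. Stage 2: order survivors by mean rank."""
    survivors = top_fraction(fast_scores, fraction)
    rank_tables = [ranks(fast_scores, survivors)]
    rank_tables += [ranks(scores, survivors) for scores in refine_scores_list]
    mean_rank = {
        cid: sum(table[cid] for table in rank_tables) / len(rank_tables)
        for cid in survivors
    }
    # Ties in mean rank are broken alphabetically by compound id.
    return sorted(survivors, key=lambda cid: (mean_rank[cid], cid))


# Toy scores: the first dict stands in for the fast program, the second for a refinement program.
fast = {"a": -9.0, "b": -8.5, "c": -7.0, "d": -5.0}
refine = {"a": -6.0, "b": -8.0, "c": -7.5}
consensus = serial_consensus(fast, [refine], fraction=0.75)
```

Only the survivors of the first stage need scores from the slower programs, which is what makes the serial arrangement cheaper than docking the full library with every tool.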

Predicted Outcome

The results of virtual screening and docking will consist of a large quantity of heterogeneous data. HTVS&D data will be handled with transparent chemoinformatics pipelines, mostly based on the KNIME platform, and, if and when the data permit (sufficient balance between actives and inactives), we will also support the prioritization of candidates with ML methods. The complete sequential workflow will thus be employed for the final selection of candidates (Hit identification) and candidate ranking (Virtual screening of the merged selection), as well as for proposing and optimizing derivatives in the Hit optimization phase.
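The condition on the active/inactive balance could be turned into an explicit check before any ML model is trained. The sketch below is one way to do so; the minimum class size and minimum minority fraction are hypothetical cut-offs picked for illustration, not values fixed by this proposal.

```python
def class_counts(labels):
    """Return (n_active, n_inactive) for a list of 0/1 activity labels."""
    n_active = sum(1 for label in labels if label == 1)
    return n_active, len(labels) - n_active


def ml_is_feasible(labels, min_per_class=20, min_minority_fraction=0.1):
    """Decide whether the labelled set is large and balanced enough to train on.

    Both thresholds are illustrative assumptions.
    """
    n_active, n_inactive = class_counts(labels)
    minority = min(n_active, n_inactive)
    if minority < min_per_class:
        return False
    return minority / len(labels) >= min_minority_fraction


# Toy label sets: too few actives, reasonably balanced, and strongly imbalanced.
too_few = [1] * 5 + [0] * 95
balanced = [1] * 30 + [0] * 70
imbalanced = [1] * 25 + [0] * 975
```

With these toy sets only `balanced` passes the check; the other two would fall back to the non-ML ranking.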

References

[1] 10.3390/ijms23105727

[2] 10.1038/s41570-022-00363-z

[3] 10.3390/molecules25245808

[4] 10.1016/j.csbj.2022.05.010

[5] 10.1021/acs.jcim.2c00169

[6] 10.1093/bioinformatics/btq100

What makes your approach stand out from the community? (<100 words)

We will tackle the problem via ultra-large-scale docking and virtual screening, taking advantage of very large libraries (Enamine and ZINC) as well as a dedicated fragment library. Besides library and chemical space size, we have the advantage of thorough knowledge of the software (especially CmDock and Flap), which will allow performance to be monitored in detail. Another factor is target preparation and analysis, combining molecular dynamics, ProBiS binding site analysis and advanced tools based on GRID Molecular Interaction Fields. Finally, the selection of candidates will be supported by the LIE method and ML-based classification.

Method Name

Ultra-Large Scale Virtual Screening & Docking

Commercial software packages used

Schrödinger SMD Suite, GRID, Flap, BioGPS

Free software packages used

CmDock, PyMOL, Q, R, Python, RDKit, KNIME, ProBiS

Relevant publications of previous uses by your group of this software/method

Kralj, S.; Jukič, M.; Bren, U. (2022). Comparative Analyses of Medicinal Chemistry and Cheminformatics Filters with Accessible Implementation in Konstanz Information Miner (KNIME). International Journal of Molecular Sciences, 23, 5727.

Kolarič, A.; Jukič, M.; Bren, U. (2022). Novel Small-Molecule Inhibitors of the SARS-CoV-2 Spike Protein Binding to Neuropilin 1. Pharmaceuticals, 15, 165.

Jukič, M.; Auger, R.; Folcher, V.; Proj, M.; Barreteau, H.; Gobec, S.; Touzé, T. (2022). Towards discovery of inhibitors of the undecaprenyl-pyrophosphate phosphatase BacA by virtual high-throughput screening. Computational and Structural Biotechnology Journal, 20, 2360-2371.

Siragusa, L.; Menna, G.; Buratta, F.; Baroni, M.; Desantis, J.; Cruciani, G.; Goracci, L. (2022) Cross-Relationship Map of Cavities from Coronaviruses. Journal of Chemical Information and Modeling, 62, 2901-2908.

Cross, S.; Cruciani, G. (2022) FragExplorer: GRID-Based Fragment Growing and Replacement. Journal of Chemical Information and Modeling, 62, 1224-1235.

Tortorella, S.; Carosati, E.; Sorbi, G.; Bocci, G.; Cross, S.; Cruciani, G.; Storchi, L. (2021) Combining machine learning and quantum mechanics yields more chemically aware molecular descriptors for medicinal chemistry applications. Journal of Computational Chemistry, 42, 2068-2078.

Kralj, S.; Jukič, M.; Bren, U. (2021). Commercial SARS-CoV-2 Targeted, Protease Inhibitor Focused and Protein–Protein Interaction Inhibitor Focused Molecular Libraries for Virtual Screening and Drug Design. International Journal of Molecular Sciences, 23, 393.

Jukič, M.; Škrlj, B.; Tomšič, G.; Pleško, S.; Podlipnik, Č.; Bren, U. (2021). Prioritisation of compounds for 3CLpro inhibitor development on SARS-CoV-2 variants. Molecules, 26, 3003.

Jukič, M.; Janežič, D.; Bren, U. (2020). Ensemble docking coupled to linear interaction energy calculations for identification of coronavirus main protease (3CLpro) non-covalent small-molecule inhibitors. Molecules, 25, 5808.

Bocci, G.; Carosati, E.; Vayer, P.; Arrault, A.; Lozano, S.; Cruciani, G. (2017) ADME-Space: a new tool for medicinal chemists to explore ADME properties. Scientific Reports, 7, 1-13.

Carosati, E.; van den Höfel, N.; Reif, M.; Randazzo, G. M.; Stanitzki, B.; Stevens, J.; Gabbert, H. E.; Cruciani, G.; Mannhold, R.; Mahotka, C. (2015) Discovery of Novel, Potent, and Specific Cell-Death Inducers in the Jurkat Acute Lymphoblastic Leukemia Cell Line. ChemMedChem, 10, 1700-1706.

Carosati, E.; Tochowicz, A.; Marverti, G.; Guaitoli, G.; Benedetti, P.; Ferrari, S.; Stroud, R. M.; Finer-Moore, J.; Luciani, R.; Farina, D.; Cruciani, G.; Costi, M. P. (2012) Inhibitor of Ovarian Cancer Cells Growth by Virtual Screening: A New Thiazole Derivative Targeting Human Thymidylate Synthase. Journal of Medicinal Chemistry, 55, 10272-10276.

Brincat, J. P.; Carosati, E.; Sabatini, S.; Manfroni, G.; Fravolini, A.; Raygada, J. L.; Patel, D.; Kaatz, G. W.; Cruciani, G. (2011) Discovery of Novel Inhibitors of the NorA Multidrug Transporter of Staphylococcus aureus. Journal of Medicinal Chemistry, 54, 354-365.

Konc, J.; Janežič, D. (2010). ProBiS algorithm for detection of structurally similar protein binding sites by local structural alignment. Bioinformatics, 26, 1160-1168.

Challenge #2