CACHE

CRITICAL ASSESSMENT OF COMPUTATIONAL HIT-FINDING EXPERIMENTS

Challenge #2

Application

HIT IDENTIFICATION

Method type (check all that apply)
Deep learning
High-throughput docking
Machine learning

Description of your approach (min 200 and max 800 words)

We propose a massive library screening workflow that exhaustively screens the 4.5-billion-compound Enamine REAL database with a deep-learning-based drug-target interaction (DTI) prediction engine to identify molecules likely to bind the RNA binding site of the SARS-CoV-2 NSP13 helicase.

Recently, DTI tools have emerged as a new class of predictive drug discovery algorithms [1] that are trained on large datasets of protein-ligand binding pairs available through bioactivity databases (e.g., ChEMBL, BindingDB, STITCH, and GOSTAR). At their core, DTI models feed combined protein and ligand feature vectors into neural networks trained to identify interacting pairs. The proposed DTI model differs from other solutions reported in the literature in that it systematically maps DTI pairs derived from bioactivity databases onto 3D protein structures to generate local, site-specific structural protein features. These local, site-specific 3D features were designed to improve generalization across proteins and performance on novel targets. Model predictions are therefore structurally informed but ligand-independent.
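To make the pair-scoring idea concrete, the sketch below scores a (pocket, ligand) pair of feature vectors and trains on labelled interacting and non-interacting pairs. It is a minimal illustration under stated assumptions, not the MatchMaker model: the feature vectors are hypothetical one-hot toys standing in for site-specific pocket features and ligand fingerprints, and a bilinear logistic layer, sigmoid(p^T W l + b), is swapped in for the deeper neural network because it is the simplest trainable model that can learn pair-specific interaction terms. The class name `BilinearDTI` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class BilinearDTI:
    """Hypothetical pair scorer: P(interaction) = sigmoid(p^T W l + b)."""

    def __init__(self, pocket_dim: int, ligand_dim: int) -> None:
        self.W = [[0.0] * ligand_dim for _ in range(pocket_dim)]
        self.b = 0.0

    def logit(self, pocket: Vector, ligand: Vector) -> float:
        # p^T W l: every pocket feature interacts with every ligand feature.
        return sum(
            p_i * sum(w_ij * l_j for w_ij, l_j in zip(row, ligand))
            for p_i, row in zip(pocket, self.W)
        ) + self.b

    def score(self, pocket: Vector, ligand: Vector) -> float:
        return sigmoid(self.logit(pocket, ligand))

    def fit(self, pairs: List[Tuple[Vector, Vector]], labels: List[int],
            epochs: int = 500, lr: float = 0.5) -> "BilinearDTI":
        # Plain SGD on binary cross-entropy; d(loss)/d(logit) = score - label.
        for _ in range(epochs):
            for (pocket, ligand), y in zip(pairs, labels):
                g = self.score(pocket, ligand) - y
                for i, p_i in enumerate(pocket):
                    for j, l_j in enumerate(ligand):
                        self.W[i][j] -= lr * g * p_i * l_j
                self.b -= lr * g
        return self


# Toy data: pocket A binds ligand a, pocket B binds ligand b, cross pairs do not.
pocket_A, pocket_B = [1.0, 0.0], [0.0, 1.0]
lig_a, lig_b = [1.0, 0.0], [0.0, 1.0]
pairs = [(pocket_A, lig_a), (pocket_B, lig_b), (pocket_A, lig_b), (pocket_B, lig_a)]
labels = [1, 1, 0, 0]

model = BilinearDTI(pocket_dim=2, ligand_dim=2).fit(pairs, labels)
print(f"A+a: {model.score(pocket_A, lig_a):.3f}  A+b: {model.score(pocket_A, lig_b):.3f}")
```

A model that merely concatenates the two vectors and applies a single linear layer could not separate these toy pairs, because it scores the pocket and the ligand independently; hidden layers (or the bilinear term used here) are what let a DTI model learn which ligands suit which pockets.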

For the CACHE hit identification stage, the workflow will consist of the following steps:

  1. Create a machine-learning pocket representation of the RNA binding site of the SARS-CoV-2 NSP13 helicase that incorporates multiple data sources, including the 3D structure of the binding pocket and functional annotations of NSP13.
  2. Screen the NSP13 RNA binding site against all 4.5 billion molecules of the Enamine REAL database with the DTI prediction model to identify candidate hit molecules.
  3. Filter the 15,000 highest-scoring molecules by physicochemical properties, molecular docking score, predicted ADMET properties, and predicted off-target activity against human proteins. Off-target activity will be predicted by counter-screening with the same DTI model against a human proteome of 79,817 P2Rank-predicted pockets [2] from 16,818 AlphaFold2-modelled structures available from the EBI AlphaFold2 (AF2) repository [3].
  4. Cluster the molecules that pass filtering by fingerprint similarity, and select the final 100 compounds from the cluster representatives to maintain structural diversity.
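Steps 2 and 3 can be sketched as a streaming selection followed by filters. Keeping only the current best k scores in a min-heap means the 4.5-billion-molecule library never has to be held in memory at once. Everything here is an assumption for illustration: the property names, threshold ranges, the `predict` callable standing in for the DTI model, and the off-target cutoff are hypothetical, and docking scores or ADMET predictions are simply treated as extra entries in each molecule's property dictionary.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Sequence, Tuple


def top_k_stream(scored: Iterable[Tuple[str, float]], k: int) -> List[Tuple[float, str]]:
    """Return the k highest-scoring (score, molecule_id) pairs, best first.

    Only k entries are held in memory, so the input can be a generator over
    a very large library."""
    heap: List[Tuple[float, str]] = []  # min-heap: heap[0] is the weakest kept hit
    for mol_id, score in scored:
        if len(heap) < k:
            heapq.heappush(heap, (score, mol_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, mol_id))  # evict the weakest hit
    return sorted(heap, reverse=True)


def passes_filters(props: Dict[str, float], rules: Dict[str, Tuple[float, float]]) -> bool:
    # Every rule is an inclusive (min, max) range on one named property.
    return all(lo <= props[name] <= hi for name, (lo, hi) in rules.items())


def clean_off_target(mol_id: str, human_pockets: Sequence[str],
                     predict: Callable[[str, str], float], max_offtarget: float) -> bool:
    # Counter-screen: reject if any human pocket is predicted to bind too strongly.
    return all(predict(pocket, mol_id) < max_offtarget for pocket in human_pockets)


def shortlist(scored, k, props, rules, human_pockets, predict, max_offtarget):
    """Top-k by on-target score, then property filters, then counter-screen."""
    survivors = []
    for score, mol_id in top_k_stream(scored, k):
        if passes_filters(props[mol_id], rules) and clean_off_target(
                mol_id, human_pockets, predict, max_offtarget):
            survivors.append((mol_id, score))
    return survivors


# Toy usage (all values hypothetical).
library = [("m1", 0.95), ("m2", 0.90), ("m3", 0.40), ("m4", 0.85)]
props = {
    "m1": {"mol_weight": 350.0, "docking_score": -9.1},
    "m2": {"mol_weight": 720.0, "docking_score": -8.0},
    "m3": {"mol_weight": 300.0, "docking_score": -6.0},
    "m4": {"mol_weight": 410.0, "docking_score": -7.5},
}
rules = {"mol_weight": (150.0, 500.0), "docking_score": (-20.0, -7.0)}
OFF_TARGET = {("pocketX", "m4"): 0.8}  # m4 is predicted to hit a human pocket


def toy_predict(pocket: str, mol_id: str) -> float:
    return OFF_TARGET.get((pocket, mol_id), 0.1)


print(shortlist(iter(library), 3, props, rules, ["pocketX", "pocketY"], toy_predict, 0.5))
```

In the toy run, m2 is dropped by the molecular-weight rule and m4 by the counter-screen, leaving m1.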
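Step 4 does not name a clustering algorithm; one common choice for fingerprint-based diversity selection is Taylor-Butina clustering on Tanimoto similarity, sketched below in plain Python under that assumption. Fingerprints are represented as sets of on-bit indices, the similarity cutoff is hypothetical, and each cluster's representative is assumed to be its highest-scoring member.

```python
from typing import Dict, FrozenSet, List

Fingerprint = FrozenSet[int]  # indices of "on" bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    # |A & B| / |A | B|; two empty fingerprints are treated as dissimilar.
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def butina_clusters(fps: Dict[str, Fingerprint], cutoff: float) -> List[List[str]]:
    """Taylor-Butina clustering: molecules with the most neighbours
    (similarity >= cutoff) become centroids and absorb unassigned neighbours."""
    ids = list(fps)
    neighbours = {
        i: [j for j in ids if j != i and tanimoto(fps[i], fps[j]) >= cutoff]
        for i in ids
    }
    order = sorted(ids, key=lambda i: len(neighbours[i]), reverse=True)
    assigned = set()
    clusters: List[List[str]] = []
    for i in order:
        if i in assigned:
            continue
        members = [i] + [j for j in neighbours[i] if j not in assigned]
        assigned.update(members)
        clusters.append(members)
    return clusters


def pick_diverse(fps: Dict[str, Fingerprint], scores: Dict[str, float],
                 cutoff: float, n_final: int) -> List[str]:
    # One representative per cluster (its best-scoring member), best clusters first.
    reps = [max(c, key=lambda m: scores[m]) for c in butina_clusters(fps, cutoff)]
    return sorted(reps, key=lambda m: scores[m], reverse=True)[:n_final]


# Toy usage: two chemical series, so two representatives come back.
fps = {
    "m1": frozenset({1, 2, 3, 4}),
    "m2": frozenset({1, 2, 3, 5}),
    "m3": frozenset({10, 11, 12}),
    "m4": frozenset({10, 11, 13}),
}
scores = {"m1": 0.7, "m2": 0.9, "m3": 0.8, "m4": 0.6}
print(pick_diverse(fps, scores, cutoff=0.5, n_final=100))
```

The all-pairs comparison here is quadratic, so at the scale of 15,000 molecules it would normally be delegated to a cheminformatics toolkit; RDKit, listed below, provides `DataStructs.BulkTanimotoSimilarity` and a Butina implementation in `rdkit.ML.Cluster.Butina`.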

[1] MacKinnon, S. S., Madani Tonekaboni, S. A., & Windemuth, A. (2021). Proteome-scale drug-target interaction predictions: Approaches and applications. Current Protocols, 1, e302. https://doi.org/10.1002/cpz1.302

[2] Krivák, R., & Hoksza, D. (2018). P2Rank: Machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure. Journal of Cheminformatics, 10, 39. https://doi.org/10.1186/s13321-018-0285-8

[3] Varadi, M., Anyango, S., Deshpande, M., Nair, S., Natassia, C., Yordanova, G., et al. (2022). AlphaFold Protein Structure Database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research, 50(D1), D439–D444. https://doi.org/10.1093/nar/gkab1061

Method Name
Massive Library Screening using Structurally-Augmented Drug-Target Interaction (DTI) prediction models

Commercial software packages used

MatchMaker (Cyclica Inc.)

Free software packages used

Python-based ML stack (PyTorch, scikit-learn) 

Biopython computational biology toolkit

RDKit computational chemistry toolkit

Structural biology tools for analysis and visualization, including P2Rank, NGL Viewer, and AutoDock Vina.

Relevant publications of previous uses by your group of this software/method

The first reference below [1], a recent review published by our group, outlines the general ML strategy behind DTI models. The second reference describes an application to drug repurposing, in which a comparatively small repurposing library was screened to discover previously unreported off-target interactions with distinct bioactivities, rather than the exhaustive 4.1-billion-molecule REAL database, a screen that has only recently become technically feasible. The last reference provides a recent example of a small-molecule hit identified by a massive library screen against a novel target (publication currently in preparation).

  1. MacKinnon, S. S., Tonekaboni, S. A. M. & Windemuth, A. Proteome-Scale Drug-Target Interaction Predictions: Approaches and Applications. Curr Protoc 1, e302 (2021).
  2. Sugiyama, M. G. et al. Multiscale interactome analysis coupled with off-target drug predictions reveals drug repurposing candidates for human coronavirus disease. Sci Rep 11, 23315 (2021).
  3. Kimani, S., Owen, J., Dong, A., Li, Y., Hutchinson, A., Seitova, A., Shahani, V. M., Schapira, M., Arrowsmith, C. H., Edwards, A. M. & Halabelian, L. Crystal structure of the WDR domain of human DCAF1 in complex with CYCA-117-70. Cyclica press release; SGC; PDB ID 7SSE.

This website is licensed under CC-BY 4.0