Challenge #3 – COMPUTATIONAL METHODS
Here is a list of all computational methods used for hit identification in CACHE Challenge #3. Click on the Description for more details. Some participants preferred not to release their publications to stay anonymous at this time.
Our approach called DockAI is a new technology that combines docking with a state-of-the-art active learning methodology to significantly improve the efficiency and effectiveness of virtual screening and hit identification.
Our approach combines the expertise of Kozakov Lab at Stony Brook and Tropsha Lab at UNC. Our workflow uses several complementary modules to identify high-affinity hits for a given protein target with a known 3D structure. Binding site hot-spot 4(8) information, together with conventional structure-based virtual screening methods, are the enabling components of our hit selection approach.
Glide by Schrödinger
We propose a generative Artificial Intelligence workflow whereby we combine a Generative Adversarial Network (GAN) and Reinforcement Learning (RL) for simultaneous hit identification and optimization.
OpenEye
Here we adopt the hybrid of two different computational strategies for the hit identification of SARS-CoV-2 Nsp3 Mac1 inhibitors.
1. Hit identification via the large-scale virtual screening of Enamine databases
1.1 Initial virtual screening and structural filtering processes
Glide (Schrödinger Release)
Our proposed hit-identification workflow extends pipelines developed by our team during CACHE Challenge 2. The core methodology consists of high-throughput docking followed by binding affinity estimation using Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA) on multiple poses drawn from a molecular dynamics (MD) run of the protein-ligand complex.
None
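The averaging step in an MMPBSA-style estimate can be sketched as follows. This is a minimal illustration, not the participants' pipeline: it assumes the per-frame free-energy terms for the complex, receptor, and ligand have already been computed by an external tool, and the function names and energy values (in kcal/mol) are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def binding_energy_per_frame(
    g_complex: Sequence[float],
    g_receptor: Sequence[float],
    g_ligand: Sequence[float],
) -> List[float]:
    """Per-frame binding energy: G(complex) - G(receptor) - G(ligand)."""
    if not (len(g_complex) == len(g_receptor) == len(g_ligand)):
        raise ValueError("all three series must cover the same MD frames")
    return [c - r - l for c, r, l in zip(g_complex, g_receptor, g_ligand)]


def summarize(values: Sequence[float]) -> Tuple[float, float]:
    """Return the mean and the standard error of the mean over frames."""
    avg = mean(values)
    sem = stdev(values) / math.sqrt(len(values)) if len(values) > 1 else 0.0
    return avg, sem


# Made-up energies for three MD frames (illustrative only).
g_complex = [-1051.0, -1052.0, -1047.0]
g_receptor = [-900.0, -901.0, -899.0]
g_ligand = [-120.0, -121.0, -119.0]

per_frame = binding_energy_per_frame(g_complex, g_receptor, g_ligand)
avg, sem = summarize(per_frame)
print(f"dG_bind = {avg:.2f} +/- {sem:.2f} kcal/mol over {len(per_frame)} frames")
```

Reporting the standard error alongside the mean gives a rough sense of how much the estimate depends on which MD frames were sampled.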
We will use structure-based ultra-large virtual screenings using VirtualFlow. Step 1: Protein preparation Protein structures will be prepared with Maestro from Schrödinger (protonation state assignment, assignment of missing atoms/side chains, hydrogen atoms, ...). MD simulations of the target protein will be carried out using Amber 18. Conformations will be clustered, and representative structures of the clusters will be used for the virtual screens.
Maestro (protein preparation)
CODASS3
This proposal is a substantially enhanced version of our previously successful CACHE2 proposal. It includes improvements to every stage of our COmbined Docking And Similarity Search 2.0 (CODASS2) workflow that was applied to that challenge, as well as introducing additional tools and features to boost both its throughput (and thus the size of its screening library) and the reliability of its predictions. In summary, these improvements are:
None
We developed multi-scale and multi-task neural networks to learn binding structures and binding affinities between compounds and proteins based on our previous works[1-3]. The model takes geometric graph representation of compounds and proteins as input. The compound was processed by a physics-driven graph neural network, integrating the geometry and momentum information into the topological structure.
NA
The hit identification and drug discovery strategy consists of high-throughput docking for the identification of modulators of the NSP3 helicase of SARS-CoV-2.
Schrödinger Drug Discovery suite, BIOVIA Pipeline Pilot, BioSolveIT, MolSoft ICM.
Our approach follows multiple stages that gradually funnel massive ligand libraries into hits, leads, and optimized leads. The stages combine earlier data-driven methods with later principle/physics-driven methods, as detailed below.
The Enamine REAL Database (5.5 billion compounds) will be used as the target database for a deep learning-accelerated virtual screening campaign against the ADPr site of SARS-CoV-2 Nsp3 macrodomain (Mac1). First, we will remove molecules with a computed Tanimoto index of more than 0.6 to any available Mac1 ligand, in order to prioritize completely novel scaffolds.
Maestro, Glide, ICM
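The novelty filter described above can be sketched as follows. This is a minimal illustration under stated assumptions: fingerprints are represented as plain Python sets of on-bit indices rather than produced by a cheminformatics toolkit, and the function names and toy fingerprints are hypothetical. Only the 0.6 threshold comes from the proposal.

```python
from typing import Dict, Iterable, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity: shared on-bits divided by the union of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def novelty_filter(
    library: Dict[str, Set[int]],
    known_ligands: Iterable[Set[int]],
    threshold: float = 0.6,
) -> List[str]:
    """Keep compounds whose similarity to every known ligand is <= threshold."""
    known = list(known_ligands)
    return [
        name
        for name, fp in library.items()
        if all(tanimoto(fp, ref) <= threshold for ref in known)
    ]


# Toy fingerprints (illustrative only, not real Mac1 ligands).
known = [{1, 2, 3, 4, 5}]
library = {
    "analog": {1, 2, 3, 4, 6},   # 4 shared / 6 in union = 0.67, removed
    "novel": {1, 7, 8, 9, 10},   # 1 shared / 9 in union = 0.11, kept
}

kept = novelty_filter(library, known)
print(kept)
```

At billion-compound scale the same comparison would normally run on bit-vector fingerprints with a vectorized or indexed search. The set-based version only shows the logic of the cut-off.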
The project will begin with a structure-based analysis of the RNA binding cavity of SARS-CoV-2 Nsp3, based on the crystal structure and the fragments, molecular dynamics simulations, and the in-house program PyRod [1,2] to sample interaction points in the binding pocket. Briefly, PyRod traces water molecules in protein binding cavities and generates dynamic maps describing the interaction patterns of the water molecules with respect to the protein.
InteLigand - LigandScout
CCG - MOE
Schrödinger - Desmond
CCDC - GOLD
OpenEye - Szybki
To identify hit molecules for the macrodomain of SARS-CoV-2 Nsp3, we will use the V-Dock approach developed by our group. Instead of docking all ligands directly, V-Dock docks a subset of the library and uses the results to train deep learning models that predict protein-ligand docking scores from SMILES strings. We have already shown that protein-ligand docking scores can be accurately predicted from the SMILES representations.
Glide
We will use our expertise in AI/ML, cheminformatics, structure-based drug design (SBDD), and medicinal chemistry to generate hits for NSP3 Macrodomain (Mac1). Using our in-house drug discovery & cheminformatics platform (published in scientific literature, proprietary code), we will identify a suitable subset of compounds from the Enamine REAL Database using various filters which follow medicinal chemistry standards & CACHE white paper guidelines.
in-house
The small molecule libraries will be obtained from the ZINC and Mcule purchasable databases, and common filters will be applied to remove duplicates. Additionally, an in-house Evolutionary chemical binding similarity (ECBS) method (PMID: 31504818) will be used for primary virtual screening of the curated database.
BIOVIA Discovery Studio Client
Modular synthon-based approach - V-SYNTHES was published in Nature 601, 452–459 (2022). It first identifies the best scaffold–synthon combinations as seeds suitable for further growth, and then iteratively elaborates these seeds to select complete molecules with the best docking scores.
ICM-Pro is provided by MolSoft.
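The seed-then-elaborate idea described above can be sketched for a two-component library as follows. This is a simplified illustration, not the published V-SYNTHES implementation: the scoring functions are hypothetical stand-ins for docking (lower is better), synthons are represented as integers, and only a single elaboration round is shown.

```python
from typing import Callable, List, Sequence, Tuple


def hierarchical_screen(
    r1_synthons: Sequence[int],
    r2_synthons: Sequence[int],
    score_seed: Callable[[int], float],
    score_full: Callable[[int, int], float],
    top_k: int,
) -> Tuple[List[Tuple[int, int]], int]:
    """Score seeds first, then enumerate full molecules only for the best seeds."""
    # Stage 1: rank scaffold-synthon seeds (second position left unelaborated).
    seeds = sorted(r1_synthons, key=score_seed)[:top_k]
    # Stage 2: grow only the selected seeds into complete molecules.
    candidates = [(r1, r2) for r1 in seeds for r2 in r2_synthons]
    ranked = sorted(candidates, key=lambda pair: score_full(*pair))
    n_evaluations = len(r1_synthons) + len(candidates)
    return ranked, n_evaluations


# Hypothetical scoring functions standing in for docking runs.
def mock_seed_score(r1: int) -> float:
    return float(abs(r1 - 42))


def mock_full_score(r1: int, r2: int) -> float:
    return float(abs(r1 - 42) + abs(r2 - 7))


ranked, n_evaluations = hierarchical_screen(
    range(100), range(100), mock_seed_score, mock_full_score, top_k=5
)
print(ranked[0], n_evaluations, "evaluations instead of", 100 * 100)
```

The saving comes from never enumerating molecules built on poorly scoring seeds. The trade-off is that a seed which only scores well after elaboration can be missed.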
Using our Binary Star platform, we will employ a comprehensive computational protocol to enable the discovery and optimization of novel lead compounds for the ADPr site of SARS-CoV-2 Nsp3 macrodomain (Mac1). We will run a target analysis workflow (target validation and identification step) for the experimentally determined high-resolution structures of SARS-CoV-2 Nsp3 Mac1, to choose a suitable structure for the virtual screening campaign.
Schrödinger
AMBER.
We propose to apply a massive library screening workflow that exhaustively screens the 4.5 billion compound Enamine REAL database using a deep-learning-based Drug Target Interaction (DTI) prediction engine to identify molecules likely to bind to SARS-CoV-2 Nsp3.
I have developed a genetic algorithm (GA) that can search Enamine REAL Space and will use it to find molecules with good docking scores to the target.
Glide
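A genetic algorithm over a combinatorial building-block space can be sketched as follows. This is a toy illustration and not the participant's algorithm: each molecule is assumed to be a pair of building-block indices, `mock_docking_score` is a hypothetical stand-in for a docking run (lower is better), and the selection, crossover, and mutation choices are assumptions.

```python
import random
from typing import Callable, List, Tuple

Individual = Tuple[int, int]  # (index of building block A, index of building block B)


def mock_docking_score(ind: Individual) -> float:
    """Hypothetical stand-in for docking; lower is better."""
    a, b = ind
    return float((a - 37) ** 2 + (b - 12) ** 2)


def evolve(
    score: Callable[[Individual], float],
    n_a: int,
    n_b: int,
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.3,
    seed: int = 0,
) -> Individual:
    rng = random.Random(seed)
    population: List[Individual] = [
        (rng.randrange(n_a), rng.randrange(n_b)) for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        parents = sorted(population, key=score)[: pop_size // 2]
        children: List[Individual] = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = (p1[0], p2[1])  # crossover: one building block from each parent
            if rng.random() < mutation_rate:
                # Mutation: swap one building block for a random one.
                if rng.random() < 0.5:
                    child = (rng.randrange(n_a), child[1])
                else:
                    child = (child[0], rng.randrange(n_b))
            children.append(child)
        population = parents + children
    return min(population, key=score)


best = evolve(mock_docking_score, n_a=100, n_b=50)
print(best, mock_docking_score(best))
```

Because the surviving parents are carried over unchanged, the best score in the population can never get worse from one generation to the next.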
Our proposed pipeline consists of three steps. As a preliminary step, we will define a binding site around the ADPr site of PDB 7KQB.
Gaussian
Foldit is a crowd-sourced molecular biology game. The CACHE challenge will run as a series of regular puzzles in the Foldit platform and, if prior drug design puzzle experience is any indication, will see participation levels equivalent to other Foldit puzzles.
We will identify the most conserved residues of the Nsp3 Mac1 adenine binding cavity and the proximal ribose site where there are co-crystallized fragments (54 and 9 PDB submissions respectively) and lead-like small molecules (i.e. Gahbauer et al., bioRxiv. 2022). This will be done by performing multiple sequence alignment (MSA) with the Kalign algorithm on approximately 200,000 SARS-CoV-2 Nsp3 sequences from the NCBI.
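Once an alignment is available, per-position conservation can be estimated as sketched below. This is a minimal illustration that assumes the MSA has already been produced (for example by Kalign) and loaded as equal-length strings with "-" for gaps. The scoring rule (frequency of the most common residue, gaps ignored), the 0.9 cut-off, and the toy sequences are assumptions, not details from the proposal.

```python
from collections import Counter
from typing import List, Sequence


def column_conservation(alignment: Sequence[str]) -> List[float]:
    """Fraction of non-gap residues matching the most common residue, per column."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    scores: List[float] = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != "-"]
        if not residues:
            scores.append(0.0)  # column contains only gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(residues))
    return scores


def conserved_positions(alignment: Sequence[str], cutoff: float = 0.9) -> List[int]:
    """Zero-based alignment columns whose conservation meets the cut-off."""
    return [i for i, s in enumerate(column_conservation(alignment)) if s >= cutoff]


# Toy alignment (illustrative only, not real Nsp3 sequences).
msa = ["GDAV", "GDSV", "GD-V", "GEAV"]
scores = column_conservation(msa)
print(scores)
print(conserved_positions(msa))
```

Columns returned by `conserved_positions` are alignment indices, so they would still need to be mapped back to residue numbers in the reference structure before being compared with the binding-site residues.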
We will build on an analysis pipeline we have developed that is capable of searching billion-scale small-molecule libraries for binding candidates to a target pocket. In the first phase of the pipeline, we will perform a fast (and approximate) affinity prediction using a strategy based on graph neural networks (GNNs). We have developed GNNs that compute representations of both ligand and protein pocket based on a diverse collection of surface properties.
N/A
Our proposal is to link a set of fragments co-crystallized with nsp3, using either a deep generative model or a knowledge-based linker database, to afford drug-like molecules spanning at least two subpockets of the target. In a first step, 186 co-crystallized fragments (Schuller et al., Sci Adv.
SYBYL x2.1.1, Certara USA Inc., Princeton, U.S.A.
Szybki, Filter: OpenEye Scientific Software, Santa Fe, U.S.A.
We will deploy a proprietary deep learning-based framework to rapidly screen multi-billion small molecule libraries. The performance of the proposed framework is tested on several curated as well as publicly-available unbiased benchmarking datasets. To demonstrate the actual application of the framework, we have screened 1.37 billion molecules to discover new inhibitors of the epigenetic protein BRD9 bromodomain.
Maestro (protein preparation)
Glide for docking/hits prioritization (but we'll make a decision later whether to use it or AutoDock4/AutoDock Vina or SMINA)