Challenge #3 – COMPUTATIONAL METHODS

Here is a list of all computational methods used for hit identification in CACHE Challenge #3. Click on the Description for more details. Some participants preferred not to release their publications to stay anonymous at this time.

Description

Method name

Commercial software

Free software

Our approach called DockAI is a new technology that combines docking with a state-of-the-art active learning methodology to significantly improve the efficiency and effectiveness of virtual screening and hit identification.

DockAI

Torch, TorchServe, Open Babel, RDKit, AmberTools, GROMACS, Sander
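The DockAI entry above pairs docking with active learning. The loop below is only a minimal sketch of that general idea, not DockAI itself: `toy_dock`, the single-number molecule descriptor, and the 1-nearest-neighbour surrogate are all hypothetical stand-ins chosen so that the example runs with the Python standard library alone.

```python
import random

def toy_dock(x):
    """Hypothetical stand-in for an expensive docking run (lower is better)."""
    return (x - 0.7) ** 2

def predict(x, labelled):
    """1-nearest-neighbour surrogate: reuse the score of the closest docked molecule."""
    nearest = min(labelled, key=lambda known: abs(known - x))
    return labelled[nearest]

def active_learning_screen(library, n_init=10, batch=10, rounds=5, seed=0):
    rng = random.Random(seed)
    # Dock a random starting subset to train the first surrogate.
    labelled = {x: toy_dock(x) for x in rng.sample(library, n_init)}
    for _ in range(rounds):
        # Rank everything not yet docked by its predicted score.
        pool = [x for x in library if x not in labelled]
        pool.sort(key=lambda x: predict(x, labelled))
        # Dock only the most promising batch and add it to the training data.
        for x in pool[:batch]:
            labelled[x] = toy_dock(x)
    return labelled

# Each "molecule" is a single made-up descriptor value.
library = [i / 1000 for i in range(1000)]
docked = active_learning_screen(library)
best = min(docked, key=docked.get)
print(len(docked), "of", len(library), "molecules docked; best:", best)
```

Only 60 of the 1000 toy molecules are passed to the scoring function, which is the efficiency argument behind this family of methods.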

Our approach combines the expertise of Kozakov Lab at Stony Brook and Tropsha Lab at UNC. Our workflow uses several complementary modules for identification of high-affinity hits for a given protein target with a known 3D structure. Identification of binding site hot-spot 4(8) information together with conventional structure-based virtual screening methods are enabling components of our hit selection approach.

Frag2Hits

Glide by Schrödinger

FTMap server (https://ftmap.bu.edu/), RDKit

We propose a generative Artificial Intelligence workflow whereby we combine a Generative Adversarial Network (GAN) and Reinforcement Learning (RL) for simultaneous hit identification and optimization.

MERLIND: Multi-Expert Reinforcement Learning in Drug Discovery

OpenEye

AutoDock Vina, AMBER, OpenMM, PyTorch

Here we adopt the hybrid of two different computational strategies for the hit identification of SARS-CoV-2 Nsp3 Mac1 inhibitors.

1. Hit identification via the large-scale virtual screening of Enamine databases

1.1 Initial virtual screening and structural filtering processes

The combination of large-scale virtual screening and a variety of deep molecular generation methods for hit identification and optimization

Glide (Schrödinger Release)

GROMACS (for binding free energy computation), RDKit, Multiwfn (for electronic wavefunction analysis), PyTorch (as deep learning framework)

Our proposed hit-identification workflow extends pipelines developed by our team during CACHE Challenge 2. The core methodology consists of high-throughput docking followed by binding affinity estimation using Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA) on multiple poses drawn from a molecular dynamics (MD) run of the protein-ligand complex.

Reaction informed fragment fuser (RiFF)

None

AutoDock Vina, AutoDock4, rDock, GROMACS, DOCK v6 & v3 (Kuntz Group UCSF), PyAutoFEP

We will use structure-based ultra-large virtual screenings using VirtualFlow. Step 1: Protein preparation. Protein structures will be prepared with Maestro from Schrödinger (protonation state assignment, assignment of missing atoms/side chains, hydrogen atoms, ...). MD simulations of the target protein will be carried out using Amber 18. Conformations will be clustered, and representative structures of the clusters will be used for the virtual screens.

VirtualFlow/Ultra-Large Virtual Screens

Maestro (protein preparation)

VirtualFlow, AutoDock Vina, QuickVina, Smina, Plants, GWOVina
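The VirtualFlow entry above clusters MD conformations and screens against one representative per cluster. The entry does not say which clustering algorithm is used, so the following is only an illustrative sketch of one simple option, leader clustering on RMSD. It assumes the frames are already superimposed, and the coordinates and the 0.5 cutoff are made up.

```python
import math

def rmsd(a, b):
    """RMSD between two pre-aligned frames given as lists of (x, y, z) tuples."""
    squared = sum((p - q) ** 2
                  for atom_a, atom_b in zip(a, b)
                  for p, q in zip(atom_a, atom_b))
    return math.sqrt(squared / len(a))

def leader_cluster(frames, cutoff):
    """Assign each frame to the first leader within `cutoff`, else start a new cluster.

    Returns {leader_index: [member_indices]}; each leader acts as the
    representative structure of its cluster.
    """
    clusters = {}
    for i, frame in enumerate(frames):
        for leader in clusters:
            if rmsd(frame, frames[leader]) <= cutoff:
                clusters[leader].append(i)
                break
        else:
            clusters[i] = [i]
    return clusters

# Four toy two-atom "conformations" forming two obvious groups.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],
    [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    [(3.0, 0.2, 0.0), (4.0, 0.2, 0.0)],
]
clusters = leader_cluster(frames, cutoff=0.5)
print(clusters)
```

Frames 0 and 2 become the representatives here. A production workflow would normally rely on the clustering tools that ship with the MD package rather than a hand-written loop.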

CODASS3

This proposal is a substantially enhanced version of our previously successful CACHE2 proposal. It includes improvements to every stage of our COmbined Docking And Similarity Search 2.0 (CODASS2) workflow that was applied to that challenge, as well as introducing additional tools and features to boost both its throughput (and thus the size of its screening library) and the reliability of its predictions. In summary, these improvements are:

COmbined Docking and Similarity Search 3.0 (CODASS3)

None

Autodock, Vina-GPU+, GWOVina, RF-Score-VS v2, SCORCH2, Osiris DataWarrior, PDB2PQR, OpenBabel, RDKit, Autodock-SS, NAMD, MOPAC2016 (free to academics), Filter-it

We developed multi-scale and multi-task neural networks to learn binding structures and binding affinities between compounds and proteins based on our previous work [1-3]. The model takes geometric graph representations of compounds and proteins as input. The compound is processed by a physics-driven graph neural network, integrating the geometry and momentum information into the topological structure.

Deep Multi-scale Learning for Drug-Protein Interaction Prediction

NA

Python, Torch, RDKit, Biopython, P2Rank

The hit identification and drug discovery strategy consists of high-throughput docking for the identification of modulators of the NSP3 helicase of SARS-CoV-2.

Hybrid: High-throughput docking coupled with reevaluation of top hits & docked poses

Schrödinger Drug Discovery suite, BIOVIA Pipeline Pilot, BioSolveIT, MolSoft ICM

OpenMM, RDKit

Our approach follows multiple stages that gradually funnel massive ligand libraries into hits, leads, and optimized leads. The earlier stages use data-driven methods and the later stages use principle/physics-driven methods, as detailed below.

DeepAffinity

DeepAffinity, RDKit, AutoDock Vina, NAMD

The Enamine REAL Database (5.5 billion compounds) will be used as the target database for a deep learning-accelerated virtual screening campaign against the ADPr site of the SARS-CoV-2 Nsp3 macrodomain (Mac1). First, we will remove molecules with a computed Tanimoto similarity above 0.6 to any available Mac1 ligand, in order to prioritize completely novel scaffolds.

Deep Docking

Maestro, Glide, ICM

Deep Docking, AutoDock-GPU

In our workflow we will employ 3D-pharmacophore screening to synergize the information coming from co-crystallized fragment structures with the information from molecular dynamics simulations.
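The novelty filter described in the Deep Docking entry above (discard anything with a Tanimoto index above 0.6 to a known Mac1 ligand) can be sketched as follows. This is an illustration under stated assumptions: fingerprints are represented as plain Python sets of "on" bit indices, and the compound names and bit patterns are invented. In practice the fingerprints would come from a cheminformatics toolkit such as RDKit.

```python
def tanimoto(a, b):
    """Tanimoto index of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def novel_only(library, known_ligands, threshold=0.6):
    """Keep compounds whose similarity to every known ligand is at most `threshold`."""
    return {
        name: fp
        for name, fp in library.items()
        if all(tanimoto(fp, known) <= threshold for known in known_ligands)
    }

# Hypothetical fingerprints, for illustration only.
known_ligands = [{1, 2, 3, 4, 5}]
library = {
    "close_analogue": {1, 2, 3, 4, 6},   # 4 shared / 6 total bits = 0.67, removed
    "new_scaffold": {1, 2, 7, 8, 9},     # 2 shared / 8 total bits = 0.25, kept
}
kept = novel_only(library, known_ligands)
print(sorted(kept))
```

Comparing against every known ligand, rather than only the nearest one found so far, guarantees that no retained compound exceeds the threshold for any reference molecule.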

Dynamic 3D Pharmacophores

InteLigand - LigandScout

CCG - MOE

Schrödinger - Desmond

CCDC - GOLD

PyRod, OpenMMDL, RDKit, KNIME, Python

To identify hit molecules for the macrodomain of SARS-CoV-2 Nsp3, we will use the V-Dock approach developed by our group. Instead of docking every ligand directly, V-Dock trains deep learning models on the docking results of a subset of the library and uses them to predict protein-ligand docking scores for the remaining ligands from their SMILES strings. We have already shown that protein-ligand docking scores can be accurately predicted from the SMILES representations.

SNU-Dock

Glide

RDKit, AutoDock-GPU, Open Babel

We will use our expertise in AI/ML, cheminformatics, structure-based drug design (SBDD), and medicinal chemistry to generate hits for NSP3 Macrodomain (Mac1). Using our in-house drug discovery & cheminformatics platform (published in scientific literature, proprietary code), we will identify a suitable subset of compounds

Deep Learning Approach

in-house

GROMACS (if MD is needed); Python and deep learning packages: TensorFlow, Scikit, Pandas, and NumPy

The small molecule libraries will be obtained from the ZINC and Mcule purchasable databases and common filters will be applied to remove duplicates. Additionally, an in-house Evolutionary chemical binding similarity (ECBS) method (PMID: 31504818) will be used for the primary virtual screening of the curated database.

Evolutionary chemical binding similarity (ECBS) method

RDKit, AutoDock Vina, AutoDock, DOCK, RASPD+, AMBER, SwissADME

The modular synthon-based approach V-SYNTHES was published in Nature 601, 452–459 (2022). It first identifies the best scaffold–synthon combinations as seeds suitable for further growth, and then iteratively elaborates these seeds to select complete molecules with the best docking scores.

V-SYNTHES

ICM-Pro is provided by MolSoft.

RDKit, KNIME
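The seed-then-grow strategy described in the V-SYNTHES entry above can be illustrated with a toy search. This is a sketch of the general idea only, not the published implementation: the synthon names, the additive `toy_score`, and the `keep` parameter are hypothetical, and a real run would score each partially capped molecule by docking.

```python
CAP = "cap"  # placeholder for a position that has not been filled yet

# Hypothetical per-synthon contributions used by the toy score.
FIT = {"a1": 1.0, "a2": 3.0, "a3": 2.0, "b1": 0.5, "b2": 2.5, "b3": 1.5, CAP: 0.0}

def toy_score(molecule):
    """Stand-in for a docking score (lower is better)."""
    return -sum(FIT[synthon] for synthon in molecule)

def synthon_screen(libraries, keep=2):
    n = len(libraries)
    # Step 1: seeds carry one real synthon; every other position is capped.
    seeds = [
        tuple(synthon if i == pos else CAP for i in range(n))
        for pos, library in enumerate(libraries)
        for synthon in library
    ]
    scored = len(seeds)
    frontier = sorted(seeds, key=toy_score)[:keep]
    # Step 2: grow the best seeds one capped position at a time.
    while any(CAP in molecule for molecule in frontier):
        grown = set()
        for molecule in frontier:
            if CAP not in molecule:
                grown.add(molecule)
                continue
            pos = molecule.index(CAP)
            for synthon in libraries[pos]:
                grown.add(molecule[:pos] + (synthon,) + molecule[pos + 1:])
        scored += len(grown)
        frontier = sorted(grown, key=toy_score)[:keep]
    return frontier, scored

libraries = [["a1", "a2", "a3"], ["b1", "b2", "b3"]]
top, n_scored = synthon_screen(libraries)
print(top[0], n_scored)
```

For libraries this small the search saves nothing over full enumeration. The benefit appears when each position has thousands of synthons, because only the retained seeds are expanded.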

Using our Binary Star platform, we will employ a comprehensive computational protocol to enable the discovery and optimization of novel lead compounds for the ADPr site of SARS-CoV-2 Nsp3 macrodomain (Mac1). We will run a target analysis workflow (target validation and identification step) for the experimentally determined high-resolution structures of SARS-CoV-2 Nsp3 Mac1, to choose a suitable structure for the virtual screening campaign.

A comprehensive computational protocol that includes: De novo design, high-throughput docking, MD simulations, and FEP calculations.

Schrödinger

AMBER

NAMD, VMD, CHARMM-GUI

We propose to apply a massive library screening workflow that exhaustively screens the 4.5 billion compound Enamine REAL database using a deep-learning-based Drug Target Interaction (DTI) prediction engine to identify molecules likely to bind to SARS-CoV-2 Nsp3.

Massive library screening using structurally-augmented Drug-Target Interaction (DTI) prediction models

RDKit, Vina, PyTorch

I have developed a genetic algorithm (GA) that can search Enamine REAL Space and will use it to find molecules with good docking scores to the target.

Synthon-GA

Glide

RDKit, Synthon-GA, Molbloom
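The Synthon-GA entry above searches a combinatorial chemical space with a genetic algorithm. The sketch below shows a generic GA of that kind and should not be read as the author's implementation: encoding a molecule as a tuple of synthon indices, the selection and crossover scheme, and `toy_score` (which stands in for a docking score) are all assumptions made for illustration.

```python
import random

TARGET = (7, 3, 5)  # hypothetical optimum, used only by the toy score

def toy_score(individual):
    """Stand-in for a docking score (lower is better)."""
    return sum((gene - best) ** 2 for gene, best in zip(individual, TARGET))

def evolve(sizes, pop_size=20, generations=30, mutation_rate=0.3, seed=0):
    """Evolve tuples of synthon indices; sizes[i] is the number of synthons at slot i."""
    rng = random.Random(seed)
    population = [tuple(rng.randrange(n) for n in sizes) for _ in range(pop_size)]
    history = [min(toy_score(ind) for ind in population)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        population.sort(key=toy_score)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Uniform crossover: each slot is inherited from either parent.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            # Mutation: occasionally swap one slot for a random synthon.
            if rng.random() < mutation_rate:
                position = rng.randrange(len(sizes))
                child[position] = rng.randrange(sizes[position])
            children.append(tuple(child))
        population = parents + children
        history.append(min(toy_score(ind) for ind in population))
    return min(population, key=toy_score), history

best, history = evolve(sizes=(10, 10, 10))
print(best, history[-1])
```

Because the surviving parents are carried over unchanged, the best score recorded in `history` never gets worse from one generation to the next.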

Our proposed pipeline consists of three steps. As a preliminary step, we will define a binding site around the ADPr site of PDB 7KQB.

CMOD Design

Gaussian

OpenMM, OpenForceField, GROMACS, MDAnalysis, AmberTools, AutoDock Vina, LeDock, PLANTS, internally developed machine learning models (MILCDock)

Foldit is a crowd-sourced molecular biology game. The CACHE challenge will run as a series of regular puzzles in the Foldit platform and, if prior drug design puzzle experience is any indication, will see participation levels equivalent to other Foldit puzzles.

Drugit

Foldit/Rosetta/RDKit/ZINC API/BCL

We will identify the most conserved residues of the Nsp3 Mac1 adenine binding cavity and the proximal ribose site where there are co-crystallized fragments (54 and 9 PDB submissions respectively) and lead-like small molecules (i.e. Gahbauer et al., bioRxiv. 2022). This will be done by performing multiple sequence alignment (MSA) with the Kalign algorithm on approximately 200,000 SARS-CoV-2 Nsp3 sequences from the NCBI.

Tiered screening incorporating molecular shape, pharmacophore features, docking, FEP and clustering

Molecular Operating Environment (MOE) by the Chemical Computing Group

Kalign, in-house MoPBS pharmacophore generation software, in-house VS streamlining software DataPype, GROMACS
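The conservation analysis in the entry above starts from a multiple sequence alignment. Once an aligner such as Kalign has produced the alignment, per-column conservation can be computed as sketched below. The four short sequences and the 0.9 cutoff are hypothetical, and the most-common-residue fraction is just one of several possible conservation measures.

```python
from collections import Counter

def column_conservation(alignment):
    """Return (most common residue, fraction of sequences carrying it) per column.

    Gaps ('-') count toward the column size but are never reported as the
    conserved residue.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    result = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            result.append(("-", 0.0))
            continue
        residue, count = Counter(residues).most_common(1)[0]
        result.append((residue, count / len(column)))
    return result

def conserved_positions(alignment, min_fraction=0.9):
    """Zero-based column indices whose top residue reaches `min_fraction`."""
    return [
        i for i, (_, fraction) in enumerate(column_conservation(alignment))
        if fraction >= min_fraction
    ]

# Hypothetical aligned fragments, for illustration only.
alignment = ["GDAN", "GDSN", "GEAN", "G-AN"]
print(column_conservation(alignment))
print(conserved_positions(alignment))
```

Columns 0 and 3 are fully conserved in this toy alignment. Mapping such columns back to binding-site residues would then highlight the contacts least likely to vary between viral sequences.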

We will build on an analysis pipeline we have developed that is capable of searching billion-scale small-molecule libraries for binding candidates to a target pocket. In the first phase of the pipeline, we will perform a fast (and approximate) affinity prediction using a strategy based on graph neural networks (GNNs). We have developed GNNs that compute representations of both ligand and protein pocket based on a diverse collection of surface properties.

PocketPackerPicker

N/A

AutoDock Vina, PLIP, MARTINI

Our proposal is to link a set of fragments co-crystallized with nsp3, using either a deep generative model or a knowledge-based linker database, to afford drug-like molecules spanning at least two subpockets of the target. In a first step, 186 co-crystallized fragments (Schuller et al., Sci Adv.

POEM (Pocket oriented elaboration of molecules)

SYBYL x2.1.1, Certara USA Inc., Princeton, U.S.A.

Szybki, Filter: OpenEye Scientific Software, Santa Fe, U.S.A.

IChem, DeLinker, RDKit, POEM, PLANTS

We will deploy a proprietary deep learning-based framework to rapidly screen multi-billion small molecule libraries. The performance of the proposed framework is tested on several curated as well as publicly-available unbiased benchmarking datasets. To demonstrate the actual application of the framework, we have screened 1.37 billion molecules to discover new inhibitors of the epigenetic protein BRD9 bromodomain.

PrDIN

Maestro (protein preparation)

Glide for docking/hits prioritization (but we'll make a decision later whether to use it or AutoDock4/AutoDock Vina or SMINA)

Python, TensorFlow, Keras, RDKit, AutoDock4/AutoDock Vina/SMINA, GROMACS (if required), PyMOL