Challenge #4 – COMPUTATIONAL METHODS

Here is a list of all computational methods used for hit identification in CACHE Challenge #4. Click on the Description for more details. Some participants preferred not to release their publications in order to remain anonymous at this time.

Description

Method name

Commercial software

Free software

The hit identification and drug discovery strategy consists of high-throughput docking to identify binders to the TKB domain of CBL-B that (1) bind to the same pocket as, and compete with, the co-crystallized compound, (2) represent novel chemical templates, and (3) bind with a KD below 30 µM.

Schrodinger Drug Discovery suite, BIOVIA Pipeline Pilot, BioSolveIT, MolSoft ICM

Our approach combines the expertise of the Kozakov Lab at Stony Brook and the Tropsha Lab at UNC. Our workflow uses several complementary modules to identify high-affinity hits for a given protein target with a known 3D structure. Binding-site hot-spot identification, together with conventional structure-based virtual screening enhanced by generative modeling, are the key enabling components of our hit selection approach.

Frag2Hits

Glide by Schrödinger

FTMap server (https://ftmap.bu.edu/), RDKit; ReLeaSE (https://github.com/isayev/ReLeaSE)

We will use structure-based ultra-large virtual screenings using VirtualFlow 2.0 [Gorgulla 2023]. The procedure will consist of four steps.

VirtualFlow/Ultra-Large Virtual Screens

Maestro (protein preparation)

VirtualFlow, AutoDock Vina, QuickVina, Smina, Plants, GWOVina

First, we will use an ML protein-ligand binding predictor as the primary filter to accelerate a massive number of molecular docking calculations. To do this, we will collect docking data for the given target protein using small molecules from the Enamine Diversity Library (approximately 3.9 million compounds) as ligands.

SNU-Dock

rDock, AutoDock-GPU, Vina-GPU, LeDock
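The idea of using a learned predictor as a primary filter ahead of docking can be illustrated with a minimal sketch. This is not the SNU-Dock model: it assumes compounds are represented as sets of fingerprint bit indices, and it swaps in a simple similarity-weighted nearest-neighbour estimate in place of the team's ML predictor. The function names `tanimoto`, `predict_score`, and `prefilter` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # assumed representation: set of "on" bit indices


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity between two bit-set fingerprints."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_score(query: Fingerprint,
                  docked: List[Tuple[Fingerprint, float]],
                  k: int = 3) -> float:
    """Estimate a docking score from the k most similar already-docked compounds."""
    if not docked:
        raise ValueError("need at least one docked compound to learn from")
    neighbours = sorted(docked, key=lambda item: tanimoto(query, item[0]),
                        reverse=True)[:k]
    weights = [tanimoto(query, fp) for fp, _ in neighbours]
    total = sum(weights)
    if total == 0.0:
        # No similar compound at all: fall back to a plain average.
        return sum(score for _, score in neighbours) / len(neighbours)
    return sum(w * score for w, (_, score) in zip(weights, neighbours)) / total


def prefilter(library: Dict[str, Fingerprint],
              docked: List[Tuple[Fingerprint, float]],
              keep_fraction: float = 0.1) -> List[str]:
    """Return the names of the compounds worth sending to real docking."""
    predicted = {name: predict_score(fp, docked) for name, fp in library.items()}
    ranked = sorted(predicted, key=predicted.get)  # more negative = better
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]
```

In such a setup, only the compounds returned by `prefilter` would go on to the full docking calculation, which is where the speed-up comes from.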

Our proposed hit-identification workflow extends pipelines developed by our team during CACHE Challenges 2 and 3. The core methodology consists of high-throughput docking followed by binding affinity estimation using Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA) on multiple poses drawn from a molecular dynamics (MD) run of the protein-ligand complex. Given the computational cost of running MD and MMPBSA.

AutoDock Vina, AutoDock4, rDock, GROMACS, DOCK v6 & v3 (Kuntz Group UCSF), PyAutoFEP, OpenBabel, Chemprop
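As a rough sketch of how docking followed by MMPBSA over multiple MD poses might be combined, the snippet below averages per-frame MMPBSA estimates for each ligand and ranks ligands by the mean. It assumes the energies have already been computed by external tools, and it assumes that only a docking shortlist is passed to the expensive step; the source does not say how the team handles that cost. All function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def shortlist_by_docking(docking_scores: Dict[str, float], top_n: int) -> List[str]:
    """Keep the top_n ligands by docking score (more negative = better)."""
    return sorted(docking_scores, key=docking_scores.get)[:top_n]


def summarize_mmpbsa(frame_energies: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-frame MMPBSA estimates."""
    avg = mean(frame_energies)
    spread = stdev(frame_energies) if len(frame_energies) > 1 else 0.0
    return avg, spread


def rank_ligands(docking_scores: Dict[str, float],
                 mmpbsa_frames: Dict[str, List[float]],
                 top_n: int) -> List[Tuple[str, Tuple[float, float]]]:
    """Rank shortlisted ligands by their mean MMPBSA energy across MD frames."""
    shortlisted = shortlist_by_docking(docking_scores, top_n)
    summaries = {
        ligand: summarize_mmpbsa(mmpbsa_frames[ligand])
        for ligand in shortlisted
        if ligand in mmpbsa_frames  # skip ligands without MD/MMPBSA data
    }
    return sorted(summaries.items(), key=lambda item: item[1][0])
```

Reporting the spread next to the mean makes it easier to spot ligands whose estimate depends heavily on a single frame.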

In the first stage, we will develop a structure-based CBL-B-specific QSAR model for fast ligand-based screening, based on publicly available data on 3D CBL-B-ligand structures and binding compounds. The distinguishing feature of the QSAR model is the graph neural network architecture coupled with Behler-Parrinello symmetry functions in the representation of protein-ligand complexes [1].

iMolecule

ICM-Pro

BiteNet, GraphDelta, GROMACS, RDKit, DeepChem, Smina, Gnina, TensorFlow, PyTorch

CODASS4

COmbined Docking and Similarity Search 4.0 (CODASS4)

None

AutoDockTools, Autodock-GPU, Vina-GPU+, GWOVina, RF-Score-VS v2, SCORCH 2.0, Osiris DataWarrior, PDB2PQR, OpenBabel, RDKit, Autodock-SS, Filter-it, FEPOPS, SILCS (free to academics), Miniconda, GROMACS, USRCAT, OpenFEPOPS

We propose a comprehensive workflow for identifying potential hit compounds using natural language processing, molecular docking, and unsupervised learning. Our goal is to leverage the vast high-performance computing resources at Argonne National Laboratory to perform inference on the entire Enamine database.

Parallelized Inhibitor Prediction using Transformers (PIPT)

None

AutoDock Vina 1.1.2, AutoDock Tools, RDKit, Enamine Database, VMD, Python, MDAnalysis, PDB Database, NAMD, Tensorflow, PyTorch, scikit-learn

In the pursuit of identifying potential drug candidates, we have designed a rapid screening strategy that consists of two steps. First, a rapid screen selects drugs that may bind to the protein. Second, a reasonable binding pose is predicted for each selected candidate drug and used for affinity prediction.

CPI-MD

None

Pytorch, SparseConvNet, ChemBERT, GROMACS

We developed multi-scale and multi-task neural networks to learn binding structures and binding affinities between compounds and proteins based on our previous work [1-3]. The model takes geometric graph representations of compounds and proteins as input. The compound is processed by a physics-driven graph neural network that integrates geometry and momentum information into the topological structure.

DynamicBind

NA

Python, Torch, RDKIT, Biopython, P2Rank

Our approach is an improved version based on our recently published work, TankBind.

TankBind

P2Rank, RDKit

Our Approach

Summary. The proposed methodology will be anchored by our in-house, flexible small-molecule design engine Sq-SYNT. The design engine leverages a suite of both physics- and data-driven protocols that can be flexibly assembled into a problem-specific workflow. In this proposal, we use a tailored design approach anchored on

Solvent-Augmented and Fragment-guided large virtual screening using Sq-SYNT

NA

GROMACS, AutoDock, Q6, Python, RDKit, ProTox, SwissADME

We will use our expertise in cheminformatics, molecular dynamics (MD), structure-based drug design (SBDD), and medicinal chemistry to generate hits for the tyrosine kinase binding (TKB) domain of the Cbl Proto-Oncogene B (CBL-B).

Hybrid

proprietary drug discovery software

GROMACS

Recent advances in AI/ML, coupled with Biolexis's internally developed, fully automated MolecuLern workflow, are helping this shift by accelerating and improving the R&D activities of our company's pipeline and some of our collaborators' programs. Advances in high-performance computing, the availability of proven computational algorithms, large validated data sets for training ML models, and deep neural networks have brought exceptional speed to the field of drug discovery and development.

MolecuLern

GOLD

MolSoft's ICM

Schrodinger

Rosetta

DiffDock score plus consensus scores from commercial software to MolecuLern score as criteria for the hit-selection process.

The approach involves a new computational protocol called PyRMD2Dock, which combines the Ligand-Based Virtual Screening (LBVS) tool PyRMD with the popular docking software AutoDock-GPU (AD4-GPU) to enhance the throughput of virtual screening campaigns for drug discovery. With PyRMD2Dock, it is possible to rapidly screen massive chemical databases and identify the compounds with the highest predicted binding affinity to a target protein.

PyRMD2Dock

none

PyRMD, AutoDock-GPU

A computer-implemented method for screening ligand candidates for a target protein. This is done through an in-house developed, integrated ensemble machine learning (ML) model for predicting binding affinity with very high speed and precision.

i-TripleD by ANYO Labs

none

F-Pocket, D-Pocket, RDKit

V-SYNTHES, a modular synthon-based approach, was published in Nature 601, 452–459 (2022). It first identifies the best scaffold–synthon combinations as seeds suitable for further growth, and then iteratively elaborates these seeds to select complete molecules with the best docking scores.

V-SYNTHES

ICM-Pro is provided by MolSoft.

RDKit, KNIME
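The seed-then-elaborate loop described for V-SYNTHES can be sketched in a few lines. This is a simplified illustration, not the published implementation: it fills one attachment slot per round in a fixed order, represents a molecule as a scaffold name plus one synthon per slot, and takes the scoring function as a caller-supplied stand-in for docking. The names `CAP`, `grow`, and `synthon_screen` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Molecule = Tuple[str, Tuple[str, ...]]  # (scaffold, synthon chosen for each slot)
CAP = "cap"  # placeholder for a slot that has not been elaborated yet


def grow(seeds: List[Molecule],
         synthons_per_slot: Dict[str, Sequence[Sequence[str]]],
         score: Callable[[Molecule], float],
         keep: int) -> List[Molecule]:
    """One round: fill the first capped slot of every seed, keep the best scorers."""
    candidates: List[Molecule] = []
    for scaffold, slots in seeds:
        if CAP not in slots:
            candidates.append((scaffold, slots))  # already a complete molecule
            continue
        i = slots.index(CAP)
        for synthon in synthons_per_slot[scaffold][i]:
            candidates.append((scaffold, slots[:i] + (synthon,) + slots[i + 1:]))
    return sorted(candidates, key=score)[:keep]  # lower score = better


def synthon_screen(synthons_per_slot: Dict[str, Sequence[Sequence[str]]],
                   score: Callable[[Molecule], float],
                   keep: int) -> List[Molecule]:
    """Start from fully capped scaffolds and elaborate until every slot is filled."""
    seeds: List[Molecule] = [
        (scaffold, (CAP,) * len(slots))
        for scaffold, slots in synthons_per_slot.items()
    ]
    n_rounds = max((len(slots) for slots in synthons_per_slot.values()), default=0)
    for _ in range(n_rounds):
        seeds = grow(seeds, synthons_per_slot, score, keep)
    return seeds
```

Because only the `keep` best seeds survive each round, the number of scored molecules grows with the number of synthons per slot rather than with the full combinatorial product.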

We intend to employ KarmaDock, an innovative deep learning (DL) paradigm for ligand docking developed in-house, to carry out a hierarchical virtual screening of the Enamine database. The initial strategy consists of the following steps:

KarmaDock

Schrödinger

AutoDock Tools, OpenBabel, RDKit, PyTorch, MDAnalysis, pytorch_geometric, ProDy

Our hit identification method starts with a screening performance evaluation to select optimal docking software for the given protein target. During the screening process, we will comprehensively consider the consistency of the ligand binding pose and the binding score to improve the success rate of hit discovery. The workflow of our method is as follows:

fastVS

Schrödinger

AutoDock Vina, AutoDock-GPU, LeDock, PLANTS, UCSF DOCK6, AmberTools, OpenBabel.
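One common way to run a screening performance evaluation is to dock a benchmark set of known actives and decoys with each program and compare enrichment factors. The sketch below assumes that approach; the source does not state which metric fastVS uses, and the names `enrichment_factor` and `select_program` are hypothetical.

```python
from typing import Dict, Set, Tuple


def enrichment_factor(scores: Dict[str, float],
                      actives: Set[str],
                      fraction: float = 0.01) -> float:
    """Hit rate among the top-ranked fraction divided by the overall hit rate."""
    ranked = sorted(scores, key=scores.get)  # more negative score = better rank
    n_top = max(1, round(len(ranked) * fraction))
    hits_top = sum(1 for name in ranked[:n_top] if name in actives)
    n_actives = sum(1 for name in ranked if name in actives)
    if n_actives == 0:
        raise ValueError("benchmark set contains no known actives")
    return (hits_top / n_top) / (n_actives / len(ranked))


def select_program(benchmark: Dict[str, Dict[str, float]],
                   actives: Set[str],
                   fraction: float = 0.01) -> Tuple[str, Dict[str, float]]:
    """Pick the docking program with the highest enrichment on the benchmark."""
    efs = {
        program: enrichment_factor(scores, actives, fraction)
        for program, scores in benchmark.items()
    }
    best = max(efs, key=efs.get)
    return best, efs
```

Here `benchmark` maps each program name to the scores it assigned to the same benchmark compounds, so the programs are compared on identical input.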

We will use a novel method named SpaceDock, which we are currently benchmarking and which consists of two steps: (i) docking 150K commercial building blocks (e.g. amines, acids, sulfonyl chlorides, etc.) in the binding site of interest, and (ii) assembling pairs of chemically compatible building blocks on the fly according to a set of 40 common organic chemistry reactions (e.g.

SpaceDock

CCDC GOLD v.2022

OpenEye SZYBKI v.2.4

BioSolveIT HYDEscorer v.1.5

IChem v.5.2.9: http://bioinfo-pharma.u-strasbg.fr/labwebsite/download.html RDKit: Open-source cheminformatics; http://www.rdkit.org, https://github.com/rdkit/rdkit

We plan to combine the commercial Schrodinger software (https://www.schrodinger.com/) and the DrugFlow platform developed by our company (https://drugflow.com/) to conduct a hierarchical virtual screening on the Enamine database. The preliminary scheme is as follows: (1) prepare the protein and molecules using the Protein Preparation and Ligand Preparation modules embedded in DrugFlow.

DrugFlow

Schrodinger (https://www.schrodinger.com/)

DrugFlow (https://drugflow.com/)

RDKit (http://www.rdkit.org/), OpenBabel (https://github.com/openbabel/openbabel/), MDAnalysis (https://github.com/MDAnalysis/mdanalysis/), ProDy (http://prody.csb.pitt.edu/downloads/), PyTorch (https://pytorch.org/), DGL (https://github.com/dmlc/dgl/)

In the hit identification phase, we plan to deploy a hybrid strategy combining the experience of medicinal chemists with EquiScore. EquiScore is a generic protein-ligand interaction prediction model based on geometric deep learning developed by our team. When designing the model, we thoroughly considered prior information from different sources, including chemical prior information, interaction prior information, spatial prior information, etc.

EquiScore

Schrödinger Suites 2020-4 version

RDKit, ProLIF

1. Primary screening with evolutionary chemical binding similarity model

Evolutionary Chemical Binding Similarity (ECBS)

BIOVIA Discovery Studio

RDKit, AutoDock Vina, DOCK6, DiffDock, ECBS