Challenge #1 – COMPUTATIONAL METHODS

Here is a list of all computational methods used for hit identification in CACHE Challenge #1. Click on the Description for more details. Some participants preferred not to release their publications to stay anonymous at this time.

Description

Method name

Commercial software

Free software

Our approach combines the expertise of the Kozakov Lab at Stony Brook and the Tropsha Lab at UNC. Our workflow uses several complementary modules to identify high-affinity hits for a given protein target with a known 3D structure. Binding-site hot-spot information and conventional structure-based virtual screening methods are the enabling components of our hit selection approach. Read more...

Frag2Hits

Glide by Schrödinger

FTMap server (https://ftmap.bu.edu/), RDKit

The workflow will consist of a preliminary part and two production parts. In the preliminary stage, MD simulations of the protein will be performed to study protein flexibility and choose representative conformations. In the first stage, we will enumerate possible ligands for each of the representative protein conformations using the fragment-based de novo approach CReM, guided by docking (AutoDock Vina). Read more...

CReM

FTrees

CReM, Autodock Vina, Smina, gnina, DeepDock

First, an ultra-large library of ~4.5B purchasable molecules from Enamine REAL will be parameterized using our own AI-accelerated quantum-mechanical methods to prepare it for subsequent automated structure-based virtual screening. Read more...

DeepDocking, neural network atomistic force fields, molecular dynamics, thermodynamic integration, free energy methods, de novo generative models

Glide, ICM, OpenEye toolkit, AMBER

OpenChem, torchani, pytorch, Autodock-GPU

A computational workflow will be implemented sequentially in order to i) identify the most promising binding sites within the WDR domain of LRRK2 through a fragment-based computational screening approach, ii) identify and select fragments with lead-like properties and/or high ligand efficiencies, iii) screen databases using the identified fragments and Lipinski/Mozziconacci rules, and iv) perform high-throughput docking of the selected molecules in the proposed binding sites. Read more...

Sequential fragment-based/HTS docking hit identification

none

GROMACS, gnina, vina, qvina-w, VMD, Pymol, python
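
Step iii of the workflow above mentions filtering with Lipinski rules. A minimal sketch of such a rule-of-five check follows, assuming the descriptors (molecular weight, logP, hydrogen-bond donors and acceptors) have already been computed by a cheminformatics toolkit. The `Descriptors` container and the one-violation tolerance are illustrative assumptions, not the participants' implementation, and the Mozziconacci rules are not covered.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties for one molecule (hypothetical container)."""
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four rule-of-five criteria are violated."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_lipinski(d: Descriptors, max_violations: int = 1) -> bool:
    """Accept a molecule with at most `max_violations` violations (assumed tolerance)."""
    return lipinski_violations(d) <= max_violations
```

A molecule with a weight of 520 Da and otherwise compliant descriptors has one violation and is accepted under the default tolerance; setting `max_violations=0` makes the filter strict.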

The project will begin with a structure-based analysis of the central binding cavity of the WDR40 domain, using molecular dynamics simulations together with the in-house program PyRod [1,2] to sample interaction points in the binding pocket. Briefly, PyRod traces water molecules in protein binding cavities and generates dynamic maps describing the interaction patterns of the water molecules with respect to the protein. Read more...

Dynamic 3D Pharmacophores

LigandScout (Inte:Ligand, Austria), GOLD (The Cambridge Crystallographic Data Centre, UK), Szybki (OpenEye, NM, USA)

Desmond (D. E. Shaw Research, NY, USA), OpenMM (https://openmm.org/)

Our proposed pipeline consists of five steps. First, we will pass the on-demand ZINC20 database of small molecules through a custom docking pipeline using a machine learning-enhanced consensus module. This machine learning consensus module combines binding affinity and pose predictions from five traditional docking tools to rank the small molecules according to the probability of binding. Read more...

Deep integration of physics and machine learning based methods for accurate molecule ranking

Autodock Vina, Autodock4, Ledock, rDock, PLANTS, OpenMM, internally developed machine learning models (omission of name for blind review)
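
The description above combines predictions from several docking tools into one ranking. The participants' machine learning consensus module is not described in detail, so the sketch below swaps in a much simpler technique, plain rank averaging, only to illustrate the idea of a consensus across tools. It assumes that a lower score means a better predicted binder for every tool; the function name and input layout are hypothetical.

```python
from statistics import mean


def consensus_rank(scores_by_tool: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (molecule, mean rank) pairs sorted best-first.

    Each tool's raw scores are converted to ranks (1 = best, lower score
    assumed better), so tools with different score scales become comparable.
    """
    ranks: dict[str, list[int]] = {}
    for tool_scores in scores_by_tool.values():
        ordered = sorted(tool_scores, key=tool_scores.get)
        for rank, mol in enumerate(ordered, start=1):
            ranks.setdefault(mol, []).append(rank)
    return sorted(((mol, mean(r)) for mol, r in ranks.items()), key=lambda x: x[1])


if __name__ == "__main__":
    # Made-up scores for three molecules from three tools.
    scores = {
        "vina": {"a": -9.0, "b": -7.5, "c": -6.0},
        "ledock": {"a": -6.5, "b": -7.0, "c": -5.0},
        "rdock": {"a": -30.0, "b": -25.0, "c": -28.0},
    }
    for mol, avg in consensus_rank(scores):
        print(mol, round(avg, 2))
```

Working on ranks rather than raw scores avoids having to normalise scoring functions that use different units, at the cost of discarding the size of the score differences.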

The hit identification and drug discovery strategy consists of high-throughput docking to identify high-quality, developable LRRK2 modulators. Several large libraries of commercially available compounds (VitasM, LifeChemicals, MolPort, Enamine) were downloaded and prepared using the LigPrep preparation workflow from Maestro (Schrödinger, Inc). Read more...

High-throughput docking, with rescoring using structural waters and ligand conformational strain

Glide/Maestro/Drug Discovery suite (Schrödinger, Inc), SeeSAR (BioSolveIT), BIOVIA Pipeline Pilot.

N/A

We propose applying Cyclica’s Ligand Design massive library screening workflow, which has been successfully applied in several commercial hit-finding programs. The multi-scale workflow exhaustively screens Enamine’s REAL database, composed of 4.1 billion compounds, through a series of three successive predictive tasks of increasingly demanding computational requirements. Read more...

Ligand Design - Massive Library Screening

MatchMaker

Python-based ML stack (PyTorch, scikit-learn); BioPython computational biology toolkit; RDKit computational chemistry toolkit; various structural biology tools for structural analysis and visualization, including P2Rank, NGL viewer, AutoDock Vina.
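
The description above screens a library through successive tasks of increasing computational cost. A generic sketch of such a screening funnel is shown below. The stage scoring functions and cut-offs are placeholders supplied by the caller, and nothing here reflects the actual models used in the workflow.

```python
from typing import Callable, Iterable

# One funnel stage: (scoring function, number of molecules to keep).
# Higher scores are assumed to be better.
Stage = tuple[Callable[[str], float], int]


def run_funnel(library: Iterable[str], stages: list[Stage]) -> list[str]:
    """Score the survivors at each stage and pass only the top ones on.

    Cheap scorers should come first so that expensive scorers only see
    a small fraction of the original library.
    """
    survivors = list(library)
    for score, keep in stages:
        survivors = sorted(survivors, key=score, reverse=True)[:keep]
    return survivors
```

With three stages that keep, say, 6, 3 and 1 molecules, the most expensive scorer is evaluated only three times, however large the starting library is.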

Library. The library for the screening will be composed of a large set of purchasable molecules (from ZINC20) enriched by a library of molecules that exist or are synthesisable at the Applicant’s MedChem group. Traffic-Light (TL) Pre-filtering. Different filters will address the “Traffic Light (TL) criteria”: molecular fingerprints as well as molecular descriptors (both 2D and 3D) will be used to calculate solubility in water, logD and the other TL parameters. Additional ADME filtering. Read more...

MIF-based Virtual Screening

GRID, FLAP, BioGPS, VolSurf+

At this stage of the challenge, the goal is to identify primary hits. In the absence of prior ligand information for the target, we aim to exploit state-of-the-art protein subpocket-fragment interactions for de novo design, via a 4-step workflow. In the first step, ligands from the sc-PDB database [1] (30,000 protein-ligand complexes in the 2022 archive) are fragmented in their co-crystallized conformation with an interaction-aware fragmentation method [2]. Read more...

ProCare

OpenEye Scientific Software (Santa Fe, NM 87508, U.S.A., https://www.eyesopen.com), Corina (Molecular Networks GmbH, 90411 Nürnberg, Germany)

IChem (free academic license), ProCare, DeLinker, Protoss (free academic license), PLANTS (free academic license), RDKit

Due to the lack of available screening data for molecules targeting the WD40 repeat, we will start by searching the Enamine Make-On-Demand library with our own novel evolutionary optimization algorithm RosettaLigandEvolution (LigEvol) and ligand docking with RosettaLigand. Both are part of the Rosetta software suite and developed by us. Read more...

RosettaLigand, RosettaLigandEvolution, BCL

Rosetta Suite, Biology and Chemistry Library (BCL), PyMOL, Python (RDKit and other free libraries)

The approach described herein will utilize computational and chemical intuition built from work that the team can bring together, including previous work on WDR5 and RBBP4 on design of compounds that affect structural integrity of homo-multimers and on protein folding in general. We would begin by identifying compounds de novo that would be expected to bind to Thr2356 and two other anchor residues in the "donut" region. Read more...

MM docking, MD, de novo design, fingerprint searching will be used

MOE

Our approach follows multiple stages that gradually funnel massive ligand libraries into hits, leads, and optimized leads. The stages combine earlier data-driven methods with later principle- and physics-driven methods, as follows. 1. Our own meta deep learning-based early-stage screening. Read more...

DeepAffinity

https://github.com/Shen-Lab/DeepAffinity (GPL-3.0), https://github.com/Shen-Lab/CPAC (GPL-3.0), https://github.com/Shen-Lab/Drug-Combo-Generator (Apache-2.0), https://github.com/ccsb-scripps/AutoDock-Vina (Apache-2.0), https://www.ks.uiuc.edu/Research/namd/license.html (Non-Exclusive, Non-Commercial Use License)

Summary: We will use structure-based ultra-large virtual screenings using VirtualFlow. Step 1: Protein preparation Protein structures will be prepared with Maestro from Schrödinger (protonation state assignment, assignment of missing atoms/side chains, hydrogen atoms, ...). MD simulations of the target protein will be carried out using Amber 18. Conformations will be clustered, and representative structures of the clusters will be used for the virtual screens. Read more...

VirtualFlow

Maestro (protein preparation)

VirtualFlow, AutoDock Vina, QuickVina, Smina, Plants, GWOVina

We plan to employ a dual prediction strategy. First, we will use a physics-based protocol with high-throughput docking with Schrödinger Glide (optionally combined with an active learning approach for higher throughput) or a Phase Pharmacophore model built based on the active site in WD40 domains. Further prioritization may be performed by MM-PBSA and absolute free energy calculations with Schrödinger FEP+. Read more...

Schrödinger suite

REINVENT, scikit-learn

We will use our expertise in cheminformatics, structure-based drug design (SBDD), medicinal chemistry, and machine learning (ML) to generate hits for LRRK2. Using our in-house drug discovery & cheminformatics platform (published in scientific literature, proprietary code), we will identify a suitable subset of compounds from the Enamine Real Database using various filters which follow medicinal chemistry standards & CACHE white paper guidelines. Read more...

Hybrid

in-house

Gromacs if MD is needed

To identify central cavity binders of the WD40 repeat (WDR) domain of LRRK2, we will follow a multi-strategy approach to increase the probability of finding potent hits. The current release of the Enamine REAL database, with ~4B compounds, will be screened using each strategy, and top-ranking compounds will be selected. Specifically, we propose to employ 4 strategies, and approximately 25 compounds will be selected from each strategy. Read more...

ToKenVS

OpenEye Software, Schrödinger Suite

RDKit, FTMap, MayaChemTools, PyMol, R program
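
The description above selects roughly 25 top-ranking compounds from each of 4 strategies. One way to merge such per-strategy rankings is sketched below. How overlaps between strategies are handled is not stated in the source; this sketch assumes a compound already picked by an earlier strategy is skipped so that each strategy still contributes its full quota. The function name and input layout are hypothetical.

```python
def select_per_strategy(ranked: dict[str, list[str]], per_strategy: int = 25) -> list[str]:
    """Pick the top `per_strategy` compounds from each best-first list.

    Compounds already chosen by an earlier strategy are skipped
    (an assumed de-duplication policy), so the result has no repeats.
    """
    selected: list[str] = []
    seen: set[str] = set()
    for compounds in ranked.values():
        picked = 0
        for compound_id in compounds:
            if picked == per_strategy:
                break
            if compound_id not in seen:
                seen.add(compound_id)
                selected.append(compound_id)
                picked += 1
    return selected
```

With 4 strategies and the default quota, the merged list holds at most 100 unique compounds, and fewer if a strategy's ranked list runs out.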

We developed a multi-scale and multi-task neural network to learn the interaction domains and binding affinities between compounds and proteins. The model takes graph representations of compounds and proteins as input. The compound is processed by a physics-driven graph neural network that integrates geometry and momentum information into the topological structure, while the protein is processed by a multi-scale graph neural network that connects surface to structure and sequence. Read more...

Multi-scale drug-protein interaction prediction (MS-DPI)

None

Python, Torch, RDKit, Biopython, P2Rank

Our hit identification technique involves three stages. The first involves aggregating a list of initial compounds. We utilize our in-house database of in-stock and on-demand compounds from various vendors (MCule, Enamine, etc) and aggregators (ZINC) (see citations). We also utilize a database of synthetically accessible compounds created through computationally running known synthetic reaction pathways (SAVI). Read more...

Ensemble-Based Docking

OpenEye Toolkit

OpenMM, RDKit, PyTorch

I have developed a genetic algorithm (GA) that can search Enamine Real Space and will use it to find molecules with good docking scores to the target. Read more...

Synthon-GA

Glide for docking (though open source docking programs can also be used).

Synthi, Synthon-GA
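
The entry above describes a genetic algorithm that searches a combinatorial chemical space for molecules with good docking scores. The toy sketch below illustrates the general idea only. It assumes that each molecule is encoded as one synthon index per reaction position and that `score` is any caller-supplied function where lower is better (standing in for a docking score). It is not the author's Synthon-GA, and all names and parameters are hypothetical.

```python
import random


def synthon_ga(synthon_counts, score, generations=30, pop_size=20,
               mutation_rate=0.2, seed=0):
    """Search synthon combinations with a simple elitist genetic algorithm.

    synthon_counts: number of available synthons at each position.
    score: callable taking a tuple of indices; lower is better.
    """
    rng = random.Random(seed)

    def random_individual():
        return tuple(rng.randrange(n) for n in synthon_counts)

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)
        parents = population[: pop_size // 2]   # elitism: keep the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each position comes from one of the parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: occasionally swap in a random synthon.
            for pos, n in enumerate(synthon_counts):
                if rng.random() < mutation_rate:
                    child[pos] = rng.randrange(n)
            children.append(tuple(child))
        population = parents + children
    return min(population, key=score)
```

Because the best half of each generation is carried over unchanged, the best score never gets worse from one generation to the next. In a real setting, `score` would be far more expensive than the GA bookkeeping, so caching scores for already evaluated combinations would be a natural addition.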

FRASE-bot is a computational platform enabling de novo construction of small-molecule ligands directly in the binding pocket of a target protein. It makes use of machine learning to distill 3D information relevant to the protein of interest from thousands of 3D protein-ligand complexes in the Protein Data Bank (PDB) and respective structure-activity relationships (SAR). Read more...

FRASE-bot

Pipeline Pilot, Schrödinger suite

KNIME, RDKit, PyTorch

Our goal with this competition is to evaluate our released tools (rather than new methods). Pharmit (http://pharmit.csb.pitt.edu) will be used to perform pharmacophore screens of purchasable compounds. Pharmacophores will be constructed in a mostly manual process through inspection of known binders and docked fragments. We will screen all the commercial libraries in Pharmit (MolPort, MCULE, Chemspace, LabNetwork). Read more...

Pharmit+gnina

Pharmit, gnina

We will perform equilibrium Molecular Dynamics (MD) calculations, using the Amber suite, with a mix of co-solvent molecules at different concentrations (co-solvent MD simulations). In co-solvent MD simulations, water and co-solvents compete for pockets on the protein surface. The simulation results provide insight into desolvation contributions and into the interactions between co-solvents and the interaction surfaces of LRRK2's WD40 domain. Read more...

Co-solvent molecular dynamics informed pharmacophore screening

AMBER, Schrödinger suite, Nanome

Chimera, ChimeraX, Packmol