CHALLENGE #2 – COMPUTATIONAL METHODS
Here is a list of all computational methods used for hit identification in CACHE challenge #2.
Our goal with this competition is to evaluate our released tools, namely gnina (https://github.com/gnina) and pharmit (http://pharmit.csb.pitt.edu).
- Selection of 10,000 compounds: in-stock compounds available for purchase will be obtained from the ZINC database, and Morgan fingerprints will be computed for them using RDKit with BAMBU (https://pypi.org/project/bambu-qsar/). Outliers and dataset dimensionality will then be reduced using Principal Component Analysis (PCA), preserving 95% of the variance, followed by the UMAP algorithm, which reduces the data to two dimensions.
AutoDock Vina 1.1.2, AutoDock Tools, Primordia, RDKit, BINANA, PaDEL Descriptors, ZINC Database, PDB Database, AlphaFold EBI Database, GROMACS, PyMOL, VMD, Python and Biopython
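The PCA step in the 10,000-compound selection can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the applicants' pipeline: the random binary matrix is a hypothetical stand-in for Morgan fingerprint bit vectors (which in practice would come from RDKit via BAMBU), and `pca_keep_variance` is a hypothetical helper name.

```python
import numpy as np


def pca_keep_variance(X: np.ndarray, keep: float = 0.95) -> np.ndarray:
    """Project the rows of X onto the fewest principal components that
    together explain at least `keep` of the total variance."""
    Xc = X - X.mean(axis=0)                      # center every feature (fingerprint bit)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = S**2 / np.sum(S**2)                  # variance fraction per component
    k = int(np.searchsorted(np.cumsum(ratio), keep)) + 1
    k = min(k, Vt.shape[0])                      # guard against rounding when keep is ~1.0
    return Xc @ Vt[:k].T                         # component scores, shape (n_samples, k)


# Hypothetical stand-in for 2048-bit Morgan fingerprints of 200 compounds.
rng = np.random.default_rng(0)
fps = (rng.random((200, 2048)) < 0.05).astype(float)
reduced = pca_keep_variance(fps, keep=0.95)
print(reduced.shape)
```

The two-dimensional embedding would then be computed on `reduced`, for example with `umap.UMAP(n_components=2).fit_transform(reduced)` from the umap-learn package. scikit-learn's `PCA(n_components=0.95, svd_solver="full")` applies the same variance-based cutoff.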
Our hit identification workflow combines physics-based cheminformatics methods with novel machine learning algorithms. First, we perform fragment-based virtual screening, significantly accelerated by our novel pharmacophore matching algorithm. Second, we enrich the pool of potential hits with de novo generated drug-like candidates. These candidates are then ranked and refined using a sequence of binding affinity estimation techniques of increasing accuracy.
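Ranking with "binding affinity estimation techniques of increasing accuracy" is commonly organized as a cascade, where cheap scorers prune the pool before expensive ones run. The sketch below assumes that structure; the toy scorers are hypothetical placeholders for real methods such as docking, rescoring and free-energy estimates, and lower scores are treated as better.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps a candidate (e.g. a SMILES string) to a score; lower = better here.
Scorer = Callable[[str], float]


def cascade_rank(candidates: Sequence[str],
                 stages: Sequence[Tuple[Scorer, int]]) -> List[str]:
    """Re-rank candidates with scorers of increasing cost and accuracy,
    keeping only the best `keep` candidates after each stage."""
    pool = list(candidates)
    for score, keep in stages:
        pool = sorted(pool, key=score)[:keep]    # cheap stages prune before expensive ones run
    return pool


def fast(smi: str) -> float:
    """Hypothetical cheap scorer: shorter SMILES rank first."""
    return float(len(smi))


def slow(smi: str) -> float:
    """Hypothetical expensive scorer: more nitrogens rank first."""
    return -float(smi.count("N"))


print(cascade_rank(["C", "CCN", "CNN", "CCCCN"], [(fast, 3), (slow, 1)]))
```

Each stage only sees the survivors of the previous one, so the most expensive method runs on the smallest set.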
- Molecular Operating Environment (MOE)
- Glide (Schrödinger)
- Q-Chem
- Gaussian
- AutoDock Vina
- Protein-Ligand ANT System (PLANTS)
- GROMACS
- Dock 3.7 (Kuntz Group UCSF)
Using our expertise in medicinal chemistry, structural biology, cheminformatics, machine learning (ML) and structure-based drug design (SBDD) we will generate hits for the RNA-binding site of the SARS-CoV-2 NSP13 helicase.
In-house
GROMACS and AMBER (if required).
Our hit identification and drug discovery strategy consists of high-throughput docking to identify modulators of the NSP13 helicase of SARS-CoV-2.
Schrödinger's Drug Discovery Suite, BioSolveIT SeeSAR, MolSoft ICM.
None.
Abstract
Schrödinger SMD Suite, GRID, FLAP, BioGPS
CmDock, PyMOL, Q, R, Python, RDKit, KNIME, ProBiS
Our approach consists of two general steps, each of which has some flexibility.
None
Python
AutoDock Vina
PyTorch, GPyTorch, BoTorch
Our proposed pipeline consists of four steps. As a preliminary step, because four similar protein structures (PDB entries) are provided for this CACHE challenge, we will run unrestrained MD simulations of all four and compare the resulting Boltzmann distributions. If no major differences are found, we will limit further steps to PDB 5RLZ.
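One way to test whether the four structures sample different ensembles is to histogram a collective variable from each trajectory and compare the histograms. The sketch below uses the Jensen-Shannon divergence; the choice of collective variable (here a hypothetical pocket RMSD), the metric itself and the simulated data are assumptions, since the source does not say how "no major differences" will be judged.

```python
import numpy as np


def js_divergence(a, b, bins: int = 50) -> float:
    """Jensen-Shannon divergence (log base 2, so 0 <= JS <= 1) between the
    histograms of two 1D samples, e.g. a collective variable from two MD runs."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    p, _ = np.histogram(a, bins=bins, range=(lo, hi))   # shared bin edges for both samples
    q, _ = np.histogram(b, bins=bins, range=(lo, hi))
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(x, y):
        mask = x > 0                              # terms with x = 0 contribute nothing
        return float(np.sum(x[mask] * np.log2(x[mask] / y[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical pocket-RMSD samples (Å) from two of the four structures.
rng = np.random.default_rng(1)
run_a = rng.normal(1.5, 0.2, 5000)
run_b = rng.normal(1.6, 0.2, 5000)
print(round(js_divergence(run_a, run_b), 3))
```

Values near 0 indicate overlapping distributions; any cutoff for "major differences" would have to be chosen by the team.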
Gaussian
OpenMM, OpenForceField, GROMACS, MDAnalysis, AmberTools, AutoDock Vina, LeDock, PLANTS, internally developed machine learning models (MILCDock)
Our team of computational chemists and machine learning experts is part of a science CRO that has a continuous impact on the drug discovery community through collaborations with big pharma and the incubation of biotechs. In drug discovery projects, we prioritize compounds using our million-scale in-house compound database, which includes structures, bioactivities and physicochemical data.
MolSoft ICM-Pro
CCG MOE
AMBER
KNIME server
Python + libraries (OpenMM, RDKit, pandas, matplotlib, numpy)
PyRod
The project will begin with a structure-based analysis of the RNA binding cavity of NSP13, based on the crystal structure 7KRN, using molecular dynamics simulations together with in-house program PyRod [1,2] to sample interaction points in the binding pocket. Briefly, PyRod traces water molecules in protein binding cavities and generates dynamic maps describing the interaction patterns of the water molecules with respect to the protein.
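The core idea of tracing waters into a map can be illustrated by counting, voxel by voxel, how often a water oxygen occupies each region of the cavity across MD frames. This is a simplified sketch, not PyRod's implementation or API: PyRod additionally characterizes the interactions of those waters with the protein. The function name is hypothetical, and the coordinates are assumed to be pre-aligned to a common protein frame.

```python
import numpy as np


def water_occupancy_grid(frames, origin, size: float, spacing: float = 1.0) -> np.ndarray:
    """Fraction of MD frames in which each cubic voxel of a grid placed over
    the binding cavity contains at least one water oxygen.

    frames: iterable of (n_waters, 3) arrays of oxygen coordinates (Å),
            assumed already aligned to a common protein frame.
    origin: (3,) corner of the grid; size: edge length of the cubic grid (Å).
    """
    n = int(np.ceil(size / spacing))
    edges = [origin[d] + spacing * np.arange(n + 1) for d in range(3)]
    occupied = np.zeros((n, n, n))
    n_frames = 0
    for xyz in frames:
        counts, _ = np.histogramdd(xyz, bins=edges)   # waters per voxel in this frame
        occupied += counts > 0                         # count each voxel at most once per frame
        n_frames += 1
    return occupied / n_frames


# Two hypothetical frames of water-oxygen coordinates.
frames = [np.array([[0.5, 0.5, 0.5]]),
          np.array([[0.5, 0.5, 0.5], [1.5, 0.5, 0.5]])]
grid = water_occupancy_grid(frames, origin=np.zeros(3), size=2.0)
print(grid[0, 0, 0], grid[1, 0, 0])
```

Highly occupied voxels mark conserved hydration sites, which are natural candidates for pharmacophore features.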
InteLigand - LigandScout
CCG - MOE
Schrodinger - Desmond
CCDC - GOLD
OpenEye - Szybki
PyRod
De novo hit identification will be pursued using a fragment growing/linking approach, followed by free energy calculations (time permitting). Designed compounds will be used as queries in a similarity screen of the Enamine REAL Database catalog, or synthesized directly in house:
N/A
FEgrow: https://github.com/cole-group/FEgrow
gnina: https://github.com/gnina/gnina
DeLinker: https://github.com/oxpig/DeLinker
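The similarity screen described above ranks catalog compounds by their fingerprint similarity to each designed compound. The minimal sketch below uses Tanimoto similarity on fingerprints represented as sets of on-bit indices. The representation, the 0.6 threshold and the compound IDs are hypothetical; a screen of the full Enamine REAL catalog would rely on dedicated search tools rather than a Python loop.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_screen(query: Set[int], library: Dict[str, Set[int]],
                      threshold: float = 0.6) -> List[Tuple[str, float]]:
    """Return (catalog ID, similarity) for every library entry at or above
    `threshold`, most similar first."""
    hits = [(cid, tanimoto(query, fp)) for cid, fp in library.items()]
    return sorted([h for h in hits if h[1] >= threshold], key=lambda h: h[1], reverse=True)


designed = {1, 4, 7, 9}                          # hypothetical fingerprint of a designed compound
catalog = {"cmpd-A": {1, 4, 7, 9, 12}, "cmpd-B": {2, 3, 5}, "cmpd-C": {1, 4, 8}}
print(similarity_screen(designed, catalog))
```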
We present an end-to-end lead optimization system for drug discovery based on an AI gym environment called "Reinforcement Learning for Molecular Modeling" (RLMM). RLMM automates running fully customizable molecular dynamics simulations inside an agent-based molecular design protocol. RLMM is fully autonomous: from a single starting ligand, protein structure and configuration file, it cycles through designs for lead optimization informed by physics-based simulations.
OpenEye
RDKit, OpenMM, AMBER20
We will identify the most conserved residues of the NSP13 RNA-binding tunnel where co-crystallized fragments are bound (PDB: 5RML, 5RMM, 5RLZ and 5RLH) by performing a multiple sequence alignment (MSA) with the Kalign algorithm on approximately 200,000 SARS-CoV-2 NSP13 sequences from NCBI. We will then determine the amino acids close to these fragments that can form interactions with the predicted hit molecules.
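Once Kalign has produced the alignment, per-column conservation can be scored and mapped back onto residue numbers of a reference sequence. The sketch below assumes the alignment is already loaded as equal-length strings with "-" for gaps; the conservation measure (fraction of sequences carrying the most common residue), the 0.99 cutoff and the function names are illustrative choices, not the applicants' stated method. Sequence positions may also need an offset to match PDB residue numbering.

```python
from collections import Counter
from typing import List


def column_conservation(alignment: List[str]) -> List[float]:
    """For each MSA column, the fraction of sequences carrying the most
    common non-gap residue (gaps count against conservation)."""
    n_seqs = len(alignment)
    scores = []
    for column in zip(*alignment):
        residues = Counter(r for r in column if r != "-")
        top = residues.most_common(1)[0][1] if residues else 0
        scores.append(top / n_seqs)
    return scores


def conserved_reference_residues(alignment: List[str], ref_index: int = 0,
                                 cutoff: float = 0.99) -> List[int]:
    """1-based positions in the reference sequence whose column meets `cutoff`."""
    ref = alignment[ref_index]
    positions, resnum = [], 0
    for col, score in zip(ref, column_conservation(alignment)):
        if col != "-":                            # gap columns have no reference residue
            resnum += 1
            if score >= cutoff:
                positions.append(resnum)
    return positions


aln = ["MKV-L", "MRV-L", "MKIAL"]                 # hypothetical toy alignment
print(column_conservation(aln))
print(conserved_reference_residues(aln))
```

Intersecting these positions with residues within a distance cutoff of the bound fragments would give the conserved pocket residues to target.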
- Molecular Operating Environment (MOE) by the Chemical Computing Group
- OpenEye: FastROCS (hit identification using shape information), OMEGA (conformer generation), MakeReceptor (binding pocket preparation for docking) and FRED (docking)
- Pipeline Pilot (Biovia)
- Kalign
- In-house MoPBS pharmacophore generation software
- In-house VS streamlining software DataPype
In-stock 3D molecules from the ZINC20 database or Mcule purchasable molecules will be deduplicated and subjected to common filters, after which conformers will be generated.
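The deduplicate-then-filter step can be sketched over records with precomputed properties. The "common filters" are not specified in the source, so this sketch assumes a strict form of Lipinski's rule of five (no violations allowed). The record keys are hypothetical, and the InChIKey values are short placeholders, not real keys.

```python
from typing import Dict, Iterable, List

# Assumed example filter: strict Lipinski rule of five (the applicants' filters are unspecified).
RO5_LIMITS = {"mw": 500.0, "logp": 5.0, "hbd": 5, "hba": 10}


def dedup_and_filter(records: Iterable[Dict]) -> List[Dict]:
    """Drop repeated structures (keyed on InChIKey, first occurrence wins),
    then keep only records within every property limit."""
    seen, kept = set(), []
    for rec in records:
        key = rec["inchikey"]
        if key in seen:                           # same structure listed by another vendor
            continue
        seen.add(key)
        if all(rec[prop] <= limit for prop, limit in RO5_LIMITS.items()):
            kept.append(rec)
    return kept


recs = [
    {"id": "Z1", "inchikey": "AAA", "mw": 320.4, "logp": 2.1, "hbd": 1, "hba": 4},
    {"id": "M7", "inchikey": "AAA", "mw": 320.4, "logp": 2.1, "hbd": 1, "hba": 4},
    {"id": "Z2", "inchikey": "BBB", "mw": 612.7, "logp": 6.3, "hbd": 3, "hba": 9},
]
print([r["id"] for r in dedup_and_filter(recs)])
```

Deduplicating first avoids filtering, and later generating conformers for, the same structure twice when it appears in both ZINC20 and Mcule.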
BIOVIA Discovery Studio Client
RDKit, AutoDock Vina, AutoDock, graphDelta, GROMACS, gmx_MMPBSA
We propose to apply a massive library screening workflow which exhaustively screens the 4.5 billion compound Enamine REAL database using a deep-learning-based Drug Target Interaction (DTI) prediction engine to identify molecules likely to bind to the RNA binding site of NSP13 helicase of SARS-CoV-2.
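Exhaustively scoring billions of compounds only works if the full library is never held in memory. A minimal sketch of that pattern streams molecules through a predictor and keeps the k best with `heapq.nlargest`, which retains only k items at a time. Here `predict` is a hypothetical placeholder for the deep-learning DTI engine, and higher scores are assumed to mean more likely binders.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def screen_top_k(library: Iterable[str], predict: Callable[[str], float],
                 k: int) -> List[Tuple[float, str]]:
    """Score a (possibly enormous) stream of molecules and return the k best
    (score, SMILES) pairs, highest predicted score first. heapq.nlargest keeps
    only k items in memory, so the library never has to fit in RAM."""
    return heapq.nlargest(k, ((predict(smi), smi) for smi in library))


# Toy run: SMILES length stands in for a predicted binding score.
print(screen_top_k(["CCO", "c1ccccc1", "CC(=O)O", "N"], predict=len, k=2))
```

In practice the library would be read lazily from disk and predictions batched for the GPU; the selection logic stays the same.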
MatchMaker (Cyclica Inc.)
Python-based ML stack (PyTorch, scikit-learn)
BioPython computational biology toolkit
RDKit computational chemistry toolkit
Various structural biology tools for structural analysis and visualization, including P2Rank, NGL viewer, Autodock Vina.
For hit identification we rely on a combination of methods used and developed in our group. The workflow starts with visual inspection and quality assessment of all structures in order to choose the most suitable virtual screening workflow.
We plan to expand the co-crystallized fragments in parallel using two main approaches:
- Pipeline Pilot (BioVia)
- Spark, Ignite (Cresset)
- Glide, ABFEP (Schrodinger)
- FlexX, Ftrees, SpaceLight (BioSolveIT)
- Synthesia (Univ. Hamburg; free for academic use)
- Arthor (NextMove)
I have developed a genetic algorithm (GA) that can search Enamine Real Space and will use it to find molecules with good docking scores to the target.
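A synthon-based genetic algorithm can be sketched by encoding each molecule as one synthon index per reaction position. This is a generic GA under stated assumptions, not the author's Synthon-GA: the encoding, truncation selection, uniform crossover and mutation rate are illustrative choices, and `fitness` is a hypothetical placeholder for a docking score (negated so that higher is better).

```python
import random
from typing import Callable, List, Sequence, Tuple

Genome = Tuple[int, ...]   # one synthon index per reaction position (hypothetical encoding)


def synthon_ga(n_synthons: Sequence[int], fitness: Callable[[Genome], float],
               pop_size: int = 20, generations: int = 50,
               mut_rate: float = 0.2, seed: int = 0) -> Genome:
    """Maximize `fitness` (e.g. a negated docking score) over synthon combinations."""
    rng = random.Random(seed)
    pop: List[Genome] = [tuple(rng.randrange(n) for n in n_synthons) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]           # truncation selection; the best survive unchanged
        children: List[Genome] = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]     # uniform crossover
            for i, n in enumerate(n_synthons):
                if rng.random() < mut_rate:
                    child[i] = rng.randrange(n)                  # mutation: swap in another synthon
            children.append(tuple(child))
        pop = parents + children
    return max(pop, key=fitness)


def toy_fitness(genome: Genome) -> float:
    """Hypothetical stand-in for a (negated) docking score."""
    return float(sum(genome))


best = synthon_ga((10, 10, 10), toy_fitness, generations=30)
print(best, toy_fitness(best))
```

Because parents are carried over unchanged, the best fitness in the population never decreases between generations, and only the fitness calls would require enumerating and docking actual products.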
Glide, SmallWorld, FTrees
Synthon-GA
Our approach combines the expertise of the Kozakov Lab at Stony Brook and the Tropsha Lab at UNC. Our workflow uses several complementary modules to identify high-affinity hits for a protein target with a known 3D structure. Binding-site hot-spot identification and conventional structure-based virtual screening methods are the enabling components of our hit selection approach.
Glide, to further verify nominated hits
Autodock, FTMap, LigTBM, ReLeaSE
We developed a multi-scale, multi-task neural network that learns binding poses and binding affinities between compounds and proteins. The model takes geometric graph representations of compounds and proteins as input. Compounds are processed by a physics-driven graph neural network that integrates geometry and momentum information into the topological structure, while proteins are processed by a multi-scale graph neural network that connects surface, structure and sequence.
Python, Torch, RDKit, Biopython, P2Rank
FRASE-bot is a computational platform enabling de novo construction of small-molecule ligands directly in the binding pocket of a target protein. It makes use of machine learning to distill 3D information relevant to the protein of interest from thousands of 3D protein-ligand complexes in the Protein Data Bank (PDB) and respective structure-activity relationships (SAR).
Schrodinger, Pipeline Pilot
RDKit, Keras/TensorFlow
Our team will recommend compounds predicted to bind to the RNA-binding site of the SARS-CoV-2 helicase NSP13 and with potential for subsequent medicinal chemistry optimization. To this end, we will first filter a set of commercially available molecules (including those suggested in the CACHE guidelines) to reduce potential safety liabilities and undesired chemical reactivity and maximize lead-likeness. This step will also considerably reduce the chemical space that needs to be considered.
StarDrop
Autodock, PSOVina2, GWOVina, RF-Score-VS v2, SCORCH, Osiris DataWarrior, PDB2PQR, OpenBabel, RDKit
Modular synthon-based approach: V-SYNTHES, published in Nature 601, 452–459 (2022), first identifies the best scaffold–synthon combinations as seeds suitable for further growth, and then iteratively elaborates these seeds to select complete molecules with the best docking scores.
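The seed-then-grow idea can be illustrated with a generic beam search that keeps only the best partial molecules before adding the next synthon. This is a simplified analogue, not the published V-SYNTHES algorithm, which docks capped scaffold–synthon fragments and then enumerates full products from the top seeds. The synthon labels and the `score` function are hypothetical; higher scores are better, for example a negated docking score.

```python
from typing import Callable, List, Sequence, Tuple

Partial = Tuple[str, ...]   # synthons chosen so far, one per position (hypothetical)


def grow_from_seeds(synthons_per_position: Sequence[Sequence[str]],
                    score: Callable[[Partial], float],
                    beam_width: int = 2) -> List[Partial]:
    """Build molecules one synthon position at a time, keeping only the
    `beam_width` best-scoring partial molecules (seeds) before each extension."""
    beams: List[Partial] = [()]
    for options in synthons_per_position:
        candidates = [beam + (s,) for beam in beams for s in options]
        beams = sorted(candidates, key=score, reverse=True)[:beam_width]
    return beams


def toy_score(partial: Partial) -> float:
    """Hypothetical stand-in for docking a (partial) molecule."""
    return float(sum(ord(s) for s in partial))


print(grow_from_seeds([["a", "b"], ["c", "d"]], toy_score))
```

The cost saving comes from scoring only `beam_width` times the number of options at each position, rather than every full combination.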
ICM-Pro is provided by MolSoft.
RDKit, KNIME
Foldit is a crowd-sourced molecular biology game. For this challenge, Foldit players will use the graphical small-molecule design tools to manually add atoms, bonds and fragments to a starting ligand within the binding pocket (derived from the crystal structures with starting fragments), optimizing the designed ligand for binding to the protein pocket.
Foldit/Rosetta/RDKit/ZINC API/BCL/OpenBabel
We will identify hit compounds using a Structural Systems Pharmacology (SSP) scheme (1-2). At the core of this scheme is the function-site interaction fingerprint (Fs-IFP) approach (3), which we use to explore structural insights into binding sites across the whole structural proteome. The SSP scheme additionally combines MD simulations, free energy calculations and machine learning models.
IChem (from Dr. Rognan's group)
Autodock Vina (Docking)
Acemd (MD simulations)