Computational methods

Hit Identification

Method type (check all that apply)

Deep learning

High-throughput docking

Machine learning

Physics-based

Hybrid of the above

Our CODASS 4.0 workflow has components of all the boxes checked above (see details below)

Description of your approach (min 200 and max 800 words)

CODASS4

Our previous proposal to CACHE3 was ranked joint 6th out of 33 entries by the CACHE3 peer review process. Here we describe a substantially enhanced version. CODASS4 relies on the data fusion concept of consensus to significantly boost the reliability of both its small molecule binding mode (herein referred to as “pose”) predictions and its predictions of ligand binding likelihoods. CODASS4 is a sophisticated process, and is therefore best understood when visualised as a workflow diagram:

https://docs.google.com/presentation/d/19pmIHdPchmF53ewnVFS2f1mmowjLS6FT/edit?usp=share_link&ouid=107110117710701955698&rtpof=true&sd=true

(please click on the link, or copy-paste to a web browser).

The improvements in CODASS4 include:

Consensus Methods:

Consensus Docking - also known as consensus posing or multi-docking (and not to be confused with consensus scoring), consensus docking leverages multiple docking programs to substantially boost the reliability of the predicted poses that are fed into the downstream scoring schemes. Rigorous evaluation by multiple research groups (including ours, which made the initial discovery) has shown this to hold true under a variety of conditions; see our publications and the following, which all reference our Houston & Walkinshaw 2013 publication that originally reported the concept:

https://pubmed.ncbi.nlm.nih.gov/27311630/

https://doi.org/10.2174/1573407214666181023114820

https://pubmed.ncbi.nlm.nih.gov/36615367/
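The pose-agreement test at the heart of consensus docking can be sketched as follows. This is a minimal illustration rather than the CODASS4 implementation: it assumes each program's top pose is supplied as a list of heavy-atom coordinates in the same atom order and the same receptor frame, it applies no symmetry correction, and the 2.0 Å cutoff and the program labels are placeholders chosen for the example.

```python
import math
from itertools import combinations


def rmsd(pose_a, pose_b):
    """RMSD between two poses given as equal-length lists of (x, y, z).

    Assumes identical atom ordering and a shared reference frame,
    so no superposition is performed.
    """
    if not pose_a or len(pose_a) != len(pose_b):
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(pose_a, pose_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(pose_a))


def consensus_pairs(poses_by_program, cutoff=2.0):
    """Return the pairs of programs whose poses agree within `cutoff` (in Å).

    The cutoff value is an assumption for this sketch.
    """
    return [
        (prog_a, prog_b)
        for prog_a, prog_b in combinations(sorted(poses_by_program), 2)
        if rmsd(poses_by_program[prog_a], poses_by_program[prog_b]) <= cutoff
    ]


def has_consensus(poses_by_program, cutoff=2.0):
    """A ligand passes if at least two programs predict matching poses."""
    return bool(consensus_pairs(poses_by_program, cutoff))
```

A ligand for which `has_consensus` returns `False` would, under this scheme, not be passed to the downstream scoring stage.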

In addition to our standard CODASS usage of Autodock-GPU, Vina-GPU+ and GWOVina, three new pose prediction methods have been added to our consensus docking scheme:

SILCS-MC: This newly adopted MD approach is substantially cheaper computationally than conventional MD when evaluating large numbers of compounds. Its consideration of local target backbone flexibility and atom-level target-water interactions gives it significant advantages over docking tools. https://doi.org/10.1021/acs.jcim.9b00210

DiffDock: Like SILCS-MC, DiffDock has been shown to give pose prediction accuracy that is insensitive to local variations in the protein backbone and side chains within the target’s ligand binding pocket. https://arxiv.org/abs/2210.01776v2

DeepDock: The geometric deep learning approach of DeepDock learns a potential that is specific for each ligand-target complex. Thus, it can be retrained using the valuable information represented by the 895 ligands with known IC50s, producing a target-specific version of this algorithm. See more below on the advantages of training target-specific approaches. https://doi.org/10.1038/s42256-021-00409-9

SAIYAN: We will be exploring the use of SAIYAN, our latest DL high-throughput, structure-based binding prediction software. SAIYAN uses an exclusive approach allowing it to leverage over an order of magnitude more training data than previous ML docking-based tools, improving its accuracy and generality. Its architecture leverages geometric deep learning and GPU parallelism, enabling ultra-fast inference across multi-billion chemical libraries.

Consensus Scoring

Classical Scoring Functions: a battery of methods based on force-field, knowledge-based, and machine learning approaches, with a proven track record in reliability (see our publications). An additional open-source method currently in late-stage in-house development, RFRanker, will be added to this consensus scoring scheme.

Generic DL Scoring Function: An evolution of the DL-based Scoring Function SCORCH (which we recently published, see relevant publications), SCORCH 2.0 is itself a consensus method: it combines the ML/DL methods of its predecessor SCORCH 1.0 (GBDT using XGBoost, a FF NN, and a W&D NN) in a new way, namely as a consensus model that averages their predictions. The result is superior screening and ranking power, as well as higher throughput.
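Consensus by average prediction can be illustrated with the short sketch below. It is not the SCORCH 2.0 code: the model labels and the assumption that every model emits one probability-like score per compound, in the same compound order, are ours for the purpose of the example.

```python
def average_consensus(predictions):
    """Average per-compound scores across models.

    `predictions` maps a model label to a list of scores, one per compound,
    with all lists in the same compound order. Returns the list of means.
    """
    columns = list(predictions.values())
    if not columns:
        raise ValueError("at least one model is required")
    n_compounds = len(columns[0])
    if any(len(column) != n_compounds for column in columns):
        raise ValueError("all models must score the same number of compounds")
    # zip(*columns) yields, for each compound, the tuple of its model scores.
    return [sum(scores) / len(columns) for scores in zip(*columns)]
```

For example, three hypothetical models scoring one compound at 0.9, 0.7 and 0.8 would yield a consensus score of about 0.8.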

Target-specific DL Scoring Function: Target-specific Scoring Functions are generally much more predictive than generic SFs (http://dx.doi.org/10.1016/j.ddtec.2020.09.001; https://doi.org/10.1016/j.cbpa.2021.04.009). One example in the literature is a prospective study that used a Scoring Function based on a docking-based QSAR model trained with as few as 47 target-docked actives: https://doi.org/10.1016/j.ejmech.2014.01.019. The 895 known ligands of CBLB enable the training of a target-specific Scoring Function, a trivial task for our group given our experience with training four different DL SFs (see above), particularly as there are now tools to facilitate the generation and validation of such SFs:

https://www.sciencedirect.com/science/article/pii/S1740674920300135?via%3Dihub#sec0020

Final consensus scores are calculated in two ways: 1. The conventional “Rank-by-rank” scheme originally reported to be superior by https://doi.org/10.1021/ci010025x and 2. The new “Exponential consensus ranking” method reported by https://doi.org/10.1038/s41598-019-41594-3
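The two consensus schemes can be sketched as below. This is a hedged illustration, not our production code: it assumes every scoring function has scored the same set of compounds, that higher raw scores are better, and that ties are broken by input order. Rank-by-rank is taken here as the mean rank across scoring functions, and exponential consensus ranking as the sum over scoring functions of exp(-rank/sigma)/sigma, which is our reading of the cited papers; the value of sigma is a free parameter.

```python
import math


def ranks(scores, higher_is_better=True):
    """Map each compound ID to its 1-based rank under one scoring function."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {compound: position + 1 for position, compound in enumerate(ordered)}


def rank_by_rank(score_tables):
    """Mean rank of each compound across scoring functions (lower is better)."""
    rank_tables = [ranks(table) for table in score_tables]
    return {
        compound: sum(table[compound] for table in rank_tables) / len(rank_tables)
        for compound in rank_tables[0]
    }


def exponential_consensus(score_tables, sigma):
    """Exponential consensus score of each compound (higher is better).

    Each scoring function contributes exp(-rank / sigma) / sigma, so top
    ranks dominate and poor ranks from a single function are damped.
    """
    rank_tables = [ranks(table) for table in score_tables]
    return {
        compound: sum(math.exp(-table[compound] / sigma) / sigma for table in rank_tables)
        for compound in rank_tables[0]
    }
```

Note the opposite sort directions: compounds are prioritised by ascending mean rank under rank-by-rank, but by descending score under exponential consensus ranking.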

Similarity Search

The second crucial element of CODASS is its Similarity Search component, which enables the screening of much larger virtual chemical libraries than structure-based methods alone. In addition, it enables the use of the 895 known ligands (with predicted binding poses supplied by our consensus docking scheme) to search for compounds with similar pharmacophores or protein-ligand interaction patterns but different chemical templates.

Free FEature POint PharmacophoreS (OpenFEPOPS): Our own open-source implementation of the FEPOPS scaffold-hopping molecular similarity evaluation method (https://doi.org/10.1016/j.jmgm.2007.02.005) has been integrated into CODASS4, complementing our existing 2D (FP2+Tanimoto), Graph (USRCAT) and 3D-based (Autodock-SS) similarity techniques for querying large chemical databases. This allows the identification of potential binders in accurate pose prediction regimes which would have been missed using other similarity matching techniques. Autodock-SS, USRCAT and OpenFEPOPS can all be considered “scaffold-hopping” methods, as they do not rely on molecular templates to perform their similarity evaluations and are thus suited to identify novel templates.
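The 2D fingerprint comparison step can be sketched as follows. This is an illustration only: fingerprints are represented as sets of on-bit indices, generating real FP2 fingerprints (for instance with Open Babel) is outside the sketch, and the compound names and the 0.7 threshold are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    # Two empty fingerprints share nothing, so report zero similarity.
    return len(a & b) / union if union else 0.0


def top_hits(query_bits, library, threshold=0.7):
    """Return (name, similarity) pairs at or above `threshold`, best first.

    `library` maps a compound name to its set of on-bit indices.
    The threshold default is an assumption for this sketch.
    """
    scored = [(name, tanimoto(query_bits, bits)) for name, bits in library.items()]
    hits = [hit for hit in scored if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

In a workflow of this kind, each known ligand would serve in turn as the query, with the library being the fingerprinted virtual chemical database.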

What makes your approach stand out from the community? (<100 words)

Our focus is on using data fusion to leverage a comprehensive battery of state-of-the-art and beyond-state-of-the-art structure-based and ligand-based methodologies in a sophisticated approach that has been expanded and refined over the last 11 years. Our rigorous in-house testing, against many different targets, in both prospective and retrospective studies, has shown that consensus methods (which include consensus pose prediction and the distinct consensus scoring) are more reliable than single methods used alone. CODASS4, our latest workflow, increases both predictive power and the size of chemical libraries that can be screened. It also provides built-in redundancy and thus contingency.

Method Name

COmbined Docking and Similarity Search 4.0 (CODASS4)

Commercial software packages used

None

Free software packages used

AutoDockTools, Autodock-GPU, Vina-GPU+, GWOVina, RF-Score-VS v2, SCORCH 2.0, Osiris DataWarrior, PDB2PQR, OpenBabel, RDKit, Autodock-SS, Filter-it, FEPOPS, SILCS (free to academics), Miniconda, GROMACS, USRCAT, OpenFEPOPS

Relevant publications of previous uses by your group of this software/method

SCORCH: Improving structure-based virtual screening with machine learning classifiers, data augmentation, and uncertainty estimation. doi: 10.1016/j.jare.2022.07.001.

Comparison of ATP-binding pockets and discovery of homologous recombination inhibitors. doi: 10.1016/j.bmc.2022.116923

Consensus Docking: Improving the Reliability of Docking in a Virtual Screening Context. doi:10.1021/ci300399w

Design of drug-like hepsin inhibitors against prostate cancer and kidney stones. doi: 10.1016/j.apsb.2019.09.008

Structure- and ligand-based virtual screening identifies new scaffolds for inhibitors of the oncoprotein MDM2. doi: 10.1371/journal.pone.0121424

Identification and activity of inhibitors of the essential nematode-specific metalloprotease DPY-31. doi: 10.1016/j.bmcl.2015.10.077

Inhibition of the ERCC1-XPF structure-specific endonuclease to overcome cancer chemoresistance. doi: 10.1016/j.dnarep.2015.04.002

Discovery of a novel ligand that modulates the protein-protein interactions of the AAA+ superfamily oncoprotein reptin. doi: 10.1039/c4sc03885a

UFSRAT: Ultra-fast Shape Recognition with Atom Types - the discovery of novel bioactive small molecular scaffolds for FKBP12 and 11βHSD1. doi: 10.1371/journal.pone.0116570

Gao Y, Houston DR. “A new Score Function Based on a Random Forest Model for Structure-based Virtual Screening”, manuscript in preparation

Gumbis G, Ben Y, Houston DR. "Creation of Pharmacophore Model for Small Molecule Inhibitors of T. brucei Phosphofructokinase and Analysis of Inhibitor-Protein Complex by Docking, Molecular Dynamics and SEQM", manuscript in preparation

Boyang N, Wang R, Khalaf H, Blay-Roger V, Houston DR. “Autodock-SS: AutoDock for Multiconformational Ligand-Based Virtual Screening”, manuscript in preparation.

Note: preliminary methodologies, data and findings for the above manuscripts in preparation will automatically be made open to the public in Mar/Apr 2023 in the form of MSc Dissertations, according to the University of Edinburgh’s Central Library publishing schedule:

https://www.sps.ed.ac.uk/students/postgraduate/taught-msc/your-studies/msc-taught-dissertations/msc-dissertation-library

The original CODASS 1.0 method was presented at the “Computational Chemical Biology: probing biology with in silico tools” Conference at The University of Manchester in 2012:

https://www.researchgate.net/publication/259216315_CODASS_A_New_Process_for_Ligand_Discovery_In_Silico

Our CODASS 3.0 workflow was described in detail in our CACHE3 application.

All of our tools, methodology and workflow (which we have named COmbined Docking And Similarity Search 4.0 or CODASS4), are open-source, and described in the literature (or soon will be). Our latest improvements are so new that some of them are not yet published. These are listed in the References section as Manuscripts in Preparation.

Virtual screening of merged selections

Method type (check all that apply)

Deep learning

High-throughput docking

Machine learning

Physics-based

Hit Optimization Methods

Method type (check all that apply)

Deep learning

High-throughput docking

Machine learning

Physics-based

Hybrid of the above

CODASS4.1

Description of your approach (min 200 and max 800 words)

Our method will be very similar to CODASS4, except that the additional Kd data will be added to the existing data used to re-train DeepDock and our target-specific SF, and the LBVS components of CODASS4 will come to the fore to provide compounds suitable for building a more detailed QSAR picture.

What makes your approach stand out from the community? (<100 words)

Our use of target-specific methods and incorporation of the additional binding data.

Method Name

CODASS4.1

Commercial software packages used

None

Free software packages used

Same as CODASS4

Relevant publications of previous uses by your group of this software/method

Same as CODASS4

Challenge #4