Computational methods

Hit Identification

Method type (check all that apply)

High-throughput docking

Machine learning

Physics-based

Description of your approach (min 200 and max 800 words)

Our Approach

Summary. The proposed methodology is built around Sq-SYNT, our in-house, flexible small-molecule design engine. The engine combines a suite of physics-based and data-driven protocols that can be assembled into a problem-specific workflow. In this proposal, we use a tailored design approach based on:

Fragment-guided exploration of large tracts of chemical space as represented by the Enamine REAL database.
Capturing key solvent-mediated interactions, which are expected to be important in ligand recognition within the CBLB binding site [PDB ID 8GCY] [1].
Building on the understanding of ligand recognition gained in the hit identification step, using an adapted linear interaction energy (adLIE) approach and a free energy perturbation (FEP) approach suited for ligand optimization [2,3].

Hit identification:

Step-1A: Identification of High Occupancy Solvent Sites in the Binding Pocket

The structure of CBLB bound to a small-molecule inhibitor has revealed the importance of bridging waters (PDB ID: 8GCY) [1]. Hence, as a first step, we will build a robust solvent network within the binding site prior to large-scale virtual screening. Including only the waters from the single inhibitor-bound structure could bias virtual screening towards scaffolds similar to the co-crystal ligand (isoindolin-1-one). To identify high-occupancy solvent molecules without this bias, molecular dynamics (MD) simulations will be performed on an apo-like (ligand-removed) CBLB structure with restraints on protein atoms. From the trajectories, high-occupancy solvent sites will be identified based on their energetics and translational mobility. These sites will be compared with the waters observed in all CBLB structures in the PDB. Besides the small-molecule inhibitor-bound structure, three CBLB structures with peptide inhibitors have been reported (PDB IDs: 3PFV, 3ZNI, and 5AXI) [4-6]. These peptide inhibitors bind to a different site than the small-molecule inhibitor, and the structures reveal a water network within the unoccupied small-molecule binding site.

Combinations of high-occupancy solvent sites from MD simulations and experimentally observed waters will be used to create multiple screening models of the CBLB binding site that differ in the waters included. To select the best model for virtual screening, a retrospective set of known ligands (from ChEMBL and the SDF file provided by CACHE) with IC50/Kd/Ki < 10 micromolar, along with property-matched decoys, will be created [7]. This retrospective set will then be docked against the different models, which will be evaluated for early enrichment and adjusted logAUC (aLogAUC) [8].
The model providing the best early enrichment and aLogAUC, along with consistent ligand poses, will be selected for prospective screening of the large Enamine REAL database.
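The two retrospective metrics used for model selection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Sq-SYNT implementation: lower docking scores are taken to be better, labels are 1 for known ligands and 0 for decoys, and the adjusted logAUC is computed over the 0.1-100% decoy range with the random-ranking expectation subtracted. The function names are hypothetical.

```python
import math

def _ranked(scores):
    # Indices sorted from best (lowest) to worst docking score.
    return sorted(range(len(scores)), key=lambda i: scores[i])

def enrichment_factor(scores, labels, fraction=0.01):
    """Hit rate in the top `fraction` of the ranked set divided by the overall hit rate."""
    n_top = max(1, int(round(fraction * len(scores))))
    hits_top = sum(labels[i] for i in _ranked(scores)[:n_top])
    return (hits_top / n_top) / (sum(labels) / len(labels))

def adjusted_log_auc(scores, labels, lam=0.001):
    """Area under the ROC curve on a log10 decoy-fraction axis, in percent,
    minus the value expected for random ranking (about 14.5% for lam=0.001).
    The lower bound `lam` (0.1% of decoys) is an assumption of this sketch."""
    n_act = sum(labels)
    n_dec = len(labels) - n_act
    tp = fp = 0
    points = [(0.0, 0.0)]  # (decoy fraction, ligand fraction)
    for i in _ranked(scores):
        if labels[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_dec, tp / n_act))
    area = 0.0
    for (x0, y0), (x1, _) in zip(points, points[1:]):
        if x1 <= lam or x1 == x0:
            continue  # left of the integration window, or a vertical step
        # Horizontal step: constant ligand fraction y0 between x0 and x1.
        area += y0 * math.log10(x1 / max(x0, lam))
    log_auc = 100.0 * area / math.log10(1.0 / lam)
    random_log_auc = 100.0 * (1.0 - lam) / math.log(1.0 / lam)
    return log_auc - random_log_auc

# Toy example: two known ligands ranked ahead of four decoys.
toy_scores = [-10.0, -9.0, -5.0, -4.0, -3.0, -2.0]
toy_labels = [1, 1, 0, 0, 0, 0]
print(enrichment_factor(toy_scores, toy_labels, fraction=0.5))  # 2.0
print(round(adjusted_log_auc(toy_scores, toy_labels), 1))       # 85.5 (perfect ranking)
```

Each candidate receptor model would be scored with both functions on the same retrospective set, so the water configurations can be compared on an equal footing.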

Step-1B: Virtual Screening on the Selected Receptor Model Using a Fragment-guided Approach

The guiding principle of our virtual screening strategy is to use a fragment set from the Enamine REAL library (HAC: 15-18 atoms) as a “probe” step towards exploring a larger chemical space within the same database. This provides the advantages of fragment-based design without the need for an experimental fragment screening step. First, compounds in the fragment library (HAC: 15-18 of Enamine REAL) that contain potentially toxic, reactive, or PAINS scaffolds will be removed using in-house SMARTS-based filters and data-driven toxicity prediction protocols. The “cleaned” set will then be docked against the selected receptor model using AutoDock [9]. The top-ranked molecules (approximately the top 1% of the library) will be clustered by 2D similarity and by physicochemical properties. The resulting clusters will be inspected, and a diverse scaffold set will be created from them.

In the next step, superstructures of these fragment-derived scaffolds with a heavy atom count of 21-25 will be retrieved from the Enamine REAL database using Tversky similarity. The selected HAC cutoff balances leaving sufficient room to further optimize the most promising hits against identifying strong binders below the Kd cutoff of 30 micromolar. This superstructure library will be docked against the same structural model using AutoDock [9]. The top-ranked molecules will then be visually inspected. Finally, a diverse set of 100 compounds will be submitted to the competition for experimental screening. In total, a large chemical subspace of the approximately 2 billion compounds in the Enamine REAL library will have been evaluated.
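The superstructure retrieval step can be illustrated with a small, dependency-free Python sketch. It assumes each molecule is represented by a set of fingerprint on-bit indices (in practice these would come from a cheminformatics toolkit such as RDKit) together with a precomputed heavy atom count. The asymmetric weights (alpha = 1, beta = 0), the 0.9 threshold, and the function names are illustrative assumptions, not the settings used in Sq-SYNT.

```python
def tversky(query_bits, candidate_bits, alpha=1.0, beta=0.0):
    """Tversky index between two fingerprint bit sets.

    With alpha=1 and beta=0 the score is the fraction of the query's
    features present in the candidate, so larger molecules that contain
    the fragment are not penalized for their extra features."""
    common = len(query_bits & candidate_bits)
    only_query = len(query_bits - candidate_bits)
    only_candidate = len(candidate_bits - query_bits)
    denominator = common + alpha * only_query + beta * only_candidate
    return common / denominator if denominator else 0.0

def find_superstructures(query_bits, library, hac_range=(21, 25), threshold=0.9):
    """Return (identifier, score) pairs for library members inside the
    heavy-atom-count window that score at or above the threshold.

    `library` is an iterable of (identifier, heavy_atom_count, bit_set)."""
    low, high = hac_range
    hits = []
    for identifier, heavy_atoms, bits in library:
        if low <= heavy_atoms <= high:
            score = tversky(query_bits, bits)
            if score >= threshold:
                hits.append((identifier, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Toy data: hypothetical identifiers and bit sets, not real compounds.
fragment = {1, 2, 3, 4}
toy_library = [
    ("cand-A", 22, {1, 2, 3, 4, 5, 6}),  # contains every fragment feature
    ("cand-B", 23, {1, 2, 7, 8}),        # shares only half of them
    ("cand-C", 30, {1, 2, 3, 4, 9}),     # outside the 21-25 HAC window
]
print(find_superstructures(fragment, toy_library))  # [('cand-A', 1.0)]
```

Setting beta to zero is what turns the similarity search into a superstructure-like search; a symmetric choice (alpha = beta = 1) would reduce to Tanimoto similarity and would penalize the grown analogs for their additional atoms.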

Feasibility of our hit identification approach at the TKB domain of CBLB:

Previously, 895 molecules were reported in the literature for this domain; they were provided as an SDF file in this challenge and have IC50 values in the range of 100-5000 nM. The average ligand efficiency (LE) calculated for these compounds is 0.28 kcal/mol/HA. With our approach, we aim to achieve Kd < 30 micromolar with a heavy atom count of 21-25, which translates to a minimum LE of approximately 0.25-0.29 kcal/mol/HA, in line with the previously achieved mean LE. Further, our virtual screening strategy combines two screening steps and considers the importance of solvent-mediated interactions within the CBLB site. Thus, a significant fraction of the hits is expected to have an LE better than the mean of the currently known ligands (0.28 kcal/mol/HA).
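The ligand efficiency arithmetic behind this estimate can be checked with a short Python sketch. It assumes the usual definition LE = -ΔG / HAC with ΔG = RT ln(Kd) and a temperature of 298.15 K; the temperature and the function names are assumptions made here for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def binding_free_energy(kd_molar, temperature=298.15):
    """Binding free energy in kcal/mol from a dissociation constant in mol/L."""
    return R_KCAL * temperature * math.log(kd_molar)

def ligand_efficiency(kd_molar, heavy_atoms, temperature=298.15):
    """Ligand efficiency in kcal/mol per heavy atom."""
    return -binding_free_energy(kd_molar, temperature) / heavy_atoms

# LE at the 30 micromolar cutoff for the two ends of the HAC window.
for heavy_atoms in (21, 25):
    print(heavy_atoms, round(ligand_efficiency(30e-6, heavy_atoms), 2))
# 21 0.29
# 25 0.25
```

A compound that just meets the 30 micromolar cutoff therefore has an LE of roughly 0.29 kcal/mol/HA at 21 heavy atoms and 0.25 at 25, close to the range quoted above and bracketing the 0.28 kcal/mol/HA mean of the known ligands.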

What makes your approach stand out from the community? (<100 words)

We believe the following aspects of our approach make it unique:

The inclusion of high-occupancy solvent molecules in the binding site using a blend of MD simulations and a ligand-guided modelling approach allows us to capture the key water-mediated interactions within the small-molecule binding site of CBLB.
An incremental, guided exploration of chemical space, with the heavy atom count increased in steps from 15 to 31 across the screening stages. This permits us to explore the large Enamine REAL database in an unbiased manner without allowing the approximate scoring functions of molecular docking to become a limiting factor.

Method Name

Solvent-Augmented and Fragment-guided large virtual screening using Sq-SYNT

Commercial software packages used

Free software packages used

GROMACS, AutoDock, Q6, Python, RDKit, ProTox, SwissADME

Relevant publications of previous uses by your group of this software/method

References:

[1] Co-crystal structure of CBL-B in complex with N-Aryl isoindolin-1-one inhibitor. Kimani, S., Zeng, H., Dong, A., Li, Y., Santhakumar, V., Arrowsmith, C.H., Edwards, A.M., & Halabelian, L. Structural Genomics Consortium (SGC), (to be published).

[2] “Adapted Linear Interaction Energy”: A Structure-Based LIE Parametrization for Fast Prediction of Protein–Ligand Affinities. Linder, M., Ranganathan, A., & Brinck, T. J. Chem. Theory Comput., (2013), 9(2), 1230-1239.

[3] Fragment optimization for GPCRs by molecular dynamics free energy calculations: probing druggable subpockets of the A2A adenosine receptor binding site. Matricon, P.*, Ranganathan, A.*, Warnick, E., Gao, Z.G., Rudling, A., Lambertucci, C., Marucci, G., Ezzati, A., Jaiteh, M., Dal Ben, D., Jacobson, K.A., & Carlsson, J. Sci. Rep., (2017), 7(1), 1-12. * Equal contribution authors

[4] Structure of Cbl-b TKB domain in complex with EGFR pY1069 peptide. Chaikuad, A., Guo, K., Cooper, C.D.O., Ayinampudi, V., Krojer, T., Muniz, J.R.C., Vollmar, M., Canning, P., Gileadi, O., von Delft, F., Arrowsmith, C.H., Weigelt, J., Edwards, A.M., Bountra, C., & Bullock, A. Structural Genomics Consortium (SGC), (to be published).

[5] Essentiality of a non-RING element in priming donor ubiquitin for catalysis by a monomeric E3. Dou, H., Buetow, L., Sibbet, G. J., Cameron, K., & Huang, D. T. Nat. Struct. Mol. Biol., (2013), 20(8), 982-986.

[6] Structural analysis of the TKB domain of ubiquitin ligase Cbl-b complexed with its small inhibitory peptide, Cblin. Ohno, A., Ochi, A., Maita, N., Ueji, T., Bando, A., Nakao, R., & Nikawa, T. Arch. Biochem. Biophys., (2016), 594, 1-7.

[7] Benchmarking sets for molecular docking. Huang, N., Shoichet, B. K., & Irwin, J. J. J. Med. Chem., (2006), 49(23), 6789-6801.

[8] Fragment-based discovery of subtype-selective adenosine receptor ligands from homology models. Ranganathan, A., Stoddart, L. A., Hill, S. J., & Carlsson, J. J. Med. Chem., (2015), 58(24), 9578-9590.

[9] AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility. Morris, G. M., Huey, R., Lindstrom, W., Sanner, M. F., Belew, R. K., Goodsell, D. S., & Olson, A. J. J. Comput. Chem., (2009), 30(16), 2785-2791.

Hit Optimization Methods

Method type (check all that apply)

Free energy perturbation

High-throughput docking

Machine learning

Physics-based

Description of your approach (min 200 and max 800 words)

Hit optimization: Up to five of the most promising hits will be prioritized after experimental screening at CBLB. Hit expansion and optimization will be performed using an adapted LIE (adLIE) approach and an FEP method suited for SAR exploration and hit optimization [2,3]. First, an extensive analog set for the selected hits will be constructed by finding superstructures of the core scaffolds (using Tversky similarity) in the Enamine REAL library (HAC < 31) and by identifying synthetically feasible analogs using named reactions. Both the adLIE and FEP methods rely on MD simulations with explicit solvent. The initial experimental results will be used to refine our solvent network analysis and to aid the creation of MD topologies for the adLIE and FEP calculations [2,3]. The analog libraries will first be screened using adLIE, and the top-ranked molecules by predicted binding free energy will be re-evaluated using FEP. The top ~50 analogs from the FEP calculations will be provided for experimental evaluation at the TKB domain of CBLB.
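The two-stage triage described above (a fast LIE-type estimate followed by FEP on the best candidates) can be sketched as follows. The sketch uses the standard LIE functional form with commonly quoted coefficients as placeholders; it does not reproduce the adLIE parametrization of ref. [2]. All function names and the toy energies are hypothetical.

```python
from statistics import mean

def lie_binding_energy(bound, free, alpha=0.18, beta=0.5, gamma=0.0):
    """Standard LIE estimate (kcal/mol) of the binding free energy.

    `bound` and `free` map 'vdw' and 'el' to per-frame ligand-surroundings
    interaction energies from MD of the complex and of the ligand in water.
    alpha, beta and gamma are placeholder values, not adLIE parameters."""
    d_vdw = mean(bound["vdw"]) - mean(free["vdw"])
    d_el = mean(bound["el"]) - mean(free["el"])
    return alpha * d_vdw + beta * d_el + gamma

def select_for_fep(lie_estimates, n_keep):
    """Names of the n_keep analogs with the most favourable LIE estimates."""
    return sorted(lie_estimates, key=lie_estimates.get)[:n_keep]

def relative_binding_free_energy(dg_complex, dg_solvent):
    """Relative binding free energy for transforming ligand A into B,
    from the two legs of the thermodynamic cycle (protein-bound and
    solvated)."""
    return dg_complex - dg_solvent

# Toy energies for one analog (kcal/mol), two frames per state.
bound = {"vdw": [-40.0, -42.0], "el": [-30.0, -28.0]}
free = {"vdw": [-20.0, -22.0], "el": [-25.0, -23.0]}
print(round(lie_binding_energy(bound, free), 2))  # -6.1

estimates = {"analog-1": -6.1, "analog-2": -7.5, "analog-3": -4.0}
print(select_for_fep(estimates, 2))  # ['analog-2', 'analog-1']
```

The design point is cost: the LIE-type estimate needs only end-state simulations, so it can rank a large analog library, while the more expensive alchemical FEP calculations are reserved for the shortlisted compounds.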

What makes your approach stand out from the community? (<100 words)

Our specialized adLIE and FEP protocols for hit expansion are founded on fragment optimization protocols, which makes them well suited to hit optimization. Furthermore, the optimization workflow is well equipped to leverage selective solvent displacement from partially desolvated sites and includes proprietary protocols to handle scaffold hopping if required.

Method Name

Sq-SYNT's flexible hit optimization protocol

Commercial software packages used

Free software packages used

GROMACS, AutoDock, Q6, Python, RDKit, ProTox, SwissADME

Relevant publications of previous uses by your group of this software/method

“Adapted Linear Interaction Energy”: A Structure-Based LIE Parametrization for Fast Prediction of Protein–Ligand Affinities. Linder, M., Ranganathan, A., & Brinck, T. J. Chem. Theory Comput., (2013), 9(2), 1230-1239.

Fragment optimization for GPCRs by molecular dynamics free energy calculations: probing druggable subpockets of the A2A adenosine receptor binding site. Matricon, P.*, Ranganathan, A.*, Warnick, E., Gao, Z.G., Rudling, A., Lambertucci, C., Marucci, G., Ezzati, A., Jaiteh, M., Dal Ben, D., Jacobson, K.A., & Carlsson, J. Sci. Rep., (2017), 7(1), 1-12. * Equal contribution authors

Challenge #4