Our Approach
Summary. The proposed methodology will be anchored by our in-house, flexible small-molecule design engine, Sq-SYNT, which leverages a suite of physics- and data-driven protocols that can be assembled into problem-specific workflows. In this proposal, we use a tailored design approach built on:
- Fragment-guided exploration of large tracts of chemical space as represented by the Enamine REAL database.
- Capturing key solvent-mediated interactions, which are expected to be important in ligand recognition within the CBLB binding site [PDB ID 8GCY] [1].
- Building on the understanding of ligand recognition gained during hit identification, using an adapted linear interaction energy (adLIE) and free energy perturbation (FEP) approach suited for ligand optimization [2,3].
Hit identification:
Step-1A: Identification of High-Occupancy Solvent Sites in the Binding Pocket: The structure of CBLB bound to a small-molecule inhibitor has revealed the importance of bridging waters (PDB ID: 8GCY) [1]. Hence, as a first step, we will incorporate a robust solvent network into the binding site prior to large-scale virtual screening. Including only the waters from this single small-molecule inhibitor-bound structure could bias virtual screening towards scaffolds similar to the co-crystallized ligand (an isoindolin-1-one). Therefore, to identify high-occupancy solvent sites in the binding site, molecular dynamics (MD) simulations will be performed on an apo-like (ligand-removed) CBLB structure with restraints on the protein atoms. From the trajectories, high-occupancy solvent sites will be identified using energetic and translational analyses of the simulated waters. These sites will be compared with the waters observed in all CBLB structures in the PDB. Besides the small-molecule inhibitor-bound structure, three CBLB structures with bound peptide inhibitors have been reported (PDB IDs: 3PFV, 3ZNI, and 5AXI) [4-6]. These peptides bind to a site different from that of the small-molecule inhibitor and therefore reveal a water network within the unoccupied small-molecule binding site. Combinations of high-occupancy solvent sites from MD simulations and experimentally observed waters will be used to create multiple screening models of the CBLB binding site that differ in the waters included. To select the best model for virtual screening, a retrospective set of known ligands (from ChEMBL and the SDF file provided by CACHE) with IC50/Kd/Ki < 10 micromolar, together with property-matched decoys, will be assembled [7]. This retrospective set will then be docked against each model and evaluated for early enrichment and adjusted logAUC (aLogAUC) [8].
The model providing the best early enrichment and aLogAUC, together with consistent ligand poses, will be selected for prospective screening of the Enamine REAL database.
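The retrospective evaluation ranks each receptor model by early enrichment and aLogAUC. The Python sketch below shows one way to compute these metrics from docking scores. It assumes that lower (more negative) scores are better, a logAUC window starting at FPR λ = 0.001 (a commonly used choice), and trapezoidal integration on a log10 FPR axis; the function names are hypothetical, and the docking software's own reporting may differ.

```python
import math
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int],
               lower_is_better: bool = True) -> List[Tuple[float, float]]:
    """ROC curve points (FPR, TPR) for a ranked screen; labels: 1 = active, 0 = decoy.

    Tied scores are broken arbitrarily by sort order (a simplification).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=not lower_is_better)
    n_act = sum(labels)
    n_dec = len(labels) - n_act
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_dec, tp / n_act))
    return points


def log_auc(points: Sequence[Tuple[float, float]], lam: float = 1e-3) -> float:
    """Area under the ROC curve with FPR on a log10 axis from lam to 1, scaled to [0, 1]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x1 <= lam or x1 == x0:
            continue  # segment lies left of the cutoff, or is vertical (zero width)
        if x0 < lam:
            # Clip the segment at FPR = lam, interpolating TPR linearly.
            y0 += (y1 - y0) * (lam - x0) / (x1 - x0)
            x0 = lam
        area += 0.5 * (y0 + y1) * (math.log10(x1) - math.log10(x0))
    return area / math.log10(1.0 / lam)


def adjusted_log_auc(points: Sequence[Tuple[float, float]], lam: float = 1e-3) -> float:
    """logAUC minus the value for a random ranking (TPR = FPR), about 0.145 for lam = 0.001."""
    random_log_auc = (1.0 - lam) / (math.log(10.0) * math.log10(1.0 / lam))
    return log_auc(points, lam) - random_log_auc


def enrichment_factor(scores: Sequence[float], labels: Sequence[int],
                      fraction: float = 0.01, lower_is_better: bool = True) -> float:
    """EF at a given fraction: active rate in the top-ranked subset / overall active rate."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=not lower_is_better)
    n_top = max(1, int(fraction * len(scores)))
    hits_top = sum(labels[i] for i in order[:n_top])
    return (hits_top / n_top) / (sum(labels) / len(labels))
```

Here `labels` would mark the known CBLB ligands as 1 and the property-matched decoys as 0. A perfect ranking gives logAUC = 1 and aLogAUC of about 0.855, while a random ranking gives aLogAUC close to 0, so the metric rewards models that place actives within the first ~0.1% of decoys.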
Step-1B: Virtual Screening against the Selected Receptor Model Using a Fragment-Guided Approach
The guiding principle of our virtual screening strategy is to use a fragment set from the Enamine REAL library (heavy atom count, HAC: 15-18) as a “probe” step for exploring a larger chemical space within the same database. This retains key advantages of fragment-based design without requiring a fragment screening step. First, the fragment library will be filtered to remove compounds containing potentially toxic, reactive, or PAINS substructures, using in-house SMARTS-based filters and data-driven toxicity prediction protocols. The “cleaned” set will then be docked against the selected receptor model using AutoDock [9]. The top-ranked molecules (approximately the top 1% of the screened fragment library) will be further scrutinized by clustering on 2D similarity and on physicochemical properties. The resulting clusters will be inspected, and a diverse scaffold set will be selected from them.
In the next step, superstructures of these fragment-derived scaffolds (identified by Tversky similarity) with a heavy atom count of 21-25 will be retrieved from the Enamine REAL database. This HAC range balances leaving sufficient room to further optimize the most promising hits against identifying strong binders below the Kd cutoff of 30 micromolar. The resulting superstructure library will be docked against the same receptor model using AutoDock [9], and the top-ranked molecules will be visually inspected. Finally, a diverse set of 100 compounds will be submitted to the competition for experimental screening. In total, a large chemical subspace of the ~2 billion compounds in the Enamine REAL library will have been evaluated.
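The superstructure step relies on asymmetric Tversky similarity. Weighting the query (fragment scaffold) bits heavily and the extra target bits lightly favors compounds that contain the scaffold. The Python sketch below assumes fingerprints are precomputed elsewhere (e.g. with a cheminformatics toolkit) and stored as sets of on-bit indices; the `Compound` record, the weights α = 0.9 and β = 0.1, and the 0.9 threshold are illustrative assumptions, not the parameters of our pipeline.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class Compound:
    """Hypothetical record: ID, heavy-atom count (HAC), and a 2D fingerprint
    stored as the set of 'on' bit indices (computed elsewhere, e.g. ECFP-like)."""
    compound_id: str
    heavy_atoms: int
    fingerprint: FrozenSet[int]


def tversky(query: FrozenSet[int], target: FrozenSet[int],
            alpha: float = 0.9, beta: float = 0.1) -> float:
    """Tversky similarity S = c / (c + alpha*|Q - T| + beta*|T - Q|), with c = |Q & T|.

    A large alpha and small beta penalize query bits missing from the target far
    more than extra target bits, so superstructures of the query score close to 1.
    """
    common = len(query & target)
    denom = common + alpha * len(query - target) + beta * len(target - query)
    return common / denom if denom else 0.0


def select_superstructures(scaffolds: Sequence[Compound], library: Iterable[Compound],
                           hac_min: int = 21, hac_max: int = 25,
                           threshold: float = 0.9) -> List[Tuple[str, float]]:
    """Keep library compounds inside the HAC window whose best Tversky score
    against any fragment-derived scaffold reaches the threshold, best first."""
    selected = []
    for cpd in library:
        if not hac_min <= cpd.heavy_atoms <= hac_max:
            continue  # outside the heavy-atom window used for superstructures
        best = max(tversky(s.fingerprint, cpd.fingerprint) for s in scaffolds)
        if best >= threshold:
            selected.append((cpd.compound_id, best))
    return sorted(selected, key=lambda item: item[1], reverse=True)
```

Fingerprint bit overlap is only a proxy for true substructure containment, so an exact substructure match on the retrieved compounds would be a natural follow-up check before docking.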
Feasibility of our hit identification approach at the TKB domain of CBLB:
Previously, 895 molecules with IC50 values in the range of 100-5000 nM have been reported in the literature for this domain; these were provided as an SDF file for this challenge. The average ligand efficiency (LE) calculated for these compounds is 0.28 kcal/mol/HA. With our approach, we aim to achieve Kd < 30 micromolar with a heavy atom count of 21-25, which corresponds to an LE of at least approximately 0.25-0.29 kcal/mol/HA (at 298 K), in line with the mean LE of previously reported ligands. Further, our virtual screening strategy combines two screening steps and accounts for solvent-mediated interactions within the CBLB site. We therefore expect a significant fraction of the hits to have an LE better than the mean of currently known ligands (0.28 kcal/mol/HA).
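The ligand-efficiency estimate above can be checked from LE = -ΔG/HAC with ΔG = RT ln Kd. The short Python sketch below reproduces the arithmetic, assuming a 1 M standard state and T = 298.15 K; for the literature compounds, IC50 would serve only as an approximate stand-in for Kd.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ligand_efficiency(kd_molar: float, heavy_atoms: int,
                      temperature: float = 298.15) -> float:
    """LE = -dG / HAC, with dG = RT ln(Kd) (1 M standard state), in kcal/mol per heavy atom."""
    delta_g = R_KCAL * temperature * math.log(kd_molar)
    return -delta_g / heavy_atoms


# Kd = 30 micromolar at the two ends of the 21-25 heavy-atom window.
for hac in (21, 25):
    print(hac, round(ligand_efficiency(30e-6, hac), 3))
# 21 0.294
# 25 0.247
```

Because LE scales inversely with HAC, a 30 micromolar binder at the upper end of the window sits slightly below the 0.28 kcal/mol/HA literature mean, while one at the lower end sits slightly above it.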