Computational methods

Hit Identification

Method type (check all that apply)

De novo design

Deep learning

Machine learning

Physics-based

Hybrid of the above

We employ a combination of physics-based and machine learning approaches to nominate top-scoring hits.

Description of your approach (min 200 and max 800 words)

Our approach combines the expertise of the Kozakov Lab at Stony Brook and the Tropsha Lab at UNC. Our workflow uses several complementary modules to identify high-affinity hits for a protein target with a known 3D structure. Its key enabling components are the identification of binding site hot spots and conventional structure-based virtual screening enhanced by generative modeling.
First, we will use FTMap, a computational mapping algorithm that identifies regions on the surface of the target protein that make major contributions to the ligand binding free energy. FTMap exhaustively samples positions of small organic molecule probes and scores them using a physical energy function. The regions that bind multiple probes identify the binding hot spots and the corresponding favorable chemical groups. In addition to the hypothesis-free FTMap, we will use a different approach to hot spot identification, LigTBM, which was inspired by structural similarity search methods. The basic idea is to match the physicochemical environment of the protein to micro-pockets containing small organic molecule probes, extracted from PDB structures with bound ligands. An early version of LigTBM was the top performer by docking model accuracy in the NIH-sponsored D3R competition and, most recently, in the CASP ligand experiment. The matching procedure also provides possible fragment placements within the target protein, so the data will be presented in the same form as the FTMap data, which will facilitate comparative analysis and the identification of consensus hot spots.
Finally, we will employ a recently developed deep learning-based extension of FTMap, which mimics the original FTMap but uses an improved scoring function during sampling, trained on fragment data from the PDB. The hot spot information will be used to create a pharmacophore model for the next stage, virtual screening.
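As an illustration of how hot spots from two mapping methods might be compared, the following is a minimal sketch and not the actual FTMap or LigTBM code. It assumes each method reports hot spots as labeled 3D centers; the function name `consensus_hot_spots`, the 4.0 angstrom cutoff, and the coordinates are hypothetical.

```python
from math import dist


def consensus_hot_spots(sites_a, sites_b, cutoff=4.0):
    """Pair each site from method A with its nearest site from method B.

    Each site is a (label, (x, y, z)) tuple. A pair counts as a consensus
    hot spot when the two centers lie within `cutoff` angstroms
    (the cutoff value is an assumption made for this sketch).
    """
    if not sites_b:
        return []
    consensus = []
    for label_a, xyz_a in sites_a:
        # Nearest site reported by the second method
        label_b, xyz_b = min(sites_b, key=lambda site: dist(xyz_a, site[1]))
        separation = dist(xyz_a, xyz_b)
        if separation <= cutoff:
            # Report the midpoint of the two centers as the consensus center
            center = tuple((a + b) / 2 for a, b in zip(xyz_a, xyz_b))
            consensus.append((label_a, label_b, center, separation))
    return consensus


# Hypothetical hot spot centers, invented for illustration only
ftmap_sites = [("F1", (10.0, 5.0, 3.0)), ("F2", (20.0, 8.0, 1.0))]
ligtbm_sites = [("L1", (11.0, 5.0, 3.0)), ("L2", (40.0, 0.0, 0.0))]

result = consensus_hot_spots(ftmap_sites, ligtbm_sites)
for label_a, label_b, center, separation in result:
    print(label_a, label_b, center, round(separation, 2))
```

In this toy input only F1 and L1 fall within the cutoff, so a single consensus site is reported. A real comparison would likely also account for probe chemistry and cluster populations.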
Using this model, we will then perform pharmacophore-based screening of the entire Enamine REAL library (~40B compounds, counting tautomers) to select a subset of target-specific compounds that fit our pharmacophore hypothesis. In addition, we will use the pharmacophore models to bias our deep reinforcement learning method, ReLeaSE, to generate novel target-specific hit compounds. These generated hits will serve as queries for a round of similarity searching against the entire Enamine library to identify the purchasable compounds most similar to them. The combined set of compounds from the pharmacophore-based and generative-hit-based screens (typically ~1M molecules) will be docked into the binding site using Glide by Schrödinger. The top-scoring docking hits will then be further prioritized using the hot spot information and the LigTBM-type approach. These consensus hits will be nominated for experimental testing.
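The similarity search step can be illustrated with a minimal sketch, assuming fingerprints are represented as sets of on-bit indices and compared with the Tanimoto coefficient. The similarity metric, the function names, and the toy fingerprints are assumptions for illustration. In practice fingerprints would come from a cheminformatics toolkit such as RDKit, and a library of this size would need a much faster search than the linear scan shown here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def most_similar(query_fp, library, top_k=2):
    """Return the top_k (compound_id, similarity) pairs, most similar first."""
    scored = [(cid, tanimoto(query_fp, fp)) for cid, fp in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical on-bit sets standing in for real molecular fingerprints
generated_hit = {1, 4, 7, 9}
library = {
    "cmpd_A": {1, 4, 7, 8},  # shares 3 of 5 distinct bits with the query
    "cmpd_B": {2, 3, 5},     # shares no bits with the query
    "cmpd_C": {1, 4, 7, 9},  # identical to the query
}

hits = most_similar(generated_hit, library)
for compound_id, similarity in hits:
    print(compound_id, round(similarity, 2))
```

Each generated hit would be used as a query in turn, and the retrieved purchasable compounds pooled with the pharmacophore-selected subset before docking.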

What makes your approach stand out from the community? (<100 words)

We predominantly employ methodologies and software developed within our groups. A unique feature of our approach is the use of computational means (FTMap and LigTBM) to identify hot spots, which are used to formulate pharmacophore hypotheses. Another unique element is the use of generative and reinforcement learning, biased by the pharmacophore hypotheses, to design computational hits de novo. We nominate for experimental testing both Enamine compounds that fit the pharmacophores and molecules similar to the generative hits; we also retain the generative hits themselves, as they may be useful in the next phase of hit optimization.

Method Name

Frag2Hits

Commercial software packages used

Glide by Schrödinger

Free software packages used

FTMap server (https://ftmap.bu.edu/); RDKit; ReLeaSE (https://github.com/isayev/ReLeaSE)

Relevant publications of previous uses by your group of this software/method

Kozakov D, Grove LE, Hall DR, Bohnuud T, Mottarella SE, Luo L, Xia B, Beglov D, Vajda S. The FTMap family of web servers for determining and characterizing ligand-binding hot spots of proteins. Nature Protocols. 2015.

Popova M, Isayev O, Tropsha A. Deep reinforcement learning for de novo drug design. Sci Adv. 2018 Jul 25;4(7):eaap7885. doi: 10.1126/sciadv.aap7885

Alekseenko A, Kotelnikov S, Ignatov M, Egbert M, Kholodov Y, Vajda S, Kozakov D. ClusPro LigTBM: Automated Template-Based Small Molecule Docking. J Mol Biol. 2019. doi: 10.1016/j.jmb.2019.12.011

Korshunova M, Ginsburg B, Tropsha A, Isayev O. OpenChem: A Deep Learning Toolkit for Computational Chemistry and Drug Design. J Chem Inf Model. 2021 Jan 25;61(1):7-13. doi: 10.1021/acs.jcim.0c00971

Challenge #4