A combined ligand-, structure-, and interaction-based approach will be applied to identify novel ligands for MCHR1. In this combination, the structure-based component allows us to reach far beyond the training set, while the available structure-activity relationship (SAR) data help keep the false-positive-prone structure-based techniques in check.
First, interaction-based hits will be generated using the FRASE-based hit-finding robot (FRASE-bot), which has been successfully applied in previous CACHE Challenges. Our platform was among the winners of Challenge #1 and was invited to Round 2 of Challenge #2 (pending experimental confirmation). We did not apply to Challenges #3 and #4. The AlphaFold structure of MCHR1 will be used in this step. A detailed description of the FRASE-bot workflow can be found in a preprint [1]. The underlying concepts and techniques are (i) FRAgments in Structural Environments (FRASE; a concept that enables seeding of novel target proteins with ligand fragments through FRASE database screening), (ii) interaction graphs (a graph representation of the ligand interacting with the nearby residues in 3D), and (iii) a graph transformer network (GTN) on interaction graphs (an ML model that learns and predicts properties of ligand-protein complexes from their interaction graphs). The major steps of the FRASE-bot workflow are: (1) screening the FRASE database for ligand fragments to be seeded in the protein structure; (2) scoring the seeded ligand fragments in the target protein for the “nativeness” of their binding pose; (3) converting ligand fragments into pharmacophoric features and aggregating them into composite pharmacophore queries; (4) searching billion-scale collections of commercially available compounds for ligands that contain seeded fragments from step #2 and match pharmacophore queries from step #3; (5) docking, followed by diversity-based filtering of the docking hits; and (6) triage, including visual assessment as well as ML- and molecular dynamics (MD)-based free-energy calculations.
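To make the interaction-graph idea concrete, the following is a minimal sketch of how ligand-protein contacts could be turned into graph edges. It is an illustration under stated assumptions, not the FRASE-bot implementation: the `Node` class, the `build_interaction_graph` function, the atom labels, and the 4.0 Å distance cutoff are all hypothetical, and only ligand-protein contacts are kept (a full interaction graph would likely also carry bonds and node features).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str   # hypothetical label, e.g. "LIG:N1" or "ASP:OD1"
    kind: str   # "ligand" or "protein"
    xyz: tuple  # Cartesian coordinates in angstroms


def build_interaction_graph(nodes, cutoff=4.0):
    """Return edges (i, j, distance) for ligand-protein pairs within `cutoff`.

    The cutoff value is an assumption chosen for illustration only.
    """
    edges = []
    for i, a in enumerate(nodes):
        for j in range(i + 1, len(nodes)):
            b = nodes[j]
            if a.kind == b.kind:
                continue  # this sketch keeps only ligand-protein contacts
            d = math.dist(a.xyz, b.xyz)
            if d <= cutoff:
                edges.append((i, j, round(d, 2)))
    return edges


# Toy coordinates, invented for the example
nodes = [
    Node("LIG:N1", "ligand", (0.0, 0.0, 0.0)),
    Node("ASP:OD1", "protein", (2.8, 0.0, 0.0)),
    Node("PHE:CZ", "protein", (0.0, 9.0, 0.0)),
    Node("LIG:C2", "ligand", (-1.5, 0.0, 0.0)),
]
edges = build_interaction_graph(nodes)
print(edges)
```

In this toy case only the ligand nitrogen and the aspartate oxygen are close enough to be connected; the distant phenylalanine atom contributes no edge.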
Second, two supervised machine learning (ML) models will be trained to exploit the available structure-activity relationships for MCHR1. One model, a GTN, will be built on interaction graphs of docked ligands with known activities. The other, more conventional model will be a GTN trained on ligand graphs. Like any graph neural network (GNN), GTNs are able to learn a molecular representation that is best suited to the data set of interest. Moreover, unlike graph-convolutional networks or graph attention networks, GTNs are capable of capturing interrelationships between distant nodes of a graph, thus recovering latent pharmacophores.
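The difference in reach between neighborhood aggregation and transformer-style attention can be illustrated with a toy example. The sketch below is not the GTN architecture used in our platform: it uses scalar node features, no learned projections, and hypothetical function names. It only shows that, in a single step, averaging over direct neighbors ignores a distant node, whereas attention computed over all node pairs assigns it a nonzero weight.

```python
import math


def neighbor_mean(features, adjacency):
    """One local aggregation step: average each node with its direct neighbors."""
    out = []
    for i, nbrs in enumerate(adjacency):
        group = [i] + list(nbrs)
        out.append(sum(features[k] for k in group) / len(group))
    return out


def global_attention_weights(features):
    """Softmax over dot products of scalar features, for every pair of nodes."""
    weights = []
    for q in features:
        scores = [q * k for k in features]
        m = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Path graph 0-1-2-3-4: nodes 0 and 4 are four bonds apart
adjacency = [[1], [0, 2], [1, 3], [2, 4], [3]]
features = [1.0, 0.0, 0.0, 0.0, 1.0]

local = neighbor_mean(features, adjacency)
attn = global_attention_weights(features)
print(local[0])    # depends only on nodes 0 and 1
print(attn[0][4])  # nonzero weight on the distant node 4
```

Stacking several local layers would eventually propagate information along the path; the point of the sketch is only that global attention links distant nodes directly.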
Finally, a consensus hit list will be created from the structure- and ligand-based hit lists.
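One simple way to build such a consensus is rank averaging, sketched below. The exact consensus scheme is not specified here, so this is an assumption for illustration: the function name `consensus_ranking`, the compound IDs, and the rule that a compound missing from one list receives that list's length plus one as its rank are all hypothetical.

```python
def consensus_ranking(structure_hits, ligand_hits):
    """Merge two best-first lists of compound IDs by average rank.

    A compound absent from a list is assigned rank len(list) + 1
    (an illustrative penalty, not a prescribed rule).
    """
    lists = [structure_hits, ligand_hits]
    all_ids = set(structure_hits) | set(ligand_hits)

    def mean_rank(cid):
        ranks = [
            hits.index(cid) + 1 if cid in hits else len(hits) + 1
            for hits in lists
        ]
        return sum(ranks) / len(ranks)

    # Ties are broken alphabetically so the output is deterministic
    return sorted(all_ids, key=lambda cid: (mean_rank(cid), cid))


# Toy hit lists, best compound first (IDs invented for the example)
structure_hits = ["C3", "C1", "C7"]
ligand_hits = ["C1", "C9", "C3"]
consensus = consensus_ranking(structure_hits, ligand_hits)
print(consensus)
```

Compounds ranked well by both approaches rise to the top, while compounds supported by only one approach are kept but pushed down the list.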
All components of our platform were developed in Python using public libraries such as RDKit, TensorFlow, pandas, and others. We use AutoDock Vina as the docking engine and GROMACS for MD simulations. We progressively share our code on the lab’s GitHub page (https://github.com/kireevlab) as soon as we find it robust enough (new models and algorithms are being posted regularly).
References
[1] Yi An, Jiwoong Lim, Marta Glavatskikh, Xiaowen Wang, Jacqueline Norris-Drouin, P. Brian Hardy, Tina M. Leisner, Kenneth H. Pearce, and Dmitri Kireev. Machine Learning-driven Fragment-based Discovery of CIB1-directed Anti-Tumor Agents by FRASE-bot. Nat. Commun. (under review). Preprint (Version 1), 16 August 2023, available at Research Square: https://doi.org/10.21203/rs.3.rs-3197490/v1.