I have developed a genetic algorithm (GA) that searches Enamine REAL Space, and I will use it to find molecules with good docking scores against the target. My previous work has shown that GAs are very effective at searching large chemical spaces for molecules with optimal properties [1-4], including docking scores [5]. Each gene is a set of compatible synthons that can be combined into a molecule using a set of reaction rules. Starting from randomly generated genes, the population is evolved toward molecules with good docking scores.
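To make the gene representation concrete, here is a minimal sketch of a synthon-based GA. It is not the Synthon-GA code. It assumes two-component reactions only, uses hypothetical toy synthon labels instead of real Synt-On synthons, and replaces docking with a hypothetical toy score. The point it illustrates is that mutation and crossover only draw synthons from pools compatible with the gene's reaction, so every gene always encodes a buildable molecule.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical toy data: each reaction lists the synthon pools compatible
# with its two reactive positions (real synthons would come from Synt-On).
REACTIONS: Dict[str, Tuple[List[str], List[str]]] = {
    "amide": (["acid_1", "acid_2", "acid_3"], ["amine_1", "amine_2", "amine_3"]),
    "suzuki": (["boronate_1", "boronate_2"], ["halide_1", "halide_2", "halide_3"]),
}

Gene = Tuple[str, str, str]  # (reaction, synthon A, synthon B)


def random_gene(rng: random.Random) -> Gene:
    rxn = rng.choice(sorted(REACTIONS))
    pool_a, pool_b = REACTIONS[rxn]
    return (rxn, rng.choice(pool_a), rng.choice(pool_b))


def mutate(gene: Gene, rng: random.Random) -> Gene:
    # Replace one synthon with another from the same compatible pool,
    # so the gene stays a valid (reaction, A, B) combination.
    rxn, a, b = gene
    pool_a, pool_b = REACTIONS[rxn]
    if rng.random() < 0.5:
        return (rxn, rng.choice(pool_a), b)
    return (rxn, a, rng.choice(pool_b))


def crossover(p1: Gene, p2: Gene, rng: random.Random) -> Gene:
    # Recombine only parents that share a reaction; otherwise keep one parent.
    if p1[0] != p2[0]:
        return rng.choice([p1, p2])
    return (p1[0], rng.choice([p1[1], p2[1]]), rng.choice([p1[2], p2[2]]))


def run_ga(score: Callable[[Gene], float], pop_size: int = 8,
           generations: int = 20, mutation_rate: float = 0.3,
           seed: int = 0) -> List[Tuple[float, Gene]]:
    """Minimize `score` (lower is better, as with a docking score)."""
    rng = random.Random(seed)
    population = [random_gene(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score)
        parents = ranked[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(children) < pop_size - len(parents):
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            children.append(child)
        population = parents + children
    return sorted((score(g), g) for g in set(population))


def toy_score(gene: Gene) -> float:
    # Hypothetical stand-in for docking: higher-numbered synthons "bind" better.
    return -float(int(gene[1].split("_")[1]) + int(gene[2].split("_")[1]))


print(run_ga(toy_score)[0])
```

Because the best half of each generation is carried over unchanged, the best score can never get worse from one generation to the next; in a real run `toy_score` would be replaced by assembling the product from the synthons and docking it.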
The open-source program Synt-On (formerly SynthI [6]) is used to create the synthons from the building blocks that Enamine uses to make the REAL Space library (the building blocks are freely available from the Enamine web page). The exact reaction rules used to make REAL Space are proprietary, so the molecules found by the GA are not necessarily in the REAL database. We therefore perform a similarity search using SmallWorld [7] or FTrees from BioSolveIT and redock the closest analogs. Preliminary results from the CACHE1 challenge show that many of these analogs have docking scores comparable to or better than those of the molecules found by the GA.
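The analog lookup and redocking step can be sketched as follows. SmallWorld and FTrees use their own similarity measures (graph edit distance and feature trees, respectively); this sketch swaps in plain Tanimoto similarity on fingerprints represented as sets of on-bits, and the `dock` callable and the compound names are hypothetical placeholders.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def closest_analogs(query: FrozenSet[int], library: Dict[str, FrozenSet[int]],
                    k: int = 3) -> List[Tuple[str, float]]:
    """Return the k library compounds most similar to the query (ties by name)."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits.sort(key=lambda h: (-h[1], h[0]))
    return hits[:k]


def redock_analogs(ga_hits: Dict[str, FrozenSet[int]],
                   library: Dict[str, FrozenSet[int]],
                   dock: Callable[[str], float], k: int = 3) -> List[Tuple[float, str]]:
    """Collect the unique analogs of all GA hits and dock each one once."""
    candidates = {name for fp in ga_hits.values()
                  for name, _ in closest_analogs(fp, library, k)}
    return sorted((dock(name), name) for name in candidates)
```

For example, with a query fingerprint `{1, 2, 3}` and library entries `real_1 = {1, 2, 3, 4}` and `real_2 = {1, 2, 9}`, the similarities are 0.75 and 0.5, so `real_1` is ranked first.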
The final list of 100 molecules will be assembled based on docking scores, visual inspection of the docking poses, the number of commercially available derivatives, drug-likeness, and structural diversity.
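Of the selection criteria above, docking score and structural diversity are the easiest to automate. One common way to combine them is greedy sphere-exclusion picking: walk the candidates from best to worst score and keep a molecule only if it is not too similar to anything already kept. The sketch below assumes set-based fingerprints and a hypothetical similarity cutoff of 0.6; pose inspection, drug-likeness, and the count of purchasable derivatives are not modeled.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_diverse(scores: Dict[str, float], fps: Dict[str, FrozenSet[int]],
                   n: int, max_sim: float = 0.6) -> List[str]:
    """Pick up to n molecules, best (lowest) score first, skipping near-duplicates."""
    picked: List[str] = []
    for name in sorted(scores, key=lambda m: (scores[m], m)):
        if len(picked) == n:
            break
        # Keep the molecule only if it is dissimilar to every earlier pick.
        if all(tanimoto(fps[name], fps[p]) < max_sim for p in picked):
            picked.append(name)
    return picked
```

Lowering `max_sim` yields a more diverse but lower-scoring list; for the proposal, `n` would be 100 and the survivors would still go through manual pose inspection.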
The synthon-based GA code (Synthon-GA) can also be used with other synthon sets, for example, synthons derived from in-house building block collections. It can also use ML models of activity instead of docking scores, making it a general tool for molecule discovery in huge make-on-demand libraries.
Software used: Glide, SmallWorld, FTrees
1. Jensen, Jan H. 2019. “A Graph-Based Genetic Algorithm and Generative Model/Monte Carlo Tree Search for the Exploration of Chemical Space.” Chemical Science 10 (12): 3567–72.
2. Henault, Emilie S., Maria H. Rasmussen, and Jan H. Jensen. 2020. “Chemical Space Exploration: How Genetic Algorithms Find the Needle in the Haystack.” PeerJ Physical Chemistry 2 (July): e11.
3. Koerstz, Mads, Anders S. Christensen, Kurt V. Mikkelsen, Mogens Brøndsted Nielsen, and Jan H. Jensen. 2021. “High Throughput Virtual Screening of 230 Billion Molecular Solar Heat Battery Candidates.” PeerJ Physical Chemistry 3 (February): e16.
4. Ree, Nicolai, Mads Koerstz, Kurt V. Mikkelsen, and Jan H. Jensen. 2021. “Virtual Screening of Norbornadiene-Based Molecular Solar Thermal Energy Storage Systems Using a Genetic Algorithm.” The Journal of Chemical Physics 155 (18): 184105.
5. Steinmann, Casper, and Jan H. Jensen. 2021. “Using a Genetic Algorithm to Find Molecules with Good Docking Scores.” PeerJ Physical Chemistry 3 (May): e18.
6. Zabolotna, Yuliana, Dmitriy M. Volochnyuk, Sergey V. Ryabukhin, Kostiantyn Gavrylenko, Dragos Horvath, Olga Klimchuk, Oleksandr Oksiuta, Gilles Marcou, and Alexandre Varnek. 2021. “SynthI: A New Open-Source Tool for Synthon-Based Library Design.” Journal of Chemical Information and Modeling, November. https://doi.org/10.1021/acs.jcim.1c00754.
7. Irwin, John J., Khanh G. Tang, Jennifer Young, Chinzorig Dandarchuluun, Benjamin R. Wong, Munkhzul Khurelbaatar, Yurii S. Moroz, John Mayfield, and Roger A. Sayle. 2020. “ZINC20-A Free Ultralarge-Scale Chemical Database for Ligand Discovery.” Journal of Chemical Information and Modeling 60 (12): 6065–73.