Challenge #1
Application
HIT IDENTIFICATION
I have developed a genetic algorithm (GA) that can search Enamine Real Space and will use it to find molecules with good docking scores to the target. My previous work has shown that GAs are very effective at searching a large chemical space for molecules with optimal properties,1-4 including docking scores.5 Each gene is defined as a set of compatible synthons that can be combined into a molecule using a set of reaction rules. Starting from randomly generated genes, the genes are evolved to give molecules with good docking scores. The open-source program SynthI6 is used to create the synthons from the building blocks (freely available from the Enamine web page) used to make the Enamine Real Space library. The exact reaction rules used to make the Enamine Real Space library are proprietary, but the reaction rules implemented in SynthI seem to be a close approximation. Preliminary data show that many of the molecules found by the GA either can be found in the Enamine Real Space library or differ from such compounds by only a few atoms (as determined by a search using the SmallWorld algorithm7). In the latter case, the closely related library compounds will then be docked to the target to make sure that these small differences do not affect the docking scores appreciably. The final list of 100 molecules will be assembled based on docking scores, inspection of the docking poses, number of commercially available derivatives, drug-likeness, and structural diversity. The synthon-based GA code (Synthon-GA) can also be used with other synthon sets, for example sets based on in-house building blocks. It can also be used with machine learning (ML) models of activity instead of docking scores, so it will be a very general tool for molecule discovery using huge make-on-demand libraries.
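The gene representation and evolution loop described above can be illustrated with a minimal sketch. This is not the Synthon-GA code. It assumes a hypothetical toy synthon table (`SYNTHON_POOLS`) with made-up reaction and synthon names, and it replaces docking with a stand-in scoring function (`toy_score`, lower is better). In a real run, the synthons would come from SynthI and the score from a docking program.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical toy synthon pools keyed by a made-up reaction name.
# Assumption: any slot-0 synthon is compatible with any slot-1 synthon
# of the same reaction.
SYNTHON_POOLS: Dict[str, Tuple[List[str], List[str]]] = {
    "amide_coupling": (["acidA", "acidB", "acidC"], ["amineA", "amineB", "amineC"]),
    "suzuki": (["halideA", "halideB", "halideC"], ["boronateA", "boronateB", "boronateC"]),
}

# A gene is (reaction, slot-0 synthon, slot-1 synthon).
Gene = Tuple[str, str, str]


def random_gene(rng: random.Random) -> Gene:
    """Pick a reaction, then one compatible synthon per slot."""
    reaction = rng.choice(sorted(SYNTHON_POOLS))
    slot0, slot1 = SYNTHON_POOLS[reaction]
    return (reaction, rng.choice(slot0), rng.choice(slot1))


def crossover(a: Gene, b: Gene, rng: random.Random) -> Gene:
    """Exchange synthons only between genes that share a reaction,
    so the child is still a compatible synthon set."""
    if a[0] != b[0]:
        return rng.choice([a, b])
    return (a[0], a[1], b[2])


def mutate(gene: Gene, rng: random.Random) -> Gene:
    """Replace the synthon in one randomly chosen slot."""
    reaction, s0, s1 = gene
    slot = rng.randrange(2)
    new = rng.choice(SYNTHON_POOLS[reaction][slot])
    return (reaction, new, s1) if slot == 0 else (reaction, s0, new)


def toy_score(gene: Gene) -> float:
    """Stand-in for a docking score (lower is better); purely illustrative."""
    return -(ord(gene[1][-1]) + ord(gene[2][-1])) / 10.0


def run_ga(score: Callable[[Gene], float], pop_size: int = 20,
           generations: int = 30, mutation_rate: float = 0.3,
           seed: int = 0) -> List[Gene]:
    """Evolve genes from a random start; returns unique genes, best first."""
    rng = random.Random(seed)
    population = [random_gene(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(set(population), key=score)
        parents = ranked[: max(2, pop_size // 2)]
        children = []
        while len(children) < pop_size:
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            children.append(child)
        # Elitism: parents compete with children, so the best gene is never lost.
        population = sorted(set(parents + children), key=score)[:pop_size]
    return sorted(set(population), key=score)


if __name__ == "__main__":
    for gene in run_ga(toy_score)[:3]:
        print(gene, toy_score(gene))
```

Restricting crossover to genes that share a reaction is one simple way to keep every child a valid synthon combination. The actual compatibility rules depend on the synthon labels and reaction rules that SynthI provides.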
Glide for docking (though open-source docking programs can also be used).
SynthI, Synthon-GA
1. Jensen, Jan H. 2019. “A Graph-Based Genetic Algorithm and Generative Model/Monte Carlo Tree Search for the Exploration of Chemical Space.” Chemical Science 10 (12): 3567–72.
2. Henault, Emilie S., Maria H. Rasmussen, and Jan H. Jensen. 2020. “Chemical Space Exploration: How Genetic Algorithms Find the Needle in the Haystack.” PeerJ Physical Chemistry 2 (July): e11.
3. Koerstz, Mads, Anders S. Christensen, Kurt V. Mikkelsen, Mogens Brøndsted Nielsen, and Jan H. Jensen. 2021. “High Throughput Virtual Screening of 230 Billion Molecular Solar Heat Battery Candidates.” PeerJ Physical Chemistry 3 (February): e16.
4. Ree, Nicolai, Mads Koerstz, Kurt V. Mikkelsen, and Jan H. Jensen. 2021. “Virtual Screening of Norbornadiene-Based Molecular Solar Thermal Energy Storage Systems Using a Genetic Algorithm.” The Journal of Chemical Physics 155 (18): 184105.
5. Steinmann, Casper, and Jan H. Jensen. 2021. “Using a Genetic Algorithm to Find Molecules with Good Docking Scores.” PeerJ Physical Chemistry 3 (May): e18.
6. Zabolotna, Yuliana, Dmitriy M. Volochnyuk, Sergey V. Ryabukhin, Kostiantyn Gavrylenko, Dragos Horvath, Olga Klimchuk, Oleksandr Oksiuta, Gilles Marcou, and Alexandre Varnek. 2021. “SynthI: A New Open-Source Tool for Synthon-Based Library Design.” Journal of Chemical Information and Modeling, November. https://doi.org/10.1021/acs.jcim.1c00754.
7. Irwin, John J., Khanh G. Tang, Jennifer Young, Chinzorig Dandarchuluun, Benjamin R. Wong, Munkhzul Khurelbaatar, Yurii S. Moroz, John Mayfield, and Roger A. Sayle. 2020. “ZINC20-A Free Ultralarge-Scale Chemical Database for Ligand Discovery.” Journal of Chemical Information and Modeling 60 (12): 6065–73.