Challenge #3

Hit Identification

Method type (check all that apply)

De novo design

Deep learning

High-throughput docking

Machine learning

Hybrid of the above

We propose a deep learning approach for de novo molecule design. We will curate our training data based on high-throughput docking results.

Description of your approach (min 200 and max 800 words)

We propose a generative Artificial Intelligence workflow that combines a Generative Adversarial Network (GAN) with Reinforcement Learning (RL) for simultaneous hit identification and optimization.

Within this framework, we propose to employ our novel RL algorithm, Multi-Expert Reinforcement Learning in Drug Discovery (MERLIND), as the generator of a GAN. MERLIND features a cluster of RL experts, each specializing in and optimizing for one desired molecular property, e.g., drug-likeness, toxicity, solubility, or synthetic accessibility. Each expert optimizes the policy of the RL agent with a reward signal aligned with its specialty. Additionally, a reward signal for ‘realness’ is obtained from the GAN discriminator, which is trained to distinguish real ligands from generated ones; this reward keeps RL-based hit optimization from perturbing synthetic ligands away from real high-performing ligands and thereby producing unrealistic molecules. The generator will operate on the SMILES representation of molecules and learn a policy that generates synthetic molecules one character at a time. At each time step, the characters that the agent has appended so far constitute the state of the RL agent, while the set of potential next characters constitutes the action space.
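
The state and action definitions above can be illustrated with a minimal sketch. It is not the MERLIND implementation: the toy vocabulary, the length cap, the end-of-sequence token, and the placeholder policy are all assumptions made for illustration.

```python
import random

# Hypothetical toy vocabulary; a real SMILES vocabulary is much larger.
VOCAB = ["C", "N", "O", "(", ")", "=", "1", "<eos>"]
MAX_LEN = 20  # assumed cap on sequence length


def step(state, action):
    """Append one character to the partial SMILES string (the state).

    Returns the next state and a flag telling whether the episode ended.
    """
    if action == "<eos>":
        return state, True
    next_state = state + action
    return next_state, len(next_state) >= MAX_LEN


def rollout(policy, rng):
    """Generate one sequence by repeatedly sampling an action from the policy."""
    state, done = "", False
    while not done:
        probs = policy(state)  # maps state -> {action: probability}
        actions = list(probs)
        action = rng.choices(actions, weights=[probs[a] for a in actions])[0]
        state, done = step(state, action)
    return state


def uniform_policy(state):
    """Placeholder policy: every character is equally likely."""
    return {a: 1.0 / len(VOCAB) for a in VOCAB}


if __name__ == "__main__":
    print(rollout(uniform_policy, random.Random(0)))
```

In a trained system the placeholder policy would be replaced by the learned generator, and the sampled sequences would not be guaranteed to be valid SMILES without further checks.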

Despite the superhuman performance that RL has achieved in chess and, more recently, Go, it has yet to achieve comparable success on complex real-life problems like drug discovery. The enormous search space in this domain prevents an RL agent from exploring enough actions and states to learn a useful policy. Our method overcomes this by constraining the action and state space based on the consensus of multiple RL experts: only the potential actions that pass every expert’s elimination criteria will be considered valid next steps. Moreover, in landmark RL applications like AlphaGo, a policy learned from human expert games acts as the oracle that suggests plausible actions given a state. Accordingly, in our work, we will employ a decoder-based, GPT-like Transformer network to act as the oracle. The set of characters that can be appended at each step will be sampled from this oracle and then further evaluated by the other experts. We will equip the oracle with the necessary domain knowledge by training it on an autoregressive (next-token) language modeling task over a dataset of SMILES strings representing real ligands in the ENAMINE catalog; we elaborate on the curation of this training dataset below. The Transformer will thereby learn the inter-token dependencies that make up the SMILES representations of the top-scoring SARS-CoV-2 Nsp3 macrodomain (Mac1) ligands in the ENAMINE catalog. Once this is accomplished, we will sample a fixed number of synthetic sequences from the oracle and the same number of real sequences from the training data. We will then pre-train the discriminator on the binary classification task of discerning real sequences from synthetic ones. At the end of this pre-training process, the oracle is expected to generate diverse, high-scoring ligands that the discriminator cannot differentiate from the existing high-scoring ligands in the ENAMINE dataset.
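
The consensus-based pruning of the action space could look roughly like the sketch below. The top-k cut on the oracle's proposals and the two elimination criteria are hypothetical examples chosen for illustration; the actual expert criteria are not specified here.

```python
def oracle_candidates(next_char_probs, k):
    """Keep the k most probable next characters proposed by the oracle."""
    ranked = sorted(next_char_probs, key=next_char_probs.get, reverse=True)
    return ranked[:k]


def consensus_actions(state, candidates, experts):
    """Return the candidates that every expert accepts for this state."""
    return [a for a in candidates if all(accepts(state, a) for accepts in experts)]


# Hypothetical elimination criteria, for illustration only.
def balanced_parentheses(state, action):
    """Reject a closing parenthesis that has no matching open one."""
    return action != ")" or state.count("(") > state.count(")")


def no_leading_bond(state, action):
    """Reject a bond symbol at the very start of the sequence."""
    return not (state == "" and action in ("=", "#"))


if __name__ == "__main__":
    probs = {"C": 0.5, ")": 0.2, "=": 0.15, "N": 0.1, "O": 0.05}
    candidates = oracle_candidates(probs, k=4)
    experts = [balanced_parentheses, no_leading_bond]
    print(consensus_actions("", candidates, experts))  # prints ['C', 'N']
```

An action survives only if no expert vetoes it, so adding experts can only shrink the set of valid next steps.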

As is the case with any RL agent, our agent depends on meaningful reward signals to choose actions. In our proposed framework, these reward signals will be provided not only by the individual experts on their respective criteria, but also by a trainable discriminator network that evaluates the ‘realness’ of a synthetic sequence. We propose to employ an encoder-based, BERT-like Transformer network as this discriminator. The combined reward from the RL experts and the discriminator will simultaneously optimize our hits for the desired properties while keeping them realistic with respect to the ligands in the ENAMINE catalog.
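
One simple way to combine these signals is a weighted sum, sketched below. The linear form, the weights, and the assumption that every score lies in [0, 1] are illustrative choices, not a statement of how MERLIND aggregates rewards.

```python
def combined_reward(smiles, experts, discriminator, weights, realness_weight):
    """Weighted sum of expert rewards plus the discriminator's realness score.

    experts: dict mapping a property name to a function smiles -> float
    discriminator: function smiles -> probability that the sequence is real
    weights: dict mapping each property name to its (assumed) weight
    """
    expert_term = sum(weights[name] * score(smiles) for name, score in experts.items())
    return expert_term + realness_weight * discriminator(smiles)


if __name__ == "__main__":
    # Constant stand-ins for real property predictors and a real discriminator.
    experts = {"drug_likeness": lambda s: 0.5, "solubility": lambda s: 1.0}
    weights = {"drug_likeness": 1.0, "solubility": 2.0}
    reward = combined_reward("CCO", experts, lambda s: 0.25, weights, realness_weight=4.0)
    print(reward)  # prints 3.5
```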

Since the oracle Transformer will model its outputs after the training data, we will invest significant High-Performance Computing resources to construct a training dataset of diverse, high-scoring hits as follows. First, we will perform a scaffold-based clustering of the 5.5 billion drug-like molecules in the ENAMINE catalog. Then, we will perform stratified sampling on these clusters to acquire a chemically diverse set of 10 million representative ligands. We will then use AutoDock Vina to compute the docking scores of these molecules against Mac1. Upon completion of this large docking run, the molecules and their docking scores will be used to train a Directed Graph Convolutional Network as a docking surrogate, i.e., a regressor on the docking score. Using the cutting-edge AI accelerator hardware at Argonne National Laboratory, we will predict the docking scores of the rest of the ENAMINE dataset with the surrogate model. Informed by these predictions and the scaffold-based clusters, we will sample 2 million molecules as the training data for our oracle, balancing favorable docking scores against chemical diversity. The docking surrogate trained at this step will also be used for the virtual screening of the merged selections.
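
The stratified sampling step can be sketched as follows, assuming the scaffold of each molecule has already been computed (for instance as a Bemis-Murcko scaffold, which is our assumption for illustration). The proportional allocation with a minimum of one molecule per cluster is likewise an illustrative choice.

```python
import random
from collections import defaultdict


def stratified_sample(molecules, scaffold_of, fraction, rng):
    """Sample roughly `fraction` of each scaffold cluster, at least one per cluster.

    molecules: list of molecule identifiers (e.g. SMILES strings)
    scaffold_of: dict mapping each molecule to its precomputed scaffold key
    """
    clusters = defaultdict(list)
    for mol in molecules:
        clusters[scaffold_of[mol]].append(mol)

    sample = []
    for scaffold in sorted(clusters):  # sorted for reproducible ordering
        members = clusters[scaffold]
        n = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, n))
    return sample


if __name__ == "__main__":
    mols = [f"mol{i}" for i in range(10)]
    scaffolds = {m: ("A" if i < 8 else "B") for i, m in enumerate(mols)}
    print(stratified_sample(mols, scaffolds, fraction=0.25, rng=random.Random(0)))
```

Guaranteeing one pick per cluster keeps rare scaffolds in the sample, which is the point of stratifying by scaffold rather than sampling the catalog uniformly.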

What makes your approach stand out from the community? (<100 words)

First, to the best of our knowledge, we are the first to utilize multi-expert RL in drug discovery, overcoming the vast search space that has so far hindered the application of RL to this problem. Second, our cluster-of-experts approach ties each expert’s reward to a specific property, so the exploration-exploitation decisions of the policy informed by these rewards are more explainable than those of a black-box neural network. Finally, we propose to publish a scaffold-based clustering of the 5.5 billion compounds in the ENAMINE catalog, along with their predicted docking scores against the competition target.

Method Name

MERLIND: Multi-Expert Reinforcement Learning in Drug Discovery

Commercial software packages used

OpenEye

Free software packages used

AutoDock Vina

AMBER

OpenMM

PyTorch

Relevant publications of previous uses by your group of this software/method

1. Aymen Al Saadi, Dario Alfe, Yadu Babuji, Agastya Bhati, Ben Blaiszik, Alexander Brace, Thomas Brettin, Kyle Chard, Ryan Chard, Austin Clyde, Peter Coveney, Ian Foster, Tom Gibbs, Shantenu Jha, Kristopher Keipert, Dieter Kranzlmüller, Thorsten Kurth, Hyungro Lee, Zhuozhao Li, Heng Ma, Gerald Mathias, Andre Merzky, Alexander Partin, Arvind Ramanathan, Ashka Shah, Abraham Stern, Rick Stevens, Li Tan, Mikhail Titov, Anda Trifan, Aristeidis Tsaris, Matteo Turilli, Huub Van Dam, Shunzhou Wan, David Wifling, and Junqi Yin. 2021. IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads. In 50th International Conference on Parallel Processing (ICPP 2021). Association for Computing Machinery, New York, NY, USA, Article 40, 1–12. https://doi.org/10.1145/3472456.3473524

2. Austin Clyde, Stephanie Galanie, Daniel W. Kneller, Heng Ma, Yadu Babuji, Ben Blaiszik, Alexander Brace, Thomas Brettin, Kyle Chard, Ryan Chard, Leighton Coates, Ian Foster, Darin Hauner, Vilmos Kertesz, Neeraj Kumar, Hyungro Lee, Zhuozhao Li, Andre Merzky, Jurgen G. Schmidt, Li Tan, Mikhail Titov, Anda Trifan, Matteo Turilli, Hubertus Van Dam, Srinivas C. Chennubhotla, Shantenu Jha, Andrey Kovalevsky, Arvind Ramanathan, Martha S. Head, and Rick Stevens. 2022. High-Throughput Virtual Screening and Validation of a SARS-CoV-2 Main Protease Noncovalent Inhibitor. Journal of Chemical Information and Modeling 62, 1, 116–128. https://doi.org/10.1021/acs.jcim.1c00851

Virtual screening of merged selections

Method type (check all that apply)

Deep learning

High-throughput docking

Physics-based

Hybrid of the above

We will train a Deep Neural Network to predict the docking score of a compound to the target. We will rank the merged selections based on that model's predictions. We will perform MM/GBSA on the top 10 compounds from each participant.

Description of your approach (min 200 and max 800 words)

We propose a two-stage virtual screening procedure for the merged solutions. Our first step after receiving the merged solutions is to rank them by predicted docking score using the docking surrogate model from the hit identification step. Given that we previously achieved an inference throughput of tens of thousands of compounds per second with this neural network architecture, we project that we will have a docking-based ranking of the solutions within minutes of receiving them. Once we have this ranking, we will select the top 10 molecules from each participant for downstream high-fidelity screening via the Molecular Mechanics with Generalized Born and Surface Area Solvation (MM/GBSA) method using the AMBER software toolkit. A more detailed overview of our approach follows.

In the hit identification section, we proposed a docking surrogate model as a ranking mechanism for selecting high-scoring compounds into the training dataset of the oracle in our generator. We propose to take further advantage of this docking surrogate by ranking the merged selections on their predicted docking scores. The docking surrogate will be trained on 10 million small molecules acquired through stratified sampling of scaffold-based clusters within the ENAMINE catalog. Using AutoDock Vina, we will dock these 10 million compounds to the SARS-CoV-2 Nsp3 macrodomain (Mac1). Once the docking scores are acquired, a Directed Graph Convolutional Network (D-GCN) will be trained as a surrogate that bypasses the costly docking runs and predicts the docking scores via deep learning. Since this regression model will be trained on a structurally diverse pool of representative compounds selected from 5.5 billion drug-like molecules, we expect it to generalize well to the merged selections dataset that will be shared with us in September 2023. We have applied this neural network architecture to similar tasks before and have achieved an inference throughput of tens of thousands of compounds per second. Accordingly, with our model, we will produce a ranking of the merged solutions within days of receiving them.

As mentioned in the previous section, we favor a Directed Graph Convolutional Network architecture for our docking surrogate for the following reasons. First, these models operate on two-dimensional graph representations of molecules and can therefore incorporate topological information about the atoms and bonds within a molecule for accurate molecular property prediction. Furthermore, the particular architecture we propose augments the graph representations with RDKit-computed atom and bond (edge) features. Thus, the model leverages known physicochemical properties of a molecule to predict unknown ones of interest, in this case the docking score. Finally, the current literature reports state-of-the-art performance for D-GCNs on various molecular property prediction tasks.
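
To make the idea of a directed molecular graph concrete, the sketch below shows one message-passing step over directed bonds. It assumes a directed message-passing scheme in which hidden states live on directed edges; a single number stands in for each feature vector, and no learned weights are included, so this is an illustration of the data flow rather than the proposed architecture.

```python
def directed_edges(bonds):
    """Turn each undirected bond (u, v) into two directed edges."""
    edges = []
    for u, v in bonds:
        edges.append((u, v))
        edges.append((v, u))
    return edges


def message_passing_step(edges, hidden):
    """Update each directed edge from the edges pointing into its source atom.

    hidden: dict mapping a directed edge to a float (stand-in for a vector).
    The reverse edge is excluded so a message is not echoed straight back.
    """
    updated = {}
    for (u, v) in edges:
        incoming = [hidden[(w, x)] for (w, x) in edges if x == u and w != v]
        updated[(u, v)] = hidden[(u, v)] + sum(incoming)
    return updated


if __name__ == "__main__":
    # Heavy-atom skeleton of ethanol: C0 - C1 - O2
    edges = directed_edges([(0, 1), (1, 2)])
    hidden = {e: 1.0 for e in edges}
    print(message_passing_step(edges, hidden))
```

In a full model the initial edge states would be built from atom and bond features, and the summation would pass through learned layers before a readout predicts the docking score.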

MM/GBSA is a popular method for binding free energy calculation that has been used successfully in protein-ligand docking studies. Although it provides a higher-fidelity estimate of binding free energy than docking, it is considerably more expensive computationally. Given the timeframe and computational resource constraints of this project, we will therefore select only the top 10 solutions from each participant based on the ranking from our docking surrogate model. We will run MM/GBSA on our high-performance computing clusters using the AMBER software toolkit. Based on the MM/GBSA binding free energies, we will recommend the 3 best solutions from each participant for further validation in an experimental setting.
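
The two selection stages share the same logic, which can be sketched as below. The record layout and the field names (`participant`, `predicted_docking`, `mmgbsa`) are hypothetical; the sketch also assumes that lower (more negative) values are better for both scores.

```python
from collections import defaultdict


def top_n_per_participant(records, score_key, n):
    """Keep the n lowest-scoring records for each participant.

    Lower scores are treated as better, as with docking scores and
    binding free energies reported in kcal/mol.
    """
    by_participant = defaultdict(list)
    for rec in records:
        by_participant[rec["participant"]].append(rec)
    return {
        p: sorted(recs, key=lambda r: r[score_key])[:n]
        for p, recs in by_participant.items()
    }


if __name__ == "__main__":
    merged = [
        {"participant": "p1", "smiles": "CCO", "predicted_docking": -7.0},
        {"participant": "p1", "smiles": "CCN", "predicted_docking": -9.0},
        {"participant": "p2", "smiles": "CCC", "predicted_docking": -5.0},
    ]
    # Stage 1: shortlist by surrogate-predicted docking score (n=10 in the proposal).
    shortlist = top_n_per_participant(merged, "predicted_docking", n=10)
    print({p: [r["smiles"] for r in recs] for p, recs in shortlist.items()})
    # Stage 2 would call the same function with score_key="mmgbsa" and n=3
    # once MM/GBSA energies have been attached to the shortlisted records.
```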

What makes your approach stand out from the community? (<100 words)

Our approach gains an edge by combining cutting-edge Machine Learning-based molecular property prediction with the high-fidelity, Molecular Dynamics-based MM/GBSA method. Our ML model makes our screening pipeline time- and compute-efficient by narrowing down the candidate pool with a docking score approximation learned from millions of representative compounds. Following this initial ranking, the MM/GBSA step will boost the confidence of our screening via high-fidelity MD simulations.