CACHE

CRITICAL ASSESSMENT OF COMPUTATIONAL HIT-FINDING EXPERIMENTS

Challenge #3

Hit Identification
Method type (check all that apply)
De novo design
Deep learning
High-throughput docking
Machine learning
Hybrid of the above
We propose a deep learning approach for de novo molecule design. We will curate our training data based on high-throughput docking results.
Description of your approach (min 200 and max 800 words)

We propose a generative Artificial Intelligence workflow whereby we combine a Generative Adversarial Network (GAN) and Reinforcement Learning (RL) for simultaneous hit identification and optimization.

Within this framework, we propose to employ our novel RL algorithm, Multi-Expert Reinforcement Learning in Drug Discovery (MERLIND), as the generator of a GAN. MERLIND features a cluster of RL experts, each specializing in and optimizing for one desired molecular property, e.g., drug-likeness, low toxicity, solubility, or synthetic accessibility. Each expert optimizes the policy of the RL agent with a reward signal aligned with its specialty. Additionally, a reward signal for ‘realness’ is obtained from the GAN discriminator, which is trained to distinguish real ligands from synthetic ones; this reward keeps RL-based hit optimization from drifting away from real, high-performing ligands and producing unrealistic molecules. The generator will operate on the SMILES representation of molecules and learn a policy that generates synthetic molecules one character at a time. At each time step, the characters that the agent has appended so far constitute the state of the RL agent, while the set of potential next characters constitutes its action space.
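The state and action definitions above can be sketched as a small environment. This is a minimal illustration and not the MERLIND implementation: the toy vocabulary, the `$` end-of-sequence token, the `max_len` cap, and the `SmilesEnv` class are all assumptions introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical toy vocabulary; a real one would be built from the training SMILES.
END = "$"  # assumed end-of-sequence token, not part of SMILES itself
VOCAB = ["C", "N", "O", "(", ")", "=", "1", END]


@dataclass
class SmilesEnv:
    """Character-level generation: state = prefix so far, action = next character."""

    max_len: int = 20  # assumed cap so that every episode terminates
    state: str = ""

    def reset(self) -> str:
        self.state = ""
        return self.state

    def actions(self) -> list[str]:
        # Unconstrained action space: every character in the vocabulary.
        return list(VOCAB)

    def step(self, action: str) -> tuple[str, bool]:
        if action not in VOCAB:
            raise ValueError(f"unknown action: {action!r}")
        if action == END:
            # Ending the sequence leaves the state unchanged and closes the episode.
            return self.state, True
        self.state += action
        return self.state, len(self.state) >= self.max_len


env = SmilesEnv(max_len=5)
env.reset()
for ch in "CCO":
    state, done = env.step(ch)
```

In this sketch the reward is deliberately left out of `step`; it would be supplied separately by the experts and the discriminator once a sequence is complete.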

Despite the superhuman performance that RL has achieved in chess and, more recently, Go, it has yet to see comparable success on complex real-life problems like drug discovery. The search space in this domain is so large that a standard RL agent cannot explore enough states and actions to learn a useful policy. Our method addresses this by constraining the action and state space based on the consensus of multiple RL experts: only potential actions that pass every expert’s elimination criteria will be considered valid next steps.

Moreover, landmark RL applications like AlphaGo rely on human expertise as an oracle that suggests plausible actions given a state. Accordingly, in our work, we will employ a decoder-based, GPT-like Transformer network to act as the oracle. The set of characters that can be appended at each step will be sampled from this oracle and then further optimized by the other experts. We will equip the oracle with the necessary domain knowledge by training it on a Masked Language Modeling task over a training dataset of SMILES strings representing real ligands in the ENAMINE catalog; we describe the curation of this dataset below. The Transformer will thereby learn the inter-token dependencies that make up the SMILES representations of the top-scoring SARS-CoV-2 Nsp3 macrodomain (Mac1) ligands in the ENAMINE catalog.

Once this is accomplished, we will sample a fixed number of synthetic sequences from the oracle and the same number of real sequences from the training data. We will then pre-train the discriminator on the binary classification task of discerning real sequences from synthetic ones. At the end of this pre-training process, the oracle is expected to generate diverse, high-scoring ligands that the discriminator cannot differentiate from the existing high-scoring ligands in the ENAMINE dataset.
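The consensus step can be illustrated as follows. This is a hedged sketch only: the two toy experts, their rules, and the `consensus_actions` helper are hypothetical stand-ins for property-specific experts, and the `proposals` list stands in for characters sampled from the oracle.

```python
from __future__ import annotations

from typing import Callable

# An expert here is a veto function: (state, candidate action) -> keep it?
Expert = Callable[[str, str], bool]


def balanced_parens_expert(state: str, action: str) -> bool:
    # Toy rule: never close a branch that was not opened.
    depth = state.count("(") - state.count(")")
    return not (action == ")" and depth <= 0)


def heteroatom_run_expert(state: str, action: str) -> bool:
    # Toy rule: reject a third consecutive heteroatom (N or O).
    hetero = {"N", "O"}
    return not (
        action in hetero
        and len(state) >= 2
        and state[-1] in hetero
        and state[-2] in hetero
    )


def consensus_actions(state: str, proposals: list[str], experts: list[Expert]) -> list[str]:
    # Keep only the proposals that every expert accepts, preserving oracle order.
    return [a for a in proposals if all(expert(state, a) for expert in experts)]


experts = [balanced_parens_expert, heteroatom_run_expert]
valid_after_cc = consensus_actions("CC", ["C", ")", "O"], experts)
valid_after_branch = consensus_actions("C(NO", ["O", ")", "C"], experts)
```

Here `")"` is vetoed after `"CC"` because no branch is open, and `"O"` is vetoed after `"C(NO"` because it would extend a run of heteroatoms; the surviving characters form the constrained action space for that step.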

As is the case with any RL agent, our agent depends on meaningful reward signals to take actions in the action space. In our proposed framework, these reward signals will be provided not only by the individual experts on their respective criteria, but also by a trainable discriminator network’s evaluation of the ‘realness’ of a synthetic sequence. We propose to employ an encoder-based, BERT-like Transformer network for this purpose. The combined reward from the RL experts and the discriminator will simultaneously optimize our hits for the desired properties while keeping them realistic with respect to the ligands in the ENAMINE catalog.
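One simple way to combine these signals is a weighted average, sketched below. The text does not specify how the rewards are combined, so the linear weighting, the `realness_weight` parameter, the `combined_reward` function, and the assumption that all scores lie in [0, 1] are illustrative choices rather than the proposed method.

```python
from __future__ import annotations

from typing import Callable

# Each scorer maps a SMILES string to a score assumed to lie in [0, 1].
Scorer = Callable[[str], float]


def combined_reward(
    smiles: str,
    experts: list[Scorer],
    discriminator: Scorer,
    realness_weight: float = 0.5,  # hypothetical trade-off parameter
) -> float:
    if not experts:
        raise ValueError("at least one expert is required")
    if not 0.0 <= realness_weight <= 1.0:
        raise ValueError("realness_weight must be in [0, 1]")
    # Average the property-specific rewards, then blend in the realness reward.
    expert_term = sum(expert(smiles) for expert in experts) / len(experts)
    return (1.0 - realness_weight) * expert_term + realness_weight * discriminator(smiles)


# Toy stand-ins for trained models; the constants are arbitrary.
toy_experts = [lambda s: 0.8, lambda s: 0.4]
toy_discriminator = lambda s: 0.5
reward = combined_reward("CCO", toy_experts, toy_discriminator)
```

Setting `realness_weight` to 0 recovers pure property optimization, while setting it to 1 rewards only sequences that the discriminator judges to be real.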

Because the oracle Transformer models its output on the training data, we will invest significant High-Performance Computing resources to construct a training dataset of diverse and high-scoring hits as follows. First, we will perform a scaffold-based clustering of the 5.5 billion drug-like molecules in the ENAMINE catalog. Then, we will perform stratified sampling on these clusters to obtain a chemically diverse set of 10 million representative ligands. We will then use AutoDock Vina to dock these molecules against Mac1 and compute their docking scores. Once this large docking run is complete, the molecules and their docking scores will be used to train a Directed Graph Convolutional Network as a docking surrogate, i.e., a regressor on the docking score. Using the cutting-edge AI accelerator hardware at Argonne National Laboratory, we will predict the docking scores of the rest of the ENAMINE dataset with the surrogate model. Informed by these predictions and the scaffold-based clusters, we will sample 2 million molecules as the training data for our oracle, balancing high docking score against chemical diversity. The docking surrogate trained at this step will also be used to virtually screen the merged selections.
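The stratified sampling step can be sketched in plain Python. This is an illustration under stated assumptions: the cluster labels are taken as given (the scaffold clustering itself is not shown), the proportional quota with a minimum of one molecule per cluster is a choice made here, and `stratified_sample` is a hypothetical helper name.

```python
from __future__ import annotations

import random
from collections import defaultdict


def stratified_sample(cluster_of: dict[str, str], n_total: int, seed: int = 0) -> list[str]:
    """Sample molecules so that every cluster is represented.

    cluster_of maps a molecule identifier (e.g. a SMILES string) to its cluster label.
    Quotas are proportional to cluster size, with at least one pick per cluster,
    so the returned list can differ slightly in length from n_total.
    """
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    clusters: dict[str, list[str]] = defaultdict(list)
    for molecule, label in cluster_of.items():
        clusters[label].append(molecule)

    total = len(cluster_of)
    picked: list[str] = []
    for label in sorted(clusters):
        members = clusters[label]
        quota = max(1, round(n_total * len(members) / total))
        quota = min(quota, len(members))
        picked.extend(rng.sample(members, quota))
    return picked


# Toy data: 8 molecules in cluster "a", 3 in "b", and 1 in "c".
toy = {f"a{i}": "a" for i in range(8)}
toy.update({f"b{i}": "b" for i in range(3)})
toy["c0"] = "c"
sample = stratified_sample(toy, n_total=6)
```

The minimum of one pick per cluster favors chemical diversity over exact proportionality, which is why small clusters such as `"c"` always appear in the sample.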

What makes your approach stand out from the community? (<100 words)

First, to the best of our knowledge, we are the first to utilize multi-expert RL in drug discovery, which allows us to tame the vast search space that has so far prohibited the application of RL to this problem. Second, our cluster-of-experts approach ensures that each expert’s reward pertains to a specific property, so the policy’s exploration-exploitation decisions are more explainable than those of a black-box neural network. Finally, we propose to publish a scaffold-based clustering of the 5.5 billion compounds in the ENAMINE catalog, along with their predicted docking scores against the competition target.

Method Name
MERLIND: Multi-Expert Reinforcement Learning in Drug Discovery
Commercial software packages used

OpenEye

Free software packages used

AutoDock Vina

AMBER

OpenMM

PyTorch

Relevant publications of previous uses by your group of this software/method

1. Aymen Al Saadi, Dario Alfe, Yadu Babuji, Agastya Bhati, Ben Blaiszik, Alexander Brace, Thomas Brettin, Kyle Chard, Ryan Chard, Austin Clyde, Peter Coveney, Ian Foster, Tom Gibbs, Shantenu Jha, Kristopher Keipert, Dieter Kranzlmüller, Thorsten Kurth, Hyungro Lee, Zhuozhao Li, Heng Ma, Gerald Mathias, Andre Merzky, Alexander Partin, Arvind Ramanathan, Ashka Shah, Abraham Stern, Rick Stevens, Li Tan, Mikhail Titov, Anda Trifan, Aristeidis Tsaris, Matteo Turilli, Huub Van Dam, Shunzhou Wan, David Wifling, and Junqi Yin. 2021. IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads. In 50th International Conference on Parallel Processing (ICPP 2021). Association for Computing Machinery, New York, NY, USA, Article 40, 1–12. https://doi.org/10.1145/3472456.3473524

2. Austin Clyde, Stephanie Galanie, Daniel W. Kneller, Heng Ma, Yadu Babuji, Ben Blaiszik, Alexander Brace, Thomas Brettin, Kyle Chard, Ryan Chard, Leighton Coates, Ian Foster, Darin Hauner, Vilmos Kertesz, Neeraj Kumar, Hyungro Lee, Zhuozhao Li, Andre Merzky, Jurgen G. Schmidt, Li Tan, Mikhail Titov, Anda Trifan, Matteo Turilli, Hubertus Van Dam, Srinivas C. Chennubhotla, Shantenu Jha, Andrey Kovalevsky, Arvind Ramanathan, Martha S. Head, and Rick Stevens. 2022. High-Throughput Virtual Screening and Validation of a SARS-CoV-2 Main Protease Noncovalent Inhibitor. Journal of Chemical Information and Modeling 62 (1), 116–128. https://doi.org/10.1021/acs.jcim.1c00851

Cache

All rights reserved
This website is licensed under CC-BY 4.0