We present an end-to-end lead optimization system for drug discovery based on an AI-gym environment called ``Reinforcement Learning for Molecular Modeling'' (RLMM). RLMM automates running fully customizable molecular dynamics simulations inside an agent-based molecular design protocol. RLMM is fully autonomous: from a single starting ligand, protein structure, and configuration file, it cycles through designs for lead optimization informed by physics-based simulations. RLMM connects state-of-the-art molecular dynamics simulations with an agent-based policy environment. We outline the basis of the molecular dynamics simulations used in RLMM and describe the methods for navigating chemical space in an agent-based model.
Computational tools for assessing the binding affinity of a ligand generally rely on molecular dynamics simulations. Because our aim is to connect machine learning with physics-based modeling, we focus on physics-based simulations of protein-ligand complexes. Various software packages provide an API for creating, running, and analyzing molecular dynamics simulations. Standard molecular dynamics simulations are widely used to estimate the binding affinity of a protein-ligand complex computationally. Advanced sampling techniques, such as Markov chain Monte Carlo (MCMC) or replica exchange, are also used in molecular dynamics simulations. Molecular mechanics generalized Born surface area (MMGBSA) is a technique for estimating the binding affinity of a protein-ligand complex. MMGBSA methods are less computationally expensive than free energy estimations, whose software packages rely on more complicated alchemical techniques. We use the MMPBSA.py script from Amber20, which also performs generalized Born calculations, to estimate MMGBSA scores for a series of molecular dynamics snapshots.
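For reference, the MMGBSA estimate is usually written in the following end-point form. This is the commonly used textbook notation rather than anything specific to RLMM; angle brackets denote averages over the molecular dynamics snapshots, and the entropy term $-TS$ is often omitted when ranking closely related ligands.

\[
\Delta G_{\text{bind}} \approx \langle G_{\text{complex}} \rangle - \langle G_{\text{receptor}} \rangle - \langle G_{\text{ligand}} \rangle,
\qquad
G = E_{\text{MM}} + G_{\text{GB}} + G_{\text{SA}} - TS
\]

Here $E_{\text{MM}}$ is the molecular mechanics energy, $G_{\text{GB}}$ the generalized Born polar solvation term, and $G_{\text{SA}}$ the nonpolar surface-area term.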
RLMM consists of five general components that make up the backbone of the platform: system building, simulation setup, action space, observation space, and policy. Each component consists of sub-modules with their own properties and behaviors, and RLMM provides the connections between them. The general workflow is as follows: after initial system preparation, a simulation is run; observations are drawn from the simulation samples; the action space (candidate ligand designs) is determined; the AI policy chooses a modification; and the system is rebuilt and re-initialized to continue the simulation. Molecular modifications are kept small so that systems can be re-initialized with less warm-up time.
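The workflow above can be sketched as a gym-style loop. This is a minimal illustration only: the names `ToyEnv`, `get_actions`, `step`, and `random_policy` are hypothetical and are not RLMM's actual API, and the simulation is replaced by a placeholder score.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyEnv:
    """Stand-in for the build -> simulate -> observe cycle (hypothetical, not RLMM's API)."""
    ligand: str
    history: list = field(default_factory=list)

    def reset(self):
        # System building and the initial simulation would happen here.
        self.history = [self.ligand]
        return self._observe()

    def _observe(self):
        # Placeholder observation: a fake score derived from the ligand string.
        return {"ligand": self.ligand, "score": -float(len(self.ligand))}

    def get_actions(self, n=3):
        # Placeholder action space: up to n small "modifications" of the ligand.
        return [self.ligand + suffix for suffix in "CNO"[:n]]

    def step(self, new_ligand):
        # Rebuild the system with the chosen ligand and continue simulating.
        self.ligand = new_ligand
        self.history.append(new_ligand)
        return self._observe()


_rng = random.Random(0)  # fixed seed so the sketch is reproducible


def random_policy(observation, actions):
    # Simplest possible policy: pick any available modification.
    return _rng.choice(actions)


def run(env, policy, n_steps):
    observation = env.reset()
    for _ in range(n_steps):
        actions = env.get_actions()
        choice = policy(observation, actions)
        observation = env.step(choice)
    return env.history


history = run(ToyEnv("CCO"), random_policy, n_steps=4)
```

Because each step appends a single small change, every state in `history` differs only slightly from the previous one, which mirrors the reason given above for keeping modifications small.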
System-building
Typically for lead optimization tasks, tautomers and enantiomers are enumerated for the incoming proposed analog or perturbation of the previous ligand. Conformer generation is then performed on this ensemble of structures, generating 200-800 3D conformers for every enantiomer and reasonable tautomer. The conformer and placement of the ligand are selected based on the best shape overlay with the previous ligand. We use this system preparation method for lead optimization to mimic an interrupted simulation, in which the start of the new simulation matches the end of the previous simulation as closely as possible.
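The selection step can be illustrated with a deliberately crude stand-in for shape overlay. The sketch below assumes the candidate conformers are already aligned to the previous ligand and scores each one by the Tanimoto overlap of occupied grid cells; a real shape-overlay tool also optimizes the alignment and uses atomic volumes. The function names are hypothetical.

```python
def voxelize(coords, spacing=1.0):
    """Map 3D atom coordinates to the set of grid cells they occupy."""
    return {tuple(int(round(c / spacing)) for c in xyz) for xyz in coords}


def shape_tanimoto(vox_a, vox_b):
    """Tanimoto overlap of two voxel sets: |A & B| / |A | B|."""
    union = vox_a | vox_b
    return len(vox_a & vox_b) / len(union) if union else 0.0


def select_conformer(reference, candidates, spacing=1.0):
    """Return (index, score) of the candidate that best overlays the reference."""
    ref_vox = voxelize(reference, spacing)
    scores = [shape_tanimoto(ref_vox, voxelize(c, spacing)) for c in candidates]
    best_idx = max(range(len(scores)), key=scores.__getitem__)
    return best_idx, scores[best_idx]


# Toy coordinates (arbitrary units); the previous ligand lies along the x axis.
previous_ligand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
conformers = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)],  # rotated away
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0)],  # partly overlapping
    [(0.1, 0.0, 0.0), (1.1, 0.1, 0.0), (1.9, 0.0, 0.0)],  # nearly identical pose
]
best_idx, best_score = select_conformer(previous_ligand, conformers)
```

The third conformer is selected because it occupies the same grid cells as the previous ligand, which is the property that lets the new simulation start close to where the previous one ended.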
Action-space
The action space abstraction in RLMM defines the space, or domain, of actions available to the policy module. These actions define the transitions from state to state in RLMM. To illustrate the strengths of this abstraction, we provide three implemented action spaces; more robust formulations that account for synthetic chemistry restrictions are planned. In principle, the action space formulation will allow for a robotic-laboratory-based action space that calculates possible reactions given a set of known reactions and in-stock reagents. During lead optimization, the goal is to modify a ligand into something similar with more desirable properties, such as stronger binding. To transition the ligand, we implemented a similarity search in which the action space returns the $n$ most similar molecules, in terms of 3D shape overlay, from a user-provided database. The action space for a given state is then defined as the set of the top $n$ most similar molecules from a given database, such as PubChem. One benefit of this module is that enumerating the actions is exceptionally fast and all actions are synthetically reasonable, at least up to the quality of the database used. A second action space uses the FastROCS toolkit from OpenEye, which uses parallel GPUs to search a local database for known active compounds of similar shape to the given ligand, comparing millions of potential compounds per second. It returns a configurable number of sufficiently similar compounds for further analysis as potential modifications to the ligand. A third action space is based on the derivation and models trained in this paper for a scaffold-based navigation model.
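As a minimal sketch of the top-$n$ similarity action space, the following uses a pluggable similarity function over a small in-memory database. The feature-set Tanimoto used here is only a placeholder for the 3D shape-overlay score described above, and the function names and the toy database are hypothetical.

```python
import heapq


def set_tanimoto(features_a, features_b):
    """Placeholder similarity: Tanimoto coefficient of two feature sets."""
    union = features_a | features_b
    return len(features_a & features_b) / len(union) if union else 0.0


def similarity_action_space(current, database, n, similarity=set_tanimoto):
    """Return the n (name, score) pairs most similar to the current ligand."""
    scored = ((name, similarity(current, feats)) for name, feats in database.items())
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])


# Toy database: molecule name -> set of abstract feature ids (illustrative only).
database = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 3},
    "mol_c": {7, 8},
    "mol_d": {1, 2, 9},
}
actions = similarity_action_space({1, 2, 3}, database, n=2)
action_names = [name for name, _ in actions]
```

Using `heapq.nlargest` avoids sorting the full database when only the top $n$ entries are needed; for databases of the size of PubChem, a dedicated search engine would still be required.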
Policy
In each episode of the simulation, the ligand structure is perturbed in search of stronger binding and/or new ligand structures. The modified structure then persists as the base structure for the next episode. RLMM supports various policies, allowing flexible choices of how the ligand is modified in each episode and which modifications persist.
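To make the two decisions concrete (which modification to try and whether it persists), here is a minimal sketch of two interchangeable policies. The class names, the `choose` method, and the toy scores are assumptions for illustration and do not describe the policies shipped with RLMM; lower scores are taken to mean stronger predicted binding, as with MMGBSA.

```python
import random


class GreedyPolicy:
    """Try the best-scoring candidate; keep it only if it beats the current ligand."""

    def __init__(self, score_fn):
        self.score_fn = score_fn  # lower score = stronger predicted binding

    def choose(self, current, candidates):
        best = min(candidates, key=self.score_fn)
        accepted = self.score_fn(best) < self.score_fn(current)
        return (best if accepted else current), accepted


class RandomPolicy:
    """Always accept a randomly chosen candidate (pure exploration)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def choose(self, current, candidates):
        return self.rng.choice(candidates), True


# Hypothetical scores for three ligands (arbitrary units).
scores = {"lig0": -7.0, "lig1": -8.5, "lig2": -6.0}
policy = GreedyPolicy(scores.__getitem__)

first = policy.choose("lig0", ["lig1", "lig2"])   # lig1 improves on lig0
second = policy.choose("lig1", ["lig0", "lig2"])  # nothing improves on lig1
```

Both classes expose the same `choose(current, candidates)` signature, so an environment loop can swap one for the other without other changes.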