We developed a structure-based molecular generative model, Topology Molecular Type assignment (TopMT), that generates highly potent molecules while addressing synthetic feasibility: every generated molecule can be made by combinatorial parallel synthesis from fragments in the Enamine REAL space. TopMT has two modules, a GAN module and a Matching module. The GAN module efficiently explores protein-ligand interactions and generates novel ligands with 3D structures. The Matching module deconstructs these structures into fragments and searches the Enamine library for the building blocks needed to reassemble the generated target molecules. Evaluated on diverse protein systems such as kinases, GPCRs, and proteases, TopMT has demonstrated up to 50,000-fold enrichment compared to high-throughput screening. Combined with our group’s expertise in medicinal chemistry and molecular dynamics simulation, our workflow ensures that generated ligands are both novel and synthetically feasible.
This approach addresses novelty, diversity, and synthetic feasibility simultaneously, and is supported by high-throughput docking and visual inspection by experienced medicinal chemists.
Step 1. De novo design of potential hits with the TopMT-GAN module.
The binding pocket of SETDB1 is well defined and has many known ligands, providing a robust starting point for structure-based design with our TopMT-GAN module. Taking advantage of its speed and efficiency in exploring relevant chemical space, the model will generate a diverse pool of 50,000 molecules with the potential to form strong interactions within the binding pocket. These molecules are novel: they are produced by de novo design and are therefore not constrained by any existing screening library. We will then conduct a preliminary round of high-throughput virtual screening to select the 1,000 most promising ligands. The structures generated by TopMT-GAN will serve as the basis for exploring interaction patterns in later stages.
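As an illustration only, the reduction from the generated pool to a shortlist can be sketched as a top-k ranking. The `Candidate` type, the `select_top` helper, the toy scores, and the convention that more negative scores are better are all assumptions made for this sketch; they are not part of TopMT.

```python
import heapq
from typing import NamedTuple


class Candidate(NamedTuple):
    mol_id: str
    score: float  # assumed convention: more negative = better predicted binding


def select_top(candidates, keep):
    """Return the `keep` best-scoring candidates, best (lowest score) first."""
    return heapq.nsmallest(keep, candidates, key=lambda c: c.score)


# Toy pool standing in for the 50,000 generated molecules (scores are made up).
pool = [Candidate(f"gen-{i}", -float(i % 12)) for i in range(50)]

# Toy shortlist standing in for the 1,000 selected ligands.
shortlist = select_top(pool, keep=5)
```

Using a heap-based selection avoids sorting the whole pool when only a small fraction is kept, which matters more as the pool grows.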
Step 2. Expand hits within the Enamine REAL space using TopMT-Matching.
The topologies extracted from the promising poses are fed into the TopMT-Matching module, which was developed to address the synthesis problem of generated molecules. Guided by the interaction patterns produced by the GAN module, the Matching module uses Enamine in-stock fragments (259K) as building blocks and explores all possible ways to fill the defined topologies. This method avoids docking enormous chemical libraries, making it highly efficient for exploring the extensive on-demand space. The module thus generates a larger pool of 200,000 potential hits with well-defined synthetic pathways, ensuring the feasibility of subsequent synthesis and testing. This process not only expands the previous chemical space but also guarantees that the generated molecules are readily synthesizable.
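The idea of filling a topology with building blocks can be sketched as a combinatorial enumeration. This is a deliberately simplified toy, not the TopMT-Matching algorithm: it assumes a topology is just a list of slots, each described by its number of attachment points, and that a fragment fits a slot when the counts match. The fragment names, the `fill_topology` helper, and the matching rule are hypothetical.

```python
from itertools import product

# Hypothetical topology: one entry per slot, giving its number of attachment points.
# [1, 2, 1] reads as "cap - linker - cap".
topology = [1, 2, 1]

# Hypothetical fragment catalogue: name -> number of attachment points.
fragments = {
    "frag_A": 1,
    "frag_B": 1,
    "frag_C": 2,
    "frag_D": 2,
    "frag_E": 3,
}


def fill_topology(topology, fragments):
    """Enumerate every assignment of compatible fragments to the topology's slots."""
    by_points = {}
    for name, n_points in fragments.items():
        by_points.setdefault(n_points, []).append(name)
    # One list of allowed fragments per slot; an unmatched slot yields no assemblies.
    slot_choices = [by_points.get(n_points, []) for n_points in topology]
    return list(product(*slot_choices))


assemblies = fill_topology(topology, fragments)
```

In this toy the number of assemblies is the product of the per-slot choices (2 x 2 x 2 = 8 here), which shows how a fixed topology combined with a large fragment catalogue can yield a large candidate pool without docking the full library.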
Step 3. Hierarchical Virtual Screening of the Generated Hit Library
We will perform an initial round of docking with Glide SP and filter the library on docking score, drug-likeness, ADME properties (including solubility, permeability, and metabolic stability), and structural diversity. The most promising ligands from this initial screen will then undergo a second round of docking with Glide XP to re-evaluate their predicted binding affinities and interaction profiles. This hierarchical approach improves both efficiency and accuracy, so that only the most viable candidates progress to visual inspection and validation.
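The two-stage logic can be sketched as follows. This is a minimal illustration under stated assumptions: the scores are placeholder numbers rather than Glide output, Lipinski's rule of five stands in for the fuller drug-likeness and ADME filters, the diversity filter is omitted, and the `Ligand` fields, `hierarchical_screen`, and the cutoff values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ligand:
    name: str
    sp_score: float    # fast first-stage score; more negative = better (assumed)
    xp_score: float    # slower second-stage score; placeholder values
    mol_weight: float
    logp: float
    h_donors: int
    h_acceptors: int


def passes_rule_of_five(lig):
    """Lipinski's rule of five, used here as a simple drug-likeness proxy."""
    return (lig.mol_weight <= 500 and lig.logp <= 5
            and lig.h_donors <= 5 and lig.h_acceptors <= 10)


def hierarchical_screen(library, sp_cutoff, top_n):
    # Stage 1: cheap score cutoff plus property filter on the whole library.
    stage1 = [l for l in library if l.sp_score <= sp_cutoff and passes_rule_of_five(l)]
    # Stage 2: rank only the survivors by the more expensive score.
    # In practice the second score would be computed here, for survivors only.
    return sorted(stage1, key=lambda l: l.xp_score)[:top_n]


library = [
    Ligand("lig-1", -8.2, -9.1, 420.0, 3.1, 2, 6),
    Ligand("lig-2", -7.9, -9.8, 610.0, 4.0, 3, 9),   # fails molecular weight
    Ligand("lig-3", -5.0, -10.5, 350.0, 2.0, 1, 4),  # fails first-stage cutoff
    Ligand("lig-4", -8.8, -8.4, 390.0, 2.7, 2, 5),
    Ligand("lig-5", -7.5, -9.6, 455.0, 4.4, 1, 7),
]

finalists = hierarchical_screen(library, sp_cutoff=-7.0, top_n=2)
```

The efficiency gain comes from applying the expensive scoring only to the ligands that survive the cheap first stage.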
Step 4. Visual Inspection
After the hierarchical virtual screening, we will conduct a thorough visual inspection of the top-ranked molecules. This involves manually reviewing the binding poses and interactions within the binding pocket to ensure that the selected ligands exhibit favorable geometries and interactions. This step helps confirm the quality and potential effectiveness of the candidates before moving on to simulation and experimental validation.
Step 5. MD Simulation Validation
Our group also has expertise in molecular dynamics (MD) simulations. After selecting candidate hits, we will use MD simulations to validate the interactions between ligands and the target. By simulating the dynamic behavior of the ligand-target complexes, we can assess the stability and strength of the interactions over time. This step helps confirm the robustness of the binding interactions observed in docking and provides additional insight into the potential efficacy of the candidates.
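One common way to quantify pose stability over a trajectory is the ligand RMSD relative to the starting pose. The sketch below assumes the frames are already aligned on the protein and are given as plain lists of (x, y, z) coordinates in ångströms; the 2.0 Å cutoff, the `stable_fraction` helper, and the toy coordinates are illustrative assumptions, not a prescribed protocol.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized coordinate sets."""
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def stable_fraction(frames, cutoff=2.0):
    """Fraction of frames whose ligand RMSD to the first frame is within `cutoff`."""
    reference = frames[0]
    values = [rmsd(reference, frame) for frame in frames]
    return sum(v <= cutoff for v in values) / len(values)


# Toy two-atom "ligand": the docked pose, a 1 A drift, and a 3 A drift along x.
frames = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0)],
    [(3.0, 0.0, 0.0), (4.5, 0.0, 0.0)],
]

fraction = stable_fraction(frames)
```

A ligand that stays within the cutoff for most of the simulation is more likely to retain the docked binding mode, whereas a low fraction flags poses that drift away.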