Challenge #6

Hit Identification
Method type (check all that apply)
De novo design
Deep learning
High-throughput docking
Description of your approach (min 200 and max 800 words)

We developed a structure-based generative model, Topology Molecular Type assignment (TopMT), that generates molecules predicted to bind strongly to the target while also addressing synthetic feasibility: every generated molecule can be assembled by combinatorial parallel synthesis from building blocks in the Enamine REAL space. TopMT has two modules. The GAN module explores protein-ligand interactions and generates novel ligands with 3D structures. The Matching module deconstructs these structures into fragments and searches the Enamine building-block library for the fragments needed to reassemble each generated molecule. Evaluated on diverse protein families, including kinases, GPCRs, and proteases, TopMT has achieved up to 50,000-fold enrichment compared to high-throughput screening. Combined with our group’s expertise in medicinal chemistry and molecular dynamics simulation, this workflow yields ligands that are both novel and synthetically feasible.
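Enrichment figures such as the 50,000-fold value above are usually expressed as a ratio of hit rates. The minimal sketch below assumes that definition: the hit rate among selected compounds divided by a reference HTS hit rate. The exact benchmark protocol is not given here, and the counts in the example are purely illustrative, not TopMT data.

```python
def enrichment_factor(hits_selected: int, n_selected: int,
                      hits_reference: int, n_reference: int) -> float:
    """Hit rate of the selected set divided by a reference (e.g. HTS) hit rate.

    Assumption: this hit-rate ratio is how "fold enrichment" is measured.
    """
    if n_selected <= 0 or n_reference <= 0 or hits_reference <= 0:
        raise ValueError("set sizes must be positive and the reference needs at least one hit")
    selected_rate = hits_selected / n_selected
    reference_rate = hits_reference / n_reference
    return selected_rate / reference_rate


# Illustrative numbers only (not TopMT data): 10 actives among 200 tested
# designs, compared with a 1-in-10,000 HTS hit rate.
ef = enrichment_factor(10, 200, 1, 10_000)
print(f"{ef:.0f}-fold enrichment")  # 500-fold enrichment
```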

This approach addresses novelty, diversity, and synthetic feasibility simultaneously and is supported by high-throughput docking and visual inspection by experienced medicinal chemists.

Step 1. De novo design of potential hits with the TopMT-GAN module.

The binding pocket of SETDB1 is well defined and has many known ligands, which provides a robust starting point for structure-based design with the TopMT-GAN module. Because the model is fast and explores the relevant chemical space efficiently, we will use it to generate a diverse pool of 50,000 molecules with the potential to form strong interactions within the binding pocket. These molecules are designed de novo and are therefore not constrained by any existing screening library. We will then run a preliminary round of high-throughput virtual screening to select the 1,000 most promising ligands. The 3D poses of these ligands define the interaction patterns that guide the later stages.
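The reduction from 50,000 generated molecules to 1,000 is, at its core, a top-N selection on a score. The minimal sketch below assumes that each molecule already has a single docking score and that lower is better. The scoring program, the random stand-in scores, and the helper name `select_top` are hypothetical and not part of TopMT.

```python
import heapq
import random
from typing import Dict, List, Tuple


def select_top(scores: Dict[str, float], n_keep: int) -> List[Tuple[str, float]]:
    """Return the n_keep (name, score) pairs with the lowest scores, best first.

    Assumes lower docking scores are better, as for Vina- or Glide-style
    energies reported in kcal/mol.
    """
    return heapq.nsmallest(n_keep, scores.items(), key=lambda item: item[1])


# Hypothetical stand-in for the docking scores of 50,000 generated molecules.
rng = random.Random(0)
scores = {f"gen_{i:05d}": rng.uniform(-12.0, -4.0) for i in range(50_000)}

shortlist = select_top(scores, 1_000)
print(len(shortlist))  # 1000
```

`heapq.nsmallest` maintains a heap of only `n_keep` items rather than fully sorting all scores. For a pool of this size, `sorted(...)[:1000]` would work equally well.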

Step 2. Expand hits within Enamine REAL space using TopMT-Matching.

The topologies extracted from the promising poses are fed into the TopMT-Matching module, which was developed to solve the synthesis problem of generated molecules. Starting from the interaction patterns produced by the GAN module, the Matching module uses Enamine in-stock fragments (259K) as building blocks and explores all possible ways to fill each defined topology. Because candidates are assembled from building blocks rather than found by docking the full on-demand library, the search covers the very large REAL space efficiently. The module thus produces an expanded pool of 200,000 potential hits, each with a well-defined synthetic route, which supports subsequent synthesis and testing.
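The internals of the Matching module are not described here, so the following is only a toy illustration of filling a topology combinatorially. Each slot in a hypothetical two-slot topology accepts building blocks with a given interaction role. A pair is kept only if the two reactive handles can be joined by a known reaction. The `BuildingBlock` fields, role labels, catalogue IDs, and reaction table are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


@dataclass(frozen=True)
class BuildingBlock:
    block_id: str  # hypothetical catalogue ID, not a real Enamine code
    role: str      # interaction role the fragment can fill (hypothetical label)
    handle: str    # reactive group used for the coupling step


# Hypothetical reaction table: unordered pairs of handles joined in one step.
REACTIONS = {
    frozenset({"amine", "carboxylic_acid"}): "amide coupling",
    frozenset({"amine", "sulfonyl_chloride"}): "sulfonamide formation",
}


def fill_two_slot_topology(role_a: str, role_b: str,
                           catalogue: List[BuildingBlock]) -> List[Tuple[str, str, str]]:
    """Enumerate every (block_a, block_b, reaction) that fills both roles
    and whose handles are joined by a reaction in REACTIONS."""
    slot_a = [b for b in catalogue if b.role == role_a]
    slot_b = [b for b in catalogue if b.role == role_b]
    routes = []
    for a, b in product(slot_a, slot_b):
        reaction = REACTIONS.get(frozenset({a.handle, b.handle}))
        if reaction is not None:  # incompatible handles give no route
            routes.append((a.block_id, b.block_id, reaction))
    return routes


catalogue = [
    BuildingBlock("BB-001", "aromatic_core", "amine"),
    BuildingBlock("BB-002", "aromatic_core", "carboxylic_acid"),
    BuildingBlock("BB-003", "polar_tail", "carboxylic_acid"),
    BuildingBlock("BB-004", "polar_tail", "sulfonyl_chloride"),
]

for route in fill_two_slot_topology("aromatic_core", "polar_tail", catalogue):
    print(route)
# ('BB-001', 'BB-003', 'amide coupling')
# ('BB-001', 'BB-004', 'sulfonamide formation')
```

A brute-force product is adequate for a toy catalogue. With 259K building blocks, one would index blocks by role and handle first so that only compatible pairs are ever enumerated.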

Step 3. Hierarchical Virtual Screening of the Generated Hit Library

We will first dock the library with Glide SP and filter it on docking score, drug-likeness, ADME properties (including solubility, permeability, and metabolic stability), and structural diversity. The most promising ligands from this screen will then be redocked with Glide XP to refine their predicted binding modes and interaction profiles. This hierarchy balances throughput and accuracy. The faster SP protocol handles the full library, while the more demanding XP protocol is reserved for the shortlist, so only the most viable candidates proceed to visual inspection and validation.
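The SP-then-XP funnel can be sketched in plain Python. This is an illustration under stated assumptions, not a Schrödinger API. Glide SP scores, physicochemical descriptors, and fingerprint bits are assumed to be precomputed (for example with Glide, QikProp, or RDKit). A strict version of Lipinski's rule of five stands in for the drug-likeness and ADME filters, and a greedy Tanimoto cut-off stands in for the diversity step. `xp_score` is a hypothetical callable returning a Glide XP score, and all example values are invented.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Candidate:
    name: str
    sp_score: float              # Glide SP score, assumed precomputed (more negative = better)
    mw: float                    # molecular weight in Da
    logp: float                  # predicted logP
    hbd: int                     # hydrogen-bond donors
    hba: int                     # hydrogen-bond acceptors
    fingerprint: FrozenSet[int]  # on-bits of any 2D fingerprint


def passes_rule_of_five(c: Candidate) -> bool:
    # Strict rule of five as a simple stand-in for drug-likeness/ADME filters.
    return c.mw <= 500 and c.logp <= 5 and c.hbd <= 5 and c.hba <= 10


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def hierarchical_screen(pool: List[Candidate], sp_keep: int, max_similarity: float,
                        xp_score: Callable[[Candidate], float],
                        xp_keep: int) -> List[Candidate]:
    # Stage 1: property filter, then keep the best SP scores.
    stage1 = sorted((c for c in pool if passes_rule_of_five(c)),
                    key=lambda c: c.sp_score)[:sp_keep]
    # Diversity: walk in SP order and skip close analogues of already-kept molecules.
    diverse: List[Candidate] = []
    for c in stage1:
        if all(tanimoto(c.fingerprint, d.fingerprint) < max_similarity for d in diverse):
            diverse.append(c)
    # Stage 2: rescore the shortlist with the stricter protocol and keep the best.
    return sorted(diverse, key=xp_score)[:xp_keep]


pool = [
    Candidate("A", -9.1, 420.0, 3.2, 2, 6, frozenset({1, 2, 3, 4})),
    Candidate("B", -8.8, 430.0, 3.0, 2, 6, frozenset({1, 2, 3, 5})),  # close analogue of A
    Candidate("C", -8.5, 610.0, 4.1, 1, 7, frozenset({7, 8, 9})),     # fails MW <= 500
    Candidate("D", -7.9, 350.0, 2.1, 1, 5, frozenset({10, 11, 12})),
]
xp_scores = {"A": -10.2, "B": -9.9, "C": -9.0, "D": -10.5}  # hypothetical XP scores
picked = hierarchical_screen(pool, sp_keep=3, max_similarity=0.5,
                             xp_score=lambda c: xp_scores[c.name], xp_keep=2)
print([c.name for c in picked])  # ['D', 'A']
```

Running the cheap filters and the diversity pick before the XP stage means the expensive rescoring is applied only to a small, non-redundant set.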

Step 4. Visual Inspection

After the hierarchical virtual screening, we will visually inspect the top-ranked molecules, manually reviewing their binding poses and interactions within the pocket to confirm that the selected ligands adopt favorable geometries and form favorable interactions. This step checks the quality of each candidate before it moves on to simulation and experimental validation.

Step 5. MD Simulation Validation

Our group also has expertise in molecular dynamics (MD) simulations. After selecting candidate hits, we will run MD simulations of the ligand-target complexes to test whether the interactions observed in docking persist over time. Tracking the stability of each pose and its key contacts shows whether these binding interactions are robust and gives additional insight into how each candidate is likely to bind.
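One common way to summarize pose stability from an MD run is the ligand RMSD relative to the docked pose. The minimal sketch below assumes that frames have already been superposed on the protein and that ligand atoms appear in the same order in every frame. The 2.0 Å cut-off and the 80% frame fraction are illustrative thresholds, not values from this workflow, and the toy trajectory is invented.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def ligand_rmsd(frame: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD (Å) between matched ligand atoms; assumes frames are already
    superposed on the protein, so no fitting is done here."""
    if not frame or len(frame) != len(reference):
        raise ValueError("frame and reference need the same, non-zero atom count")
    squared = sum((x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
                  for (x, y, z), (rx, ry, rz) in zip(frame, reference))
    return math.sqrt(squared / len(frame))


def pose_is_stable(trajectory: List[Sequence[Coord]], reference: Sequence[Coord],
                   cutoff: float = 2.0, min_fraction: float = 0.8) -> bool:
    """Stable if at least min_fraction of frames stay within cutoff Å of the
    docked pose (illustrative thresholds)."""
    if not trajectory:
        raise ValueError("trajectory has no frames")
    values = [ligand_rmsd(frame, reference) for frame in trajectory]
    return sum(v <= cutoff for v in values) / len(values) >= min_fraction


# Toy data: a three-atom "ligand" shifted along y by a fixed offset per frame.
docked = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
offsets = [0.2, 0.5, 0.8, 1.0, 3.5]
trajectory = [[(x, y + d, z) for x, y, z in docked] for d in offsets]
print(pose_is_stable(trajectory, docked))  # True (4 of 5 frames within 2.0 Å)
```

In practice, a trajectory-analysis library would handle file reading and superposition. The functions here only show the RMSD arithmetic and one possible stability criterion.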

What makes your approach stand out from the community? (<100 words)

A major limitation of current generative models is that the molecules they generate are often hard to synthesize. Our TopMT model overcomes this by integrating two modules: the GAN module explores diverse interaction patterns without library constraints, while the Matching module ensures synthetic feasibility by building every molecule from Enamine in-stock fragments. The result is novel, synthetically feasible molecules, which makes the approach practical for drug discovery. Rigorous virtual screening, visual inspection, and MD simulation validation further improve the efficiency and effectiveness of the workflow.

Method Name
Topology Molecular Type assignment (TopMT)
Commercial software packages used

Schrödinger Molecular Modelling Suite (Glide, QikProp, LigPrep, and Epik modules)

Free software packages used

RDKit, AutoDock Vina