Our modeling approach combines deep learning (DL) with physics-based methods to improve the accuracy and efficiency of molecular docking. We build on the state-of-the-art DiffDock system, which treats molecular docking as a generative modeling problem over ligand poses. DiffDock uses a diffusion generative model (DGM) that is defined not on the full ambient space of atomic coordinates but on the lower-dimensional submanifold of ligand poses spanned by the docking degrees of freedom. Training and sampling on this submanifold are more efficient and more accurate than diffusing in the ambient space.
In molecular docking, each kind of ligand movement is associated with a mathematical group: translations with the 3D translation group T(3), rigid rotations with the 3D rotation group SO(3), and torsional changes around the m rotatable bonds with m copies of the 2D rotation group SO(2). These groups formalize how translations, rotations, and torsion updates act on a ligand pose while leaving its local structure, such as bond lengths and bond angles, undisturbed.
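The three kinds of movement can be illustrated with a minimal NumPy sketch. This is not DiffDock's implementation: the function names (`translate`, `rotate_rigid`, `rotate_torsion`) are hypothetical, the set of atoms moved by a torsion is passed in by hand rather than derived from a molecular graph, and the sketch omits any re-alignment step the full method may apply after a torsion update.

```python
import numpy as np

def rotation_matrix(axis, angle):
    """Rodrigues' formula: rotation by `angle` (radians) about `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def translate(coords, shift):
    """T(3): shift every atom by the same vector."""
    return coords + np.asarray(shift, dtype=float)

def rotate_rigid(coords, axis, angle):
    """SO(3): rotate the whole ligand about its centroid."""
    center = coords.mean(axis=0)
    R = rotation_matrix(axis, angle)
    return (coords - center) @ R.T + center

def rotate_torsion(coords, bond, moving, angle):
    """SO(2): rotate the atoms in `moving` about the bond (i, j).

    `moving` lists the atom indices on the j side of the bond
    (a hypothetical, hand-supplied input in this sketch).
    """
    i, j = bond
    R = rotation_matrix(coords[j] - coords[i], angle)
    out = coords.copy()
    out[moving] = (coords[moving] - coords[j]) @ R.T + coords[j]
    return out

# Toy four-atom chain with one rotatable bond between atoms 1 and 2.
chain = np.array([[0.0, 0.0, 0.0],
                  [1.5, 0.0, 0.0],
                  [1.5, 1.5, 0.0],
                  [3.0, 1.5, 0.0]])
pose = translate(chain, [1.0, -2.0, 0.5])
pose = rotate_rigid(pose, axis=[0.0, 0.0, 1.0], angle=0.7)
pose = rotate_torsion(pose, bond=(1, 2), moving=[3], angle=np.pi / 2)
```

Translation and rigid rotation preserve every interatomic distance, whereas the torsion update changes only the distances between atoms on opposite sides of the rotated bond, so bond lengths and bond angles stay fixed.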
The DiffDock framework includes two main models: the Score Model and the Confidence Model. The Score Model takes the current ligand pose and the protein structure as input and produces outputs for the translation, the rotation, and each rotatable bond. The translational and rotational outputs are SE(3)-equivariant vectors, and the output for each rotatable bond is an SE(3)-invariant scalar. The architecture uses SE(3)-equivariant convolutional networks, which allow for multiscale integration and efficient computation.
The Confidence Model, in contrast, evaluates the plausibility and stability of a ligand pose relative to the protein and produces a single scalar output. This scalar is invariant to joint rototranslations of the ligand and protein structures, and it helps refine docking predictions by ranking the sampled poses.
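The role of the confidence score can be sketched as follows. The function `toy_confidence` is a hypothetical stand-in for a learned model, not DiffDock's Confidence Model: it rewards ligand atoms that sit near an assumed contact distance of 4.0 Å from the protein. Because it depends only on ligand-protein distances, it is unchanged when the ligand and protein are rotated and translated together, which is the invariance property described above.

```python
import numpy as np

def toy_confidence(ligand, protein, contact=4.0):
    """Hypothetical scalar score; higher is better.

    Depends only on ligand-protein distances, so it is invariant to
    joint rotations and translations of both structures.
    """
    d = np.linalg.norm(ligand[:, None, :] - protein[None, :, :], axis=-1)
    # Distance from each ligand atom to its nearest protein atom.
    nearest = d.min(axis=1)
    return float(-np.abs(nearest - contact).mean())

def rank_poses(poses, protein, confidence=toy_confidence):
    """Return pose indices sorted from most to least confident, plus scores."""
    scores = [confidence(p, protein) for p in poses]
    order = sorted(range(len(poses)), key=lambda k: scores[k], reverse=True)
    return order, scores

# Three single-atom "poses" around a single-atom "protein" at the origin.
protein = np.array([[0.0, 0.0, 0.0]])
poses = [np.array([[4.0, 0.0, 0.0]]),
         np.array([[10.0, 0.0, 0.0]]),
         np.array([[1.0, 0.0, 0.0]])]
order, scores = rank_poses(poses, protein)
```

In practice the scoring function would be a trained network; only the ranking logic and the invariance property carry over from this sketch.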
To validate the ligand conformations predicted by DiffDock, we use Compass[1], a system we developed to assess the physicochemical and bioactivity features of docked molecules. Compass integrates modules such as PoseCheck and AA-Score to analyze steric clashes, strain energy, binding affinity, and the interaction fingerprint of the complex. These assessments help confirm that a predicted pose is physically plausible by examining the key interactions and energetics of the protein-ligand complex.
PoseCheck evaluates strain energy and steric clashes, both of which are crucial for judging the physical plausibility of a binding pose and, in turn, its relevance for therapeutic design. AA-Score, an empirical scoring function, complements this evaluation by quantifying amino acid-specific interactions, including hydrogen bonds, electrostatic and van der Waals forces, hydrophobic contacts, and π-π stacking.
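A steric-clash check of the kind described above can be sketched in a few lines. This is an illustrative simplification rather than PoseCheck's actual code or API: it assumes that a clash is any ligand-protein atom pair closer than the sum of their van der Waals radii minus a tolerance, and both the function name `count_clashes` and the default tolerance of 0.5 Å are assumptions made for this example. The radii are standard Bondi values for a handful of elements.

```python
import numpy as np

# Bondi van der Waals radii in angstroms (subset for illustration).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

def count_clashes(lig_xyz, lig_elems, prot_xyz, prot_elems, tolerance=0.5):
    """Count ligand-protein atom pairs closer than (r_i + r_j - tolerance).

    The clash criterion and the default tolerance are assumptions of
    this sketch, not values taken from a specific tool.
    """
    lig_xyz = np.asarray(lig_xyz, dtype=float)
    prot_xyz = np.asarray(prot_xyz, dtype=float)
    lig_r = np.array([VDW_RADII[e] for e in lig_elems])
    prot_r = np.array([VDW_RADII[e] for e in prot_elems])
    # Pairwise ligand-protein distances, shape (n_lig, n_prot).
    dist = np.linalg.norm(lig_xyz[:, None, :] - prot_xyz[None, :, :], axis=-1)
    cutoff = lig_r[:, None] + prot_r[None, :] - tolerance
    return int((dist < cutoff).sum())

# One ligand carbon near a protein carbon (clash) and far from an oxygen.
n_clashes = count_clashes(
    lig_xyz=[[0.0, 0.0, 0.0]], lig_elems=["C"],
    prot_xyz=[[2.0, 0.0, 0.0], [5.0, 0.0, 0.0]], prot_elems=["C", "O"],
)
```

A lower clash count indicates a more physically plausible pose; strain energy would be evaluated separately, typically with a force field.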
Overall, our approach combines diffusion-based pose generation with physics-based pose validation to improve molecular docking predictions, with the aim of supporting the discovery and development of therapeutics through precise and efficient computational models.