Our proposed pipeline consists of five steps. First, we will pass the on-demand ZINC20 database of small molecules through a custom docking pipeline that uses a machine learning-enhanced consensus module. This module combines binding affinity and pose predictions from five traditional docking tools to rank the small molecules by predicted probability of binding. On benchmark datasets such as DUD-E and LIT-PCBA, this method outperforms the individual docking tools as measured by the BEDROC metric. Second, we will run solvated molecular dynamics (MD) simulations on the docked complexes of the top 500 ranked molecules, using the AMBER14 force field for the receptor and Open Force Field parametrizations for the small molecules. These MD simulations will yield both refined versions of the docked poses and a measure of ligand stability in the binding site, quantified as the ligand root-mean-square fluctuation (RMSF). Third, molecular mechanics generalized Born surface area (MM/GBSA) free energy calculations will be performed on the refined binding poses of the top 500 candidate molecules. MM/GBSA offers a favorable trade-off between accuracy and computational cost and is widely used for screening medium-sized libraries. Fourth, a deep graph neural network based on the SE(3)-Transformer architecture will be used to assign point energies to the most frequently sampled small-molecule conformations. Fifth, the MD-derived RMSF scores, the MM/GBSA binding free energies, and the point energies predicted by the deep learning model will be aggregated as summed z-scores to assign a final score to each of the top 500 molecules. The best-ranked molecules will be selected for experimental validation.
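The summed z-score aggregation in step five can be illustrated with a minimal sketch. It is not the pipeline's actual implementation: the function names and the three example molecules are hypothetical, the numbers are made up for illustration, and the sketch assumes that lower values are better for all three metrics (lower RMSF, more negative MM/GBSA free energy, lower point energy), so that the lowest summed z-score ranks first. Metrics with the opposite orientation would need their sign flipped before summing.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize one metric across all molecules (population std. dev.)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant metric carries no ranking information.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def rank_by_summed_z(names, rmsf, mmgbsa, point_energy):
    """Return (name, summed z-score) pairs, best (lowest) first.

    Assumption: lower is better for every metric.
    """
    columns = [z_scores(rmsf), z_scores(mmgbsa), z_scores(point_energy)]
    totals = [sum(col[i] for col in columns) for i in range(len(names))]
    return sorted(zip(names, totals), key=lambda pair: pair[1])


# Hypothetical example values, for illustration only.
names = ["mol_a", "mol_b", "mol_c"]
rmsf = [1.2, 0.8, 2.5]             # ligand RMSF from MD
mmgbsa = [-35.0, -42.0, -20.0]     # MM/GBSA binding free energy
point_energy = [-10.0, -12.0, -5.0]  # model-predicted point energy

ranking = rank_by_summed_z(names, rmsf, mmgbsa, point_energy)
for name, score in ranking:
    print(f"{name}: {score:+.3f}")
```

Standardizing each metric before summing puts the three quantities on a common, unitless scale, so no single metric dominates the final score merely because of its numerical range.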
This method has not yet been published by our group, but it builds on separate developments and established best practices that we plan to combine for the CACHE challenge. Related publications include:
1. Learning Small Molecule Energies and Interatomic Forces with an Equivariant Transformer on the ANI-1x Dataset; Bryce Hedelius‡, Fabian B. Fuchs, and Dennis Della Corte; ELLIS Machine Learning for Molecule Discovery Workshop (2021).
2. Engineering and application of a biosensor with focused ligand specificity; Dennis Della Corte, Hugo L. van Beek, Falk Syberg, Marcus Schallmey, Felix Tobola, Kai U. Cormann, … Connor J. Morris‡, … (8 other authors); Nat. Commun. 11 (1), 4851 (2020).
3. Context-dependent stabilizing interactions among solvent-exposed residues along the surface of a trimeric helix bundle; Kimberlee L. Stern†, Mason S. Smith†, Wendy M. Billings‡, Taylor J. Loftus‡, Benjamin M. Conover‡, Dennis Della Corte, and Joshua L. Price; Biochemistry 59 (17), 1672-1679 (2020).
4. Evaluation of Deep Neural Network ProSPr for Accurate Protein Distance Predictions on CASP14 Targets; Jacob Stern†, Bryce Hedelius‡, Olivia Fisher‡, Wendy M. Billings‡, and Dennis Della Corte; Int. J. Mol. Sci. 22, 12835 (2021).
5. The whole is greater than its parts: ensembling improves protein contact prediction; Wendy M. Billings‡, Connor J. Morris‡, and Dennis Della Corte; Sci. Rep. 11 (1), 8039 (2021).
6. Using molecular docking and molecular dynamics to investigate protein-ligand interactions; Connor J. Morris‡ and Dennis Della Corte; Mod. Phys. Lett. B 35 (8), 2130002 (2021).
7. Integrated NMR, Fluorescence and MD Benchmark Study of Protein Mechanics and Hydrodynamics; Christina Möckel, Jakub Kubiak, Oliver Schillinger, Ralf Kuehnemuth, Dennis Della Corte, Gunnar F. Schröder, … (4 other authors); J. Phys. Chem. B 123 (7), 1453-1480 (2018).