Computational methods

Hit Identification

Method type (check all that apply)

Deep learning

Free energy perturbation

High-throughput docking

Physics-based

Hybrid of the above

Multi-stage deep-learning and physics-based hit-finding pipeline

Description of your approach (min 200 and max 800 words)

Our proposed pipeline consists of three steps. As a preliminary step, we will define a binding site around the ADPr site of PDB 7KQB.

To begin molecule ranking, we will first pass the on-demand ZINC20 database of small molecules through a custom docking pipeline built around a machine-learning-enhanced consensus module. This module combines binding affinity and pose predictions from three traditional docking tools to rank the small molecules by their predicted probability of binding. On benchmark datasets such as DUD-E and LIT-PCBA, this method outperforms the individual docking tools as measured by the BEDROC metric. https://pubs.acs.org/doi/10.1021/acs.jcim.2c00705
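For readers unfamiliar with the BEDROC metric mentioned above, the following is a minimal sketch of how it can be computed from a ranked screening list. It is not the evaluation code used for the benchmark; the function name `bedroc` and the default `alpha=20.0` are assumptions chosen for illustration.

```python
import math


def bedroc(is_active_ranked, alpha=20.0):
    """BEDROC for a ranked list (best score first) of booleans marking actives.

    alpha=20.0 is an assumed early-recognition weight, not a value from the text.
    """
    n_total = len(is_active_ranked)
    n_actives = sum(1 for flag in is_active_ranked if flag)
    if n_actives == 0 or n_actives == n_total:
        raise ValueError("need at least one active and one inactive")
    ratio = n_actives / n_total

    # Exponentially weighted sum over the 1-based ranks of the actives.
    weighted = sum(
        math.exp(-alpha * rank / n_total)
        for rank, flag in enumerate(is_active_ranked, start=1)
        if flag
    )
    # Expected value of the same sum for a uniformly random ranking.
    random_sum = ratio * (1 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1)
    rie = weighted / random_sum  # robust initial enhancement

    # Rescale RIE to [0, 1] using its values for the best and worst rankings.
    rie_max = (1 - math.exp(-alpha * ratio)) / (ratio * (1 - math.exp(-alpha)))
    rie_min = (1 - math.exp(alpha * ratio)) / (ratio * (1 - math.exp(alpha)))
    return (rie - rie_min) / (rie_max - rie_min)


# Hypothetical ranking: 5 actives among 100 molecules, mostly near the top.
ranking = [True, True, False, True, False, True] + [False] * 93 + [True]
print(round(bedroc(ranking), 3))
```

Larger values of `alpha` weight the top of the list more heavily, which is why the metric suits virtual screening, where only the highest-ranked molecules are carried forward.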

Second, binding energies for the top 5000 docking poses will be calculated using the semiempirical GFN2-xTB quantum chemistry (QM) method. In this method, the geometry of the protein-ligand complex is optimized, and then the single-point energies of the bound and unbound conformations are compared. This method provides a good trade-off between computational efficiency and accuracy and has been shown to be well suited to exploring the conformational space of molecular systems. https://pubs.acs.org/doi/full/10.1021/acs.jctc.8b01176
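The energy comparison in this step can be sketched as follows. This assumes the common supermolecular definition, in which the binding energy is the energy of the complex minus the energies of the separated protein and ligand. The function name and the example energies are hypothetical, and the sketch does not call any QM program itself.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # standard unit conversion


def binding_energy_kcal(e_complex, e_protein, e_ligand):
    """Binding energy in kcal/mol from three single-point energies in Hartree.

    Assumes dE = E(complex) - E(protein) - E(ligand); more negative values
    indicate more favorable binding.
    """
    return (e_complex - e_protein - e_ligand) * HARTREE_TO_KCAL_PER_MOL


# Hypothetical single-point energies (Hartree), for illustration only.
delta_e = binding_energy_kcal(e_complex=-10.05, e_protein=-8.00, e_ligand=-2.00)
print(f"{delta_e:.2f} kcal/mol")
```

Ranking the 5000 poses by this quantity then gives the candidate list that is passed on to the free energy calculations.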

Third, we will evaluate the 300-500 most promising candidates from the QM step (the exact number depending on available computational resources) using umbrella sampling combined with Hamiltonian replica exchange. We will use 40 umbrella windows with a total of 45 ns of simulation per window. We will then employ the weighted histogram analysis method (WHAM) to calculate the free energy profiles. We plan to use OpenForceField parameters for the ligands and Amber ff19SB for the protein, and to run the simulations with GROMACS/OpenMM. https://link.springer.com/article/10.1007/s10822-021-00439-w
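To make the dependence on computational resources concrete, the sketch below estimates the aggregate sampling time implied by these settings. It assumes that the 45 ns applies to each window of each ligand and that no additional replicas or equilibration time are counted; the function name is hypothetical. Under these assumptions, each ligand requires 1.8 µs of simulation, and the full set requires 540 to 900 µs.

```python
def sampling_budget_us(n_ligands, n_windows=40, ns_per_window=45.0):
    """Aggregate simulation time in microseconds for a set of ligands.

    Assumes every ligand uses n_windows umbrella windows, each simulated for
    ns_per_window nanoseconds, with no extra replicas or equilibration.
    """
    return n_ligands * n_windows * ns_per_window / 1000.0


per_ligand_us = sampling_budget_us(1)   # one ligand
low_us = sampling_budget_us(300)        # lower bound of the candidate range
high_us = sampling_budget_us(500)       # upper bound of the candidate range
print(per_ligand_us, low_us, high_us)
```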

Finally, the 100 molecules with the most negative free energy difference will be selected for experimental validation.
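The final selection amounts to a sort on the computed free energies. A minimal sketch, assuming the results are held in a dictionary that maps a ligand identifier to its free energy difference; the function name and the identifiers are hypothetical.

```python
import heapq


def select_most_negative(free_energies, n_select=100):
    """Return (ligand_id, dG) pairs with the most negative dG, best first."""
    return heapq.nsmallest(
        n_select, free_energies.items(), key=lambda item: item[1]
    )


# Hypothetical free energy differences in kcal/mol, for illustration only.
results = {"lig_a": -5.2, "lig_b": -9.1, "lig_c": -1.4, "lig_d": -7.8}
top_two = select_most_negative(results, n_select=2)
print(top_two)
```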

What makes your approach stand out from the community? (<100 words)

Our method combines established industry best practices into an efficient and comprehensive hit-finding pipeline. The combination of machine learning and physics-based methods provides the best trade-off between accuracy and efficiency in identifying the most likely drug candidates. Combining several metrics allows a more comprehensive analysis of each ligand and helps us assess with greater confidence which compounds will yield positive experimental results.

Method Name

CMOD Design

Commercial software packages used

Gaussian

Free software packages used

OpenMM, OpenForceField, GROMACS, MDAnalysis, AmberTools, AutoDock Vina, LeDock, PLANTS, internally developed machine learning models (MILCDock)

Relevant publications of previous uses by your group of this software/method

This method has not been published by our group, but it builds on separate developments and established best practices that we plan to combine for the CACHE challenge. Related publications include:

1. MILCDock: Machine Learning Enhanced Consensus Docking for Virtual Screening in Drug Discovery

Connor J. Morris, Jacob A. Stern, Brenden Stark, Max Christopherson, and Dennis Della Corte*; J. Chem. Inf. Model. 2022, 62, 22, 5342–5350

2. Learning Small Molecule Energies and Interatomic Forces with an Equivariant Transformer on the ANI-1x Dataset

Bryce Hedelius‡, Fabian B. Fuchs, and Dennis Della Corte; ELLIS Machine Learning for Molecule Discovery Workshop (2021).

3. Engineering and application of a biosensor with focused ligand specificity

Dennis Della Corte, Hugo L. van Beek, Falk Syberg, Marcus Schallmey, Felix Tobola, Kai U. Cormann, ... Connor J. Morris‡, ... (8 other authors); Nat. Commun. 11 (1), 4851 (2020).

4. Context-dependent stabilizing interactions among solvent-exposed residues along the surface of a trimeric helix bundle

Kimberlee L. Stern†, Mason S. Smith†, Wendy M. Billings‡, Taylor J. Loftus‡, Benjamin M. Conover‡, Dennis Della Corte, and Joshua L. Price; Biochemistry 59 (17), 1672-1679 (2020).

5. Evaluation of Deep Neural Network ProSPr for Accurate Protein Distance Predictions on CASP14 Targets

Jacob Stern†, Bryce Hedelius‡, Olivia Fisher‡, Wendy M. Billings‡, and Dennis Della Corte; Int. J. Mol. Sci. 22, 12835 (2021).

6. The whole is greater than its parts: ensembling improves protein contact prediction

Wendy M. Billings‡, Connor J. Morris‡, and Dennis Della Corte; Sci. Rep. 11 (1), 8039 (2021).

7. Using molecular docking and molecular dynamics to investigate protein-ligand interactions

Connor J. Morris‡ and Dennis Della Corte; Mod. Phys. Lett. B 35 (8), 2130002 (2021).

8. Integrated NMR, Fluorescence and MD Benchmark Study of Protein Mechanics and Hydrodynamics

Christina Möckel, Jakub Kubiak, Oliver Schillinger, Ralf Kuehnemuth, Dennis Della Corte, Gunnar F. Schröder, ... (4 other authors); J. Phys. Chem. B 123 (7), 1453-1480 (2018).

Challenge #3