Computational methods

Hit Identification

Method Name

Deep integration of physics- and machine learning-based methods for accurate molecule ranking

Description of your approach (min 200 and max 800 words)

Our proposed pipeline consists of five steps. First, we will pass the on-demand ZINC20 database of small molecules through a custom docking pipeline built around a machine learning-enhanced consensus module. This module combines binding affinity and pose predictions from five traditional docking tools to rank the small molecules by their predicted probability of binding. On benchmark datasets such as DUD-E and LIT-PCBA, this consensus method outperforms each individual docking tool as measured by the BEDROC metric. Second, we will run solvated molecular dynamics (MD) simulations on the docked complexes of the 500 top-ranked molecules, using the AMBER14 force field for the receptor and Open Force Field parameters for the small molecules. These simulations will yield both refined versions of the docked poses and a measure of ligand stability in the binding site, quantified as the ligand root-mean-square fluctuation (RMSF). Third, molecular mechanics generalized Born surface area (MM/GBSA) free energy calculations will be performed on the refined binding poses of the same 500 candidate molecules. MM/GBSA offers a favorable trade-off between accuracy and computational cost and is widely used for screening medium-sized libraries. Fourth, a deep graph neural network based on the SE(3)-Transformer architecture will be used to assign point energies to the most frequently sampled small-molecule conformations. Finally, in step five, the MD-derived RMSF scores, the MM/GBSA binding free energies, and the point energies predicted by the deep learning model will be aggregated as summed z-scores to assign a final score to each of the 500 molecules. The best-ranked molecules will be selected for experimental validation.
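The aggregation in step five can be illustrated with a minimal sketch. It is not our production code: the function names, the equal weighting of the three terms, and the convention that a lower value is better for every metric (lower RMSF, more negative MM/GBSA free energy, more negative point energy) are assumptions made for illustration, and the input values are invented.

```python
import numpy as np


def zscore(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0.0:
        # A constant metric carries no ranking information.
        return np.zeros_like(values)
    return (values - values.mean()) / std


def aggregate_scores(rmsf, mmgbsa, point_energy):
    """Sum the per-metric z-scores with equal weights (assumption).

    Assumed convention: lower is better for all three inputs, so a lower
    combined score marks a more promising molecule.
    """
    return zscore(rmsf) + zscore(mmgbsa) + zscore(point_energy)


def rank_molecules(names, rmsf, mmgbsa, point_energy):
    """Return (name, combined score) pairs sorted from best to worst."""
    combined = aggregate_scores(rmsf, mmgbsa, point_energy)
    order = np.argsort(combined, kind="stable")
    return [(names[i], float(combined[i])) for i in order]


if __name__ == "__main__":
    # Hypothetical values for four molecules, for illustration only.
    names = ["mol_A", "mol_B", "mol_C", "mol_D"]
    rmsf = [0.8, 2.1, 1.2, 3.0]              # ligand RMSF, in angstroms
    mmgbsa = [-42.0, -25.5, -38.1, -19.7]    # binding free energy, kcal/mol
    point_energy = [-7.9, -5.2, -8.4, -4.1]  # model-predicted point energy

    for name, score in rank_molecules(names, rmsf, mmgbsa, point_energy):
        print(f"{name}: {score:+.2f}")
```

Standardizing each metric before summing puts quantities with different units and spreads on a common scale, so no single term dominates the final score only because of its magnitude.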

Free software packages used

AutoDock Vina, AutoDock4, LeDock, rDock, PLANTS, OpenMM, internally developed machine learning models (names omitted for blind review)

Challenge #1