Our hit identification technique involves three stages. The first stage aggregates a list of initial compounds. We use our in-house database of in-stock and on-demand compounds from various vendors (MCule, Enamine, etc.) and aggregators (ZINC) (see citations). We also use a database of synthetically accessible compounds generated by computationally applying known synthetic reaction pathways (SAVI). Together, these sources yield over a billion compounds, each either in stock and available for purchase or reachable through a known reaction pathway from in-stock building blocks. We generally limit ourselves to in-stock compounds but will include compounds that require simple reaction pathways if they score extraordinarily well.
The second stage involves simulating the protein target and identifying a binding site. Using DeepDriveMD, we simulate available structures for microseconds and use an anharmonic conformational analysis-enabled autoencoder to sample the state space, producing a series of static conformations to dock against. We also use the transition information to elucidate a binding site. This produces an ensemble of protein structures. We then run a state-of-the-art commercial docking protocol within our scalable workflow environment on HPC systems. After docking the initial seed set of in-stock, orderable compounds, we train a deep learning model to act as a surrogate that is 50,000x faster than docking. With this fast surrogate model, we screen the remaining billion compounds from a make-on-demand database such as Enamine REAL or SAVI. A short list is chosen from these two lists by sampling from clusters of high-scoring surrogate compounds and of in-stock compounds with high-quality poses. We then dock the compounds scored by the deep learning model to verify the model's predictions. This is performed across the ensemble of structures.
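The short-list step can be illustrated with a minimal sketch. It is not our production workflow: the record layout (`id`, `cluster`, `scores`), the score cutoff, and the rule of taking the best score over the ensemble are all assumptions made for illustration, as is the convention that lower docking scores are better. For reproducibility, the sketch takes the top-ranked compounds in each cluster rather than sampling at random.

```python
from collections import defaultdict


def ensemble_score(scores_per_structure):
    """Best score over the protein ensemble (assumes lower is better)."""
    return min(scores_per_structure)


def shortlist(compounds, per_cluster=2, score_cutoff=-8.0):
    """Pick up to `per_cluster` top-scoring compounds from each cluster.

    `compounds` is a list of dicts with hypothetical keys:
      "id"      - compound identifier
      "cluster" - cluster label assigned beforehand
      "scores"  - one score per ensemble structure
    """
    by_cluster = defaultdict(list)
    for compound in compounds:
        best = ensemble_score(compound["scores"])
        # Keep only compounds that pass the (hypothetical) cutoff.
        if best <= score_cutoff:
            by_cluster[compound["cluster"]].append((best, compound["id"]))

    picks = []
    for cluster in sorted(by_cluster):
        ranked = sorted(by_cluster[cluster])  # most negative score first
        picks.extend(cid for _, cid in ranked[:per_cluster])
    return picks
```

Selecting per cluster rather than from a single global ranking keeps the short list chemically diverse, so one well-scoring scaffold does not crowd out the others.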
Lastly, compounds from each cluster are resimulated and run through DeepDriveMD to determine whether any compound causes a significant change in protein dynamics or presents decoy-like features (for example, free energy calculations score poorly, or the ligand drifts away from the site). This information is used to select compounds that induce interesting modifications to the protein state space, indicating that interaction is likely.
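One of the decoy-like features above, a ligand leaving the binding site, can be checked with a simple distance test. The sketch below is an illustration under stated assumptions, not the analysis DeepDriveMD performs. It assumes ligand coordinates have already been extracted per frame as (x, y, z) tuples in ångströms. The function names, the 10 Å cutoff, and the 50% frame threshold are hypothetical.

```python
import math


def centroid(points):
    """Geometric center of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def fraction_unbound(ligand_frames, site_center, cutoff=10.0):
    """Fraction of frames where the ligand centroid is beyond `cutoff`
    from the binding-site center (cutoff value is an assumption)."""
    away = 0
    for frame in ligand_frames:
        if math.dist(centroid(frame), site_center) > cutoff:
            away += 1
    return away / len(ligand_frames)


def is_decoy_like(ligand_frames, site_center, cutoff=10.0, max_fraction=0.5):
    """Flag a ligand that spends most of the trajectory away from the site."""
    return fraction_unbound(ligand_frames, site_center, cutoff) > max_fraction
```

A check like this only covers the unbinding criterion. Poor free energy estimates and changes in protein dynamics would need their own analyses.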
OpenMM, RDKit, PyTorch