We will carry out structure-based ultra-large virtual screens with VirtualFlow 2.0 [Gorgulla 2023]. The procedure will consist of four steps.
Step 1: Protein preparation. Protein structures will be prepared with Maestro from Schrödinger (protonation state assignment, addition of missing atoms, side chains, and hydrogen atoms, ...). MD simulations of the target protein will be carried out using Amber. The resulting conformations will be clustered, and representative structures of the clusters will be used for the virtual screens.
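The selection of representative structures can be illustrated with a minimal sketch. It assumes that a pairwise RMSD matrix between MD frames and a cluster label per frame are already available, and it takes the medoid of each cluster (the frame with the smallest summed RMSD to the other members) as the representative. The medoid criterion, the function name cluster_medoids, and the toy values are assumptions for illustration only; they are not taken from Amber or from the procedure described above.

```python
from collections import defaultdict


def cluster_medoids(rmsd, labels):
    """Return {cluster_label: frame_index} for the medoid of each cluster.

    rmsd   -- symmetric matrix (list of lists) of pairwise RMSD values between frames
    labels -- cluster label of each frame, in the same frame order
    """
    members = defaultdict(list)
    for frame, label in enumerate(labels):
        members[label].append(frame)

    medoids = {}
    for label, frames in members.items():
        # The medoid minimizes the summed RMSD to all members of its cluster.
        medoids[label] = min(frames, key=lambda i: sum(rmsd[i][j] for j in frames))
    return medoids


# Toy data (illustrative values only): 5 frames in two clusters.
rmsd = [
    [0.0, 0.5, 0.6, 3.0, 3.1],
    [0.5, 0.0, 0.4, 2.9, 3.0],
    [0.6, 0.4, 0.0, 3.2, 3.3],
    [3.0, 2.9, 3.2, 0.0, 0.3],
    [3.1, 3.0, 3.3, 0.3, 0.0],
]
labels = [0, 0, 0, 1, 1]
medoids = cluster_medoids(rmsd, labels)
```

In this toy case, frame 1 represents cluster 0. The two frames of cluster 1 tie, so the first one (frame 3) is kept.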
Step 2: Hit identification. The hit identification step will consist of two virtual screening stages.
Step 2 - Stage 1 - Primary Virtual Screen: We will carry out structure-based ultra-large virtual screens using physics-based docking programs (AutoDock Vina, QuickVina, Smina, PLANTS) as well as several deep learning-based docking programs. We will screen a ligand library of 69 billion molecules, the Enamine REAL Space (version 2022q12), with VirtualFlow 2.0, an open-source platform for ultra-large virtual screens. We will use a new adaptive screening technique that we have developed, called Adaptive Target Guided Virtual Screens (ATG-VS). Because this approach requires large-scale computations, we will use the AWS Cloud, which is supported by VirtualFlow 2.0. We have extensive experience using the cloud and have used over 5 million CPUs in parallel in the past [Gorgulla 2023]. The protein will be held rigid in stage 1 of the screen. We have already prepared the ligand library into a ready-to-dock format [Gorgulla 2023]: the ligands have been protonated and tautomerized, their 3D conformations have been computed, and they are available in the ready-to-dock PDBQT format.
Step 2 - Stage 2 - Rescoring: We will rescreen the top 1 million compounds from stage 1 in stage 2, allowing the protein side chains at the binding site to be flexible. In addition, multiple protein backbone conformations may be used for ensemble docking, based on the results of the MD simulations in the protein preparation step (Step 1).
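As an illustration of how ensemble docking results could be combined and the best compounds selected, the sketch below keeps the best score of each ligand across receptor conformations and then extracts the top-ranked ligands. It assumes that more negative scores are better (the convention of the AutoDock Vina family) and that taking the minimum over conformations is an acceptable way to merge ensemble results. The function names and toy scores are hypothetical and are not part of VirtualFlow.

```python
import heapq


def best_scores(scores_by_conformation):
    """Merge per-conformation scores, keeping the lowest (best) score per ligand."""
    best = {}
    for scores in scores_by_conformation:
        for ligand_id, score in scores.items():
            if ligand_id not in best or score < best[ligand_id]:
                best[ligand_id] = score
    return best


def top_n(scores, n):
    """Return the n lowest-scoring ligands as (ligand_id, score) pairs, best first."""
    return heapq.nsmallest(n, scores.items(), key=lambda item: item[1])


# Toy docking scores (illustrative values only) for two receptor conformations.
conformation_a = {"lig1": -7.2, "lig2": -9.1, "lig3": -6.0}
conformation_b = {"lig1": -8.4, "lig2": -8.8, "lig3": -6.5}

merged = best_scores([conformation_a, conformation_b])
selected = top_n(merged, 2)
```

heapq.nsmallest avoids sorting the full result set, which matters when only a small fraction of a very large score table is carried forward.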
Step 3: Postprocessing of the results. The compounds screened in Step 2 will be ranked by their docking score. For the top 1000 compounds, biophysical and pharmacokinetic properties will be computed and a visual inspection will be carried out; both will be taken into account during the selection. Compounds with unfavorable properties (e.g., too high logP or PAINS motifs) will be filtered out.
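The ranking and filtering logic can be sketched as follows, assuming that logP values and PAINS flags have already been computed by a cheminformatics toolkit. The Compound record, the logP cutoff of 5.0, and the toy data are assumptions for illustration; the actual thresholds are not specified above.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    compound_id: str
    docking_score: float  # assumed convention: more negative is better
    logp: float           # precomputed elsewhere
    has_pains: bool       # precomputed PAINS substructure flag


def select_candidates(compounds, top_n=1000, max_logp=5.0):
    """Rank by docking score, keep the top_n, then drop unfavorable compounds."""
    ranked = sorted(compounds, key=lambda c: c.docking_score)[:top_n]
    return [c for c in ranked if c.logp <= max_logp and not c.has_pains]


# Toy data (illustrative values only).
compounds = [
    Compound("A", -10.2, 3.1, False),
    Compound("B", -9.8, 6.4, False),   # logP above the assumed cutoff
    Compound("C", -9.5, 2.0, True),    # flagged as PAINS
    Compound("D", -9.0, 4.2, False),
    Compound("E", -5.0, 1.0, False),   # outside the top 4
]
kept = [c.compound_id for c in select_candidates(compounds, top_n=4)]
```

Visual inspection is not captured by this sketch and remains a manual step.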
Step 4: Hit optimization. In the second round of the challenge (hit optimization), we will search the chemical space (our available libraries) for the most similar analogs of the hits, and screen them again with VirtualFlow as described above in Step 2.
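The similarity metric for the analog search is not specified above. One common choice is the Tanimoto coefficient on molecular fingerprints, and the sketch below assumes it. Fingerprints are represented as sets of "on" bit indices that would be generated by a cheminformatics toolkit. The function names and toy fingerprints are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_analogs(hit_fp, library, k):
    """Return the k most similar library compounds as (compound_id, similarity) pairs."""
    scored = [(tanimoto(hit_fp, fp), compound_id) for compound_id, fp in library.items()]
    # Sort by descending similarity; break ties by compound ID for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(compound_id, similarity) for similarity, compound_id in scored[:k]]


# Toy fingerprints (illustrative values only).
hit_fp = {1, 2, 3, 4}
library = {
    "a": {1, 2, 3, 4, 5},
    "b": {1, 2},
    "c": {7, 8},
}
analogs = nearest_analogs(hit_fp, library, 2)
```

For libraries of billions of compounds, an exhaustive comparison like this would need to be replaced by an indexed or approximate search; the sketch only shows the metric and the ranking.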