Computational methods

Hit Identification

Method type (check all that apply)

High-throughput docking

Physics-based

Hybrid of the above

Fragment-based high-throughput molecule selection

Description of your approach (min 200 and max 800 words)

A computational workflow will be implemented sequentially to i) identify the most promising binding sites within the WDR domain of LRRK2 through a fragment-based computational screening approach, ii) identify and select fragments with lead-like properties and/or high ligand efficiencies, iii) screen databases using the identified fragments and the Lipinski/Mozziconacci rules, and iv) perform high-throughput docking of the selected molecules in the proposed binding sites. The fragment-based computational screening will use an in-house fragment library to set up several all-atom molecular systems, on which simulated-annealing Molecular Dynamics (MD) simulations (GROMACS) will be run to identify the most promising binding sites within the target. At least three replicates will be run per system, and any available experimental information on active sites or important residues will be used to analyze the data. Subsequently, all fragments bound to the target will be identified and analyzed using multiple approaches (e.g. ligand efficiencies, relative energies of binding, free energies of adsorption from the probability ratio method), and only those characterized as high-affinity binders will be carried forward. Next, the substructures of the selected fragments will be used as filter criteria to retrieve from the Enamine REAL database only those molecules with matching substructures, mainly in combination with the Lipinski Lenient (oral availability) or Mozziconacci (drug-likeness) filters. Prior to the high-throughput docking, all compounds will be checked for undesirable functional groups and/or PAINS substructures. High-throughput docking will then be performed in the previously identified binding sites, using AutoDock Vina for the initial ranking and gnina, a docking program with integrated support for scoring and optimizing ligands using convolutional neural networks, to re-rank the molecules and select the best 100 hits.
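As an illustration of the fragment selection and drug-likeness filtering steps, the following is a minimal Python sketch and not the actual in-house implementation. It assumes that molecular properties have already been computed elsewhere, that ligand efficiency is taken as the negative binding free energy divided by the number of heavy atoms, and that the filter uses the standard rule-of-five thresholds with a configurable number of tolerated violations. The `Molecule` record, the function names and the `max_violations` parameter are hypothetical, and the thresholds of the Lipinski Lenient and Mozziconacci filters actually used may differ.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    """Precomputed properties for one compound (hypothetical record layout)."""
    name: str
    mol_weight: float   # molecular weight in g/mol
    logp: float         # octanol-water partition coefficient
    h_donors: int       # number of hydrogen-bond donors
    h_acceptors: int    # number of hydrogen-bond acceptors
    heavy_atoms: int    # number of non-hydrogen atoms


def ligand_efficiency(delta_g_kcal: float, heavy_atoms: int) -> float:
    """Return LE = -dG / N_heavy, in kcal/mol per heavy atom."""
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    return -delta_g_kcal / heavy_atoms


def rule_of_five_violations(mol: Molecule) -> int:
    """Count violations of the standard rule-of-five thresholds."""
    return sum([
        mol.mol_weight > 500,
        mol.logp > 5,
        mol.h_donors > 5,
        mol.h_acceptors > 10,
    ])


def passes_filter(mol: Molecule, max_violations: int = 1) -> bool:
    """Assumption: a compound passes if it has at most max_violations violations."""
    return rule_of_five_violations(mol) <= max_violations


# Hypothetical compounds, for illustration only.
small = Molecule("frag-A", 180.2, 1.4, 2, 3, 13)
large = Molecule("cmpd-B", 612.7, 6.1, 3, 8, 44)
print(ligand_efficiency(-6.0, 12))   # 0.5 kcal/mol per heavy atom
print(passes_filter(small), passes_filter(large))
```

In this sketch `frag-A` has no violations and passes, while `cmpd-B` exceeds both the molecular weight and logP thresholds and is rejected when only one violation is tolerated.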
MD simulations will use the all-atom AMBER force field, with the apo WDR domain also evaluated prior to any fragment-based MD run. The simulated-annealing protocol will be applied to both fragments and solvent to prevent non-specific binding, while the target's backbone is kept spatially constrained to prevent any changes in its secondary structure. A blind docking approach using qvina-w will be set up for faster evaluation of any identified binding sites. As an additional quality control for hit selection, all hits sharing substructures with known inhibitors that target the closed form of LRRK2 will also be excluded, while those resembling known binders will be prioritized. Standard MD simulations will be used to evaluate the stability of the protein-ligand complexes and to obtain absolute free energies of binding.
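To make the group-wise annealing setup more concrete, the following Python sketch writes the annealing-related lines of a GROMACS .mdp file, with annealing enabled for the fragment and solvent groups and disabled for the protein. It is only a sketch under stated assumptions: the group names (`Protein`, `Fragments`, `Solvent`) would have to exist in an index file, the times and temperatures are placeholders rather than the protocol values, and the remaining thermostat settings (e.g. `tau-t`, `ref-t`) and the backbone constraints are not shown.

```python
def annealing_mdp_lines(groups, schedule):
    """Build the annealing-related .mdp lines.

    groups   -- list of (group_name, annealed) pairs, one per tc group
    schedule -- list of (time_ps, temperature_K) control points, shared by
                all annealed groups (placeholder values, not the protocol)
    """
    kinds, npoints, times, temps = [], [], [], []
    for _name, annealed in groups:
        if annealed:
            kinds.append("single")              # run the schedule once
            npoints.append(str(len(schedule)))
            times.extend(str(t) for t, _ in schedule)
            temps.extend(str(temp) for _, temp in schedule)
        else:
            kinds.append("no")                  # group is not annealed
            npoints.append("0")                 # 0 points for such groups
    return [
        "tc-grps = " + " ".join(name for name, _ in groups),
        "annealing = " + " ".join(kinds),
        "annealing-npoints = " + " ".join(npoints),
        "annealing-time = " + " ".join(times),
        "annealing-temp = " + " ".join(temps),
    ]


# Hypothetical groups and schedule: heat from 300 K to 400 K and cool back.
groups = [("Protein", False), ("Fragments", True), ("Solvent", True)]
schedule = [(0, 300), (500, 400), (1000, 300)]
lines = annealing_mdp_lines(groups, schedule)
print("\n".join(lines))
```

Each annealed group contributes its own set of time and temperature entries, which is why the control points appear twice in the `annealing-time` and `annealing-temp` lines.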

Method Name

Sequential fragment-based/HTS docking hit identification

Commercial software packages used

none

Free software packages used

GROMACS, gnina, AutoDock Vina, qvina-w, VMD, PyMOL, Python

Relevant publications of previous uses by your group of this software/method

Simulated-annealing MD/free energies of adsorption: Ferreira et al., Phys. Chem. Chem. Phys. 2015, 17, 22023–22034; MD/molecular docking/free energies of binding: Isca et al., ACS Med. Chem. Lett. 2020, 11, 5, 839–845; Molecular docking: Ferreira et al., J. Chem. Inf. Model. 2013, 53, 7, 1747–1760

Hit Optimization Methods

Method type (check all that apply)

High-throughput docking

Machine learning

Physics-based

Description of your approach (min 200 and max 800 words)

The method used for hit optimization will be similar to that used for hit identification if fewer than 10 hits show KD < 100 µM. If a larger number of hits show reasonable activity, a machine-learning approach will be attempted to generate a regression model, allowing a better understanding of any structure-activity relationships and swift filtering of the larger Enamine databases. All machine-learning models will be built in Weka v3.8.5 (https://waikato.github.io), with all molecular descriptors calculated using the PaDEL-Descriptor software package (www.yapcwsoft.com/dd/padeldescriptor). Initially, the molecular descriptors will be selected and curated i) by removing all descriptors with zero values in more than 20% of the compounds in the training set, and ii) by using the BestFirst algorithm to select those most highly correlated with the activity. Subsequently, the Auto-WEKA feature will be used to identify the five most promising models, ranked by their mean squared error. All models will then be evaluated by 10-fold cross-validation using standard evaluation metrics (e.g. ROC, AUC, true positive rates, mean squared errors and others provided by Weka), and another round of descriptor selection will be used to prevent model overfitting. If the resulting model shows q2 > 0.75 (cross-validated) and r2 > 0.8, it will be used to select compounds from the larger Enamine database. Lastly, a workflow similar to that described for hit identification will be used to further select up to 100 compounds.
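The descriptor curation and model acceptance criteria above can be sketched in a few lines of plain Python. This is an illustrative sketch only, since the actual models will be built in Weka: the function names and the dictionary-based table layout are hypothetical, and q2 is assumed here to be the coefficient of determination computed on cross-validated predictions.

```python
def drop_sparse_descriptors(table, max_zero_fraction=0.20):
    """Keep descriptors whose fraction of zero values is at most the cutoff.

    table -- dict mapping descriptor name to its values over the training set
    """
    kept = {}
    for name, values in table.items():
        zero_fraction = sum(1 for v in values if v == 0) / len(values)
        if zero_fraction <= max_zero_fraction:
            kept[name] = values
    return kept


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Applied to fitted values it gives r2; applied to predictions from
    cross-validation it gives q2 (assumed definition).
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def model_accepted(q2, r2, q2_min=0.75, r2_min=0.80):
    """Acceptance rule from the text: q2 > 0.75 and r2 > 0.8."""
    return q2 > q2_min and r2 > r2_min


# Hypothetical descriptor table for five training compounds.
table = {"desc_a": [0, 0, 1, 2, 3], "desc_b": [1, 2, 3, 4, 0]}
print(sorted(drop_sparse_descriptors(table)))   # desc_a is 40% zeros
print(model_accepted(q2=0.80, r2=0.85))
```

Here `desc_a` is removed because 40% of its values are zero, while `desc_b`, at exactly 20%, is retained because the text only removes descriptors above that fraction.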

Method Name

ML/DK/MD

Commercial software packages used

none

Free software packages used

Weka

Relevant publications of previous uses by your group of this software/method

Ferreira et al. Future Medicinal Chemistry 2018 10:7, 725-741

Challenge #1