De novo hit identification will be pursued using a fragment growing/linking approach, followed by free energy calculations if time allows. Designed compounds will either be used as queries in a similarity screen of the Enamine REAL database or be synthesized directly in house:
- Fragment growing. Fragment growing will be pursued using our open-source FEgrow software package [1]. For a given ligand core and growth vector, FEgrow allows the user to grow and score functional groups in the context of the protein binding pocket. FEgrow enumerates candidate bioactive conformations of the grown functional group, discards those that clash with the protein, and optimizes the remainder using hybrid machine learning / molecular mechanics potential energy functions. In particular, the energetics of the ligand are described by the ANI machine learning potential, which we expect to be more reliable than the molecular mechanics force fields commonly used for refinement. Low-energy structures are scored using the gnina convolutional neural network scoring function [2] and output for binding free energy calculations (see Free energies below).
- Fragment linking. Fragment linking will be pursued using the DeLinker software package, a graph-based deep generative model that incorporates 3D structural information about the fragments to be joined [3]. Unlinked fragments will be input into DeLinker, which will use their relative distance and orientation to generate an ensemble of linked molecules containing both cores. The gnina docking package will be used to dock and score the linked compounds [2].
- Free energies. FEgrow is specifically designed for the preparation of protein-ligand complex structures for input to rigorous binding free energy calculations. If time allows at this stage of the competition (see also Hit Optimization stage), user-defined congeneric series of ligands designed by FEgrow will be input to the SOMD package for the calculation of protein-ligand relative binding free energies. Previously reported protocols will be followed for free energy calculations [1,4].
- Similarity searches. The above steps will generate an ensemble of de novo designed compounds with a good structural match to the target binding pocket. The team (Madden, Armstrong) have the required expertise to synthesize compounds in house if necessary, but the favored approach will be to search the Enamine REAL database for similar compounds that are available for purchase. The catalog will be filtered down, first by physical properties (such as molecular weight), then by Tanimoto similarity between the Morgan fingerprints of the database molecules and the designed hits. A further 3D similarity filter, such as USRCAT [5], will be used if additional triage is required. Finally, the gnina docking software will be used to assess the binding modes and predicted binding affinities of the purchasable compounds.
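The two-stage catalog triage described under similarity searches can be illustrated with a minimal sketch. It assumes that Morgan fingerprints have already been computed with a cheminformatics toolkit and stored as sets of on-bit indices. The `Candidate` record, the `triage` function, the compound identifiers, and the cutoffs (500 g/mol, Tanimoto 0.5) are hypothetical choices made for illustration only and are not values fixed by this proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A catalog entry with precomputed descriptors (hypothetical record layout)."""
    identifier: str
    mol_weight: float        # molecular weight in g/mol
    fingerprint: frozenset   # on-bit indices of a precomputed Morgan fingerprint


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient |A n B| / |A u B| of two on-bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def triage(catalog, designed_fps, max_mol_weight=500.0, min_similarity=0.5):
    """Filter by molecular weight first, then by best similarity to any designed hit.

    Both cutoffs are illustrative assumptions, not values from the proposal.
    """
    kept = []
    for cand in catalog:
        if cand.mol_weight > max_mol_weight:
            continue  # cheap property filter runs before any fingerprint comparison
        best = max(
            (tanimoto(cand.fingerprint, fp) for fp in designed_fps),
            default=0.0,
        )
        if best >= min_similarity:
            kept.append((cand.identifier, best))
    # Most similar purchasable compounds first
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy data: one designed hit and four catalog entries
designed = [frozenset({1, 2, 3, 4})]
catalog = [
    Candidate("cmpd-A", 320.0, frozenset({1, 2, 3, 4})),  # identical bits
    Candidate("cmpd-B", 410.0, frozenset({1, 2, 3, 9})),  # 3 shared bits of 5 in total
    Candidate("cmpd-C", 650.0, frozenset({1, 2, 3, 4})),  # fails the weight filter
    Candidate("cmpd-D", 280.0, frozenset({7, 8})),        # no shared bits
]
shortlist = triage(catalog, designed)
print(shortlist)
```

Running the cheap property filter before the fingerprint comparison keeps the number of similarity evaluations small, which matters for a catalog of this size. The surviving shortlist would then be passed to the optional 3D similarity filter and to docking.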
In summary, we will employ a hierarchy of computational methods to identify an ensemble of structural hits from the available fragment data. The suggestions will be ranked by the gnina docking score, which has been shown to outperform traditional empirical scoring functions [6], together with the structural match to known crystallographic fragment hits.
All software packages are fully open source, and the main approach (FEgrow) is developed by us (Bieniek, Cree, Pirie, Horton, Tatum, Cole). We have additional expertise in structural biology (Tatum), ligand-based virtual screening (Pirie, Madden), and organic synthesis (Madden, Armstrong) to ensure drug-likeness and synthetic tractability of the designed hits.
[1] https://doi.org/10.26434/chemrxiv-2022-hr5q4-v2
[2] https://doi.org/10.1186/s13321-021-00522-2
[3] https://doi.org/10.1021/acs.jcim.9b01120
[4] https://doi.org/10.1021/acs.jcim.1c00328
[5] https://doi.org/10.1186/1758-2946-4-27
[6] https://doi.org/10.1021/acs.jcim.0c00411