Challenge #2

Hit Identification

Method type (check all that apply)

De novo design

Free energy perturbation

Physics-based

Description of your approach (min 200 and max 800 words)

De novo hit identification will be pursued using a fragment growing/linking approach, followed by free energy calculations if time allows. Designed compounds will either be used as queries in a similarity screen of the Enamine REAL Database catalog, or synthesized directly in house:

1. Fragment growing. Fragment growing will be pursued using our open-source FEgrow software package [1]. For a given ligand core and growth vector, FEgrow allows the user to grow and score functional groups in the context of the protein binding pocket. FEgrow enumerates the bioactive conformations of the grown functional group, discards those that clash with the protein, and optimizes the remainder using hybrid machine learning / molecular mechanics potential energy functions. In particular, the ANI machine learning potential is used to describe the energetics of the ligand, which is significantly more reliable than the molecular mechanics force fields commonly used for refinement. Low energy structures are scored using the gnina convolutional neural network scoring function [2], and output for binding free energy calculations (see step 3).
2. Fragment linking. Fragment linking will be pursued using the DeLinker software package, a graph-based deep generative model that combines state-of-the-art machine learning techniques with structural knowledge [3]. Unlinked fragments will be input into DeLinker, which will use their relative distance and orientation to generate an ensemble of linked compounds containing both cores. The gnina docking package will be used to dock and score the linked compounds [2].
3. Free energies. FEgrow is specifically designed for the preparation of protein-ligand complex structures for input to rigorous binding free energy calculations. If time allows at this stage of the competition (see also the Hit Optimization stage), user-defined congeneric series of ligands designed by FEgrow will be input to the SOMD package for the calculation of protein-ligand relative binding free energies. Previously reported protocols will be followed for these calculations [1,4].
4. Similarity searches. The above steps will generate an ensemble of de novo designed compounds with a good structural match to the target binding pocket. Team members (Madden, Armstrong) have the required expertise to synthesize compounds in house if necessary, but the favored approach will be to search the Enamine REAL Database catalog for similar compounds that are available for purchase. The catalog will be filtered down, first by physical properties (such as molecular weight), then by Tanimoto similarity between the Morgan fingerprints of the database molecules and the designed hits. A further 3D similarity filter, such as USRCAT [5], will be used if additional triage is required. Finally, the gnina docking software will be used to assess the binding modes and predicted binding affinities of the purchasable compounds.
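The catalog filtering cascade in the similarity-search step (physical property filter, then Tanimoto similarity) could look roughly like the sketch below. It is illustrative only and not part of FEgrow: fingerprints are assumed to be available as sets of on-bit indices (in practice these would come from Morgan fingerprints computed with RDKit), and the function names, record fields, molecular weight cutoff and similarity threshold are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def triage(
    catalog: Sequence[Dict],
    designs: Sequence[FrozenSet[int]],
    max_mol_wt: float = 500.0,      # hypothetical cutoff
    min_similarity: float = 0.5,    # hypothetical threshold
) -> List[Tuple[str, float]]:
    """Filter catalog entries by molecular weight, then by best Tanimoto
    similarity to any designed compound. Returns (id, similarity) pairs,
    most similar first.

    Each catalog entry is assumed to be a dict with keys
    'id' (str), 'mol_wt' (float) and 'fp' (frozenset of on-bit indices).
    """
    hits = []
    for entry in catalog:
        # Stage 1: cheap physical property filter.
        if entry["mol_wt"] > max_mol_wt:
            continue
        # Stage 2: 2D similarity to the closest designed compound.
        best = max((tanimoto(entry["fp"], d) for d in designs), default=0.0)
        if best >= min_similarity:
            hits.append((entry["id"], best))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Running the cheap property filter first keeps the number of fingerprint comparisons manageable on a catalog of this size. Compounds that pass both stages would then go on to the optional 3D filter and docking.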

In summary, we will employ a hierarchy of computational methods to identify an ensemble of structural hits from the available fragment data. Suggestions will be ranked using the gnina docking score, which has been shown to outperform traditional empirical scoring functions [6], alongside the structural match to known crystallographic fragment hits.

All software packages are fully open source, and the main approach (FEgrow) is developed by us (Bieniek, Cree, Pirie, Horton, Tatum, Cole). We have additional expertise in structural biology (Tatum), ligand-based virtual screening (Pirie, Madden), and organic synthesis (Madden, Armstrong) to ensure drug-likeness and synthetic tractability of the designed hits.

[1] https://doi.org/10.26434/chemrxiv-2022-hr5q4-v2

[2] https://doi.org/10.1186/s13321-021-00522-2

[3] https://doi.org/10.1021/acs.jcim.9b01120

[4] https://doi.org/10.1021/acs.jcim.1c00328

[5] https://doi.org/10.1186/1758-2946-4-27

[6] https://doi.org/10.1021/acs.jcim.0c00411

What makes your approach stand out from the community? (<100 words)

FEgrow integrates medicinal chemistry expertise into the design workflow with state-of-the-art methods for pose prediction, scoring and free energy calculation. By building ligands from the constrained core of a known hit, we maximize the use of input from structural biology and reduce reliance on docking algorithms. We have benchmarked FEgrow by building and scoring binding poses for ten congeneric series of ligands bound to targets from a standard, high-quality dataset of protein-ligand complexes, as well as a series of inhibitors of the SARS-CoV-2 main protease, but this would be the first prospective use of FEgrow in hit identification.

Method Name

FEgrow

Commercial software packages used

N/A

Free software packages used

FEgrow: https://github.com/cole-group/FEgrow

gnina: https://github.com/gnina/gnina

DeLinker: https://github.com/oxpig/DeLinker

RDKit: https://github.com/rdkit/rdkit

SOMD: https://github.com/michellab/Sire

Relevant publications of previous uses by your group of this software/method

FEgrow de novo design:

https://doi.org/10.26434/chemrxiv-2022-hr5q4-v2

SOMD free energy calculations:

https://doi.org/10.1021/acs.jcim.1c00328

3D shape similarity screening (Morgan fingerprints and USRCAT are used here for benchmarking new methods):

https://doi.org/10.48550/arXiv.2201.04230

Hit Optimization Methods

Method type (check all that apply)

De novo design

Free energy perturbation

Physics-based

Description of your approach (min 200 and max 800 words)

Methods used during the Hit Optimization stage will largely follow those employed during Hit Identification, except that, with more time available, we expect to adopt a more accurate approach to binding affinity prediction, namely free energy calculations using bespoke molecular mechanics force fields:

1. Starting from the predicted structures of known hits, FEgrow [1] will be used to grow congeneric series of ligands in the protein binding pocket for investigation of structure-activity relationships. The gnina scoring function will be used to provide an initial estimate of binding affinity at this stage.
2. Free energy calculations will be employed to predict the relative binding free energies of structures from step 1 to the protein target. Two of us (Horton, Cole) are investigators in the Open Force Field Initiative, and we will use state-of-the-art small molecule force fields for modeling dynamics. Currently, this is the Sage small molecule force field, which has been extensively benchmarked against quantum mechanical energies and experimental protein-ligand binding free energy data [2]. The pmx or SOMD free energy software will be used for this step, using well-established protocols [3].
3. Despite much improvement in recent years, molecular mechanics force fields still limit the achievable accuracy in prospective molecular design. We have shown that accuracy improvements are achievable by fitting new torsion parameters against quantum mechanical data, specifically for the congeneric series of ligands under study. One of us (Horton) is the main developer of the OpenFF-BespokeFit software package, which automates this process [4]. We will derive and employ bespoke torsion parameters, alongside the Sage small molecule force field, to improve the accuracy of our binding free energy predictions.
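As a rough illustration of how relative binding free energies are assembled from alchemical simulations, the sketch below applies the standard thermodynamic cycle: the free energy of transforming ligand A into ligand B is computed once in the protein-bound state and once free in solvent, and the difference gives the relative binding free energy. The function names and the cycle closure check are illustrative assumptions and are not taken from pmx, SOMD or BespokeFit.

```python
from typing import Dict, List, Tuple


def relative_binding_free_energy(dg_bound: float, dg_free: float) -> float:
    """ddG_bind(A->B) = dG_bound(A->B) - dG_free(A->B), all in kcal/mol.

    dg_bound: alchemical free energy of A->B in the protein-ligand complex.
    dg_free:  alchemical free energy of A->B for the ligand in solvent.
    """
    return dg_bound - dg_free


def cycle_closure_error(
    ddg: Dict[Tuple[str, str], float], cycle: List[str]
) -> float:
    """Sum ddG values around a closed cycle of ligands, e.g. A->B->C->A.

    Free energy is a state function, so the sum should be close to zero;
    a large value suggests poorly converged perturbations. An edge stored
    as (b, a) is traversed backwards with its sign flipped.
    """
    total = 0.0
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        if (a, b) in ddg:
            total += ddg[(a, b)]
        else:
            total -= ddg[(b, a)]
    return total
```

A closure check of this kind is one simple way to flag perturbations that may need longer sampling before the predictions are used to rank compounds.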

In summary, we will use medicinal-chemistry-led hit elaboration with FEgrow, followed by relative binding free energy calculations using bespoke molecular mechanics force fields.

[1] https://doi.org/10.26434/chemrxiv-2022-hr5q4-v2

[2] https://openforcefield.org/community/news/general/sage2.0.0-release/

[3] https://doi.org/10.1039/C9SC03754C

[4] https://openforcefield.org/community/news/science-updates/bespokefit-update-2022-05-12/

What makes your approach stand out from the community? (<100 words)

As part of the benchmarking of OpenFF-BespokeFit [manuscript in preparation], we have parameterized a series of 16 inhibitors of the TYK2 protein and compared predicted binding free energies with experiment. The root mean square error decreases from 0.7 kcal/mol (using the base Open Force Field parameters) to 0.5 kcal/mol with OpenFF-BespokeFit. This would be the first use of BespokeFit for prospective design.
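For reference, the root mean square error quoted above is the usual measure of agreement between predicted and experimental binding free energies. A minimal sketch of the calculation follows; the function name is hypothetical and no benchmark data are reproduced here.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], experimental: Sequence[float]) -> float:
    """Root mean square error between paired free energies (kcal/mol)."""
    if len(predicted) != len(experimental) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))
```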

Method Name

Free Energy Calculations with BespokeFit

Commercial software packages used

N/A

Free software packages used

FEgrow: https://github.com/cole-group/FEgrow 

pmx: https://github.com/deGrootLab/pmx 

BespokeFit: https://github.com/openforcefield/openff-bespokefit 

Relevant publications of previous uses by your group of this software/method

Open Force Field publications are in preparation, but initial data are reported in blog posts:

https://openforcefield.org/community/news/general/sage2.0.0-release/

https://openforcefield.org/community/news/science-updates/bespokefit-update-2022-05-12/