1. Primary screening with an evolutionary chemical binding similarity model
We will apply the evolutionary chemical binding similarity (ECBS) model (PMID: 31504818) for primary ligand-based virtual screening against MCHR1. The ECBS model is designed to learn chemical properties that are conserved among compounds binding evolutionarily related targets. Specifically, it is trained by classification-based similarity learning to distinguish evolutionarily related chemical pairs (ERCPs) from unrelated random pairs. A chemical pair is considered an ERCP if the targets of the two compounds are the same or homologous according to an evolutionary annotation (e.g., Pfam), because such pairs are expected to share binding properties.
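The ERCP labeling rule can be illustrated with a minimal Python sketch. It assumes a hypothetical mapping from each compound to its annotated targets and from each target to a single family label standing in for a Pfam annotation; the compound, target and family names are illustrative only. Positives are pairs that share a target or a family; negatives are sampled at random from the remaining pairs, which is one reasonable reading of "unrelated random pairs" rather than the exact published procedure.

```python
import itertools
import random
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def label_chemical_pairs(
    chem_targets: Dict[str, Set[str]],
    target_family: Dict[str, str],
    n_negatives: int,
    seed: int = 0,
) -> Tuple[List[Pair], List[Pair]]:
    """Split compound pairs into ERCPs (positives) and random unrelated pairs (negatives).

    Hypothetical inputs: chem_targets maps compound -> annotated targets,
    target_family maps target -> family label (e.g. a Pfam family name).
    """

    def families(chem: str) -> Set[str]:
        return {target_family[t] for t in chem_targets[chem] if t in target_family}

    positives: List[Pair] = []
    unrelated: List[Pair] = []
    for a, b in itertools.combinations(sorted(chem_targets), 2):
        same_target = chem_targets[a] & chem_targets[b]
        homologous = families(a) & families(b)
        if same_target or homologous:
            positives.append((a, b))  # ERCP: identical or homologous target
        else:
            unrelated.append((a, b))  # no shared target or family annotation

    # Assumption: negatives are a random sample of the unrelated pairs.
    rng = random.Random(seed)
    negatives = rng.sample(unrelated, min(n_negatives, len(unrelated)))
    return positives, negatives


# Illustrative data: MCHR1 and HCRTR1 share the 7tm_1 family label, KINASE_X does not.
chem_targets = {"c1": {"MCHR1"}, "c2": {"HCRTR1"}, "c3": {"KINASE_X"}}
target_family = {"MCHR1": "7tm_1", "HCRTR1": "7tm_1", "KINASE_X": "Pkinase"}
pos, neg = label_chemical_pairs(chem_targets, target_family, n_negatives=2)
# pos == [("c1", "c2")]
```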
Among the ECBS variants, we will use the Target-ECBS model, which is trained specifically for virtual screening (VS) to find compounds likely to bind MCHR1. To incorporate evolutionary binding information about MCHR1, this model defines ERCPs using multiple targets that are evolutionarily related to MCHR1 (e.g., Family A G protein-coupled receptor-like proteins in SUPFAM or Rhodopsin 7-helix transmembrane proteins in Gene3D). The trained model assigns each compound in the REAL database a similarity score to the known MCHR1 actives; a higher score indicates a higher probability of binding MCHR1.
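To make the scoring step concrete, the sketch below ranks library compounds by their best pairwise score against the known MCHR1 actives. The feature vectors and the logistic pair scorer are hypothetical stand-ins for the trained Target-ECBS model, and taking the maximum over all actives is an assumed aggregation rule, not necessarily the one used in the ECBS implementation.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
PairScorer = Callable[[Vector, Vector], float]


def logistic_pair_scorer(weights: Vector, bias: float) -> PairScorer:
    """Hypothetical stand-in for a trained ECBS model: logistic model on |x - y| features."""

    def score(x: Vector, y: Vector) -> float:
        z = bias + sum(w * abs(a - b) for w, a, b in zip(weights, x, y))
        return 1.0 / (1.0 + math.exp(-z))  # probability that (x, y) is an ERCP

    return score


def screen_library(
    library: Dict[str, Vector],
    actives: List[Vector],
    scorer: PairScorer,
) -> List[Tuple[str, float]]:
    """Score each library compound against every known active; rank highest first."""
    # Assumed aggregation: a compound's score is its best match to any active.
    scores = {cid: max(scorer(x, a) for a in actives) for cid, x in library.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy example: negative weights make larger feature differences less likely to be ERCPs.
scorer = logistic_pair_scorer(weights=[-1.0, -1.0], bias=2.0)
ranked = screen_library({"near": [1.0, 0.0], "far": [5.0, 5.0]}, [[1.0, 0.1]], scorer)
# ranked[0][0] == "near"
```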
The ECBS model encodes molecular features enriched in evolutionarily conserved chemical-target binding relationships and is formulated as the likelihood that two compounds bind identical targets. Linking evolutionary information to compounds through their binding targets is a unique property of the ECBS method: it expands the available chemical-target interaction data and improves ligand-based virtual screening by revealing hidden ERCPs with evolutionarily conserved binding features. The underlying principles and training procedure of the ECBS models are described in our previous work (PMID: 31504818, PMID: 37742003).
2. Secondary screening with a machine-learning model to predict relative binding affinities
The ECBS model provides binding probabilities rather than specific binding affinity values, which makes it difficult to prioritize compounds by binding strength. To address this limitation, we will construct a new model that predicts relative binding affinities (or ranks) from the IC50 values of the provided compounds. Because experimental binding affinity values from different bioassays are not directly comparable, we will design the model to integrate these heterogeneous experimental data and to predict relative binding ranks between compounds. This secondary model will allow us to filter or re-rank the compounds from the initial ECBS screen according to their relative binding affinity.
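One common way to combine IC50 values from incompatible assays is to compare compounds only within the same assay and learn from the resulting pairwise orderings. The sketch below assumes that approach, since the exact formulation is not fixed here. It converts IC50 (assumed to be given in nM, tagged with a hypothetical assay identifier) to pIC50 and emits a label of 1 when the first compound binds more strongly; the 0.3 log-unit margin for skipping near-ties is a placeholder.

```python
import itertools
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, float]  # (compound_id, assay_id, IC50 in nM), hypothetical layout


def pic50(ic50_nm: float) -> float:
    """pIC50 = -log10(IC50 in M); for IC50 in nM this equals 9 - log10(IC50)."""
    return 9.0 - math.log10(ic50_nm)


def within_assay_pairs(records: Iterable[Record], min_delta: float = 0.3) -> List[Tuple[str, str, int]]:
    """Build (a, b, label) ranking pairs; label 1 means a binds more strongly than b.

    Pairs are formed only within one assay, so values from different
    bioassays are never compared directly.
    """
    by_assay: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for compound, assay, ic50_nm in records:
        by_assay[assay].append((compound, pic50(ic50_nm)))

    pairs: List[Tuple[str, str, int]] = []
    for rows in by_assay.values():
        for (ca, pa), (cb, pb) in itertools.combinations(rows, 2):
            if abs(pa - pb) < min_delta:
                continue  # skip near-ties (margin is an assumed placeholder)
            pairs.append((ca, cb, 1 if pa > pb else 0))
    return pairs


records = [("A", "assay1", 10.0), ("B", "assay1", 1000.0), ("C", "assay2", 5.0), ("D", "assay2", 6.0)]
pairs = within_assay_pairs(records)
# pairs == [("A", "B", 1)]; C and D differ by less than the margin, and A is never compared with C.
```

Any of the candidate learners (DNN, Gaussian process or random forest) could then be trained on features of such pairs, with per-compound ranks aggregated from the pairwise predictions.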
The binding affinity prediction model will be trained with several machine-learning approaches, such as deep neural networks (DNN), Gaussian processes and random forests, and the most accurate model will be used. The final candidates will be selected based on a Hit-Score that combines the original ECBS score with the binding affinity prediction score as follows:
Hit-Score = P(Active) × P(Higher_binding | Active)
where P(Active) is estimated from the initial ECBS score and P(Higher_binding | Active) from the binding affinity prediction score.
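A minimal sketch of the Hit-Score ranking follows, assuming both inputs have already been calibrated to probabilities in [0, 1]. How P(Active) is derived from the raw ECBS score is not specified here, so the two dictionaries are hypothetical inputs keyed by compound ID.

```python
from typing import Dict, List, Tuple


def hit_scores(
    p_active: Dict[str, float],
    p_higher_given_active: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank compounds by Hit-Score = P(Active) * P(Higher_binding | Active)."""
    scores: Dict[str, float] = {}
    for cid, pa in p_active.items():
        ph = p_higher_given_active.get(cid)
        if ph is None:
            continue  # no affinity prediction available for this compound
        if not (0.0 <= pa <= 1.0 and 0.0 <= ph <= 1.0):
            raise ValueError(f"{cid}: probabilities must lie in [0, 1]")
        scores[cid] = pa * ph
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranked = hit_scores({"x": 0.9, "y": 0.6}, {"x": 0.5, "y": 0.9})
# "y" (0.6 * 0.9 = 0.54) now ranks above "x" (0.9 * 0.5 = 0.45)
```

Compound "x" would rank first on the ECBS score alone, so the product illustrates how the affinity term can reorder the ECBS list.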
For performance comparison, candidates will be selected separately by each method (ECBS alone vs. ECBS combined with the binding affinity model).
3. Filtering to ensure chemical novelty for the selected candidates
The following three procedures will be applied to filter the candidates selected by Hit-Score.
1) Blind docking: Blind docking methods (e.g., DiffDock) will be used to filter the Hit-Score-ranked candidates by confirming that they bind an appropriate site on the MCHR1 structural model. Each candidate will be docked to the AlphaFold-predicted MCHR1 structure without predefining the binding site, and only candidates that bind around the known binding pocket will proceed to the next screening step.
2) Chemical similarity and prediction uncertainty: To ensure novelty, candidates will be compared with known active compounds using structural similarity filters. Compounds whose Tanimoto coefficient to a known active exceeds a set threshold will be excluded as trivial, uninteresting solutions. In addition, uncertainty values estimated by the secondary binding affinity prediction model will be used to select novel candidates and, for hit optimization, to improve the model by retraining it with high-uncertainty compounds. This step aims to prioritize candidates with novel chemical scaffolds while efficiently refining the prediction model.
3) Solubility and PAINS filters: Predicted solubility and PAINS (pan-assay interference compounds) filters will be used to exclude likely insoluble compounds and promiscuous binders, ensuring that the remaining candidates are suitable for experimental validation.
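The three filters can be sketched as one cascade. Everything here is an assumption for illustration: docked poses are atom coordinates already parsed from the docking output, the known pocket is summarized by a single center point, fingerprints are sets of on-bit indices (as a Morgan fingerprint would produce), and solubility and PAINS results are precomputed inputs. The 6 Å distance, 0.4 Tanimoto and log S of -6 cutoffs are placeholders. The last function ranks surviving candidates by model uncertainty to pick compounds for retraining.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class Candidate:
    cid: str
    pose: Sequence[Point]        # docked ligand atom coordinates (Å), assumed pre-parsed
    fingerprint: FrozenSet[int]  # indices of set bits in a binary fingerprint
    log_s: float                 # predicted log10 solubility (mol/L), precomputed
    pains_match: bool            # True if any PAINS substructure alert matched
    uncertainty: float           # predictive uncertainty from the affinity model


def centroid(points: Sequence[Point]) -> Point:
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def filter_candidates(
    candidates: List[Candidate],
    pocket_center: Point,
    active_fps: List[FrozenSet[int]],
    max_dist: float = 6.0,    # placeholder pocket distance cutoff (Å)
    max_sim: float = 0.4,     # placeholder Tanimoto threshold
    min_log_s: float = -6.0,  # placeholder solubility cutoff
) -> List[Candidate]:
    kept: List[Candidate] = []
    for c in candidates:
        # 1) Blind docking: keep only poses centered near the known pocket.
        if math.dist(centroid(c.pose), pocket_center) > max_dist:
            continue
        # 2) Novelty: drop compounds too similar to any known active.
        if max((tanimoto(c.fingerprint, fp) for fp in active_fps), default=0.0) > max_sim:
            continue
        # 3) Solubility and PAINS: drop likely insoluble or flagged compounds.
        if c.log_s < min_log_s or c.pains_match:
            continue
        kept.append(c)
    return kept


def pick_for_retraining(candidates: List[Candidate], k: int) -> List[Candidate]:
    """Return the k candidates with the highest predictive uncertainty."""
    return sorted(candidates, key=lambda c: c.uncertainty, reverse=True)[:k]
```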