Computational methods

Hit Identification

Method type (check all that apply)

Machine learning

Description of your approach (min 200 and max 800 words)

Given the two available sets of known MCHR1 binders, we will employ PyRMD, our previously developed machine learning (ML)-powered ligand-based virtual screening (VS) tool. At its core, PyRMD implements the Random Matrix Discriminant (RMD) algorithm, which has been shown to stand out for its denoising capabilities.

The RMD training process considers two classes of compounds: actives and inactives (i.e., compounds with weak or no activity/binding against the target). Active and inactive compounds reported in the literature may share scaffolds that could be mistaken as crucial for activity even though they do not contribute to the desired activity against the biological target. Incorporating the inactive set during training therefore helps reduce the undersampling noise caused by such irrelevant chemical features, which in turn facilitates the extraction of meaningful signals.
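As an illustration of the two classes described above, the following minimal sketch assigns compounds to the active and inactive sets by thresholding a measured potency. The `Compound` record, the `activity_nm` field, and both cutoff values are hypothetical choices made for this example; they are not PyRMD defaults and not the thresholds that will be used in the challenge.

```python
from typing import NamedTuple


class Compound(NamedTuple):
    smiles: str
    activity_nm: float  # measured potency in nM (hypothetical field name)


def split_by_activity(compounds, active_max_nm=1000.0, inactive_min_nm=10000.0):
    """Return (actives, inactives) using two illustrative potency cutoffs.

    Compounds falling between the two cutoffs are left out of both sets,
    so that borderline measurements do not blur the two classes.
    """
    actives = [c for c in compounds if c.activity_nm <= active_max_nm]
    inactives = [c for c in compounds if c.activity_nm >= inactive_min_nm]
    return actives, inactives
```

Leaving a gap between the two cutoffs is one possible design choice; a single cutoff would keep every compound but would place near-identical potencies in opposite classes.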

PyRMD builds upon this algorithm, adding a wealth of features and modes of use that make it a comprehensive VS tool. In this proposal, we will first train the ML model on the set of compounds provided within CACHE Challenge #5. PyRMD will automatically separate these into actives and inactives based on their experimental biological data and convert them into molecular fingerprints. Benchmarking experiments will then be performed through a repeated K-fold cross-validation approach. At the end of this initial stage, PyRMD returns a series of statistical parameters, such as the true-positive rate (TPR), false-positive rate (FPR), area under the receiver operating characteristic curve (ROC AUC), Boltzmann-enhanced discrimination of ROC (BEDROC), and F-score, to evaluate the relative performance of the algorithm and the predictive power of the resulting ML models across the different input parameters.
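The benchmarking stage can be pictured with the following generic sketch of repeated K-fold splitting and of three of the reported statistics. It is not PyRMD's own code: the function names, the default values of `k`, `repeats`, and `seed`, and the use of the F1 form of the F-score are all assumptions made for illustration, and the splitter is a plain (non-stratified) K-fold.

```python
import random


def repeated_kfold(n_samples, k=5, repeats=3, seed=0):
    """Yield (train_idx, test_idx) index lists for `repeats` shuffled K-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test


def classification_metrics(y_true, y_pred):
    """Return TPR, FPR, and F1 for binary labels (1 = active, 0 = inactive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return {"TPR": tpr, "FPR": fpr, "F1": f1}


def roc_auc(y_true, scores):
    """ROC AUC as the probability that an active outscores an inactive (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one active and one inactive")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In such a setup, the metrics would be computed on each test fold and averaged over all folds and repeats, so that models built with different input parameters can be compared on the same footing.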

The second step will be the screening of ultra-large databases (i.e., the Enamine REAL database, consisting of over 6 billion molecules) using the model that performed best in the benchmarking calculations. At the end of this step, PyRMD returns all the compounds predicted to be active, along with a confidence score for each prediction (RMD score). PyRMD also automatically returns the Tanimoto similarity of each compound to its closest analogue in the training set, allowing us to select compounds that are predicted to be active but are structurally unrelated to the already-known MCHR1 ligands. From this screening step, we will provide the requested 100 potential MCHR1 binders with the highest RMD confidence scores.
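The final selection can be sketched as follows, assuming each fingerprint is represented as a set of on-bit indices and each prediction as a `(compound_id, rmd_score, fingerprint)` tuple. These representations, the function names, and the optional `max_similarity` cutoff are hypothetical and only illustrate the idea; they do not reproduce PyRMD's interface or output format.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as on-bit index sets."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_training_similarity(fp, training_fps):
    """Similarity of `fp` to its closest analogue in the training set."""
    return max((tanimoto(fp, t) for t in training_fps), default=0.0)


def select_hits(predictions, training_fps, n_hits=100, max_similarity=None):
    """Rank predicted actives by RMD score and keep the top `n_hits`.

    If `max_similarity` is given, compounds whose closest training analogue
    exceeds that Tanimoto value are discarded before ranking, which favours
    chemotypes unrelated to the known ligands.
    """
    annotated = []
    for compound_id, rmd_score, fp in predictions:
        sim = nearest_training_similarity(fp, training_fps)
        if max_similarity is None or sim <= max_similarity:
            annotated.append((compound_id, rmd_score, sim))
    annotated.sort(key=lambda row: row[1], reverse=True)
    return annotated[:n_hits]
```

Filtering by similarity before ranking, rather than after, ensures that the requested number of hits is filled with novel compounds whenever enough of them are available.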

What makes your approach stand out from the community? (<100 words)

We developed an ML algorithm based on a mathematical framework devised for ligand-based VS, with high denoising capability, that can screen a million compounds in a few hours. Moreover, in our recently published paper, we demonstrated PyRMD’s predictive power in comparison with widely used artificial intelligence (AI) methods (namely, random forest, gradient boosting, logistic regression, and naïve Bayes). Considering the statistical parameters that are independent of the classification threshold (ROC AUC, PRC AUC, and BEDROC), averaged over all the considered targets, PyRMD’s performance was comparable to that of the random forest algorithm and better than that of the other algorithms considered.

Method Name

PyRMD

Commercial software packages used

none

Free software packages used

PyRMD

Relevant publications of previous uses by your group of this software/method

1) PyRMD: A New Fully Automated AI-Powered Ligand-Based Virtual Screening Tool. Amendola G, Cosconati S. J Chem Inf Model. 2021 Aug 23;61(8):3835-3845. doi: 10.1021/acs.jcim.1c00653. Epub 2021 Jul 16. PMID: 34270903

2) Streamlining Large Chemical Library Docking with Artificial Intelligence: the PyRMD2Dock Approach. Roggia M, Natale B, Amendola G, Di Maro S, Cosconati S. J Chem Inf Model. 2023 Aug 8. doi: 10.1021/acs.jcim.3c00647. PMID: 37552222

Challenge #5