Given the availability of the two provided sets of known MCHR1 binders, we will employ PyRMD, our previously developed machine learning (ML)-powered tool for ligand-based virtual screening (VS). At its core, PyRMD implements the Random Matrix Discriminant (RMD) algorithm, which has been shown to stand out for its denoising capabilities.
RMD is trained on two classes of compounds: actives and inactives (i.e., compounds with weak or no activity/binding against the target). Active and inactive compounds reported in the literature often share scaffolds, and such shared features could be deemed crucial when interpreting activity even though they do not contribute to the desired activity against the biological target. Incorporating the inactive set during training helps reduce the undersampling noise caused by these irrelevant chemical features, which in turn facilitates the extraction of meaningful signals.
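The two-class idea can be illustrated with a deliberately simplified sketch. It assumes binary fingerprints, a per-class correlation matrix, and the Marchenko-Pastur upper bound as the cutoff between noise and signal eigenvalues; the names `ClassSubspace` and `predict_active` and the `threshold` parameter are hypothetical, and the code is not PyRMD's actual implementation.

```python
import numpy as np


class ClassSubspace:
    """Signal subspace of one compound class (toy illustration, not PyRMD code)."""

    def __init__(self, fingerprints):
        X = np.asarray(fingerprints, dtype=float)
        n_samples, n_features = X.shape
        # Standardize each fingerprint bit using this class's own statistics.
        self.mean = X.mean(axis=0)
        std = X.std(axis=0)
        self.std = np.where(std > 0, std, 1.0)  # guard against constant bits
        Z = (X - self.mean) / self.std
        corr = Z.T @ Z / n_samples
        eigvals, eigvecs = np.linalg.eigh(corr)
        # Eigenvalues below the Marchenko-Pastur upper edge are treated as
        # indistinguishable from sampling noise and are discarded.
        mp_upper = (1.0 + np.sqrt(n_features / n_samples)) ** 2
        self.components = eigvecs[:, eigvals > mp_upper]

    def residual(self, fingerprint):
        """Distance between a compound and this class's signal subspace."""
        z = (np.asarray(fingerprint, dtype=float) - self.mean) / self.std
        projection = self.components @ (self.components.T @ z)
        return float(np.linalg.norm(z - projection))


def predict_active(active_model, inactive_model, fingerprint, threshold=0.0):
    """Call a compound active if it lies closer to the active subspace
    than to the inactive one by more than `threshold` (assumed cutoff)."""
    gap = inactive_model.residual(fingerprint) - active_model.residual(fingerprint)
    return gap > threshold
```

In this sketch, a feature pattern shared by both classes is captured by both subspaces and therefore largely cancels in the difference of residuals, which is one way to picture why the inactive set helps suppress irrelevant chemical features.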
PyRMD builds upon this algorithm, adding a range of features and modes of use that make it a comprehensive VS tool. In this proposal, we will first train the ML model on the set of compounds provided within CACHE Challenge #5. PyRMD will automatically separate these compounds into actives and inactives based on their experimental biological data and convert them into molecular fingerprints. Benchmarking experiments will then be performed through a repeated K-fold cross-validation approach. At the end of this initial stage, PyRMD returns a series of statistical metrics, such as the true-positive rate (TPR), false-positive rate (FPR), area under the receiver operating characteristic curve (ROC AUC), Boltzmann-enhanced discrimination of ROC (BEDROC), and F-score, which are used to evaluate the relative performance and predictive power of the ML models built with different input parameters.
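As an illustration of the benchmarking stage, the following standard-library sketch shows how repeated K-fold splits can be generated and how TPR, FPR, F-score, and ROC AUC can be computed from labels and predictions. The function names, the default values of `k` and `repeats`, and the use of plain (unstratified) folds are assumptions made for the example; they do not describe PyRMD's internal code, and BEDROC is omitted for brevity.

```python
import random


def repeated_kfold_indices(n_items, k=5, repeats=3, seed=0):
    """Yield (train_idx, test_idx) for every fold of every repetition."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_items))
        rng.shuffle(order)  # a new random partition for each repetition
        folds = [order[i::k] for i in range(k)]
        for held_out in range(k):
            test_idx = folds[held_out]
            train_idx = [i for j, fold in enumerate(folds)
                         if j != held_out for i in fold]
            yield train_idx, test_idx


def confusion_metrics(y_true, y_pred):
    """TPR, FPR, and F-score (F1) for binary labels (1 = active, 0 = inactive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return {"TPR": tpr, "FPR": fpr, "F1": f1}


def roc_auc(y_true, scores):
    """Probability that a random active outscores a random inactive
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute ROC AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Averaging these metrics over all folds and repetitions gives one summary per parameter set, so that models built with different inputs can be compared on the same footing.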
The second step will be the screening of ultra-large databases (i.e., the Enamine REAL database, consisting of over 6 billion molecules) using the model that performed best in the benchmarking calculations. At the end of this step, PyRMD returns all the compounds predicted to be active, each with a confidence score for the prediction (RMD score). PyRMD also automatically reports the Tanimoto similarity of each compound to its closest analogue in the training set, which allows us to select compounds that are predicted to be active but are structurally unrelated to the already-known MCHR1 ligands. From this screening step, we will provide the requested 100 potential MCHR1 binders with the highest RMD confidence scores.
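The selection logic of this step can be sketched as follows, assuming each fingerprint is represented as the set of its "on" bit indices and each prediction as a `(compound_id, fingerprint, rmd_score)` tuple. The function names and the optional `max_similarity` novelty cutoff are hypothetical; no specific cutoff value is implied by this proposal.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_training_similarity(fingerprint, training_fps):
    """Similarity of a compound to its closest analogue in the training set."""
    return max((tanimoto(fingerprint, t) for t in training_fps), default=0.0)


def select_hits(predictions, training_fps, n_hits=100, max_similarity=None):
    """Rank predicted actives by RMD score and keep the top `n_hits`.

    If `max_similarity` is given (an assumed, user-chosen cutoff), compounds
    whose closest training analogue is more similar than that are dropped,
    which favors chemotypes unrelated to the known ligands.
    """
    rows = []
    for compound_id, fingerprint, rmd_score in predictions:
        similarity = nearest_training_similarity(fingerprint, training_fps)
        if max_similarity is None or similarity <= max_similarity:
            rows.append((compound_id, rmd_score, similarity))
    rows.sort(key=lambda row: row[1], reverse=True)  # highest RMD score first
    return rows[:n_hits]
```

For a library of billions of molecules, the nearest-analogue search would in practice be vectorized or run in batches; the nested comparison above is kept only for readability.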