A computer-implemented method for screening ligand candidates for a target protein is provided. The method uses an in-house integrated ensemble machine learning (ML) model to predict binding affinity with high speed and precision.
The inputs to the AI engine are drug candidates in SMILES format, generated by our in-house ML-based de novo molecular generator (iGen), and a protein structure in the form of a .pdb file. The features of the binding pocket are analyzed, and ligands capable of fitting into the target pocket are identified by matching those features against the features of the ligand molecules. The compounds are then filtered and screened with our in-house Ultra-Fast Screening approach to retain the best-fitting compounds. The remaining candidates are ranked by predicted binding affinity, obtained using a novel ML-based scoring function (iScore) trained on the largest available training sets.
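The filter-then-rank flow described above can be sketched as follows. This is a minimal illustration only: `screen_and_rank`, `toy_filter` and `toy_pkd` are hypothetical names, and the toy functions are placeholders that stand in for the Ultra-Fast Screening step and the iScore scoring function, whose internals are not described here.

```python
from typing import Callable, Iterable, List, Tuple


def screen_and_rank(
    smiles_list: Iterable[str],
    passes_filter: Callable[[str], bool],
    predict_pkd: Callable[[str], float],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Filter candidates, score the survivors, return the top_n by predicted pKd."""
    # Apply the cheap filter first so the costlier scoring call
    # only runs on compounds that survive it.
    survivors = [s for s in smiles_list if passes_filter(s)]
    scored = [(s, predict_pkd(s)) for s in survivors]
    # A higher pKd means tighter predicted binding, so sort descending.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy stand-ins (assumptions, NOT the real filter or scoring function).
def toy_filter(smiles: str) -> bool:
    # Placeholder for a pocket-fit check: keep short strings only.
    return len(smiles) <= 10


def toy_pkd(smiles: str) -> float:
    # Placeholder score: number of aliphatic carbon symbols.
    return float(smiles.count("C"))


candidates = ["CCO", "CCCCO", "c1ccccc1", "CCCCCCCCCCCCCC"]
ranked = screen_and_rank(candidates, toy_filter, toy_pkd, top_n=2)
print(ranked)
```

Passing the filter and the scoring function in as callables keeps the pipeline logic independent of how either step is actually implemented.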
Based on the best-performing compounds, key scaffold variants are derived for each molecule. The in-house iterative substitutive scaffold optimization (ISSO) is then applied: the scaffold, supplied in SMILES format, is decorated to generate a desired number of analogous compounds. Any ‘accessible’ atom site can be decorated. The resulting dataset, for example 10 000 derivatives of a given scaffold, is then screened and ranked against the protein active site using the Ultra-Fast Screening of iTripleD and the iScore scoring function, as outlined above. The best N derivatives are used in a second round of decoration, filtering and screening, and the cycle is repeated, yielding successively improved predicted pKd values until saturation.
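The iterative cycle of decoration, screening and selection can be sketched as a simple loop. This is an illustration under stated assumptions, not the ISSO implementation: `optimise_scaffold`, `toy_decorate` and `toy_pkd` are hypothetical, the stopping tolerance `tol` is one possible way to define "saturation", and the toy strings are placeholders that are not checked for chemical validity.

```python
from typing import Callable, List, Tuple


def optimise_scaffold(
    scaffold: str,
    decorate: Callable[[str], List[str]],
    predict_pkd: Callable[[str], float],
    keep_n: int = 2,
    tol: float = 0.05,
    max_rounds: int = 20,
) -> Tuple[str, float]:
    """Decorate, score and select repeatedly until the best score stops improving."""
    parents = [scaffold]
    best, best_score = scaffold, predict_pkd(scaffold)
    for _ in range(max_rounds):
        # Decorate every parent; the set removes duplicate derivatives.
        pool = {child for parent in parents for child in decorate(parent)}
        if not pool:
            break
        # Sort by descending score, breaking ties alphabetically
        # so the result is deterministic.
        ranked = sorted(pool, key=lambda s: (-predict_pkd(s), s))
        top_score = predict_pkd(ranked[0])
        if top_score - best_score < tol:
            break  # assumed saturation criterion: gain below tolerance
        best, best_score = ranked[0], top_score
        parents = ranked[:keep_n]  # best N derivatives seed the next round
    return best, best_score


# Toy stand-ins (assumptions, NOT real chemistry or the real scoring function).
def toy_decorate(smiles: str) -> List[str]:
    # Placeholder decoration: append one of three atom symbols.
    return [smiles + atom for atom in ("C", "N", "O")]


def toy_pkd(smiles: str) -> float:
    # Placeholder score that saturates at 3.
    return float(min(smiles.count("N"), 3))


best, score = optimise_scaffold("CC", toy_decorate, toy_pkd)
print(best, score)
```

In this toy run the score rises by one per round and the loop stops once a further round of decoration brings no gain above `tol`.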
Once the full list of scaffolds has been processed and a final list of top candidates generated, the compounds themselves or their close analogues are sought in the Enamine REAL database (and in MolPort and eMolecules, if necessary). A database of 100 analogues for each of the top 100 candidates is then generated and screened using iScore to rank the available compounds.
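The text does not state how close analogues are identified in the vendor catalogues. One common option, assumed here purely for illustration, is to rank catalogue entries by Tanimoto similarity between molecular fingerprints. In the sketch below, fingerprints are represented as sets of on-bit indices, and the compound identifiers, fingerprints and function names are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity: shared bits divided by bits present in either set."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_analogues(
    query: FrozenSet[int],
    catalogue: Dict[str, FrozenSet[int]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k catalogue entries most similar to the query fingerprint."""
    scored = [(cid, tanimoto(query, fp)) for cid, fp in catalogue.items()]
    # Descending similarity, ties broken by identifier for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical catalogue with made-up fingerprints.
catalogue = {
    "cmpd-A": frozenset({1, 2, 3, 4}),
    "cmpd-B": frozenset({1, 2, 3}),
    "cmpd-C": frozenset({7, 8}),
}
query = frozenset({1, 2, 3, 4})
hits = nearest_analogues(query, catalogue, k=2)
print(hits)
```

With `k=100` applied to each of the top 100 candidates, a routine of this kind would yield the analogue set that is subsequently ranked by the scoring function.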