A computer-implemented method for screening ligand candidates for a target protein. Screening relies on an integrated ensemble machine learning (ML) model, developed in-house, that predicts binding affinity with high speed and precision.
The inputs to the AI engine are drug candidates in SMILES format, generated by our in-house ML-based de novo molecular generator (iGen), and a protein structure supplied as a .pdb file. The features of the binding pocket are analyzed, and ligands capable of fitting into the target pocket are identified by matching the pocket features against those of the ligand molecules. The compounds are then filtered with our in-house Ultra-Fast Screening approach, which retains the best-fitting compounds based on their characteristics. The remaining candidates are ranked by predicted binding affinity, obtained with a novel ML-based scoring function (iScore) trained on curated data selected from the largest available training sets.
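The filter-then-rank flow described above can be sketched as follows. This is a minimal illustration, not the actual Ultra-Fast Screening or iScore implementation: the `Candidate` fields, the `pocket_match` score, the `min_match` threshold, and the example values are all hypothetical placeholders, and a higher predicted affinity is assumed to mean stronger binding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    smiles: str                # ligand in SMILES format
    pocket_match: float        # hypothetical pocket/ligand feature match in [0, 1]
    predicted_affinity: float  # hypothetical score; higher is assumed better


def screen_and_rank(candidates, min_match=0.5):
    """Drop poorly matching ligands, then rank the rest by predicted affinity."""
    kept = [c for c in candidates if c.pocket_match >= min_match]
    return sorted(kept, key=lambda c: c.predicted_affinity, reverse=True)


# Made-up values, for illustration only.
pool = [
    Candidate("CCO", 0.30, 5.1),
    Candidate("c1ccccc1O", 0.80, 6.4),
    Candidate("CC(=O)Nc1ccc(O)cc1", 0.70, 7.2),
]
ranked = screen_and_rank(pool)
```

Filtering before scoring keeps the more expensive affinity prediction off compounds that cannot fit the pocket, which is the motivation for the two-stage design.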
iGen produces 90.0 % valid SMILES and 87.4 % valid molecules, with compound uniqueness above 99.0 %, at a throughput of around 2000 SMILES per second on a single A100 node. When throughput is reduced and compounds are not generated in batches, the share of valid SMILES rises to 98.4 % and that of valid molecules to 95.9 %, while uniqueness remains the same.
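To express these rates as usable output, the arithmetic below converts them into an approximate yield per second. It assumes that the validity percentage is a fraction of all generated SMILES and that uniqueness is measured among valid molecules; the text does not state these definitions, so both are assumptions, as is the helper name `valid_unique_per_second`. The throughput of the slower, non-batched mode is not given, so only the batched mode is computed.

```python
def valid_unique_per_second(smiles_per_second, valid_molecule_rate, uniqueness_rate):
    """Approximate number of valid, unique molecules produced per second.

    Assumes uniqueness is measured among valid molecules (not stated in the text).
    """
    return smiles_per_second * valid_molecule_rate * uniqueness_rate


# Batched-mode figures quoted above: ~2000 SMILES/s, 87.4 % valid molecules,
# and uniqueness of at least 99.0 % (used here as a lower bound).
batched = valid_unique_per_second(2000, 0.874, 0.99)
print(f"batched mode: at least ~{batched:.0f} valid unique molecules per second")
```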
Once the list of top candidates is available, the compounds or their close analogues will be sought in the Enamine REAL database (and also in MolPort and eMolecules, if necessary). A database of 100 analogues for each of the top 100 candidates will then be generated and screened with iScore to rank the available compounds properly.
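The size of the analogue library follows directly from these numbers: 100 top candidates with 100 analogues each gives up to 10,000 compounds to score. The sketch below builds such a library and ranks it. Here `find_analogues` and `score` are hypothetical stand-ins for a vendor catalogue similarity search and for iScore, neither of which is specified in this text, and the placeholder identifiers are not real compounds.

```python
def build_ranked_library(top_candidates, find_analogues, score, per_candidate=100):
    """Collect analogues for each candidate, deduplicate, and rank by score."""
    library = set()
    for candidate in top_candidates:
        # Keep at most `per_candidate` purchasable analogues per candidate.
        library.update(find_analogues(candidate)[:per_candidate])
    # Higher score is assumed to mean stronger predicted binding.
    return sorted(library, key=score, reverse=True)


# Toy stand-ins (hypothetical): analogues are labelled strings, scores are made up.
def toy_find_analogues(candidate):
    return [f"{candidate}-analogue-{i}" for i in range(100)]


def toy_score(compound):
    return len(compound) % 7  # arbitrary placeholder, not a real affinity


top_100 = [f"candidate-{i}" for i in range(100)]
ranked_library = build_ranked_library(top_100, toy_find_analogues, toy_score)
print(len(ranked_library))  # 100 candidates x 100 analogues = 10000
```

Deduplicating with a set matters in practice because close analogues of different top candidates may resolve to the same catalogue compound, in which case the final library holds fewer than 10,000 entries.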