CACHE

CRITICAL ASSESSMENT OF COMPUTATIONAL HIT-FINDING EXPERIMENTS

Challenge #4

Hit Identification
Method type (check all that apply)
De novo design
Machine learning
Description of your approach (min 200 and max 800 words)

Our approach is a computer-implemented method for screening ligand candidates against a target protein. It relies on an in-house, integrated ensemble machine learning (ML) model that predicts binding affinity with high speed and precision.

The inputs to the AI engine are drug candidates in SMILES format, generated by our in-house ML-based de novo molecular generator (iGen), and a protein structure supplied as a .pdb file. The features of the binding pocket are analyzed, and ligands capable of fitting into the pocket are identified by matching the pocket features against those of the ligand molecules. The compounds are then filtered and screened with our in-house Ultra-Fast Screening approach, retaining those whose characteristics best fit the pocket. The remaining candidates are ranked by predicted binding affinity, obtained with a novel ML-based scoring function (iScore) trained on the largest available training sets, from which the best data were handpicked.
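The pocket-ligand feature matching above is described only in outline. As a purely illustrative sketch (not the i-TripleD implementation, which is not disclosed), a feature-based pre-filter can be expressed with a few coarse descriptors. The `Features` fields, the 0.9 volume ratio and the 0.3 hydrophobicity tolerance are all hypothetical choices; in practice pocket descriptors might come from a pocket-detection tool such as fpocket and ligand descriptors from RDKit.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Features:
    """Hypothetical coarse descriptors used for both pockets and ligands."""
    volume: float          # cubic angstroms
    hbond_donors: int
    hbond_acceptors: int
    hydrophobicity: float  # fraction of hydrophobic surface, 0..1


def fits_pocket(pocket: Features, ligand: Features,
                max_volume_ratio: float = 0.9,
                hydrophobic_tol: float = 0.3) -> bool:
    """Crude compatibility test between one ligand and one pocket (illustrative only)."""
    # Reject ligands that would fill (nearly) all of the pocket volume.
    if ligand.volume > max_volume_ratio * pocket.volume:
        return False
    # Each ligand donor needs a pocket acceptor to pair with, and vice versa.
    if ligand.hbond_donors > pocket.hbond_acceptors:
        return False
    if ligand.hbond_acceptors > pocket.hbond_donors:
        return False
    # Require broadly similar hydrophobic character.
    return abs(ligand.hydrophobicity - pocket.hydrophobicity) <= hydrophobic_tol


def prefilter(pocket: Features, ligands: Dict[str, Features]) -> List[str]:
    """Return the names of ligands that pass the feature-matching test."""
    return [name for name, feats in ligands.items() if fits_pocket(pocket, feats)]


pocket = Features(volume=500.0, hbond_donors=3, hbond_acceptors=4, hydrophobicity=0.5)
ligands = {
    "lig_a": Features(300.0, 2, 3, 0.4),  # passes every check
    "lig_b": Features(480.0, 1, 1, 0.5),  # too large for the pocket
    "lig_c": Features(200.0, 5, 1, 0.5),  # more donors than pocket acceptors
}
print(prefilter(pocket, ligands))  # ['lig_a']
```

A cheap rule-based pass like this is only meant to show why filtering before affinity scoring saves time: the expensive scorer then sees far fewer compounds.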

iGen produces valid SMILES at a rate of 90.0 % and valid molecules at 87.4 %, with compound uniqueness above 99.0 %, at a speed of around 2000 SMILES per second on a single A100 node. When generation is run without batching, at reduced speed, the valid-SMILES rate rises to 98.4 % and the valid-molecule rate to 95.9 %, while uniqueness remains unchanged.
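Validity and uniqueness figures of this kind are commonly computed as in the sketch below. It assumes the usual convention (validity = share of generated strings that yield a molecule; uniqueness = share of distinct canonical SMILES among the valid ones). The text does not say how "valid SMILES" and "valid molecules" are distinguished, so the sketch uses a single check. The `canonicalize` callable is a placeholder; with RDKit it could be built from `Chem.MolFromSmiles` and `Chem.MolToSmiles`, as shown in the comment.

```python
from typing import Callable, Dict, Iterable, Optional


def generation_metrics(smiles: Iterable[str],
                       canonicalize: Callable[[str], Optional[str]]) -> Dict[str, float]:
    """Percent validity and uniqueness for a batch of generated SMILES.

    `canonicalize` must return a canonical SMILES for a valid input and None
    for anything that cannot be turned into a molecule.
    """
    generated = list(smiles)
    canonical = [canonicalize(s) for s in generated]
    valid = [c for c in canonical if c is not None]
    validity = 100.0 * len(valid) / len(generated) if generated else 0.0
    # Uniqueness is measured among valid molecules only.
    uniqueness = 100.0 * len(set(valid)) / len(valid) if valid else 0.0
    return {"validity_pct": validity, "uniqueness_pct": uniqueness}


# With RDKit installed, a real canonicalizer could look like:
#     from rdkit import Chem
#     def rdkit_canonical(s):
#         mol = Chem.MolFromSmiles(s)
#         return Chem.MolToSmiles(mol) if mol is not None else None
# A toy stand-in is used here so the example runs without RDKit.
def toy_canonical(s: str) -> Optional[str]:
    return None if "?" in s else s.upper()


print(generation_metrics(["cco", "CCO", "C?", "c1ccccc1"], toy_canonical))
```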

Once a list of top candidates is available, the compounds themselves or close analogues will be sought in the Enamine REAL database (and, if necessary, in MolPort and eMolecules). A database of 100 analogues is then generated for each of the top 100 candidates and screened with iScore to rank the available compounds.
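This final step amounts to scoring every purchasable analogue (up to 100 × 100 = 10 000 compounds) and keeping the best. A minimal sketch, assuming a `score` callable that stands in for iScore and returns a predicted pKd (higher is better); `rank_purchasable` and the example values are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Tuple


def rank_purchasable(analogues: Dict[str, List[str]],
                     score: Callable[[str], float],
                     top_k: int = 10) -> List[Tuple[float, str, str]]:
    """Return the `top_k` (score, analogue SMILES, parent candidate) tuples, best first.

    `analogues` maps each top candidate to the catalogue analogues found for it.
    A compound listed under several parents is scored once, under its first parent.
    """
    parent_of: Dict[str, str] = {}
    for parent, smis in analogues.items():
        for smi in smis:
            parent_of.setdefault(smi, parent)
    scored = ((score(smi), smi, parent) for smi, parent in parent_of.items())
    # nlargest avoids sorting the full list when only the head is needed.
    return heapq.nlargest(top_k, scored)


predicted = {"CCO": 6.0, "CCN": 7.5, "CCC": 5.0}  # made-up predicted pKd values
hits = rank_purchasable({"P1": ["CCO", "CCN"], "P2": ["CCN", "CCC"]},
                        predicted.__getitem__, top_k=2)
print(hits)  # [(7.5, 'CCN', 'P1'), (6.0, 'CCO', 'P1')]
```

Deduplicating first matters because analogue searches around related candidates often return the same catalogue compounds.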

What makes your approach stand out from the community? (<100 words)

By avoiding conformational sampling, our approach speeds up hit identification considerably while also producing some of the most accurate affinity predictions to date. In the CASF-2016 and CSAR benchmarks, as well as in case studies, our tool consistently performs best in scoring power, ranking power, and screening power. With our novel Ultra-Fast Screening (UFS) approach, we can furthermore screen compounds several orders of magnitude faster than any other software we have come across. Our iGen module takes advantage of the accuracy and speed of our proprietary methods, making exploration of the whole drug-like chemical space feasible, something that has so far been elusive.
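For readers unfamiliar with the CASF terminology: as commonly reported for CASF-2016, scoring power is the Pearson correlation between predicted and experimental affinities, ranking power is the Spearman correlation within each target's set of ligands (averaged over targets), and screening power is summarized by an enrichment factor among the top-ranked compounds (for example the top 1 %). The sketch below implements these statistics in plain Python. It is not the official CASF evaluation script, and details such as tie handling may differ from it.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (scoring power)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(v: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def ranking_power(groups: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Mean Spearman correlation over per-target (predicted, experimental) groups."""
    rhos = [spearman(pred, expt) for pred, expt in groups]
    return sum(rhos) / len(rhos)


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool],
                      fraction: float = 0.01) -> float:
    """Hit rate among the top-scored `fraction` divided by the overall hit rate."""
    n = len(scores)
    n_top = max(1, round(n * fraction))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_top]
    hits_top = sum(1 for i in top if is_active[i])
    total_actives = sum(1 for a in is_active if a)
    if total_actives == 0:
        raise ValueError("need at least one active compound")
    return (hits_top / n_top) / (total_actives / n)


pred = [6.1, 7.3, 5.2, 8.0]
expt = [5.9, 7.0, 5.5, 8.4]
print(round(pearson(pred, expt), 3), round(spearman(pred, expt), 3))
```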

Method Name
i-TripleD by ANYO Labs
Commercial software packages used

none

Free software packages used

fpocket, dpocket, RDKit

Relevant publications by your group describing previous uses of this software/method

The software has been developed, thoroughly tested and refined over the last couple of years. The team behind the project incorporated as ANYO Labs AB in December 2022, and the method was made the subject of a patent filing, following an in-depth freedom-to-operate (FTO) analysis, in January 2023. For this reason, the team has kept the methodology a trade secret and plans to publish articles on the method in about a year’s time. We are currently preparing articles on 2 of our ongoing projects. Professor Leif Eriksson has several publications in theoretical and computational chemistry, but none related to our current method.

Hit Optimization Methods
Method type (check all that apply)
Machine learning
Hybrid of the above
Machine learning for scaffolding
Description of your approach (min 200 and max 800 words)

Our approach is a computer-implemented method for screening ligand candidates against a target protein. It relies on an in-house, integrated ensemble machine learning (ML) model that predicts binding affinity with high speed and precision.

The inputs to the AI engine are drug candidates in SMILES format, generated by our in-house ML-based de novo molecular generator (iGen), and a protein structure supplied as a .pdb file. The features of the binding pocket are analyzed, and ligands capable of fitting into the pocket are identified by matching the pocket features against those of the ligand molecules. The compounds are then filtered and screened with our in-house Ultra-Fast Screening approach, retaining those whose characteristics best fit the pocket. The remaining candidates are ranked by predicted binding affinity, obtained with a novel ML-based scoring function (iScore) trained on the largest available training sets.

Starting from the best-performing compounds, key scaffold variants will be derived for each molecule. Our in-house iterative substitutive scaffold optimization (ISSO) is then applied: the scaffold, supplied in SMILES format, is decorated to generate a desired number of analogous compounds, and any ‘accessible’ atom site can be decorated. The resulting dataset (for example, 10 000 derivatives of a given scaffold) is screened and ranked against the protein active site using the Ultra-Fast Screening of i-TripleD and the iScore scoring function, as outlined above. The best N derivatives are then carried into a second round of decoration, filtering and screening, yielding successively improved pKd values over iterative cycles until saturation is reached.
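The decorate, screen and select cycle can be sketched as a simple loop. This is a hypothetical illustration, not the ISSO implementation: `decorate` stands in for iGen-based scaffold decoration, `predict_pkd` for UFS filtering plus iScore ranking, and "saturation" is interpreted here as the best predicted pKd improving by less than an assumed `min_gain` between rounds. Parents are kept in the pool, so the best score cannot decrease from one round to the next.

```python
from typing import Callable, Iterable, List, Tuple

Scored = Tuple[float, str]  # (predicted pKd, SMILES)


def iterative_optimization(
    seeds: List[str],
    decorate: Callable[[str], Iterable[str]],
    predict_pkd: Callable[[str], float],
    top_n: int = 10,
    min_gain: float = 0.05,
    max_rounds: int = 20,
) -> Tuple[List[Scored], List[float]]:
    """Decorate -> score -> keep the best N, repeated until the best pKd saturates.

    Returns the final top-N (pKd, SMILES) pairs and the best pKd after each round.
    """
    if not seeds:
        raise ValueError("need at least one seed scaffold")
    pool = sorted(((predict_pkd(s), s) for s in seeds), reverse=True)[:top_n]
    history = [pool[0][0]]
    for _ in range(max_rounds):
        # Decorate every surviving compound; a set removes duplicate children.
        children = {child for _, parent in pool for child in decorate(parent)}
        scored = [(predict_pkd(c), c) for c in children]
        # Keeping parents in the pool means the best score can never drop.
        pool = sorted(set(pool) | set(scored), reverse=True)[:top_n]
        history.append(pool[0][0])
        if history[-1] - history[-2] < min_gain:  # saturation reached
            break
    return pool, history


# Toy stand-ins: "decoration" appends an atom, and the "predictor" rewards O atoms.
def toy_decorate(smiles: str) -> List[str]:
    return [smiles + "C", smiles + "O"] if len(smiles) < 6 else []


def toy_pkd(smiles: str) -> float:
    return 5.0 + smiles.count("O")


best, history = iterative_optimization(["C"], toy_decorate, toy_pkd)
print(history)  # [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 10.0]
```

The `top_n` parameter plays the role of "the best N derivatives"; a larger value explores more broadly per round at a proportionally higher scoring cost.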

Once the full list of scaffolds has been worked through and a final list of top candidates generated, the compounds themselves or their close analogues will be sought in the Enamine REAL database (and, if necessary, in MolPort and eMolecules). A database of 100 analogues is then generated for each of the top 100 candidates and screened with iScore to rank the available compounds.

What makes your approach stand out from the community? (<100 words)

The iterative substitutive scaffold optimization (ISSO) from ANYO Labs combines our i-TripleD software with the iGen module for scaffold decoration. Any site on the scaffold can be chosen for generating any number of de novo decorations aimed at optimizing pKd. The generated compound set is screened and ranked in i-TripleD as outlined above, and any number of compounds from the dataset can be carried forward to subsequent rounds, giving maximum flexibility and performance. In a recent test, a scaffold was optimized from a pKd of 7.1 to 8.6 in 3 consecutive iterations.
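To put the reported gain in perspective, the standard relations $K_d = 10^{-\mathrm{p}K_d}$ M and $\Delta G = RT\ln K_d$ (1 M standard state) translate a rise from pKd 7.1 to 8.6 into roughly a 30-fold tighter dissociation constant. Whether these pKd values are predicted or measured is not stated; the arithmetic is the same either way.

```latex
\begin{aligned}
K_d &= 10^{-\mathrm{p}K_d}\,\mathrm{M} \\
K_d^{\text{start}} &= 10^{-7.1}\,\mathrm{M} \approx 79\,\mathrm{nM}, \qquad
K_d^{\text{end}} = 10^{-8.6}\,\mathrm{M} \approx 2.5\,\mathrm{nM} \\
\frac{K_d^{\text{start}}}{K_d^{\text{end}}} &= 10^{8.6 - 7.1} = 10^{1.5} \approx 31.6 \\
\Delta\Delta G &= -RT\ln(10)\,\Delta\mathrm{p}K_d
  \approx -(5.71\,\mathrm{kJ\,mol^{-1}})(1.5)
  \approx -8.6\,\mathrm{kJ\,mol^{-1}} \;(\approx -2.0\,\mathrm{kcal\,mol^{-1}}),
  \quad T = 298\,\mathrm{K}
\end{aligned}
```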

Method Name
ISSO and i-TripleD by ANYO Labs
Commercial software packages used

none

Free software packages used

fpocket, dpocket, RDKit

Relevant publications by your group describing previous uses of this software/method

The software has been developed, thoroughly tested and refined over the last couple of years. The team behind the project incorporated as ANYO Labs AB in December 2022, and the method was made the subject of a patent filing, following an in-depth freedom-to-operate (FTO) analysis, in January 2023. For this reason, the team has kept the methodology a trade secret and plans to publish articles on the method in about a year’s time. We are currently preparing articles on 2 of our ongoing projects.

This website is licensed under CC-BY 4.0
