Challenge #6

Hit Identification
Method type (check all that apply)
Machine learning
Description of your approach (min 200 and max 800 words)

We use a computer-implemented method to screen ligand candidates against the target protein. Screening relies on an in-house, integrated ensemble machine learning (ML) model that predicts binding affinity with high speed and precision.

We will use the three available crystal structures, with binders bound at the three connected active-site locations, to construct several candidate target models for screening, and we will also screen each structure individually for comparison. Molecular dynamics (MD) simulations provide structural analysis, including insight into protein dynamics and protein-ligand interactions. The known binders will also be simulated to further validate the protein structure. In the final step, the models are validated against experimental data, where available, and the refined structure is used for functional annotation and further work such as ligand-binding studies and drug discovery.

The AI engine then takes as input SMILES strings from the Enamine REAL library. Binding-pocket features are analyzed, and ligands likely to fit the target pocket are identified by matching pocket features against ligand features. The compounds are then passed through our proprietary rapid-screening filters. The remaining candidates are ranked by predicted binding affinity, computed with iScore, a novel ML-based scoring function trained on manually curated ChEMBL data and PDB structures. As part of this initial screening, our proprietary ML-trained iADMET module also removes compounds with cLogP > 5. To complement this, we will run our generative AI model iGen and perform similarity searches of the available ligand libraries before selecting the final top 150 ligands.
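The filtering, ranking, and selection steps above can be sketched in a few lines of Python. This is a minimal illustration, not the iADMET, iScore, or UFS implementation, which are proprietary. It assumes each candidate already carries a precomputed cLogP value and a predicted affinity (a hypothetical `affinity` field where larger means stronger binding, as with pKd), and that fingerprints are given as sets of on-bit indices. In practice these inputs could be computed with RDKit, for example `Crippen.MolLogP` for cLogP and the on-bits of a Morgan fingerprint. The `Candidate` class, the function names, and the 0.7 similarity threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one screened compound."""
    name: str
    clogp: float              # assumed precomputed (e.g., by an ADMET model)
    affinity: float           # assumed predicted affinity; higher = stronger binding
    fingerprint: frozenset    # assumed set of fingerprint on-bit indices


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two bit sets: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def select_top(candidates, max_clogp=5.0, n=150):
    """Remove compounds with cLogP > max_clogp, then keep the n best by affinity."""
    kept = [c for c in candidates if c.clogp <= max_clogp]
    return sorted(kept, key=lambda c: c.affinity, reverse=True)[:n]


def similar_to_hits(hits, library, threshold=0.7):
    """Return library compounds whose similarity to any hit reaches the threshold."""
    hit_fps = [h.fingerprint for h in hits]
    return [
        c for c in library
        if any(tanimoto(c.fingerprint, fp) >= threshold for fp in hit_fps)
    ]


if __name__ == "__main__":
    # Toy values for illustration only, not real compound properties.
    pool = [
        Candidate("A", 2.0, 7.5, frozenset({1, 2, 3})),
        Candidate("B", 5.8, 9.0, frozenset({4, 5})),   # cLogP > 5: filtered out
        Candidate("C", 3.1, 6.0, frozenset({1, 2, 7})),
    ]
    top = select_top(pool, n=2)
    print([c.name for c in top])
    print([c.name for c in similar_to_hits(top[:1], pool, threshold=0.5)])
```

The cLogP filter runs before the affinity sort, so ranking only touches compounds that pass. The similarity step uses a simple any-hit threshold; a production pipeline might instead rank library compounds by their maximum similarity to the hit set.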

What makes your approach stand out from the community? (<100 words)

Because our approach avoids conformational sampling, it speeds up hit identification considerably while producing some of the most accurate affinity predictions to date. In the CASF-2016 and CSAR benchmarks, as well as in case studies, our tool has consistently performed best in scoring power, ranking power, and screening power. With our novel Ultra-Fast Screening (UFS) approach, we can also screen compounds several orders of magnitude faster than any other current software we have evaluated.

Our iGen module builds on the accuracy and speed of our proprietary methods, making it feasible to explore a much larger drug-like chemical space and thereby generate more hits, which can then be prioritized by synthetic accessibility. iGen produces valid SMILES at a rate of 90.0 % and valid molecules at 87.4 %, with compound uniqueness above 99.0 %, at a speed of around 2000 SMILES per second on a single NVIDIA A100 node. When generation is slowed down and compounds are not produced in batches, valid SMILES rise to 98.4 % and valid molecules to 95.9 %, while uniqueness remains unchanged.
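The generation statistics above are ratios over a batch of generated strings. The sketch below shows one common way such metrics are defined in generative-chemistry benchmarks: validity as the fraction of generated strings that pass a validity check, and uniqueness as the fraction of distinct strings among the valid ones. iGen's exact definitions, including how "valid SMILES" differs from "valid molecules", are not stated here, so this is an assumption. The validity checker is passed in as a callable; in practice one might use RDKit's `Chem.MolFromSmiles`, which returns `None` for unparsable input. The toy checker below is hypothetical and only for illustration. The throughput of about 2000 SMILES per second corresponds to roughly 7.2 million SMILES per hour.

```python
from typing import Callable, Iterable


def generation_metrics(
    generated: Iterable[str],
    is_valid: Callable[[str], bool],
) -> dict:
    """Compute validity and uniqueness for a batch of generated SMILES.

    validity   = valid strings / all generated strings
    uniqueness = distinct valid strings / valid strings
    (Canonicalizing SMILES before deduplication is omitted for brevity.)
    """
    generated = list(generated)
    valid = [s for s in generated if is_valid(s)]
    validity = len(valid) / len(generated) if generated else 0.0
    uniqueness = len(set(valid)) / len(valid) if valid else 0.0
    return {"validity": validity, "uniqueness": uniqueness}


def smiles_per_hour(rate_per_second: float) -> float:
    """Convert a generation rate in SMILES/s to SMILES/h."""
    return rate_per_second * 3600.0


if __name__ == "__main__":
    # Hypothetical toy checker: non-empty strings with balanced parentheses
    # count as valid. A real pipeline would use a SMILES parser such as RDKit.
    def toy_is_valid(s: str) -> bool:
        return bool(s) and s.count("(") == s.count(")")

    batch = ["CCO", "CCO", "CC(C", "c1ccccc1", ""]
    print(generation_metrics(batch, toy_is_valid))
    print(smiles_per_hour(2000))
```

Injecting the checker keeps the metric definitions independent of any particular chemistry toolkit, so the same function can report both a syntax-level and a chemistry-level validity rate if two different checkers are supplied.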

The research team has many years of molecular modeling experience, is at the forefront of current drug discovery trends, and lectures on physical and molecular chemistry at university.

Method Name
i-TripleD
Commercial software packages used

none

Free software packages used

fpocket, dpocket, RDKit

Relevant publications of previous uses by your group of this software/method