Our approach, called DockAI, is a new technology that combines molecular docking with a state-of-the-art active learning methodology to significantly improve the efficiency and effectiveness of virtual screening and hit identification.
With the advent of make-on-demand commercial libraries, the number of purchasable compounds available for virtual screening has grown exponentially in recent years, with several libraries containing over one billion compounds. These ultra-large libraries offer a wealth of potential hit compounds, but traditional docking approaches that score every compound individually can be cost-prohibitive and time-consuming. That's where DockAI comes in.
Our active learning methodology selects the most informative compounds from a chemical library for docking and scoring, so that computational effort is focused on the most promising candidates and the chances of identifying hit compounds are maximized. This not only increases efficiency but also improves the overall performance of the method: in case studies, DockAI outperformed other virtual screening approaches, recovering more than 75% of the top-scoring compounds (by docking score) with a 100-fold reduction in compute cost.
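The two figures quoted above, recovery of the top-scoring compounds and compute reduction, can be made concrete with a small helper. This is a hedged illustration, not DockAI's evaluation code: it assumes a retrospective benchmark in which the whole library has also been docked exhaustively (so the true top-k is known), it treats lower docking scores as better, and the function names `top_k_recovery` and `compute_reduction` are hypothetical.

```python
def top_k_recovery(true_scores, docked_ids, k):
    """Fraction of the true top-k compounds (lowest docking scores) that were docked.

    true_scores: dict mapping compound id -> docking score from exhaustive docking
                 (hypothetical benchmark setting; lower is better).
    docked_ids:  ids actually docked during the active learning campaign.
    """
    ranked = sorted(true_scores, key=true_scores.get)
    top_k = set(ranked[:k])
    return len(top_k & set(docked_ids)) / k


def compute_reduction(library_size, n_docked):
    """How many times fewer docking runs were needed than for exhaustive docking."""
    return library_size / n_docked


# Toy example: scores equal to ids, so ids 0..9 are the true top 10.
true_scores = {i: float(i) for i in range(1000)}
docked = list(range(8)) + [500, 600]  # 8 of the true top 10 were docked
print(top_k_recovery(true_scores, docked, k=10))       # 0.8
print(compute_reduction(1_000_000_000, 10_000_000))    # 100.0
```

For instance, docking 10 million compounds out of a one-billion-compound library corresponds to a 100-fold reduction; the recovery is then the share of the exhaustive top-k found among those 10 million.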
In addition to active learning, DockAI relies on a robust docking pipeline that has been carefully designed and tested to handle even the largest and most diverse chemical libraries. With DockAI, we can efficiently search ultra-large libraries of virtual compounds for hits, saving time and resources for other important aspects of your drug discovery research.
The pipeline starts by sampling a subset of the library, which is docked and constitutes the first training set. Each active learning iteration then consists of five steps:
– Train the model on the docked compounds,
– Predict docking scores for the whole library,
– Pick the best-predicted compounds,
– Dock them into the binding pocket,
– Add them to the training set.
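The loop above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not DockAI's implementation: the surrogate model is a simple k-nearest-neighbour regressor (the source does not name the model), compounds are 2-D feature vectors rather than real molecular descriptors, `mock_dock` is a hypothetical stand-in for a docking run, and acquisition is greedy (lowest predicted score first, lower being better).

```python
import heapq
import random


def mock_dock(x):
    """Hypothetical stand-in for a docking run (lower score = better).
    The optimum is placed at (0.7, 0.7, ...) purely for illustration."""
    return sum((xi - 0.7) ** 2 for xi in x) - 10.0


def knn_predict(data, x, k=5):
    """Predict a score as the mean score of the k nearest docked compounds."""
    nearest = heapq.nsmallest(
        k, data, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x))
    )
    return sum(score for _, score in nearest) / len(nearest)


def active_learning(library, n_initial=50, batch_size=50, n_iterations=5, seed=0):
    """Greedy active learning; returns {compound index: docking score}
    in docking order (initial random sample first)."""
    rng = random.Random(seed)
    # Initial random sample, docked to form the first training set.
    train = {i: mock_dock(library[i]) for i in rng.sample(range(len(library)), n_initial)}
    for _ in range(n_iterations):
        # 1. Train: a k-NN model simply memorises the current training set.
        data = [(library[i], score) for i, score in train.items()]
        # 2. Infer the whole library (compounds not yet docked).
        preds = {i: knn_predict(data, x) for i, x in enumerate(library) if i not in train}
        # 3. Pick the best-predicted compounds (lowest predicted score).
        batch = heapq.nsmallest(batch_size, preds, key=preds.get)
        # 4. Dock them, and 5. add them to the training set.
        for i in batch:
            train[i] = mock_dock(library[i])
    return train


# Usage on a toy library of 1,000 random 2-D "compounds".
rng = random.Random(42)
library = [(rng.random(), rng.random()) for _ in range(1000)]
train = active_learning(library)
scores = list(train.values())
initial, picked = scores[:50], scores[50:]
print(f"mean score, random initial sample: {sum(initial) / len(initial):.3f}")
print(f"mean score, actively picked:       {sum(picked) / len(picked):.3f}")
```

On this toy problem the compounds added during the iterations score better on average than the random initial sample, mirroring the shift of the training-set score distribution. A real campaign would swap in a learned model over molecular fingerprints and an actual docking engine.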
Iteration after iteration, the distribution of docking scores in the training set shifts toward better scores. Consequently, the model becomes better at identifying well-scoring compounds in the library.
After the last iteration, a medicinal chemist filters the whole set of docked molecules, selecting among the best-scoring molecules a subset that maximizes chemical diversity.
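In the described workflow this selection is made by a medicinal chemist. As a hedged illustration of how a diverse shortlist could be prepared automatically for that review, the sketch below applies MaxMin diversity picking with Tanimoto similarity to the top-scoring compounds. The technique, the fingerprint representation (sets of on-bit indices, which in practice would come from a cheminformatics toolkit) and the function names are assumptions, not part of the source's method.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def diverse_top_picks(molecules, n_top, n_pick):
    """MaxMin picking among the n_top best-scoring molecules.

    molecules: list of (name, docking_score, fingerprint) tuples; lower score is better.
    Starts from the best-scoring molecule, then repeatedly adds the candidate
    whose highest similarity to any already-picked molecule is lowest.
    """
    best = sorted(molecules, key=lambda m: m[1])[:n_top]
    picked, candidates = best[:1], best[1:]
    while candidates and len(picked) < n_pick:
        nxt = min(candidates, key=lambda m: max(tanimoto(m[2], p[2]) for p in picked))
        picked.append(nxt)
        candidates.remove(nxt)
    return [m[0] for m in picked]


# Hypothetical molecules with made-up scores and fingerprints.
molecules = [
    ("A", -12.0, {1, 2, 3, 4}),
    ("B", -11.5, {1, 2, 3, 5}),   # close analogue of A
    ("C", -11.0, {10, 11, 12}),   # different chemotype
    ("D", -6.0, {20, 21}),        # diverse but poorly scored
]
print(diverse_top_picks(molecules, n_top=3, n_pick=2))  # ['A', 'C']
```

Restricting to the top-scoring set first and only then maximizing diversity keeps poorly scored but "diverse" molecules such as D out of the shortlist.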
We have also developed a generative AI technology based on synthesis templates. Combined with DockAI, it gives us a unique capability to explore a huge chemical space (10¹⁵ molecules) of easy-to-make molecules, each with an estimated cost of synthesis. The molecules identified with DockAI are therefore not only promising but also readily synthesizable and accessible, making them ideal for further development and optimization.
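To give a feel for how template-based enumeration reaches such sizes without listing every molecule, here is a toy sketch. The template format, the building-block names and prices, and the cost model (building-block prices plus a fixed per-reaction cost) are all hypothetical illustrations, not DockAI's actual generative model or cost estimator.

```python
import math
import random

# Hypothetical templates: each combines one building block from each of its lists.
# Building blocks are (identifier, price) pairs; all names and prices are made up.
TEMPLATES = [
    {
        "name": "amide_coupling",
        "building_blocks": [
            [("acid_1", 10.0), ("acid_2", 12.0)],
            [("amine_1", 8.0), ("amine_2", 9.0), ("amine_3", 15.0)],
        ],
        "step_cost": 50.0,
    },
    {
        "name": "suzuki_coupling",
        "building_blocks": [
            [("aryl_bromide_1", 20.0)],
            [("boronic_acid_1", 30.0), ("boronic_acid_2", 25.0)],
        ],
        "step_cost": 80.0,
    },
]


def space_size(templates):
    """Number of enumerable products: for each template, the product of its
    building-block list sizes, summed over templates (duplicates across
    templates are not removed, so this is an upper bound on distinct molecules)."""
    return sum(math.prod(len(bbs) for bbs in t["building_blocks"]) for t in templates)


def sample_product(templates, rng):
    """Draw one product lazily, without enumerating the space, and estimate its cost.
    Note: picking a template uniformly does not sample products uniformly."""
    t = rng.choice(templates)
    chosen = [rng.choice(bbs) for bbs in t["building_blocks"]]
    cost = t["step_cost"] + sum(price for _, price in chosen)
    return t["name"], [bb for bb, _ in chosen], cost


print(space_size(TEMPLATES))  # 2*3 + 1*2 = 8
print(sample_product(TEMPLATES, random.Random(0)))
```

The same multiplication explains how quickly such spaces grow: a single three-component template with 10⁵ building blocks per position already gives (10⁵)³ = 10¹⁵ combinations. This is an arithmetic illustration only, not a description of how the 10¹⁵ figure above was obtained.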
The top molecules are then rescored with single-point MMGBSA.
MMGBSA (Molecular Mechanics Generalized Born Surface Area) is a computational method used in drug discovery projects to rescore docked compounds. It combines molecular mechanics with an implicit-solvent model to estimate the free energy of binding between a ligand and a protein receptor: the molecular mechanics force field provides the interaction and internal energies, a Generalized Born model provides the polar (electrostatic) solvation contribution, and a solvent-accessible surface area term provides the nonpolar solvation contribution. The binding free energy is obtained as the free energy of the complex minus those of the isolated receptor and ligand, so the final score combines the ligand-receptor interaction energy with the change in solvation free energy upon binding. In its full form, MMGBSA averages these energies over conformations sampled by molecular dynamics; in the single-point variant used here, they are evaluated on a single structure of each complex, which is much cheaper. MMGBSA is used to rank the predicted binding modes of a set of compounds and to identify the most promising candidates for further experimental studies.
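For reference, the usual MMGBSA formulation is written below; the exact energy terms, solvation model settings and surface-area parameters used in DockAI's rescoring are not specified in the source, so this is the generic textbook form rather than a description of the pipeline's configuration.

```latex
\Delta G_{\text{bind}} \approx G_{\text{complex}} - G_{\text{receptor}} - G_{\text{ligand}},
\qquad
G_X = E_{\text{MM}}(X) + G_{\text{GB}}(X) + G_{\text{SA}}(X),
\qquad
G_{\text{SA}}(X) = \gamma \, \mathrm{SASA}(X) + b
```

Here $E_{\text{MM}}$ is the force-field energy (bonded, electrostatic and van der Waals terms), $G_{\text{GB}}$ is the polar solvation energy from the Generalized Born model, and $G_{\text{SA}}$ is the nonpolar solvation term, proportional to the solvent-accessible surface area with empirical constants $\gamma$ and $b$. The solute entropy term $-T\Delta S$ is often omitted when MMGBSA is used only to rank compounds; in the single-point variant, each $G_X$ is evaluated on one structure instead of being averaged over molecular dynamics snapshots.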
The final list is then reviewed visually by the computational chemistry and medicinal chemistry teams.