Our approach proceeds in multiple stages that gradually funnel massive ligand libraries into hits, leads, and optimized leads. The earlier stages are data-driven and the later stages are principle/physics-driven, as detailed below.

1. Our own meta deep learning-based early-stage screening.
We developed DeepAffinity (Bioinformatics 2019), which is semi-supervised: it exploits massive unlabeled protein and ligand data, such as sequences and graphs, to pre-train molecular encoders, and it uses affinity-labeled protein-ligand pairs to jointly train the encoders and affinity predictors end-to-end. We also introduced joint attention mechanisms to explain affinity predictions in terms of residue-atom non-bonded interactions. An earlier form of DeepAffinity performed well in the community-wide IDG-DREAM challenge for kinase-inhibitor binding prediction (Nature Communications 2021). We later used additional structure data to regularize and supervise the joint attentions (JCIM 2021). We are now developing advanced pre-training strategies for molecular sequence or graph embeddings (arXiv 2020).

2. Our own target-specific deep learning-based medium-stage screening and optimization.
We previously used transfer learning to fine-tune meta affinity/activity predictors for the target of interest; with only dozens of labeled ligands for the target, the fine-tuned predictors outperformed target-specific shallow models (Bioinformatics 2019). We are now exploiting roto-translationally equivariant transformers to develop higher-resolution activity predictors that take the targeted protein binding pocket as an additional input. On top of these models, our attention mechanisms proved useful for hit optimization (JCIM 2021) by predicting decomposed affinity contributions: which functional groups should be replaced and, if they are replaced by given candidates, what the resulting predicted affinity would be.

3. Structure-based docking and energy calculation/decomposition for the late stage.
With only hundreds or thousands of ligand candidates remaining after the earlier stages, one can afford slower, physics-driven methods such as structure-based docking and screening (AutoDock Vina) as well as energy calculation and decomposition (MM/PBSA). Molecular dynamics simulations would be performed for dozens of selected compounds to obtain more mechanistic predictions of structures, dynamics, and activities.

(4. Our own deep learning-based de novo design)
We may or may not be able to include this stage: the computational method is ready, but the experimental (chemical synthesis and assay) capability or budget may not be available. Nevertheless, we would like to include the method in case we can secure experimental collaboration or outsourcing. Our method is a deep generative model that uses reinforcement learning to design ligands as graphs (Bioinformatics 2020). The reward from the environment includes chemical validity and DeepAffinity-based target activity. Because our original method designs ligand pairs for a given disease, one modification needed here is to replace the disease graph encoder with the protein pocket encoder described in Sec. 2 (roto-translationally equivariant transformers). The reward will also include (predicted) chemical synthesizability and side effects, for which some deep learning models already exist.
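
The reward composition described for the de novo design method can be illustrated with a minimal sketch. The text only states which terms the reward includes (chemical validity, predicted target activity, predicted synthesizability, and predicted side effects); the weighted-average form, the use of validity as a hard gate, the weights, and all names below (RewardTerm, composite_reward, and the scorer callables) are assumptions made for illustration, not the published method. Each scorer is a hypothetical stand-in for a trained predictor and is assumed to return a value in [0, 1], with the side-effect term expressed as a safety score (1 minus predicted risk).

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A scorer maps a ligand representation (here a SMILES-like string)
# to a score in [0, 1]. Real scorers would wrap trained predictors.
Scorer = Callable[[str], float]


@dataclass(frozen=True)
class RewardTerm:
    scorer: Scorer  # hypothetical predictor for one reward component
    weight: float   # assumed relative importance of that component


def composite_reward(
    ligand: str,
    is_valid: Callable[[str], bool],
    terms: Dict[str, RewardTerm],
) -> float:
    """Return a scalar reward for one generated ligand.

    Assumption: chemically invalid ligands receive zero reward, and valid
    ligands receive the weight-normalized average of the term scores.
    """
    if not is_valid(ligand):
        return 0.0
    total_weight = sum(term.weight for term in terms.values())
    weighted = sum(term.weight * term.scorer(ligand) for term in terms.values())
    return weighted / total_weight


if __name__ == "__main__":
    # Toy stand-ins: constant scores instead of trained models.
    toy_terms = {
        "target_activity": RewardTerm(lambda s: 0.8, weight=2.0),
        "synthesizability": RewardTerm(lambda s: 0.5, weight=1.0),
        "safety": RewardTerm(lambda s: 1.0, weight=1.0),
    }
    toy_validity = lambda s: len(s) > 0  # placeholder validity check
    print(composite_reward("CCO", toy_validity, toy_terms))
```

Gating on validity keeps the generator from trading off invalid structures against high predicted activity, while the normalized weights keep the reward in [0, 1] regardless of how many terms are added; both choices would need to be tuned or replaced in an actual training setup.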