PhD Studentship in Artificial Intelligence for Phenotypic Virtual Screening
About the Project
Predictive models built with artificial intelligence (AI) are increasingly used to identify molecules with the potential to become effective therapies for cancer and other diseases. These models can leverage large-scale datasets describing both the chemical properties of compounds and their biological activities across diverse cellular systems to discover promising drug candidates through computational (virtual) screening of extensive chemical libraries.
In particular, AI models can be trained on complementary sources of information describing molecules and cancer cell lines to predict the responses of previously untested compounds across a broad range of cellular contexts. By learning patterns linking chemical structure, cellular state, and drug sensitivity, such models have the potential to accelerate the discovery of compounds with desirable therapeutic properties.
Despite important advances, several challenges limit the predictive performance and generalizability of these models. Some challenges are specific to this application (e.g. how best to integrate heterogeneous sources of chemical and cellular information). Other challenges are shared with many supervised learning problems (e.g. quantifying predictive uncertainty and anticipating model performance on previously unseen compounds and cellular contexts).
This PhD project aims to develop and evaluate novel predictive modelling approaches that integrate molecular and cellular information to improve the identification of potent compounds across cancer cell lines. The project will make use of both synthetic and real-world datasets and will involve the development, validation and interpretation of predictive models for phenotypic virtual screening. The successful applicant will join the group of Pedro Ballester at Imperial College London and the PhD will be carried out under his direct supervision.
Selection Criteria
Essential
- University degree(s) awarded in an area directly relevant to the project.
- Completed courses in the application of machine learning algorithms to scientific problems.
- Excellent grades in first and/or master’s degrees, especially in research projects with a strong focus on computational data analysis.
- Proficiency in writing Python or R code for scientific data analysis.
- Evidence of meeting the English language proficiency requirements.
Desirable
- Demonstrated experience in research projects applying supervised machine learning to address real-world biomedical challenges, particularly molecular property prediction.
- Proficiency with open-source cheminformatics toolkits for molecular representation, descriptor generation and data processing (e.g. RDKit, Open Babel).
- Experience using machine learning and deep learning frameworks for molecular modelling and drug discovery applications (e.g. scikit-learn, DeepChem, TorchDrug, caret).
- Familiarity with cancer cell line multi-omics databases and pharmacogenomic resources (e.g. CellMiner, CCLE, DepMap).
- Experience working with large-scale cancer cell line panels annotated with activities of molecules (e.g. NCI-60, GDSC, CCLE).
- Familiarity with public medicinal chemistry and bioactivity databases for drug discovery research (e.g. ChEMBL, SureChEMBL, PubChem, ZINC).
- Understanding of multi-task learning approaches for jointly modelling multiple biological or pharmacological endpoints, including neural-network-based multi-task prediction frameworks.
- Experience with representation learning and embedding methods for biological sequences and omics data, including nucleotide language models (e.g. DNABERT-2) and transcriptomic foundation models (e.g. Geneformer).
- Familiarity with transformer-based architectures for molecular representation learning and property prediction (e.g. MoLFormer, ChemBERTa, ESM-derived approaches).
What We Offer
The studentship covers living expenses at an enhanced tax-free rate of £23,805 per year and PhD tuition fees of £31,100 per year. Funding is for three years, with the possibility of extension to a fourth year.