MAPS - Machine-learning Assisted Prioritisation of natural product Scaffolds
About the Project
These projects are open to students worldwide, but have no funding attached. Therefore, the successful applicant will be expected to fund tuition fees at the relevant level (home or international) and any applicable additional research costs. Please consider this before applying.
Natural products (NPs) have historically been the foundation of modern medicine, from penicillin to paclitaxel [1]. Yet, traditional NP discovery has become increasingly inefficient. Current pipelines rely on large-scale bioassays that are labour-intensive, frequently rediscover known compounds, and often yield molecules with poor safety profiles [2]. The central challenge lies in efficiently identifying the most promising scaffolds before investing in resource-heavy lab work.
Recent advances offer a paradigm shift. Machine learning models, particularly graph neural networks like the Directed Message Passing Neural Network (DMPNN), can learn directly from molecular structures, capturing complex chemical features without manual descriptors and enabling predictive prioritisation [3-4]. In parallel, genome mining has revealed biosynthetic gene clusters (BGCs) that encode the enzymatic machinery responsible for NP structural diversity, effectively linking compound chemistry to its genetic origins [5]. The potential of this approach is illustrated by cases like hygromycin A, an overlooked compound recently repurposed for selective Lyme disease treatment [6]. The MAPS framework seeks to systematise such discoveries by computationally flagging under-explored scaffolds to guide both repurposing and novel analogue design.
Research Design and Methodology
This research will establish an integrated computational-to-experimental pipeline over a three-year postgraduate study, structured into three phases aligned with annual milestones.
Year 1: The first phase focuses on constructing ML models that predict antimicrobial activity and cytotoxicity. A DMPNN model will be trained on curated datasets from public resources such as PubChem and NPASS. Following rigorous data cleaning, model performance will be evaluated using receiver operating characteristic (ROC) curves and the Matthews Correlation Coefficient (MCC); MCC in particular is well suited to imbalanced biological data. The deliverable will be a validated predictive tool for identifying promising scaffolds.
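As a rough illustration of why MCC is informative on imbalanced data, the sketch below computes it from binary labels in plain Python. The function name and the toy labels are illustrative assumptions, not project code or data.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels (1 = active, 0 = inactive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy, made-up labels: 95 inactive compounds and 5 active ones.
y_true = [0] * 95 + [1] * 5
always_inactive = [0] * 100  # a trivial model that never predicts "active"

accuracy = sum(t == p for t, p in zip(y_true, always_inactive)) / len(y_true)
mcc = matthews_corrcoef(y_true, always_inactive)
print(f"accuracy={accuracy:.2f}, MCC={mcc:.2f}")
```

On this toy set the trivial model scores 95% accuracy while its MCC is 0, which is the behaviour that makes MCC a useful check when active compounds are rare.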
Year 2: The second phase applies the trained model to screen extensive repositories of known natural products, such as the Dictionary of Natural Products. This will generate ranked lists of scaffolds predicted to have strong antimicrobial potential and low cytotoxicity. This computational prioritisation will be enriched by genome mining. For top-ranked scaffolds, corresponding BGCs will be identified in genomic databases, creating a crucial "scaffold-to-gene" link. This step not only validates predictions biologically but also annotates tailoring enzymes, such as oxidases and glycosyltransferases, that introduce functionally significant modifications, revealing potential for structural diversification.
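The ranking step could, for instance, combine the two predictions into a single priority score. The sketch below is one possible approach under stated assumptions: the class name, the scoring rule (predicted activity multiplied by predicted non-toxicity), and the example values are all hypothetical and are not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaffoldPrediction:
    scaffold_id: str        # hypothetical identifier
    p_antimicrobial: float  # predicted probability of antimicrobial activity
    p_cytotoxic: float      # predicted probability of cytotoxicity


def priority_score(pred: ScaffoldPrediction) -> float:
    """Assumed scoring rule: favour high activity and low cytotoxicity."""
    return pred.p_antimicrobial * (1.0 - pred.p_cytotoxic)


def rank_scaffolds(preds, top_n=None):
    """Return predictions sorted from highest to lowest priority score."""
    ranked = sorted(preds, key=priority_score, reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Made-up example predictions, for illustration only.
predictions = [
    ScaffoldPrediction("scaffold_A", 0.9, 0.8),
    ScaffoldPrediction("scaffold_B", 0.7, 0.1),
    ScaffoldPrediction("scaffold_C", 0.5, 0.5),
]

for pred in rank_scaffolds(predictions):
    print(pred.scaffold_id, round(priority_score(pred), 2))
```

A multiplicative score penalises a highly active but cytotoxic scaffold more strongly than a simple average would; other weightings would be equally defensible.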
Year 3: The final phase integrates prediction with experimental validation. Enzymatic annotations from Year 2 will be used to computationally design libraries of plausible analogues—compounds that could be generated by biosynthetic enzymes but are not yet known. These candidates will be re-screened in silico using the ML model, creating an iterative cycle of scaffold optimisation. The most promising candidates will then be selected for experimental validation, focusing on readily accessible microbial strains. Efforts will include isolation, structural elucidation, and standard antimicrobial testing. The successful validation of at least one MAPS-predicted candidate will provide critical proof-of-concept for the entire workflow.
This is a cross-disciplinary project involving elements of organic chemistry, molecular biology, biochemistry, pharmacy and protein chemistry. In this project, you will gain experience in the chemical synthesis of small molecules, cloning genes for heterologous expression, and using the purified enzymes to carry out biotransformation reactions. You will learn chemical characterisation by LC-MS and NMR analyses, and will also gain experience in microbiology.
Decisions will be based on academic merit. The successful applicant should have, or expect to obtain, a UK Honours Degree at 2.1 (or equivalent) in organic chemistry, microbiology, biochemistry or pharmacy, with a strong interest in machine learning and basic knowledge of using GitHub.
Informal enquiries can be made by contacting Dr Deng at h.deng@abdn.ac.uk
Application Procedure:
Formal applications can be completed online: https://www.abdn.ac.uk/pgap/login.php.
You should apply for the Degree of Doctor of Philosophy in Chemistry to ensure your application is passed to the correct team for processing.
Please clearly note the name of the lead supervisor and the project title on the application form. If you do not include these details, your application may not be considered for the project.
Your application must include: a personal statement, an up-to-date copy of your academic CV, and clear copies of your educational certificates and transcripts.
Please note: you do not need to provide a research proposal with this application.
If you require any additional assistance in submitting your application or have any queries about the application process, please don't hesitate to contact us at researchadmissions@abdn.ac.uk
Funding Notes
This is a self-funding project open to students worldwide. Our typical start dates for this programme are February or October.
Fees for this programme can be found here: Finance and Funding | Study Here | The University of Aberdeen.
Additional research costs of £6,000 per annum will be required in addition to tuition fees.
References
1. J. Nat. Prod., 2020, 83, 770.
2. Nat. Rev. Drug Discov., 2021, 20, 200.
3. Cell, 2025, 188, 1.
4. J. Chem. Inf. Model., 2024, 6.
5. Nucleic Acids Res., 2023, 51, W46.
6. Cell, 2021, 184, 5405.