PhD Studentship: Cancer Data Driven Detection: Handling missing data in cancer risk prediction models
Applications are invited for a non-clinical PhD studentship (starting October 2026 or before) based within the Department of Public Health and Primary Care, University of Cambridge.
Academic supervisor: Professor Angela Wood, Professor of Biostatistics and Health Data Science
Academic supervisor department: Public Health and Primary Care
Project title - Cancer Data Driven Detection: Handling missing data in cancer risk prediction models
Background: Cancer Data Driven Detection is a new, multidisciplinary and multi-institutional strategic national research programme dedicated to using data to transform our understanding of cancer risk and to enable early interception of cancers. It represents a major, multi-million-pound flagship investment (...) funded through a strategic programme award by Cancer Research UK (...), the National Institute for Health and Care Research (...), the Engineering and Physical Sciences Research Council (...), and the Peter Sowerby Foundation (...), in partnership with Health Data Research UK (...) and the Economic and Social Research Council's Administrative Data Research UK programme (...).
Project description: Early cancer diagnosis is often challenging for patients presenting with vague, non-specific symptoms that may be linked to multiple cancer sites. This project aims to improve diagnostic decision-making in such patients by developing advanced, equitable cancer risk prediction models that effectively handle missing and incomplete symptom data recorded in electronic health records (EHRs). The absence of a symptom code does not reliably indicate that the symptom did not occur, because symptom data are often incompletely captured. Symptom recording depends on multiple stages, from patient recognition and communication to clinician coding, each of which introduces opportunities for information loss. This missingness is not random: it is influenced by clinical factors, varies across demographic and geographic groups, and may reflect broader inequalities in healthcare engagement and recording practices.
This project will systematically investigate how patterns of data completeness differ by patient and practice characteristics (e.g., age, sex, ethnicity, deprivation, and geography), how these patterns evolve over time, and how they influence cancer risk estimates. Understanding and addressing these biases is crucial to avoid exacerbating health inequalities through prediction models that disproportionately benefit advantaged groups.
Using large-scale linked electronic health record data, suitable models will be employed to identify determinants of missing data in symptoms, blood test results and other key variables, accounting for clustering of patients within practices. These models will quantify the extent of variation and identify systematic differences in coding practices between providers and patient subgroups. Temporal analyses will assess how these patterns change over time.
Building on these findings, the project will quantify how different patterns of missingness may impact risk prediction model performance and calibration. Novel methods will be developed to incorporate incomplete or uncertain information, including delta-adjustment imputation and other approaches that explicitly model symptom recording probabilities. Emphasis will be placed on ensuring reproducibility, interpretability, and adaptability as data completeness evolves with changing healthcare practices.
The ultimate goal is to produce robust, fair, and clinically useful cancer risk prediction models that account for systematic biases in symptom data recording, ensuring that such models will benefit all patient groups equitably. The work will also contribute to the methodological literature on missing data, with wider applications in predictive modelling across healthcare. The student will gain expertise in statistical modelling, simulation, electronic health record data science, and fairness evaluation - skills directly aligned with modern data-driven cancer research and clinical translation.
Please refer to the attached document for further information on Outcomes and Research Environment.
Requirements
Applicants are expected to hold at least a 2:1 undergraduate degree (or equivalent) in a relevant subject such as statistics, mathematics, computer science, engineering, or a related biomedical or population health discipline, and may also have a Master's degree in a quantitative or health data field. Applicants should be able to demonstrate excellent analytical and programming skills (for example in R or Python), experience working with health data, and an enthusiasm for interdisciplinary research that bridges data science, healthcare, and population health. Strong communication and teamwork skills are essential, and international applicants may need to provide evidence of English language proficiency.
We invite applications from UK and non-UK students who meet the UK residency requirements (home fees). International students who can confirm that the full additional cost of overseas tuition fees will be covered through other scholarships or funding schemes will also be considered.
The studentship provides the UKRI 2026 stipend rate, currently £20,780 annually.
Further information on possible sources of support for non-UK applicants can be found at ... as well as through external funding opportunities.
Applicants must meet the University of Cambridge entrance requirements: see ....
How to apply
To apply, please visit https://www.postgraduate.study.cam.ac.uk/courses/directory/cvphpdhpc and click 'Apply Now'.
Course Details: PhD in Public Health & Primary Care (Full-time)
Start Date: October 2026, Michaelmas Term (or before)
Academic Supervisor(s): Professor Angela Wood, Department of Public Health and Primary Care
Research Title: Cancer Data Driven Detection: Handling missing data in cancer risk prediction models
In order to apply for this opportunity, you will need:
- Details of two academic referees (references will be taken up immediately).
- Transcript(s)
- CV/resume
- Evidence of competence in English
- Statement of Interest outlining your suitability, why you are interested in a PhD in this area, your background and research interests
Interview and Selection process
The deadline for applications is Monday 9th March 2026
Applicants will be notified of the outcome of their application by 16th March 2026
Shortlisted candidates will be invited to interview in the week commencing 23rd March 2026
Applicants will be notified of the outcome of their interview soon after.
For information about how your personal data is used as an applicant, please see the section on Applicant Data (...) on our HR web pages.
Please quote reference RH48811 on your application and in any correspondence about this vacancy.
The University actively supports equality, diversity and inclusion and encourages applications from all sections of society.
The University has a responsibility to ensure that all employees are eligible to live and work in the UK.
Key information
Department/location: Department of Public Health and Primary Care
Reference: RH48811
Date published: 10 February 2026
Closing date: 9 March 2026