About the Project

In this project, we aim to use biostatistical and AI tools to identify the key factors that protect certain cancer-free individuals from developing cancer, compared with individuals at high risk. This group is typically hard to find, let alone to observe longitudinally. The ACCURATE project aims to identify these cancer avoiders from population-level cancer screening data containing health, social status, lifestyle and genetics-related information. Our approach will address why some high-risk groups remain cancer-free and how they age with a lower incidence of cancer. By constructing longitudinal observations, we can identify, monitor and study a cancer-free cohort, ultimately helping to devise cancer prevention strategies for the general population.

To complete the project, several steps are essential. First, collect and clean the datasets. Potential candidates include the Surveillance, Epidemiology, and End Results (SEER) data in the US, which contain information on cancer patients from 1973 onwards, and the National Family Health Survey data from India, which start in 1992 and include blood analysis information. Other suitable datasets may also be explored. Second, merge the datasets using a range of methods, including matching algorithms, deep learning and probabilistic record linkage. Third, analyse the merged dataset using various biostatistical and AI techniques; convenience and snowball sampling methods, as well as subsampling, may be needed. Along the way, novel approaches will be developed to handle complex data issues (e.g., incomplete observations, competing risks, internal consistency across countries and participants, and errors arising from fuzzy matching of multimodal data). The project is expected to lead to the development of novel multidisciplinary techniques and high-quality publications.
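The probabilistic linkage step can be illustrated with a minimal Fellegi-Sunter-style scoring sketch. It is only an illustration under stated assumptions: the field names (`birth_year`, `sex`, `postcode`), the m/u probabilities and the decision thresholds are hypothetical placeholders, not values from SEER, NFHS or this project. In practice such parameters are usually estimated from the data (for example, with an EM algorithm) rather than fixed by hand.

```python
import math

# Hypothetical m- and u-probabilities per comparison field (illustrative only).
# m = P(field agrees | records truly match); u = P(field agrees | records do not match).
M_U = {
    "birth_year": (0.95, 0.05),
    "sex": (0.98, 0.50),
    "postcode": (0.90, 0.01),
}


def field_weight(agree: bool, m: float, u: float) -> float:
    """Log2 agreement or disagreement weight for one field (Fellegi-Sunter)."""
    if agree:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Sum the field weights; higher scores suggest both records describe the same person."""
    total = 0.0
    for field, (m, u) in M_U.items():
        va, vb = rec_a.get(field), rec_b.get(field)
        if va is None or vb is None:
            continue  # missing value: contributes no evidence either way
        total += field_weight(va == vb, m, u)
    return total


def classify(score: float, upper: float = 5.0, lower: float = 0.0) -> str:
    """Three-way decision: link, clerical review or non-link (thresholds are assumptions)."""
    if score >= upper:
        return "link"
    if score <= lower:
        return "non-link"
    return "review"


if __name__ == "__main__":
    a = {"birth_year": 1950, "sex": "F", "postcode": "LE11"}
    b = {"birth_year": 1950, "sex": "F", "postcode": "LE11"}
    c = {"birth_year": 1962, "sex": "M", "postcode": "NG7"}
    print(round(match_score(a, b), 1), classify(match_score(a, b)))
    print(round(match_score(a, c), 1), classify(match_score(a, c)))
```

With these made-up parameters, two records agreeing on all three fields score about 11.7 and are classified as a link, whereas records disagreeing on all three score about -12.2 and are classified as a non-link. The middle "review" band reflects the common Fellegi-Sunter practice of sending uncertain pairs for manual (clerical) review.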

Name of primary supervisor/CDT lead:
Peng Liu p.liu4@lboro.ac.uk

Entry requirements:
Applicants should ideally have a background in mathematics or statistics/biostatistics, and should have, or expect to achieve, at least a 2:1 Honours degree (or equivalent) in a mathematics-related subject. Strong computational skills in relevant software (e.g., Python, R, MATLAB, C, C++, SQL) are also required.

English language requirements:
Applicants must meet the minimum English language requirements. Further details are available on the International website.

Bench fees required: No

Closing date of advert: 1st August 2026

Start date: April 2026, July 2026, October 2026, July 2027

Full-time/part-time availability: Full-time 3 years, Part-time 6 years

Fee band: 2025/26 Band RA (UK £5,006, International £22,360)

How to apply:
All applications should be made online. Under programme name, select Mathematical Sciences. Please quote the advertised reference number: MA/PL-SF1/2026 in your application.
To avoid delays in processing your application, please ensure that you submit a CV and the minimum supporting documents.
The following selection criteria will be used by academic schools to help them make a decision on your application. Please note that these criteria are used for both funded and self-funded projects.
Please note that applications for this project are considered on an ongoing basis once submitted, and the project may be withdrawn before the application deadline if a suitable candidate is chosen.

Project search terms:
artificial intelligence, statistics

Email Address Sci:
sci-pgr@lboro.ac.uk

Avoiding Cancer Cells via Unbiased Resampling from Annual Tabulated Electronic screening data (ACCURATE) (Ref: MA/PL-SF1/2026)

Loughborough University

Epinal Way, Loughborough LE11 3TU, UK