
PhD Studentship in Monitoring and Increasing LLM Safety

University of Cambridge

Cambridge


Closing date: 30 July 2026

Location: Cambridge, University of Cambridge

Type: PhD, Fully-funded Studentship

Salary: Fully-funded (fees + maintenance)

Visa Sponsorship

Required Qualifications

First-class degree in Engineering or related
Software development experience
LLM research experience

Research Areas

LLM Safety
Mechanistic Interpretability
Chain-of-Thought (CoT) Faithfulness
Encoded Reasoning

PhD Studentship in Monitoring and Increasing LLM Safety

LLMs are becoming more capable, and society increasingly relies on them, so it is important to ensure that LLMs are safe. In this PhD you will use a variety of approaches, such as white-box mechanistic interpretability and black-box behavioural research, to evaluate the safety of LLMs, monitor their behaviour at inference time, and devise strategies for reducing risk from LLMs. Initially, this PhD will focus on increasing CoT faithfulness and mitigating encoded reasoning.

This PhD is funded by Coefficient Giving, whose focus areas are described at https://coefficientgiving.org/tais-rfp-research-areas/#6-encoded-reasoning-in-cot-and-inter-model-communication

The first 1.5 years of this PhD are already scoped out and will be spent investigating and carrying out either Project 1 or Project 2 (described below). Once these projects have been completed to the highest standard, you, your supervisor, and Coefficient Giving will together decide how to proceed and what to investigate next.

Project 1: Test for straightforward meaning of CoT and mitigate deceptive behaviour via "perturbation methods".

First apply a CoT perturbation method (e.g. paraphrasing intermediate outputs), then compare the final outputs produced from the perturbed CoT with baseline final outputs. Performance deterioration after perturbation indicates that the model was using the words in the CoT in a non-straightforward way. If you find such deterioration, the next step is to investigate the underlying cause (for example using mechanistic interpretability), e.g. the model using a secret code or prompt-hacking itself.
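The perturb-and-compare loop above can be sketched as follows. This is a minimal toy sketch, not the project's actual method: generate_cot, paraphrase, and continue_from_cot are hypothetical placeholders for real LLM calls, and the toy "model" is deliberately built to rely on exact CoT wording so that the accuracy drop is visible.

```python
# Toy stand-ins for real LLM calls: generate_cot produces a chain of thought,
# paraphrase perturbs it, and continue_from_cot produces a final answer from a
# (possibly perturbed) CoT. All three are hypothetical placeholders.
def generate_cot(question):
    return f"reasoning about {question}"

def paraphrase(cot):
    # Perturbation: reword the CoT while preserving its surface meaning.
    return cot.replace("reasoning about", "thinking through")

def continue_from_cot(question, cot):
    # Toy "encoded reasoning" model: it only answers correctly when its own
    # exact wording survives, standing in for a hidden code in the CoT.
    return "correct" if cot.startswith("reasoning about") else "wrong"

def accuracy(questions, perturb):
    correct = 0
    for q in questions:
        cot = generate_cot(q)
        if perturb:
            cot = paraphrase(cot)
        if continue_from_cot(q, cot) == "correct":
            correct += 1
    return correct / len(questions)

questions = ["q1", "q2", "q3"]
baseline = accuracy(questions, perturb=False)   # 1.0 for this toy model
perturbed = accuracy(questions, perturb=True)   # 0.0: paraphrasing broke the "code"
drop = baseline - perturbed
# A large drop suggests the model uses CoT words in a non-straightforward way.
```

In a real study the stubs would be replaced by model API calls and the drop measured over a benchmark, with statistical controls for the perturbation's own effect on difficulty.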

Project 2: Train for transparency using a human predictor

Use a human (or AI imitating human behavior, e.g. an LLM) to evaluate whether the final model outputs (and counterfactual outputs) can be predicted based on the CoT. The accuracy of this human predictor is a measure of reasoning transparency and can be used as reward during training.
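As a minimal sketch of that reward signal, under the assumption that a hypothetical predict_output function stands in for the human (or LLM) predictor, the transparency reward is simply the predictor's accuracy over (CoT, final output) pairs:

```python
# Sketch of predictor accuracy as a transparency reward. predict_output is a
# hypothetical stand-in for the human/LLM predictor: it guesses the final
# answer from the CoT alone.
def predict_output(cot):
    # Toy predictor: assume the answer is the last word of the CoT.
    return cot.split()[-1]

def transparency_reward(examples):
    # examples: list of (cot, final_output) pairs produced by the model.
    hits = sum(1 for cot, out in examples if predict_output(cot) == out)
    return hits / len(examples)

examples = [
    ("the sum of 2 and 2 is 4", "4"),        # predictable from the CoT
    ("after some thought the answer is 7", "7"),
    ("unrelated musings", "42"),             # opaque: CoT does not reveal it
]
reward = transparency_reward(examples)  # fraction of outputs predictable from the CoT
```

In training, this scalar would be used as (part of) the reward, pushing the model toward CoTs from which its final outputs, and counterfactual outputs, are predictable.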

Qualifications required: Applicants should have (or expect to obtain by the start date) at least a first-class degree in Engineering or a related subject.

Ideally applicants have some experience with either software development projects or research on LLMs.

This is a fully-funded studentship (fees and maintenance) to cover a home or overseas candidate.

To apply for this studentship, please upload your two-page CV and research proposal via this form: https://forms.gle/Cm3MWPsWta73J2Gp7. Form responses will be evaluated on a rolling basis.

Please note that any offer of funding will be conditional on securing a place as a PhD student. Candidates will need to apply separately for admission through the University's Graduate Admissions application portal; this can be done before or after applying for this funding opportunity. The applicant portal can be accessed via: www.graduate.study.cam.ac.uk/courses/directory/egegpdpeg. University Postgraduate Admissions closing dates are 14 May for October start and 30 July for January start, although it is advisable to apply earlier than this. Please note that there is an application fee of £20 to apply via the Postgraduate Application Portal.

The University actively supports equality, diversity and inclusion and encourages applications from all sections of society.

Key information

Department/location: Department of Engineering

Salary: Fully-funded (fees and maintenance)

Reference: NM49585

Category: Studentships

Date published: 30 April 2026

Closing date: 30 July 2026

Frequently Asked Questions

🎓What qualifications are required for this PhD studentship?

Applicants must have (or expect to obtain by the start date) a first-class degree in Engineering or a related subject. Ideally, candidates have experience with software development projects or research on LLMs.

📝How do I apply for this LLM safety PhD studentship?

Upload your two-page CV and research proposal via the application form at https://forms.gle/Cm3MWPsWta73J2Gp7. Applications are evaluated on a rolling basis. A PhD place must be secured separately through the University of Cambridge Postgraduate Application Portal (£20 application fee).

🔬What are the research projects in this PhD?

The first 1.5 years focus on either Project 1 (testing CoT faithfulness via perturbation methods such as paraphrasing, and investigating deceptive behaviour using mechanistic interpretability) or Project 2 (training for transparency with a human or AI predictor that evaluates whether final outputs can be predicted from the CoT). The studentship is funded by Coefficient Giving, with an initial focus on encoded reasoning.

🌍Is this PhD studentship open to international students?

Yes, this is a fully-funded studentship covering fees and maintenance for both home and overseas candidates. Confirm visa sponsorship details via University admissions.

📅What are the application deadlines and start dates?

Funding applications close 30 July 2026 and are evaluated on a rolling basis. University admissions deadlines are 14 May for an October start and 30 July for a January start; applying earlier is advisable. Reference: NM49585.

🚀What happens after the initial PhD projects?

After the first 1.5 years, once Project 1 or 2 has been completed to the highest standard, you will decide together with your supervisor and Coefficient Giving what to investigate next in LLM safety, such as inference-time monitoring or risk reduction. Focus areas align with Coefficient Giving's TAIS RFP.