Scalable Value Alignment as a Foundation for AI Safety in Increasingly Agentic Systems
About the Project
As AI systems evolve from passive tools into autonomous agents capable of long-term planning and strategic behavior, aligning them with human values becomes increasingly complex. Value alignment (ensuring AI systems reliably pursue the outcomes humans actually intend) is a foundational challenge in AI safety.
Current alignment techniques, such as Reinforcement Learning from Human Feedback (RLHF), work well for systems with limited scope and clear tasks. However, these methods may falter as systems gain greater autonomy and planning capabilities over longer time horizons.
This PhD project will explore how alignment strategies must evolve for increasingly agentic AI systems, developing theory and practice to ensure a safe transition from narrow tools to autonomous agents.
Eligibility Criteria:
We welcome applications from candidates with a Computer Science background and technology graduates with (or anticipating) at least a 2.1 honours degree or equivalent. Applicants should have strong computing skills, a good understanding of machine learning theory and tools, and an enthusiasm for coding, designing, and testing. They should be self-motivated and able to work both independently and as part of a team.
Essential
Qualifications, Experience and Skills
- An undergraduate degree in Computer Science, or related discipline with at least a 2.1 honours degree or equivalent.
- Proficiency in programming, preferably in Python.
- Familiarity with machine learning frameworks (e.g. PyTorch, TensorFlow).
Attitude and Personality
- Effective oral and written communication, presentation, and training skills
- Good interpersonal skills
- Ability to work independently and as part of a team on research programmes
- Ability to initiate, plan, organise, implement and deliver programmes of work
- Willingness to learn new skills
Desirable
- Background in artificial intelligence, machine learning, robotics, or computational cognitive science
- Knowledge of multi-agent systems
- Familiarity with Agentic AI, human-centred design
- Experience with HPC environments for training large machine learning models.
Mode of attendance: Full time. Expected start date: available immediately.
Informal enquiries about the studentship should be directed to Dr Marco Ortolani (m.ortolani@keele.ac.uk).