Scalable Value Alignment as a Foundation for AI Safety in Increasingly Agentic Systems
About the Project
As AI systems evolve from passive tools into autonomous agents capable of long-term planning and strategic behavior, aligning them with human values becomes increasingly complex. Value alignment (ensuring AI systems reliably pursue the outcomes humans actually intend) is a foundational challenge in AI safety.
Current alignment techniques, such as Reinforcement Learning from Human Feedback (RLHF), work well for systems with limited scope and clear tasks. However, these methods may falter as systems gain greater autonomy and planning capabilities over longer time horizons.
This PhD project will explore how alignment strategies must evolve for increasingly agentic AI systems, developing theory and practice to ensure a safe transition from narrow tools to autonomous agents.
Eligibility Criteria:
We welcome applications from candidates with a Computer Science background and technology graduates with (or anticipating) at least a 2.1 honours degree or equivalent. Applicants should have strong computing skills, a good understanding of machine learning theory and tools, and an enthusiasm for coding, designing, and testing. They should be self-motivated and able to work both independently and as part of a team.
Essential
Qualifications, Experience and Skills
- An undergraduate degree in Computer Science, or related discipline with at least a 2.1 honours degree or equivalent.
- Proficiency in programming, preferably in Python.
- Familiarity with machine learning frameworks (e.g. PyTorch, TensorFlow).
Attitude and Personality
- Effective oral and written communication, presentation, and training skills
- Good interpersonal skills
- Ability to work independently and as part of a team on research programmes
- Ability to initiate, plan, organise, implement and deliver programmes of work
- Willingness to learn new skills
Desirable
- Background in artificial intelligence, machine learning, robotics, or computational cognitive science
- Knowledge of multi-agent systems
- Familiarity with Agentic AI, human-centred design
- Experience with HPC environments for training large machine learning models.
Mode of attendance: Full time. Expected start date: available immediately.
Informal enquiries about the studentship should be directed to Dr Marco Ortolani (m.ortolani@keele.ac.uk).