PhD Studentship: Reinforcement Learning Tool for Large Language Model in Collaboration Games

Coventry University Group

Qualification Type: PhD
Location: Coventry
Funding for: UK Students, EU Students, International Students
Funding amount: Not Specified
Hours: Full Time
Placed On: 2nd September 2025
Closes: 25th October 2025

Can We Teach AI to Outsmart Humans in the Werewolf Game—Without Changing the AI Itself?

Large Language Models (LLMs) have dazzled us with their ability to converse, code, and create—but they still struggle in areas where humans excel: reasoning about other players, forming alliances, and making long-term strategic decisions. A prime example? The social deduction game Werewolf (also known as Mafia). Even the most advanced AI systems falter against skilled human players.

That’s about to change.

In the same way AlphaGo revolutionised board game AI by teaching itself to play Go at a superhuman level, our project seeks to bring self-learning to LLMs. But there's a catch: in Go, the score of a position is clear, while in open-ended language games there is no easy way to score an LLM's conversational move.

The Breakthrough Idea

Instead of trying to score every AI utterance directly, we focus on the game's outcome: whether the villagers win or the werewolves outwit everyone. We break each playthrough into partial game logs and link them to the final result. This gives us grounded, reliable feedback: we know which sequences of actions led to a win or a loss.
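The labelling step above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the `GameLog` class and `make_training_pairs` function are hypothetical names, and the labelling scheme (every prefix inherits the game's final result) is one simple way to realise the idea described.

```python
# Hypothetical sketch: slice finished games into outcome-labelled
# partial logs, as described in the paragraph above.
from dataclasses import dataclass

@dataclass
class GameLog:
    """One complete playthrough: a list of (speaker, utterance) turns
    plus the final result for the villager team."""
    turns: list          # e.g. [("Alice", "I think Bob is the wolf"), ...]
    villagers_won: bool  # ground-truth outcome of the whole game

def make_training_pairs(log: GameLog):
    """Slice a finished game into partial logs, each labelled with the
    final outcome. Every prefix inherits the game's end result, giving
    grounded feedback for states reached mid-game."""
    pairs = []
    for t in range(1, len(log.turns) + 1):
        partial = log.turns[:t]  # everything seen up to turn t
        label = 1.0 if log.villagers_won else 0.0
        pairs.append((partial, label))
    return pairs

demo = GameLog(
    turns=[("Alice", "Bob acted suspicious last night"),
           ("Bob", "I was protecting Carol"),
           ("Carol", "I vote Bob")],
    villagers_won=True,
)
pairs = make_training_pairs(demo)
print(len(pairs))  # one labelled prefix per turn, so 3
```

In practice one would also weight labels by how close a prefix is to the end of the game, but even this naive scheme yields outcome-grounded supervision.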

From these partial logs, we learn a “hidden” (latent) state representation of the game and a way to map that state to a value—essentially, how good the situation is for the AI at any given point. Once we have this, the AI can sample a range of possible next actions and evaluate their likely impact before deciding which to play.
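The evaluate-then-act loop just described can be sketched as follows. Everything here is a stand-in assumption: the bag-of-words `encode` function plays the role of the learned latent-state encoder, the hand-set `weights` stand in for a trained value model, and the candidate utterances would in reality be sampled from the LLM.

```python
# Hypothetical sketch of value-guided action selection: encode a
# partial log into a (toy) latent state, score it with a value
# function, and pick the candidate utterance whose resulting state
# scores highest. The encoder and weights are stand-ins for learned
# models, not the project's actual method.
from collections import Counter

VOCAB = ["vote", "wolf", "trust", "protect", "suspicious"]

def encode(partial_log):
    """Toy latent state: bag-of-words counts over a fixed vocabulary."""
    words = " ".join(u for _, u in partial_log).lower().split()
    counts = Counter(words)
    return [counts[w] for w in VOCAB]

def value(state, weights):
    """Map the latent state to a scalar: how good the position is."""
    return sum(s * w for s, w in zip(state, weights))

def choose_action(partial_log, candidates, speaker, weights):
    """Simulate appending each candidate utterance, evaluate the
    resulting state, and return the highest-value choice."""
    def score(utterance):
        return value(encode(partial_log + [(speaker, utterance)]), weights)
    return max(candidates, key=score)

# Hypothetical learned weights favouring accusatory, decisive talk.
weights = [0.5, 0.8, -0.2, 0.1, 0.6]
log = [("Alice", "Bob was suspicious last night")]
candidates = ["I trust Bob completely", "I vote wolf Bob"]
best = choose_action(log, candidates, "Carol", weights)
print(best)  # prints "I vote wolf Bob"
```

The key design point carried over from the text: the LLM only proposes candidates, while the external encoder and value function decide which one to play.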

Here’s the twist: all this happens outside the LLM. We don’t fine-tune the model, retrain it, or alter its weights. We simply wrap it with a clever layer of reasoning and evaluation. It’s like giving the AI a strategic co-pilot that helps it think ahead without changing its core personality.

Why This Matters

The implications reach far beyond a single parlour game. This approach gives LLMs the ability to account for the long-term consequences of their actions—something they currently find challenging. Imagine AI collaborators that:

  • Negotiate more effectively by anticipating the downstream effects of each statement.
  • Support complex decision-making in domains where success depends on multi-step strategy.
  • Learn to adapt through experience without costly retraining.

By proving the method in the challenging, high-interaction world of Werewolf, we create a benchmark for measuring and improving AI strategic reasoning. If it works there, it can work in corporate negotiations, policy simulations, cooperative robotics, and beyond.

Why Werewolf?

It’s a perfect storm for testing AI intelligence: incomplete information, shifting alliances, deceptive moves, and the need to read subtle cues in conversation. Winning isn’t about calculating a single best move—it’s about thinking several moves ahead, predicting how others will respond, and adjusting on the fly. That’s exactly the kind of capability we want LLMs to develop.

Join the Next Leap in AI Learning

This project offers a new pathway to grow AI intelligence—one that builds foresight into systems without modifying their underlying architecture. We’re bridging the gap between raw language ability and deep strategic reasoning.

Just as AlphaGo changed our understanding of what was possible in AI for games, we aim to change what’s possible for AI in collaboration, persuasion, and long-term planning. And it all starts with a simple, deceptively difficult question:

Can we teach AI to outsmart humans in the Werewolf game, without changing the AI itself?
