Mitigating LLM-Generated Disinformation via Machine Unlearning: Toward Safer Generative AI

About the Project

False and harmful narratives propagated by large language models threaten the integrity of information in elections, public health, and crisis response. This PhD addresses that risk by creating principled, auditable ways to remove or neutralise targeted misinformation behaviours while preserving overall model utility. The work moves beyond after-the-fact filtering by changing what the model retains, delivering durable, regulator-aligned mitigations suitable for open-weight and proprietary systems.

Objectives

Define high-risk narrative taxonomies and target selection criteria across key domains.
Establish auditable workflows for precise knowledge removal or attenuation with minimal collateral impact.
Ensure the durability of mitigations under model updates and adversarial attempts to restore harmful content.
Quantify utility retention and safety gains with reproducible, decision-grade metrics.
Produce governance artefacts (risk registers, edit logs, audit trails) aligned to EU AI Act/GDPR.
Translate findings into actionable guidance for developers, safety teams, and regulators.

Expected Outcomes

A safety toolkit enabling precise, auditable updates/removals with reproducible pipelines.
An evaluation suite and red-teaming protocol measuring effectiveness, side-effects, and long-term persistence.
Public releases: code, benchmark reports, and policy guidance; datasets where permissible.

Impact

Equips AI developers, trust-and-safety teams, and regulators with practical, evidence-based controls to reduce real-world harm from LLM-generated disinformation.

Academic qualifications

First degree (minimum 2:1 classification) in Computer Science, Machine Learning, or Artificial Intelligence

English language requirement

IELTS score must be at least 6.5 (with not less than 6.0 in each of the four components). Other, equivalent qualifications will be accepted.

Essential attributes:

Fundamental knowledge of Generative AI and Natural Language Processing
Experience in fundamental machine learning
Competence in programming and critical analysis
Knowledge of the security and privacy of machine learning
Good written and oral communication skills
Strong motivation, with evidence of independent research skills relevant to the project
Good time management

Desirable attributes:

Programming experience in Python and Machine Learning frameworks (e.g., TensorFlow or Keras)
Good knowledge of deep learning, natural language processing, etc.
Experience in Generative AI

APPLICATION CHECKLIST

Completed application form
CV
2 academic references, using the Postgraduate Educational Reference Form
Research project outline of 2 pages (list of references excluded). The outline may provide details about:
1. Background and motivation of the project. The motivation, which explains the importance of the project, should also be supported by relevant literature. You can also discuss the applications you expect for the project results.
2. Research questions or objectives.
3. Methodology: types of data to be used, approach to data collection, and data analysis methods.
4. List of references.
The outline must be created solely by the applicant. Supervisors can only offer general discussions about the project idea without providing any additional support.
Statement no longer than 1 page describing your motivations and fit with the project.
Evidence of proficiency in English (if appropriate)

To be considered, the application must use the advertised title as the project title.

For informal enquiries about this PhD project, please contact z.tan@napier.ac.uk

PhD Start Date: October 2026

Funding Notes

International applicants should note that visa application costs and the NHS health surcharge are additional costs to be taken into consideration, and successful applicants will need to cover these expenses themselves.

Edinburgh