About the Project

Project Description

Human evaluation remains a central component of assessing modern natural language processing (NLP) systems, particularly for tasks where correctness, usefulness, or appropriateness cannot be fully captured by automatic metrics. However, existing human evaluation practices are often costly, inconsistent, difficult to reproduce, and poorly adapted to the scale and diversity of contemporary NLP systems. This PhD project investigates how human evaluation can be redesigned as a scalable, reliable, and methodologically rigorous component of NLP research.

The project treats human evaluation itself as a research problem, rather than as a supporting step. It focuses on how evaluation protocols are designed, implemented, and interpreted, and how they interact with automatic metrics and model-based evaluation approaches.

Aims and Methods

The student will design, implement, and empirically test human evaluation frameworks across a range of NLP tasks, for example, translation quality assessment, summarisation evaluation, educational feedback generation, or other text production and transformation tasks. These tasks are indicative: the project is not fixed to any single application domain, so the candidate can shape and drive the research in line with their interests.

The research will compare different evaluation settings, including expert-based, crowd-based, and hybrid approaches. It will examine the impact of task design choices such as structured rubrics, pairwise or listwise comparisons, adaptive task assignment, and annotator training and calibration. The project will also study aggregation methods, agreement measures, and quality control strategies, with the goal of improving the reliability and interpretability of human judgments.

In addition, the project may explore how human evaluation can be combined with automatic metrics or model-based judges in principled ways, and how the strengths and limitations of each approach can be characterised through meta-evaluation.

Potential application domains include, but are not limited to, education, translation, summarisation, public-sector communication, and social care, and may involve high-resource or low-resource languages.

Deliverables (indicative)

General-purpose human evaluation frameworks applicable across NLP tasks
Empirical studies on evaluation reliability, cost, and scalability trade-offs
Open-source tools for running and analysing large-scale human evaluation
Public datasets with carefully designed and documented human annotations

Keywords

Human evaluation, NLP methodology, annotation design, reliability, scalability, human-AI alignment, human-centred AI methods

How to Apply

This project accepts applications all year round from self-funded candidates.

Mode of Study: Full-time or part-time

Please submit your application via the Computer Science and Informatics study pages on the Cardiff University website.

In the funding field of your application, indicate “I am applying for a self-funded PhD in Computer Science and Informatics”, and specify the project title and supervisors of this project in the text box provided.

Academic criteria: a 2:1 Honours undergraduate degree or a master's degree in computing or a related subject. Applicants with appropriate professional experience are also considered. Degree-level mathematics (or equivalent) is required for research in some project areas.

Applicants must demonstrate English language proficiency. Students who do not have English as a first language must prove this by obtaining an IELTS score of at least 6.5 overall, with a minimum of 6.0 in each skills component. A full list of accepted qualifications is available here: https://www.cardiff.ac.uk/study/international/english-language-requirements/postgraduate

If you are interested, please contact Dr Fernando Alva Manchego (alvamanchegof@cardiff.ac.uk) in the first instance, sending your CV. The application process requires you to develop an individual research proposal jointly with the supervision team that builds on the information provided in this advert.

Once you have developed the proposal with support from the supervisors, please submit your application following the instructions provided below.


To be considered, candidates must submit the following:

In the ‘Research Proposal’ section of the application, enter the name of the project you are applying to and upload your individual research proposal. Your research proposal should not exceed 2000 words, including references and bibliography.
A personal statement (as part of the university application form, or as a separate attachment, if you prefer).
A CV. Guidance on CVs for a PhD position can be found on the FindAPhD website.
Qualification certificates and transcripts (originals, plus English translations if applicable).
Two academic references. Please note that you need to provide the reference documents as part of your application.
Proof of English language (if applicable).

Interview: if the application meets all of the entrance requirements listed above, you will be invited to an interview.

Funding Notes

This project is offered for self-funded students only, or those with their own sponsorship or scholarship award. Where applicable, candidates will be required to cover the cost of a student visa, the healthcare surcharge, and any other costs of moving to the UK to study. These costs will not be covered by the School of Computer Science and Informatics.

Towards Scalable and Reliable Human Evaluation Frameworks for NLP Systems


Cardiff, United Kingdom