Safeguarding Language Models: Engineering Protective Responses to Mental Health Crises
About the Project
Project Description
Large Language Models (LLMs) are increasingly embedded in contemporary digital environments, yet their handling of crisis-related interactions remains inconsistent and carries substantial safety risks. When individuals experiencing suicidal thoughts, self-harm impulses, or acute psychological distress engage with AI systems seeking assistance, models may deliver unsuitable guidance, miss critical severity indicators, or unintentionally reference dangerous approaches. This vulnerability is particularly pronounced among at-risk groups who may depend on AI platforms as their principal—or sole—source of support during acute crises.
This project seeks to construct reliable safeguards enabling LLMs to furnish appropriate, supportive responses when processing material related to suicide and other mental-health crises, while retaining performance across typical conversational contexts. The research programme will centre on creating and validating an integrated suite of safety interventions to mitigate harmful model outputs during mental health emergencies. Central to this work will be exploring representation-level steering methods and allied techniques from the representation engineering literature that can modulate model representations to suppress unsafe outputs while preserving supportive, compassionate communication capabilities. Additional approaches will examine targeted fine-tuning, prompt-engineering safeguards, and response-level filtering mechanisms. This layered architecture acknowledges that individual techniques have inherent limitations in high-consequence domains and that effective protection demands multiple overlapping defences.
The empirical approach will draw on heterogeneous data including user-generated content conveying suicidal expressions, professional crisis support dialogues, and structured adversarial testing across multiple crisis presentations. Engagement with domain specialists in Social Care, Medicine, and Social Sciences will ground safety measures in established clinical frameworks for suicide prevention and emergency response. This transdisciplinary partnership is vital for establishing assessment criteria that balance technical efficacy with clinical alignment.
Comprehensive evaluation is a cornerstone of the project, encompassing diverse test scenarios representing varied crisis manifestations, geographic and linguistic contexts, and demographic groups, with emphasis on populations historically underrepresented in AI research. Assessment will verify not merely the suppression of harmful outputs, but the generation of genuinely beneficial responses: appropriate crisis recognition, authentic emotional responsiveness, and directing individuals toward professional assistance. Investigation will include resilience against adversarial manipulation and identification of edge cases that trigger false protective alerts.
The project aims to produce practical outputs: safety layers for integration into existing LLMs, evaluation frameworks for assessing crisis response capabilities, and guidelines for responsible AI development in high-stakes contexts. Throughout, we will address ethical considerations including privacy, the balance between safety and autonomy, and the appropriate scope of AI involvement, ensuring technical solutions complement rather than replace human support.
How to Apply
This project accepts applications from self-funded candidates all year round.
Mode of Study: Full-time or part-time
Please submit your application via Computer Science and Informatics - Study - Cardiff University
In the funding field of your application, indicate “I am applying for a self-funded PhD in Computer Science and Informatics”, and specify the project title and supervisors of this project in the text box provided.
Academic criteria: A 2:1 Honours undergraduate degree or a master's degree, in computing or a related subject. Applicants with appropriate professional experience are also considered. Degree-level mathematics (or equivalent) is required for research in some project areas.
Applicants must demonstrate English language proficiency. Applicants whose first language is not English must demonstrate proficiency by obtaining an IELTS score of at least 6.5 overall, with a minimum of 6.0 in each skills component. A full list of accepted qualifications is available here: https://www.cardiff.ac.uk/study/international/english-language-requirements/postgraduate
If you are interested, please first contact Dr Carla Perez Almendros (perezalmendrosc@cardiff.ac.uk) and send your CV. The application process requires you to develop an individual research proposal jointly with the supervision team, building on the information provided in this advert.
Once you have developed the proposal with support from the supervisors, please submit your application following the instructions provided below.
To be considered, candidates must submit the following information:
- In the ‘Research Proposal’ section of the application, enter the name of the project you are applying for and upload your individual research proposal. Your research proposal should not exceed 2000 words, including references and bibliography.
- A personal statement (as part of the university application form, or as a separate attachment, if you prefer).
- A CV. Guidance on CVs for a PhD position can be found on the FindAPhD website.
- Qualification certificates and transcripts (original and English translation, if applicable).
- Two references, which should be academic. Please note that you need to provide the reference documents as part of your application.
- Proof of English language (if applicable).
Interview: If your application meets all of the entrance requirements listed above, you will be invited to an interview.
Funding Notes
This project is offered for self-funded students only, or those with their own sponsorship or scholarship award. Where applicable, candidates will be required to cover the cost of a student visa, the healthcare surcharge, and any other costs of moving to the UK to study. These costs will not be covered by the School of Computer Science and Informatics.