Shrinking LLMs using Principled Training Approaches
About the Project
Large Language Models (LLMs) are machine learning models trained for general-purpose tasks that can be modelled using text, images, or both. These models demonstrate emergent abilities that result mainly from unsupervised pre-training on vast amounts of data. However, with parameter counts in the billions and datasets often exceeding a trillion tokens, they are expensive to train and run, making them inaccessible to small groups with limited resources. Some works suggest that state-of-the-art LLMs may be significantly larger than they need to be, owing to inefficiencies in the training process. Microsoft's TinyStories work showed that even tiny models can produce coherent short stories when trained on text with a limited vocabulary [1]. The Phi-1.5 papers went further, showing that, given a high-quality dataset, models with 1-2 billion parameters can sometimes match models 2-3 times their size [2,4].
This Ph.D. project aims to investigate whether a small model (fewer than 100M parameters) can be trained to retain a subset of the emergent abilities (e.g., language understanding, common sense, and reasoning skills) exhibited by modern LLMs such as GPT-4o. This will involve careful modifications to the datasets and the training trajectory [3]. Furthermore, the project seeks to establish reliable principles for training such models. It will also focus on developing smaller models tailored to domain-specific tasks, for example, secure code generation.
Academic qualifications
Have, or expect to achieve by the start of the studentship, a first-class honours degree or a distinction at master's level, ideally in Computer Science, Data Science, or Artificial Intelligence, with a good fundamental knowledge of Natural Language Processing (NLP), deep learning, and Large Language Models (LLMs).
English language requirement
IELTS score must be at least 6.5 (with no less than 6.0 in each of the four components). Other equivalent qualifications will be accepted. Full details of the University's policy are available online.
Essential attributes:
- Fundamental knowledge of NLP and deep learning.
- Competent in shell scripting, Python, and PyTorch.
- Knowledge of Seq2Seq models, Transformers, LLMs, and machine learning.
- Good written and oral communication skills.
- Strong motivation, with evidence of independent research skills relevant to the project.
- Good time management.
Desirable attributes:
- Experience with NLP models and tools, such as BERT, GPT, NanoGPT, and Phi-1.5.
APPLICATION CHECKLIST
- Completed application form
- CV
- 2 academic references, using the Postgraduate Educational Reference Form
- Research project outline of 2 pages (list of references excluded). The outline may provide details about:
- Background and motivation of the project. The motivation, explaining the importance of the project, should also be supported by relevant literature. You may also discuss the applications you expect for the project results.
- Research questions or objectives.
- Methodology: types of data to be used, approach to data collection, and data analysis methods.
- List of references.
- The outline must be created solely by the applicant. Supervisors may only offer general discussion of the project idea and cannot provide any further support.
- Statement no longer than 1 page describing your motivations and fit with the project.
- Evidence of proficiency in English (if applicable)
To be considered, the application must use the advertised title as the project title.
For informal enquiries about this PhD project, please contact Dr Md Zia Ullah - M.Ullah@napier.ac.uk