Convolutional Neural Networks (CNNs), foundational large language models (LLMs), and generative pretrained transformers (GPTs) for phylogeny, biodiversity and functional genomics
About the Project
Artificial intelligence will transform how biodiversity is identified, measured, conserved and exploited, and how genomics, transcriptomics and genome assemblies are understood and applied. We have a range of opportunities, in several directions, to co-develop projects using CNNs and transformers to build foundational, multi-modal language models for plant genomics, phylogenetics and functional trait analysis. Each project will involve collaborators from major institutions relevant to its direction.
We are keen to develop projects in directions of interest to you as a prospective PhD student, and will work with you and others, including funding agencies, to develop the project. You will be expected to take ownership of the project, shaping the research direction and developing your own ideas during the course of your PhD. While the work will focus on developing computational approaches, there will also be opportunities for some plant collection and genomics laboratory work.
One area would involve applying CNNs to images from the digitized herbarium to extract image features carrying the maximum information about taxonomy, phylogeny, phenotype, genotype, biogeography and biodiversity for all plant species. A digitized herbarium typically holds more than a million specimens and provides a multimodal, structured dataset with images, text labels, geographic metadata, dates, collectors and known taxonomies, so the resulting model would be trained on curated scientific knowledge and datasets. The results will be used to develop phylogenies and to identify species in the digital herbarium, giving information about traits of conservation and functional importance.
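As an illustration of what one record in such a multimodal dataset might look like, here is a minimal Python sketch. The class name `SpecimenRecord`, its field names, the helper `species_labels` and the example values are all assumptions made for illustration; they do not describe any existing herbarium database schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpecimenRecord:
    """One digitized herbarium sheet (hypothetical schema, for illustration only)."""
    image_path: str                      # path to the scanned sheet image
    label_text: str                      # transcribed text from the specimen label
    taxonomy: dict                       # rank -> name, e.g. {"genus": ..., "species": ...}
    latitude: Optional[float] = None     # geographic metadata, if recorded
    longitude: Optional[float] = None
    collection_date: Optional[str] = None            # ISO date string, if known
    collectors: list = field(default_factory=list)   # collector names

    def has_location(self) -> bool:
        """True when both coordinates are present."""
        return self.latitude is not None and self.longitude is not None


def species_labels(records):
    """Sorted unique species names, usable as class labels for an image classifier."""
    return sorted({r.taxonomy["species"] for r in records if "species" in r.taxonomy})


# Placeholder records with made-up values, not real specimen data.
records = [
    SpecimenRecord("sheet_0001.jpg", "placeholder label",
                   {"genus": "Musa", "species": "Musa acuminata"},
                   latitude=1.5, longitude=103.7),
    SpecimenRecord("sheet_0002.jpg", "placeholder label",
                   {"genus": "Musa", "species": "Musa balbisiana"}),
]
print(species_labels(records))
print([r.has_location() for r in records])
```

Keeping the image, label text and metadata in one record would make it straightforward to train on images alone at first and to add the other modalities later.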
Another area would focus on genomics, using RNNs, transformers and tokenization approaches to develop foundational large language models and generative pretrained transformers (GPTs) for genomics that identify genes and regulators. As well as improving assembly approaches using long molecule sequencing, we will aim to characterize the complex aspects of genomes, such as structural variants, interactions of genomes in polyploids, epigenetic and methylation modulation, and the evolution of rapidly evolving genome components. We would aim to develop AI approaches for finding ‘missing heritability’ in crop plant breeding, and for the analysis of multi-genome systems such as the soil microbiome, forage plants, digestive system microbiomes and animal genetics.
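To give a flavour of what a tokenization approach for DNA can look like, here is a minimal Python sketch of overlapping k-mer tokenization, one common way of turning a sequence into tokens for a language model. It is an illustrative assumption, not the method the project will necessarily use; the function names, the choice of k = 3 and the `<unk>` token are hypothetical.

```python
from itertools import product


def build_kmer_vocab(k=3, alphabet="ACGT"):
    """Map every possible k-mer over the alphabet to an integer id.

    Id 0 is reserved for unknown tokens (e.g. k-mers containing N).
    """
    vocab = {"<unk>": 0}
    for i, letters in enumerate(product(alphabet, repeat=k), start=1):
        vocab["".join(letters)] = i
    return vocab


def kmer_tokenize(sequence, k=3, stride=1):
    """Split a DNA sequence into k-mers, moving `stride` bases at a time."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def encode(sequence, vocab, k=3, stride=1):
    """Convert a sequence to integer ids suitable as model input."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in kmer_tokenize(sequence, k, stride)]


vocab = build_kmer_vocab(k=3)
print(kmer_tokenize("ACGTAC", k=3))   # ['ACG', 'CGT', 'GTA', 'TAC']
print(encode("ACGN", vocab, k=3))     # [7, 0]  (the k-mer 'CGN' is unknown)
```

With a four-letter alphabet the vocabulary has 4^k entries plus the unknown token, so larger k gives longer-range tokens at the cost of a rapidly growing vocabulary; learned subword schemes are an alternative that a project could explore.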
Training will be focussed on your interests, aptitude and career development objectives, whether in academia, National Agricultural Research Organizations, industry, or government and policy. In particular, we will build on your current expertise and experience (expected to include substantial knowledge of Linux and computational systems), broadening it to cover the full range of work in a modern bioinformatics and genomics laboratory, including aspects of laboratory or project management and intellectual property, while maintaining our commitment to open science. The Agricultural Genomics group is very international, with collaborations throughout the world giving a global perspective to our work.
Outputs will include publications in high-profile international scientific journals, presentations at international conferences, discussions with end-users, and outreach through media opportunities to communicate the research and its importance to a wider audience. The work can underpin the extraction of functionally applicable data from herbarium specimens. Other areas will lead to the development of strategies to conserve and exploit biodiversity, and will aim to identify key traits and their underlying genetics to improve sustainability and reduce the environmental impact of agriculture, helping to ensure food security.