About the Project

Developing AI-based methodologies and architectures that improve prediction accuracy in finance, the environment and healthcare, enhance data reliability, and support evidence-based policy has become a priority for many countries, including the UK.

This project aims to provide novel time series machine learning (TSML) methodology for imputing missing values in high-dimensional data and for forecasting such data. The research is particularly innovative for the discrete-valued case because, to the best of our knowledge, such research is lacking in the literature. We will work on the following two aspects of modelling high-dimensional time series.

(i) Imputation of missing data in high-dimensional time series

High-dimensional data are common in fields such as finance, healthcare, and environmental science. They are typically recorded over time, forming high-dimensional time series such as air pollution measurements at many locations. Such data inevitably contain missing values, which hinder the application of many analytical and statistical methods, so handling these gaps effectively is essential before model development. In ultra-high-dimensional settings, filling in missing entries is a significant challenge for both machine learning and statistical approaches. Existing techniques (e.g. Obata et al. (2024)) are suited only to the low-dimensional, continuous-valued case, while methods for high-dimensional and/or discrete-valued data have recently come into demand. In this project, the relationships between components (the so-called network structure) and temporal dependence will be used jointly to obtain accurate imputations. We will introduce a novel framework that models evolving inter-correlations through a Markov regime-switching network with a large number of nodes, temporal dynamics through a state-space formulation (e.g. Fan et al. (2020)), and dimension reduction through factor models. We will also develop self-exciting spatio-temporal models for imputation, under the assumption that the imputed data follow a nested family of continuous and discrete distributions rather than only normal distributions.
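To make the role of a state-space formulation in imputation concrete, the sketch below shows the simplest possible case: a univariate local-level (random walk plus noise) model run through a Kalman filter, where a missing observation skips the update step and the one-step-ahead prediction serves as the imputed value. This is a standard textbook device assumed here purely for illustration; it includes none of the network structure, regime switching, factor reduction or discrete-valued distributions the project targets, and the function name `kalman_impute` and the noise variances `q` and `r` are hypothetical choices.

```python
from typing import List, Optional, Sequence


def kalman_impute(
    y: Sequence[Optional[float]],
    q: float = 0.1,   # assumed variance of the random-walk state noise
    r: float = 1.0,   # assumed variance of the observation noise
    x0: float = 0.0,  # initial state mean
    p0: float = 1e6,  # large initial variance, i.e. a vague prior
) -> List[float]:
    """Fill None entries of a univariate series with a local-level Kalman filter.

    State:       x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
    Observation: y_t = x_t + v_t,      v_t ~ N(0, r)
    """
    x, p = x0, p0
    filled: List[float] = []
    for obs in y:
        # Predict step: a random walk keeps the mean and inflates the variance.
        p = p + q
        if obs is None:
            # Missing value: skip the update and impute the one-step-ahead prediction.
            filled.append(x)
            continue
        # Update step: blend prediction and observation using the Kalman gain.
        k = p / (p + r)
        x = x + k * (obs - x)
        p = (1.0 - k) * p
        filled.append(obs)  # observed values are returned unchanged
    return filled


series = [1.0, 1.2, None, 1.1, None, None, 0.9]
print(kalman_impute(series))
```

In practice `q` and `r` would be estimated from the data (for example by maximum likelihood), and a smoother that also uses later observations would generally impute better than the filter alone.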

(ii) Machine learning architectures for accurate and robust forecasting of high-dimensional time series

For the imputed data, we will develop new machine learning and statistical models for forecasting high-dimensional time series. We will develop deep learning models based on temporal convolutional networks and transformers, improve existing methods (e.g. Fan et al. (2020), Obata et al. (2024)), and extend them to high-dimensional and discrete-valued cases using transformer-based architectures, factor models (see Liu et al. (2025), Pan and Yao (2008)) and recent advances in hybrid probabilistic and statistical approaches. We also propose dynamic uncertainty quantification that combines Bayesian inference with quantile regression to improve robustness and enable probabilistic forecasting (see Dvijotham et al. (2023)).
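The quantile-regression ingredient can be illustrated by the loss it is usually trained on. The sketch below assumes the standard pinball (check) loss for scoring predicted τ-quantiles, together with a simple empirical coverage check; the function names `pinball_loss` and `empirical_coverage` are hypothetical, and the sketch does not show how the Bayesian part of the proposed uncertainty quantification would be combined with it.

```python
from typing import Sequence


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], tau: float) -> float:
    """Mean quantile (pinball) loss of predicted tau-quantiles y_pred."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (diff > 0) costs tau * diff;
        # over-prediction (diff < 0) costs (1 - tau) * (-diff).
        total += max(tau * diff, (tau - 1.0) * diff)
    return total / len(y_true)


def empirical_coverage(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of observations at or below the predicted quantile (ideally near tau)."""
    hits = sum(1 for y, q in zip(y_true, y_pred) if y <= q)
    return hits / len(y_true)


# A 90% quantile forecast is penalised more for under-predicting than over-predicting.
print(pinball_loss([10.0], [8.0], tau=0.9))   # under-predicts by 2
print(pinball_loss([10.0], [12.0], tau=0.9))  # over-predicts by 2
```

Minimising this loss is what makes a model's outputs estimates of the τ-quantile; predicting several levels of τ at once yields a probabilistic forecast.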

The objectives are: (i) to develop machine learning architectures for high-dimensional time series modelling that improve forecasting accuracy and robustness; and (ii) to detect anomalies and impute missing data in high-dimensional time series with minimal error.

We will validate the proposed models through real-world case studies in forecasting, anomaly detection, and decision support. Application areas include financial forecasting and risk modelling across interconnected markets; public health monitoring across multiple regions and large healthcare systems; and trend analysis of air and water pollution across districts. Relevant datasets will be drawn from public sources, industrial partners, and healthcare collaborators to ensure sufficient high-dimensional data for model evaluation. We will demonstrate that the proposed techniques enable accurate imputation and forecasting in each of these applications.

The following outcomes are expected: (i) publications in top-tier journals and conferences; (ii) open-source time series AI models for imputation and forecasting; (iii) deployment-ready prototypes for selected applications.

Novel Time Series Machine Learning Methodology for High-Dimensional Data

Glasgow, United Kingdom