
Unified Theory of Efficient Sequential Architectures: Structured Representations, Approximation Bounds, and Scaling Laws


Bristol, United Kingdom

About the Project

Standard transformer attention mechanisms compute all pairwise interactions between sequence elements, resulting in quadratic computational complexity that becomes prohibitive for long sequences. While numerous efficient alternatives have emerged, including sparse attention, state-space models, and long convolutions, most lack theoretical foundations. This project develops a unified mathematical framework for efficient attention mechanisms that operate in transformed representation spaces while maintaining formal approximation guarantees and characterizing fundamental limits.
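The complexity gap above can be made concrete with a minimal numpy sketch (an illustration, not the project's method): standard attention materializes an n×n score matrix, while a kernelized "linear attention" variant reorders the computation so the cost grows linearly in sequence length. The feature map `phi` here is an arbitrary positive map chosen for illustration.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: builds the full n x n score matrix -> O(n^2 d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Kernelized attention: phi(Q) @ (phi(K)^T V) -> O(n d^2), linear in n.
    KV = phi(K).T @ V                    # (d, d) summary, independent of n
    Z = phi(Q) @ phi(K).sum(axis=0)      # (n,) normalizer
    return (phi(Q) @ KV) / Z[:, None]

rng = np.random.default_rng(0)
n, d = 256, 16
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out_quad = softmax_attention(Q, K, V)   # exact, quadratic cost
out_lin = linear_attention(Q, K, V)     # approximate, linear cost
```

The two outputs differ (the kernel map only approximates softmax), which is exactly the approximation error the project's bounds would quantify.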

The research unifies efficient architectures by analyzing attention computation in mathematically motivated spaces. Sequential phenomena naturally exhibit complementary structure: spectral representations capture periodicity and multi-scale temporal patterns, while path signatures from rough path theory encode geometric trajectory properties including shape and reparametrization invariance. The framework encompasses polynomial projections underlying state-space models and learned representations discovered through end-to-end training. The core contribution derives both approximation guarantees and fundamental lower bounds, characterizing not only what efficient methods can represent but also, via information-theoretic impossibility results, what compressed representations cannot.
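The two representation spaces named above can both be computed in a few lines. The following numpy sketch (illustrative only, with hypothetical function names) extracts low-frequency Fourier coefficients as a spectral representation, and the depth-2 path signature of a piecewise-linear trajectory, whose values are unchanged under reparametrization of the path.

```python
import numpy as np

def spectral_features(x, k):
    # Keep the k lowest-frequency Fourier coefficients: a compressed
    # spectral representation capturing periodic / multi-scale structure.
    return np.fft.rfft(x)[:k]

def signature_level2(path):
    # Depth-2 path signature of a piecewise-linear d-dim trajectory.
    # Level 1: total increment. Level 2: iterated integrals, whose
    # antisymmetric part is the Levy area. Both depend only on the
    # path's geometry, not on how it is parametrized in time.
    inc = np.diff(path, axis=0)             # stepwise increments, (n-1, d)
    level1 = inc.sum(axis=0)                # (d,)
    prev = np.cumsum(inc, axis=0) - inc     # increment sum before each step
    level2 = prev.T @ inc + 0.5 * (inc.T @ inc)   # (d, d)
    return level1, level2

rng = np.random.default_rng(0)
path = rng.standard_normal((100, 3)).cumsum(axis=0)   # random 3-d trajectory
l1, l2 = signature_level2(path)
# Reparametrizing (here: duplicating every sample) leaves the signature unchanged.
l1_re, l2_re = signature_level2(np.repeat(path, 2, axis=0))
```

The duplicated-sample check makes the invariance concrete: inserted zero-increment steps contribute nothing to either signature level.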

A key deliverable establishes scaling laws linking approximation error, sample complexity, and computational requirements to context length, model size, and effective memory dimension (the information capacity preserved in compressed representations). These laws provide quantitative guidance for architecture design analogous to neural scaling laws, formally characterizing when different efficiency approaches succeed or fail. The research explores hybrid architectures combining multiple representation spaces that provably preserve universality while achieving near-linear scaling.
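As a toy illustration of what such a scaling law might look like (an assumed power-law ansatz, not a result from the project), the sketch below fits error ≈ a·m^(−b) in the effective memory dimension m by least squares in log-log space, on synthetic data constructed to follow the law exactly.

```python
import numpy as np

def fit_power_law(m, err):
    # Fit err ≈ a * m**(-b) by linear regression on log err = log a - b log m.
    A = np.vstack([np.ones(len(m)), np.log(m)]).T
    coef, *_ = np.linalg.lstsq(A, np.log(err), rcond=None)
    log_a, slope = coef
    return np.exp(log_a), -slope

m = np.array([16.0, 32, 64, 128, 256, 512])   # effective memory dimensions
err = 2.0 * m ** -0.5                         # synthetic errors, exact power law
a_hat, b_hat = fit_power_law(m, err)          # recovers a ≈ 2.0, b ≈ 0.5
```

In practice the measured errors would come from benchmarking compressed architectures at varying m, and the fitted exponent b would quantify how quickly extra memory capacity pays off.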

Validation will demonstrate competitive performance on long-context benchmarks across text, audio, and sensor data, emphasizing retrieval, reasoning over dispersed evidence, and robustness to irrelevant context. These evaluation criteria directly test whether compressed representations preserve task-specific information structure as predicted by theory. Applications to quantitative finance provide demanding test cases: financial time series exhibit the complementary structures (multi-scale patterns, geometric properties) that different representations target, while production trading systems require the provable guarantees this framework aims to deliver.

The project targets NeurIPS, ICML, ICLR, and AISTATS, with domain applications suitable for quantitative finance journals and conferences.
