Postdoctoral Associate: Computer Vision Department – Vision-Language Models

Applications Close: Jun 14, 2026

Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)

Residential Building (Biofuel Block) - Masdar City - SE45 05 - Abu Dhabi - United Arab Emirates

Job Details

Mohamed bin Zayed University of Artificial Intelligence: Academic Appointments: School of Computing: Computer Vision

Description

Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) is a research-focused university in Abu Dhabi, and the first university dedicated entirely to the advancement of science through AI. The university empowers the next generation of AI leaders, driving innovation and impactful applications of AI through world-class education and interdisciplinary research.

About the role:

MBZUAI is recruiting a Research Scientist / Postdoctoral Fellow to advance research in vision-language models, foundation models, and generative systems. The position focuses on creating intelligent multimodal architectures that jointly understand, describe, and act upon the world, forming the basis of next-generation interactive AI agents.

Key Responsibilities:

Model Development and Training:

Design and train large-scale vision-language and vision-language-action models.
Develop new architectures for generative reasoning and multimodal alignment.
Establish robust training pipelines and evaluation benchmarks for vision-language interaction.
Lead experiments on multimodal data curation, pretraining, and model interpretability.

Research & Collaboration:

Conduct high-quality research with outcomes suitable for top-tier conferences (NeurIPS, CVPR, ICLR, ACL).
Collaborate with interdisciplinary teams spanning vision, language, and robotics.
Mentor junior researchers and contribute to institutional knowledge-sharing.

Innovation & Deployment:

Prototype systems that demonstrate generative and embodied intelligence.
Support deployment of efficient inference pipelines for interactive applications.
Translate research into usable tools for downstream multimodal tasks.

Qualifications

PhD in Machine Learning, Computer Vision, NLP, or a related discipline.

Minimum:

Experience with large-scale model training using PyTorch or TensorFlow.
Solid understanding of transformer architectures and multimodal learning.
Expertise in foundation model architectures (LLMs, diffusion models, multimodal transformers).
Publications or research contributions in generative or vision-language domains.

Preferred:

Experience with vision-language-action models, agents, or embodied AI.
Familiarity with diffusion models and large-scale multimodal datasets.
Strong programming and software engineering background.

What we offer:

A stimulating research environment at MBZUAI with leading experts in AI.
Competitive compensation and benefits package aligned with top-tier academic markets.
Significant opportunities for professional development and travel to major conferences.
A one-year appointment with the possibility of extension up to three years based on performance and project needs.

Application Instructions

Interested candidates should submit the following documents:

Cover letter
Current Curriculum Vitae (C.V.)
A link to your Google Scholar profile

Applications will be reviewed on a rolling basis, and the position will remain open until filled.

For more information or to apply, please visit MBZUAI Careers or contact Dr. Salman Khan.