PhD in Data Infrastructure and AI Workflows for Self-Driving Laboratories

Job Summary

Accelerating the discovery of clean energy materials requires autonomous experimentation environments known as Self-Driving Laboratories (SDLs). These labs depend on robust data infrastructure that supports automation, reproducibility, and integration with machine learning tools. At DIFFER, we are developing such infrastructure to support a remote physical SDL dedicated to AI-driven experimentation in energy materials.

The goal of this project is to design and implement a machine-learning-ready data infrastructure that manages and structures experimental data from synthesis, characterization, and analytical instruments. This involves developing standardized data schemas, processing pipelines, metadata management, and interfaces for machine learning tools. The project also includes exploring basic machine learning tasks to evaluate data quality and to provide insights for future applications in energy materials research.

Responsibilities

Design and implement data pipelines to transform experimental outputs into structured, machine-learning-ready formats using standardized schemas and metadata models.
Make structured data readily usable by machine learning tools through suitable access interfaces and data formats.
Explore basic machine learning tasks such as trend detection, clustering, or dimensionality reduction to assess data quality and infrastructure readiness.
Collaborate with researchers in the SDL consortium to align infrastructure design with experimental workflows and project objectives.
Supervise BSc/MSc student projects when appropriate.
Contribute to scientific dissemination, including research publications, presentations at conferences, and stakeholder meetings.
Complete a PhD thesis based on the research within four years.

Qualifications and Requirements

A Master’s degree in computer science, data science, artificial intelligence, or a related field with a strong focus on data engineering or applied machine learning.
Experience with data structuring, transformation, and pipeline development, including database design (SQL/NoSQL), data preprocessing, and data integration.
Proficiency in Python programming and familiarity with relevant libraries for data handling and processing (e.g., pandas, NumPy, h5py, xarray).
Knowledge of graph databases and their application for managing and querying interconnected datasets.
Experience in applying machine learning techniques (e.g., clustering, dimensionality reduction) for insight generation and exploratory data analysis.
Familiarity with the FAIR (Findable, Accessible, Interoperable, Reusable) data principles, or experience handling simulation or experimental data in a reproducible and structured way.
Good communication skills and the ability to work effectively in a multidisciplinary and collaborative environment.
Proficiency in written and spoken English.

What the Employer Offers

This is a 1 FTE position for a period of 4 years, graded in the PhD pay scale (gross €2,968 per month in year 1, rising to gross €3,801 per month in year 4). The position is based at DIFFER, with the working location at TU Eindhoven. Employees have employee status at NWO, with benefits that support work-life balance. DIFFER promotes a diverse workforce.

Dutch Institute for Fundamental Energy Research (DIFFER)

De Zaale 20, 5612 AJ Eindhoven, Netherlands

