Research Associate (Data Engineering)
Job Description
We are seeking a skilled and motivated Research Associate to join our environmental informatics team. In this role, the candidate will build and maintain the data infrastructure that underpins our environmental monitoring and early-warning systems. The work will involve integrating diverse, high-volume data streams (including rainfall records, temperature sensor readings, radar imagery, and computer vision outputs) to deliver a unified, queryable, and secure data platform that drives research, operational decision-making, and stakeholder dashboards.
Key Responsibilities
- Environmental Data Platform
  - Design, build, and maintain a unified database to ingest and store diverse environmental data streams: rain gauge records, gridded temperature data, rainfall radar (e.g. OPERA, NEXRAD), satellite imagery, and computer vision model outputs.
  - Define and enforce common data schemas and ontologies across heterogeneous source formats (NetCDF, HDF5, GeoTIFF, CSV, JSON, REST/API feeds).
  - Implement scalable ingestion pipelines supporting real-time streaming and batch historical loads.
  - Ensure data traceability with robust metadata, provenance tracking, and versioning.
- Data Processing & Quality Assurance
  - Develop and maintain automated pipelines for data cleaning, outlier detection, and quality flagging.
  - Implement missing-data imputation methods appropriate to environmental time-series and spatial fields (e.g. interpolation, climatological fill, ML-based gap-filling).
  - Apply noise-removal algorithms (e.g. signal filtering, radar clutter suppression, spike detection) across sensor and remote-sensing data types.
  - Document processing logic and maintain reproducible workflow configurations.
- Visualisation & Dashboards
  - Design and develop interactive dashboards for operational and research users, displaying spatial maps, time-series plots, and aggregated statistics.
  - Integrate visualisation tools (e.g. Grafana, Superset, Plotly Dash, or custom web front-ends) with the data backend.
  - Collaborate with domain scientists to translate monitoring requirements into effective visual analytics.
  - Ensure dashboards remain performant and responsive under live data load.
- Data Security & Governance
  - Implement and maintain role-based access control (RBAC) for all data assets.
  - Enforce data encryption at rest and in transit; manage secrets and credentials securely.
  - Support compliance with relevant data governance policies and institutional data-sharing agreements.
  - Maintain audit trails and access logs; respond to security reviews and risk assessments.
- Infrastructure & Operations
  - Manage cloud or on-premise database services (e.g. PostgreSQL/PostGIS, TimescaleDB, InfluxDB, or equivalent); tune for time-series and geospatial query performance.
  - Maintain CI/CD pipelines for data pipeline code; apply version control best practices.
  - Monitor pipeline health, set up alerting for failures, and respond to incidents.
  - Contribute to infrastructure-as-code practices (Docker, Kubernetes, Terraform, or equivalent).
Job Requirements
- At least a Master’s degree in Computer Science, Data Engineering, Environmental Informatics, or a closely related field.
- 3–6 years of professional experience in data engineering, with demonstrable work on time-series or geospatial data.
- Proficiency in Python (pandas, NumPy, xarray, or similar) and SQL; experience with at least one workflow orchestration tool (Airflow, Prefect, Luigi, etc.).
- Hands-on experience with geospatial or scientific data formats: NetCDF, HDF5, GeoTIFF, GeoJSON, or similar.
- Working knowledge of relational and time-series databases, with practical experience in data modelling.
- Familiarity with cloud platforms (AWS, GCP, or Azure) and containerisation (Docker).
- Solid understanding of data security principles: encryption, RBAC, secrets management.
- Open to a fixed-term contract.

