What is OpenBind?

OpenBind is a UK-led open-science consortium hosted at Diamond Light Source. It generates large-scale protein-ligand structure-affinity datasets for training AI models in drug discovery.

What does the first OpenBind dataset contain?

It features 494 compounds with paired structures and affinities for EV-A71 2A protease, including 925 binding events from diverse fragment libraries. It is available on Zenodo.

How was the OpenBind v1 model trained?

It was trained on the inaugural dataset using the Isambard-AI supercomputer, with a focus on binding prediction and the aim of outperforming baselines in pose and affinity tasks.

What role does Diamond Light Source play?

Diamond provides high-throughput X-ray crystallography via beamlines like I03 and I04-1, enabling rapid structure determination essential for the pipeline.

Which universities are involved in OpenBind?

Key partners include the University of Oxford (AI and statistics), with input from Columbia University.

Why is structure-affinity data crucial for AI drug discovery?

It trains models to predict how drugs bind proteins, akin to PDB for AlphaFold, reducing trial-and-error and speeding therapeutics for diseases like malaria.

How can researchers access OpenBind resources?

The dataset is on Zenodo, benchmarks are on GitHub, visualization is available in Fragalysis, and the community gathers on Discord.

What future targets will OpenBind address?

OpenBind plans to expand to COVID-19, malaria, dengue, Zika, and cancer proteins, with blind challenges for model validation.

How does OpenBind impact higher education?

It provides datasets for teaching AI and pharma courses, for PhD projects, and for interdisciplinary training at universities like Oxford.

What career opportunities arise from OpenBind?

Roles span structural biology, AI modeling, and data science for drug discovery, including postdoctoral and faculty positions in Europe.

Is OpenBind data free to use?

Yes. The data is released under the CC0 1.0 license, promoting open science for global researchers and startups.

The release of OpenBind's inaugural public dataset and predictive AI model represents a pivotal advancement in AI-enabled drug discovery, spearheaded by the UK's Diamond Light Source facility. This milestone not only addresses a critical shortage of high-quality experimental data but also equips researchers across Europe and beyond with tools to revolutionize structure-based drug design. By providing atomic-level insights into protein-ligand interactions, OpenBind paves the way for faster, more accurate development of therapeutics targeting pressing global health challenges.

Diamond Light Source, the United Kingdom's national synchrotron science facility located at the Harwell Science and Innovation Campus in Oxfordshire, serves as the operational hub for this ambitious project. As a powerhouse for structural biology, it leverages cutting-edge X-ray crystallography beamlines to capture detailed molecular structures at unprecedented speeds. Funded initially with £8 million from the Department for Science, Innovation and Technology in 2025, OpenBind brings together structural biologists, AI specialists, and computational experts from leading institutions, including the University of Oxford's Department of Statistics and international collaborators like Columbia University.

Understanding the Data Drought in AI Drug Discovery

Traditional drug discovery has long relied on trial-and-error approaches, where chemists synthesize thousands of compounds hoping a few bind effectively to disease-causing proteins. This process is time-consuming and costly, often taking 10-15 years and billions of pounds per successful drug. Artificial intelligence promises to transform this by predicting binding affinities and poses from protein structures, much as AlphaFold transformed protein structure prediction by training on the Protein Data Bank (PDB).

However, a major bottleneck persists: the lack of paired structure-affinity data. While the PDB holds more than 200,000 experimentally determined structures, comprehensive binding measurements, which are essential for training robust AI models, are scarce. OpenBind tackles this head-on, aiming to generate over 500,000 protein-ligand complexes over five years, creating the largest open dataset tailored for machine learning in structure-based drug design.
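To put the five-year target in perspective, the arithmetic below works out the average output it implies. This is only a rough sketch: it assumes a uniform rate across the whole period, which a real ramp-up would not follow, and the function name `required_rate` is illustrative rather than part of any OpenBind tooling.

```python
# Back-of-the-envelope throughput implied by the stated target.
# Assumption: output is spread evenly over the five years.
TARGET_COMPLEXES = 500_000  # "over 500,000 protein-ligand complexes"
YEARS = 5                   # "over five years"


def required_rate(total: int, years: int) -> dict:
    """Return the average per-year, per-month, and per-week rates."""
    per_year = total / years
    return {
        "per_year": per_year,
        "per_month": per_year / 12,
        "per_week": per_year / 52,
    }


rates = required_rate(TARGET_COMPLEXES, YEARS)
for period, value in rates.items():
    print(f"{period}: {value:,.0f}")
```

Under that even-rate assumption the target works out to roughly 100,000 complexes a year, or a little under 2,000 a week.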

The Birth of OpenBind: From Vision to Operational Pipeline

Launched in June 2025, OpenBind emerged from recognition that synchrotron facilities like Diamond could produce structures at industrial scale if paired with automated chemistry and standardized protocols. The consortium's pipeline integrates microlitre-scale chemical synthesis, high-throughput fragment screening, and rapid affinity assays, all feeding into AI model refinement.

Key to its success is the two-way collaboration: partners contribute chemical libraries and targets, while Diamond delivers processed, FAIR-compliant (Findable, Accessible, Interoperable, Reusable) datasets. This open-science ethos ensures data flows freely, fostering community-driven improvements through blind prediction challenges.

Unpacking the First Dataset: EV-A71 2A Protease Focus

The debut dataset centers on the 2A protease from Enterovirus A71 (EV-A71), using the Coxsackievirus A16 (CVA16) 2A protease as a surrogate because the two active sites are nearly identical. A total of 601 compounds were screened from diverse fragment libraries such as DSi-Poised, SpotXplorer, and FragLites, yielding 925 crystallographic binding events.

After rigorous quality control, 494 compounds remain, paired with 732 structures (265 newly released via OpenBind, plus prior PDB deposits). Affinity data from grating-coupled interferometry (GCI) on Creoptix WAVE systems provides precise IC50 values, even for weak binders exceeding 90 µM. This dataset, deposited under CC0 license, is viewable via the Fragalysis platform for interactive analysis.

[Image: Diamond Light Source X-ray crystallography beamline used in OpenBind data generation]
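The figures in the dataset description can be made concrete with a few lines of arithmetic. The sketch below converts an IC50 in micromolar to pIC50 using the common convention pIC50 = -log10(IC50 in molar), and derives the QC retention rate and the number of previously deposited structures from the stated counts. Whether the OpenBind files report affinities as pIC50 is an assumption here, and both function names are illustrative.

```python
import math


def pic50_from_micromolar(ic50_um: float) -> float:
    """Convert an IC50 in micromolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_um * 1e-6)


def retention_fraction(kept: int, screened: int) -> float:
    """Fraction of screened compounds that survive quality control."""
    return kept / screened


# Counts and threshold taken from the article text.
weak_binder_pic50 = pic50_from_micromolar(90.0)  # weak-binder threshold
qc_retention = retention_fraction(494, 601)      # compounds kept after QC
prior_deposits = 732 - 265                       # structures already in the PDB

print(f"pIC50 at 90 uM: {weak_binder_pic50:.2f}")
print(f"QC retention:   {qc_retention:.1%}")
print(f"Prior deposits: {prior_deposits}")
```

On this scale a 90 µM binder sits at a pIC50 of about 4.05, which shows how weak the fragment-level affinities in the dataset can be.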

OpenBind v1: The Predictive AI Model in Action

Trained on this dataset using the UK's Isambard-AI supercomputer—one of Europe's most powerful AI clusters—OpenBind v1 predicts protein-ligand binding affinities and structures. Benchmarks available on GitHub allow researchers to evaluate its performance against baselines, demonstrating improvements in generalization across targets.

Early results highlight v1's edge in handling diverse chemical spaces, guiding hit-to-lead optimization. As more data accrues, iterative retraining will enhance accuracy, potentially slashing drug design timelines from years to months.

Behind the Scenes: The Automated Experimental Pipeline

Protein production starts with E. coli expression of His6-SUMO-tagged protease, purified via affinity and size-exclusion chromatography, then biotinylated for assays. Crystals are soaked with 50-100 mM fragments or 2-10 mM follow-ups—over 7,600 soaks in total.

Data collection at Diamond's I03 and I04-1 beamlines processes thousands weekly via automated pipelines and XChemExplorer. Hits are identified with PanDDA2, and models are refined in Coot, REFMAC, and BUSTER. Affinity protocols, optimized through design-of-experiments (DoE), use HEPES pH 7 buffers with detergents like DDM for weak binders, generating 2,000+ sensorgrams.

[Image: Atomic structure of protein-ligand complex from OpenBind dataset]

Protein stability screening: NanoDSF tests 224 conditions.
Buffer optimization: 24 conditions in Creoptix screens, 5 in long-injection tests.
QC: Manual sensorgram review ensures reproducibility.

Academic Collaborations Driving Innovation

European universities play a central role. The University of Oxford contributes AI expertise, with Dr. Fergus Imrie noting the dataset's role in accelerating discovery. Diamond's proximity to Oxford fosters seamless integration of experimental and computational biology.

This higher education involvement trains the next generation of researchers in interdisciplinary skills—crystallography, AI modeling, and data science—vital for Europe's competitiveness in biotech. Programs at Oxford and partner institutions now incorporate OpenBind data into curricula, preparing students for pharma careers.

Benchmarks and Community Validation

GitHub repositories host benchmarks for pose prediction and affinity regression, enabling global teams to test models blindly. Initial evaluations show OpenBind v1 outperforming generalist tools on enterovirus targets, validating its utility.
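For readers unfamiliar with the two task types, the sketch below shows metrics commonly used for them: heavy-atom RMSD for pose prediction, and RMSE plus Pearson correlation for affinity regression. It is a generic illustration, not the OpenBind benchmark code; the function names and toy inputs are assumptions, and the RMSD here does no alignment or symmetry correction.

```python
import math


def pose_rmsd(pred, ref):
    """RMSD between two equally ordered lists of (x, y, z) atom coordinates."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    sq = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(sq / len(pred))


def rmse(pred, true):
    """Root-mean-square error between predicted and measured affinities."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)


# Toy data (hypothetical): a pose shifted 0.5 angstrom along x,
# and three affinity predictions on a pIC50-like scale.
ref_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pred_pose = [(x + 0.5, y, z) for x, y, z in ref_pose]
true_aff = [4.0, 5.0, 6.0]
pred_aff = [4.5, 5.0, 5.5]

pose_error = pose_rmsd(pred_pose, ref_pose)
aff_rmse = rmse(pred_aff, true_aff)
aff_r = pearson_r(pred_aff, true_aff)
print(pose_error, aff_rmse, aff_r)
```

A pose is often counted as correct when its RMSD to the crystal structure is below 2 Å, though the threshold a given benchmark uses should be checked against its own documentation.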

Community Discord fosters collaboration, with planned challenges mirroring CASP for proteins, ensuring models evolve through rigorous, unbiased testing.

Transformative Impacts on Drug Discovery and Research

By closing the structure-affinity data gap, OpenBind could save £100 billion in UK drug development costs alone. For neglected diseases like dengue and malaria, AI predictions prioritize promising leads, democratizing access for under-resourced labs.

In Europe, it bolsters the pharma ecosystem, from SMEs to giants like AstraZeneca, enhancing ROI on synchrotron investments. Academic researchers gain free tools to prototype inhibitors, accelerating publications and grants.

Explore the dataset on Zenodo or visualize via Fragalysis.

Future Horizons: Scaling Up for Global Challenges

Next phases target broader panels—COVID proteases, malaria kinases—scaling to thousands of structures monthly. Integration with EU initiatives like Euro-BioImaging amplifies impact across the continent.

Blind challenges will benchmark progress, while ethical AI guidelines ensure equitable benefits. For higher education, this heralds a new era of data-driven curricula, with PhD projects leveraging OpenBind for real-world impact.

Career Opportunities in AI Drug Discovery

This breakthrough opens doors in computational biology, structural bioinformatics, and AI ethics. European universities are ramping up programs; roles in model training, data curation, and wet-lab automation abound.

From postdocs analyzing OpenBind data to lecturers developing courses, the field demands interdisciplinary talent. Check platforms for research positions bridging academia and industry.

AI-Enabled Drug Discovery Breakthrough: OpenBind Releases First Public Dataset and Predictive AI Model from Diamond Light Source UK

Milestone in Structure-Based Drug Design
