
MBZUAI DP-Fusion: Safeguarding AI Data Privacy in UAE Innovation

UAE's MBZUAI Advances Token-Level Privacy for Secure LLMs



MBZUAI's DP-Fusion Ushers in a New Era of Secure AI Inference

In the rapidly evolving landscape of artificial intelligence, ensuring data privacy has become a paramount concern, especially as large language models (LLMs) increasingly interact with sensitive information. Researchers at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) in Abu Dhabi have introduced DP-Fusion, a groundbreaking token-level differentially private inference method that safeguards user data while maintaining high model performance. This innovation not only addresses critical vulnerabilities in LLM outputs but also positions the United Arab Emirates as a frontrunner in trustworthy AI development.

DP-Fusion arrives at a pivotal moment for UAE's higher education sector, where MBZUAI, the world's first dedicated graduate university for AI established in 2019, continues to drive national ambitions under the UAE AI Strategy 2031. By blending advanced privacy mechanisms with practical utility, this research exemplifies how UAE universities are tackling global challenges head-on.

Understanding Differential Privacy in the Context of AI

Differential privacy (DP) is a mathematical framework that quantifies privacy risk by ensuring an algorithm's output changes only negligibly whether or not any single individual's data is included in the input dataset. Formally, a mechanism satisfies (ε, δ)-differential privacy if, for any two adjacent datasets differing in one record, the probability of producing any set of outputs on one dataset is at most e^ε times the probability on the other, plus a small additive slack δ.
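Stated symbolically (this is the standard (ε, δ)-DP definition, not anything specific to DP-Fusion): a mechanism M satisfies the guarantee if, for all adjacent datasets D and D′ and every set of outputs S,

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] \;+\; \delta
```

Here ε bounds the multiplicative privacy loss, and δ permits a small probability of exceeding that bound.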

In AI applications, particularly LLMs like GPT or Qwen, privacy risks escalate during inference when users input prompts containing personally identifiable information (PII) such as names, dates, or medical records. Traditional DP methods applied at training time, like DP-SGD, are insufficient for inference scenarios common in agentic AI systems—where models query external tools or databases. DP-Fusion bridges this gap by providing provable guarantees at the inference stage.

The Privacy Paradox in LLM Inference

LLMs excel at generating coherent text but can inadvertently leak sensitive context through paraphrasing or pattern memorization. For instance, prompting an LLM with a legal document containing client names might result in outputs where attackers recover PII via token recovery attacks or perplexity-based guessing. Existing defenses like DP-Prompt (noise on entire prompt) or DP-Decoding (sampling from noised logits) suffer from poor utility: strong privacy (low ε) yields gibberish outputs, while weak privacy fails against sophisticated attacks.

DP-Fusion resolves this paradox by operating at the token level, focusing protection on labeled sensitive tokens rather than the whole input. This granular approach yields 6x lower perplexity than baselines, making outputs readable and useful even under stringent privacy budgets.

How DP-Fusion Works: A Step-by-Step Breakdown

DP-Fusion's elegance lies in its post-hoc, training-free design. Here's the process:

  • Token Labeling: Use named entity recognition (NER) to tag sensitive tokens into privacy groups (e.g., PERSON, DATE, ORG). MBZUAI's in-house NER module achieves high accuracy on diverse entities.
  • Baseline Generation: Run the LLM on a public version of the input with all sensitive tokens masked, producing a baseline logit distribution P_public.
  • Private Runs: For each privacy group g, generate a private logit distribution P_g by including only that group's tokens.
  • Fusion and Noising: Sample each next token from a blend of the distributions: with weights set by the privacy parameters α (global) and β_g (group-specific), draw from P_public or P_g, then apply Gaussian noise calibrated to ε. The resulting output distribution stays ε-close to the public baseline, bounding the influence of sensitive tokens.
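The blending step above can be sketched as a mixture of next-token distributions. The snippet below is a simplified illustration under assumed names, not the authors' implementation: `p_public`, `p_private`, and a single mixing weight `lam` (standing in for the paper's α and β_g parameters) are assumptions made for clarity, and the noising step is omitted.

```python
import numpy as np

def fuse_distributions(p_public, p_private, lam):
    """Blend a public and a group-private next-token distribution.

    lam in [0, 1] plays the role of the group's mixing weight: lam = 0
    reproduces the public (fully masked) distribution, so the private
    tokens' influence on the output shrinks as lam shrinks.
    """
    p = (1.0 - lam) * np.asarray(p_public, dtype=float) \
        + lam * np.asarray(p_private, dtype=float)
    return p / p.sum()  # renormalize against floating-point drift

# Toy vocabulary of 4 tokens.
p_public = np.array([0.40, 0.30, 0.20, 0.10])   # sensitive tokens masked
p_private = np.array([0.10, 0.10, 0.10, 0.70])  # one privacy group visible

fused = fuse_distributions(p_public, p_private, lam=0.25)
next_token = np.random.default_rng(0).choice(len(fused), p=fused)
```

Keeping `lam` small keeps the fused distribution close to the public baseline, which is the intuition behind the ε-closeness guarantee.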

This fusion mechanism ensures mathematical privacy while preserving semantic flow. A live demo at documentprivacy.com showcases real-time document sanitization.

Diagram illustrating DP-Fusion workflow from token labeling to blended output

Rigorous Experiments Validate Superior Performance

Tested on the TAB-ECHR dataset (European Court of Human Rights cases annotated for PII) with Qwen2.5-7B-Instruct, DP-Fusion achieved perplexity of 1.42-1.46, versus substantially higher values for baselines, with LLM-as-judge win rates confirming the naturalness of its outputs. On privacy, token recovery attacks succeeded only 26-29% of the time, close to the random-guessing rate of 20% and far below the 80%+ achieved against non-private outputs.

At ε=0.1 (strong privacy), DP-Fusion outperformed DP-Decoding by generating coherent paraphrases, e.g., masking "John Doe, born 01/01/1980" yields sanitized yet informative text. Code available on GitHub enables replication.
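For reference, the perplexity figures quoted above are the exponential of the mean per-token negative log-likelihood; a minimal computation, assuming you already have the model's probability for each generated token:

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-likelihood) over a sequence.

    token_probs: the model's probability for each generated token.
    Lower is better; 1.0 means the model was certain of every token.
    """
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A fairly confident generation (probabilities near 1) lands near the
# 1.4x range the paper reports only when confidence is consistently high.
print(perplexity([0.9, 0.8, 0.95, 0.85]))  # ≈ 1.15
```

This is why the low-ε gibberish produced by baselines shows up as a large perplexity gap: incoherent continuations force low per-token probabilities.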

The Team Driving UAE's AI Privacy Frontier

Lead author Rushil Thareja, a PhD candidate in NLP at MBZUAI, spearheaded development alongside Nils Lukas and Praneeth Vepakomma, Assistant Professors of Machine Learning, and Preslav Nakov, Chair Professor of NLP. Their interdisciplinary expertise, from theoretical DP proofs to practical NER, fuels this work. Presented at ICLR 2026 in Rio, the paper (arXiv:2507.04531) has sparked global interest.

MBZUAI's faculty, recruited globally and all holding PhDs, embody the UAE's vision, with alumni at Google DeepMind and Meta. Thareja notes, "DP-Fusion formalizes safeguards essential for real-world AI trust."

Implications for Agentic AI and Beyond

In agentic systems, where LLMs orchestrate tools like databases or APIs, DP-Fusion prevents cascading leaks: a healthcare agent querying patient records, for example, outputs only sanitized summaries. It also mitigates prompt injection (0% success at low ε) and jailbreaks, which is vital for UAE sectors like finance (ADGM regulations) and health (UAE Genomics).

For the UAE, this aligns with Federal Law No. 45/2021 on Personal Data Protection, akin to GDPR, positioning MBZUAI as an ethical AI hub.

UAE's Thriving AI Ecosystem and MBZUAI's Role

The UAE is investing AED 112 billion in AI by 2031, with MBZUAI at the center through partnerships with G42 and IBM, the Falcon LLM, and its AI Campus. Ranked the top AI university in the Middle East and Africa (QS 2026), it graduates AI specialists amid projections that AI will boost 40% of UAE GDP by 2031.

DP-Fusion exemplifies the UAE's shift from oil to AI leadership, fostering secure innovation in smart cities and healthcare.


Future Outlook: Scaling Privacy in UAE AI Research

Future work includes integration with multimodal LLMs and federated learning, along with deployment in UAE government applications. A PyPI library (dp-fusion-lib) accelerates adoption. For students, MBZUAI's MSc and PhD programs offer hands-on training in trustworthy AI.

As UAE universities like Khalifa University and NYU Abu Dhabi advance, DP-Fusion sets a benchmark for privacy-preserving AI, ensuring ethical growth.


Career Opportunities in UAE's AI Privacy Field

  • Research roles at MBZUAI focusing on DP mechanisms.
  • Industry positions at G42 and Bayzat applying inference-time privacy.
  • Academic faculty positions in NLP and ML.

The UAE's visa reforms attract global talent, with salaries of AED 30k-60k per month for AI experts.

Jarrod Kanizay

Founder & Job Advertising Guru

Visionary leader transforming academic recruitment with 20+ years in higher education.


Frequently Asked Questions

🔒What is DP-Fusion?

DP-Fusion is a token-level differentially private inference method developed at MBZUAI for protecting sensitive data in LLM outputs without retraining models.

📊How does differential privacy work in AI?

Differential privacy adds calibrated noise to outputs, ensuring no single data point influences results disproportionately, with the strength of the guarantee quantified by the ε parameter.
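The classic way to realize this idea (a textbook building block, separate from DP-Fusion itself) is the Laplace mechanism: add noise whose scale is the query's sensitivity divided by ε. A minimal sketch:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value with ε-DP by adding Laplace(sensitivity/ε) noise.

    Smaller epsilon means wider noise: stronger privacy, lower accuracy.
    """
    rng = rng or np.random.default_rng()
    return true_value + rng.laplace(scale=sensitivity / epsilon)

# Counting query with sensitivity 1: one person joining or leaving the
# dataset changes the true count by at most 1.
rng = np.random.default_rng(0)
noisy_count = laplace_mechanism(100, sensitivity=1.0, epsilon=0.5, rng=rng)
```

The released value is unbiased on average, which is why repeated queries must share a privacy budget: averaging many releases would wash the noise out.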

🤖Why is privacy critical for LLM inference?

Inference with sensitive prompts risks PII leakage; DP-Fusion bounds token influence, vital for agentic AI in healthcare/finance.

👥Who developed DP-Fusion at MBZUAI?

Rushil Thareja (lead author and PhD candidate), together with Nils Lukas, Praneeth Vepakomma, and Preslav Nakov, all experts in NLP and ML.

📈What results show DP-Fusion's effectiveness?

DP-Fusion achieves 6x lower perplexity than baselines and near-random attack success rates (26-29%), and the work was accepted at ICLR 2026; the paper is available as arXiv:2507.04531.

💻How to implement DP-Fusion?

Use the GitHub repo and the PyPI package dp-fusion-lib: NER tags sensitive tokens, then logits are blended post-inference. A live demo is available at documentprivacy.com.

🇦🇪MBZUAI's role in UAE AI strategy?

As the world's first dedicated graduate AI university, MBZUAI drives the UAE AI Strategy 2031 with research in privacy and computer vision, alongside partnerships with G42 and IBM.

🏢Implications for UAE industries?

It enables more secure AI in finance and healthcare, aligns with the UAE's Personal Data Protection law, and supports the country's ambitions as an ethical AI hub.

🚀Future of DP-Fusion research?

Extensions to multimodal LLMs, integration with federated learning, and scalable deployment in UAE government applications.

🎓Study AI privacy at UAE universities?

MBZUAI offers MSc and PhD programs with hands-on training in trustworthy AI; you can also explore research jobs in the field.

⚖️Compare DP-Fusion to other methods?

DP-Fusion offers a superior utility-privacy tradeoff versus DP-Prompt and DP-Decoding; its token-level focus outperforms document-level protection.