What is Falcon Perception?

Falcon Perception is a 0.6B-parameter early-fusion multimodal AI model from TII Abu Dhabi for open-vocabulary referring expression segmentation, processing images and text in a unified dense transformer.

How does Falcon Perception outperform SAM 3?

On the SA-Co benchmark, it scores 68.0 Macro-F1 vs SAM 3's 62.3, with stronger gains in attributes and dense scenes. PBench shows +14.2 on the Dense split.

What is the architecture of Falcon Perception?

Unified dense autoregressive transformer with hybrid attention: bidirectional for images, causal for text. Chain-of-perception: <coord>, <size>, <seg> tokens.

What is PBench benchmark?

TII's new diagnostic for compositional prompts: L0-L4 levels plus Dense. Tests attributes, OCR, spatial, relations in crowded scenes.

Tell me about Falcon-OCR

0.3B companion model for document OCR, 80.3% olmOCR, 88.64% OmniDocBench. Handles handwriting, formulas, tables with layout awareness.

Is Falcon Perception open source?

Yes, Apache 2.0 on Hugging Face and GitHub, with PyTorch/MLX inference and a vLLM server.

What applications suit Falcon Perception?

Robotics (language-guided manipulation), manufacturing inspection, document processing, UAE sovereign AI deployments.

How was Falcon Perception trained?

54M images, three stages: listing, alignment, long-context. Hard negatives, ensemble consensus.

What is TII's role in UAE AI?

Abu Dhabi's research institute driving Falcon series, sovereign AI for robotics, vision.

Where to read the Falcon Perception paper?

The full technical report is available on arXiv.

Future plans for Falcon models?

TII expanding multimodal capabilities, collaborations like NVIDIA lab for UAE robotics.

Falcon Perception TII Abu Dhabi Beats SAM 3

UAE's TII Unveils Falcon Perception: Revolutionizing Multimodal AI with Efficiency

The Technology Innovation Institute (TII) in Abu Dhabi has launched Falcon Perception, a groundbreaking multimodal artificial intelligence (AI) model that integrates vision and language processing in a single, streamlined architecture. Announced on March 31, 2026, this 0.6 billion parameter model marks a significant advancement in open-vocabulary referring expression segmentation (RES), where AI identifies and segments objects in images based on natural language descriptions like "the red car on the left" or "count the soup tins." By outperforming Meta's Segment Anything Model 3 (SAM 3) on key benchmarks, Falcon Perception positions the United Arab Emirates (UAE) as a leader in sovereign AI development.

This launch extends TII's Falcon family, previously known for language models like Falcon 3 and Falcon-H1 Arabic, into dense visual perception. In a compute-constrained era, Falcon Perception challenges multi-stage pipelines, proving a unified dense transformer can handle complex tasks efficiently for robotics, manufacturing, and document processing in the UAE's burgeoning tech ecosystem.

Understanding Multimodal Perception: From Vision Backbones to Unified Transformers

Traditional perception systems rely on modular designs: a vision backbone extracts features, followed by separate decoders for tasks like segmentation or detection. Referring Expression Segmentation (RES) specifically grounds natural language queries to pixel-level masks, crucial for human-robot interaction. Segment Anything Model 3 (SAM 3), Meta's latest, excels in zero-shot segmentation but uses late-fusion, limiting dense scene handling.

Falcon Perception rethinks this with early-fusion: image patches and text tokens enter a shared parameter space from layer one. A hybrid attention mask—bidirectional for images (global context) and causal for text (autoregressive generation)—enables variable-length instance outputs without fixed queries. The chain-of-perception interface sequences predictions: <coord> for centers, <size> for extents, <seg> for masks via dot-product with upsampled features. This lightweight design scales to crowded scenes with hundreds of objects.
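The hybrid attention mask described above can be sketched in a few lines. This is a minimal illustration, not TII's implementation: the function name `hybrid_attention_mask`, the NumPy representation, and the assumed token layout (all image patches first, then text tokens) are assumptions made for the example.

```python
import numpy as np

def hybrid_attention_mask(num_image_tokens: int, num_text_tokens: int) -> np.ndarray:
    """Return a boolean mask where mask[i, j] is True if query i may attend to key j.

    Assumed layout (hypothetical): image patch tokens first, then text tokens.
    """
    n = num_image_tokens + num_text_tokens
    # Start from a standard causal (lower-triangular) mask over the whole sequence.
    mask = np.tril(np.ones((n, n), dtype=bool))
    # Image tokens see every other image token, in both directions.
    mask[:num_image_tokens, :num_image_tokens] = True
    return mask

# Toy example: 3 image patches followed by 2 text tokens.
mask = hybrid_attention_mask(3, 2)
print(mask.astype(int))
```

In this sketch, text tokens attend to all image tokens and to earlier text tokens only, which is what allows autoregressive generation on top of globally contextualized image features.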

Diagram illustrating Falcon Perception's unified dense transformer with hybrid attention and chain-of-perception decoding
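The mask step of the chain-of-perception interface, a dot product between a segmentation embedding and upsampled features, can be illustrated as follows. This is a sketch under stated assumptions: the name `decode_mask`, the (H, W, C) feature layout, and the zero threshold on the logits are hypothetical choices for the example, not details taken from the model.

```python
import numpy as np

def decode_mask(seg_embedding: np.ndarray,
                pixel_features: np.ndarray,
                threshold: float = 0.0) -> np.ndarray:
    """Turn one instance embedding into a binary mask.

    seg_embedding:  shape (C,), assumed to come from the <seg> token.
    pixel_features: shape (H, W, C), assumed already upsampled.
    threshold:      hypothetical cut-off applied to the raw logits.
    """
    # Dot product of the embedding with the feature vector at every pixel.
    logits = pixel_features @ seg_embedding  # shape (H, W)
    return logits > threshold

# Toy 2x2 feature map with 2 channels.
features = np.array([[[1.0, 0.0], [0.0, 1.0]],
                     [[2.0, 1.0], [0.0, 0.0]]])
embedding = np.array([1.0, -1.0])
toy_mask = decode_mask(embedding, features)
print(toy_mask)
```

Because each instance needs only one embedding and one matrix product, the cost per additional object stays small, which is consistent with the article's point about crowded scenes.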

The Falcon Legacy: TII's Journey in Open-Source AI

TII, Abu Dhabi's applied research powerhouse under the Advanced Technology Research Council (ATRC), has built the Falcon series since 2023. Early models like Falcon 40B topped open LLM leaderboards, followed by Falcon 2 11B VLM (vision-language), Falcon 3 (1B-10B SLMs), Falcon-H1 Arabic (3B-34B), and Falcon Mamba 7B (state-space architecture). Falcon Perception shifts to vision-centric multimodal, reinforcing UAE's open AI strategy amid global closed models.

Dr. Najwa Aaraj, TII CEO, emphasized: "Falcon Perception advances practical AI for industries while bolstering sovereign capabilities." This aligns with the UAE Centennial 2071 vision for tech self-reliance.

Training Falcon Perception: Curated Data and Three-Stage Pipeline

Falcon Perception was trained on 54 million images with 195 million positives and 488 million hard negatives. Data curation emphasized uniform coverage through DINOv3 clustering, VLM-generated descriptions, and ensemble consensus (SAM 3, Qwen3-VL-30B). A 1:1 positive-to-negative ratio combats hallucinations. Training ran in three stages: in-context listing (full causal), task alignment (isolated queries), and long-context finetuning (600-token limit). The model was initialized via multi-teacher distillation from DINOv3 and SigLIP2.

Hierarchical clustering ensures diverse concepts.
Negative mining for semantic/visual challenges.
Human verification for edge cases.
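The 1:1 positive-to-negative ratio mentioned above can be pictured as balanced sampling from the two pools. The sketch below is only an illustration: the function `sample_balanced`, the example queries, and the labeling scheme (1 for present, 0 for absent) are hypothetical and do not come from the training code.

```python
import random

def sample_balanced(positives, negatives, n_pairs, seed=0):
    """Draw n_pairs positive and n_pairs negative queries, then shuffle them."""
    rng = random.Random(seed)
    pos = rng.sample(positives, n_pairs)
    neg = rng.sample(negatives, n_pairs)
    # Label 1 = object is present in the image, 0 = hard negative (absent).
    batch = [(query, 1) for query in pos] + [(query, 0) for query in neg]
    rng.shuffle(batch)
    return batch

# Made-up queries for a single image.
positives = ["red car", "soup tin", "dog"]
negatives = ["blue car", "soda can", "cat", "wolf"]
batch = sample_balanced(positives, negatives, n_pairs=2)
print(batch)
```

Sampling equal numbers from each pool means the model sees as many "nothing to segment" queries as real targets, regardless of how large each pool is.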

Superior Benchmarks: Outpacing SAM 3 and Larger VLMs

Falcon Perception leads on SA-Co (open-vocabulary RES): 68.0 Macro-F1 vs SAM 3's 62.3, with gains in attributes (+8.2) and food/drink (+12.2). Presence calibration lags (MCC 0.64 vs 0.82), but overall mask quality leads.
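For readers unfamiliar with the two metrics named above, here is a minimal sketch of their textbook definitions. The exact SA-Co evaluation protocol may differ in its details, and the counts in the example are made up purely for illustration.

```python
import math

def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score from true positives, false positives, false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(counts_per_category: dict) -> float:
    """Unweighted mean of per-category F1 scores."""
    scores = [f1(*counts) for counts in counts_per_category.values()]
    return sum(scores) / len(scores)

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary present/absent decision."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Made-up counts: (tp, fp, fn) per category.
toy_counts = {"attributes": (8, 2, 2), "food/drink": (1, 1, 1)}
toy_macro = macro_f1(toy_counts)
toy_mcc = mcc(tp=5, tn=5, fp=0, fn=0)
print(toy_macro, toy_mcc)
```

Macro averaging weights every category equally, so a model cannot score well by excelling only on the most common categories. MCC ranges from -1 to 1 and rewards getting both "present" and "absent" decisions right.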

TII's new PBench probes levels: L0 (basic), L1 (attributes), L2 (OCR), L3 (spatial), L4 (relations), Dense (crowded). Falcon dominates Dense (72.6 vs SAM 3 58.4, Qwen3-VL-30B 8.9).

PBench Level	Falcon Perception	SAM 3	Qwen3-VL-30B
L0 (Simple)	65.1	64.3	-
L1 (Attributes)	63.6	54.4	-
L2 (OCR)	38.0	24.6	-
L3 (Spatial)	53.5	31.6	-
L4 (Relations)	49.1	33.3	-
Dense	72.6	58.4	8.9
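The per-level margins over SAM 3 follow directly from the table. The short script below simply subtracts the SAM 3 column from the Falcon Perception column; the variable names are illustrative.

```python
# (Falcon Perception, SAM 3) scores copied from the PBench table above.
pbench = {
    "L0 (Simple)": (65.1, 64.3),
    "L1 (Attributes)": (63.6, 54.4),
    "L2 (OCR)": (38.0, 24.6),
    "L3 (Spatial)": (53.5, 31.6),
    "L4 (Relations)": (49.1, 33.3),
    "Dense": (72.6, 58.4),
}

# Margin of Falcon Perception over SAM 3, rounded to one decimal place.
deltas = {level: round(falcon - sam3, 1) for level, (falcon, sam3) in pbench.items()}

for level, delta in deltas.items():
    print(f"{level}: {delta:+.1f}")
```

The gap is smallest on simple prompts (L0) and largest on spatial prompts (L3), which matches the article's framing that the advantage grows with compositional difficulty.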

Explore detailed results in the arXiv paper.

PBench benchmark comparison table showing Falcon Perception outperforming SAM 3 across levels

Falcon-OCR: Compact Companion for Document Intelligence

Falcon Perception is paired with Falcon-OCR, a 0.3B-parameter model trained on PDFs, scans, handwriting, and formulas. It achieves 80.3% on olmOCR, leading among open-source models, and 88.64% on OmniDocBench. It supports LaTeX and HTML outputs and is layout-aware via PP-DocLayoutV3. Throughput is 2.9 img/s on an A100.
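To put the quoted throughput in perspective, a back-of-the-envelope estimate is enough. The sketch assumes one image per page, a single A100, and sustained throughput at the quoted rate; the function name `ocr_hours` and the one-million-page workload are hypothetical.

```python
def ocr_hours(num_pages: int, images_per_second: float = 2.9) -> float:
    """Estimated wall-clock hours to process num_pages at a fixed rate."""
    seconds = num_pages / images_per_second
    return seconds / 3600

# Hypothetical archive of one million scanned pages.
hours = ocr_hours(1_000_000)
print(f"{hours:.1f} hours")
```

Under these assumptions a million pages would take roughly four days on one GPU, so large archives would in practice be split across several devices.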

Open-Source Release: Demos, Code, and Accessibility

The release is Apache 2.0 licensed and available on Hugging Face and GitHub. It includes PyTorch and MLX inference, a vLLM server, and Streamlit demos, and runs on H100/A100 GPUs or Apple Silicon.

Real-World Applications: Powering UAE's Robotics and Industry

In the UAE's robotics push (e.g., the TII-NVIDIA lab), Falcon Perception enables natural-language instructions for manipulation. In manufacturing it supports defect detection, and in infrastructure it supports visual inspection. It also strengthens sovereign AI by reducing reliance on foreign models.

Community Buzz: Trending Reactions on X

The launch trended on X, with posts from TII's Yasser Dahou (491 likes), Hugging Face, and AI enthusiasts praising the results against SAM 3. "Kudos from day 1," noted a post on Reddit's LocalLLaMA.

Future Horizons: Scaling Multimodal AI in Abu Dhabi

TII plans further expansions, leveraging the UAE's AI ecosystem. Dr. Hakim Hacid: "Opening doors to scalable multimodal systems." The implication for global research is that better data and training can make simpler architectures competitive.

Implications for UAE Research and Higher Education

As TII collaborates with UAE universities, Falcon Perception boosts computer vision research and attracts talent to Abu Dhabi. It also strengthens the AI job market, in line with the UAE's tech vision.
