Spatial Intelligence in Vision–Language Models

A comprehensive survey and evaluation toolkit for understanding how vision-language models (VLMs) perceive and reason about spatial relationships and 3D geometry.

Novel structured taxonomy of spatial intelligence in VLMs
37 models evaluated
20+ datasets and 50+ benchmarks collected
120+ methodology papers curated

Key Resources

Everything you need to explore spatial intelligence in vision-language models

Research Collection

Browse our curated, continuously updated collection of papers, datasets, and benchmarks. Submit your work via this link.

Papers table columns: Main Section, Methodology, Venue, Published, Title, Institution, Paper, Code, Checkpoint
Datasets table columns: Dataset, Venue, Cognitive Level, Fundamental Task, Size, Image Source, Modality, Link
Benchmarks table columns: Benchmark, Venue, Cognitive Level, Fundamental Task, Size, Image Source, Modality, Link

Cite This Work

If you find this survey useful, please consider citing it:

BibTeX
@article{liu2025spatialintelligence,
    title        = {Spatial Intelligence in Vision-Language Models: A Comprehensive Survey},
    author       = {Liu, Disheng and Liang, Tuo and Hu, Zhe and Peng, Jierui and Lu, Yiren and Xu, Yi and Fu, Yun and Yin, Yu},
    year         = {2025},
    month        = nov,
    publisher    = {Institute of Electrical and Electronics Engineers (IEEE)},
    doi          = {10.36227/techrxiv.176231405.57942913/v2},
    url          = {http://dx.doi.org/10.36227/techrxiv.176231405.57942913/v2}
}