A comprehensive survey and evaluation toolkit for understanding how VLMs perceive and reason about spatial relationships and 3D geometry.
Everything you need to explore spatial intelligence in vision-language models
Comprehensive survey of spatial intelligence in VLMs (TechRxiv preprint, 2025)
Full project repository with curated papers, code implementations, and resources
Reproducible evaluation scripts and a benchmark leaderboard covering all 37 evaluated models (see the evaluation sketch below)
Spatial VQA benchmarks and dataset resources for evaluation and research
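The repository's actual command-line interface is not reproduced here. As an illustration only, the following is a minimal sketch of what a spatial-VQA evaluation loop might look like; the `load_benchmark` helper, the `model.answer` method, and the JSON record schema are all illustrative assumptions, not the project's real API.

```python
import json

def load_benchmark(path):
    """Load a spatial-VQA benchmark stored as a JSON list of
    {"image": ..., "question": ..., "answer": ...} records.
    (Hypothetical schema -- adapt to the actual benchmark format.)"""
    with open(path) as f:
        return json.load(f)

def evaluate(model, samples):
    """Exact-match accuracy over spatial-VQA samples.

    `model` is any object exposing an answer(image_path, question) -> str
    method; this wrapper interface is an assumption, not part of the repo.
    """
    correct = 0
    for s in samples:
        pred = model.answer(s["image"], s["question"]).strip().lower()
        correct += pred == s["answer"].strip().lower()
    return correct / len(samples)

# Usage (hypothetical):
#   samples = load_benchmark("spatial_vqa.json")
#   print(f"accuracy = {evaluate(my_vlm, samples):.3f}")
```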
Browse our curated collection of papers, datasets, and benchmarks, continuously updated. Submit your work via this link.
| Main Section | Methodology | Venue | Published | Title | Institution | Paper | Code | Checkpoint |
|---|---|---|---|---|---|---|---|---|
| Dataset | Venue | Cognitive Level | Fundamental Task | Size | Image Source | Modality | Link |
|---|---|---|---|---|---|---|---|
| Benchmark | Venue | Cognitive Level | Fundamental Task | Size | Image Source | Modality | Link |
|---|---|---|---|---|---|---|---|
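For readers who want to mirror these tables programmatically (for example, to filter entries by cognitive level or task), the columns map naturally onto a small record type. A minimal sketch; the class name, field names, and example values in the comments are illustrative assumptions, not code from the repository:

```python
from dataclasses import dataclass

@dataclass
class BenchmarkEntry:
    """One row of the dataset/benchmark tables above.

    Field names follow the table headers; this class is an
    illustrative assumption, not part of the repository.
    """
    name: str              # Dataset / Benchmark name
    venue: str             # publication venue, e.g. "CVPR 2024"
    cognitive_level: str   # Cognitive Level category from the survey taxonomy
    fundamental_task: str  # Fundamental Task category
    size: str              # e.g. "10K QA pairs"
    image_source: str      # e.g. real-world vs. synthetic imagery
    modality: str          # e.g. "RGB", "RGB-D", "video"
    link: str              # URL to the paper or project page
```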
If you find this survey useful, please consider citing it:
@article{liu2025spatialintelligence,
  title     = {Spatial Intelligence in Vision-Language Models: A Comprehensive Survey},
  author    = {Liu, Disheng and Liang, Tuo and Hu, Zhe and Peng, Jierui and Lu, Yiren and Xu, Yi and Fu, Yun and Yin, Yu},
  year      = {2025},
  month     = nov,
  publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
  doi       = {10.36227/techrxiv.176231405.57942913/v2},
  url       = {http://dx.doi.org/10.36227/techrxiv.176231405.57942913/v2}
}