Awesome Spatial VLMs
Spatial Intelligence in Vision–Language Models
Official site for the Awesome Spatial VLMs project, a curated resource collection and evaluation toolkit for spatial intelligence in Vision–Language Models (VLMs).
Highlights
- Structured taxonomy of spatial intelligence in VLMs
- 20+ datasets, 50+ benchmarks, and 120+ method papers
- Comprehensive evaluation across 37 methods
Key resources
Jump directly to the project's main artifacts.
📄 Survey paper
Comprehensive survey PDF
TechRxiv preprint · 2025
💻 GitHub repository
Awesome-Spatial-VLMs
Project repository with papers and code
🏆 Evaluation toolkit
Evaluation code & leaderboard
Reproducible evaluation scripts
📦 Benchmarks dataset
Spatial VQA Benchmarks
Dataset resources used in the survey paper
BibTeX
@article{Liu_2025,
title={Spatial Intelligence in Vision-Language Models: A Comprehensive Survey},
url={http://dx.doi.org/10.36227/techrxiv.176231405.57942913/v2},
DOI={10.36227/techrxiv.176231405.57942913/v2},
publisher={Institute of Electrical and Electronics Engineers (IEEE)},
author={Liu, Disheng and Liang, Tuo and Hu, Zhe and Peng, Jierui and Lu, Yiren and Xu, Yi and Fu, Yun and Yin, Yu},
year={2025},
month=nov
}