MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors


Huazhong University of Science and Technology · South China University of Technology
ACM MM 2024

MiniGPT-3D takes the first step toward efficient 3D-LLMs. We hope that MiniGPT-3D can bring new insights to this community.

Overview



Contributions

  • We present MiniGPT-3D, an efficient and powerful 3D-LLM that aligns 3D points with LLMs using 2D priors. It is trained with 47.8M learnable parameters in just 26.8 hours on a single RTX 3090 GPU.
  • We propose an efficient four-stage training strategy that transfers knowledge from 2D-LLMs in a cascaded way (see the stage-plan sketch after this list).
  • We design the Mixture of Query Experts (MQE), which aggregates features from multiple query experts with only 0.4M parameters (a minimal sketch follows this list).
  • Extensive experiments show the superior performance of MiniGPT-3D on multiple tasks while reducing training time and trainable parameters by up to 6x and 260x, respectively.
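
To make the cascaded four-stage idea concrete, here is a minimal PyTorch sketch of a stage plan that trains only a subset of modules per stage while the rest stay frozen. The module names, stage contents, and layer shapes are hypothetical placeholders chosen purely for illustration; they are not the paper's actual recipe.

import torch.nn as nn

# Hypothetical module layout; each stage unfreezes only some modules,
# so knowledge is transferred step by step rather than all at once.
model = nn.ModuleDict({
    "point_encoder": nn.Linear(3, 768),
    "projector": nn.Linear(768, 768),
    "query_experts": nn.Linear(768, 768),
    "llm_adapter": nn.Linear(768, 768),
})

STAGE_PLAN = [  # illustrative only
    ["projector"],                                   # stage 1: align features
    ["projector", "query_experts"],                  # stage 2: train query experts
    ["llm_adapter"],                                 # stage 3: adapt the LLM side
    ["projector", "query_experts", "llm_adapter"],   # stage 4: joint tuning
]

for stage_idx, trainable in enumerate(STAGE_PLAN, start=1):
    # Freeze everything, then unfreeze this stage's modules.
    for name, module in model.items():
        for p in module.parameters():
            p.requires_grad = name in trainable
    params = [p for p in model.parameters() if p.requires_grad]
    print(f"stage {stage_idx}: training {trainable}, "
          f"{sum(p.numel() for p in params):,} parameters")
    # ... build an optimizer over `params` and run this stage's epochs ...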
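Similarly, the sketch below shows one plausible shape a mixture of query experts could take: each expert is a learnable set of query tokens, and a light gating network blends them into one aggregated query set (e.g. for Q-Former-style cross-attention). This is an illustration under our own assumptions, not the paper's implementation; the expert count, query length, hidden size, and the mean-pool-plus-MLP gate are all hypothetical choices.

import torch
import torch.nn as nn


class MixtureOfQueryExperts(nn.Module):
    """Sketch of a mixture-of-query-experts block.

    Each "expert" is a learnable query-token set. A gating MLP scores the
    experts from pooled point-cloud features, and the expert query sets are
    blended by the softmax gate weights. Hyperparameters are illustrative.
    """

    def __init__(self, num_experts=4, num_queries=32, dim=768):
        super().__init__()
        # One learnable query set per expert: (E, Q, D)
        self.query_experts = nn.Parameter(
            torch.randn(num_experts, num_queries, dim) * 0.02
        )
        # Gating MLP: pooled point feature -> expert logits
        self.gate = nn.Sequential(
            nn.Linear(dim, dim // 4),
            nn.GELU(),
            nn.Linear(dim // 4, num_experts),
        )

    def forward(self, point_feats):
        # point_feats: (B, N, D) point-cloud token features
        pooled = point_feats.mean(dim=1)              # (B, D)
        weights = self.gate(pooled).softmax(dim=-1)   # (B, E)
        # Blend expert query sets: (B, E) x (E, Q, D) -> (B, Q, D)
        return torch.einsum("be,eqd->bqd", weights, self.query_experts)


if __name__ == "__main__":
    mqe = MixtureOfQueryExperts()
    feats = torch.randn(2, 512, 768)   # dummy point-cloud features
    print(mqe(feats).shape)            # torch.Size([2, 32, 768])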


Pipeline



Experiment Results





Dialogue Examples





BibTeX


@article{tang2024minigpt,
  title={MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors},
  author={Tang, Yuan and Han, Xu and Li, Xianzhi and Yu, Qiao and Hao, Yixue and Hu, Long and Chen, Min},
  journal={arXiv preprint arXiv:2405.01413},
  year={2024}
}

Acknowledgement

This website is adapted from MiniGPT-v2 and Nerfies, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.