Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors

Lihe Ding*¹,⁴, Shaocong Dong*², Zhanpeng Huang³, Zibin Wang†³,
Yiyuan Zhang¹, Kaixiong Gong¹, Dan Xu², Tianfan Xue†¹
¹The Chinese University of Hong Kong, ²The Hong Kong University of Science and Technology,
³SenseTime, ⁴Shanghai AI Laboratory

Abstract

Most 3D generation research focuses on up-projecting 2D foundation models into 3D space, either by minimizing a 2D Score Distillation Sampling (SDS) loss or by fine-tuning on multi-view datasets. Without explicit 3D priors, these methods often produce geometric anomalies and multi-view inconsistency. Recently, researchers have attempted to improve the plausibility of generated 3D objects by training directly on 3D datasets, albeit at the cost of low-quality textures, because texture diversity in 3D datasets is limited. To harness the advantages of both approaches, we propose Bidirectional Diffusion (BiDiff), a unified framework that incorporates a 3D and a 2D diffusion process to preserve 3D fidelity and 2D texture richness, respectively. Because a simple combination of the two may yield inconsistent results, we further bridge them with a novel bidirectional guidance mechanism. In addition, our method can serve as an initialization for optimization-based models, further improving 3D model quality and optimization efficiency and reducing optimization time from 3.4 hours to 20 minutes. Experimental results show that our model achieves high-quality, diverse, and scalable 3D generation.
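The abstract describes the bidirectional guidance only at a high level. The toy below is an illustrative sketch under our own assumptions, not BiDiff's architecture: a "3D" vector and its "2D" rendering (a hypothetical linear renderer `R`) are denoised side by side with deterministic DDIM-style updates, and at every step each branch's clean estimate is blended with the other branch's estimate mapped across (through `R` for 3D to 2D, through its pseudo-inverse for 2D to 3D). The names `make_render`, `gaussian_denoiser`, `bidirectional_sample`, and the weight `w` are hypothetical, and the closed-form Gaussian denoisers merely stand in for the learned 3D and 2D diffusion models.

```python
import numpy as np


def make_render(n_voxels):
    """Hypothetical linear 'renderer': pixel i sums voxels 2i and 2i + 1."""
    R = np.zeros((n_voxels // 2, n_voxels))
    for i in range(n_voxels // 2):
        R[i, 2 * i] = 1.0
        R[i, 2 * i + 1] = 1.0
    return R


def gaussian_denoiser(x_t, a_t, mu, s2=0.1):
    """Exact E[x0 | x_t] for a toy prior x0 ~ N(mu, s2 * I), where
    x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * noise. Stands in for a learned model."""
    gain = np.sqrt(a_t) * s2 / (a_t * s2 + 1.0 - a_t)
    return mu + gain * (x_t - np.sqrt(a_t) * mu)


def bidirectional_sample(shape_prior, texture_prior, R, steps=10, w=0.5, seed=0):
    """Toy coupled sampling of a 3D vector and a 2D image (not BiDiff itself).

    w is a hypothetical guidance weight: w = 0 runs the two processes
    independently; w > 0 blends each branch's clean estimate with the other's.
    """
    rng = np.random.default_rng(seed)
    R_pinv = np.linalg.pinv(R)                 # maps a 2D image back to 3D
    x3d = rng.standard_normal(R.shape[1])      # start both branches from noise
    x2d = rng.standard_normal(R.shape[0])
    alpha_bar = np.linspace(0.01, 1.0, steps + 1)  # noisy -> clean schedule
    for k in range(steps):
        a_t, a_prev = alpha_bar[k], alpha_bar[k + 1]
        x0_3d = gaussian_denoiser(x3d, a_t, shape_prior)    # toy "3D prior"
        x0_2d = gaussian_denoiser(x2d, a_t, texture_prior)  # toy "2D prior"
        # Bidirectional guidance: 3D is pulled toward the lifted 2D estimate,
        # 2D is pulled toward the rendering of the 3D estimate.
        g3d = (1 - w) * x0_3d + w * (R_pinv @ x0_2d)
        g2d = (1 - w) * x0_2d + w * (R @ x0_3d)
        # Deterministic DDIM step driven by the guided clean estimates.
        eps3d = (x3d - np.sqrt(a_t) * g3d) / np.sqrt(1 - a_t)
        eps2d = (x2d - np.sqrt(a_t) * g2d) / np.sqrt(1 - a_t)
        x3d = np.sqrt(a_prev) * g3d + np.sqrt(1 - a_prev) * eps3d
        x2d = np.sqrt(a_prev) * g2d + np.sqrt(1 - a_prev) * eps2d
    return x3d, x2d


R = make_render(6)
shape = np.array([1.0, 0.0, 2.0, 1.0, 0.5, 0.5])
texture = np.array([0.2, 3.0, -1.0])
x3d, x2d = bidirectional_sample(shape, texture, R, w=0.5)
print(np.allclose(R @ x3d, x2d))  # -> True in this toy
```

In this toy, `w = 0` (two independent processes) leaves the rendered 3D result and the 2D result disagreeing, which mirrors the abstract's point that a simple combination can be inconsistent. With `w = 0.5` they agree exactly because `R @ R_pinv` is the identity for this full-row-rank renderer; that exact agreement is a property of the toy, not a claim about BiDiff.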

@article{ding2023text,
  title={Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors},
  author={Ding, Lihe and Dong, Shaocong and Huang, Zhanpeng and Wang, Zibin and Zhang, Yiyuan and Gong, Kaixiong and Xu, Dan and Xue, Tianfan},
  journal={arXiv preprint arXiv:2312.04963},
  year={2023}
}

Generated 3D Objects

"An eagle head."

"A GUNDAM robot."

"A Nike sport shoes."

"A house in Van Gogh style."

Decoupled Geometry and Texture Control

More Results

"An ancient Chinese tower."

"An ancient Gothic tower."

"A strong muscular man."

"A blue and white porcelain teapot."

"A black and white cow."

"A black and white cow style elephant."