One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion

CVPR 2024

1 UC San Diego, 2 Zhejiang University, 3 Tsinghua University, 4 UCLA, 5 Stanford University
* Equal contribution.
^ Work done during internship at UC San Diego.
One-2-3-45++ transforms a single RGB image of any object into a high-fidelity textured mesh in under one minute. The generated meshes closely mirror the original input image. One-2-3-45++ can be trained with only 8 A100 GPUs.

Abstract

Recent advancements in open-world 3D object generation have been remarkable, with image-to-3D methods offering finer-grained control than their text-to-3D counterparts. However, most existing models fall short of simultaneously providing rapid generation and high fidelity to input images, two features essential for practical applications. In this paper, we present One-2-3-45++, an innovative method that transforms a single image into a detailed 3D textured mesh in approximately one minute. Our approach aims to fully harness the extensive knowledge embedded in 2D diffusion models and priors from valuable yet limited 3D data. This is achieved by first fine-tuning a 2D diffusion model for consistent multi-view image generation, and then elevating these images to 3D with the aid of multi-view conditioned 3D native diffusion models. Extensive experimental evaluations demonstrate that our method can produce high-quality, diverse 3D assets that closely mirror the original input image.

Method


Starting with a single image as input, we first produce consistent multi-view images with a fine-tuned 2D diffusion model. These multi-view images are then elevated into 3D through a pair of 3D native diffusion networks. Throughout the 3D diffusion process, the generated multi-view images act as essential guiding conditions. After extracting the 3D mesh from the denoised volume, we further enhance the texture with a lightweight optimization that uses the multi-view images as supervision. One-2-3-45++ produces an initial textured mesh within 20 seconds and delivers a refined one in roughly one minute.
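The order of the stages described above can be sketched as a simple data flow. This is only an illustrative skeleton, not the released implementation: every function name, the dictionary-based state, and the default of 6 views are hypothetical placeholders, and each stage body is a stub that stands in for the corresponding model or optimization.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    run: Callable[[dict], dict]  # takes the pipeline state, returns an updated copy


def generate_multiview(state: dict) -> dict:
    # Stub for the fine-tuned 2D diffusion model: one image -> several views.
    views = [f"view_{i}_of_{state['image']}" for i in range(state["num_views"])]
    return {**state, "views": views}


def denoise_volume(state: dict) -> dict:
    # Stub for the pair of 3D native diffusion networks; the generated
    # views are passed along as the guiding condition.
    return {**state, "volume": {"conditioned_on": list(state["views"])}}


def extract_mesh(state: dict) -> dict:
    # Stub for mesh extraction from the denoised volume (initial texture).
    return {**state, "mesh": {"source": "volume", "texture": "initial"}}


def refine_texture(state: dict) -> dict:
    # Stub for the lightweight texture optimization supervised by the views.
    return {**state, "mesh": {**state["mesh"], "texture": "refined"}}


PIPELINE: List[Stage] = [
    Stage("generate_multiview", generate_multiview),
    Stage("denoise_volume", denoise_volume),
    Stage("extract_mesh", extract_mesh),
    Stage("refine_texture", refine_texture),
]


def run_pipeline(image: str, num_views: int = 6, refine: bool = True) -> dict:
    # num_views=6 is an arbitrary illustrative default (an assumption).
    state = {"image": image, "num_views": num_views}
    for stage in PIPELINE:
        if stage.name == "refine_texture" and not refine:
            continue  # stop at the initial textured mesh
        state = stage.run(state)
    return state
```

Calling `run_pipeline("chair.png", refine=False)` corresponds to stopping at the initial textured mesh, while the default call also runs the refinement stage.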

More Results (DreamFusion Prompts)


More Results (GSO dataset)


More Results (One-2-3-45 test set)


User Study


Results of a user study involving 53 participants. Each cell displays the probability or preference rate at which one method (row) outperforms another (column).
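To make the table layout concrete, here is a minimal sketch of how such a pairwise preference matrix could be computed. The study protocol is not described on this page, so the sketch assumes that each vote is a comparison between two methods with a single winner; the function name, the vote format, and the method labels "A" and "B" are all hypothetical.

```python
from collections import defaultdict


def preference_matrix(votes, methods):
    """Return {(row, col): rate} where rate is the fraction of
    row-vs-col comparisons in which `row` was preferred.

    votes: iterable of (winner, loser) pairs (assumed format).
    """
    wins = defaultdict(int)
    for winner, loser in votes:
        wins[(winner, loser)] += 1

    matrix = {}
    for row in methods:
        for col in methods:
            if row == col:
                continue  # a method is never compared with itself
            total = wins[(row, col)] + wins[(col, row)]
            # None marks pairs that were never compared.
            matrix[(row, col)] = wins[(row, col)] / total if total else None
    return matrix


# Toy data: "A" is preferred over "B" in 3 of 4 comparisons.
toy_votes = [("A", "B")] * 3 + [("B", "A")]
toy_matrix = preference_matrix(toy_votes, ["A", "B"])
```

Under this convention, the cells for (row, column) and (column, row) sum to 1 whenever the pair was compared at least once.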

Applications


One-2-3-45++ can significantly enhance the efficiency and creativity of 3D game artists. Every 3D asset featured in the video was created by our AI.

BibTeX

   
@article{liu2023one2345++,
  title={One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion},
  author={Minghua Liu and Ruoxi Shi and Linghao Chen and Zhuoyang Zhang and Chao Xu and Xinyue Wei and Hansheng Chen and Chong Zeng and Jiayuan Gu and Hao Su},
  journal={arXiv preprint arXiv:2311.07885},
  year={2023}
}