DOI: 10.1145/3581783.3612678
ACM MM '23 Conference Proceedings · Demonstration

3D Creation at Your Fingertips: From Text or Image to 3D Assets

Published: 27 October 2023

ABSTRACT

We demonstrate an automatic 3D creation system that creates realistic 3D assets solely from a text or image prompt, without requiring any specialized 3D modeling skills. Users can either describe the object they envision in natural language or upload a reference image captured with their phone. Our system then generates a high-quality 3D mesh that faithfully matches the user's input. We propose a coarse-to-fine framework to achieve this goal. Specifically, we first obtain a low-resolution mesh almost instantly from a pre-trained text/image-conditional 3D generative model. Using this coarse mesh as the initialization, we then optimize a high-resolution textured 3D mesh with fine-grained appearance guidance from large-scale 2D diffusion models. Our system creates visually pleasing results in minutes, significantly faster than existing methods, while ensuring that the resulting 3D assets are precisely aligned with the input text or image prompt. With these capabilities, our demonstration provides a streamlined and intuitive platform for users to incorporate 3D creation into their daily lives.
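The "appearance guidance from large-scale 2D diffusion models" in the fine stage is commonly implemented via score-distillation-style updates (as popularized by DreamFusion): a rendered view of the mesh is perturbed with noise, a pretrained diffusion model predicts that noise, and the residual is used as a gradient on the rendering. A minimal NumPy sketch of one such update, with a toy stand-in for the diffusion model's noise predictor (all names here are illustrative, not taken from the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_noise_predictor(noisy_img, t):
    # Hypothetical stand-in for a pretrained 2D diffusion model's
    # epsilon-prediction network; a real system would query the
    # text/image-conditioned model here.
    return 0.9 * noisy_img

def sds_gradient(rendered, t, alpha_bar):
    """One score-distillation step: add noise to the rendered view at
    timestep t, ask the diffusion model to predict that noise, and use
    the weighted residual as a gradient on the rendered pixels (which
    would backpropagate to the mesh geometry and texture parameters)."""
    eps = rng.standard_normal(rendered.shape)
    noisy = np.sqrt(alpha_bar) * rendered + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = toy_noise_predictor(noisy, t)
    w = 1.0 - alpha_bar            # one common timestep weighting choice
    return w * (eps_hat - eps)     # gradient w.r.t. the rendered image

# Toy "rendered view" of the coarse mesh from a random camera.
rendered = rng.standard_normal((8, 8, 3)) * 0.1
grad = sds_gradient(rendered, t=500, alpha_bar=0.5)
print(grad.shape)  # same shape as the rendered view: (8, 8, 3)
```

In a full system this gradient is applied over many random camera views, updating a differentiable mesh representation until the textured 3D asset matches the prompt.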

References

  1. Yang Chen, Yingwei Pan, Ting Yao, Xinmei Tian, and Tao Mei. 2019a. Animating Your Life: Real-Time Video-to-Animation Translation. In ACM MM Demo.Google ScholarGoogle Scholar
  2. Yang Chen, Yingwei Pan, Ting Yao, Xinmei Tian, and Tao Mei. 2019b. Mocycle-gan: Unpaired video-to-video translation. In ACM MM.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Heewoo Jun and Alex Nichol. 2023. Shap-e: Generating conditional 3d implicit functions. arXiv preprint arXiv:2305.02463 (2023).Google ScholarGoogle Scholar
  4. Yehao Li, Ting Yao, Yingwei Pan, and Tao Mei. 2022. Contextual transformer networks for visual recognition. IEEE TPAMI (2022).Google ScholarGoogle ScholarCross RefCross Ref
  5. Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. 2023. Magic3D: High-Resolution Text-to-3D Content Creation. In CVPR.Google ScholarGoogle Scholar
  6. Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, and Carl Vondrick. 2023. Zero-1-to-3: Zero-shot One Image to 3D Object. ArXiv, Vol. abs/2303.11328 (2023).Google ScholarGoogle Scholar
  7. Alex Nichol, Heewoo Jun, Prafulla Dhariwal, Pamela Mishkin, and Mark Chen. 2022. Point-E: A System for Generating 3D Point Clouds from Complex Prompts. arXiv preprint arXiv:2212.08751 (2022).Google ScholarGoogle Scholar
  8. Yingwei Pan, Zhaofan Qiu, Ting Yao, Houqiang Li, and Tao Mei. 2017. To create what you tell: Generating videos from captions. In ACM MM.Google ScholarGoogle Scholar
  9. Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. 2023. Dreamfusion: Text-to-3d using 2d diffusion. In ICLR.Google ScholarGoogle Scholar
  10. Tianchang Shen, Jun Gao, Kangxue Yin, Ming-Yu Liu, and Sanja Fidler. 2021. Deep marching tetrahedra: a hybrid representation for high-resolution 3d shape synthesis. Advances in Neural Information Processing Systems, Vol. 34 (2021), 6087--6101.Google ScholarGoogle Scholar
  11. Ting Yao, Yehao Li, Yingwei Pan, Yu Wang, Xiao-Ping Zhang, and Tao Mei. 2023. Dual vision transformer. IEEE TPAMI (2023).Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Ting Yao, Yingwei Pan, Yehao Li, Chong-Wah Ngo, and Tao Mei. 2022. Wave-vit: Unifying wavelet and transformers for visual representation learning. In ECCV.Google ScholarGoogle Scholar

Published in

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023, 9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783

Copyright © 2023 Owner/Author. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 995 of 4,171 submissions (24%)

