Creating 3D assets is one of the most time-consuming and challenging creative tasks. AI assistants that can generate 3D content from text input could democratize 3D content creation and prove immensely helpful for the video game and movie industries, as well as for AR and VR app development.
Meta's AI research team recently introduced Meta 3D Gen (3DGen), a new state-of-the-art system for text-to-3D asset generation. Meta claims that this new system can generate high-quality 3D assets in less than a minute. The generated 3D assets will have both high-resolution textures and material maps. 3DGen also supports physically-based rendering (PBR) and generative retexturing of previously generated 3D assets.
Meta 3D Gen combines two main components: text-to-3D generation and text-to-texture generation. Here's how the two-stage pipeline works (a rough sketch follows the list):
- Stage 1: 3D asset generation - Given a user-provided text prompt, Stage 1 creates an initial 3D asset using Meta's 3D AssetGen model (AssetGen). This step produces a 3D mesh with texture and PBR material maps. The inference time is approximately 30 seconds.
- Stage 2: Texture generation and refinement, which covers two use cases:
  - Use Case 1: Generative 3D texture refinement - Given a 3D asset generated in Stage 1 and the initial text prompt, Stage 2 produces higher-quality texture and PBR maps for that asset using Meta's text-to-texture generator, Meta 3D TextureGen. The inference time is approximately 20 seconds.
  - Use Case 2: Generative 3D (re)texturing - Given an untextured 3D mesh (previously generated or artist-created) and a prompt describing its desired appearance, Stage 2 generates a texture for the asset from scratch. The inference time is approximately 20 seconds.
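Meta has not released a public API or code for 3DGen, but the staged design above maps naturally onto a simple pipeline. The Python sketch below is purely illustrative: the `AssetGen` and `TextureGen` classes, their `generate`, `refine`, and `retexture` methods, and the `Mesh` type are hypothetical stand-ins for Meta's components, not real interfaces; the timings in the comments are the figures quoted above.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical placeholder types; Meta's actual interfaces are not public.

@dataclass
class Mesh:
    vertices: list                  # 3D vertex positions
    faces: list                     # triangle indices
    texture: bytes | None = None    # baked texture map
    pbr_maps: dict | None = None    # e.g. {"albedo": ..., "roughness": ..., "metallic": ...}


class AssetGen:
    """Stand-in for Meta 3D AssetGen (Stage 1: text -> textured mesh + PBR maps, ~30 s)."""

    def generate(self, prompt: str) -> Mesh:
        # Placeholder: the real model returns a mesh with an initial texture and PBR maps.
        return Mesh(vertices=[], faces=[], texture=b"", pbr_maps={})


class TextureGen:
    """Stand-in for Meta 3D TextureGen (Stage 2: refine or generate textures, ~20 s)."""

    def refine(self, mesh: Mesh, prompt: str) -> Mesh:
        # Use case 1: produce higher-quality texture and PBR maps for a Stage 1 asset.
        return mesh

    def retexture(self, mesh: Mesh, prompt: str) -> Mesh:
        # Use case 2: texture an untextured mesh (generated or artist-created) from scratch.
        return mesh


def text_to_3d(prompt: str) -> Mesh:
    """End-to-end flow as described in the article: Stage 1, then Stage 2 refinement."""
    asset = AssetGen().generate(prompt)        # Stage 1 (~30 s)
    return TextureGen().refine(asset, prompt)  # Stage 2 (~20 s)


if __name__ == "__main__":
    result = text_to_3d("a weathered bronze statue of a fox")
    print(type(result).__name__)
```

The split also makes clear why Stage 2 can accept artist-created geometry: the retexturing path only needs an untextured mesh and a prompt, not the output of Stage 1.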
You can read the full technical paper on Meta 3D Gen here. Meta has also published separate technical papers on the text-to-3D-model and text-to-texture approaches that underpin the Meta 3D Gen system.
Source: Meta