Generative Artificial Intelligence (AI), and AI more broadly, has launched a new technology hype cycle that, on the surface, appears to have rather unceremoniously ended the previous metaverse hype cycle. AI, however, is a critical enabling technology for a future metaverse, from networking to data/workflow management and content creation. AI should not be viewed as a “metaverse replacement,” but rather as a critical pillar of that future, regardless of what that future is called; “metaverse” is simply a convenient term that codifies pre-existing trends into a vision of the future.
Elements of the metaverse, namely the transition from Two-Dimensional (2D) to Three-Dimensional (3D) interfaces and the merging of virtual and real/physical worlds, will play an increasingly strong role in the consumer, enterprise, and industrial markets. To get there, 3D content generation needs to become highly accessible, and this is where generative AI will play a critical role.
While much of the hype around generative AI for content creation has focused on other types of media (text, images, video, music), the next step will be making the creation of 3D assets and spaces more accessible. One could draw a parallel between AI’s impact on photos and video and its potential impact on 3D. Digital photo editing (and later, video production) was once reserved for professionals and extreme hobbyists/prosumers; now, anyone with a smartphone can edit and customize photos and videos at the touch of a screen. The accessibility of these content creation tools enabled the rise of social media influencers and YouTubers, which brought about a whole new category of video and social content.
3D content generation with generative AI already plays a growing role in commercial markets (design, advertising, etc.), but it is still early days for broader accessibility. The metaverse will require significantly more 3D content, along with accessible tools, to populate virtual spaces and foster a vibrant marketplace for 3D assets that spans both virtual and real-world spaces.
While this blog post focuses on generative AI’s role in 3D content creation, generative AI (and AI more broadly) will also play significant roles across the consumer, enterprise, and industrial markets. Within the media and entertainment space alone, the list extends to content personalization, digital humans that could serve as video game and virtual world Non-Player Characters (NPCs), workflow management and optimization, video compression and transcoding efficiencies, and more.
How Generative AI Models Work for Content Creation
Generative AI-based content creation is typically done through either Generative Adversarial Networks (GANs) or diffusion models. While GANs can create entirely new content, such as virtual replicas of products for the fashion industry, diffusion models are more practical and have therefore attracted more interest recently.
A GAN pairs two neural networks: a generator that creates content samples and a discriminator that tries to tell those samples apart from real ones. The two are trained against each other, which pushes the generator toward outputs that the discriminator cannot distinguish from real-world data. When training goes well, a GAN learns from real-world samples of objects, people, etc., and produces convincing new images, but that’s not always the case. This approach to generative AI is prone to problems like mode collapse, lower diversity of output, and a high level of complexity in training the models.
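The adversarial setup comes down to two competing losses. The sketch below is a minimal illustration, assuming the standard binary cross-entropy formulation with the widely used "non-saturating" generator loss; the function names are hypothetical, and the generator and discriminator networks themselves are omitted (they would produce the probabilities passed in). Training alternates between lowering the discriminator loss and lowering the generator loss.

```python
import math


def _clamp(p, eps=1e-12):
    # Keep probabilities strictly inside (0, 1) so log() never receives 0.
    return min(max(p, eps), 1.0 - eps)


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator.

    d_real: D's probabilities that real samples are real (target 1)
    d_fake: D's probabilities that generated samples are real (target 0)
    """
    real_term = -sum(math.log(_clamp(p)) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - _clamp(p)) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: low when D labels generated samples as real."""
    return -sum(math.log(_clamp(p)) for p in d_fake) / len(d_fake)


if __name__ == "__main__":
    # A fully "confused" discriminator outputs 0.5 everywhere: 2 ln 2, about 1.386.
    print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))
    # A confident discriminator that rejects fakes leaves the generator with a large loss.
    print(generator_loss([0.01, 0.02]))
```

When the generator matches the real data distribution, the best the discriminator can do is output 0.5, and its loss settles at 2 ln 2. Keeping the two networks balanced on the way to that point is where the training difficulties mentioned above tend to appear.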
Diffusion models are trained by progressively adding noise (e.g., Gaussian noise) to training data and learning to reverse the process by denoising it step by step. Once trained, they can generate new data by starting from random noise and iteratively denoising it. ABI Research assesses that, due to GANs’ limited output range, diffusion models are more proficient at producing outputs that accurately match the distribution of their real-world counterparts. This translates into generative AI solutions with higher-quality content.
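The noising and denoising described above can be made concrete with the closed-form forward process used by denoising diffusion probabilistic models (DDPM). This is a minimal sketch, assuming a linear noise schedule (β rising from 1e-4 to 0.02 over 1,000 steps, as in the original DDPM paper) and plain Python lists standing in for image or 3D data. The trained denoising network is replaced here by the true noise, so the recovery step only shows what that network learns to approximate.

```python
import math
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances that increase linearly over the diffusion process."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar[t] = product of (1 - beta_s) for s <= t: the signal left at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, noise):
    """Forward process in closed form: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]


def estimate_x0(xt, t, alpha_bars, predicted_noise):
    """Invert the forward step given a noise estimate (normally a trained network's output)."""
    a = alpha_bars[t]
    return [(x - math.sqrt(1.0 - a) * n) / math.sqrt(a) for x, n in zip(xt, predicted_noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule())
    x0 = [0.5, -1.0, 2.0]  # stand-in for pixels, point coordinates, etc.
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    print(alpha_bars[-1])  # close to zero: the last step is almost pure noise
    xt = add_noise(x0, 500, alpha_bars, noise)
    print(estimate_x0(xt, 500, alpha_bars, noise))  # recovers x0 when the noise is known
```

In a real model, a neural network is trained to predict the noise from a noisy input and its step index, and generation repeats a denoising update from the final step back to the first.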
3D Content Enablement through Generative AI
From DALL·E to Stable Diffusion, several notable text-to-image AI models have hit the market in the last couple of years. These generative AI-based content tools allow users to turn text, or sometimes voice prompts, into images that closely match the prompt. The companies behind AI-generated images are also exploring text-to-3D generative AI models. The bullet list below touches on some of the latest developments in generative AI solutions that enable 3D content creation.
- OpenAI: In December 2022, OpenAI released its 3D model generator, Point-E. Point-E generates 3D point clouds, which can then be converted to meshes, in just 1 to 2 minutes on a single NVIDIA V100 Graphics Processing Unit (GPU). It chains two models: a text-to-image model that translates text prompts into images and an image-to-3D model trained on images paired with 3D objects.
- Google: Dream Fields, announced in 2021, was Google’s first generative 3D AI system. It uses OpenAI’s Contrastive Language-Image Pre-training (CLIP) neural network to score how well rendered 2D views match a text prompt, and it optimizes a Neural Radiance Field (NeRF), a neural network that represents a 3D scene, until those views match. Google has since followed up with DreamFusion, which offers 3D content generation capabilities similar to Point-E’s, but without requiring any 3D training data. For immersive Google Maps features, Google is currently trialing NeRF to construct 3D representations of landmarks and areas in select cities.
- NVIDIA: Given its role as a pioneer in the generative AI space, it comes as no surprise that NVIDIA is probing whether generative AI and neural graphics, such as NVIDIA Instant NeRF, can generate 3D assets. Moreover, the tech giant is looking at how generative AI and neural graphics can bring 3D content to life with physics and animation.
- Roblox: Generative AI is at the heart of Roblox’s plans to make 3D modeling accessible to less experienced users. The technology is becoming a cornerstone of Roblox Studio, where it will promote low-code 3D content creation by letting developers create virtual materials with Natural Language Processing (NLP)-based prompts and generate code from text inputs.
- Adobe: Generative AI is a common theme throughout Adobe’s content creation solutions, such as Sensei and Firefly. Marketers and content creators can already create 3D assets and materials from real-life pictures with Adobe Substance 3D Sampler, although more robust solutions are available that can mirror real-world objects and people with higher accuracy and detail. While it is still early, Adobe has been the most significant player in the content creation market for generative AI.
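Several of the systems above (Dream Fields, DreamFusion, NVIDIA Instant NeRF, and Google’s Maps trials) build on NeRF, which produces each pixel by compositing density and color samples along a camera ray. The sketch below implements that standard discrete volume-rendering step as a minimal illustration. The neural network that would predict density and color at each 3D point is assumed away, and the function name and inputs are hypothetical.

```python
import math


def render_ray(densities, colors, deltas):
    """Composite samples along one camera ray into a pixel color.

    densities: non-negative volume density sigma_i at each sample point
    colors:    (r, g, b) color emitted at each sample point
    deltas:    distance between each sample and the next one along the ray
    Returns (pixel_rgb, accumulated_opacity).
    """
    transmittance = 1.0  # fraction of light that reaches the current sample unblocked
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha          # contribution of this sample to the pixel
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha            # light remaining for samples further away
    return pixel, 1.0 - transmittance


if __name__ == "__main__":
    # A half-transparent red sample in front of an opaque blue one.
    rgb, opacity = render_ray([math.log(2), 1e6], [(1, 0, 0), (0, 0, 1)], [1.0, 1.0])
    print(rgb, opacity)
```

Because every operation here is differentiable, text-to-3D systems such as Dream Fields and DreamFusion can score rendered views against a prompt (with CLIP or a diffusion model, respectively) and backpropagate through this step to update the 3D representation.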
Addressing the Hindrances to Generative AI Adoption
Growth in Extended Reality (XR) and similar factors, along with greater accessibility to generative AI-based content creation tools, will act as catalysts for the 3D content market. With more content creation tools at developers’ disposal, 3D content creation becomes realistic for a much broader audience. Generative AI should be viewed as a key driver, but the market, including regulations and best practices, needs significant development before more heavily AI-driven content creation is fully commercially viable.
Before widespread use takes place, particularly in the enterprise segment, the tech industry and world governments have a long road ahead in addressing the social, ethical, and security concerns surrounding generative AI. More specifically, enterprises worry about how generative AI models use/collect/share data, possibly infringing on copyrights/Intellectual Property (IP). The table below outlines the most pressing issues for generative AI-based content creation, as well as ABI Research’s recommendations for overcoming these challenges.
Table 1: Key Issues with Content Created with Generative AI and Recommendations

| Issue | ABI Research Recommendation |
|---|---|
| Risk of generating content that infringes on intellectual property: while studies have reported low rates of generative AI models producing content that matches training inputs, this is a potential concern if training sets include copyright-protected content. | Until precedents are in place that more clearly define derivative work under intellectual property laws (in the context of generative AI-created content), the best solution is to train models on in-house assets (if available); Adobe has employed this strategy to avoid potential issues. Companies without large asset libraries should source training data/assets from free repositories or license them. |
| Content rights and ownership can be difficult (if not impossible) to establish for content created with generative AI models. Requirements such as "human authorship" do not necessarily preclude the use of AI tools like generative AI models, and it is unclear what level of human input is necessary to copyright creative works. Relatedly, it is not yet clear who can claim ownership of the generated content: the company offering the AI model/content creation service, the user, or the rights holders of the data/assets used to train the model. | Given these uncertainties, the best way to use generative AI content is as part of a larger workflow in which the generated content is not the end product. Companies also need to ensure that generated content does not infringe on any intellectual property rights (see above) when licensed (or unlicensed) content is used. If in-house/owned data is used for training and the outputs are created by the same entity, these assets could be suitable for 3D marketplaces and are more likely to be copyrightable. |
| With generative AI models that employ adaptive learning based on human-generated data/interactions, the risk of corrupted outputs that stray far from the desired results is high: the proverbial "garbage in, garbage out" problem. | Training datasets either need to be sanitized before use, or stricter filters and protections need to be put in place to limit what types of information can be used for training. Safeguards that monitor output quality more closely are also advisable when external data sources are used for training. |

(Source: ABI Research)
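The provenance and filtering recommendations in Table 1 could start as a simple gate on asset metadata before training. This is a minimal sketch, assuming each asset carries a license tag and a review flag; the field names, license categories, and function are hypothetical and do not correspond to any particular product or legal standard.

```python
from dataclasses import dataclass

# Hypothetical license categories treated as safe for training, mirroring the
# recommendation to use in-house assets, free repositories, or licensed content.
ALLOWED_LICENSES = {"in-house", "cc0", "licensed"}


@dataclass(frozen=True)
class TrainingAsset:
    asset_id: str
    license: str           # hypothetical provenance tag attached to each asset
    flagged: bool = False  # e.g., set by a manual or automated content review


def filter_training_set(assets):
    """Split assets into (kept, rejected) by license and review status."""
    kept, rejected = [], []
    for asset in assets:
        if asset.license.lower() in ALLOWED_LICENSES and not asset.flagged:
            kept.append(asset)
        else:
            rejected.append(asset)
    return kept, rejected


if __name__ == "__main__":
    catalog = [
        TrainingAsset("chair-01", "in-house"),
        TrainingAsset("lamp-07", "unknown"),
        TrainingAsset("tree-03", "CC0"),
        TrainingAsset("car-12", "licensed", flagged=True),
    ]
    kept, rejected = filter_training_set(catalog)
    print([a.asset_id for a in kept], [a.asset_id for a in rejected])
```

Rejected assets are returned rather than silently dropped so that they can be audited, which fits alongside the output-monitoring safeguards that Table 1 also recommends.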
This content was derived from ABI Research’s 3D Content Creation and Enabling Technologies research report, part of the company’s Metaverse Markets & Technologies Research Service. Learn more about the future of content generation, notably in the buildup to the metaverse, by downloading the report today. You can also get a free sample of the report by reading the accompanying Research Highlight.