In the previous post about building metaverse with AI (link), I discussed the generic points of view, what we need and all the stuff like that. In this post, I am going to discuss about AI models we have which can be helpful in order to build the metaverse using AI and also possible pipelines.
Also, remember that in this particular post, I only will be discussing the AI models which I think can be helpful in building a virtual universe. So if your favorite AI isn’t in the list, accept my apologizes.
AI models for Metaverse
First of all, I think for building a virtual universe or metaverse using AI, we need these models:
- Image generation models: These models will help us build everything imaginable. These are essential in pretty much every AI art project and of course, very useful in order to make the concept of our supposed Metaverse.
- Music/SFX generation models: Imagine walking in a jungle. The landscape is pictured in your mind right? Now go a little deeper. You hear the sounds in your head, too. This is what we call soundscape in Ambient or minimalistic music (as I wrote about it before). Now let’s consider a metaverse we’re building, right? This newly made universe needs sounds. Without sounds, metaverse doesn’t mean anything. We need AI models in order to generate music, sounds and soundscapes for us.
- Vision Language Models: These are important as well. In building the metaverse, we need everything to be as automated as possible. Basically, we need the matrix but in a good way. So a vision model can easily analyze a scene and generate respective prompts for sound generators.
- 3D Generation models: And the question is why not? We try to make a complete 3D universe and we need to make 3D objects which let people make their desired universe, right? With AI, this will be a reality.
Now, let’s dive a little more in depth and look at what models we have access to!
Image Generators
If you ask me, this is the easiest type of model to find for this particular project. We have tons of proprietary options such as Dall-E 3 or Midjourney or even FLUX Pro. Which are all considered the best in the business.
In the open source side, we’ve got Mann-E, Stable Diffusion and other useful models as well, right? This means with a small search on the web, we can find out the best way of visualizing our dreams of a made-up universe.
Also, due to my research about different models and hosting services, hosting models on replicate or modal is very easy. For other types of hosting we may explore possibilities on CivitAI or Runware as well.
Music and Sound Effects generators
This is also not a rare thing. Although I am not really familiar with the music generation space and I only know Stable Audio LM and Meta’s Music Gen in open space, and Suno AI in proprietary space, I guess we already have the best in the business.
Vision Models
Well, I personally use Open Router to find out about the possibilities of these models, and being honest, the best model I could find for vision task was nothing but GPT-4o.
Although there are good vision models out there, but most of them are very generic or very specific and GPT-4o is right at the middle. We can use this model in order to describe different scenes in our metaverse. Also, we may utilize this model in order to be a guide through the metaverse or just help us build 3D objects or soundscapes.
3D Generation Models
Well these models are currently the rarest models in the list. We may need two approaches for this specific task:
- Text to 3D: very similar to text to image, you just describe your scene or object, and get the 3D object. Although it may be a little buggy, but it will be a fun experiment to implement a model or pipeline for text to 3D. It will help the residents of our metaverse to generate assets of their choice as easy as typing what they have in their minds.
- Image to 3D: This is also a possibility. Currently, I use TripoSR a lot for making different 3D objects, but I still couldn’t find the best input images or the best settings or hyper-parameter tuning for getting the best results.
With 3D generators, our workflow will become much much easier than what you may think. So we need another step, right?
What’s next?
Well, in the previous post we discussed the whole idea of metaverse and what we need to build one. In this one, we just discovered the AI tools we may be able to utilize. The next will be a study on how we can make a metaverse AI model at all.
It will be the most challenging part of the project, but in my honest and unfiltered opinion, it is also the best part!