Let’s build Metaverse with AI: Building an asset generator

Look at this:

How do you think this apple was made? Excellent question. In the previous post, I said we should put LLMs out of the picture for now, and that we needed to talk about 3D, because it matters in the whole metaverse space, right? Today, I did just that: I trained a LoRA on FLUX and then tried to make 3D objects from what the model is capable of generating.

The Image Generator

In this part, I talk specifically about the image generation procedure. It should be a good experience-sharing exercise, and the open-source models created in the process are linked in the post as well.

To make an image generator model, we need a base model. Since the whole Generative Metaverse project was a fun project for me and not a serious commercial one, I chose FLUX. However, if I move to the blockchain/crypto side of things (probably on the TON network), I may consider SDXL as the base model in order to avoid problems with commercial use.

Anyway, everything here is pretty standard: pretty much the same steps I took to make the early versions of Mann-E. So I guess it is worth sharing one more time, right?

The Dataset

AI models are just a bunch of boring mathematical functions, and they become amazing when they are fed good data. So we needed to create a dataset. As always, the best data generator I could use was Midjourney, so of course I headed over to their website and recharged my account.

I played with a good bunch of prompt combinations to find the one that best fit what I had in mind. After a lot of tweaking, I landed on this: <subject>, lowpoly, 3d illustration, dark background, isometric camera angle.

Here is a sample of what was generated with this prompt formula:

Then I used ChatGPT to generate a list of everyday objects, turned it into a prompt list, automated the image generation procedure, and ended up with around 800 pictures. Now it was time for training!
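Just to illustrate the idea, here is a tiny sketch of how that prompt-list automation can look. The object names here are made up for illustration, not my actual ChatGPT list:

```python
# Hypothetical sketch of building the prompt list for batch generation.
# The subjects below are placeholders, not the real dataset list.
objects = ["apple", "wooden chair", "coffee mug", "desk lamp"]

PROMPT_TEMPLATE = (
    "{subject}, lowpoly, 3d illustration, dark background, "
    "isometric camera angle"
)

def build_prompts(subjects):
    """Return one prompt per subject, using the formula from the post."""
    return [PROMPT_TEMPLATE.format(subject=s) for s in subjects]

prompts = build_prompts(objects)
for p in prompts:
    print(p)
```

From here, each prompt just gets submitted to the image generator in a loop.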

The training

First, I was thinking about using Replicate or fal.ai to train the LoRA. Honestly, they provide easy and affordable ways of training a LoRA on FLUX (and to my knowledge, you may also be able to have SD 1.5 and SDXL LoRAs trained on Replicate), but there is one big problem.

These websites are usually not suitable for large-scale training, and if they do offer large-scale training systems, you have to negotiate with them. As I said, this is a fun project, not an OpenAI-scale commercial product!

So I was looking for another way. As you may know, Google Colab's free tier is also no good for FLUX training. So I used the AI Toolkit template on RunPod to train the LoRA. I used an 80GB A100, and training on 100 pictures took around 3 hours.
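One small preparation step worth mentioning: most LoRA trainers, AI Toolkit included, expect a sidecar caption file next to every training image. Here is a minimal sketch of that step; the folder path and caption text are placeholders, not my actual setup:

```python
from pathlib import Path

# Placeholder path and caption -- adjust for your own dataset.
DATASET_DIR = Path("dataset/minimized_images")
CAPTION = "lowpoly 3d illustration, dark background"

def write_captions(dataset_dir: Path, caption: str) -> int:
    """Write a .txt caption next to every image file, as most LoRA
    trainers expect. Returns the number of captions written."""
    count = 0
    # sorted() materializes the listing before we start writing new files
    for img in sorted(dataset_dir.glob("*")):
        if img.suffix.lower() in {".png", ".jpg", ".jpeg", ".webp"}:
            img.with_suffix(".txt").write_text(caption, encoding="utf-8")
            count += 1
    return count
```

With the captions in place, the folder can be pointed at the trainer's dataset path directly.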

The files

If you’re interested in the dataset, I uploaded the whole dataset and pictures here. You can see there is a folder called minimized images, which contains 100 hand-picked images from the original dataset.

And if you’re looking for the LoRA, you can download and even test it here.

The 3D Generation

Well, after making the image generator, we needed a way of turning single images into 3D files, and of course the 3D format must be something acceptable to all devices.

OBJ and FBX are great formats when it comes to game development (especially if you're using the Unity game engine), but for WebGL and WebXR, the glTF or GLB formats are usually preferred.

The best option for this is fal.ai's TripoSR API. You upload your image, the model is called, and boom: you have a GLB file that can be used in every WebGL or WebXR project you can think of.
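For the curious, here is a hedged sketch of what calling it from Python can look like. The endpoint id, argument names and response fields are my assumptions, so check fal.ai's model page for the real schema before relying on this:

```python
# Hedged sketch of calling TripoSR through fal.ai.
# Endpoint id and field names below are assumptions, not verified API facts.

def triposr_arguments(image_url: str, output_format: str = "glb") -> dict:
    """Build the request payload for the (assumed) TripoSR endpoint."""
    return {"image_url": image_url, "output_format": output_format}

# Uncomment to actually run (needs `pip install fal-client` and a FAL_KEY):
# import fal_client
# result = fal_client.subscribe(
#     "fal-ai/triposr",  # assumed endpoint id
#     arguments=triposr_arguments("https://example.com/apple.png"),
# )
# print(result)  # response shape depends on the actual API
```

The returned GLB URL can then be dropped straight into a WebGL or WebXR scene.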

What’s next?

Since I personally am working on another project with Mann-E's proprietary models, I may stop this particular project right here. I have done almost everything I had in mind.

We still have the important topic of world generation using AI, but I guess it needs a more in-depth study and will not be easy at all. The commercialization of the whole thing is also a topic worth thinking about, but for now, I just want to keep the project fun.

Maybe in a few weeks I will return with a more commercial approach and some ideas about the whole blockchain and crypto space.

Let’s build Metaverse with AI: LLaMA Mesh is out of the picture

In the previous post I mentioned that I could not get LLaMA Mesh to work, right? Well, I finally did, and in this particular post I am going to explain what happened and why LLaMA Mesh is not a good option at all.

First, I will explain the workflow of the model's deployment, because I think it is important to know the flow. Then, I will tell you what I asked it and why I am very disappointed in this model (although I thought it might be a promising one).

The Flow

In this part, I'm explaining the flows I chose to make LLaMA Mesh work. The first flow was an absolute failure, but this morning I thought about every place I could host a custom model, managed to deploy and test it, and pretty much got disappointed.

The failed flow

First, I paid a visit to my usual go-to website, RunPod, and tried to use their serverless system to deploy the model with the vLLM package. I explained this in the previous post.

At first it didn't work, so I decided to go with a quantized version. That didn't work either. I know that if I spent a few hours on their website I would eventually get the model running, but to be honest, it wasn't really a priority for me at the moment.

The second failure

This wasn't quite a failure, though. After I couldn't deploy the model the one way I knew, I just headed over to OpenRouter. I guessed they might have the model, but I was wrong.

But I didn't surrender there either. I paid a visit to Replicate as well. While I was there, I noticed there are good models labeled as 3D, but none of them was LLaMA Mesh, the one I wanted.

The Successful One

Well, after a few unsuccessful tests, I was thinking of Google Colab. But I remembered that their free tier is not suitable for an eight-billion-parameter model that is not quantized.

So what was another option? Well, it all comes down to an email I received this morning. I was struggling to wake up as usual when I felt my phone vibrating. I picked it up and saw an email from GLHF. They have quite a good bunch of models in their always-on mode, and they also let you run your own models (if hosted on Hugging Face), so I decided to go with them!

The Disappointment

Now it is time to talk about how disappointed I got when I saw the results. The model is not really different from the other LLMs I covered in the previous post and had just one advantage: quantization in the output 3D objects.

The integer quantization, however, is only good for speeding up the generation and making the output a little more “lowpoly”. Otherwise, the final results were good only if you asked for basic shapes such as cubes or pyramids.

Should we rely on LLMs for 3D mesh generation at all?

The short answer is no. The long answer is that we need to work more on the procedures, understand the formats better, and then try different formats and ways of generating 3D meshes.

Mesh generation in general is only one problem. We also have problems such as polishing the output 3D object and applying materials to it, which can't easily be done by a large language model.

What’s next?

Now, I'm more confident about the idea I discussed before: take existing image models, fine-tune them on 3D objects, and use an existing image-to-3D model to make the 3D objects we need.

But I have another problem: what happens when we generate items but have no place to put them? So for now, I guess we need a world generation system, and we should start thinking about it.

Let’s build Metaverse with AI: We need to talk about 3D

In the previous post about building the metaverse with AI, we discussed different possibilities and the AI models we can access to make the virtual world. Although I personally am a big fan of 2D worlds, let's be honest: a 2D world is basically a perfect choice for a low-budget indie game and nothing more.

In this post, I am going to talk about the different models and methods I found for making 3D objects from text or image inputs. It was a fun experiment, and I guess it's worth sharing with the outside world in the form of a blog post.

My discoveries

The very first thing I want to discuss is my own discoveries in the field of 3D generation using AI. I always wondered: what are 3D objects, really? And I got my answer.

The simplest way of discovering this was to make different 3D files using a 3D creation/editing tool such as Blender and investigate the outputs further. While working with different files, I discovered that OBJ files are just simple text-based descriptions of the vertices and faces forming a shape.
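To show what I mean, here is a tiny square in OBJ form, together with a toy parser. The `v` lines are vertices and the `f` lines are faces that index into the vertex list (1-based):

```python
# A minimal OBJ file is just plain text: vertices and a face referencing them.
OBJ_SQUARE = """\
v 0 0 0
v 1 0 0
v 1 1 0
v 0 1 0
f 1 2 3 4
"""

def parse_obj(text):
    """Return (vertices, faces) from a simple OBJ string."""
    vertices, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            vertices.append(tuple(float(x) for x in parts[1:4]))
        elif parts[0] == "f":
            # Face entries may look like "1", "1/2", or "1/2/3";
            # the first number is the vertex index.
            faces.append(tuple(int(p.split("/")[0]) for p in parts[1:]))
    return vertices, faces
```

This text-first nature is exactly why an LLM can emit OBJ files at all: it is just another token sequence.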

Also, I recently found out about a research paper called LLaMA Mesh. To make it short, these people found out that LLaMA models are capable of generating OBJ files, so they fine-tuned the model further on 3D and OBJ file data to make it produce more coherent results when asked for 3D OBJ files.

Well, to find the best metaverse base model, I did a bunch of tests on different models, and here I am explaining every single test I've done.

Models I’ve tested

ChatGPT

Yes. ChatGPT is always my first go-to for AI, especially when it's about text. Since OBJ files are basically text files with information about the desired shape, I made a stop at ChatGPT's website and tested its capabilities at making 3D objects.

I used the GPT-4o mini, GPT-4o and o1 models. They have some understanding of OBJ creation, but it is very basic. The best shape I could get from OpenAI's flagship models was a simple cube, which you don't need any design skill to make in any 3D design program.

Claude

Anthropic's Claude was no better than ChatGPT. I personally have gotten much better code output from this model in the past, so I had it in mind that it would perform better at code generation.

But I was wrong. I still couldn't get anything better than basic shapes from this one either: cubes, cylinders or pyramids. These shapes aren't really complicated, and you can make them even without any 3D design knowledge, since Blender, 3ds Max, Maya, etc. all have them as built-in tools.

LLaMA

I read the paper and understood that this whole LLaMA Mesh game started when the researchers found out that LLaMA is capable of generating 3D OBJ files. It wasn't surprising to me, since LLaMA models are from Meta, and Meta is the company that started the whole metaverse hype.

In this particular section, I'm just talking about LLaMA and not the fine-tune. I used the 8B, 70B, 1B, 3B and 405B models from the 3.1 and 3.2 versions. I can't say they generated better results, but they showed a better understanding, which was really hopeful for me.

At the end of the day, putting their generations to the test, I got the same result again. These models are great when it comes to basic shapes; when it gets more complicated, the model seems to understand, but the results are far from acceptable.

LLaMA Mesh

I found an implementation of LLaMA Mesh on Hugging Face, which can be accessed here. But unfortunately, I couldn't get it to work. Even in their Space on HF, the model sometimes stops working without any errors.

It seems that due to the high traffic this model can attract, they limited the number of requests and tokens you can get from the model, and this is the main cause of those strange errors.

The samples on their page seem so promising, and of course we will give this model the benefit of the doubt.

Image to 3D test

Well, as someone who's interested in image generation using artificial intelligence, I like the image-to-3D approach more than text-to-3D. I also have another reason for this personal preference.

Remember the first blog post of this series, where I mentioned that I was a co-founder at ARmo? One of the most requested features from our customers was this: we give you a photo of our product, and you make it 3D. Although we had the best 3D design experts on the job, it was still highly human-dependent and not scalable at all.

Okay, I am no longer part of that team, but that doesn't mean I don't care about scalability concerns in the industry. Besides, I may be working in the same space again sometime.

Anyway, in this part of the blog post, I have to explain the different image generators I used to find out which models produce the best results.

Disclaimer: I do not put example images here; I just explain the behavior of each model. The image samples will be uploaded in future posts.

Midjourney

When you're talking about AI image generation, the very first name people usually mention is Midjourney. I personally use it a lot for different purposes, mostly comparing it with my own models.

In this case, with the right prompting and the right parameters, it made pretty good in-app screenshots of 3D renders, especially in my favorite style, “lowpoly”. Although I still need more time and study to make it better.

Dall-E

It is not really bad, but it has one big downside: you cannot disable prompt enhancement while using this model. This basically made me put Dall-E out of the picture.

Ideogram

It is amazing. The details and everything else are good, you can turn prompt enhancement off, and you can tune different parameters, but it still has problems understanding background colors. This was the only problem I faced with this model.

Stable Diffusion XL, 3 and 3.5

SD models perform really well, but you need to understand how to use them. When it comes to XL or 1.5, you must have a big library of LoRA adapters, text embeddings, ControlNets, etc.

I am not that interested in the 3 or 3.5 models, but even without any special additions, they perform well.

Something good about all Stable Diffusion models is that they are famous for being coherent, especially the fine-tunes. So something we may consider for this particular project might be a fine-tune of SD 1.5 or XL as well.

FLUX

FLUX has good results, especially when using the Ultra model. There are a few problems with this model (mostly licensing), and it also sometimes loses its coherency. I don't know how to explain this; it is like the times you press the brake pedal and it doesn't stop your car, yet there's nothing wrong with the brake system.

Despite these problems, it seemed to be one of the best options for generating images of 3D renders. It still needs more study.

Mann-E

Well, as the founder and CEO of Mann-E, I can't leave my own platform behind! But since our models are mostly SDXL-based, I guess the same notes apply here. Anyway, I performed the test on all three of our models.

I have to say it is not really any different from FLUX or SD, and the coherency is somewhat stable. What I have in mind is basically a way to fine-tune this model to generate better render images of 3D objects.

Converting images to 3D objects

I remember that almost two years ago, we used a technique called photogrammetry to make 3D objects from photos. It was a really hard procedure.

I remember we needed at least three cameras at three different angles, a turntable and some sort of constant lighting system. It needed its own room and its own equipment, and it wasn't really affordable for a lot of companies.

It was one step toward making our business scalable, but it was also really expensive. Imagine: making a 3D model of a single shoe takes hours of photography with expensive equipment. No, that's not what I want.

Nowadays, I am using an artificial intelligence system called TripoSR, which can convert one single image into a 3D object. I tested it, and I think it has potential. I guess we have one of the ingredients needed to make this magical potion of the metaverse.

Now we need to make a way for building the metaverse using AI.

What’s next?

It is important to figure out what's next. In my opinion, the next step is to find a way to make image models perform better at generating 3D renders. Designing a pipeline for image-to-3D is also necessary.

Also, for now I am thinking of something simpler: you enter the prompt as text, it generates images, the images are fed to TripoSR, and then we have the 3D models we need.
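That flow can be sketched as two stages wired together. Everything below is a placeholder, not a real API; in practice the injected callables would be the image model and TripoSR:

```python
# Hypothetical pipeline sketch: prompt -> image -> 3D.
# All names here are placeholders for illustration only.

PROMPT_TEMPLATE = (
    "{subject}, lowpoly, 3d illustration, dark background, "
    "isometric camera angle"
)

def asset_pipeline(subject, generate_image, image_to_3d):
    """Wire the two stages together. The callables are injected so any
    image model / image-to-3D backend can be plugged in."""
    prompt = PROMPT_TEMPLATE.format(subject=subject)
    image_url = generate_image(prompt)   # e.g. the FLUX LoRA
    return image_to_3d(image_url)        # e.g. TripoSR returning a GLB
```

Keeping the stages as injected callables means swapping Midjourney for the FLUX LoRA, or TripoSR for something else, touches nothing but the arguments.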

I guess the next actual step will be exploring the potential of universe/world generation by AI!

Let’s build Metaverse with AI: What we have?

In the previous post about building the metaverse with AI (link), I discussed the general points of view, what we need, and all that stuff. In this post, I am going to discuss the AI models we have that can be helpful in building the metaverse with AI, and also possible pipelines.

Also, remember that in this particular post, I will only be discussing the AI models I think can be helpful in building a virtual universe. So if your favorite AI isn't on the list, accept my apologies.

AI models for Metaverse

First of all, I think for building a virtual universe or metaverse using AI, we need these models:

  • Image generation models: These models will help us build everything imaginable. They are essential in pretty much every AI art project, and of course very useful for shaping the concept of our supposed Metaverse.
  • Music/SFX generation models: Imagine walking in a jungle. The landscape is pictured in your mind, right? Now go a little deeper: you hear the sounds in your head, too. This is what we call a soundscape in ambient or minimalistic music (as I wrote about before). Now consider the metaverse we're building. This newly made universe needs sounds; without sounds, a metaverse doesn't mean anything. We need AI models to generate music, sounds and soundscapes for us.
  • Vision Language Models: These are important as well. In building the metaverse, we need everything to be as automated as possible. Basically, we need the matrix, but in a good way. So a vision model can easily analyze a scene and generate the respective prompts for sound generators.
  • 3D Generation models: And why not? We are trying to make a complete 3D universe, and we need to make the 3D objects that let people build their desired universe, right? With AI, this will become a reality.

Now, let’s dive a little more in depth and look at what models we have access to!

Image Generators

If you ask me, this is the easiest type of model to find for this particular project. We have tons of proprietary options such as Dall-E 3, Midjourney or even FLUX Pro, which are all considered the best in the business.

On the open-source side, we've got Mann-E, Stable Diffusion and other useful models as well, right? This means that with a small search on the web, we can find the best way of visualizing our dreams of a made-up universe.

Also, based on my research into different models and hosting services, hosting models on Replicate or Modal is very easy. For other types of hosting, we may explore possibilities on CivitAI or Runware as well.

Music and Sound Effects generators

This is also not a rare thing. Although I am not really familiar with the music generation space, and I only know Stable Audio and Meta's MusicGen on the open side and Suno AI on the proprietary side, I guess we already have the best in the business.

Vision Models

Well, I personally use OpenRouter to explore the possibilities of these models, and to be honest, the best model I could find for vision tasks was nothing but GPT-4o.

Although there are good vision models out there, most of them are either very generic or very specific, and GPT-4o is right in the middle. We can use this model to describe different scenes in our metaverse. We may also utilize it as a guide through the metaverse, or to help us build 3D objects or soundscapes.
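As an example of that scene-to-soundscape idea, here is a hedged sketch that builds a GPT-4o vision request asking for a sound generator prompt. The instruction text is just an example, and the actual network call is left commented out:

```python
# Hedged sketch: asking GPT-4o to describe a scene screenshot as a
# prompt for an ambient sound generator. The instruction wording is
# illustrative; requires `pip install openai` and an API key to run.

def soundscape_request(image_url: str) -> dict:
    """Build a chat.completions payload for a GPT-4o vision call."""
    return {
        "model": "gpt-4o",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this scene as a short prompt for an "
                         "ambient sound generator."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }

# To actually call it:
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     **soundscape_request("https://example.com/scene.png"))
# print(resp.choices[0].message.content)
```

The text reply can then be fed straight into whichever music/SFX generator we pick.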

3D Generation Models

Well, these models are currently the rarest on the list. We may need two approaches for this specific task:

  • Text to 3D: Very similar to text-to-image; you just describe your scene or object and get a 3D object back. It may be a little buggy, but it will be a fun experiment to implement a model or pipeline for text-to-3D. It will let the residents of our metaverse generate assets of their choice as easily as typing what they have in mind.
  • Image to 3D: This is also a possibility. Currently, I use TripoSR a lot for making different 3D objects, but I still haven't found the best input images, settings or hyper-parameter tuning for getting the best results.

With 3D generators, our workflow becomes much, much easier than you may think. So we need another step, right?

What’s next?

Well, in the previous post we discussed the whole idea of the metaverse and what we need to build one. In this one, we just explored the AI tools we may be able to utilize. The next step will be a study of how we can build a metaverse AI model at all.

It will be the most challenging part of the project, but in my honest and unfiltered opinion, it is also the best part!

Let’s build Metaverse with AI : Introduction

It was 2021, and the whole suite of products under the Facebook flag went down for a few hours. I remember most of my friends started messaging me on Telegram instead of WhatsApp, and no new posts or stories were uploaded on Instagram.

A few hours passed, and everything went back to normal, except one thing: Zuckerberg made a huge announcement and told the whole world that Facebook would be known as Meta. He also announced the Metaverse, a weird alternate-life game where you can pay actual money and get a whole lot of nothing.

I personally liked the idea of the metaverse (and at the time, I was a co-founder of ARmo, an augmented reality startup), so as you may guess, it was basically my job to follow the trends and news about the metaverse and everything happening around it.

It's been a few days that I've been thinking about the metaverse again, because I have a strong belief that the whole thing will become a hype again, especially with this bull run on Bitcoin and other currencies. I have also concluded that the metaverse has a big missing piece, which I'm going to discuss in this post.

A little backstory

Since I started Mann-E as an AI image generation platform, a lot of people have messaged me about connecting the whole thing to the blockchain. Recently, I moved the whole payment system to cryptocurrencies, and I'm happy with what I've done, not gonna lie.

But for being on the chain, I had different thoughts in mind, one of them being an ICO, or even an NFT collection. These may seem cool, but they also always attract their righteous share of criticism and skepticism. I don't want to be identified as a bad guy in my community, of course, so I left those ideas for good.

As you read prior to this paragraph, I have a history in the XR (extended reality) business, and currently I have my own AI company. I was thinking about the connection between the Metaverse and AI, and the opportunities in both!

Before going deep, I have to ask a question…

What did we need to access the metaverse?

In 2021, when it was the hot topic of every tech forum, if you asked “Okay then, how can I enter the metaverse?”, no one could answer correctly. At least in the Iranian scene, it was like this.

I did a lot of research, and I found that you need these to enter a metaverse of your choice:

  • A crypto wallet: Which is not a big deal. Pretty much everyone who’s familiar with tech and these new trends, owns a crypto wallet. They’re everywhere. You can have them as web apps, native apps, browser extensions and even in hardware form. If you want to waste a few hours of your life, you also can build one from scratch.
  • Internet browser: Are you kidding me? We all have one. Most of the applications we used to install on our computers have turned into SaaS platforms. We just need a good browser.
  • A bit of crypto: The problem, in my opinion, starts here. Most of these projects had a token built on the Ethereum network (or accepted Ethereum directly), but some had native currencies that were impossible to buy on well-known exchanges, which, as you may guess, increased the chance of scams! In general, it was a little odd to be forced to pay to enter the verse without knowing what was happening inside. Here is an example: imagine you are in Dubai and you see a luxurious shopping center, but you have to pay $100 just to enter, and then you do some window-shopping and leave disappointed. It's just a loss, isn't it?

But this is not all of it. A person like me, who considers himself a builder, needs to explore the builder opportunities as well, right? Now I have a better question, and that is...

What do we need to build on the Metaverse?

In addition to a wallet, a browser and initial funds for entering the metaverse, you also need something else: metaverse development skills, which are not easy to achieve.

If we talk about the programming side of things, most of the work can easily be done using libraries such as ThreeJS or similar ones. If you have a development background and access to resources such as ChatGPT, mastering the new library will not take more than a week.

But there was something else that occupied my mind: 3D design skills, which are not easily achievable for everyone; you may spend years mastering them.

And this is why I think the Metaverse needs AI, as I will explain in the next section.

The role of AI in metaverse

This is my favorite topic. I have been utilizing AI in different ways since 2021. For example, I explained how I could analyze electrical circuits using AI. Also, if you dig deeper into my blog, you may find that I even explained my love of YOLOv5 models.

But my first serious generative AI project came when GitHub's Copilot became a paid product and I was too cheap to pay for it, so I built my own. In that particular project, I utilized a large language model called BLOOM to generate code for me. It was the beginning of my journey in generative artificial intelligence.

A few months after that, I discovered AI image generators. That led me to the point where I could start my own startup with a simple ten-dollar fund. Now, I have bigger steps in mind.

Generative AI for metaverse

There is a good question, and that is: how can generative artificial intelligence be useful in the metaverse? I have a list of opportunities here:

  • Tradebots: Since most metaverse projects offer their own coin or token, we may be able to utilize AI to provide some sort of advice or prediction for us. Honestly, this is my least favorite function of AI in the metaverse; I never was a big fan of fintech and similar stuff.
  • Agents: Of course, when we're entering the matrix (sorry, I meant the metaverse), we need agents to help us find a good life there. Jokes aside, agents can help us in different ways, such as building, finding resources, or interacting with the surrounding universe.
  • Generating the metaverse: And honestly, this is my favorite topic of all. We may be able to utilize different models to generate the assets we need to build our metaverse. For this particular one, we need different models: not only LLMs, but image generators, sound generators, etc.

What’s next?

The next step is a study of every resource or model that can be somehow functional or useful in this space. We may also need to explore the possibilities of different blockchains and metaverses in general. But first, the focus must be on AI models. The rest will follow automatically 😁

Privacy-focused AI is all we need

I remember that in 2020 and 2021, due to Elon Musk's interest in crypto and also the Metaverse hype, people, especially the ones who had no idea about crypto or blockchain, started investing in the crypto markets. Although it looked a little bit like a failure, people made profit out of it.

But that is not my point here. What I'm going to talk about is that we need crypto as a form of secure payment for AI services and platforms. I guess I will over-explain a little in this post, but I promise it won't be too much.

My AI background

It was March 2023 when I founded Mann-E, an AI image generation platform letting people make images from their ideas, just like good old Midjourney. We developed our own models, bootstrapped the company, and built a community of early adopters.

I personally tried to get in touch with different AI companies, develop different models, and make different products. Everything in the generative AI space has a special place in my heart.

But on the other hand, I also have a background in FLOSS (Free/Libre and Open Source Software) activism, and something felt off for me while working on all these AI products.

Privacy and AI

To be honest with you, pretty much none of the major AI platforms (OpenAI, Anthropic, Midjourney, etc.) are private. They all collect data and use it to improve their models, and in return, they give you basically nothing but fancy images or LLMs that are terrible at making a dad joke.

The platform we need is a platform with these details or characteristics:

  • Sign up/Sign in as normal
  • No email verification (in order to make it possible for people who are using weird mail servers or fake email addresses)
  • Crypto only payments.

So now you may ask: isn't this alienating people who pay in fiat? Well, I have to say that a lot of platforms have already alienated people from different corners of the world who have no access to PayPal or other payment services. So I guess it won't be a big deal!

On the other side, there are enough platforms accepting fiat currency. If you want to pay in fiat, there are tens of thousands of options in front of you. But what happens when you want to pay in crypto? You face a whole lot of nothing.

Now, what am I going to do?

Well, more than a year ago, at an event, I talked about how OpenAI, Midjourney, Meta, Microsoft and NVIDIA are on their way to becoming the Big Blue of the AI industry. But thinking to myself, my approach wasn't really different from those guys either.

Now, I have decided to make a new platform that is absolutely privacy-focused: it won't record prompts, it won't make you confirm your email, and all payments will be in crypto (BTC, ETH and TRX seem good for a start).

Become an early adopter

As always, I need people to become early adopters. So I made this Google Form (link) to ask you to become a part of this project (for this one, please provide a real email address 😂). You can also support this project and accelerate the process of building it.

Conclusion

The project currently has no name, so I'd be happy to hear your suggestions. Naming aside, I personally think this concept will become more popular in the following years. Especially with the growth of Telegram airdrops and meme coins, crypto will have a new life.

I guess it is time for us to act and make crypto a great payment tool for modern technology!