Look at this:
How do you think this apple was made? Excellent question. After the previous post, I said we should put LLMs out of the picture for now. We also needed to talk about 3D, because it is important in the whole metaverse space, right? Today I finally did it. I trained a LoRA on FLUX and then tried to make 3D objects from what the AI model is capable of generating.
The Image Generator
In this part, I talk specifically about the image generation procedure. It is a good chance to share the experience, and the open source models created in the process are linked in this post as well.
To make an image generator model, we need a base model. Since the whole Generative Metaverse project was a fun project for me and not a serious commercial one, I chose FLUX. However, if I go to the blockchain/crypto side of things (probably on the TON network), I may consider SDXL as the base in order to avoid problems with commercial use.
Anyway, everything here is pretty standard. These are pretty much the same steps I took to make the early versions of Mann-E. So I guess it is worth sharing one more time, right?
The Dataset
AI models are just a bunch of boring mathematical functions, and they become amazing when they are fed good data. So we needed to create a dataset. As always, the best data generator I could use was Midjourney, so of course I headed over to their website and recharged my account.
I played with a good number of prompt combinations to find the one that best fits what I had in mind. After a lot of tweaking, I got this: <subject>, lowpoly, 3d illustration, dark background, isometric camera angle.
Here is a sample of what was generated with this prompt formula:
After that, I used ChatGPT to generate a list of objects we may use or see every day. Then I made a prompt list, automated the image generation procedure, and got around 800 pictures. Now it was time for training!
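To make the prompt list step concrete, here is a minimal sketch of how the formula above can be filled in from a list of objects. The function name and the sample subjects are made up for illustration, and the deduplication step is my own assumption, not something the original pipeline necessarily did.

```python
# The prompt formula from the post, with <subject> as a placeholder.
PROMPT_TEMPLATE = "{subject}, lowpoly, 3d illustration, dark background, isometric camera angle"


def build_prompts(subjects):
    """Return one prompt per unique, non-empty subject, keeping the input order."""
    seen = set()
    prompts = []
    for raw in subjects:
        # Normalize so "Apple " and "apple" count as the same object.
        subject = raw.strip().lower()
        if not subject or subject in seen:
            continue
        seen.add(subject)
        prompts.append(PROMPT_TEMPLATE.format(subject=subject))
    return prompts


if __name__ == "__main__":
    # Hypothetical everyday objects, standing in for the ChatGPT-generated list.
    for prompt in build_prompts(["apple", "coffee mug", "Apple ", "", "desk lamp"]):
        print(prompt)
```

Each resulting line can then be sent to whatever image generator you automate.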
The Training
At first, I was thinking about using Replicate or fal.ai to train the LoRA. Honestly, they provide easy and affordable ways of training a LoRA on FLUX (and to my knowledge, you may also be able to have SD 1.5 and SDXL LoRAs trained on Replicate), but there is one big problem.
These websites are usually not suitable for large scale training, or if they do offer large scale training, you have to negotiate with them. And as I said, this is a fun project, not a big OpenAI scale commercial product!
So I was looking for another way. As you may know, Google Colab’s free tier is also no good for FLUX training. So I used the AI Toolkit template on RunPod to train the LoRA. I used an 80GB A100, and it took around 3 hours on 100 pictures.
The Files
If you’re interested in the dataset, I uploaded the whole dataset and pictures here. There is a folder called minimized images, which contains 100 hand-picked images from the original dataset.
And if you’re looking for the LoRA, you can download and even test it here.
The 3D Generation
Well, after making the image generator, we needed a way of turning single images into 3D files, and of course the 3D format must be something that works on all devices.
OBJ and FBX are great formats when it comes to game development (especially if you’re using the Unity game engine), but for WebGL and WebXR, the glTF or GLB formats are usually preferred.
The best option I found for this is fal.ai’s TripoSR API. You upload your image, the model gets called, and BOOM, you have a GLB file which can be used in every WebGL or WebXR project you can think of.
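If you want to sanity check the file you get back before dropping it into a WebGL or WebXR scene, here is a minimal sketch. It does not call fal.ai at all. It only reads the 12-byte header that the glTF 2.0 binary format defines (the magic bytes glTF, a version number, and the total file length, all little-endian). The function names are my own and not part of any library.

```python
import struct

GLB_MAGIC = b"glTF"   # first four bytes of every GLB file
GLB_HEADER_SIZE = 12  # magic (4 bytes) + version (uint32) + length (uint32)


def read_glb_header(data: bytes):
    """Parse the GLB header and return (version, declared_length)."""
    if len(data) < GLB_HEADER_SIZE:
        raise ValueError("too short to be a GLB file")
    magic, version, length = struct.unpack("<4sII", data[:GLB_HEADER_SIZE])
    if magic != GLB_MAGIC:
        raise ValueError("missing glTF magic bytes")
    return version, length


def looks_like_glb(data: bytes) -> bool:
    """Return True if data has a glTF 2.0 header whose length matches the data."""
    try:
        version, length = read_glb_header(data)
    except ValueError:
        return False
    return version == 2 and length == len(data)
```

For example, with a downloaded file saved under a hypothetical name like apple.glb, you could run looks_like_glb(open("apple.glb", "rb").read()). This only checks the container header, not whether the mesh inside is any good.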
What’s next?
Since I am personally working on another project with Mann-E’s proprietary models, I may stop this particular project right here. I did almost everything I had in mind.
We still have the important topic of world generation using AI, but I guess it needs a more in-depth study and will not be this easy at all. The commercialization of the whole thing is also something to think about, and for now, I just want to keep the project fun.
Maybe in a few weeks, I will return with a more commercial approach and also some ideas about the whole blockchain or crypto space.