Let’s build the Metaverse with AI: Introduction

It was 2021, and every product under the Facebook flag went down for a few hours. I remember most of my friends just started messaging me on Telegram instead of WhatsApp, and no new posts or stories were uploaded to Instagram.

A few hours passed and everything went back to normal, except one thing. Zuckerberg made a huge announcement and told the whole world that Facebook would be known as Meta, and he also announced the Metaverse as a weird alternate-life game where you can pay actual money and get a whole lot of nothing.

I personally liked the idea of the metaverse (at the time, I was a co-founder of ARMo, an augmented reality startup company), so as you may guess, it was basically my job to follow the trends and news about the metaverse and what happened around it.

For a few days now I have been thinking about the metaverse again, because I have a strong belief the whole thing will become a hype again, especially with this bull run on Bitcoin and other currencies. I also concluded that the metaverse has a big missing piece, which I’m going to discuss in this post.

A little backstory

Since I started Mann-E, an AI image generation platform, a lot of people have messaged me about connecting the whole thing to the blockchain. Recently, I moved the whole payment system to cryptocurrencies, and I’m happy with what I’ve done, not gonna lie.

But for being on the chain, I had different thoughts in mind, and one of them was an ICO, or even an NFT collection. They may seem cool, but they always attract a righteous amount of criticism and skepticism as well. I don’t want to be identified as a bad guy in my community, of course, so I left those ideas for good.

As you read above, I have a history in the XR (extended reality) business, and currently I have my own AI company. I was thinking about the connection between the Metaverse and AI, and the opportunities in both!

Before going deep, I have to ask a question…

What did we need to access the metaverse?

In 2021, when it was the hot topic of every tech forum, if you asked Okay then, how can I enter the metaverse?, no one could answer correctly. At least in the Iranian scene, it was like this.

I did a lot of research and found that you need these things to enter a metaverse of your choice:

  • A crypto wallet: Not a big deal. Pretty much everyone who’s familiar with tech and these new trends owns a crypto wallet. They’re everywhere: web apps, native apps, browser extensions and even hardware. If you want to waste a few hours of your life, you can even build one from scratch.
  • An internet browser: Are you kidding me? We all have one. Most of the applications we used to install on our computers have turned into SaaS platforms, so all we need is a good browser.
  • A bit of crypto: In my opinion, the problem starts here. Most of these projects had a token built on the Ethereum network (or accepted Ethereum directly), but some had native currencies which were impossible to buy from well-known exchanges, and as you guessed, that increased the chance of scams! In general, it was a little odd to be forced to pay to enter the verse without knowing what was happening there. Here is an example. Imagine you are in Dubai and you see a luxurious shopping center, but you have to pay $100 just to enter. You do some window-shopping and leave disappointed. It’s just a loss, isn’t it?

But this is not all of it. A person like me, who considers himself a builder, needs to explore the builder opportunities as well, right? Now I have a better question, and that is…

What do we need to build on the Metaverse?

In addition to a wallet, a browser and initial funds for entering the metaverse, you also need something else: Metaverse Development Skills, which are not easy to achieve.

If we talk about the programming side of things, most of the work can easily be done using libraries such as ThreeJS or similar ones. If you have a development background and access to resources such as ChatGPT, mastering a new library will not take more than a week.

But there was something else occupying my mind: 3D Design Skills, which are not easily achievable and may take years to master.

And this is why I think the Metaverse needs AI, as I will explain in the next section.

The role of AI in the metaverse

This is my favorite topic. I have been utilizing AI since 2021 in different ways. For example, I explained how I could analyze electrical circuits using AI. Also, if you dig deeper into my blog, you may find I even explained my love of YOLOv5 models.

But my first serious generative AI project came when GitHub’s Copilot became a paid product and I was too cheap to pay for it, so I built my own. In that particular project, I utilized a large language model called BLOOM to generate code for me. It was the beginning of my journey in generative artificial intelligence.

A few months after that, I discovered AI image generators. That led me to the point where I could start my own startup with just ten dollars of funding. Now, I have bigger steps in mind.

Generative AI for the metaverse

There is a good question, and that is How can generative artificial intelligence be useful in the metaverse? I have a list of opportunities here:

  • Trade bots: Since most metaverse projects offer their own coin or token, we may be able to utilize AI to provide some sort of advice or prediction for us. Honestly, this is my least favorite function of AI in the metaverse; I was never a big fan of fintech and similar stuff.
  • Agents: Of course, when we’re entering the matrix, sorry, I meant metaverse, we need agents helping us find a good life there. Jokes aside, agents can help us in different ways, such as building, finding resources or interacting with the surrounding universe.
  • Generating the metaverse: And honestly, this is my favorite topic of all. We may be able to utilize different models to generate the assets needed to build our metaverse. For this particular one, we need different models: not only LLMs, but image generators, sound generators, etc.

What’s next?

The next step is studying every resource or model which can be somehow functional or useful in this space. We may also need to explore the possibilities of different blockchains and metaverses in general. But first, the focus must be on AI models. The rest will follow automatically 😁

FrontBricks, my LLM-based weekend project, inspired by Vercel’s V0

Since 2022, there has been a hype around generative artificial intelligence, and it has resulted in a bunch of cool projects, although a lot of us may remember that GitHub’s Copilot is much older. Back then, I wrote an article about how I was too cheap to pay $10 a month for Copilot, so I made my own!

That was somehow the beginning of my interest in the AI field. I have spent around four years in this field, and like most of us, I have tried to utilize different tools and products. In this article, I’m talking about FrontBricks, my newest product, and how it started as a weekend project!

A little bit of history

In 2023, I launched Mann-E, an AI image generator based on its own models (more information is provided on the website). A few months ago, I also launched Maral, a 7-billion-parameter LLM specialized for the Persian language (the language I speak).

Also, around a month ago, I did some tests with brand new LLMs such as LLaMA 3 in order to make Mann-E Search, which can be somewhat of an alternative to Perplexity, but with a little difference (it doesn’t provide a chat interface).

I guess this clarifies how deeply I am immersed in the AI space and how much I love generative AI! Now we can talk about FrontBricks!

What is FrontBricks?

You may be familiar with Vercel’s V0, a generative AI tool that helps people generate frontend components. I liked their idea, joined their waitlist, and a couple of days later, I got access to the platform.

It was a cool experience, and some sparks formed in my head. I found that pretty much all LLMs are really good at code generation, and we can utilize one to generate the code and another to find out whether the code is valid or not.

That was my whole idea, so I sat at my desk and started to code a basic tool that sends my prompts to OpenAI’s API to generate code, and then another one to do the validation, using LLaMA 3 70B and GPT-4 as well (I used OpenAI again).
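Roughly, the generate-then-validate idea can be sketched like this (a minimal sketch; the function names are my own invention and the model calls are stubbed out, since the real version talks to OpenAI’s and LLaMA 3’s APIs):

```python
def generate_component(prompt: str) -> str:
    """Stub for the generator model (the real one calls an LLM API)."""
    return '<button class="btn">' + prompt + "</button>"

def validate_component(code: str) -> bool:
    """Stub for the validator: here, just a sanity check that the output
    looks like markup. The real version asks a second LLM to judge it."""
    stripped = code.strip()
    return stripped.startswith("<") and stripped.endswith(">")

def generate_with_validation(prompt: str, retries: int = 3) -> str:
    """Keep generating until the validator accepts the result."""
    for _ in range(retries):
        code = generate_component(prompt)
        if validate_component(code):
            return code
    raise RuntimeError("could not produce a valid component")
```

The point of the loop is that the generator and the validator don’t have to be the same model; any pair of models (or even a model plus a dumb rule-based check) fits this shape.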

I also found another bottleneck: JSX code generation. I did a little research and found that it is not really a big deal; using the power of regex and text manipulation, it’s easily possible to turn pure HTML into JSX!
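For illustration, a tiny regex-based converter could look like this (my own sketch, not FrontBricks’ actual code; it only handles a couple of the differences between HTML and JSX):

```python
import re

def html_to_jsx(html: str) -> str:
    """Convert a few HTML-isms to their JSX equivalents with regex."""
    jsx = html
    # JSX uses className and htmlFor instead of the HTML attribute names
    jsx = re.sub(r"\bclass=", "className=", jsx)
    jsx = re.sub(r"\bfor=", "htmlFor=", jsx)
    # Void elements like <br> or <img ...> must be self-closing in JSX
    jsx = re.sub(r"<(br|hr|img|input|meta|link)\b([^>]*?)\s*/?>", r"<\1\2 />", jsx)
    return jsx
```

A real converter needs more cases (inline `style` strings, HTML comments, boolean attributes), but the principle is the same: pure text manipulation, no parser required.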

I had pretty much everything written, so I just switched to my work environment, created a simple Rails app, and connected it to my backend module. Now, I have a platform which can be an alternative to Vercel’s V0!

Today, I am announcing FrontBricks, but I have to say that before this post, around 211 people gave me their email addresses to be put on the early adopters list, and I gave them access to the platform earlier this week!

My birthday (May 30th) was this week, so I guess it can also be a bit of a surprise for my friends and the community.

How can I access FrontBricks?

Well, it is easy. Just go to frontbricks.com and create an account (sign up link). Then confirm your email and boom, you have unlimited access to FrontBricks, completely free of charge!

You can generate a component, then improve it, and every time you feel you need a new component, you can easily create a new code snippet. It is as easy as drinking a cup of tea.

Future Plans

Since this project isn’t monetized yet, the very first thing that comes to my mind is a way to monetize it (you can still donate in crypto through this link). A good business model can help this project become much better.

I am also thinking of releasing an open source model based on the data provided by FrontBricks, because one of the reasons I coded this project is that I couldn’t find a model specialized for front-end generation!

These are my concerns for now. If you have any other ideas, I’m open to hearing them.

Conclusion

I have a haystack of ideas in my mind, and when I find enough time, I implement them. Mann-E and FrontBricks are just two of the projects I have made, and to be honest, Mann-E, with around 6,000 users and more than 50,000 generated images, is somehow one of my most successful projects.

FrontBricks has potential, but I guess I can’t keep it up alone. I’m open to technical and business ideas as well, so if you have any in mind, feel free to send me a message; my email is haghiri75@gmail.com 😁

You don’t owe money to the brush company if you sell your art

In my previous post, I explained how the future of content is AI. Also, in an older post, I talked about how AI-generated content can revolutionize the world of interior design and architecture. In this post, however, I’m not talking about those topics; I’m going to talk about legal issues and questions around AI-generated art, and there will be a twist at the end. Wait for it 😁

AI content creators are concerned about legal stuff

Yes, they are. And if they are not, they are making a very big mistake. When you create any form of content, one of the most important aspects of publishing it is the legal side.

This legal stuff is usually about the rights of content creators over their content, and also the rights of the companies who develop the tools for content creation.

In this part of the article, I am talking about what I consider the most important legal topics in this new generation of content creation.

The Ownership

The very first time I posted about my own AI art generator model, Voyage, in a Telegram chat room, one of my friends asked Who owns the generated art? You? Or us? I explained that since you have to run the generator on your own computer, you are the owner of the generated art and you don’t owe me anything.

By the way, most of them gave me huge credit when they posted their artwork on social media or even in that very same chat room.

But I found out that most proprietary art generators, like Midjourney, don’t act like that. They make you pay if you want to own what is already yours. Let me make this a little clearer.

Imagine you buy a nice set of brushes and paints. You paid for them, right? Now you make a beautiful piece of art with those tools, and you want to sell it. Imagine the brush company asks for shares! Isn’t that hilarious? Of course it is. I believe this must be considered by AI artists who use these proprietary tools to generate content.

Use by and for minors

Another important topic in any new generation of content creation tools is always how will minors use it?, and it concerns me a lot (especially since Stable Diffusion 2.0 has no NSFW filtering). So what should we do for our younger friends? A lot of content creation platforms like YouTube, Pinterest, Instagram, DeviantArt, etc. have their own policies and filters for public content distribution.

For example, I’m a big fan of horror movies, and when I search for content about them, such as reviews, fan art and even scripts, I usually face age confirmation pages and modals. Now you can see where I am going with this topic.

AI is dumb; it cannot understand what it generates, and we need a little more human oversight of the generated content. For example, in Stable Diffusion’s Discord, I remember that reacting to NSFW content with a certain emoji could mark it as potentially harmful, and then they could improve their NSFW filtering system.

Plagiarism

I guess you thought I don’t give a fine F about copyright, right? No, that’s not true. I believe artists and content creators should be credited well. So let’s talk about another topic which seems very important.

The very first day I started AI content generation, there was only one good free tool (in every sense of the word free), and it was VQGAN+CLIP. It was a great tool for making art, and even today it has a unique quality compared to other tools.

But even in those days, I had a huge concern: what if I plagiarize another artist’s work? This concern was at its highest when I figured out that adding the names of well-known artists such as Greg Rutkowski, James Gurney, Thomas Kinkade, Salvador Dali and thousands more can alter the result! So as both AI generator developers and artists, we should pay attention to this matter as well.

And last but not least: Fake Art!

One of my favorite activities is trying new artist names in my prompts. I love to see how their minds would paint what I’m thinking of. But there is a problem: what if I say this is an unreleased painting by a well-known artist? This can lead to huge financial fraud.

I can never stop thinking about these matters, and as a person who developed a model and generated tons of content with AI, I never want to be classified as a fraud, a scammer, or even a person who disrupts the work of other artists.

I guess we’ve talked enough about legal issues; let’s get to the big plot twist of this blog!

Big Twist!

The young blonde woman in the picture is beautiful, isn’t she? I made her using my model Voyage, which I introduced earlier in this blog post. You want to use Voyage and create your own art? Fine. You won’t owe me anything if you do. And if you want to use it in Google Colab, here is the link to the notebook!

Voyage is trained on data crawled from OpenArt, and as you can see, it is a model with a very artistic feel compared to other available models.

Conclusion

In this blog post, we discussed one of the most important aspects of AI content creation: the legal stuff. We also have to fight for our ownership rights as content creators. In my personal opinion, it is okay to ask for money for a service; as developers and companies, we pay a lot for infrastructure and computing power. But if we make our users pay us shares, I guess that’s not fair.

On the other hand, we need more and more open source tools for AI content creation. Big tech companies rule this market as well, and that is never good.

I hope this article was useful, and if you’d like more content like this, please consider sharing it with your friends 🙂

Severus does the magic

It has not been long since I told you that I was too cheap to pay $10 a month for GitHub Copilot and came up with the idea for Severus, my own AI pair programmer. It was something that went boom. My blog usually doesn’t get more than 20 or 30 viewers a day (at its best), and for almost a week, I had more than 200 views per day. Since people showed interest in yet another AI pair programmer, I have decided to continue working on Severus more seriously.

Severus code generation
Severus is now capable of being accessed as an API

My plans for Severus

So in this article, I discuss a bunch of problems I may face on the long path of creating Severus and making it available as end-user software. There are some serious concerns; for example, when I talked about the idea of Severus with one of my colleagues, he told me he was concerned about the confidential code he has written.

Almost all of these concerns are valid (except those of the one who thinks this whole process is handled by the Illuminati), and they are my concerns as well. The next problem I may face is scaling, so I will perhaps need to hire a well-educated DevOps engineer.

In this section, I explain all of my serious concerns and needs, and I expect some help from you, the kind readers of this article.

The Community

Creating a community around something which is honestly a weekend project doesn’t seem like a good idea. You may say the same thing happened with the Linux kernel. You’re right, but this is a little different: there are tons of tools which may work much better than Severus.

Also, it is important to choose the right place for the community. A subreddit? A Discord server? A room on Matrix? An internet forum? Honestly, I have no idea.

So this is the biggest concern for me. The community!

Performance and text-generation glitches

The performance is good, thanks to the Hugging Face inference API. Actually, knowing that the Hugging Face API exists helped me with the implementation. But I still have some concerns here.

My main concern is that BLOOM sometimes starts generating text which cannot be classified as code. I have tried different ways to get better results, but I still need some way to verify that the generated result is code, and not prose which merely includes code. And this is really the hard part, I guess.

For this purpose, I may need some help. Validation must be done on the results in order to get a good AI pair programmer; otherwise it’ll become more like an annoying colleague, or an intern who knows something but can’t gather their mind.
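As a first, dumb step before involving a second model, even a heuristic can filter out obvious prose. This is just a sketch of the idea, not Severus’ actual validator:

```python
import re

def looks_like_code(text: str) -> bool:
    """Naive heuristic: count code-ish lines vs. prose-ish lines."""
    code_hits = 0
    prose_hits = 0
    # patterns that rarely appear in plain English sentences
    code_pattern = re.compile(r"(^\s{4,}|[{};]|=|def |class |import |return |\(\))")
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if code_pattern.search(line):
            code_hits += 1
        # long runs of space-separated words with no symbols look like prose
        elif len(stripped.split()) > 8 and not re.search(r"[=(){};]", stripped):
            prose_hits += 1
    return code_hits > prose_hits
```

A heuristic like this will misfire on plenty of edge cases, which is exactly why a second model (or a real parser per language) is still needed on top of it.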

The Product

And my final concern/plan is the product. Currently, I only have a simple application which runs on port 5000 on my laptop. Nothing more. There is no authentication, no user validation, no monitoring, no scaling, no infrastructure. Basically, a MacBook Pro which runs tons of programs daily, and Severus is currently one of them.

I had a VS Code extension in mind. I also thought of a web app as the MVP, where you can easily copy your code and use it in your very own projects (although, of course, it won’t be the best choice for a confidential piece of code).

Although I have ideas in mind, I still need more brainstorming about how this project should be delivered to you as a product.

Conclusion

I still have a lot to do on this project. There might be some language detection, to check whether the generated output is code or not, and also some more code validation to avoid mixing different programming languages.

Overall, this is one of the most difficult and, at the same time, most fun projects I’ve ever done. I won’t give up on it, even if it seems like a painful and expensive hobby to the people around me 🙂

 

I was too cheap to pay $10 a month for copilot, so I made my own

In mid-2021, there was a revolution in coding. As a lazy programmer who always needed a fast and smart assistant, I was really happy to have GitHub Copilot in my arsenal of coding tools, so I was one of the early adopters of the whole idea of an AI pair programmer.

Everything was fine with Copilot. I wrote tens of thousands of lines of code in the last year, and I could build a lot of projects which would have been impossible without a good, smart and fast pair programmer. But everything changed last week, when I got an email from GitHub telling me I can’t have free access to Copilot anymore.

It was a sad moment in my life, but I had different ways of adapting to and accepting the reality. First, I thought about paying $10 a month for a GitHub premium account, but since I wouldn’t use most of GitHub’s premium options, it wasn’t a suitable solution for me. I also checked out Tabnine and Kite, and those didn’t work out for me either.

My own copilot!

Say hello to Severus, my new AI pair programmer!

First, let me talk about the name a little. I have been watching the Harry Potter franchise recently, and my favorite character in the whole franchise is none other than Severus Snape, so I named my AI pair programmer after him. But I know you might be curious about how I made it, so let’s find out!

The language model

First, I needed a language model capable of generating code. At first, I had OpenAI’s GPT-3 in mind, but I remembered that, for various reasons, I can’t use it. Then I turned to free language models. I used GPT-J, and although it could understand code, it didn’t seem like a very high-accuracy model to me.

Then, I realized that Meta had released the OPT-175B model. I put some of its functionality to the test. It is a really impressive language model, but it works well as the core of a chatbot or a blog-post generator (or maybe a prompt engineering tool for text-to-image models), not as a code generator.

Then, I found my saving angel, built by a lot of open-source engineers and enthusiasts around the world: it’s none other than BigScience’s BLOOM.

Code tests and inference

Like most of you may have done, I first tried to complete a love story with the model. It was cool. Then I tried to create a friendly, a helpful, an idiotic and an evil chatbot with the model. All worked out perfectly. Back then, I did not have any limitations on Copilot, so I didn’t care about code generation.

When I found myself in the misery of not having my beloved AI pair programmer, I tried some basic Python code generation with BLOOM. It was fine. Then I tested PHP, Ruby and JavaScript as well. I found that it works pretty well, so I decided to write some simple inference code over the API.
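For reference, a minimal inference call against the Hugging Face Inference API for BLOOM can be built with nothing but the standard library. This is a sketch, not my exact code: you need your own API token, and the parameters shown are just common text-generation options.

```python
import json
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/bigscience/bloom"

def build_request(prompt: str, token: str, max_new_tokens: int = 64):
    """Prepare a text-generation request for the Hugging Face Inference API."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens,
                       "return_full_text": False},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )

def complete(prompt: str, token: str) -> str:
    """Send the prompt and return the generated continuation."""
    with urllib.request.urlopen(build_request(prompt, token)) as resp:
        out = json.loads(resp.read())
    # the API returns a list like [{"generated_text": "..."}]
    return out[0]["generated_text"]
```

Wrapping the raw HTTP call like this also leaves room to plug in a validation step on the returned text before showing it to the user.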

Code generation may go wrong

Since I didn’t fine-tune the model (and I don’t have the resources to), it may glitch sometimes. For example, when you don’t really pay attention to your code formatting, it might generate an explanation of the code instead.

What happened to me was that it started explaining the code in a tutorial format (and I bet the Python code came from the Towards Data Science website, since the writing style was pretty similar).

In general, I may need a solution for this as well.

Will it be open source?

Yes. At least it’ll be partly open-sourced in the near future. But more than being open source, it will be free (as in non-paid), and I guess that may be a pro for the tool. I haven’t paid a single penny for the model, so why should I make you pay for it? By the way, I am open to donations and technical help from the community.

Future Plans

  • The API
  • VSCode extension
  • A community website (or discord server)

Conclusion

In the end, it seems we have a lot to do with these brand new language models. I have found my way to create a free, reliable and smart AI pair programmer, and of course, I need some help along the way.

I warmly thank you for the time you’ve spent reading my article, and I openly welcome your comments and ideas.

A to Z of making an intelligent voice assistant

It was 2011, a sad year for a lot of Apple fans (me included), because Steve Jobs, one of the original co-founders of Apple Computer, died in October of that year. It could have been even sadder if there had been no iPhone 4S and its features that year.

A few years prior to the first introduction of Siri (which was introduced with the iPhone 4S), a movie called Iron Man came out from Marvel Studios. Unlike in the comic books, Jarvis wasn’t an old man in this movie; Jarvis was an A.I. I’m not sure if the movie inspired companies to add voice assistants to their systems or not, but I’m sure a lot of people bought those phones or tablets just to have their own version of Jarvis!

Long story short, a lot of engineers like me were under the influence of the MCU (Marvel Cinematic Universe) and Apple, and wanted to go about having a voice assistant a little differently! Instead of buying an iPhone 4S, we preferred to start making our own voice assistants.

In this article, I’m discussing the basics you need to learn to make your very own version of Siri. I warn you: there will be no code, at least in this one!

How does a voice assistant work?

In order to make something, we first need to learn how on earth that thing works! So, let’s discuss voice assistants and how they work. They’re much simpler than you think. I guarantee your mind will be blown by their simplicity!

  • Listening: a voice assistant, as the name suggests, needs to listen to sounds and detect what is a decent human voice. For this, we need speech recognition systems, which will be discussed further below. We can make one ourselves, or we can use one that’s already made.
  • Understanding: In the 2015 movie Avengers: Age of Ultron, Tony Stark (a.k.a. Iron Man) says “Jarvis is only a natural language understanding matrix”. Setting aside the matrix part, the rest of that sentence makes sense to me. Voice assistants need to understand what we tell them. They can have A.I., hard-coded answers, or a little bit of both.
  • Responding: after processing what we’ve said, the voice assistant needs to provide a response that fits our request. For example, you say “Hey Alexa, play music”, your Alexa device asks you for the title, you say “Back in Black”, and she plays the song from Spotify or YouTube Music.
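I promised no code, but the three steps above can be summed up in a tiny hard-coded sketch (the intents and canned answers here are made up for illustration; a real assistant would replace listen() with an actual speech recognizer):

```python
def listen() -> str:
    """Stand-in for a real recognizer: we type instead of talk.
    A real assistant would return the microphone transcription here."""
    return input("You: ")

def understand(utterance: str) -> str:
    """Map the sentence to an intent with naive keyword matching."""
    text = utterance.lower()
    if "play" in text and "music" in text:
        return "play_music"
    if "time" in text:
        return "tell_time"
    return "unknown"

def respond(intent: str) -> str:
    """Return a canned answer for each intent (hard-coded, no A.I. yet)."""
    answers = {
        "play_music": "Sure, which song?",
        "tell_time": "It is tea time, of course.",
        "unknown": "Sorry, I did not get that.",
    }
    return answers[intent]
```

Everything that follows in this article is about making each of these three stubs smarter.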

Now we know about the functionality. What about the implementation? That’s a whole other story. The rest of the article is more about the technical side of making an intelligent chatbot…

Implementation of a Voice Assistant

Speech Recognition

Before we start to make our voice assistant, we have to make sure it can hear. So we need to implement a simple speech recognition system.

Although it’s not really hard to implement a speech recognition system, I personally prefer to go with something which is already made, like Python’s SpeechRecognition library (link). This library sends the audio signal directly to IBM, Microsoft or Google APIs and shows us the transcription of our speech.

On the other hand, we can make our own system with a dataset which has tons of voices and their transcriptions. But as you may know, you need to make your data diverse af. Why? Let me explain it a little better.

When you have only your own voice, your dataset doesn’t have decent diversity. If you add your girlfriend, sister, brother, co-workers, etc., you still have no diversity. The result may be decent, but it limits itself to your own voice, or the voices of your family members and friends!

The second problem is that your very own speech recognizer can’t understand that much, because your words and sentences might be limited to the movie dialogues or books you like. We need diversity everywhere in our dataset.

Is there any solution to this problem? Yes. You can use something like Mozilla’s Common Voice dataset (link) for your desired language and make a speech recognition system. This data is provided by people around the world, and it’s as diverse as possible.

Natural Language Understanding

As I told you, a voice assistant should process what we tell her. The best way of processing is artificial intelligence, but we can also do a hard-coded proof of concept.

What does that mean? Hard coding in programming means that when we want a certain input to have a fixed output, we don’t rely on logic for that answer; we just write code like if the input is this, give the user that, with no regard for the logic. In this case, the logic could be A.I., but instead we tell the machine: if the user said Hi, you simply say Hi!

But in real-world applications, we can’t go with just A.I. or just hard-coded functions. A real voice assistant is usually a combination of both. How? When you ask your voice assistant for the price of Bitcoin, that’s a hard-coded function.

But when you just talk to your voice assistant, she may make up answers which have a human feel, and that’s where A.I. comes in.

Responding

Although providing responses can be considered a part of the understanding process, I prefer to talk about the whole thing in a separate section.

A response is usually what the A.I. will tell us, and the question is how does that A.I. know what we mean? This is an excellent question. Designing the intelligent part of a voice assistant, or chatbots in general, is the trickiest part.

The main backbone of responses is your intention. What is your chatbot for? Is it a college professor’s assistant, or just something that gives you a Stark feeling? Is it designed to flirt with lonely people, or to help the elderly? There are tons of questions you have to answer before designing your own assistant.

After you have asked yourself those questions, you need to classify what people would say to your bot into different categories. These categories are called intents. Let me explain with an example.

You go to a cafe, the waiter gives you the menu, and you look at it, right? Your intention is now clear: you want some coffee. So, how do you ask for coffee? I would say Sir, a cup of espresso please. It’s that simple. In order to answer all coffee-related questions, we need to consider as many different states as possible. What if the customer asks for a macchiato? What if they ask for a mocha? What if they ask for a cookie with their coffee? This is where A.I. can help.

A.I. is nothing other than making predictions using math. A long time ago, I used to write the whole A.I. logic myself, but later a YouTuber called NeuralNine developed a library called neural intents for exactly this purpose! How does this library work?

It’s simple. We give the library a bunch of questions and our desired answers. The model we train can classify questions and then predict which category our sentences belong to. Let me show you an example.

When you say a cup of espresso please, the A.I. sees the words cup and espresso. What happens then? She knows these words belong to the coffee category, so she gives you one of the fixed answers from that category.
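A bag-of-words toy version of that idea looks like this (this is not the neural intents library itself, just my own sketch; the intents, keywords and answers are invented for the example):

```python
import random

# Tiny intent "model": keyword sets and canned answers per intent.
INTENTS = {
    "coffee":  {"keywords": {"espresso", "coffee", "macchiato", "mocha", "cup"},
                "answers": ["One coffee coming right up!", "Sure, anything else?"]},
    "goodbye": {"keywords": {"bye", "goodbye", "later"},
                "answers": ["See you!", "Goodbye!"]},
}

def classify(sentence: str) -> str:
    """Pick the intent whose keyword set overlaps the sentence the most."""
    words = set(sentence.lower().replace(",", " ").split())
    scores = {name: len(words & data["keywords"]) for name, data in INTENTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

def respond(sentence: str) -> str:
    """Return one of the fixed answers from the predicted category."""
    intent = classify(sentence)
    if intent == "unknown":
        return "Sorry, I did not understand."
    return random.choice(INTENTS[intent]["answers"])
```

A trained neural network does the same classification job, but it generalizes to wordings you never listed, which is exactly what the keyword overlap above cannot do.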

Keeping answers fixed, by the way, is not always a good thing. In some cases, we may need to make a generative chatbot which can compose responses like a human. Those bots are more complex and require more resources, study and time.

Final Thoughts

The world of programming is beautiful and vast, and when it comes to A.I., it becomes even more fun. In this article, I tried to explain how a voice assistant can be constructed, but I didn’t actually dig deep into the implementation.

Why so? I guess implementation is good, but in most cases, like every other aspect of programming, it’s just putting tools together. So learning the concept is much more important in most cases, like this one.

I hope the article was useful for you. If it was, please share it with your friends and leave a comment for me. I’d be super thankful.