You only need Python to make AI agents.

In 2022, ChatGPT was released and LLMs became the hot topic of pretty much every technology-related publication, event, YouTube video, etc. It was like finding the secret ingredient to a potion that can make you immortal.

But Meta didn't let OpenAI become the one and only. They joined the game by releasing their well-named model, Large Language Model Meta AI, or LLaMA, which we all know and love. And not only Meta: our friends at Mistral AI weren't idle either and released a good bunch of open source models; their work even motivated me to make my Persian LLM, Maral.

But nowadays, finding a good LLM is not a big problem. With a quick search on the internet, we can easily find good LLMs: base models and fine-tunes made for generic or specific purposes, models armed with reasoning, models made for programmers, etc.

We have the text output; now we need action. This is what I'm going to discuss in this particular post, and I would also love to hear back from you.

AI Agents add action to LLMs

Well, I remember when the makeshift Android rip-off of the iPod touch, or simply the Rabbit R1, was introduced, they advertised the device as running on a Large Action Model, or LAM. I kept wondering how we could modify one of the open LLMs to take action. Then I got the answer.

The simplest thing we can think of is an LLM tuned to emit JSON for different APIs in different tones. This is what I believe function calling or tool calling is.
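
To make this concrete, here is the kind of structured output such a tuned model would be expected to emit (the schema here is made up just for illustration):

# Hypothetical output of a tool-calling model for the prompt
# "Play Bohemian Rhapsody on Spotify". The schema is illustrative only.
tool_call = {
    "api": "spotify",
    "action": "play_track",
    "params": {"track": "Bohemian Rhapsody", "artist": "Queen"},
}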

But this approach still has a downside. Imagine I train LLaMA 3.2 on APIs from Airbnb, Shopify, Amazon, Uber and Spotify. What happens if you ask for a YouTube video? You won't even get rick-rolled, and that won't be a good sign for products such as the Rabbit R1 (or any competitor).

Then I got familiar with Crew AI, which is a framework for making agents. But honestly, I never understood these AI frameworks; most of them overcomplicate the process of building a simple application. Still, thanks to Crew AI, I finally understood what an AI agent is.

An AI agent adds actions to LLMs in a human-understandable way. Like when you ask ChatGPT to create a picture: it calls an API running DALL-E and then gives you the image. This is what an agent is…! (at least as long as it's not called Smith).

Making an AI agent without frameworks is possible!

Well, it is possible. You only need Python and probably OpenAI's library to make an agent. First of all, let's see what an agent does. An agent simply gets a prompt from you, something like Send an email to John Doe and explain why I will be late tomorrow. The AI model has to work out a few steps here.

First, it has to call a function to search your contact list and find John Doe. Then it has to generate a text explaining why you will be late. The last part is to send the email over an email server (which can be a private mail server or a provider like Google's Gmail).
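
As a minimal sketch (assuming a toy contact list and a local SMTP server; the names and addresses here are placeholders), the first and last steps could be plain Python functions like these, with the text generation in between handled by the LLM itself:

import smtplib
from email.message import EmailMessage

# A toy contact list; a real agent would query an actual address book.
CONTACTS = {"John Doe": "john.doe@example.com"}

def find_contact(name):
    # Step 1: look up the recipient's email address by name.
    return CONTACTS.get(name)

def send_email(to_address, subject, body):
    # Step 3: send the generated text over a mail server.
    msg = EmailMessage()
    msg["To"] = to_address
    msg["Subject"] = subject
    msg.set_content(body)
    with smtplib.SMTP("localhost") as server:  # or your provider's SMTP host
        server.send_message(msg)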

Also, you can make it one step harder for your own agent and ask it to do all of this in a GUI (for that, you basically need a vision model).

Let’s make it happen in Python. It will be easy and you will understand it better.

Python example

Disclaimer: since the full working code example is on GitHub, this part of the blog will be just a simple example.

The first step is to find an LLM. I personally think any provider with an OpenAI-compatible API works perfectly, and for this particular project, I'm using my own LLM, known as Jabir Project.

Jabir Project is a fine-tune of LLaMA 3.1 405B and has proven itself in many different tasks. If you don't want to use Jabir LLMs, that's fine; you may prefer OpenAI, DeepInfra or OpenRouter. You may also want to go local, so why not use Ollama?

Well, assuming you want to use Jabir’s API, you need to set up an OpenAI client like this:

from openai import OpenAI

client = OpenAI(api_key="FAKE", base_url="https://openai.jabirpoject.org/v1")

This is as easy as typing one line of code! You may be wondering why I used "FAKE" as the API key. When I tried to point my code at Ollama's API, I learned that the OpenAI library requires some value for the API key, even if the server ignores it.
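
For example, if you go local with Ollama, the same client works against its OpenAI-compatible endpoint; the key just has to be non-empty:

from openai import OpenAI

# Ollama ignores the key, but the library insists on having one.
client = OpenAI(api_key="ollama", base_url="http://localhost:11434/v1")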

Then, we need to set up a simple agent class:

class Agent:

    def __init__(self, system=""):
        # Keep the optional system prompt at the start of the conversation.
        self.system = system
        self.messages = []
        if self.system:
            self.messages.append({"role": "system", "content": system})

    def __call__(self, message):
        # Record the user message, get a reply, and remember it as well.
        self.messages.append({"role": "user", "content": message})
        result = self.execute()
        self.messages.append({"role": "assistant", "content": result})
        return result

    def execute(self):
        # Send the whole conversation so far to the model.
        completion = client.chat.completions.create(
            model="jabir-400b",
            messages=self.messages,
            temperature=0.0,
        )
        return completion.choices[0].message.content

This agent class is what matters most, since it keeps a memory of everything that has happened.

You can run the agent like this:

sample_agent = Agent("You are a helpful assistant")
print(sample_agent("What is 1+1?"))

Now the main question is: how can we add actions to this agent?

The Sample Agent with real action

As I was working on a way to make agents with no frameworks, I came up with the idea of making each action a Python function and then asking the AI to generate output which can later be parsed into inputs for those functions.
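
The notebook has the full implementation, but the core idea is roughly this (a sketch; the ACTION: output format and the argument separator are conventions I'm making up here, which you would enforce through the system prompt):

import re

# Each action is a plain Python function, registered by name
# (find_contact and send_email are the functions sketched earlier).
ACTIONS = {
    "find_contact": find_contact,
    "send_email": send_email,
}

def run_action(model_output):
    # Parse a line like "ACTION: send_email(a@b.com | Late | Sorry...)"
    # and dispatch it to the matching Python function.
    match = re.match(r"ACTION:\s*(\w+)\((.*)\)", model_output.strip())
    if not match:
        return None  # plain text answer, nothing to execute
    name, raw_args = match.groups()
    args = [arg.strip() for arg in raw_args.split("|")] if raw_args else []
    return ACTIONS[name](*args)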

I made it in the form of a Jupyter notebook, and it is available through my GitHub account. You can write agents like this and stay completely framework-independent.

Conclusion

Almost three years ago I wrote a blog post here called I was too cheap to pay $10 a month for GitHub's Copilot, so I made my own, and it was a good start to my journey into generative AI. Although I abandoned text generation for quite a long time and started Mann-E, I got back to the world of NLP with the Maral models.

Then Maral got abandoned because my personal life was getting a little rough, and I decided to start a personalization platform called Atelier AI, which lets you create your own LoRAs for Mann-E models.

But when I restarted the Jabir Project, I thought an LLM alone was not enough; this model should be the foundation of something bigger. This is why I did a lot of research on AI agents, and now I know exactly what I'm going to do.

I'd love to hear back from readers of my blog about ideas we could implement using LLMs and agents, so I politely ask all of you to participate in the discussion. Let's build the future together.

Let's build Metaverse with AI: LLaMA Mesh is out of the picture

In the previous post I mentioned that I could not get LLaMA Mesh to work, right? Well, I finally did, and in this particular post, I am going to explain what happened and why LLaMA Mesh is not a good option at all.

First, I will explain the workflow of the model's deployment, because I think it is important to know the flow. Then, I will tell you what I asked it and why I am very disappointed in this model (although I thought it might be a promising one).

The Flow

In this part, I'm explaining the flows I tried in order to make LLaMA Mesh work. The first flow was an absolute failure, but this morning I went through every place I could host a custom model, managed to deploy and test the model, and ended up pretty disappointed.

The failed flow

First, I paid a visit to my usual go-to website, RunPod, and tried to use their serverless system to deploy the model with the vLLM package. I explained this in the previous post.

It didn't work, so I decided to go with a quantized version. That didn't work either. I know that if I spent a few hours on their website I would eventually get the model running, but to be honest, it wasn't really a priority for me at the moment.

The second failure

This wasn't quite a failure, though. After I couldn't deploy the model any way I knew, I headed over to OpenRouter. I guessed they might have the model, but I was wrong.

I didn't surrender here either. I paid a visit to Replicate as well. While there, I noticed there are good models labeled as 3D, but none of them was LLaMA Mesh, the one I wanted.

The Successful One

Well, after a few unsuccessful attempts, I thought of Google Colab. But I remembered that their free tier is not suitable for unquantized eight-billion-parameter models.

What other option was there, then? It all comes down to an email I received this morning. I was struggling to wake up as usual when I saw my phone vibrating. I picked it up and saw an email from GLHF. They have quite a good bunch of models in their always-on mode, and they also let you run your own models (if hosted on Hugging Face), so I decided to go with them!

The Disappointment

Now it's time to talk about how disappointed I was when I saw the results. The model is not really different from the other LLMs I covered in the previous post, and it had just one advantage: quantization in the output 3D objects.

The integer quantization, however, is only good for speeding up generation and making the output a little more "low-poly". Otherwise, the final results were good only if you asked for basic shapes such as cubes or pyramids.

Should we rely on LLMs for 3D mesh generation at all?

The short answer is no. The long answer is that we need to work more on the procedures, understand the formats better, and then experiment with different formats and ways of generating 3D meshes.

Mesh generation in general is only one problem. We also have problems such as polishing the output 3D object and applying materials to it, which can't easily be done by a large language model.

What’s next?

Now I'm more confident about the idea I discussed before: taking existing image models, fine-tuning them on 3D objects, and using an existing image-to-3D model in order to make the needed 3D objects.

But I have another problem: what happens when we generate items and have no place to put them? So for now, I guess we also need to be thinking about a world generator system.

Let's build Metaverse with AI: What do we have?

In the previous post about building the metaverse with AI (link), I discussed the general points of view, what we need, and stuff like that. In this post, I am going to discuss the AI models we have that can be helpful in building the metaverse, and also possible pipelines.

Also, remember that in this particular post, I will only be discussing the AI models I think can be helpful in building a virtual universe. So if your favorite AI isn't on the list, accept my apologies.

AI models for Metaverse

First of all, I think for building a virtual universe or metaverse using AI, we need these models:

  • Image generation models: These models will help us build everything imaginable. They are essential in pretty much every AI art project and, of course, very useful for shaping the concept of our supposed metaverse.
  • Music/SFX generation models: Imagine walking in a jungle. The landscape is pictured in your mind, right? Now go a little deeper: you hear the sounds in your head, too. This is what we call a soundscape in ambient or minimalistic music (as I wrote about before). Now consider the metaverse we're building: this newly made universe needs sounds. Without sound, a metaverse doesn't mean anything. We need AI models to generate music, sounds and soundscapes for us.
  • Vision Language Models: These are important as well. In building the metaverse, we need everything to be as automated as possible; basically, we need the Matrix, but in a good way. A vision model can easily analyze a scene and generate the respective prompts for sound generators.
  • 3D Generation models: And the question is, why not? We're trying to make a complete 3D universe, and we need 3D objects that let people build their desired universe, right? With AI, this will be a reality.

Now, let's dive a little deeper and look at what models we have access to!

Image Generators

If you ask me, this is the easiest type of model to find for this particular project. We have tons of proprietary options such as DALL-E 3, Midjourney or even FLUX Pro, which are all considered the best in the business.

On the open source side, we've got Mann-E, Stable Diffusion and other useful models as well. This means that with a small search on the web, we can find the best way of visualizing our dreams of a made-up universe.

Also, based on my research into different models and hosting services, hosting models on Replicate or Modal is very easy. For other types of hosting, we may explore possibilities on CivitAI or Runware as well.

Music and Sound Effects generators

This is also not a rare thing. Although I am not really familiar with the music generation space, and I only know Stable Audio LM and Meta's MusicGen on the open side and Suno AI on the proprietary side, I guess we already have the best in the business.

Vision Models

Well, I personally use OpenRouter to explore the possibilities of these models, and to be honest, the best model I could find for vision tasks was none other than GPT-4o.

Although there are good vision models out there, most of them are either very generic or very specific, and GPT-4o sits right in the middle. We can use this model to describe different scenes in our metaverse. We may also use it as a guide through the metaverse, or just to help us build 3D objects or soundscapes.
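
For instance, with the OpenAI library it would look roughly like this (the image URL is a placeholder, and the prompt is just one way to phrase it):

from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe this scene as a prompt for a sound generator."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/jungle-scene.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)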

3D Generation Models

Well, these are currently the rarest models on the list. We may need two approaches for this specific task:

  • Text to 3D: Very similar to text to image; you just describe your scene or object and get the 3D object. Although it may be a little buggy, it will be a fun experiment to implement a model or pipeline for text to 3D. It will let the residents of our metaverse generate assets of their choice as easily as typing what they have in mind.
  • Image to 3D: This is also a possibility. Currently, I use TripoSR a lot for making different 3D objects, but I still haven't found the best input images, settings or hyper-parameter tuning for getting the best results.

With 3D generators, our workflow becomes much, much easier than you may think. So we just need one more step, right?

What’s next?

Well, in the previous post we discussed the whole idea of the metaverse and what we need to build one. In this one, we explored the AI tools we may be able to utilize. The next post will be a study on how we can make a metaverse AI model at all.

It will be the most challenging part of the project, but in my honest and unfiltered opinion, it is also the best part!

Let's build Metaverse with AI: Introduction

It was 2021, and all the products under the Facebook flag went down for a few hours. I remember most of my friends started messaging me on Telegram instead of WhatsApp, and no new posts or stories were uploaded to Instagram.

A few hours passed and everything went back to normal, except one thing. Zuckerberg made a huge announcement, telling the whole world that Facebook would be known as Meta, and he also announced the Metaverse as a weird alternate-life game where you can pay actual money and get a whole lot of nothing.

I personally liked the idea of the metaverse (at the time, I was a co-founder of ARMo, an augmented reality startup), so as you may guess, it was basically my job to follow the trends and news about the metaverse and everything happening around it.

For a few days now, I have been thinking about the metaverse again, because I strongly believe the whole thing will become a hype again, especially with this bull run on Bitcoin and other currencies. I have also concluded that the metaverse has a big missing piece, which I'm going to discuss in this post.

A little backstory

Since I started Mann-E as an AI image generation platform, a lot of people have messaged me about connecting the whole thing to the blockchain. Recently, I moved the whole payment system to cryptocurrencies, and I'm happy with what I've done, not gonna lie.

But for being on the chain, I had different ideas in mind, one of them being an ICO, or even an NFT collection. They may seem cool, but they also always attract a fair amount of criticism and skepticism. Of course, I don't want to be seen as a bad guy in my community, so I dropped those ideas for good.

As you read before this paragraph, I have a history in the XR (extended reality) business, and I currently have my own AI company. So I was thinking about the connection between the metaverse and AI, and the opportunities in both!

Before going deep, I have to ask a question…

What did we need to access the metaverse?

In 2021, when it was the hot topic of every tech forum, if you asked Okay then, how can I enter the metaverse?, no one could answer correctly. At least that's how it was in the Iranian scene.

I did a lot of research and found you need these to enter a metaverse of your choice:

  • A crypto wallet: This is not a big deal. Pretty much everyone who's familiar with tech and these new trends owns a crypto wallet. They're everywhere: web apps, native apps, browser extensions and even hardware. If you want to waste a few hours of your life, you can also build one from scratch.
  • An internet browser: Are you kidding me? We all have one. These days, most of the applications we used to install on our computers have turned into SaaS platforms, so we need a good browser.
  • A bit of crypto: In my opinion, the problem starts here. Most of these projects had a token built on the Ethereum network (or accepted Ethereum directly), but some had their own native currencies, which were impossible to buy on well-known exchanges, and as you guessed, that increased the chance of scams! In general, it was a little odd to be forced to pay to enter the verse without knowing what was happening there. Here is an example: imagine you are in Dubai and you see a luxurious shopping center, but you have to pay $100 just to enter, and you end up window-shopping and leaving disappointed. It's just a loss, isn't it?

But this is not all of it. A person like me, who considers themselves a builder, needs to explore the builder opportunities as well, right? Now I have a better question, and that is…

What do we need to build on the metaverse?

In addition to a wallet, a browser and initial funds for entering the metaverse, you also need something else. You need metaverse development skills, which are not easy to acquire.

On the programming side of things, most of the work can easily be done using libraries such as ThreeJS or similar. If you have a development background and access to resources such as ChatGPT, mastering the new library will not take more than a week.

But there was something else occupying my mind: 3D design skills, which are not easily attainable for everyone; you may spend years mastering them.

And this is why I think the metaverse needs AI, as I will explain in the next section.

The role of AI in the metaverse

This is my favorite topic. I have been using AI in different ways since 2021. For example, I explained how I could analyze electrical circuits using AI. Also, if you dig deeper into my blog, you may find that I even explained my love of YOLOv5 models.

But my first serious generative AI project came when GitHub's Copilot became a paid product and I was too cheap to pay for it, so I built my own. In that particular project, I used a large language model called BLOOM to generate code for me. It was the beginning of my journey in generative artificial intelligence.

A few months after that, I discovered AI image generators. That led me to the point where I could start my own startup with just ten dollars of funding. Now, I have bigger steps in mind.

Generative AI for the metaverse

There is a good question: How can generative artificial intelligence be useful in the metaverse? I have a list of opportunities here:

  • Trade bots: Since most metaverse projects offer their own coin or token, we may be able to use AI to give us some sort of advice or predictions. Honestly, this is my least favorite function of AI in the metaverse; I never was a big fan of fintech and similar stuff.
  • Agents: Of course, when we're entering the matrix, sorry, I meant the metaverse, we need agents helping us find a good life there. Jokes aside, agents can help us in different ways, such as building, finding resources, or learning how to interact with the surrounding universe.
  • Generating the metaverse: Honestly, this is my favorite topic of all time. We may be able to use different models to generate different assets for building our metaverse. For this particular task, we need more than one kind of model: not only LLMs, but image generators, sound generators, etc.

What’s next?

The next step is a study of every resource or model which can somehow be functional or useful in this space. We may also need to explore the possibilities of different blockchains and metaverses in general. But first, the focus must be on AI models; the rest will follow automatically 😁

Privacy-focused AI is all we need

I remember in 2020 and 2021, due to Elon Musk's interest in crypto and the metaverse hype, people, especially the ones who had no idea about crypto or blockchain, started investing in the crypto markets. Although it seemed a bit of a failure, people made profits out of it.

That is not the point here, though; what I'm going to talk about is that we need crypto as a form of secure payment for AI services and platforms. I guess I will over-explain a little bit in this post, but I promise it won't be too much.

My AI background

It was in March 2023 that I founded the Mann-E platform, an AI image generation platform letting people make images from their ideas, just like good old Midjourney. We developed our own models, bootstrapped, and built a community of early adopters.

I personally tried to get in touch with different AI companies, develop different models and make different products. Everything in the generative AI space has a special place in my heart.

But on the other hand, I also have a background in FLOSS (Free/Libre and Open Source Software) activism, and something felt off to me while working on all these AI products.

Privacy and AI

To be honest with you, pretty much none of the major AI platforms (OpenAI, Anthropic, Midjourney, etc.) are private. They all collect your data and use it to improve their models, and in return, they give you basically nothing but fancy images or LLMs which are terrible at making a dad joke.

The platform we need has these characteristics:

  • Sign up/Sign in as normal
  • No email verification (in order to make it possible for people who are using weird mail servers or fake email addresses)
  • Crypto only payments.

Now you may ask, isn't this alienating people who pay in fiat? Well, I have to say that a lot of platforms have alienated people in different corners of the world who have no access to PayPal or other payment services. So I guess it won't be a big deal!

On the other side, there are enough platforms accepting fiat currency. If you want to pay in fiat, there are tens of thousands of options in front of you. But what happens when you want to pay in crypto? You face a whole lot of nothing.

Now, what am I going to do?

Well, more than a year ago, at an event, I talked about how OpenAI, Midjourney, Meta, Microsoft and NVIDIA are on their way to becoming the Big Blue of the AI industry. But thinking it over, my approach wasn't really different from those guys' either.

Now, I have decided to make a new platform which is absolutely privacy-focused: not recording prompts, not making you confirm your email, and doing all payments in crypto (BTC, ETH and TRX seem good for a start).

Become an early adopter

As always, I need people to become early adopters. So I made this Google Form (link) to ask you to become a part of this project (for this one, please provide a real email address 😂). You can also support this project and accelerate the process of building it.

Conclusion

The project currently has no name, so I'd be happy to hear your suggestions. Naming aside, I personally think this concept will become more popular in the coming years. Especially with the growth of Telegram airdrops and meme coins, crypto will have a new life.

I guess it is time to act and make crypto a great payment tool for modern technology!


FrontBricks, my LLM-based weekend project which is inspired by Vercel’s V0

Since 2022, there has been a hype around generative artificial intelligence, and it has resulted in a bunch of cool projects, although a lot of us may remember that GitHub's Copilot is much older. Back then, I wrote an article about how I was too cheap to pay $10 a month for Copilot, so I made my own!

That was, in a way, the beginning of my interest in the AI field. I have spent around four years in this field and, like most of us, tried different tools and products. In this article, I'm talking about FrontBricks, my newest product, and how it started as a weekend project!

A little bit of history

In 2023, I launched Mann-E, an AI image generator based on its own models (more information is provided on the website). A few months ago, I also launched Maral, a 7 billion parameter LLM specialized for the Persian language (the language I speak).

Also, around a month ago, I did some tests with brand new LLMs such as LLaMA 3 in order to make Mann-E Search, which can be seen as an alternative to Perplexity, with one little difference (it doesn't provide a chat interface).

I guess this clarifies how deep I am in the AI space and how much I love generative AI! Now we can talk about FrontBricks!

What is FrontBricks?

You may be familiar with Vercel's V0, a generative AI tool that helps people generate frontend components. I liked their idea, joined their waitlist, and a couple of days later I got access to the platform.

It was a cool experience, and some sparks formed in my head. I found out that pretty much all LLMs are really good at code generation, and we can use one to generate the code and another one to find out whether the code is valid or not.

This was my whole idea, so I sat at my desk and started coding: a basic tool that sends my prompts to OpenAI's API for generation, and another one that does the validation using LLaMA 3 70B and GPT-4 (I used OpenAI again).
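
A rough sketch of that generate-then-validate loop could look like this (model names, prompts and the VALID/INVALID convention are illustrative, not the exact code behind FrontBricks):

from openai import OpenAI

client = OpenAI()  # the validator could just as well be a LLaMA 3 70B endpoint

def generate_component(prompt):
    # One model turns the user's prompt into an HTML component.
    result = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You generate frontend components in pure HTML."},
            {"role": "user", "content": prompt},
        ],
    )
    return result.choices[0].message.content

def validate_component(code):
    # A second model only judges whether the generated code is valid.
    result = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"Answer only VALID or INVALID:\n{code}"}],
    )
    return result.choices[0].message.content.strip().upper().startswith("VALID")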

I also found another bottleneck, which was JSX code generation. I did a little research and found that it's not really a big deal: using the power of regex and text manipulation, it's easily possible to turn pure HTML into JSX!
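
A couple of those text manipulations, sketched in Python (far from a complete converter, but it shows the idea):

import re

def html_to_jsx(html):
    # Rename HTML attributes that are reserved words in JSX.
    jsx = re.sub(r'\bclass=', 'className=', html)
    jsx = re.sub(r'\bfor=', 'htmlFor=', jsx)
    # Self-close void elements like <br> and <img ...> as JSX requires.
    jsx = re.sub(r'<(br|hr|img|input|meta|link)([^>]*?)\s*/?>', r'<\1\2 />', jsx)
    return jsx

print(html_to_jsx('<label for="name" class="bold">Name</label><br>'))
# -> <label htmlFor="name" className="bold">Name</label><br />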

Once I had written pretty much everything, I switched to my work environment, created a simple Rails app and connected it to my backend module. Now I have a platform which can be an alternative to Vercel's V0!

Today, I am just announcing FrontBricks, but I have to say that before this post, around 211 people gave me their email addresses to be put on the list of early adopters, and I gave them access to the platform earlier this week!

My birthday (May 30th) was this week, so I guess it can also be a bit of a surprise for my friends and the community.

How can I access FrontBricks?

Well, it is easy. You just need to go to frontbricks.com and create an account (sign up link). Then you just need to confirm your email and boom, you have unlimited access to FrontBricks, completely free of charge!

You can generate a component, then improve it, and every time you feel you need a new component, you can easily choose to create a new code snippet. It is as easy as drinking a cup of tea.

Future Plans

Since this project isn't monetized yet, the very first thing on my mind is a way to monetize it (you can still donate in crypto through this link). A good business model can make this project much better.

I am also thinking of releasing an open source model based on the data gathered on FrontBricks, because one of the reasons I coded this project is that I couldn't find a model specialized for front-end generation!

These are my concerns for now. If you have any other ideas, I'm open to hearing them.

Conclusion

I have a haystack of ideas in my mind, and when I find enough time, I implement them. Mann-E and FrontBricks are just two of the projects I've made, and to be honest, Mann-E, with around 6,000 users and more than 50,000 generated images, is one of my most successful projects.

FrontBricks has potential, but I guess I can’t keep it up alone. I’m open to technical and business ideas as well. So if you have any ideas in mind, feel free to send me a message, my email is haghiri75@gmail.com 😁

Nucleus is the proof that “Small is the new Big”

No matter what you've heard, size matters. Especially in the world of AI models, having a smaller and more affordable model is the key to winning the competition. This is why Microsoft invested time, GPUs and money in the Phi project, which is a Small Language Model, or SLM for short.

In this post, I present Nucleus, my newest language model project, which is based on Mistral (again) and has 1.13 billion parameters. And of course, this post will have a s*it ton of references to HBO's Silicon Valley 😁

Background

If you know me, you know that I have a good background in messing around with generative AI models such as Stable Diffusion, GPT-2, LLaMA and Mistral. I even tried to do something with BLOOM (here) before, but since the 176B model is too expensive to put in the mix, I left it behind.

But later, I started my own AI image generation platform called Mann-E, and in recent weeks, my team delivered Maral, a 7 billion parameter language model specializing in the Persian language.

After observing the world of smaller but more specific language models (should we call them SMBMLMs now?) like Phi, and the release of TinyLlama, I started a journey to find out how I could stay loyal to Mistral models but make them smaller.

You know, since the dawn of time, mankind has tried to make things smaller: smaller cars, smaller homes, smaller computers and now smaller AI models!

Basic Research

In my journey, I just wanted to know whether someone had ever bothered to make a smaller version of Mistral, or whether we had to go through the whole coding procedure ourselves.

Lucky for us, I found Mistral 1B Untrained on HuggingFace and even asked the author a few questions about the model. As you can see, they're not really happy with the model, but I saw the potential. So I decided to keep this model in my arsenal of small models for research.

Then I searched for datasets, and sparks started flying in my head about how I could make the damn thing happen!

The name and branding (and probably Silicon Valley references)

The name Nucleus comes from HBO's Silicon Valley, which is by far my favorite show of all time. If you remember correctly, Hooli's CEO, Gavin Belson, needed something to piss Richard off, right? So he made Nucleus. But his Nucleus was bad. I tried to make mine better, at least 😁

Since we know it’s time to pay the piper, let’s waste less time and jump right into the technical details of the project.

Pre-Training

Since the model is claimed to be untrained, we can understand that it barely knows what language is, right? Even now, if you run inference on the model on HuggingFace or locally, you may get a huge sequence of letters with no meaning at all.

So our first task was to pretrain it. Pretraining the model was quite easy using a 3090 and spending 40 hours on it. It was done on the one and only TinyStories dataset.

Actually, this dataset is great for pre-training and giving base models an idea of the language and its linguistic structures, and it does this pretty well. Although, since it only has about 2 million rows, you have to expect heavy over-fitting, which can easily be fixed through fine-tuning the model.
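
In case you're curious, the pretraining run would look roughly like this with the Hugging Face stack (the base model id is a placeholder, and the hyperparameters are illustrative, not our exact setup):

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "someone/mistral-1b-untrained"  # placeholder for the untrained 1B model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Mistral tokenizers ship no pad token
model = AutoModelForCausalLM.from_pretrained(base)

dataset = load_dataset("roneneldan/TinyStories", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="nucleus-pretrain",
                           per_device_train_batch_size=8,
                           num_train_epochs=1, fp16=True),
    train_dataset=tokenized,
    # mlm=False gives plain causal language modeling labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()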

Training on Tiny Textbooks

Well, the whole point of Phi-1 was that textbooks are all you need, and since Microsoft doesn't like to share their dataset with us, we had to do a lot of research on the available options.

The very first option that came to my mind was using GPT-4 to generate textbooks, but the cost could be astronomical considering that we are not funded. Spending a few thousand dollars on a dataset? No, thanks.

So during this research, we discovered the Tiny Textbooks dataset. Apparently, Nam Pham did a great thing: they crawled the web and turned it into textbooks. Kudos to them for letting us use their awesome dataset.

Okay, fine-tuning called for another 40 hours of training, and it went fine. After fine-tuning for two epochs of 420k steps each, we got the best results we could.

Results

On TinyStories, the model really loved telling stories about Lily, which was no surprise to me at least. But on Tiny Textbooks, the model did a great job. Okay, this is just the result when I asked for a pizza recipe:

And as you can see, it's basically running on what HuggingFace offers. With a little tweaking of the settings, you can easily get good results out of this baby!
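
By "settings" I mean generation parameters like these (the model id is a placeholder for the published Nucleus checkpoint, and the values are just a starting point):

from transformers import pipeline

generator = pipeline("text-generation", model="your-org/nucleus-1b")  # placeholder id

print(generator(
    "How to make a pizza at home:",
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.2,  # small models love repeating themselves
)[0]["generated_text"])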

But sadly, it still sucks at two things (which are basically what make you click on an LLM-related link in the first place). The first is question answering (or instruction following), which is not surprising, and the second, which made me personally sad, is coding, since I am a developer and I like a well-made coding assistant.

But in general, I guess it can compete with other well-known models. It all depends on what we train the model on, right?

But it still needs more effort and training, so we are heading to the next section!

License

If you know me from the past, you know I love permissive licenses. So this model is licensed and published under the MIT license. You can use it commercially without any permission from us.

Further changes and studies

  • The model does well in English, but what about more languages? My essential mission is to make it work with the Persian language.
  • It is good at generating textbooks and apparently loves food recipes and history lessons. But it needs more; maybe code textbooks would be a good addition.
  • The model should be trained on pure code (StableCode style) and also in code-instruct style (I haven't seen models like that, or maybe I'm just too lazy to check all of them).
  • The model should be trained on a well-crafted instruction-following dataset. My personal pick is OpenOrca. What do you suggest?

Links

Donations are appreciated

If you open our GitHub repository, you will find a few crypto wallet addresses. We appreciate donations to the project, because we're still not funded and we're waiting for investors' responses.

These donations help us keep the project up, make content about it and spread the word about Free/Libre and Open Source Software, or FLOSS!

Conclusion

In a world where people get excited about pretty much every React app wrapped around OpenAI's chat API and call it a new thing, or companies try to reinvent the iPod with the power of ChatGPT and make a square-shaped iPod touch, new models are the key to keeping our business up.

But you know, if models stay huge and you can't run them locally, this calls for more and more proprietary stuff where you have no control over the data, and you may end up handing your company's confidential data to a third party.

Open source small language models, or open SLMs, are the key to a better world. You can easily run this model on a 2080 (or an even less powerful GPU), and you know what that means: consumer hardware can have access to good AI stuff.

This is where we are headed in 2024, a new year of awesomeness with open models, regardless of their size.

Maral is here: a 7 billion parameter bilingual model with support for Persian!

If you've read my previous posts, you know how much I like open source AI material; I even jokingly titled my BLOOM post I was too cheap to pay for GitHub's Copilot! So making an open source model was always one of my life goals. Also, in my Persian blog, I pointed out that the dominance of the English language in the current LLM scene is a little concerning (read it here).

As of today, I am pleased to announce that Maral is here: a 7 billion parameter bilingual model which can respond to Persian and English prompts, and can produce GPT-3.5-level answers based on the dataset we fed it!

Maral 7B alpha 1 and its advantages

Since the release of the GPT-2 and BERT models, there have been efforts in our community to make a Persian text generation model. But to be honest, most of them were abandoned in the middle of the road.

In the last year's AI revolution, however, people saw the potential of generative AI and started working on models: from RAG on existing models to fine-tuning base models which could somehow understand the Perso-Arabic alphabet.

But with the release of the Mistral model, everything changed. I personally never thought a 7 billion parameter model could understand multiple languages this well. I put more information in the next section of the article on why Mistral became my number one choice as the base model!

However, the biggest problem was still there: the dataset. Finding a good enough dataset is always a bottleneck. But we were lucky enough that one of the Iranian developers has translated the Alpaca dataset into our beloved Persian language (it's accessible here).

When you're in possession of the needed ingredients for your potion, I guess it's time to light up the cauldron and start brewing!

Why Mistral?

As a developer and an enthusiast, I always try new models and tools, especially when it comes to text. Mistral was the new kid on the block, and I personally saw a lot of positive reviews about it. So I tried these:

  • Loading and testing the model on the normal English tasks it was good at.
  • Testing the model on more complicated tasks such as reasoning or basic math.
  • Testing the model on code generation.

All of the above tests went very well. You'd probably never expect a mid-sized model to perform well on all of these tasks, but this one was a little different. Although it was a little confused on reasoning tasks, I could let that pass (since even GPT-4 has problems with reasoning).

But I always run another set of tests on these models, because I'm Iranian and I speak Persian/Farsi, and I really like to know how a model performs in my language. So this is what I tested:

  • Generic Persian text generation: the model quickly started generating nonsense, but it showed me the potential; I guessed it might have seen some Persian text before.
  • Asking Persian questions: it tried its best to put words together, but at some point it fell back to nonsense or even answered completely in English!
  • Translation! Believe it or not, it can be a very good measure of a model's multilinguality (okay, I made that term up, stay calm). Although the model was successful in English to French and Spanish (judged with my very limited knowledge), it didn't perform well on Persian.

Okay, the tests showed me the potential. So I had to team up with my colleague and make it happen. Let's add support for our mother tongue to this model!

Training procedure and infrastructure

Now let's talk about the fun stuff. First, we saw that we would need a very big and rather unaffordable (at least for us) infrastructure to train Mistral from scratch.

So we did a lot of research on the topic and found these methods:

  • Retrieval-Augmented Generation (RAG)
  • Quantized Low-Rank Adaptation (QLoRA) and Parameter-Efficient Fine-Tuning (PEFT)

To be honest, RAG is cool, but it won't lead to a new model. So we went with QLoRA and PEFT.

The basic training (with extremely inaccurate results) was done on a T4 (Colab's free tier), and then we decided to go further. So I went to our friends at Jupyto, an Iran-based company where you can rent GPUs by the hour.

They had great offers on powerful GPUs, and we got our hands on a 3090 Ti with 64 GB of RAM. It was a perfect machine for the training, and we trained the better model on this setup.

The QLoRA training took over 10 hours for 5 epochs (each epoch took more than 100 minutes), and the results were out of this world! It could give us text which is semantically and grammatically correct!

Then we merged the adapter into the base model to take advantage of the model's core knowledge as well.
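
For the curious, the whole procedure, loading the base model in 4-bit, attaching LoRA adapters and merging them back, looks roughly like this (hyperparameters and paths are illustrative, not our exact configuration):

import torch
from peft import LoraConfig, PeftModel, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

base = "mistralai/Mistral-7B-v0.1"

# Load the base model quantized to 4-bit; this is the "Q" in QLoRA.
model = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16),
    device_map="auto",
)

# Attach small trainable LoRA adapters instead of touching the 7B weights.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],
))
model.print_trainable_parameters()

# ... the actual training loop (e.g. the transformers Trainer) goes here ...

# Afterwards, merge the adapter back into a full-precision base model.
full = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(full, "path/to/saved-adapter").merge_and_unload()
merged.save_pretrained("maral-7b-merged")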

Although I personally faced a set of problems, which I will point out in the next section.

The problems you may face using Maral

Since we're in the alpha stage, I have to admit you may face these problems while using Maral, especially with the Persian language.

  • The prompt format is based on the Guanaco format, so it doesn't have tokens for the start and end of sentences.
  • The tokenizer is not optimized for Persian letters yet, so the model may be slow in Persian.
  • The model is really good at hallucinating.
  • Following from the previous item, it also easily produces misinformation, so please be careful with the answers you get from the model.
  • The model likes to repeat itself a lot, so if you get a repetitive answer, do not worry.
  • The model, being so large, is a little hard to deploy on consumer hardware. However, on the HuggingFace page, we've provided 8-bit loading instructions as well (a sketch of the idea follows below).
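
That 8-bit loading boils down to something like this (check the HuggingFace page for the exact model id and instructions; this is just the general pattern):

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "MaralGPT/Maral-7B-alpha-1"  # verify the exact id on HuggingFace

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)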

Further work

  • Optimizing the tokenizer for the Perso-Arabic alphabet.
  • Providing a better dataset.
  • Adding bos_token and eos_token to the tokenizer, especially for the instruction-following/chat model.
  • Providing GPTQ, GGUF or GGML versions to make the model more affordable on consumer hardware.
  • Making much smaller models (say 1B or 2B) with a more focused niche.

Related links

You don’t owe money to the brush company if you sell your art

In my previous post, I explained how the future of content is AI. Also, in an older post, I talked about how AI-generated content can revolutionize the world of interior design and architecture. In this post, however, I'm not talking about those topics; I'm going to talk about legal issues and questions around AI-generated art, and there will be a twist at the end. Wait for it 😁

AI content creators are concerned about legal stuff

Yes, they are. And if they are not, they are making a very, very big mistake. When you create any form of content, one of the most important aspects of publishing it is the legal side.

This legal stuff is usually about the rights of content creators over their content, and also the rights of the companies who develop the tools for content creation.

In this part of the article, I am talking about what I consider the important legal topics in this new generation of content creation.

The Ownership

The very first time I posted about my own AI art generator model, Voyage, in a Telegram chat room, one of my friends asked Who owns the generated art? You? Or us?, and I explained that since you have to run the generator on your own computer, you are the owner of the generated art and you don't owe me anything.

By the way, most of them gave me huge credit when they posted their artwork on social media or even in that very same chat room.

But I found out that most of those proprietary art generators, like Midjourney, don't act like that. They make you pay them if you want to own what is already yours. Let me make this a little clearer.

Imagine you buy a nice set of brushes and paints. You paid for them, right? Now you've made a beautiful piece of art with those tools and want to sell it. Imagine the brush company asks for a share! Isn't that hilarious? Of course it is. I believe AI artists who use these proprietary tools to generate content must consider this.

Use by and for minors

Another important topic for any new generation of content creation tools is always how minors will use them, and it concerns me a lot as well (especially since Stable Diffusion 2.0 has no NSFW filtering). So what should we do for our younger friends? A lot of content creation platforms like YouTube, Pinterest, Instagram, DeviantArt, etc. have their own policies and filters for public content distribution.

For example, I'm a big fan of horror movies, and when I search for content about them, such as reviews, fan art and even scripts, I usually face age confirmation pages and modals. Now you can see where I'm going with this topic.

AI is dumb; it cannot understand what it generates, and we need a little more human oversight of the generated content. For example, on Stable Diffusion's Discord, I remember that reacting to NSFW content with a certain emoji could mark it as potentially harmful, and then they could improve their NSFW filtering system.

Plagiarism

I guess you thought I don't give a fine F about copyright, right? No, that's not true. I believe artists and content creators should be credited well. So let's talk about another topic which seems very important.

On the very first day I started AI content generation, there was only one good free tool (in any sense of the word free), and it was VQGAN+CLIP. It was a great tool for making art, and even today its art has a unique quality compared to other tools.

But even in those days, I had a huge concern: what if I plagiarize another artist's work? This concern was at its peak when I figured out that adding the names of well-known artists such as Greg Rutkowski, James Gurney, Thomas Kinkade, Salvador Dali and thousands more can alter the result! So as both AI generator developers and artists, we should pay attention to this matter as well.

And last but not least: Fake Art!

One of my favorite activities is trying new artist names in my prompts. I love to see how their minds would paint what I'm thinking of. But there is a problem: what if I claim the result is an unreleased painting by a well-known artist? This can lead to huge financial fraud.

I could never stop thinking about these matters, and as a person who has developed a model and generated tons of content with AI, I never want to be classified as a fraud or a scammer, or even as a person who disrupts the work of other artists.

I guess we've talked enough about legal issues; let's get to the big plot twist of this blog!

Big Twist!

The young blonde woman in the picture is beautiful, isn't she? I made her using my model Voyage, which I introduced earlier in this blog post. You want to use Voyage and create your own art? Fine. You won't owe me anything if you do. And if you want to use it in Google Colab, here is the link to the notebook!

Voyage is trained on data crawled from OpenArt, and as you can see, it is a model with a very artistic feel compared to the other models available.

Conclusion

In this blog post, we discussed one of the important aspects of AI content creation and generation: the legal stuff. We also have to fight for our ownership rights as content creators. In my personal opinion, it is okay to ask for money for a service; as developers or companies, we pay a lot for infrastructure and computing power. But if we make our users pay us a share, I guess that's not fair.

On the other hand, we need more and more open source tools for AI content creation. Big tech companies rule this market as well, and that is never good.

I hope this article was useful and if you like more content like this, please consider sharing it with your friends 🙂

The future of content is AI

I personally never counted myself as a content creator, but apparently I have always been counted as one. Why, you may ask? The answer is easy. I have a habit of filming my work, writing blog posts (mostly in Persian), posting my work and code on Twitter, and so on. All of these are behaviors of a content creator.

My content, on the other hand, was mostly about me. I never cared about making those advertisement-style reports (where you have to care a lot about SEO, backlinks and stuff) because creating content wasn't my job. Now I am thinking about it, but in my own way.

The history of content creation

Before going deep into this, let's clear something up. This part of the article is from my own point of view; it's not a definitive history, but at least this is how I saw content creation and how it works.

One-way content generation

Let's go back a lot. I mean A LOT! Maybe to 2006: you opened a URL in your Internet Explorer and found a very ugly static website written in pure HTML. Some of those websites also had annoying JS functions (we should be grateful for the modern use of JS; there are no mouse-pointer-following figures or rain in the background anymore!).

This is an example of a one-way form of content: content you cannot react to as it is. You had to find an email address on the Contact Us page or fill in their forms, and usually they never even checked the respective inbox. So you couldn't help them improve their content or right their wrongs.

Here comes the blog

I was almost 12 when I discovered the concept of blogs, and I started writing on a free blogging service (which is very popular in the Iranian community; you can find it here), and it was amazing.

The whole greatness of blogging was that it wasn't "one-way": people could interact with each other using comments, and at the same time, chatrooms were also pretty popular. So we usually had a good time with our internet pals in those days. And you know what that means?

User generated content (UGC) matters!

It really does. Imagine you want to get a new hair dryer. What do you do? I guess you go to Amazon and search for hair dryers. A hair dryer is not an object you buy once a week, so you need to know whether the hair dryer in question lasts long enough, how much power it draws, and whether it meets the health guidelines and regulations for a product like that.

You read the description, specifications and other details provided by the seller on Amazon. That's good, but not great. You have an idea about the product, but you don't know what its user experience is like. What can we do about this? Easy: we scroll down to the user reviews, where people have rated the product and described their feelings about it.

In the reviews section, you might find out this product doesn't last that long; you may even search other platforms for the very same product to find out what is wrong with it. For me, the second platform is always YouTube. People do a lot of good product reviews on YouTube (even those sponsored by the brand we're looking at are usually helpful!), and guess what? YouTube is also a platform for UGC!

But wait, it doesn't end here. You're reading this post but are still confused about the title. I have to say, this is where the actual fun begins!

The future of content is AI!

Now, this is the part you were waiting for. In this section, I'm going to talk about how AI can help us create better content, because recently I have been following the AI art trend a lot! I have also coded and developed some AI art tools myself! And I was too cheap to get a paid Copilot membership, so I created my own version. See? I have officially joined the army of content creators, but in my very own way.

Sentiment Analysis

I guess this one is not really about content creation but more about content moderation. But moderation is as important as creation (if not more so), and I had to put it here. Having a sentiment analysis system for our user-generated content can help us find out whether a product has poor quality, how toxic our community is, and things like that.

To be honest, it helps us more than it seems. It helps us build a better community (pretty much by banning suspicious users) and also give feedback to suppliers who sent us poor-quality products. It doesn't end here, by the way; my example is still about a retail store and not a general website.

In the modern day, you have to watch your tongue more than before. A lot of people have stood up for their rights, and the typical words of your daily speech can be offensive to other people. So in this particular case, I believe these analytical tools can help us improve even our personal lives by building a better community.

We've talked enough about content moderation using AI; let's move on to the fun and interesting topic of content generation!

The rise of AI art generators

AI art is basically an empire now. AI art generators such as DALL-E 2 and Midjourney (you probably would like to take a look at my open source version of Midjourney, OpenJourney, just saying) are very popular, and on the other hand, Stable Diffusion (and its forks) is really growing on the open source side as well.

You cannot deny that these are pretty cool tools for content creation. They can help us bring our ideas to life in the form of art, 3D design, interior design, UI/UX and a lot more. So we have to talk about them; we have to recognize these images as the new content people create and enjoy!

And it does not end here either. There is also a new trend of text-to-music, which means a lot of music creators (me included!) may use AI to create music as well. This is the beauty of AI content creation.

And finally, everyone offers AI these days.

Yes, every company with even a small connection to content creation offers AI! We expect the big names of our industry, such as Google or Meta, to provide tons of AI tools: libraries, frameworks, models, datasets and even programming languages. But do you know what amazed me recently?

Notion also provides AI solutions for productivity and ideas! You can basically have some sort of copilot for your content calendar, or even better (for some people, worse), an AI companion for task management, and I think this is great.

Now that we have tools to create text, images, videos and sounds, what should our next step be? I guess we have to read minds (and I'll write an article about that as soon as possible).

Conclusion

Now let's conclude (I know, I have this section in every blog post and never put anything useful here). We just traced where the age of digital content creation started. The internet played a great role in revolutionizing this age and opened new doors of opportunity for us, people who usually couldn't easily get the chance to write in a magazine or newspaper. These days we write on Twitter (at least as long as we can write without paying Elon Musk for it!), and it requires no privilege, only an internet connection.

So AI can help us improve our content; it can help us write better reviews, and it can help us turn a bunch of photographs into a full report. You just input your photos, the image-to-text pipeline extracts the details of each photo, then you edit the results, and now you have your report.

In my opinion, AI is here to help us make the world a better place, because it gives us all an equal chance of being an author, artist, musician, or anything else which required some level of privilege in the past.