You only need Python to make AI agents.

In 2022, ChatGPT was released and LLMs became the hot topic of pretty much every technology-related outlet, event, YouTube video, and so on. It was like finding the secret ingredient for a potion of immortality.

But Meta didn't let OpenAI become the one and only. They joined the game by releasing their well-named model, Large Language Model Meta AI, or LLaMA, which we all know and love. And it wasn't only Meta: our friends at Mistral AI weren't idle either and released a good bunch of open source models, and their work even motivated me to make my own Persian LLM, Maral.

Nowadays, though, finding a good LLM is not a big problem. With a quick search on the internet, we can easily find good LLMs: base models and fine-tunes made for generic or specific purposes, models armed with reasoning, models made for programmers, and so on.

We have the text output; now we need action. This is what I'm going to discuss in this particular post, and I'd also love to hear back from you.

AI Agents add action to LLMs

Well, I remember when the makeshift Android rip-off of the iPod touch, also known as the Rabbit R1, was introduced: they advertised the device as running on a Large Action Model, or LAM. I kept thinking about how we could modify one of the open LLMs to take action. Then I got the answer.

The simplest thing we can think of is an LLM tuned on JSON inputs for different APIs, with different tones. That, I believe, is what function calling (or tool calling) is. But it still has a downside.
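
To make this concrete, here is roughly what tool calling looks like with an OpenAI-compatible client. This is just an illustration: the get_weather function, its schema and the model name are placeholders I made up, not part of any real product.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Describe the tool in JSON Schema so the model can decide when to call it.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

completion = client.chat.completions.create(
    model="gpt-4o-mini",  # any tool-calling capable model works here
    messages=[{"role": "user", "content": "What's the weather in Tehran?"}],
    tools=tools,
)

# Instead of plain text, the model returns a structured call we can execute.
tool_call = completion.choices[0].message.tool_calls[0]
print(tool_call.function.name, tool_call.function.arguments)

The point is that the model only picks from the tools it was given, which is exactly where the downside shows up.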

Imagine I train LLaMA 3.2 on the APIs of Airbnb, Shopify, Amazon, Uber and Spotify. What happens if you ask for a YouTube video? You won't even get rick-rolled, and that is not a good sign for a product such as the Rabbit R1 (or any of its competitors).

Then I got familiar with Crew AI, which is a framework for making agents. But honestly, I never understood these AI frameworks; most of them over-complicate the process of building a simple application. Still, thanks to Crew AI, I finally understood what an AI agent is.

An AI agent adds actions to LLMs in a human-understandable way. For example, when you ask ChatGPT to create a picture, it calls an API running DALL-E and then gives you the image. This is what an agent is…! (at least as long as it's not called Smith).

Making an AI Agent without the frameworks is possible!

Well, it is possible. You only need Python and probably OpenAI's library to make an agent. First of all, let's see what an agent does. An agent simply gets a prompt from you, something like Send an email to John Doe and explain why I will be late tomorrow. The AI model has to figure out a few steps here.

First, it has to call a function that searches your contact list and finds John Doe. Then it has to generate a text explaining why you will be late. The last part is sending the email over an email server (which can be a private mail server or a provider like Google's Gmail).
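
To make the last step concrete, sending the email could be an ordinary Python function; the server address and the sender here are made up for illustration.

import smtplib
from email.message import EmailMessage

def send_email(to_address, subject, body):
    # Build a plain-text email and hand it to an SMTP server.
    msg = EmailMessage()
    msg["From"] = "me@example.com"               # made-up sender
    msg["To"] = to_address
    msg["Subject"] = subject
    msg.set_content(body)
    with smtplib.SMTP("smtp.example.com") as server:  # your mail server or provider
        server.send_message(msg)

The agent's job is only to decide the arguments: who the recipient is and what the body should say.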

You can also make it one step harder for your agent and ask it to do all of this through the GUI (for that, you basically need a vision model).

Let’s make it happen in Python. It will be easy and you will understand it better.

Python example

Disclaimer: since I have a fully working code example on GitHub, this part of the blog will be just a simple example.

The first step is to find an LLM. I personally think any provider with an OpenAI-compatible API works perfectly, and for this particular project I'm using my own LLM, known as the Jabir Project.

Jabir Project is a fine-tune of LLaMA 3.1 405B and has proven itself in many different tasks. If you don't want to use Jabir's LLMs, that's fine; you may prefer OpenAI, DeepInfra or OpenRouter. You may also want to go local, so why not use Ollama?

Well, assuming you want to use Jabir’s API, you need to set up an OpenAI client like this:

from openai import OpenAI

client = OpenAI(api_key="FAKE", base_url="https://openai.jabirpoject.org/v1")

This is as easy as typing one line of code! You may be wondering why I used "FAKE" as the API key. It's because when I tried to add Ollama's API to my code, I found out that the OpenAI library requires a non-empty value for the API key.
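
For the record, the same client works with a local Ollama server, since Ollama exposes an OpenAI-compatible endpoint under /v1; the key just has to be any non-empty string:

from openai import OpenAI

# Ollama ignores the key, but the OpenAI client refuses to run without one.
client = OpenAI(api_key="ollama", base_url="http://localhost:11434/v1")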

Then, we need to set up a simple agent class:

class Agent:

    def __init__(self, system=""):
        # Keep the whole conversation, starting with an optional system prompt.
        self.system = system
        self.messages = []
        if self.system:
            self.messages.append({"role": "system", "content": system})

    def __call__(self, message):
        # Record the user message, ask the model, record and return the answer.
        self.messages.append({"role": "user", "content": message})
        result = self.execute()
        self.messages.append({"role": "assistant", "content": result})
        return result

    def execute(self):
        # Send the full history to the API so the model keeps its context.
        completion = client.chat.completions.create(
            model="jabir-400b",
            messages=self.messages,
            temperature=0.0,
        )
        return completion.choices[0].message.content

This Agent class is the part that matters the most, since it keeps a memory of everything that has happened in the conversation.

You can run the agent like this:

sample_agent = Agent("You are a helpful assistant")
print(sample_agent("What is 1+1?"))

Now the main question is: how can we add actions to this agent?

The Sample Agent with real action

As I was working on a way to make agents without frameworks, I came up with the idea of making each action a Python function and then asking the AI to generate output that can later be parsed into inputs for those functions.

I built it as a Jupyter notebook, and it is available through my GitHub account. You can write agents like this and stay completely framework-independent.
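
To give an idea of what that looks like without any framework, here is a minimal sketch: the model is asked to reply with a single line such as ACTION: name | argument, which we parse and dispatch to a registered Python function. The action names, the line format and the toy functions are assumptions for this example; the full version lives in the notebook.

# Toy actions standing in for real ones (contact lookup, e-mail sending, ...).
def find_contact(name):
    return {"John Doe": "john.doe@example.com"}.get(name, "not found")

def send_late_email(address):
    return f"email sent to {address}"  # a real version would use smtplib

ACTIONS = {"find_contact": find_contact, "send_late_email": send_late_email}

SYSTEM = """You can use these actions:
find_contact: <name>     -> returns the email address of a contact
send_late_email: <email> -> emails that address saying I will be late tomorrow
Reply with exactly one line, either
ACTION: <action name> | <argument>
or FINAL: <answer> when you are done."""

agent = Agent(SYSTEM)
message = "Send an email to John Doe and explain why I will be late tomorrow."

for _ in range(5):  # hard limit so a confused model cannot loop forever
    reply = agent(message).strip()
    if reply.startswith("FINAL:"):
        print(reply)
        break
    name, _, argument = reply.removeprefix("ACTION:").partition("|")
    result = ACTIONS[name.strip()](argument.strip())
    message = f"Observation: {result}"  # feed the result back to the model

The loop reuses the Agent class and client defined above; the observation messages are how the model learns what its last action actually returned.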

Conclusion

Almost three years ago I wrote a blog post here called I was too cheap to pay $10 a month for GitHub's Copilot so I made my own, and it was a good start to my journey into generative AI. Although I abandoned text generation for quite a long time and started Mann-E, I got back to the world of NLP with the Maral models.

Then Maral got abandoned because my personal life was getting a little rough, and I decided to start a personalization platform called Atelier AI, which lets you create your own LoRAs for Mann-E models.

But when I restarted the Jabir Project, I thought an LLM alone is not enough; this model should be the foundation of something bigger. That is why I did a lot of research on AI agents, and now I know exactly what I'm going to do.

I'd love to hear from the readers of my blog about the ideas we could implement using LLMs and agents, so I politely ask all of you to join the discussion and let's build the future together.

Let's build the Metaverse with AI: Building an asset generator

Look at this:

How do you think this apple was made? Excellent question. After the previous post, I said we should leave LLMs out of the picture for now. We also needed to talk about 3D, because it matters across the whole metaverse space, right? Today I did just that: I trained a LoRA on FLUX and then tried to make 3D objects from what the model is capable of generating.

The Image Generator

In this part, I specifically talk about the image generation procedure. It should be a good bit of experience sharing, and the open source models created in the process are linked in the post as well.

For making an image generator model, we need a base model. Since the whole Generative Metaverse project was a fun project for me and not a serious commercial one, I chose FLUX. However, if I move toward the blockchain/crypto side of things (probably on the TON network), I may consider SDXL as the base in order to avoid any problems with commercial use.

Anyway, everything here is pretty standard: pretty much the same steps I took to make the early versions of Mann-E. So I guess it is worth sharing one more time, right?

The Dataset

AI models are just a bunch of boring mathematical functions, and they only become amazing when they are fed good data. So we needed to create a dataset. As always, the best data generator I could use was Midjourney, so of course I headed over to their website and recharged my account.

I played with a good bunch of prompt combinations to find the one that best fit what I had in mind. After a lot of tweaking, I settled on this: <subject>, lowpoly, 3d illustration, dark background, isometric camera angle.

Here is a sample of what was generated with this prompt formula:

After that, I used ChatGPT to generate a list of objects we may use or see every day, turned it into a prompt list, automated the image generation procedure, and ended up with around 800 pictures. Now it was time for training!
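
The automation itself was nothing fancy; something along these lines covers the prompt-list part (the subjects and the file name here are made-up examples, the real list came from ChatGPT):

# Plug every ChatGPT-suggested object into the prompt formula.
TEMPLATE = "{subject}, lowpoly, 3d illustration, dark background, isometric camera angle"
subjects = ["an apple", "a wooden chair", "a coffee mug"]  # example objects

with open("prompts.txt", "w") as f:
    for subject in subjects:
        f.write(TEMPLATE.format(subject=subject) + "\n")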

The training

First, I was thinking about using Replicate or fal.ai to train the LoRA. Honestly, they provide easy and affordable ways of training LoRAs on FLUX (and to my knowledge, you can also have SD 1.5 and SDXL LoRAs trained on Replicate), but there is one big problem.

These websites are usually not suitable for large scale training, and if they do offer large scale training, you have to negotiate with them. As I said, this is a fun project, not an OpenAI-scale commercial product!

So I was looking for another way. As you may know, Google Colab's free tier is also no good for FLUX training. So I used the AI Toolkit template on RunPod to train the LoRA. On an 80GB A100, training on 100 pictures took around 3 hours.

The files

If you're interested in the dataset, I uploaded the whole thing, pictures included, here. You will see a folder called minimized images, which contains 100 hand-picked images from the original dataset.

And if you’re looking for the LoRA, you can download and even test it here.

The 3D Generation

Well, after making the image generator, we needed a way of turning single images into 3D files, and of course the 3D format had to be something acceptable to all devices.

OBJ and FBX are great formats when it comes to game development (especially if you're using the Unity game engine), but for WebGL and WebXR, the glTF and GLB formats are usually preferred.

The best option for this is fal.ai's TripoSR API. You upload your image, the model is called and BOOM, you have a GLB file which can be used in every WebGL or WebXR project you can think of.
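
A rough sketch of that call with fal's Python client could look like this; the endpoint id and the response fields are my assumptions from memory, so double-check them against fal.ai's documentation:

import fal_client  # pip install fal-client, needs FAL_KEY in the environment

# Submit an image and wait for the generated mesh. The app id "fal-ai/triposr"
# and the "model_mesh" field are assumptions; verify them in fal's docs.
result = fal_client.subscribe(
    "fal-ai/triposr",
    arguments={"image_url": "https://example.com/lowpoly-apple.png"},
)
print(result["model_mesh"]["url"])  # URL of the GLB file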

What’s next?

Since I personally am working on another project with Mann-E's proprietary models, I may stop this particular project right here. I have done almost everything I had in mind.

We still have the important topic of world generation using AI, but I guess it needs a more in-depth study and will not be this easy at all. Commercializing the whole thing is also something to think about, but for now I just want to keep the project fun.

Maybe in a few weeks I will return with a more commercial approach and also some ideas about the whole blockchain and crypto space.

Privacy-focused AI is all we need

I remember that in 2020 and 2021, due to Elon Musk's interest in crypto and also the metaverse hype, people, especially the ones who had no idea about crypto or blockchain, started investing in the crypto markets. Although it ended up looking a bit like a failure, people did make profit out of it.

But that is not my point here. What I'm going to talk about is that we need crypto as a form of secure payment for AI services and platforms. I guess I will over-explain a little bit in this post, but I promise it won't be too much.

My AI background

It was March 2023 when I founded the Mann-E platform, an AI image generation platform that lets people turn their ideas into images, just like good old Midjourney. We developed our own models, bootstrapped the company and built a community of early adopters.

I have personally tried to get in touch with different AI companies, develop different models and make different products. Everything in the generative AI space has a special place in my heart.

But on the other hand, I also have a background in FLOSS (Free/Libre and Open Source Software) activism, and something felt off to me while working on all these AI products.

Privacy and AI

Being honest with you, pretty much none of the major AI platforms (OpenAI, Anthropic, Midjourney, etc.) are private. They all collect your data, they use it to improve their models, and in return they give you basically nothing but fancy images or LLMs that are terrible at making a dad joke.

The platform we need is a platform with these characteristics:

  • Sign up/Sign in as normal
  • No email verification (in order to make it possible for people who are using weird mail servers or fake email addresses)
  • Crypto only payments.

Now you may ask: isn't this alienating people who pay in fiat? Well, I have to say a lot of platforms have alienated people in different corners of the world who have no access to PayPal or other payment services, so I guess it won't be a big deal!

On the other side, there are enough platforms accepting fiat currency. If you want to pay in fiat, there are tens of thousands of options in front of you. But what happens when you want to pay in crypto? You face a whole lot of nothing.

Now, what am I going to do?

Well, more than a year ago, at an event, I was talking about how OpenAI, Midjourney, Meta, Microsoft and NVIDIA are on their way to becoming the Big Blue of the AI industry. But thinking it over, my own approach wasn't really that different from theirs either.

Now, I have decided to make a new platform which is absolutely privacy-focused: no prompts are recorded, no email confirmation is required, and all payments are in crypto (BTC, ETH and TRX seem good for a start).

Become an early adopter

As always, I need people to become early adopters, so I made this Google Form (link) to ask you to become a part of this project (for this one, please provide a real email address 😂). You can also support this project and accelerate the process of building it.

Conclusion

The project currently has no name, so I'd be happy to hear your suggestions. Naming aside, I personally think this concept will become more popular in the following years. Especially with the growth of Telegram airdrops and meme coins, crypto will get a new life.

I guess it is time for us to act and make crypto a great payment tool for modern technology!

 

FrontBricks, my LLM-based weekend project inspired by Vercel's V0

Since 2022, there has been a hype around generative artificial intelligence, and it has resulted in a bunch of cool projects, although a lot of us may remember that GitHub's Copilot is much older. Back then, I wrote an article about how I was too cheap to pay $10 a month for Copilot, so I made my own!

That was somehow the beginning of my interest in the AI field. I have spent around four years in this field, and like most of us, I have tried to utilize different tools and products. In this article, I'm talking about FrontBricks, which is my newest product, and how it started as a weekend project!

A little bit of history

In 2023, I launched Mann-E, an AI image generator based on its own models (more information is available on the website). A few months ago, I also launched Maral, a 7 billion parameter LLM specialized for Persian, the language I speak.

Also, around a month ago, I did some tests with brand new LLMs such as LLaMA 3 in order to make Mann-E Search, which can be a sort of alternative to Perplexity, with one small difference: it doesn't provide a chat interface.

I guess this clarifies how deep into the AI space I am and how much I love generative AI! Now we can talk about FrontBricks!

What is FrontBricks?

You may be familiar with Vercel's V0, a generative AI tool that helps people generate frontend components. I liked the idea, joined their waitlist, and a couple of days later I got access to the platform.

It was a cool experience, and some sparks formed in my head. I realized that pretty much all LLMs are really good at code generation, and that we can utilize one model to generate the code and another one to check whether the code is valid or not.

That was my whole idea, so I sat at my desk and started coding a basic tool that sends my prompts to OpenAI's API to generate the code, and another one that does the validation using LLaMA 3 70B and GPT-4 (through OpenAI again).
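
A bare-bones version of that generate-then-validate loop looks roughly like this; the model names and the pass/fail convention are placeholders for illustration, not the actual FrontBricks internals:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_component(prompt):
    # First model writes the frontend code for the requested component.
    completion = client.chat.completions.create(
        model="gpt-4o",  # placeholder for whatever generator model you pick
        messages=[
            {"role": "system", "content": "You write clean frontend components. Return only code."},
            {"role": "user", "content": prompt},
        ],
    )
    return completion.choices[0].message.content

def validate_component(code):
    # Second model only judges whether the generated code is valid.
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder for the validator model
        messages=[
            {"role": "system", "content": "Answer with exactly VALID or INVALID."},
            {"role": "user", "content": f"Is this frontend code valid?\n\n{code}"},
        ],
    )
    answer = completion.choices[0].message.content.strip().upper()
    return answer.startswith("VALID")

component = generate_component("A pricing card with three tiers")
if validate_component(component):
    print(component)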

I also found another bottleneck, which was JSX code generation. After a little bit of research, I found it is not really a big deal: using the power of regex and text manipulation, it's easily possible to turn pure HTML into JSX!
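
A stripped-down example of that HTML-to-JSX conversion, covering only the most common substitutions, might look like this:

import re

def html_to_jsx(html):
    # Convert a few common HTML-isms to their JSX equivalents.
    jsx = re.sub(r"\bclass=", "className=", html)             # class -> className
    jsx = re.sub(r"\bfor=", "htmlFor=", jsx)                   # for -> htmlFor
    jsx = re.sub(r"<(img|input|br|hr)([^>]*?)\s*/?>",          # self-close void tags
                 r"<\1\2 />", jsx)
    jsx = re.sub(r"<!--(.*?)-->", r"{/*\1*/}", jsx, flags=re.S)  # HTML comments
    return jsx

print(html_to_jsx('<label for="name" class="field"><input type="text"></label>'))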

With pretty much everything written, I switched to my work environment, created a simple Rails app and connected it to my backend module. Now I have a platform that can be an alternative to Vercel's V0!

Today I am just announcing FrontBricks, but I have to say that before this post, around 211 people had given me their email addresses to be put on the early adopters list, and I gave them access to the platform earlier this week!

My birthday (May 30th) was this week, so I guess this can also be a bit of a surprise for my friends and the community.

How can I access FrontBricks?

Well, it is easy. You just need to go to frontbricks.com and create an account (sign up link). Then you just need to confirm your email and boom, you have unlimited access to FrontBricks, completely free of charge!

You can generate a component, then improve it, and any time you feel you need a new component, you can easily choose to create a new code snippet. It is as easy as drinking a cup of tea.

Future Plans

Since this project isn't monetized yet, the very first thing that comes to my mind is a way to monetize it (you can still donate in crypto through this link). A good business model can help this project get much better.

I am also thinking of releasing an open source model based on the data gathered on FrontBricks, because one of the reasons I built this project is that I couldn't find a model specialized for front-end generation!

These are my concerns for now. If you have any other ideas, I'm open to hearing them.

Conclusion

I have a haystack of ideas in my mind, and if I find enough time, I implement them. Mann-E and FrontBricks are just two of the projects I have made, and to be honest, Mann-E, with around 6,000 users and more than 50,000 generated images, is somehow one of my most successful projects.

FrontBricks has potential, but I guess I can't keep it up alone. I'm open to technical and business ideas as well, so if you have anything in mind, feel free to send me a message; my email is haghiri75@gmail.com 😁

Re-creating Midjourney with only $10 – Technical Report for Mann-E 5 development

The year 2022 was an amazing year for the generative AI market, and no one can deny that the release of cool models such as Midjourney, Stable Diffusion and ChatGPT that year made the market bigger, better and more competitive. You may also know Mann-E, the model I have developed on top of Runway ML's Stable Diffusion 1.5 using Dream Booth. In this particular article, I provide a report on the development procedure of Mann-E 5, which will be accessible on April 14th, 2023 on the Mann-E platform.

Introduction

The Intention

The main intention behind Mann-E in the first place was a personal exploration of AI art and text-to-image models, but later I found business/commercial opportunities in it, and since I am also an open-source enthusiast, the main intention changed to providing an easy and accessible open-source alternative to Midjourney.

Since Midjourney is only accessible through Discord, is expensive (compared to most other image generation models), and is a huge problem for Iranian users to pay for on the basic or standard plans, the idea of a platform for art generation was born.

The method

For this particular version, I used a self-instruct style method, the same approach used for Stanford's Alpaca dataset and model. The tools used for this project were the following:

  • ChatGPT
  • Midjourney
  • Dream Booth

The Procedure

Using Midjourney

The main idea of using Midjourney-generated images in the fine-tuning process came to me from PromptHero's Openjourney project. They used Dream Booth and data from Midjourney version 4.0 at first; later they trained on more than 100K images on their own infrastructure.

So Midjourney became a good source of data, because you probably won't face any intellectual property or copyright issues when using images created by their algorithm (the full explanation is available in my previous post).

ChatGPT as a prompt engineer

I've seen people create great prompts for Midjourney using ChatGPT. As large language models, ChatGPT, GPT-3 and GPT-4 can all be great choices for creating prompts. I chose ChatGPT since it had a free interface and also a more affordable API.

P.S: There are also other models we could use to generate prompts, but they may need extra setup. They'll be explored in future research.

Dream Booth

The most affordable way of creating your own text-to-image model is Dream Booth. It can be run on a free Colab notebook, and there are tons of tools available for doing the job.

For the development of Mann-E 5, I used the code from this repository. Although some modifications were needed, the code as a whole works perfectly.

Development of Mann-E

Getting needed tools for development

First, I had to open an OpenAI account, which I had done before; I needed ChatGPT in order to get prompts.

Then I got a basic license for Midjourney, which gives you roughly 4 hours of fast generation so you can generate images without any difficulties. Of course, I did a few tests (around a hundred images) before getting to work.

The last thing was writing a piece of Python code to crop my images to 512×512 (Midjourney's upscaled outputs are 1024×1024) and also save the respective prompts in text files.

Then I used the fast Dream Booth code to fine-tune Stable Diffusion version 2.1 on the dataset I had just gathered.

Using ChatGPT as a prompt engineer

Currently everyone is using ChatGPT to their advantage, since it was the most revolutionary LLM of 2022 and probably 2023. For the development of Mann-E 5, I used GPT-3.5 Turbo (the free one), and it generated pretty good prompts.

This is the example message for the prompt formula:

Here is a Midjourney prompt formula:

raw photo, [the subject], [mood], [time of the day], [art style], [camera], [lens]

The “mood” can be realistic, cyberpunk, steampunk or futurist.

Do you understand?

As you can see, I just showed it the way I prompt. Then it responded:

Yes, I understand the Midjourney prompt formula you have provided. It includes a raw photo as the starting point, followed by information about the subject, mood, time of day, art style, camera, and lens. The mood can be one of four options: realistic, cyberpunk, steampunk, or futurist.

The response shows it was now ready to be asked for new prompts. Then I asked it for 5 to 10 prompts per idea, which means 20 to 40 images per idea, so I was set for Midjourney image generation.

Here is how I asked it about prompts:

Give me five prompts for "ruins of a roman temple"

And here is how it gave me the set of prompts (trimmed for this article):

A striking black and white image of the ruins of a Roman temple, with dramatic shadows and highlights emphasizing the structure's grandeur and decay, shot at night with a modern digital camera and a wide-angle lens.

If you spend time on Midjourney prompting, you will notice it's a pretty good prompt, even if it doesn't follow the formula very well.

Generating images using midjourney

This was the easy part. The whole process was feeding the ChatGPT-generated prompts to Midjourney, then upscaling and downloading the images.

The result was 464 images from different prompts, covering different moods, styles and genres.

Pre-processing the dataset

Since Stable Diffusion only accepts 512×512 or 768×768 images as input data, I had to write a simple Python script to do the resizing using OpenCV.

There was also an Excel file containing the image file names and the prompts used for each image, so I added a function to turn each prompt into a text file with the same name as its image file.
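
The preprocessing boiled down to something like the following; the column names and paths are assumptions for the sake of the example, not my exact script:

import cv2
import pandas as pd
from pathlib import Path

df = pd.read_excel("dataset.xlsx")   # columns assumed: "file_name", "prompt"
out_dir = Path("dataset_512")
out_dir.mkdir(exist_ok=True)

for _, row in df.iterrows():
    # Resize each 1024x1024 Midjourney upscale down to 512x512.
    image = cv2.imread(row["file_name"])
    image = cv2.resize(image, (512, 512), interpolation=cv2.INTER_AREA)
    cv2.imwrite(str(out_dir / Path(row["file_name"]).name), image)

    # Save the prompt as a caption file with the same base name as the image.
    caption_path = out_dir / (Path(row["file_name"]).stem + ".txt")
    caption_path.write_text(row["prompt"])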

Training Stable Diffusion using Dream Booth

Unlike Mann-E 4, Mann-E 5 is based on Stable Diffusion version 2.1 (the 512px version). The training was done in two different steps.

In the first step, there were 5440 steps of Dream Booth training (calculated with the (number of images * 10) + 800 formula, which for 464 images gives 4640 + 800 = 5440) and 928 steps on the text encoder so it understands the trigger words.

In the second step, the resulting checkpoints and weights of the first step were tuned for 10880 steps (twice the first step) plus 928 text-encoder steps to get the resulting images closer to the dataset.

It took a total of 4 hours of training on a shared T4 GPU on Google Colab. Of course, upgrading the Colab plan to Pro or Pro+ can be beneficial in order to get better GPUs and shorter training times.

The Results



Further Study and Research

The new model still has problems with photo-realistic images, but it does a great job on illustration and concept art, so for now it can be considered an artistic model. In the future, the photo-realistic side must be fixed as well.

The next thing is trying to tune the base model (whether Stable Diffusion version 2.1 or Mann-E checkpoints) on a larger dataset with more diverse images in order to get it closer to Midjourney.

Conclusion

Using pre-trained, readily available AI models such as ChatGPT not only elevates people's lives, but also helps AI engineers and developers get more worry-free data for their projects and products.

Also, using Midjourney as a tool for creating royalty-free images is a wise choice, especially when you try to create a brand new text-to-image AI model.

In conclusion, I can say I got much better results this time because I utilized both ChatGPT and Midjourney for my needs. The checkpoints for Mann-E 5 will be available on Hugging Face on Friday, April 14th, 2023, at the same time as the public release of the Mann-E platform.

A to Z of making an intelligent voice assistant

It was 2011, a sad year for a lot of Apple fans (me included), because Steve Jobs, one of the original co-founders of Apple, died in October of that year. It would have been even sadder if the iPhone 4S and its features hadn't arrived that same year.

A few years prior to the first introduction of Siri (which launched with the iPhone 4S), the movie Iron Man came out from Marvel Studios. Unlike the comic books, Jarvis wasn't an old man in this movie; Jarvis was an A.I. I'm not sure whether the movie inspired companies to add voice assistants to their systems, but I'm sure a lot of people bought those phones or tablets just to have their own version of Jarvis!

Long story short, a lot of engineers like me were under the influence of the MCU (Marvel Cinematic Universe) and Apple and wanted to have their voice assistant a little bit differently! Instead of buying an iPhone 4S, we preferred to start making our own voice assistants.

In this article, I'm discussing the basics you need to learn to make your very own version of Siri. I warn you here: there will be no code, at least in this one!

How does a voice assistant work?

In order to make something, we first need to learn how on earth that thing works! So let's discuss voice assistants and how they work. They're much simpler than you think; it's guaranteed your mind will be blown by their simplicity!

  • Listening: a voice assistant, as the name suggests, needs to listen to sounds and detect what is a decent human voice. For this, we need speech recognition systems, which will be discussed further below. We can build one ourselves, or use one that's already made.
  • Understanding: In the 2015 movie Avengers: Age of Ultron, Tony Stark (a.k.a. Iron Man) says "Jarvis is only a natural language understanding matrix". Leaving the matrix part aside, the rest of that sentence makes sense to me: voice assistants need to understand what we tell them. They can have A.I., hard-coded answers, or a little bit of both.
  • Responding: after processing what we've said, the voice assistant needs to provide a response that fits our request. For example, you say "Hey Alexa, play music" and your Alexa device asks you for the title; you say "Back in Black" and she plays the song from Spotify or YouTube Music.

Now we know about the functionality. What about the implementation? That's a whole other story. The rest of the article is more about the technical side of making an intelligent chatbot…

Implementation of a Voice Assistant

Speech Recognition

Before we start to make our voice assistant, we have to make sure it can hear. So we need to implement a simple speech recognition system.

Although it's not really hard to implement a speech recognition system, I personally prefer to go with something that's already made, like Python's speech recognition library (link). This library sends the audio signal directly to IBM, Microsoft or Google APIs and shows us the transcription of our speech.

On the other hand, we can make our own system with a dataset which has tons of voices and their transcriptions. But as you may know, you need to make your data diverse af. Why? Let me explain it a little bit better.

When you only have your own voice, your dataset doesn't have decent diversity. If you add your girlfriend, sister, brother, co-workers, etc., you still have no real diversity. The result may be decent, but it limits itself to your own voice, or the voices of your family members and friends!

The second problem is that your very own speech recognition system can't understand that much, because your words and sentences might be limited to the movie dialogues or books you like. We need diversity everywhere in our dataset.

Is there any solution to this problem? Yes. You can use something like Mozilla's dataset (link) for your desired language and make a speech recognition system from it. This data is provided by people around the world, and it's as diverse as possible.

Natural Language Understanding

As I told you, a voice assistant should process what we tell her. The best way of processing is artificial intelligence, but we can also do a hard-coded proof of concept.

What does that mean? Hard coding in programming means that when we want a certain input to have a fixed output, we don't rely on any logic for the answer; we just write code that says if the input is this, give the user that, with no regard for the logic. In this case, the logic could be A.I., but we simply tell the machine: if the user says Hi, you say Hi back!

But in real-world applications we can't just go with A.I. or hard-coded functions alone. A real voice assistant is usually a combination of both. How? When you ask your voice assistant for the price of Bitcoin, that's a hard-coded function.

But when you just talk to your voice assistant, she may make up answers for you which have a human feel, and that's where A.I. comes in.

Responding

Although providing responses can be considered a part of the understanding process, I prefer to talk about the whole thing in a separate section.

A response is usually what the A.I. will tell us, and the question is: how does that A.I. know what we mean? This is an excellent question. Designing the intelligent part of a voice assistant, or of chatbots in general, is the trickiest part.

The main backbone of the responses is your intention. What is your chatbot for? Is it a college professor's assistant, or just something that gives you a Stark feeling? Is it designed to flirt with lonely people, or to help the elderly? There are tons of questions you have to answer before designing your own assistant.

After you have asked yourself those questions, you need to classify what people would say to your bot under different categories. These categories are called intents. Let me explain with an example.

You go to a café, the waiter hands you the menu and you look at it, right? Your intention is now clear: you want some coffee. So how do you ask for coffee? I would say Sir, a cup of espresso please. And it's that simple. In order to answer all coffee-related questions, we need to consider as many different states as possible. What if the customer asks for a macchiato? What if they ask for a mocha? What if they ask for a cookie with their coffee? This is where A.I. can help.

A.I. is nothing other than making predictions using math. A long time ago, I used to write the whole A.I. logic myself, but later a YouTuber called NeuralNine developed a library called neural intents, and it's exactly for this purpose! How does this library work?

It's simple. We give the library a bunch of questions and our desired answers. The model we train can classify questions and then simply predict which category our sentence belongs to. Let me show you an example.

When you say a cup of espresso please, the A.I. sees the words cup and espresso. What happens then? She knows these words belong to the coffee category, so she gives you one of the fixed answers from that category.

Keeping the answers fixed, by the way, is not always a good thing. For some use cases, we may need to make a generative chatbot which can compose responses like a human. Those bots are more complex and require more resources, studies and time.

Final Thoughts

The world of programming is beautiful and vast, and when it comes to A.I., it becomes even more fun. In this article, I tried to explain how a voice assistant can be constructed, but I didn't really dig deep into the implementation.

Why so? I guess implementation is good, but in most cases, like every other aspect of programming, it's just putting some tools together. So learning the concept is much more important in most cases, like this one.

I hope the article was useful for you. If it was, please share it with your friends and leave a comment for me. I'd be super thankful.