In 2022, ChatGPT was released and LLMs became the hot topic of pretty much every technology-related press outlet, event, YouTube video, etc. It was like finding the secret ingredient of a potion that can make you immortal.
But Meta didn’t let OpenAI become the one and only. They joined the game by releasing their well-named model, Large Language Model Meta AI or LLaMA, which we all know and love. And it wasn’t only Meta: our friends at Mistral AI weren’t idle either. They released a good bunch of open source models, and the result of their work even motivated me to make my Persian LLM, Maral.
But nowadays, finding a good LLM is not a big problem. With a quick search on the internet, we can easily find good LLMs: base models and fine-tunes made for generic or specific purposes, models armed with reasoning, models made for programmers, etc.
We have the text output, now we need action. This is what I’m going to discuss in this post, and I would also love to hear back from you.
AI Agents add action to LLMs
Well, I remember when the Rabbit R1, that makeshift Android rip-off of the iPod touch, was introduced. They advertised the device as running on a Large Action Model, or LAM. I kept wondering how we could modify one of the open LLMs to take action. Then I got the answer.
The simplest thing we can think of is an LLM tuned on JSON inputs for different APIs, phrased in different tones. I believe this is what function calling or tool calling is. But it still has a downside.
Imagine I train LLaMA 3.2 on APIs from AirBnB, Shopify, Amazon, Uber and Spotify. What will happen if you ask for a YouTube video? The model has no YouTube action to call, so you won’t even get rick-rolled, and that is not a good sign for products such as Rabbit R1 (or any of its competitors).
Then I got familiar with Crew AI, which is a framework for making agents. But honestly, I never understood these AI frameworks. Most of them overcomplicate the process of building a simple application. Still, thanks to Crew AI, I could finally understand what an AI agent is.
An AI agent adds actions to LLMs in a human-understandable way. When you ask ChatGPT to create a picture, for example, it calls an API running DALL-E and then gives you the image. This is what an agent is…! (at least as long as it’s not called Smith).
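To make the downside concrete, here is a minimal sketch of JSON-based tool calling, assuming the model emits an object with a "name" and an "arguments" field. The tool names and stub functions are hypothetical and only stand in for real APIs. The second call shows what happens when the user asks for something outside the set of tools the model was given.

```python
import json

# Hypothetical tools: illustrative stubs, not real AirBnB or Spotify APIs.
def search_listings(city: str) -> str:
    return f"3 listings found in {city}"

def play_song(title: str) -> str:
    return f"Now playing {title}"

TOOLS = {"search_listings": search_listings, "play_song": play_song}

def dispatch(model_output: str) -> str:
    """Parse a JSON tool call and run the matching function, if there is one."""
    call = json.loads(model_output)
    tool = TOOLS.get(call["name"])
    if tool is None:
        # The request falls outside the tools the model knows about.
        return f"No tool named {call['name']!r} is available"
    return tool(**call["arguments"])

known = dispatch('{"name": "play_song", "arguments": {"title": "Never Gonna Give You Up"}}')
unknown = dispatch('{"name": "open_youtube_video", "arguments": {"query": "rickroll"}}')
print(known)
print(unknown)
```

The registry is a plain dictionary, so adding a tool is one line, but anything missing from it simply cannot be done.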
Making an AI Agent without the frameworks is possible!
Well, it is possible. You only need Python and probably OpenAI’s library to make an agent. First of all, let’s see what an agent does. An agent simply gets a prompt from you, something like “Send an email to John Doe and explain why I will be late tomorrow.” The AI model has to work out a few steps here.
First, it has to call a function to search your contact list and find John Doe. Then it has to generate a text explaining why you will be late. The last part is to send the email through an email server (which can be a private mail server or a provider like Google’s Gmail).
You can also make it one step more difficult for your agent and ask it to do all of this through the GUI (for that, you basically need a vision model).
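The three steps can be sketched as plain Python functions. This is only an illustration under stated assumptions: the contact list, the helper names and the returned payload are all hypothetical, the message text comes from a template instead of an LLM, and no mail server is contacted.

```python
# Hypothetical contact list, used only for this illustration.
CONTACTS = {"John Doe": "john.doe@example.com"}

def find_contact(name: str) -> str:
    """Step 1: look the recipient up in the contact list."""
    return CONTACTS[name]

def write_message(reason: str) -> str:
    """Step 2: a real agent would let the LLM write this; here it is a template."""
    return f"Hi, I will be late tomorrow because {reason}."

def send_email(to: str, body: str) -> dict:
    """Step 3: a real agent would talk to a mail server; this only builds the payload."""
    return {"to": to, "body": body, "status": "queued"}

def run_task(name: str, reason: str) -> dict:
    # Chain the three steps in the order the agent has to discover them.
    address = find_contact(name)
    body = write_message(reason)
    return send_email(address, body)

result = run_task("John Doe", "my train is delayed")
print(result)
```

In an agent, the order of these calls is not hard-coded as it is in run_task: the model has to decide which function to call next.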
Let’s make it happen in Python. It will be easy and you will understand it better.
Python example
Disclaimer: Since I have a full working code example on GitHub, this part of the blog will be just a simple example.
The first step is to find an LLM. I personally think any provider with an OpenAI-compatible API works perfectly, and for this particular project I’m using my own LLM, which is known as Jabir Project.
Jabir Project is a fine-tune of LLaMA 3.1 405B and has proven itself in many different tasks. If you don’t want to use Jabir LLMs, that’s fine. You may prefer OpenAI, DeepInfra or OpenRouter. You may also want to go local, so why not use Ollama?
Well, assuming you want to use Jabir’s API, you need to set up an OpenAI client like this:
    from openai import OpenAI

    client = OpenAI(api_key="FAKE", base_url="https://openai.jabirpoject.org/v1")
This is as easy as typing one line of code! You may be wondering why I used “FAKE” as the API key. When I tried to add Ollama’s API to my code, I learned that the OpenAI library requires a value for the API key, so a placeholder does the job.
Then, we need to set up a simple agent class:
    class Agent:
        def __init__(self, system=""):
            self.system = system
            self.messages = []  # full conversation history
            if self.system:
                self.messages.append({"role": "system", "content": system})

        def __call__(self, message):
            # Store the user message, ask the model, then store its reply
            self.messages.append({"role": "user", "content": message})
            result = self.execute()
            self.messages.append({"role": "assistant", "content": result})
            return result

        def execute(self):
            # Send the whole history so the model sees earlier turns
            completion = client.chat.completions.create(
                model="jabir-400b",
                messages=self.messages,
                temperature=0.0,
            )
            return completion.choices[0].message.content

This agent class is what matters most, since it keeps a memory of what happened: every user message and every model reply is appended to self.messages and sent along with the next request.
You can run the agent like this:
    sample_agent = Agent("You are a helpful assistant")
    print(sample_agent("What is 1+1?"))

Now the main question is: how can we add actions to this agent?
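To see the memory at work without calling any API, here is a small sketch that keeps the same bookkeeping as the Agent class but swaps the network call for a pluggable function. The MemoryAgent name and the fake_backend function are assumptions made for this illustration. They are not part of the original code.

```python
from typing import Callable

class MemoryAgent:
    """Same message bookkeeping as Agent, with a pluggable backend (an assumption)."""

    def __init__(self, backend: Callable[[list], str], system: str = ""):
        self.backend = backend
        self.messages = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def __call__(self, message: str) -> str:
        self.messages.append({"role": "user", "content": message})
        result = self.backend(self.messages)
        self.messages.append({"role": "assistant", "content": result})
        return result

def fake_backend(messages: list) -> str:
    # Stand-in for the API call: report how many messages the "model" received.
    return f"I can see {len(messages)} messages"

agent = MemoryAgent(fake_backend, "You are a helpful assistant")
first = agent("What is 1+1?")
second = agent("And 2+2?")
print(first)   # I can see 2 messages
print(second)  # I can see 4 messages
```

The second call receives the system prompt, the first question, the first answer and the new question, which is exactly why the model can refer back to earlier turns.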
The Sample Agent with real action
As I was working on a way to make agents with no frameworks, I came up with the idea of making each action a Python function and then asking the AI to generate something that can later be parsed into inputs for those functions.
I made it in the form of a Jupyter notebook, and it is available through my GitHub account. You can write agents like this and be completely framework-independent.
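As a rough sketch of that idea (not the code from the notebook), assume the model is prompted to answer with a line of the form "Action: name: input" whenever it wants something done. That line format, the two actions and the scripted_model stand-in are all assumptions made for illustration. A regular expression pulls out the function name and its input, the function runs, and the result goes back to the model as an observation.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

# Hypothetical actions; each one is an ordinary Python function.
def calculate(expression: str) -> str:
    left, op, right = expression.split()
    return str(OPS[op](int(left), int(right)))

def shout(text: str) -> str:
    return text.upper()

ACTIONS = {"calculate": calculate, "shout": shout}
ACTION_RE = re.compile(r"^Action:\s*(\w+):\s*(.+)$", re.MULTILINE)

def run(model, question: str, max_turns: int = 3) -> str:
    """Loop: ask the model, run any requested action, feed the result back."""
    prompt = question
    reply = ""
    for _ in range(max_turns):
        reply = model(prompt)
        match = ACTION_RE.search(reply)
        if match is None:
            return reply  # no action requested, so treat this as the final answer
        name, arg = match.groups()
        if name in ACTIONS:
            prompt = f"Observation: {ACTIONS[name](arg.strip())}"
        else:
            prompt = f"Observation: unknown action {name}"
    return reply

def scripted_model(prompt: str) -> str:
    # Stand-in for an LLM: requests an action first, then answers from the observation.
    if prompt.startswith("Observation:"):
        return "Answer: " + prompt.split(": ", 1)[1]
    return "Thought: I need to compute this.\nAction: calculate: 20 * 15"

answer = run(scripted_model, "What is 20 * 15?")
print(answer)  # Answer: 300
```

Swapping scripted_model for a callable that wraps a real chat client would turn this into a working loop, provided the system prompt teaches the model the same line format.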
Conclusion
Almost three years ago, I made a blog post here called “I was too cheap to pay $10 a month for Github’s copilot so I made my own”, and it was a good start to my journey into generative AI. Although I abandoned text generation for quite a long time and started Mann-E, I got back to the world of NLP with the Maral models.
Then Maral got abandoned because my personal life was getting a little rough, and I decided to start a personalization platform called Atelier AI, which lets you create your own LoRAs for Mann-E models.
But when I restarted the Jabir Project, I thought an LLM was not enough. This model should be the foundation of something bigger. This is why I did a lot of research on AI agents, and now I am completely aware of what I’m going to do.
I would love to hear back from readers of my blog about what ideas we can implement using LLMs and agents, so I politely ask all of you to participate in the discussion. Let’s build the future together.