Why I love YOLOv5

I am a big fan of Nicholas Renotte's channel on YouTube. I also love computer vision and its combination with deep learning. A few months ago, Nicholas posted this video, which is about YOLOv5. I'm usually too lazy to watch videos longer than 15 minutes, and I tend to watch them in a few sittings. But this video kept me behind my laptop screen for over an hour, and I'm sure I won't regret it.

So let's start the article and see where this story begins. As I mentioned earlier, I love computer vision, especially when it's combined with deep learning. I believe it can help us solve very complex problems in our projects with ease. My journey into the world of YOLO models started almost a year ago, when I wanted to develop a simple object detector for street signs.

At first, I found a lot of tutorials on Darknet-based training, but I never managed to get it working; since I have a Mac, it turned into a very real nightmare. So I guess YOLOv5 was a miracle. In this article, I am going to explain why I love YOLOv5 and why I prefer it to other YOLO versions.

What is YOLOv5?

According to its GitHub repository, YOLOv5 is a family of deep learning models pre-trained on Microsoft's COCO dataset. That makes it a very general-purpose object detection tool, which is fine for basic research and fun projects.

But I also needed my own models, because I wanted to develop some domain-specific object detection software. Then I realized they also provide a Python script that lets you fine-tune and train your own version of YOLOv5.
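
By the way, a fine-tuning run with their train.py script looks roughly like this; the dataset file street_signs.yaml and the hyperparameter values here are just placeholders for whatever your own project uses:

    # run from a clone of the ultralytics/yolov5 repository, with your own dataset YAML prepared
    python train.py --img 640 --batch 16 --epochs 100 --data street_signs.yaml --weights yolov5s.pt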

So I basically fell in love with this new thing I had discovered. In the next sections, I will explain why I love YOLOv5!

Why I love YOLOv5

First, I invite you to take a look at this chart, which compares YOLOv5 with other commonly used object detection models:

And since there has been some controversy around YOLOv5's claims about training time, inference time, model storage size, and so on, I highly recommend reading this article on Roboflow's blog.

So we can conclude that the very first thing that made me happy is the speed. The second thing, by the way, is the fact that I am lazy. Yes, I am lazy and I know it.

I had always tried to compile Darknet to get a YOLOv4 model and build my projects on top of it, but once I saw how hard that can get (and since I have a Mac and didn't really want to fire up an old computer just for these projects), I started looking for something that does everything with a bunch of Python scripts.

Once I discovered YOLOv5, I started working with it, and the very first project I did was this pedestrian detection system for a self-driving car.

Then I started doing a lot of research and asking around about what I could do with YOLOv5. I found out I can do pretty much anything I want with ease, since they provide a lot of functionality themselves. Isn't that good enough? Fine. Let me show you another YouTube video of mine, in which I solved my cropping problem using their built-in functions.
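
For the cropping part, for example, the detections object that YOLOv5 returns has a built-in crop() helper. This is only a rough sketch from memory, with a placeholder image path:

    import torch

    # load the small pre-trained model straight from the YOLOv5 repository
    model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
    results = model('street.jpg')    # 'street.jpg' is just a placeholder image
    results.crop(save=True)          # saves one cropped image per detected object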

If you're not convinced yet, I have to tell you there is a great method called pandas in this family of models.

As the name suggests, it outputs a pandas DataFrame, and you can easily work with the data in that DataFrame. Let me give you a better example. Suppose we want to find out, from drone footage, which plants are afflicted and which ones are not.

By using this method, we can simply write an algorithm that counts the afflicted plants in a single frame, so we can easily find out how many afflicted plants we have in a certain area. The whole point is that we get statistically sound data for most of our research.
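
As a rough sketch of that counting idea (the plants.pt weights file and the 'afflicted' class name are assumptions standing in for whatever your own model was trained on), it could look like this:

    import torch

    # load a custom-trained YOLOv5 model; plants.pt is a hypothetical weights file
    model = torch.hub.load('ultralytics/yolov5', 'custom', path='plants.pt')
    results = model('field_frame.jpg')         # one frame from the drone footage
    detections = results.pandas().xyxy[0]      # columns: xmin, ymin, xmax, ymax, confidence, class, name

    # count the detections labelled as afflicted plants in this frame
    afflicted_count = (detections['name'] == 'afflicted').sum()
    print(f'Afflicted plants in this frame: {afflicted_count}')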

The other example is my pedestrian detection system. We can have the car first read data from the cameras to make sure we're dealing with pedestrians, and then read data from a distance measurement system (which can be ultrasonic or LiDAR) to decide when it should send the braking command.
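
A minimal sketch of that decision logic could look like the function below; the read_distance_cm callable and the 300 cm threshold are purely hypothetical stand-ins for a real sensor driver and a real safety margin:

    BRAKE_DISTANCE_CM = 300  # hypothetical safety threshold

    def should_brake(detections, read_distance_cm):
        # detections is the pandas DataFrame returned by results.pandas().xyxy[0]
        person_ahead = (detections['name'] == 'person').any()
        # brake only when the camera sees a pedestrian AND the distance sensor agrees it is close
        return person_ahead and read_distance_cm() < BRAKE_DISTANCE_CM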

Conclusion

Let's wrap up the whole article. I love YOLOv5 because it made life easier for me as a computer vision enthusiast. It provides the tools I wanted, and honestly, I am really thankful to Ultralytics for the great opportunity they have given us.

In general, I always prefer easy-to-use tools, and YOLOv5 was exactly that for me. I need to focus on my goal instead of building a whole object detection algorithm or model from scratch.

Finally, I can conclude that a fast, easy-to-use, all-Python tool for object detection was what I had always been seeking, and YOLOv5 was my answer.

I am glad to have you as a reader on my blog, and thank you for the time you've spent reading this article. Stay safe!

A to Z of making an intelligent voice assistant

It was 2011, a sad year for a lot of Apple fans (me included), because Steve Jobs, one of the original co-founders of Apple, died in October of that year. It could have been even sadder if the iPhone 4S and its features hadn't arrived that year.

A few years before Siri was first introduced (with the iPhone 4S), Marvel Studios released a movie called Iron Man. Unlike in the comic books, Jarvis wasn't an old man in this movie; Jarvis was an A.I. I'm not sure whether the movie inspired companies to add voice assistants to their systems, but I'm sure a lot of people bought those phones and tablets just to have their own version of Jarvis!

Long story short, a lot of engineers like me were under the influence of the MCU (Marvel Cinematic Universe) and Apple, and wanted their voice assistant a little bit differently! Instead of buying an iPhone 4S, we preferred to start making our own voice assistants.

In this article, I discuss the basics you need to learn to make your very own version of Siri. Fair warning: there will be hardly any code in this one, just a few tiny sketches here and there!

How does a voice assistant work?

In order to make something, we first need to learn how on earth that thing works! So let's talk about voice assistants and how they work. They're much simpler than you think; I guarantee your mind will be blown by their simplicity!

  • Listening: a voice assistant, as the name implies, needs to listen to voices and detect what counts as real human speech. For this, we need a speech recognition system. These systems are discussed further below; we can either build one ourselves or use one that's already made.
  • Understanding: in the 2015 movie Avengers: Age of Ultron, Tony Stark (a.k.a. Iron Man) says "Jarvis is only a natural language understanding matrix". Setting the "matrix" part aside, the rest of that sentence makes sense to me: voice assistants need to understand what we tell them. They can use A.I., hard-coded answers, or a little bit of both.
  • Responding: after processing what we've said, the voice assistant needs to provide a response that fits our request. For example, you say "Hey Alexa, play music", your Alexa device asks you for a title, you say "Back in Black", and she'll play the song from Spotify or YouTube Music.

Now we know about the functionality. What about the implementation? That's a whole other story. The rest of the article is more about the technical side of making an intelligent chatbot…

Implementation of a Voice Assistant

Speech Recognition

Before we start to make our voice assistant, we have to make sure it can hear. So we need to implement a simple speech recognition system.

Although it's not really hard to implement a speech recognition system, I personally prefer to go with something that's already made, like Python's SpeechRecognition library (link). This library sends the audio signal directly to IBM, Microsoft, or Google APIs and gives us back a transcription of what we said.
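
For the record, using that library only takes a few lines. This sketch uses the Google Web Speech recognizer the library exposes and assumes you have a working microphone:

    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print('Say something...')
        audio = recognizer.listen(source)

    # send the recorded audio to Google's web speech API and print the transcription
    print('You said:', recognizer.recognize_google(audio))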

On the other hand, we can build our own system from a dataset containing tons of voices and their transcriptions. But as you may know, you need your data to be extremely diverse. Why? Let me explain a little better.

If your dataset contains only your own voice, it doesn't have decent diversity. If you add your girlfriend, sister, brother, co-workers, and so on, you still don't have much diversity. The result may be decent, but it will be limited to your own voice, or the voices of your family members and friends!

The second problem is that your very own speech recognition system can't understand that much, because your words and sentences might be limited to the movie dialogues or books you like. We need diversity everywhere in our dataset.

Is there any solution to this problem? Yes. You can use something like Mozilla's dataset (link) for your desired language and build a speech recognition system from it. This data is provided by people around the world, and it's about as diverse as it gets.

Natural Language Understanding

As I told you, a voice assistant should process what we tell her. The best way of processing is artificial intelligence, but we can also do a hard-coded proof of concept.

What does that mean? Hard coding in programming means that when we want a certain input to have a fixed output, we don't rely on any real logic for the answer; we just write code along the lines of "if the input is this, give the user that". In this case, the logic could be A.I., but instead we tell the machine: if the user says "Hi", you simply say "Hi" back!

But in real-world applications, we can't go with only A.I. or only hard-coded functions. A real voice assistant is usually a combination of both. How? When you ask your voice assistant for the price of Bitcoin, that's a hard-coded function.

But when you just chat with your voice assistant, she may come up with answers that have a human feel, and that's where A.I. comes in.
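
To make that concrete, here is a tiny sketch of the routing idea; get_bitcoin_price and ai_model are hypothetical placeholders for a real price lookup and a real trained model:

    # hard-coded commands are checked first, the A.I. handles everything else
    def respond(user_text, ai_model, get_bitcoin_price):
        if 'bitcoin' in user_text.lower():
            return f'Bitcoin is currently at {get_bitcoin_price()} dollars.'
        return ai_model.reply(user_text)  # free-form chat falls through to the A.I. part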

Responding

Although providing responses can be considered a part of the understanding process, I prefer to talk about the whole thing in a separate section.

A response is usually whatever the A.I. tells us, and the question is: how does that A.I. know what we mean? That's an excellent question. Designing the intelligent part of a voice assistant, or of chatbots in general, is the trickiest part.

The main backbone of the responses is your intention. What is your chatbot for? Is it a college professor's assistant, or just something that gives you a Stark feeling? Is it designed to flirt with lonely people, or to help the elderly? There are tons of questions you have to answer before designing your own assistant.

After you've answered those questions, you need to classify the things people might say to your bot into different categories. These categories are called intents. Let me explain with an example.

You go to a café, the waiter hands you the menu, and you look it over, right? Your intention is now clear: you want some coffee. So how do you ask for coffee? I would say, "Sir, a cup of espresso please." It's that simple. In order to answer all coffee-related questions, we need to cover as many different cases as possible. What if the customer asks for a macchiato? What if they ask for a mocha? What if they ask for a cookie with their coffee? This is where A.I. can help.
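
As an illustration (the category names and the exact sentences here are made up), an intents definition for that café scenario can be as simple as a dictionary:

    # a tiny hand-written intents definition: patterns we expect, responses we allow
    intents = {
        'coffee': {
            'patterns': ['a cup of espresso please', 'can I get a macchiato', 'one mocha please'],
            'responses': ['Sure, coming right up!', 'One moment, your coffee is on its way.'],
        },
        'snacks': {
            'patterns': ['a cookie with my coffee please', 'do you have any cake'],
            'responses': ['Of course, I will add that to your order.'],
        },
    }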

A.I. is nothing more than making predictions using math. A long time ago, I used to write all the A.I. logic myself. But later, a YouTuber called NeuralNine developed a library called neural intents for exactly this purpose! How does this library work?

It's simple. We give the library a bunch of example questions and our desired answers. The model we train can then classify questions and predict which category what we say belongs to. Let me show you an example.

When you say "a cup of espresso please", the A.I. sees the words cup and espresso. What happens then? She knows these words belong to the coffee category, so she gives you one of the fixed answers from that category.
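
To show the espresso example end to end, here is a deliberately simplified, library-free sketch that scores each category by how many of its words appear in the sentence; a real model (like the ones neural intents trains) would do this with an actual neural network instead of word counting. It reuses the intents dictionary from the earlier sketch:

    import random

    # naive intent matching: pick the category whose patterns share the most words with the input
    def predict_intent(sentence, intents):
        words = set(sentence.lower().split())
        scores = {category: len(words & set(' '.join(data['patterns']).lower().split()))
                  for category, data in intents.items()}
        return max(scores, key=scores.get)

    category = predict_intent('a cup of espresso please', intents)   # -> 'coffee'
    print(random.choice(intents[category]['responses']))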

Keeping the answers fixed, by the way, is not always a good thing. For some use cases, we may need a generative chatbot that can compose responses like a human. Those bots are more complex and require more resources, study, and time.

Final Thoughts

The world of programming is beautiful and vast. When it comes to A.I., it becomes even more fun, of course. In this article, I tried to explain how a voice assistant can be constructed, but I didn't dig deep into the implementation.

Why? Implementation is good, but in most cases, like every other aspect of programming, it's just a matter of putting tools together. So learning the concepts is much more important in cases like this.

I hope this article was useful for you. If it was, please share it with your friends and leave me a comment. I'd be super thankful.