A to Z of making an intelligent voice assistant

It was 2011, a sad year for a lot of Apple fans (me included), because Steve Jobs, one of the co-founders of Apple, died in October of that year. It would have been even sadder without the iPhone 4S and its features.

A few years before Siri was first introduced with the iPhone 4S, Marvel Studios released a movie called Iron Man. Unlike in the comic books, Jarvis wasn’t an old man in this movie. Jarvis was an A.I. I’m not sure whether the movie inspired companies to add voice assistants to their systems, but I’m sure a lot of people bought those phones and tablets just to have their own version of Jarvis!

Long story short, a lot of engineers like me were under the influence of the MCU (the Marvel Cinematic Universe) and Apple, and wanted a voice assistant of their own, done a little differently! Instead of buying an iPhone 4S, we preferred to start making our own voice assistants.

In this article, I’m discussing the basics you need to learn to make your very own version of Siri. I warn you here: this one focuses on concepts, not on a full implementation!

How does a voice assistant work?

In order to make something, we first need to learn how on earth that thing works! So, let’s discuss voice assistants and how they work. They’re much simpler than you think. I guarantee your mind will be blown by their simplicity!

  • Listening: a voice assistant, as the name suggests, needs to listen and detect what is actually human speech. For this, we need a speech recognition system. These systems are discussed further below. We can make one ourselves, or we can use one that’s already made.
  • Understanding: In the 2015 movie Avengers: Age of Ultron, Tony Stark (a.k.a. Iron Man) says “Jarvis is only a natural language understanding matrix”. Leaving the matrix part aside, the rest of this sentence makes sense to me. Voice assistants need to understand what we tell them. They can use A.I., hard-coded answers, or a little bit of both.
  • Responding: after processing what we’ve said, the voice assistant needs to provide a response that fits our request. For example, you say “Hey Alexa, play music” and your Alexa device asks you for the title; you say “Back in Black” and she plays the song from Spotify or YouTube Music.
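
The three steps above can be sketched as a tiny pipeline. This is only a toy under loud assumptions: the "audio" is already text, and every function name and reply here is made up for illustration, not taken from any real assistant.

```python
# A toy "listen -> understand -> respond" pipeline.
# All names and replies are hypothetical placeholders.

def listen(audio: str) -> str:
    """Pretend speech recognition: in this sketch the 'audio' is already text."""
    return audio.strip().lower()


def understand(text: str) -> str:
    """Map the transcribed text to a coarse intent name."""
    words = text.split()
    if "play" in words and "music" in words:
        return "play_music"
    if words and words[0] in {"hi", "hello"}:
        return "greeting"
    return "unknown"


def respond(intent: str) -> str:
    """Pick the reply that fits the detected intent."""
    replies = {
        "play_music": "Which song should I play?",
        "greeting": "Hi!",
    }
    return replies.get(intent, "Sorry, I did not get that.")


def assistant(audio: str) -> str:
    """Chain the three stages together."""
    return respond(understand(listen(audio)))
```

Calling assistant("Hey Alexa, play music") walks through all three stages and returns the follow-up question about the song title.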

Now we know about the functionality. What about the implementation? That’s a whole other story. The rest of the article is more about the technical side of making an intelligent chatbot…

Implementation of a Voice Assistant

Speech Recognition

Before we start to make our voice assistant, we have to make sure it can hear. So we need to implement a simple speech recognition system.

Although it’s not really hard to implement a speech recognition system, I personally prefer to go with something that is already made, like Python’s speech recognition library (link). This library sends the audio signal to the IBM, Microsoft or Google APIs and gives us back the transcription of what we said.
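
Here is a minimal sketch of the ready-made route, assuming the library meant above is the SpeechRecognition package (imported as speech_recognition) and that a WAV recording exists on disk. The function name and the file name in the note below are my own placeholders.

```python
def transcribe_wav(path: str) -> str:
    """Return the transcription of a WAV file, or an empty string on failure."""
    # Imported inside the function so this sketch can be loaded even when
    # the package is not installed (pip install SpeechRecognition).
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory

    try:
        # Sends the audio to Google's web speech API and returns the text.
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return ""  # the speech could not be understood
    except sr.RequestError:
        return ""  # the web API could not be reached
```

With a recording saved as, say, hello.wav (a hypothetical file), transcribe_wav("hello.wav") would return whatever the service heard. It needs an internet connection, because the recognition itself happens on the remote API.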

On the other hand, we can make our own system with a dataset that has tons of voice recordings and their transcriptions. But as you may know, you need to make your data really diverse. Why? Let me explain a little bit better.

When you only have your own voice, your dataset doesn’t have enough diversity. If you add your girlfriend, sister, brother, co-workers, etc., you still don’t have enough diversity. The results may be decent, but only for your own voice or the voices of your family members and friends!

The second problem is that your very own speech recognizer can’t understand that much, because your words and sentences might be limited to the movie dialogues or books you like. We need diversity everywhere in our dataset.

Is there any solution to this problem? Yes. You can use something like Mozilla’s dataset (link) for your desired language and build a speech recognition system on it. This data is provided by people around the world, so it’s as diverse as possible.

Natural Language Understanding

As I told you, a voice assistant should process what we tell her. The best way of processing is artificial intelligence, but we can also build a hard-coded proof of concept.

What does that mean? In programming, hard coding means that when we want a certain input to have a fixed output, we don’t rely on our general logic for that answer; we just write code that says “if the input is this, give the user that”. In this case, the general logic can be A.I., but we tell the machine: if the user says Hi, simply say Hi!
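
A hard-coded rule like this can be as small as a dictionary lookup. The sketch below is only an illustration; the table contents and the function name are assumptions of mine.

```python
# Fixed input -> fixed output, with no "intelligence" involved.
HARD_CODED_REPLIES = {
    "hi": "Hi!",
    "hello": "Hi!",
}


def hard_coded_reply(user_text: str):
    """Return the fixed reply for a known input, or None if there is no rule."""
    # Normalise case and drop trailing punctuation so "Hi!" matches "hi".
    key = user_text.strip().lower().rstrip("!.?")
    return HARD_CODED_REPLIES.get(key)
```

Anything that is not in the table returns None, which is the signal that some other logic (A.I., for example) has to take over.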

But in real-world applications, we can’t go with only A.I. or only hard-coded functions. A real voice assistant is usually a combination of both. How? When you ask your voice assistant for the price of bitcoin, it’s a hard-coded function.

But when you just talk to your voice assistant, she may come up with answers that have a human feel, and that’s where A.I. comes in.
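
One way to picture this combination is a dispatcher that tries the hard-coded functions first and falls back to the A.I. otherwise. Everything in this sketch is a placeholder: fetch_bitcoin_price returns a dummy value instead of calling a real price API, and small_talk_model stands in for a trained model.

```python
from typing import Optional


def fetch_bitcoin_price() -> float:
    """Hypothetical placeholder: a real assistant would query a price API here."""
    return 0.0  # dummy value, not a real quote


def hard_coded_answer(text: str) -> Optional[str]:
    """Fixed functions for requests we can recognise with simple rules."""
    lowered = text.lower()
    if "price" in lowered and "bitcoin" in lowered:
        return f"Bitcoin is at {fetch_bitcoin_price()} USD."
    return None


def small_talk_model(text: str) -> str:
    """Stand-in for the A.I. part; a trained model would produce this reply."""
    return "Interesting! Tell me more."


def answer(text: str) -> str:
    """Try the hard-coded functions first, then fall back to the model."""
    reply = hard_coded_answer(text)
    return reply if reply is not None else small_talk_model(text)
```

The order matters in this design: precise requests get precise, predictable answers, and only free-form chat reaches the model.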

Responding

Although providing responses can be considered a part of the understanding process, I prefer to talk about the whole thing in a separate section.

A response is usually what the A.I. tells us, and the question is: how does that A.I. know what we mean? This is an excellent question. Designing the intelligent part of a voice assistant, or of chatbots in general, is the trickiest part.

The backbone of the responses is your intention. What is your chatbot for? Is it a college professor’s assistant, or is it just something to give you that Stark feeling? Is it designed to flirt with lonely people, or to help the elderly? There are tons of questions you have to answer before designing your own assistant.

After you have asked yourself those questions, you need to classify what people would say to your bot into different categories. These categories are called intents. Let me explain by example.

You go to a café, the waiter gives you the menu and you look at it, right? Your intention is now clear: you want some coffee. So, how do you ask for coffee? I would say “Sir, a cup of espresso please”. It’s that simple. In order to answer all coffee-related questions, we need to consider as many different cases as possible. What if the customer asks for a macchiato? What if they ask for a mocha? What if they ask for a cookie with their coffee? This is where A.I. can help.

A.I. is nothing other than making predictions using math. A long time ago, I used to write the whole A.I. logic myself. But later, a YouTuber called NeuralNine developed a library called neural intents for exactly this purpose! How does this library work?

It’s simple. We give the library a bunch of questions and our desired answers. The model we train can then classify new input and predict which category it belongs to. Let me show you an example.

When you say “a cup of espresso please”, the A.I. sees the words cup and espresso. What happens then? She knows these words belong to the coffee category, so she gives you one of the fixed answers from that category.
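
To make the idea concrete without any library, here is a simplified stand-in. It is not the neural intents API, and it does not train a neural network; it just counts how many meaningful words of the input appear in each category's example sentences. The intents, stop words and replies are made up for the café example.

```python
import random
import re

# Hypothetical intents for the café example: sample sentences plus fixed answers.
INTENTS = {
    "coffee": {
        "patterns": ["a cup of espresso please", "can I have a macchiato", "one mocha please"],
        "responses": ["Coming right up!", "Sure, one coffee on the way."],
    },
    "snack": {
        "patterns": ["a cookie with my coffee", "do you have any cake"],
        "responses": ["One cookie, of course."],
    },
}

# Filler words that say nothing about the category.
STOPWORDS = {"a", "an", "the", "of", "please", "i", "can", "have",
             "with", "my", "do", "you", "any", "one", "sir"}


def tokenize(text: str) -> set:
    """Lower-case the text and keep only the meaningful words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def classify(text: str):
    """Return the intent whose example sentences share the most words with the input."""
    words = tokenize(text)
    best_intent, best_score = None, 0
    for name, intent in INTENTS.items():
        vocabulary = set().union(*(tokenize(p) for p in intent["patterns"]))
        score = len(words & vocabulary)
        if score > best_score:
            best_intent, best_score = name, score
    return best_intent  # None when no word matched any category


def reply(text: str) -> str:
    """Give one of the fixed answers of the predicted category."""
    intent = classify(text)
    if intent is None:
        return "Sorry, I did not understand that."
    return random.choice(INTENTS[intent]["responses"])
```

For “Sir, a cup of espresso please”, only cup and espresso survive the stop-word filter, both belong to the coffee examples, and so one of the coffee answers comes back. A trained model replaces the word counting with learned weights, but the input and output have the same shape.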

Keeping answers fixed, by the way, is not always a good thing. Sometimes we may need to make a generative chatbot that can compose responses like a human. Those bots are more complex and require more resources, study and time.

Final Thoughts

The world of programming is beautiful and vast, and when it comes to A.I. it becomes even more fun. In this article, I tried to explain how a voice assistant can be constructed, but I didn’t dig deep into the implementation.

Why so? Implementation is good, but in most cases, like every other aspect of programming, it’s just putting some tools together. So learning the concepts is much more important in cases like this.

I hope the article was useful for you. If it was, please share it with your friends and leave a comment for me. I’d be super thankful.

Composing using relative scales!

In this topic, I’m going to show you how to compose a minimal music piece using “relative” scales. But first, let’s talk about music theory. A relative scale is simply “a scale with the same notes as the main scale”. There are two kinds of relatives: relative minor and relative major. So, if we consider this as D minor:

D – E – F – G – A – B♭ – C – D

We need another scale with the same notes. Let’s start from the 3rd note of our main scale:

F – G – A – B♭ – C – D – E – F

This is the “F major” scale, and it’s the “relative major” of our D minor scale. As you can see, if you start from the 3rd note of a minor scale, you get its relative major, and if you start from the 6th note of a major scale, you get its “relative minor”.
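
The “start from the 3rd or the 6th note” rule is just a rotation of the note list, which a few lines of code can show. The function names are my own, and the sixth note of D minor is spelled Bb (B flat, the same key on the piano as A#).

```python
D_MINOR = ["D", "E", "F", "G", "A", "Bb", "C"]


def rotate(scale, degree):
    """Return the same notes, starting from the given 1-based scale degree."""
    i = degree - 1
    return scale[i:] + scale[:i]


def relative_major(minor_scale):
    """The relative major starts on the 3rd note of a minor scale."""
    return rotate(minor_scale, 3)


def relative_minor(major_scale):
    """The relative minor starts on the 6th note of a major scale."""
    return rotate(major_scale, 6)


F_MAJOR = relative_major(D_MINOR)  # F, G, A, Bb, C, D, E
```

Applying relative_minor to F_MAJOR brings back the original D minor list, which shows that the two rules undo each other.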

Now, let’s make a piece! Here I use Ableton Live 9 as my DAW, and DSK Overture and ZynAddSubFX as my plugins. Let’s make music!

First, I play an orchestral piece in D minor using DSK Overture, with “Strings section”, “Flute”, “Violin” and “Cello”. The notes will be D-F-A-F-D-A-E.

Sounds good, but it needs some decoration! We can do that using F major. In the F major section, I want to use ZynAddSubFX, and my notes will be F – A – C – A – F – C – G. So, this is our relative major:
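
In code, the F major line turns out to be the D minor line moved up two steps inside the shared scale. This is my own observation about the two note lists rather than something the plugins do, and the helper name is hypothetical.

```python
# The notes shared by D minor and F major (Bb is B flat).
SCALE = ["D", "E", "F", "G", "A", "Bb", "C"]


def shift_in_scale(motif, steps, scale=SCALE):
    """Move every note of the motif up by a number of scale steps."""
    return [scale[(scale.index(note) + steps) % len(scale)] for note in motif]


d_minor_motif = ["D", "F", "A", "F", "D", "A", "E"]

# Two scale steps up: D becomes F, F becomes A, A becomes C, E becomes G.
f_major_motif = shift_in_scale(d_minor_motif, 2)
```

Because both motifs use only notes from the same seven-note set, they can be layered without clashing, which is the whole point of the relative technique.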

Now, we can mix them! This is the orchestral piece with its electronic relative major :

I repeat this piece 4 or 5 times, then make a song out of it (with some audio engineering). Here is our final song:

I hope you enjoy this topic and the song! Remember that this “relative” technique is useful when you want to use ostinatos, especially on the piano: you can play the ostinato with the right hand and its relative with the left hand.

Ambient Music, Using usual instruments

In the previous article about music, I explained what ambient music is, and we learned that this genre is famous for its minimal, relaxing and unusual form. In this article, I’m going to explain how we can use a usual instrument and a classical music technique to make this kind of music.

Before we start, I’m going to explain one important thing: ambient is a minimalistic genre of music, so we have to keep it as simple as we can. Minimalism here means we use at most three chords and repeat shapes and melodies for a while. Minimalism is the most important characteristic of ambient music!

Now, let’s discuss our goal: making unusual music with a usual instrument. I chose the piano in this case because I have a lot of good piano sounds on my laptop. I personally play guitar, but I have no good guitar sound (VST, soundfont, …). The piano is one of the best-known instruments in classical music. People have also used it to play jazz, rock, symphonic metal, etc. You only need to install the “Piano Tiles” game and play it to realize how the piano has been used in different genres and styles! So, if people have used the piano even in rap music, we can also use it in our modern and minimalistic style!

Let’s talk about technique. In this case, I want to use an ostinato, which means playing a melodic shape repeatedly for a while. If you have read this article carefully, you will realize that ambient (and any other minimalistic style of music) has this characteristic. And again, we can see that every musical genre, even these modern, minimal and unusual ones, has roots in classical music.

As an example, I’m going to write a simple song. First, we need to decide on a progression. I want to use 1-3-5 in the key of D minor, which means I need these chords:
Dm – F – Am
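
These three chords can be derived from the scale by stacking every other note on degrees 1, 3 and 5. The sketch below assumes the natural D minor scale (with Bb for B flat) and uses helper names I made up.

```python
D_MINOR = ["D", "E", "F", "G", "A", "Bb", "C"]

# Position of each note within the octave, in semitones above C.
SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "Bb": 10}


def triad(scale, degree):
    """Stack every other scale note, starting on the given 1-based degree."""
    i = degree - 1
    return [scale[(i + k) % len(scale)] for k in (0, 2, 4)]


def chord_name(notes):
    """Name a triad from the intervals between its root, third and fifth."""
    root, third, fifth = notes
    intervals = ((SEMITONE[third] - SEMITONE[root]) % 12,
                 (SEMITONE[fifth] - SEMITONE[root]) % 12)
    suffix = {(4, 7): "", (3, 7): "m", (3, 6): "dim"}[intervals]
    return root + suffix


progression = [chord_name(triad(D_MINOR, degree)) for degree in (1, 3, 5)]
```

The result is the Dm, F, Am progression listed above: degree 1 gives D-F-A, degree 3 gives F-A-C and degree 5 gives A-C-E.
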
So, we need to play these chords, with each bar (or measure, I really don’t know what it’s called in different parts of the world :D) containing one chord. This can be our sample:

So, we just played chords, which doesn’t say much on its own. If you play piano (or any other classical instrument), you know that when we want to play a piece, we need to decorate our main line. So, I add some decorations to my piece, which will be my ostinato:

For now, I just play this piece 5 times, then I do some sound engineering (adding effects, etc.), and what you will hear is:

Yes, this is a simple ambient song! A simple ostinato, with reverb and pitch shifting, that fades in and then fades out. I used LMMS for writing the song and playing it on a piano, and I used Audacity for the sound engineering.
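
The “repeat, then fade in and out” shape can be sketched in a few lines. This is not what LMMS or Audacity do internally; it only computes a volume level per note. The ostinato notes below are a hypothetical stand-in built from the Dm chord, since the real one lives in the audio.

```python
# Hypothetical ostinato: four notes taken from the Dm chord.
ostinato = ["D", "F", "A", "F"]

# Play the shape 5 times in a row.
piece = ostinato * 5


def fade_envelope(length, fade_len):
    """Volume per step: ramps up to 1.0, holds, then ramps back down."""
    gains = []
    for i in range(length):
        fade_in = (i + 1) / fade_len        # grows from the start
        fade_out = (length - i) / fade_len  # shrinks towards the end
        gains.append(min(1.0, fade_in, fade_out))
    return gains


# One volume value per note; the fades last one repetition each.
gains = fade_envelope(len(piece), fade_len=len(ostinato))
```

Pairing each note with its gain, for example with zip(piece, gains), gives a quiet first repetition, three repetitions at full volume and a quiet last one.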

I’ll try to write more about ambient music on my blog, because I know it’s one of the most popular musical styles, and I know a lot of people who want to play this kind of music but don’t know how to start.

Good Luck!

A very short introduction to Ambient Music

On my Persian blog, I have written a lot about operating systems, computer architecture and digital electronics. I have plans for the English blog, and I’ll write about computer science and engineering in the future, but I decided to share my experiences in music for now. In this article, I’m going to talk about ambient music and how it’s produced. Of course, this is not a music theory or music software tutorial.

Let’s talk about ambient music. What is ambient music? Ambient music is a minimalistic, modern and electronic genre, pioneered by Brian Eno in the 1970s. Ambient is actually a subgenre of electronic music, but after years of evolution, it has become known as an independent musical genre.

Characteristics

Ambient music is highly dependent on the environment. You can tell from the name. Actually, this genre is based on John Cage’s idea that “everything we do is music”, and his 4’33” is one of the best ambient pieces ever: four minutes and thirty-three seconds of silence, in which the composer asks you to listen to the ambient noises. It means you can record sounds and then make ambient music out of them. This is true, but not for every sound. A lot of ambient tracks are just recordings from nature with a simple melody played over those sounds. Others are electronic productions based on natural sounds and atmospheres. This means ambient is a kind of avant-garde music: you are free to do everything you want!

Styles

You know that every musical genre can be played in different styles. In this section, I explain some ambient music styles. I’m sure there are more, but these are my favorites:

  • Dark Ambient :
    This is one of the best-known styles of ambient music. Sometimes people think that dark ambient is a subgenre of ambient music, but it’s actually not, because it’s the same concept, just with a scary, depressing or dark atmosphere. It sounds like someone is playing their music in an abandoned and haunted place 😀
  • Space Ambient :
    This is another style. If you’re a fan of outer space, science fiction and movies like Star Wars or Star Trek, this is your kind of music. In this style, musicians use effects that can make you feel aliens are in your home! And this is what makes this style awesome!

There are more, but I usually listen to these styles, so I can explain these two better. For more information, you can find ambient musicians on YouTube, SoundCloud, Jamendo, etc., and ask them about their style!

Subgenres

And now, we are going to take a look at the subgenres of ambient music. These genres were created to show us how perfect minimal music can be!

  • Drone :
    My favorite subgenre of ambient music. Drone music is just a sustained chord, note or sound. Artists may also decorate the sustained sound with small melodies, or repeat a single melody over the drone. I’ll explain drone music in the future.
  • Lowercase :
    This is the most artistic form of ambient music. Artists record sounds from nature or daily activities, then amplify and edit them to make a melody. Lowercase music is one of the most minimalistic genres, and one of the most amazing ones, too!
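
To make the drone idea from the list above a bit more concrete, here is a rough sketch that builds the raw samples of a sustained chord from plain sine waves. The chord (D3, F3, A3), the sample rate and the function names are all assumptions of mine; a real drone track would use richer sounds and effects.

```python
import math


def midi_to_hz(note: int) -> float:
    """Convert a MIDI note number to a frequency (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


def drone(midi_notes, seconds=2.0, sample_rate=8000):
    """Return the samples of a sustained chord: one sine wave per note, averaged."""
    total = int(seconds * sample_rate)
    samples = []
    for i in range(total):
        t = i / sample_rate
        value = sum(math.sin(2 * math.pi * midi_to_hz(n) * t) for n in midi_notes)
        samples.append(value / len(midi_notes))  # keep the result within -1..1
    return samples


# D3, F3 and A3 as MIDI note numbers: a sustained D minor chord.
d_minor_drone = drone([50, 53, 57], seconds=1.0)
```

The list of samples could then be written to a WAV file or loaded into an audio editor, where a small melody can be played over it.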

In this article, we talked about ambient music and which kinds of music we can call ambient. In the future, I’ll explain more about making an ambient track and I’ll introduce my favorite ambient artists.

Good luck!

Hello world!

Hello World!

This is my first blog post in English. After years of blogging in my mother tongue, Persian, I decided to start writing in English. I think blogging in English is much better, because more people can read what I write, and more eyes will see my posts and work. I’ll start writing about my experiences here as soon as possible.