{"id":286,"date":"2023-12-28T00:06:32","date_gmt":"2023-12-27T20:36:32","guid":{"rendered":"https:\/\/haghiri75.com\/en\/?p=286"},"modified":"2023-12-28T00:06:32","modified_gmt":"2023-12-27T20:36:32","slug":"maral-is-here-7-billion-parameters-bilingual-model-with-support-of-persian","status":"publish","type":"post","link":"https:\/\/haghiri75.com\/en\/maral-is-here-7-billion-parameters-bilingual-model-with-support-of-persian\/","title":{"rendered":"Maral is here, 7 billion parameters bilingual model with support of Persian!"},"content":{"rendered":"<p>If you read my previous post, you know how much I like open source AI material, and I even jokingly titled my BLOOM post <a href=\"https:\/\/haghiri75.com\/en\/i-was-to-cheap-to-pay-10-a-month-for-copilot-so-i-made-my-own\/\">I was too cheap to pay for GitHub&#8217;s copilot<\/a>! So making an open source model was always one of my life goals. Also, in my Persian blog, I pointed out that the dominance of the English language in the current LLM scene is a bit concerning (<a href=\"https:\/\/haghiri75.com\/2023\/08\/15\/english-domination-on-ai-is-concerning\/\">read it here<\/a>).<\/p>\n<p>Today, I am pleased to announce that Maral is here. 
It is a 7-billion-parameter bilingual model that can respond to Persian and English prompts and produce GPT-3.5-level answers based on the dataset we fed it!<\/p>\n<p><img data-attachment-id=\"287\" data-permalink=\"https:\/\/haghiri75.com\/en\/maral-is-here-7-billion-parameters-bilingual-model-with-support-of-persian\/maral-featured-2\/\" data-orig-file=\"https:\/\/haghiri75.com\/en\/wp-content\/uploads\/maral-featured-2.png\" data-orig-size=\"1024,576\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"maral-featured-2\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/haghiri75.com\/en\/wp-content\/uploads\/maral-featured-2-300x169.png\" data-large-file=\"https:\/\/haghiri75.com\/en\/wp-content\/uploads\/maral-featured-2.png\" decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-287\" src=\"http:\/\/haghiri75.com\/en\/wp-content\/uploads\/maral-featured-2.png\" alt=\"\" width=\"1024\" height=\"576\" srcset=\"https:\/\/haghiri75.com\/en\/wp-content\/uploads\/maral-featured-2.png 1024w, https:\/\/haghiri75.com\/en\/wp-content\/uploads\/maral-featured-2-300x169.png 300w, https:\/\/haghiri75.com\/en\/wp-content\/uploads\/maral-featured-2-768x432.png 768w\" sizes=\"(max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/p>\n<h1>Maral 7B alpha 1 and its advantages<\/h1>\n<p>Since the release of the GPT-2 and BERT models, there have been efforts in our community to build a Persian <em>text generation<\/em> model. 
But to be honest, most of them were abandoned halfway.<\/p>\n<p>In the last year&#8217;s <em>AI revolution<\/em>, however, people saw the potential of generative AI and started working on models, from RAG on top of existing models to fine-tuning base models that could somehow understand the Perso-Arabic alphabet.<\/p>\n<p>But with the release of the <a href=\"https:\/\/huggingface.co\/mistralai\/Mistral-7B-v0.1\">Mistral<\/a> model, everything changed. I personally never thought a 7 billion parameter model could understand multiple languages this well. In the next section, I explain why Mistral became my number one choice as the base model!<\/p>\n<p>However, the biggest problem was still there: <em>the dataset<\/em>. Finding a good enough dataset is always a bottleneck. But we&#8217;ve been lucky enough that an Iranian developer has translated the <em>Alpaca dataset<\/em> to our beloved Persian language (it&#8217;s accessible <a href=\"https:\/\/huggingface.co\/datasets\/sinarashidi\/alpaca-persian\">here<\/a>).<\/p>\n<p>When you&#8217;re in possession of all the ingredients for your potion, it&#8217;s time to light up the cauldron and start brewing!<\/p>\n<h2>Why Mistral?<\/h2>\n<p>As a developer and an enthusiast, I always try new models and tools, especially when it comes to text. Mistral was the new kid on the block, and I saw a lot of positive reviews about it. So I tried the following:<\/p>\n<ul>\n<li>Loading the model and testing it on the ordinary English tasks it was made for.<\/li>\n<li>Testing the model on <em>more complicated tasks<\/em> such as reasoning or basic math.<\/li>\n<li>Testing the model on code generation.<\/li>\n<\/ul>\n<p>The model passed all of the above tests very well. You&#8217;d probably never expect a mid-sized model to perform well on all of those tasks, but this one was a little different. 
Although it was a bit confused on reasoning tasks, I could let that pass (since even GPT-4 has problems with reasoning).<\/p>\n<p>But I always run one more set of tests on these models, because I&#8217;m Iranian and I speak Persian\/Farsi, and I really want to know how a model performs in my language. So this is what I tested:<\/p>\n<ul>\n<li>Generic Persian text generation: the model generated nonsense, but it showed me the potential; I guessed it may have seen some Persian text before.<\/li>\n<li>Asking questions in Persian: it tried its best to put words together, but at some point it fell back to nonsense or even answered completely in English!<\/li>\n<li>Translation! Believe it or not, it can be a very good measure of a model&#8217;s multilinguality (okay, I made that term up, stay calm). Although the model was successful in English to French and Spanish (judged with my very limited knowledge), it didn&#8217;t perform well on Persian.<\/li>\n<\/ul>\n<p>Okay, the tests showed me the potential, so I had to team up with my colleague and make it happen! 
Let&#8217;s add support for our mother tongue to this model!<\/p>\n<p><img data-attachment-id=\"292\" data-permalink=\"https:\/\/haghiri75.com\/en\/maral-is-here-7-billion-parameters-bilingual-model-with-support-of-persian\/maral-featured\/\" data-orig-file=\"https:\/\/haghiri75.com\/en\/wp-content\/uploads\/maral-featured.png\" data-orig-size=\"1024,576\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"maral-featured\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/haghiri75.com\/en\/wp-content\/uploads\/maral-featured-300x169.png\" data-large-file=\"https:\/\/haghiri75.com\/en\/wp-content\/uploads\/maral-featured.png\" decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-292\" src=\"http:\/\/haghiri75.com\/en\/wp-content\/uploads\/maral-featured.png\" alt=\"\" width=\"1024\" height=\"576\" srcset=\"https:\/\/haghiri75.com\/en\/wp-content\/uploads\/maral-featured.png 1024w, https:\/\/haghiri75.com\/en\/wp-content\/uploads\/maral-featured-300x169.png 300w, https:\/\/haghiri75.com\/en\/wp-content\/uploads\/maral-featured-768x432.png 768w\" sizes=\"(max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/p>\n<h2>Train procedure and infrastructure<\/h2>\n<p>Now let&#8217;s talk about the fun stuff. 
First, we realized we would need very large and, at least for us, <em>unaffordable<\/em> infrastructure to train Mistral from scratch.<\/p>\n<p>So we did extensive research on the topic and found these methods:<\/p>\n<ul>\n<li>Retrieval-Augmented Generation (RAG)<\/li>\n<li>Quantized Low-Rank Adaptation (QLoRA) with Parameter-Efficient Fine-Tuning (PEFT)<\/li>\n<\/ul>\n<p>To be honest, RAG is cool, but it won&#8217;t lead to a new model. So we went with QLoRA and PEFT.<\/p>\n<p>The basic training (with extremely inaccurate results) was done on a T4 (Colab&#8217;s free tier), and then we decided to go further. So I went to our friends at Jupyto, an Iran-based company where you can rent GPUs by the hour.<\/p>\n<p>They had great offers on powerful GPUs, and we got our hands on a 3090 Ti with 64 GB of RAM. It was a perfect machine for the job, and we trained the better model on this setup.<\/p>\n<p>The QLoRA training took over 10 hours for 5 epochs (each epoch took more than 100 minutes), and the results were out of this world! The model could give us text that is semantically and grammatically correct!<\/p>\n<p>Then we merged the adapter into the base model to take advantage of the base model&#8217;s knowledge as well.<\/p>\n<p>That said, I personally ran into a set of problems, which I will point out in the next section.<\/p>\n<h2>The problems you may face using Maral<\/h2>\n<p>Since we&#8217;re in the alpha stage, I have to admit you may face these problems while using Maral, especially with Persian.<\/p>\n<ul>\n<li>The prompt format is based on the <em>Guanaco<\/em> format, so it doesn&#8217;t have tokens for the start and end of sequences.<\/li>\n<li>The tokenizer is not optimized for Persian letters yet, which may make the model slow on Persian text.<\/li>\n<li>The model is really good at hallucinating.<\/li>\n<li>As a result of the previous item, it also easily produces misinformation. 
So please be careful with the answers you get from the model.<\/li>\n<li>The model likes to repeat itself a lot, so if you get a repetitive answer, do not worry.<\/li>\n<li>Being so large, the model is a little hard to deploy on consumer hardware. However, on the HuggingFace page, we&#8217;ve provided 8-bit loading instructions as well.<\/li>\n<\/ul>\n<h2>Further work<\/h2>\n<ul>\n<li>Optimizing the tokenizer for the Perso-Arabic alphabet.<\/li>\n<li>Providing a better dataset.<\/li>\n<li>Adding bos_token and eos_token to the tokenizer, especially for an instruction-following\/chat model.<\/li>\n<li>Providing GPTQ, GGUF or GGML models to make it more affordable on consumer hardware.<\/li>\n<li>Making much smaller models (say 1B or 2B) with a more focused niche.<\/li>\n<\/ul>\n<h2>Related links<\/h2>\n<ul>\n<li><a href=\"https:\/\/huggingface.co\/MaralGPT\/Maral-7B-alpha-1\">Model on HuggingFace<\/a><\/li>\n<li><a href=\"https:\/\/haghiri75.com\/2023\/12\/26\/%d9%85%d8%a7%d8%b1%d8%a7%d9%84-%d8%a7%db%8c%d9%86%d8%ac%d8%a7%d8%b3%d8%aa%d8%8c-%d9%85%d8%af%d9%84-%db%b7-%d9%85%db%8c%d9%84%db%8c%d8%a7%d8%b1%d8%af-%d9%be%d8%a7%d8%b1%d8%a7%d9%85%d8%aa%d8%b1%db%8c\/\">Persian blog post on the model<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>If you read my previous post, you know how much I like open source AI material, and I even jokingly titled my BLOOM post I was too cheap to pay for GitHub&#8217;s copilot! So making an open source model was always one of my life goals. 
Also, in my Persian blog, I pointed out &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/haghiri75.com\/en\/maral-is-here-7-billion-parameters-bilingual-model-with-support-of-persian\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Maral is here, 7 billion parameters bilingual model with support of Persian!&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_newsletter_tier_id":0,"footnotes":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false}}},"categories":[2],"tags":[23,37,11,22,45,24,25],"jetpack_publicize_connections":[],"aioseo_notices":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p8BkKn-4C","jetpack-related-posts":[{"id":399,"url":"https:\/\/haghiri75.com\/en\/you-only-need-python-to-make-ai-agents\/","url_meta":{"origin":286,"position":0},"title":"You only need Python to make AI agents.","author":"prp-e","date":"December 31, 2024","format":false,"excerpt":"In 2022, ChatGPT released and LLMs becoming the hot topic of pretty much every technology related press, event, YouTube video, etc. It was like finding the secret ingredient to a potion which can make you immortal. But Meta didn't let OpenAI becoming the one and only. 
They also started the\u2026","rel":"","context":"In &quot;Projects&quot;","block_context":{"text":"Projects","link":"https:\/\/haghiri75.com\/en\/category\/projects\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/mann-e-images.storage.c2.liara.space\/319996af-289a-4617-8d0c-6580e4793747.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/mann-e-images.storage.c2.liara.space\/319996af-289a-4617-8d0c-6580e4793747.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/mann-e-images.storage.c2.liara.space\/319996af-289a-4617-8d0c-6580e4793747.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/mann-e-images.storage.c2.liara.space\/319996af-289a-4617-8d0c-6580e4793747.png?resize=700%2C400&ssl=1 2x, https:\/\/i0.wp.com\/mann-e-images.storage.c2.liara.space\/319996af-289a-4617-8d0c-6580e4793747.png?resize=1050%2C600&ssl=1 3x"},"classes":[]},{"id":300,"url":"https:\/\/haghiri75.com\/en\/nucleus-is-the-proof-that-small-is-the-new-big\/","url_meta":{"origin":286,"position":1},"title":"Nucleus is the proof that &#8220;Small is the new Big&#8221;","author":"prp-e","date":"January 13, 2024","format":false,"excerpt":"No matter what you heard, size matters. Specially in the world of AI models, having a smaller and more affordable model is the key to win the competition. 
This is why Microsoft even invested time, GPU and money on Phi project, which is a Small Language Model or SLM for\u2026","rel":"","context":"In &quot;Computer Architecture &amp; Programming&quot;","block_context":{"text":"Computer Architecture &amp; Programming","link":"https:\/\/haghiri75.com\/en\/category\/computer\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/haghiri75.com\/en\/wp-content\/uploads\/CEO.png?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/haghiri75.com\/en\/wp-content\/uploads\/CEO.png?resize=350%2C200 1x, https:\/\/i0.wp.com\/haghiri75.com\/en\/wp-content\/uploads\/CEO.png?resize=525%2C300 1.5x, https:\/\/i0.wp.com\/haghiri75.com\/en\/wp-content\/uploads\/CEO.png?resize=700%2C400 2x"},"classes":[]},{"id":318,"url":"https:\/\/haghiri75.com\/en\/frontbricks-my-llm-based-weekend-project-which-is-inspired-by-vercels-v0\/","url_meta":{"origin":286,"position":2},"title":"FrontBricks, my LLM-based weekend project which is inspired by Vercel&#8217;s V0","author":"prp-e","date":"May 31, 2024","format":false,"excerpt":"Since 2022, there is a hype of\u00a0generative artificial intelligence\u00a0and it resulted in a bunch of cool projects. Although a lot of us may remember that Github's copilot was much older. 
Those days, I wrote an article about how I was too cheap to pay $10 a month for copilot, so\u2026","rel":"","context":"In &quot;Computer Architecture &amp; Programming&quot;","block_context":{"text":"Computer Architecture &amp; Programming","link":"https:\/\/haghiri75.com\/en\/category\/computer\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/haghiri75.com\/en\/wp-content\/uploads\/frontbricks.png?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/haghiri75.com\/en\/wp-content\/uploads\/frontbricks.png?resize=350%2C200 1x, https:\/\/i0.wp.com\/haghiri75.com\/en\/wp-content\/uploads\/frontbricks.png?resize=525%2C300 1.5x, https:\/\/i0.wp.com\/haghiri75.com\/en\/wp-content\/uploads\/frontbricks.png?resize=700%2C400 2x, https:\/\/i0.wp.com\/haghiri75.com\/en\/wp-content\/uploads\/frontbricks.png?resize=1050%2C600 3x"},"classes":[]},{"id":161,"url":"https:\/\/haghiri75.com\/en\/analyzing-components-of-an-electric-circuit-with-yolov5\/","url_meta":{"origin":286,"position":3},"title":"Analyzing components of an electric circuit with YOLOv5","author":"prp-e","date":"January 14, 2022","format":false,"excerpt":"In past recent weeks, I did a lot with YOLOv5. A few weeks prior to this article, I wrote an article on why I love YOLOv5 and later, I did a project with YOLOv5 which was somehow a try for making something like symbolab or similar software. 
I explained that\u2026","rel":"","context":"In &quot;Computer Architecture &amp; Programming&quot;","block_context":{"text":"Computer Architecture &amp; Programming","link":"https:\/\/haghiri75.com\/en\/category\/computer\/"},"img":{"alt_text":"Electric Circuit component analysis using YOLOv5","src":"https:\/\/i0.wp.com\/haghiri75.com\/en\/wp-content\/uploads\/Screen-Shot-2022-01-13-at-1.05.41-AM-1024x583.png?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/haghiri75.com\/en\/wp-content\/uploads\/Screen-Shot-2022-01-13-at-1.05.41-AM-1024x583.png?resize=350%2C200 1x, https:\/\/i0.wp.com\/haghiri75.com\/en\/wp-content\/uploads\/Screen-Shot-2022-01-13-at-1.05.41-AM-1024x583.png?resize=525%2C300 1.5x"},"classes":[]},{"id":1,"url":"https:\/\/haghiri75.com\/en\/hello-world\/","url_meta":{"origin":286,"position":4},"title":"Hello world!","author":"prp-e","date":"April 1, 2017","format":false,"excerpt":"Hello World! This is my first blog post in English. After years of blogging in my mother tongue, Persian, I decided to start writing in English. I think blogging in English is much better, because more people can read what I write, and also more eyes will see my posts\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":192,"url":"https:\/\/haghiri75.com\/en\/i-was-to-cheap-to-pay-10-a-month-for-copilot-so-i-made-my-own\/","url_meta":{"origin":286,"position":5},"title":"I was too cheap to pay $10 a month for copilot, so I made my own","author":"prp-e","date":"September 4, 2022","format":false,"excerpt":"In mid 2021, there was a revolution in coding. As a lazy programmer who always needed a fast and smart assistant, I was really happy to have Github Copilot in my arsenal of coding tools. 
So I was one of the early adapters of the whole idea of AI pair\u2026","rel":"","context":"In &quot;Computer Architecture &amp; Programming&quot;","block_context":{"text":"Computer Architecture &amp; Programming","link":"https:\/\/haghiri75.com\/en\/category\/computer\/"},"img":{"alt_text":"My own copilot!","src":"https:\/\/i0.wp.com\/haghiri75.com\/en\/wp-content\/uploads\/Screen-Shot-2022-09-03-at-10.46.46-PM.png?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/haghiri75.com\/en\/wp-content\/uploads\/Screen-Shot-2022-09-03-at-10.46.46-PM.png?resize=350%2C200 1x, https:\/\/i0.wp.com\/haghiri75.com\/en\/wp-content\/uploads\/Screen-Shot-2022-09-03-at-10.46.46-PM.png?resize=525%2C300 1.5x"},"classes":[]}],"_links":{"self":[{"href":"https:\/\/haghiri75.com\/en\/wp-json\/wp\/v2\/posts\/286"}],"collection":[{"href":"https:\/\/haghiri75.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/haghiri75.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/haghiri75.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/haghiri75.com\/en\/wp-json\/wp\/v2\/comments?post=286"}],"version-history":[{"count":10,"href":"https:\/\/haghiri75.com\/en\/wp-json\/wp\/v2\/posts\/286\/revisions"}],"predecessor-version":[{"id":298,"href":"https:\/\/haghiri75.com\/en\/wp-json\/wp\/v2\/posts\/286\/revisions\/298"}],"wp:attachment":[{"href":"https:\/\/haghiri75.com\/en\/wp-json\/wp\/v2\/media?parent=286"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/haghiri75.com\/en\/wp-json\/wp\/v2\/categories?post=286"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/haghiri75.com\/en\/wp-json\/wp\/v2\/tags?post=286"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}