- Allam to launch on Microsoft and IBM
- Chatbot understands dialects
- Uses the ‘largest Arabic dataset’
Saudi Arabia’s Arabic-language artificial intelligence (AI) chat model, Allam, is now the largest Arabic-language chatbot in the world and will be available on Microsoft and IBM platforms, officials announced this week.
Allam is a “large language model”, or LLM, that was first launched by a subsidiary of the Saudi Data and Artificial Intelligence Authority (SDAIA) in 2023.
Now the SDAIA is about to roll out the model’s most advanced versions.
There is a growing number of Arabic chatbots following the model of OpenAI’s ChatGPT, which revolutionised AI technology when an advanced version was launched two years ago.
Allam allows users to speak to it in various Saudi dialects and receive a response in formal Arabic, though it can operate in English too.
“We will target government entities who will pay for customised versions and in the next year there will be one for the public that will be free,” said Raedah Almarzooq, an AI engineer at Allam.
“What’s different is that it thinks in Arabic, it doesn’t just translate Arabic to English and then English back to Arabic,” she said, referring to current US-developed LLMs.
This year, Google launched the first Arabic version of Gemini Advanced, the premium version of the search company’s chat-based AI product, which is available to subscribers. Google says the basic version can handle input from 16 Arabic dialects.
Allam’s availability on Microsoft’s Azure and IBM’s watsonx generative AI platforms was formally announced at an AI forum in Riyadh this week. IBM also said it was opening a regional headquarters in Riyadh with an AI training centre costing $200 million.
Allam’s major regional competitor is Jais, which was developed by a unit of the Abu Dhabi-owned company G42 in collaboration with the Mohamed bin Zayed University of Artificial Intelligence and the Silicon Valley-based company Cerebras Systems.
Jais said in 2022 that its system had been trained on 116 billion Arabic “tokens” – the term for words, character sets, or combinations of words that the system is trained on – and 279 billion English tokens.
Allam says it exceeds this, in what it describes as a major national project in collaboration with the tech giant Nvidia to provide an advanced chatbot for 450 million Arabic speakers.
Esam AlWagait, head of the SDAIA’s National Information Centre, which oversaw the project, said: “We compiled the largest Arabic dataset, containing more than 500 billion Arabic tokens.
“We know that LLMs require specialised hardware, and a lot of that hardware, so we did whatever it took to secure the infrastructure to train the model.
“We engaged more than 400 subject matter experts to generate more than one million prompts that we used to fine-tune the LLM system into many areas, from economy, to religion, to history, to Arabic language, and so on.”
A Saudi government status report published this week by the SDAIA said the kingdom “is rapidly positioning itself as a global leader in artificial intelligence” and is aiming for AI to account for 12 percent of economic growth by 2030.