From Swahili to Zulu, African Techies Develop AI Language Tools

Figurines with computers and smartphones are seen in front of the words "Artificial Intelligence AI" in this illustration taken, February 19, 2024. (Reuters)

When the Nigerian government announced plans in April to develop a multilingual AI tool to boost digital inclusion across the West African nation, 28-year-old computer science student Lwasinam Lenham Dilli was thrilled.

Dilli had struggled to scrape datasets from the internet to build a large language model (LLM), used to power AI chatbots, in his native Hausa language as part of his final-year project at university.

"I needed texts in English and their corresponding translation in Hausa but I couldn't get anything online, (there was) no clean data," Dilli told the Thomson Reuters Foundation.

"(Creating local language LLMs) is a way to ensure that our local dialects and languages will not be forgotten or left out of the AI ecosystem," he added.

The world has been swept up in a whirlwind of AI mania, with tools such as OpenAI's ChatGPT and Meta's Llama 2, along with models from Mistral AI, captivating millions globally with their ability to generate human-like text.

But for many tech-savvy Africans, the excitement has been tempered by a frustrating reality: when languages like Hausa, Amharic, or Kinyarwanda are entered into the chat, many of these advanced systems falter, often producing nonsensical responses.

Technology experts warn the lack of LLMs in African languages will lead to the exclusion of millions of people on the continent, widening both the digital and economic divide.

The Nigerian government-led initiative to develop a multilingual LLM aims to level the playing field.

"The LLM will be trained on five low-resource languages and accented English to ensure stronger language representation ... for development of artificial intelligence solutions," said Nigeria's Digital Economy Minister Bosun Tijani in April.

The government will partner with Nigerian AI startups, and local data will be collected by volunteers fluent in any of five Nigerian languages: Yoruba, Hausa, Igbo, Ibibio and the West African lingua franca, Pidgin.

To build the model, the project will also draw on the expertise of more than 7,000 fellows from Nigeria's tech talent program - a government scheme to train three million people in technical skills such as programming.

Silas Adekunle, co-founder of Awarri, an AI startup that is part of the initiative, said building a nuanced AI tool that understood Nigeria's unique language and cultural landscape presented many challenges.

"We have so many different accents and languages, and this (LLM) will enable many people and developers to build products that leverage AI but are for the Nigerian market," said Adekunle.

"The scale of the project, especially with limited resources, has required us to be creative in how we train the model, gather the data, compute and label what we have."

CLOSING THE AI LANGUAGE GAP

Africa is home to more than 2,000 languages spoken across 54 countries, according to the United Nations Educational, Scientific and Cultural Organization (UNESCO).

However, the majority of African languages remain underrepresented on the internet. English dominates the digital space, accounting for around 50% of all websites, followed by Spanish, German, Japanese, and French.

Alongside the Nigerian government initiative, a small but growing number of African startups are rising to the challenge of developing AI tools in languages like Swahili, Amharic, Zulu and Sesotho.

In Kenya, for instance, health tech firm Jacaranda Health has pioneered the first LLM operating in Swahili to improve maternal healthcare in East Africa.

Built on Meta's Llama 3 system, UlizaLlama (AskLlama) aims to refine Jacaranda Health's SMS service for low-income Swahili-speaking expectant mothers who have queries ranging from dietary concerns and fetal movement to exercise during pregnancy.

The platform currently provides pre-written automated responses, but once UlizaLlama is integrated by the end of June, it will tailor responses to individual needs, offering more detailed pregnancy guidance and emergency support.

"A lot of these expectant moms can't just do a Google search. UlizaLlama's goal is to make sure that we get them the accurate answers in the fastest possible time," Jay Patel, Jacaranda Health's director of technology, told the Thomson Reuters Foundation.

"We're shooting for about 85% accuracy to start with and a faster response time. At the moment, it takes a few minutes to respond, but we are hoping to get that down to less than a minute in the future."

In South Africa, the Masakhane initiative is using open-source machine learning to translate African languages.

Lelapa AI, a South African AI research lab, has pioneered VulaVula - a for-profit language processing tool that translates, transcribes and analyses English, Afrikaans, Zulu and Sesotho.

DATA SCARCITY, ETHICAL CONCERNS

But AI experts say building LLMs in African languages poses significant challenges, ranging from availability of data to ethical concerns over consent, compensation and copyright.

Many African languages are low-resource languages, meaning there is a scarcity of data to train these models effectively - unlike high-resource languages such as English or French.

Michael Michie, co-founder of Everse Technology Africa, an AI startup building intelligence into data protection and privacy, said collecting the data needed to train LLMs also raised ethical questions.

In many African communities, oral tradition predominates, and some communities may not want to share their language to train LLMs - a choice that should be respected, he said.

"There are currently no regulations or laws in African countries that address issues related to consent, privacy and compensation to communities when collecting data to train AI tools - this needs to be addressed," said Michie.

"There are questions of who owns the language and who benefits. There needs to be guidelines to prevent exploitation and ensure the development of these LLMs benefits the people they are meant to serve," he added.

Open licensing frameworks like Creative Commons, which let creators legally share their work under specified conditions such as attribution or non-commercial use, are also not a perfect solution, said some AI experts.

"At the moment there's this push of saying everything should just be under Creative Commons," said Vukosi Marivate, associate professor of computer science at the University of Pretoria and co-founder of Lelapa AI.

But if everything is open source, it may be harder to properly reimburse and acknowledge the original contributors to these language models, he said.

"A lot of people are working on LLMs now because of the prestige, that's where the money is, but we need to make sure that our languages are actually being taken care of."



Major Publishers Sue Meta for Copyright Infringement Over AI Training

Cars drive past a sign of Meta, the new name for the company formerly known as Facebook, at its headquarters in Menlo Park, California, US, October 28, 2021. (Reuters)

Publishers Elsevier, Cengage, Hachette, Macmillan and McGraw Hill sued Meta Platforms in Manhattan federal court on Tuesday, alleging that the tech giant misused their books and journal articles to train its artificial intelligence model Llama.

The publishers, as well as author Scott Turow, alleged in the proposed class action complaint that Meta pirated millions of their works and used them without permission to train its large language models to respond to human prompts.

"AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use," a Meta spokesperson said in a statement on Tuesday.

"We will fight this lawsuit aggressively."

The publishers allege that Meta pirated works ranging from textbooks to scientific articles to novels including "The Fifth Season" by N.K. Jemisin and "The Wild Robot" by Peter Brown for its AI training.

They asked the court for permission to represent a larger class of copyright owners, and for an unspecified amount in monetary damages.

"Meta's mass-scale infringement isn't public progress, and AI will never be properly realized if tech companies prioritize pirate sites over scholarship and imagination," Maria Pallante, president of the Association of American Publishers, said in a statement.

The lawsuit opens a new front in the ongoing copyright battle between creators and tech companies over AI training, in which dozens of authors, news outlets, visual artists and other plaintiffs have sued companies including Meta, OpenAI and Anthropic for infringement.

All of the pending cases will likely revolve around whether AI systems make fair use of copyrighted material by using it to create new, transformative content.

The first two judges to consider the matter issued diverging rulings last year.

Amazon- and Google-backed Anthropic was the first major AI company to settle one of the cases, agreeing last year to pay a group of authors $1.5 billion to resolve a class-action lawsuit that could have cost the company billions more in damages for alleged piracy.


Microsoft, Google and xAI to Give US Govt Early Access to AI Models for Security Checks

A Google logo is seen at a company research facility in Mountain View, California, US, May 13, 2025. (Reuters)

Microsoft, Google and Elon Musk’s xAI agreed to give the US government early access to new artificial intelligence models for national security testing, as US officials grow alarmed by the hacking capabilities of Anthropic’s newly unveiled Mythos.

The Center for AI Standards and Innovation at the Department of Commerce said on Tuesday that the agreement would allow it to evaluate the models before deployment and conduct research to assess their capabilities and security risks.

The agreement fulfills a pledge the Trump administration made in July 2025 to partner with technology companies to vet their AI models for "national security risks."

Microsoft will work with US government scientists to test AI systems "in ways that probe unexpected behaviors," the company said in a statement, adding that the two sides will develop shared datasets and workflows for testing its models. Microsoft signed a similar agreement with the UK's AI Security Institute, according to the statement.

Concern is growing in Washington over the national security risks posed by powerful AI systems. By securing early access to frontier models, US officials are aiming to identify threats ranging from cyberattacks to military misuse before the tools are widely deployed.

The development of advanced AI systems including Anthropic's Mythos has in recent weeks created a stir globally, including among US officials and corporate America, over their ability to supercharge hackers.

"Independent, rigorous measurement science is essential to understanding frontier AI and its national security implications," CAISI Director Chris Fall said in a statement.

The move builds on previous agreements with OpenAI and Anthropic, established in 2024 under the Biden administration when CAISI was known as the US Artificial Intelligence Safety Institute.

Under former President Joe Biden, the institute focused on developing AI tests, definitions and voluntary safety standards. It was led by Biden tech adviser Elizabeth Kelly, who has since joined Anthropic, according to her LinkedIn profile.

CAISI, which serves as the government's main hub for AI model testing, said it had already completed more than 40 evaluations, including on cutting-edge models not yet available to the public.

Developers frequently hand over versions of their models with safety guardrails stripped back so the center can probe for national security risks, the agency said.

xAI did not immediately respond to a request for comment. Google declined to comment.

Last week, the Pentagon said it had reached agreements with seven AI companies to deploy their advanced capabilities on the Defense Department's classified networks as it seeks to broaden the range of AI providers working across the military.

The Pentagon announcement did not include Anthropic, which has been embroiled in a dispute with the Pentagon over guardrails on the military's use of its AI tools.


Samsung Electronics Appoints New TV Chief amid Mounting Competition

FILE PHOTO: The logo of Samsung Electronics is seen at the company's store in Seoul, South Korea, April 15, 2025. REUTERS/Kim Hong-Ji/File Photo

Samsung Electronics, the world's No. 1 TV maker, has replaced its TV head for the first time in more than two years, as it faces mounting competition from Chinese rivals at home and abroad.

Samsung said in a statement on Monday that it has appointed Lee Won-jin, who was previously head of the Global Marketing Office, as the new head of its Visual Display Business, succeeding Yong Seok-woo, who will serve as an adviser.

Samsung usually carries out its annual management reshuffle around December, and the company did not disclose the reason for the replacement.

A Samsung Electronics official told Reuters the new leader is expected to bring a fresh perspective and the change needed for the TV business, which is facing intensifying market competition.

In March, China's TCL Electronics and Japan's Sony signed binding agreements for a strategic partnership in the home entertainment field, increasing pressure on rivals.

The Nikkei newspaper previously reported Samsung was considering discontinuing sales of home appliances and TVs in China within this year in the face of competition from Chinese companies that have undercut rivals.

Samsung said last month its TV profit declined in the first quarter because of stagnating demand and rising raw-material costs.

Lee had previously worked at Google before moving to Samsung in 2014.