From Swahili to Zulu, African Techies Develop AI Language Tools

Figurines with computers and smartphones are seen in front of the words "Artificial Intelligence AI" in this illustration taken, February 19, 2024. (Reuters)

When the Nigerian government announced plans in April to develop a multilingual AI tool to boost digital inclusion across the West African nation, 28-year-old computer science student Lwasinam Lenham Dilli was thrilled.

Dilli had struggled to scrape datasets from the internet to build a large language model (LLM), used to power AI chatbots, in his native Hausa language as part of his final-year project at university.

"I needed texts in English and their corresponding translation in Hausa but I couldn't get anything online, (there was) no clean data," Dilli told the Thomson Reuters Foundation.

"(Creating local language LLMs) is a way to ensure that our local dialects and languages will not be forgotten or left out of the AI ecosystem," he added.

The world has been swept up in a whirlwind of AI mania, with tools such as OpenAI's ChatGPT, Meta's Llama 2, and Mistral AI captivating millions globally with their ability to generate human-like text.

But for many tech-savvy Africans, the excitement has been tempered by a frustrating reality: when languages like Hausa, Amharic, or Kinyarwanda are entered into the chat, many of these advanced systems falter, often producing nonsensical responses.

Technology experts warn the lack of LLMs in African languages will lead to the exclusion of millions of people on the continent, increasing both the digital and economic divide.

The Nigerian government-led initiative to develop a multilingual LLM aims to level the playing field.

"The LLM will be trained on five low-resource languages and accented English to ensure stronger language representation ... for development of artificial intelligence solutions," said Nigeria's Digital Economy Minister Bosun Tijani in April.

The government will partner with Nigerian AI startups, and local data will be collected by volunteers who are fluent in any of five Nigerian languages: Yoruba, Hausa, Igbo, Ibibio, and Pidgin, a West African lingua franca.

To build the model, the project will also draw on the expertise of more than 7,000 fellows from Nigeria's tech talent program - a government scheme to train three million people in skills such as coding and programming.

Silas Adekunle, co-founder of Awarri, an AI startup that is part of the initiative, said building a nuanced AI tool that understood Nigeria's unique language and cultural landscape presented many challenges.

"We have so many different accents and languages, and this (LLM) will enable many people and developers to build products that leverage AI but are for the Nigerian market," said Adekunle.

"The scale of the project, especially with limited resources, has required us to be creative in how we train the model, gather the data, compute and label what we have."

CLOSING THE AI LANGUAGE GAP

Africa is home to more than 2,000 languages spoken across 54 countries, according to the United Nations Educational, Scientific and Cultural Organization (UNESCO).

However, the majority of African languages remain underrepresented on the internet. English dominates the digital space, accounting for around 50% of all websites, followed by Spanish, German, Japanese, and French.

Along with the Nigerian government initiative, there is also a small but growing number of African startups rising to the challenge of developing AI tools in languages like Swahili, Amharic, Zulu and Sesotho.

In Kenya, for instance, health tech firm Jacaranda Health has pioneered the first LLM operating in Swahili to improve maternal healthcare in East Africa.

Built on Meta's Llama 3 system, UlizaLlama (AskLlama) aims to refine Jacaranda Health's SMS service for low-income Swahili-speaking expectant mothers who have queries ranging from dietary concerns and fetal movement to exercise during pregnancy.

The platform currently provides pre-written automated responses, but once UlizaLlama is integrated by the end of June, it will tailor responses to individual needs, offering more detailed pregnancy guidance and emergency support.

"A lot of these expectant moms can't just do a Google search. UlizaLlama's goal is to make sure that we get them the accurate answers in the fastest possible time," Jay Patel, Jacaranda Health's director of technology, told the Thomson Reuters Foundation.

"We're shooting for about 85% accuracy to start with and a faster response time. At the moment, it takes a few minutes to respond, but we are hoping to get that down to less than a minute in the future."

In South Africa, the Masakhane initiative is using open-source machine learning to translate African languages.

Lelapa AI, a South African AI research lab, has pioneered VulaVula, a for-profit language processing tool that translates, transcribes and analyses text and speech in English, Afrikaans, Zulu and Sesotho.

DATA SCARCITY, ETHICAL CONCERNS

But AI experts say building LLMs in African languages poses significant challenges, ranging from availability of data to ethical concerns over consent, compensation and copyright.

Many African languages are low-resource languages, meaning there is a scarcity of data to train these models effectively - unlike high-resource languages such as English or French.

Michael Michie, co-founder of Everse Technology Africa, an AI startup building intelligence into data protection and privacy, said collecting the data needed to train LLMs also raised ethical questions.

In many African communities, oral tradition predominates, and certain communities may not be interested in sharing their language to train LLMs, a choice that should be respected.

"There are currently no regulations or laws in African countries that address issues related to consent, privacy and compensation to communities when collecting data to train AI tools - this needs to be addressed," said Michie.

"There are questions of who owns the language and who benefits. There needs to be guidelines to prevent exploitation and ensure the development of these LLMs benefits the people they are meant to serve," he added.

Open-source initiatives like Creative Commons, which allow creators to legally share their work with specified conditions like ensuring attribution and non-commercial use, are also not a perfect solution, said some AI experts.

"At the moment there's this push of saying everything should just be under Creative Commons," said Vukosi Marivate, associate professor of computer science at the University of Pretoria and co-founder of Lelapa AI.

But if everything is open source, it may be harder to properly reimburse and acknowledge the original contributors to these language models, he said.

"A lot of people are working on LLMs now because of the prestige, that's where the money is, but we need to make sure that our languages are actually being taken care of."



As AI Gains a Workplace Foothold, States are Trying to Make Sure Workers Don't Get Left Behind

Figurines with computers and smartphones are seen in front of the words "Artificial Intelligence AI" in this illustration taken, February 19, 2024. (Reuters)

With many jobs expected to eventually rely on generative artificial intelligence, states are trying to help workers beef up their tech skills before those skills become outdated and the workers get outfoxed by increasingly smart machines.
Connecticut is working to create what proponents believe will be the country's first Citizens AI Academy, a free online repository of curated classes that users can take to learn basic skills or obtain a certificate needed for employment, The Associated Press said.
“This is a rapidly evolving area,” said state Democratic Sen. James Maroney. “So we need to all learn what are the best sources for staying current. How can we update our skills? Who can be trusted sources?”
Determining what skills are necessary in an AI world can be a challenge for state legislators given the fast-moving nature of the technology and differing opinions about what approach is best.
Gregory LaBlanc, professor of Finance, Strategy and Law at the Haas School of Business at Berkeley Law School in California, says workers should be taught how to use and manage generative AI rather than how the technology works, partly because computers will soon be better able to perform certain tasks previously performed by humans.
“What we need is to lean into things that complement AI as opposed to learning to be really bad imitators of AI,” he said. “We need to figure out what is AI not good at and then teach those things. And those things are generally things like creativity, empathy, high level problem solving.”
He said that, historically, people have not needed to understand how a technological advancement works in order to succeed.
“When electricity came along, we didn’t tell everybody that they needed to become electrical engineers,” LaBlanc said.
This year, at least four states (Connecticut, California, Mississippi and Maryland) proposed legislation that attempted to deal with AI in the classroom. The proposals ranged from Connecticut's planned AI Academy, which was originally included in a wide-ranging AI regulation bill that failed, although state education officials are still developing the concept, to proposed working groups that would examine how AI can be incorporated safely in public schools. Such a bill died in the Mississippi legislature while the others remain in flux.
One bill in California would require a state working group to consider incorporating AI literacy skills into math, science, history and social science curriculums.
“AI has the potential to positively impact the way we live, but only if we know how to use it, and use it responsibly,” said the bill's author, Assemblymember Marc Berman, in a statement. “No matter their future profession, we must ensure that all students understand basic AI principles and applications, that they have the skills to recognize when AI is employed, and are aware of AI’s implications, limitations, and ethical considerations.”
The bill is backed by the California Chamber of Commerce. CalChamber Policy Advocate Ronak Daylami said in a statement that incorporating information into existing school curricula will “dispel the stigma and mystique of the technology, not only helping students become more discerning and intentional users and consumers of AI, but also better positioning future generations of workers to succeed in an AI-driven workforce and hopefully inspiring the next generation of computer scientists.”
While Connecticut's planned AI Academy is expected to offer certificates to people who complete certain skills programs that might be needed for careers, Maroney said the academy will also include the basics, from digital literacy to how to pose questions to a chatbot.
He said it's important for people to have the skills to understand, evaluate and effectively interact with AI technologies, whether it’s a chatbot or machines that learn to identify problems and make decisions that mimic human decision-making.
“Most jobs are going to require some form of literacy,” Maroney said. “I think that if you aren’t learning how to use it, you’ll be at a disadvantage.”
A September 2023 study released by the job-search company Indeed found all US jobs listed on the platform had skills that could be performed or augmented by generative AI. Nearly 20% of the jobs were considered “highly exposed,” which means the technology is considered good or excellent at 80% or more of the skills that were mentioned in the Indeed job listings.
Nearly 46% of the jobs on the platform were “moderately exposed,” which means generative AI can perform 50% to 80% of the skills.
Maroney said he is concerned about how that skills gap, coupled with a lack of access to high-speed internet, computers and smartphones in some underserved communities, will exacerbate the inequity problem.
A report released in February from McKinsey and Company, a global management consulting firm, projected that generative AI could increase household wealth in the US by nearly $500 billion by 2045, but it would also increase the wealth gap between Black and white households by $43 billion annually.
Advocates have been working for years to narrow the nation’s digital skills gap, often focusing on the basics of computer literacy and improving access to reliable internet and devices, especially for people living in urban and rural areas. The advent of AI brings additional challenges to that task, said Marvin Venay, chief external affairs and advocacy officer for the Massachusetts-based organization Bring Tech Home.
“Education must be included in order for this to really take off publicly ... in a manner which is going to give people the ability to eliminate their barriers,” he said of AI. “And it has to be able to explain to the most common individual why it is not only a useful tool, but why this tool will be something that can be trusted.”
Tesha Tramontano-Kelly, executive director of the Connecticut-based group CfAL for Digital Inclusion, said she worries lawmakers are “putting the cart before the horse” when it comes to talking about AI training. Ninety percent of the youths and adults who use her organization's free digital literacy classes don't have a computer in the home.
While Connecticut is considered technologically advanced compared to many other states and nearly every household can get internet service, a recent state digital equity study found only about three-quarters subscribe to broadband. A survey conducted as part of the study found 47% of respondents find it somewhat or very difficult to afford internet service.
Of residents who reported household income at or below 150% of the federal poverty level, 32% don't own a computer and 13% don't own any internet-enabled device.
Tramontano-Kelly said ensuring the internet is accessible and technology equipment is affordable are important first steps.
“So teaching people about AI is super important. I 100% agree with this,” she said. “But the conversation also needs to be about everything else that goes along with AI.”