
Somali-language AI and Innovation Lab — Pioneering the digital frontier for Somali language through cutting-edge AI research and innovation.

Jamhuriya University of Science and Technology
Mogadishu, Somalia
sail@just.edu.so
+252 - 61- 2223999


AI
completed

AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages

March 8, 2026
SAIL Team

Abstract

Despite the recent progress on scaling multilingual machine translation (MT) to several under-resourced African languages, accurately measuring this progress remains challenging, since evaluation is often performed on n-gram matching metrics such as BLEU, which typically show a weaker correlation with human judgments. Learned metrics such as COMET have higher correlation; however, the lack of evaluation data with human ratings for under-resourced languages, complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with simplified MQM guidelines for error detection and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET: COMET evaluation metrics for African languages by leveraging DA data from well-resourced languages and an African-centric multilingual encoder (AfroXLM-R) to create the state-of-the-art MT evaluation metrics for African languages with respect to Spearman-rank correlation with human judgments (0.441).
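The abstract reports metric quality as a Spearman-rank correlation with human judgments. As a minimal sketch of how such a figure is computed, the example below ranks a set of human direct assessment (DA) scores and a set of automatic metric scores, then takes the Pearson correlation of the ranks. The function names and all numbers are hypothetical and chosen only for illustration; this is not the evaluation code or data from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


# Made-up scores for five translations (illustrative only).
human_da = [78.0, 35.0, 90.0, 52.0, 64.0]      # human DA ratings
metric_scores = [0.71, 0.40, 0.69, 0.55, 0.48]  # automatic metric outputs

rho = spearman(human_da, metric_scores)
print(f"Spearman correlation: {rho:.3f}")
```

A value near 1 means the metric orders translations much as the human raters do, while a value near 0 means the two orderings are largely unrelated. Because only ranks are compared, the metric and the human scores do not need to share a scale.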

Related Projects

Explore more projects in this category

Research Paper
NLP

Morphologically-informed Somali Lemmatization Corpus built with a Web-based Crowdsourcing Platform

Somali NLP Engine
AI/NLP

Detection of Somali-written Fake News and Toxic Messages on the Social Media Using Large Language Models

OCR System
NLP

CIRAL: A Test Collection for CLIR Evaluation in African Languages