A collection of high-quality datasets, pre-trained models, and specialized linguistic tools for the Somali language.
Curated dataset for fine-tuning Somali language models on specific tasks.
Large-scale pretraining dataset for Somali language models.
Comprehensive crowdsourced corpus for Somali lemmatization.
Multimodal question answering dataset for Somali language understanding.
Parallel corpus for machine translation involving Somali and other African languages.
News articles dataset for classification tasks, part of the Masakhane African languages initiative.
Enhanced BERT model for Somali language tasks (Coming soon).
T5-based model fine-tuned for Somali language generation and understanding tasks.
BERT-based model specifically trained for Somali language understanding and fake news detection.
Comprehensive Somali lexical database supporting language resources and linguistic analysis.