BAREC

In 2023, the Abu Dhabi Arabic Language Centre, in collaboration with a team from New York University Abu Dhabi and Zayed University, launched the Balanced Arabic Readability Corpus (BAREC) project. The project aims to collect a corpus of 10 million words covering a wide spectrum of literary genres and topics from different countries, with a special focus on readability levels. Parts of this corpus will be annotated according to specific criteria, and the resulting labels will form the basis for developing artificial intelligence tools that can automatically determine the readability level of a text.


Readability is a concept closely tied to the classification of texts into reading levels, based on factors including spelling, morphology, grammar, and vocabulary complexity. Developing standardised models of readability is vital to improving reading rates, aiding language learning, and enhancing academic achievement.


The project is committed to making all of its contributions open source, both to support the research community working on Arabic language computing and to enrich the linguistic resources available for Arabic.
