One of the major challenges that under-represented and endangered language communities face in language technology is the scarcity of language data. This is also the case for the Southern varieties of Kurdish and for Laki, for which very few resources are available and little progress has been made in tools. To tackle this, we present a few approaches that rely on the content of local news websites, a local radio station that broadcasts in Southern Kurdish, and fieldwork for Laki. In this paper, we describe some of the challenges such under-represented languages pose, particularly in writing and standardization, as well as in retrieving sources of data and retro-digitizing handwritten content to create a corpus for Southern Kurdish and Laki. In addition, we study the task of language identification in light of the other variants of Kurdish and the Zaza-Gorani languages.
Assessing the Quality of Machine Translation from Kurmanji Kurdish into English
The translation quality of the most widely used online machine translation systems, such as Google Translate and Bing Translator, has long been a hotly debated and controversial topic. This research assesses the translation quality of these two systems in order to identify any shortcomings. Because no single established quality assessment method exists for the output of the two systems, the study uses an error analysis method to assess the quality of the translation of two specialized texts from Kurdish into English by Google Translate and Bing Translator. The error analysis of the chosen texts reveals that both systems achieved excellent results in the orthography category, with 100% and 98.7% accuracy for Google and Bing, respectively. Results for lexis were also positive for both systems, at 98.8% for Google and 97.5% for Bing. The two systems performed very well in these areas because both recently adopted neural machine translation (NMT), which simulates the way the human brain functions to produce translations and learns from texts previously translated by human translators. The analysis also shows that the two systems translated the selected texts successfully with respect to the rules of English grammar, achieving 99.6% accuracy for Google and 99.4% for Bing. For further research, this study recommends applying linguistic error analysis to the translation of more types of Kurdish texts.