One of the major challenges that under-represented and endangered language communities face in language technology is the lack or paucity of language data. This is also the case for the Southern varieties of Kurdish and for Laki, for which very limited resources are available and little progress has been made on tools. To tackle this, we present several approaches that rely on the content of local news websites, a local radio station that broadcasts in Southern Kurdish, and fieldwork for Laki. In this paper, we describe some of the challenges these under-represented languages pose, particularly in writing and standardization, as well as in retrieving data sources and retro-digitizing handwritten content to create a corpus for Southern Kurdish and Laki. In addition, we study the task of language identification in light of the other variants of Kurdish and the Zaza-Gorani languages.
Towards a dialectology of Southern Kurdish: Where to begin?
This contribution provides an overview of the current state of knowledge on the dialectology of Southern Kurdish (hereafter SK). The introductory paragraphs discuss the concept of SK, survey existing sources and briefly address core issues of terminology. The bulk of the study reviews Fattah’s (2000: 9) proposed dialect classification, and complements it with the evaluation of language data from older sources, the author’s own research in Kermānshāh Province and other documentation activities recently carried out in the SK-speaking area, sketching possible directions for future research.
The Laki variety of Harsin
This book presents a documentation and analysis of Harsini, the language variety spoken by the people of Harsin, a small urban centre located in south-east Kermānshāh Province, western Iran. The main features of phonology and morphosyntax are outlined, and an extensive corpus of transcribed spoken texts, recorded in situ, is also provided, together with a lexicon. The book also includes comparative notes and discussion of the place of Harsini within Laki, and its relationship to Southern Kurdish. The sound files from the text corpus are available online at https://multicast.aspra.uni-bamberg.de/resources/kurdish/#laki