The Corpus of Karelian Russian is a collection of spoken texts recorded in Karelia between 2019 and 2021. As of April 2022, the Corpus includes more than 60 hours of recorded sociolinguistic and socio-anthropological interviews with residents of Petrozavodsk, Kostomuksha, Voknavolok, Pryazha, Vedlozero, Svyatozero, Kroshnozero, Veshkelitsa, Olonets, Verkhov’ye, Tuuksa, Vidlitsa, Kovera, Kotkozero, Ilyinsky, Gavrilovka, Simon-Navolok, Rechnaya Selga, Bol’shaya Selga, Kuytezha, Ust’ye Vidlitsy, Bolshiye Gory, Yurgelitsa, Nurmolitsy, and Megrega. Most texts feature biographical narratives and discussions of the Karelian language, the linguistic situation in the region, Karelian culture, and everyday life.
All speakers are bilingual (Karelian / Russian), with Russian acquired as L2 during their teen years. The speakers’ ages range from 22 to 87.
Speech samples represent a contact variety of Russian spoken by native speakers of Karelian. The Corpus can be used in studies of bilingual speech phenomena and features of Russian influenced by Karelian.
The dataset was collected by the Laboratory for Arctic Social Sciences and Humanities and a team of students from the National Research University Higher School of Economics (Moscow) during fieldwork trips funded by the HSE Foundation for Educational Initiatives (“Rediscovering Russia” program). The recording and annotation of the Corpus are supported by the Linguistic Convergence Laboratory of the Higher School of Economics. This work is conducted within the framework of the Basic Research Program at the National Research University Higher School of Economics (HSE) and is supported as part of the Russian Academic Excellence Project “5-100”.
New texts will be added to this corpus after future fieldwork sessions.
The texts were annotated in standard Russian orthography, which does not reflect the phonetic and morphological peculiarities of the dialect. Dialectal forms of inflectional endings were replaced with their standard literary equivalents. The phonetic composition of stems (including roots and affixes) was preserved. The syntax and, in part, the agreement patterns particular to the dialect have also been preserved. To study the linguistic peculiarities of the texts in the Corpus, it is necessary to listen to the audio recordings of the examples.