Spoken corpus of Abaza

The corpus contains oral texts in the Tapanta dialect of the Abaza language. Recordings were made during joint HSE University / RSUH expeditions to the village of Inzhich-Chukun in the Abazinsky district of the Karachay-Cherkess Republic in 2017-2019. Text analysis and glossing were done by participants in the research and study group “Aspects of Abaza Grammar” and in the RSF grant No. 17-18-01184 “Communicative organization of natural discourse in spoken and signed languages.” The search function entered closed testing in December 2019.



Abaza texts are presented in both the Abaza Cyrillic orthography and the Latin transcription developed by the Moscow research group for the study of Northwest Caucasian languages. The differences between the IPA and the system of transcription used in the corpus are as follows:

  • ejective consonants are marked by a dot above or below the symbol (for example, ṗ, ṭ, ḳ),
  • palatalized consonants are marked with an apostrophe (for example, g’, ʁ’, χ’),
  • consonants c, č, š, ʒ, ǯ, ž, λ correspond to t͡s, t͡ʃ, ʃ, d͡z, d͡ʒ, ʒ, ɬ in the IPA,
  • consonants ŝ, ẑ, ĉ (hissing-hushing sounds) are absent from the IPA.
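
The segment correspondences in the list above can be applied mechanically. The following is a minimal sketch and not part of the corpus software: the names TRANSCRIPTION_TO_IPA and to_ipa are hypothetical, and only the seven consonant correspondences from the list are covered. Ejective dots, palatalization apostrophes and the hissing-hushing consonants are left unchanged.

```python
import unicodedata

# Correspondences taken from the list above (affricates written with tie bars).
TRANSCRIPTION_TO_IPA = {
    "c": "t͡s",
    "č": "t͡ʃ",
    "š": "ʃ",
    "ʒ": "d͡z",
    "ǯ": "d͡ʒ",
    "ž": "ʒ",
    "λ": "ɬ",
}

# str.maketrans needs single-code-point keys, so the keys are NFC-normalized.
# str.translate replaces every character in one pass, so transcription ž,
# once turned into IPA ʒ, is not converted a second time into d͡z.
_TABLE = str.maketrans(
    {unicodedata.normalize("NFC", key): value
     for key, value in TRANSCRIPTION_TO_IPA.items()}
)


def to_ipa(text):
    """Convert the seven listed transcription consonants to IPA."""
    # NFC turns sequences such as c + combining caron into the single letter č.
    return unicodedata.normalize("NFC", text).translate(_TABLE)
```

The single-pass replacement matters because ʒ appears on both sides of the correspondence: it is a transcription symbol for the affricate and the IPA symbol for the fricative written ž in the corpus.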

In the Cyrillic orthography stress is indicated by a capital letter, and in the transcription by an acute accent mark.


This corpus is one of a group of corpora that use the tsakorpus search platform. General instructions on the technical capabilities of the search function in corpora of this type can be found in the “Information” section (the question-mark link in the upper right corner of the search page). Below are a few rules specific to this corpus.

Word search with Abaza Cyrillic orthography

Words and wordforms in the Cyrillic orthography are entered in the field “word (orth).” For example, a search for the word апхьарта ‘school’ will result in all instances of this wordform in the corpus. Capital Latin I is used in the orthography to indicate ejective consonants, for example, пI, тI, кI.

Words can be searched for using regular expressions (for more detail, see “Information”). For example, ? stands for “any single character,” and * stands for “any number of characters,” including none. Thus, a search for ла in the current version of the corpus will yield ла ‘dog’; a search for ла? will yield лан ‘her mother’; and a search for ла* will turn up ла ‘dog’, лан ‘her mother’, лара ‘she’, лаща ‘her brother’ and more. Letter case does not affect the search results.
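
The behavior of ? and * described above can be reproduced offline on a word list. This is a minimal sketch using Python's standard fnmatch module, not the platform's own implementation. The function name wildcard_search and the sample word list are hypothetical and chosen only to mirror the examples in the text.

```python
from fnmatch import fnmatchcase


def wildcard_search(pattern, wordforms):
    """Return the wordforms matching a ?/* pattern, ignoring letter case."""
    lowered = pattern.lower()
    # fnmatchcase: "?" matches exactly one character,
    # "*" matches any run of characters, including an empty one.
    return [w for w in wordforms if fnmatchcase(w.lower(), lowered)]


# Sample wordforms taken from the examples in the text.
words = ["ла", "лан", "лара", "лаща", "апхьарта"]

print(wildcard_search("ла", words))   # ['ла']
print(wildcard_search("ла?", words))  # ['лан']
print(wildcard_search("ла*", words))  # ['ла', 'лан', 'лара', 'лаща']
```

Both the pattern and the wordforms are lowercased before matching. Since the Cyrillic orthography marks stress with a capital letter, this also makes the comparison independent of stress marking.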

This field is also used for searches by Russian translation. Before entering a Russian word, switch the “Language/Layer” field from “Abaza” to “Russian”.

Word search with Latin transcription

In the field “word (trans)” it is possible to search by wordform using the Latin transcription. For example, a search for the word apχ’arta ‘school’ will turn up all occurrences of this word in the corpus. The presence or absence of the stress mark does not affect the search results. Searches using regular expressions work the same way as in the field “word (orth).”
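
One way to obtain a stress-insensitive comparison like the one described above is to remove the acute accent before comparing wordforms. The sketch below assumes that stress is encoded as the Unicode acute accent (U+0301 after decomposition). The function name strip_stress is hypothetical, and nothing here is taken from the platform's actual code.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent, assumed to encode stress


def strip_stress(wordform):
    """Remove stress marks while keeping all other diacritics."""
    # NFD splits accented letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", wordform)
    # Drop only the acute accent; ejective dots, carons and circumflexes
    # remain and are recomposed by NFC.
    return unicodedata.normalize("NFC", decomposed.replace(ACUTE, ""))
```

With this normalization, a query and a corpus wordform that differ only in the acute accent compare as equal, while letters such as ṗ, č or ŝ keep their diacritics.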

Search with glosses

Searching by gloss returns results that show a wordform’s morpheme structure. To search by gloss, type the desired grammatical markers into the field “glosses” (if the order of the glosses is unimportant, separate them with commas), or click the button on the right of the field to open a window with a list of glosses to choose from, together with information about their position relative to the root and to one another.

The system of glosses used in the corpus is presented in the following tables.

1. Personal Prefixes

Orthography | Transcription | Gloss
с- | s- | 1sg.abs,, 1sg.erg
у- | w- | 2m.abs,, 2m.erg
б- | b- | 2f.abs,, 2f.erg
| | 3pl.abs, 3m.erg
хI- | h- | 1pl.abs,, 1pl.erg
шв- | ŝ- | 2pl.abs,, 2pl.erg
л | | , 3f.erg
на- / а- | na- / | 3n.erg
р- / д- | r- / | , 3pl.erg
з | | , rel.erg

2. Other prefixes

Orthography | Transcription | Gloss | Meaning
аба-, а(й)- | aba-, a(j) | , rec.erg | reciprocal
ан- | an- | rel.tmp | temporal relativizer
ба-/па- | ba-/pa- | qadv | adverbial interrogative
гIа- | ʕa- | csl | cislocative (directional preverb ‘hither’)
гь- | g’- | neg.emp | emphatic negation
з- | z- | rel.rsn | reason relativizer
ма- | ma- | jud | ‘judicative’ (“from X’s point of view”)
мхъа- | mqa- | invol | involuntative (‘accidentally’)
на- | na- | trl | translocative (directional preverb ‘thither’)
| | loc | other preverbs
| | | elative form of the preverb
ш- | š- | rel.mnr | manner relativizer
(ъ)а- | (ʔ)a- | rel.loc | locative relativizer

3. Suffixes

Orthography | Transcription | Gloss | Meaning
| | imp | imperative (no marker, is indicated in parentheses after the root)
| | aor | aorist (no marker, is indicated in parentheses after the root)
| | res | resultative (no marker, is indicated in parentheses after the root)
-ба/-па | -ba/-pa | cln | classifier of non-humans (with numerals)
-бырг | -bərg | just | converb (‘just as’)
-га | -ga | nins | instrument nominal
-гвыща | -gʷəš’a | dprc | depreciative (regret)
-гIв | -ʕʷ | clh | classifier of humans (with numerals)
-гIв | -ʕʷ | nag | agent nominalization
-гIваца | -ʕʷaca | immed | immediative, restrictive (‘just now’)
-да | -da | qh | human interrogative
-дза | -ʒa | lim | limitive (‘[up] to’)
-дъа | -dʔa | car | caritive (‘without’)
-з(а) | -z(a) | pst.nfin | non-finite past tense
-заджвыкI | -zaʒ̂əḳ | rstr | restrictive (‘only’)
-зара | -zara | cond.rsn | reason conditional (‘once’)
-зд / -зтI | -zd / -zṭ | prm.st | permissive (static verbs)
-(з)ла | -(z)la | dyn | dynamization of stative verbs
-зтын | -ztən | cond.real | realis conditional
| -əj | prs | present tense
-ижьтара | -əjž’tara | since | converb (‘since’)
-йа | -ja | qn | non-human interrogative
-кI | -ḳ | unit | unit counting suffix (with numerals)
-кIва | -ḳʷa | cvb.neg | negative converb
-ма | -ma | q | yes-no interrogative
-мгIва | -mʕʷa | npot | possibility nominalizer
-мыгIва | -məʕʷa | dprc | depreciative (regret)
| -n | pst(.dcl) | past tense
-нацIкIьа(ра) / -ндзкIа(ра) | -nac̣ḳa(ra) / -nʒḳa(ra) | until | converb (‘until, while’)
-пхьадза | -pχ’aʒa | each | converb (‘every time [when]’)
-пI/-б | -ṗ/-b | npst.dcl | nonpast tense
-р(а) | -r(a) | fut.nfin | non-finite future
-ргIа | -rʕa | aspl | associative plural
-ргIад | -rʕad | prm.dyn | permissive for dynamic verbs
-р(ы)квын / -р(ы)кIвын | -r(ə)kʷən / -r(ə)ḳʷən | cond | general conditional
-рныс | -rnəs | purp | purposive converb
-рта | -rta | nloc | place nominalization
| -s | nondum | cunctative (‘not yet’, only with negation)
-стI | -sṭ | emp | emphasis (with imperative)
| -əw | prs.nfin | non-finite present tense (static verbs only)
-уа / -у | -wa / -əw | ipf | imperfective (dynamic verbs only)
-уачва | -waĉa | dfcl | difficilitive (‘difficult’)
| | re | refactive (‘again’)
-хвы | -χʷə | fcl | facilitive (‘easy’)
-хьа | -χ’a | iam | iamitive (‘already’)
-чва | -ĉa | exc | nimifactive, excessive (‘too much’)
-чва | -ĉa | plh | human plural
| | fut | future tense
-ша | -ša | circ | circumferentive (‘around’)
-шва | -ŝa | sml | similative (‘as if’)
-ща | -š’a | nmnr | manner nominalizer

4. Some grammatical roots

-чIвы- | -ĉ̣ə- | npronominal proform
хIва | hʷa | quot | quotative (< say)

Corpus Composition

Currently the corpus consists of 25 texts with a total running time of 53 minutes. The corpus contains 3,636 tokens.

The texts were recorded from 8 speakers born between 1930 and 1961. All the texts are spontaneous stories about the speakers’ lives, village traditions, tales and legends. More detailed information about each text can be found in the table which appears when choosing the subcorpus.

Project participants

The corpus texts were prepared as part of research project No. 18-05-0014, carried out under the Academic Fund Program of the National Research University Higher School of Economics in 2019 and funded by the Russian Academic Excellence Project '5-100'.

Help in analyzing the texts was provided by Abaza language speakers T. M. Abazova, O. R. Adzhieva, F. B. Aysanova, F. M. Asanayeva, A. A. Bidzhev, S. Z. Dzhandubayeva, K. M. Dzhuzhuyev, F. M. Kopsergenova, S. M. Ozova, D. O. Usha, O. Sh. Usha, T. O. Usha, Z. M. Chukova, O. M. Chukova, and A. Sh. Tsekov.

The project involved students and faculty of HSE University and RSUH: D. A. Arakelova, P. M. Arkadiev, S. P. Durneva, E. S. Klyagina, A. G. Koshevoy, Yu. A. Lander, A. B. Panova, K. I. Romanova, A. A. Rossius, A. D. Sorokina, Ya. G. Testelets, and A. I. Fedorenko.

The texts were prepared for publication in the form of the corpus by A. B. Panova and A. D. Sorokina. Further technical work was done by E. O. Sokur.

The Spoken corpus of Abaza is supported by the Linguistic Convergence Laboratory at the Higher School of Economics. The corpus was created within the framework of the HSE University Basic Research Program and funded by the Russian Academic Excellence Project '5-100'.


You may contact us with questions about the Corpus:
Anastasia Panova:

Or with questions about the search platform:
Elena Sokur:

How to cite the corpus

If you use data from the Spoken corpus of Abaza in your research, please cite as follows:

Anastasia Panova, Anna Sorokina, Peter Arkadiev, Elena Sokur. Spoken corpus of Abaza. Moscow: School of Linguistics, HSE University; Linguistic Convergence Laboratory, HSE University. (Available online at:, accessed on .)