The corpus contains oral texts in the Tapanta dialect of the Abaza language. Recordings were made during joint HSE University / RSUH expeditions to the village of Inzhich-Chukun in the Abazinsky district of the Karachay-Cherkess Republic in 2017-2019. Text analysis and glossing were done by participants in the research and study group “Aspects of Abaza Grammar” and in the RSF grant No. 17-18-01184 “Communicative organization of natural discourse in spoken and signed languages.” The search function entered closed testing in December 2019.
Abaza texts are presented in both the Abaza Cyrillic orthography and the Latin transcription developed by the Moscow research group for the study of Northwest Caucasian languages. The differences between the IPA and the system of transcription used in the corpus are as follows:
In the Cyrillic orthography stress is indicated by a capital letter, and in the transcription by an acute accent mark.
This corpus is one of a group that uses the search platform tsakorpus. Instructions with a description of the general technical capabilities of the search function in a corpus of this type can be found in the “Information” section (link with a question mark in the upper right corner of the search page). Below are a few rules specific to this corpus.
Words and wordforms in the Cyrillic orthography are entered in the field “word (orth).” For example, a search for the word апхьарта ‘school’ will return all instances of this wordform in the corpus. The capital Latin letter I is used in the orthography to indicate ejective consonants, for example, пI, тI, кI.
Words can be searched for using regular expressions (for more detail see “Information”). For example, the character ? means “any single character,” and the character * means “any number of characters.” Thus, a search for ла in the current version of the corpus will yield ла ‘dog’; a search for ла? will yield лан ‘her mother’; and a search for ла* will turn up ла ‘dog’, лан ‘her mother’, лара ‘she’, лаща ‘her brother’ and more. Letter case does not affect the search results.
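The matching behavior described above can be illustrated with a minimal Python sketch. It is not the code of the search platform: the function names `wildcard_to_regex` and `search` and the small word list are assumptions made for illustration, and the sketch only reproduces the stated rules (? for exactly one character, * for any number of characters, whole-word matching, no sensitivity to letter case).

```python
import re


def wildcard_to_regex(query: str) -> re.Pattern:
    """Translate a query with ? and * into a case-insensitive regex."""
    parts = []
    for ch in query:
        if ch == "?":
            parts.append(".")        # exactly one character
        elif ch == "*":
            parts.append(".*")       # any number of characters, including none
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def search(query: str, wordforms: list) -> list:
    """Return the wordforms that match the query as a whole word."""
    pattern = wildcard_to_regex(query)
    return [w for w in wordforms if pattern.fullmatch(w)]


# Hypothetical mini word list built from the examples in this section.
WORDFORMS = ["ла", "лан", "лара", "лаща", "апхьарта"]

print(search("ла", WORDFORMS))    # ['ла']
print(search("ла?", WORDFORMS))   # ['лан']
print(search("ла*", WORDFORMS))   # ['ла', 'лан', 'лара', 'лаща']
print(search("ЛА?", WORDFORMS))   # ['лан'] (letter case is ignored)
```

Because the whole wordform has to match, the query ла does not return лан; the trailing ? or * is what widens the search.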
This field is also used for searches by Russian translation. Before entering a Russian word, switch the “Language/Layer” field from “Abaza” to “Russian”.
In the field “word (trans)” it is possible to search by wordform using the Latin transcription. For example, a search for the word apχ’arta ‘school’ will turn up all instances of this word in the corpus. The presence or absence of the stress mark does not affect the search results. Searches using regular expressions work the same way as in the field “word (orth).”
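One way to picture stress-insensitive matching is to remove the acute accent from both the query and the stored form before comparing them. The sketch below is an assumption about how such a comparison could work, not a description of the platform's internals; `strip_stress` and `same_wordform` are hypothetical names, and the stress placement in the sample string is arbitrary. Only the combining acute accent (U+0301) is removed, so other diacritics of the transcription, such as the circumflex in ŝ or the dot below in ṭ, are preserved.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent, used here as the stress mark


def strip_stress(form: str) -> str:
    """Remove acute accents while keeping all other diacritics."""
    decomposed = unicodedata.normalize("NFD", form)   # split letters from marks
    without_stress = decomposed.replace(ACUTE, "")    # drop only the acute
    return unicodedata.normalize("NFC", without_stress)


def same_wordform(query: str, form: str) -> bool:
    """Compare two transcribed forms, ignoring stress marks and letter case."""
    return strip_stress(query).casefold() == strip_stress(form).casefold()


# The stressed variant below is illustrative only.
print(strip_stress("apχ’árta"))               # apχ’arta
print(same_wordform("apχ’arta", "apχ’árta"))  # True
```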
Searching by gloss makes it possible to get results that reveal a wordform’s morpheme structure. To search by gloss, either type the desired grammatical markers into the field “glosses” by hand (if the order of glosses is unimportant, separate them with commas), or click the button at the right of the field to open a window containing a list of glosses to choose from, along with information about their position relative to the root and to one another.
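The comma-separated, order-independent kind of query can be thought of as a check that every requested gloss occurs somewhere in the word's gloss line. The Python sketch below rests on that assumption and does not reproduce the platform's actual query syntax or matching logic; `parse_query`, `has_all_glosses`, and the sample gloss line (with a placeholder `ROOT`) are hypothetical.

```python
def parse_query(query: str) -> list:
    """Split a comma-separated gloss query into normalized gloss labels."""
    return [g.strip().lower() for g in query.split(",") if g.strip()]


def has_all_glosses(word_glosses: list, query: str) -> bool:
    """True if every gloss in the query occurs in the word, in any order."""
    available = {g.lower() for g in word_glosses}
    return all(g in available for g in parse_query(query))


# Hypothetical gloss line for one wordform; ROOT stands in for the lexical root.
word = ["3h.abs", "csl", "ROOT", "pst"]

print(has_all_glosses(word, "pst, 3h.abs"))  # True: order does not matter
print(has_all_glosses(word, "pst, caus"))    # False: caus is absent
```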
The system of glosses used in the corpus is presented in the following tables.
1. Personal Prefixes
orthography | transcription | gloss |
с- | s- | 1sg.abs, 1sg.io, 1sg.erg |
у- | w- | 2m.abs, 2m.io, 2m.erg |
б- | b- | 2f.abs, 2f.io, 2f.erg |
д- | d- | 3h.abs |
й- | j- | 3n.abs, 3pl.abs, 3m.io, 3m.erg, rel.abs |
хI- | h- | 1pl.abs, 1pl.io, 1pl.erg |
шв- | ŝ- | 2pl.abs, 2pl.io, 2pl.erg |
л- | l- | 3f.io, 3f.erg |
на- / а- | na- / a- | 3n.io, 3n.erg |
р- / д- | r- / d- | 3pl.io, 3pl.erg |
з- | z- | rel.io, rel.erg |
2. Other prefixes
orthography | transcription | gloss | name |
а- | a- | dat | dative |
а- | a- | def | definiteness |
а(й)- | (a)j- | soc | sociative |
аба-, а(й)- | aba-, a(j)- | rec.io, rec.erg | reciprocal |
ан- | an- | rel.tmp | temporal relativizer |
ата- | ata- | rep | repetitive |
ба-/па- | ba-/pa- | qadv | adverbial interrogative |
гIа- | ʕa- | csl | cislocative (directional preverb ‘hither’) |
гь- | g’- | neg.emp | emphatic negation |
дза- | ʒa- | lim | limitive |
з- | z- | ben, pot | benefactive, potential |
з- | z- | rel.rsn | reason relativizer |
ла- | la- | ins | instrumental |
м- | m- | neg | negation |
ма- | ma- | jud | ‘judicative’ (“from X’s point of view”) |
мхъа- | mqa- | invol | involuntative (‘accidentally’) |
на- | na- | trl | translocative (directional preverb ‘thither’) |
other preverbs | | loc, loc.elat | loc.elat: elative form of the preverb |
р- | r- | caus | causative |
тш- | č- | rfl.abs | reflexive |
ц- | c- | com | comitative |
чв- | ĉ- | mal | malefactive |
ш- | š- | rel.mnr | manner relativizer |
(ъ)а- | (ʔ)a- | rel.loc | locative relativizer |
3. Suffixes
orthography | transcription | gloss | name |
— | — | imp | imperative (no marker, is indicated in parentheses after the root) |
— | — | aor | aorist (no marker, is indicated in parentheses after the root) |
— | — | res | resultative (no marker, is indicated in parentheses after the root) |
-ба/-па | -ba/-pa | cln | classifier of non-humans (with numerals) |
-бырг | -bərg | just | converb (‘just as’) |
-га | -ga | nins | instrument nominal |
-гвыща | -gʷəš’a | dprc | depreciative (regret) |
-(гь)и | -(g’)əj | add | additive |
-гьашва | -g’aŝa | int | intensive |
-гIа | -ʕa | elat | elative |
-гIа | -ʕa | nml | nominalization |
-гIв | -ʕʷ | clh | classifier of humans (with numerals) |
-гIв | -ʕʷ | nag | agent nominalization |
-гIвышва | -ʕʷəŝa | int | intensive |
-гIваца | -ʕʷaca | immed | immediative, restrictive (‘just now’) |
-да | -da | qh | human interrogative |
-дза | -ʒa | ass | assertive-intensive |
-дза | -ʒa | lim | limitive (‘[up] to’) |
-дъа | -dʔa | car | caritive (‘without’) |
-з(а) | -z(a) | pst.nfin | non-finite past tense |
-за | -za | infr | inferential |
-заджвыкI | -zaʒ̂əḳ | rstr | restrictive (‘only’) |
-зара | -zara | cond.rsn | reason conditional (‘once’) |
-зд / -зтI | -zd / -zṭ | prm.st | permissive (static verbs) |
-запыт | -zapət | freq | frequentative |
-(з)ла | -(z)la | dyn | dynamization of stative verbs |
-зд.хIва | -zd.hʷa | cnc | concessive |
-зтын | -ztən | cond.real | realis conditional |
-и | -əj | prs | present tense |
-ижьтара | -əjž’tara | since | converb (‘since’) |
-ищтI | -əjš’ṭ | emp | emphasis |
-йа | -ja | qn | non-human interrogative |
-ква | -kʷa | pl | plural |
-кI | -ḳ | indf | indefiniteness |
-кI | -ḳ | unit | unit counting suffix (with numerals) |
-кIва | -ḳʷa | cvb.neg | negative converb |
-ла | -la | lat | lative |
-ла | -la | hab | habitual |
-ла | -la | ins | instrumental |
-ла | -la | cnc | concessive |
-ма | -ma | q | yes-no interrogative |
-мгIва | -mʕʷa | npot | possibility nominalizer |
-мца | -mca | cvb | converb |
-мыгIва | -məʕʷa | dprc | depreciative (regret) |
-н | -n | pst(.dcl) | past tense |
-нацIкIьа(ра) / -ндзкIа(ра) | -nac̣ḳa(ra) / ‑nʒḳa(ra) | until | converb (‘until, while’) |
-нда | -nda | opt | optative |
-пхьадза | -pχ’aʒa | each | converb (‘every time [when]’) |
-пI/-б | -ṗ/-b | npst.dcl | nonpast tense |
-р(а) | -r(a) | fut.nfin | non-finite future |
-ра | -ra | msd | masdar |
-ра | -ra | nml | nominalization |
-ргIа | -rʕa | aspl | associative plural |
-ргIад | -rʕad | prm.dyn | permissive for dynamic verbs |
-ркIва | -rḳʷa | cnt | continuative |
-р(ы)квын / ‑р(ы)кIвын | -r(ə)kʷən / ‑r(ə)ḳʷən | cond | general conditional |
-рныс | -rnəs | purp | purposive converb |
-рта | -rta | nloc | place nominalization |
-с | -s | nondum | cunctative (‘not yet’, only with negation) |
-стI | -sṭ | emp | emphasis (with imperative) |
-тI/-д | -ṭ/-d | dcl | declarative |
-та | -ta | adv | adverbial |
-у | -əw | prs.nfin | non-finite present tense (static verbs only) |
-уа /-у | -wa /-əw | ipf | imperfective (dynamic verbs only) |
-уачва | -waĉa | dfcl | difficilitive (‘difficult’) |
-х | -χ | re | refactive (‘again’) |
-ха | -χa | inc | inceptive |
-хвы | -χʷə | fcl | facilitive (‘easy’) |
-хьа | -χ’a | iam | iamitive (‘already’) |
-чва | -ĉa | exc | nimifactive, excessive (‘too much’) |
-чва | -ĉa | plh | human plural |
-ш | -š | fut | future tense |
-ша | -ša | circ | circumferentive (‘around’) |
-шва | -ŝa | sml | similative (‘as if’) |
-ща | -š’a | nmnr | manner nominalizer |
-ъа | -ʔa | ploc | locative |
4. Some grammatical roots
orthography | transcription | gloss | name |
-кIв(а)- | -ḳʷ(a)- | cop | copula |
-чIвы- | -ĉ̣ə- | npro | nominal proform |
ари | arəj | prox | |
ани | anəj | med | |
ауи | awəj | dist | |
хIва | hʷa | quot | quotative (< say) |
дына | dəna | filler | filler |
 | | ptcl | particle |
The corpus currently consists of 25 texts with a total running time of 53 minutes and contains 3,636 tokens.
The texts were recorded from 8 speakers born between 1930 and 1961. All the texts are spontaneous stories about the speakers’ lives, village traditions, tales and legends. More detailed information about each text can be found in the table which appears when choosing the subcorpus.
Corpus texts were prepared as part of research project No. 18-05-0014, carried out within the ‘National Research University – Higher School of Economics’ Academic Fund Program in 2019 and funded by the Russian Academic Excellence Project ‘5-100’.
Help in analyzing the texts was provided by Abaza language speakers T. M. Abazova, O. R. Adzhieva, F. B. Aysanova, F. M. Asanayeva, A. A. Bidzhev, S. Z. Dzhandubayeva, K. M. Dzhuzhuyev, F. M. Kopsergenova, S. M. Ozova, D. O. Usha, O. Sh. Usha, T. O. Usha, Z. M. Chukova, O. M. Chukova, and A. Sh. Tsekov.
Students and faculty of HSE University and RSUH: D. A. Arakelova, P. M. Arkadiev, S. P. Durneva, E. S. Klyagina, A. G. Koshevoy, Yu. A. Lander, A. B. Panova, K. I. Romanova, A. A. Rossius, A. D. Sorokina, Ya. G. Testelets, and A. I. Fedorenko.
The texts were prepared for publication in the form of the corpus by A. B. Panova and A. D. Sorokina. Further technical work was done by E. O. Sokur.
The Spoken corpus of Abaza is supported by the Linguistic Convergence Laboratory at the Higher School of Economics. The corpus was created within the framework of the HSE University Basic Research Program and funded by the Russian Academic Excellence Project '5-100'.
You may contact us with questions about the Corpus:
Anastasia Panova: anastasia.b.panova@gmail.com
Or with questions about the search platform:
Elena Sokur: elena.o.sokur@gmail.com
If you use data from the Spoken corpus of Abaza in your research, please cite as follows: