Spoken corpus of Abaza

The corpus contains oral texts in the Tapanta dialect of the Abaza language. Recordings were made during joint HSE University / RSUH expeditions to the village of Inzhich-Chukun in the Abazinsky district of the Karachay-Cherkess Republic in 2017-2019. Text analysis and glossing were done by the participants in the research and study group “Aspects of Abaza Grammar” and in the RSF grant No. 17-18-01184 “Communicative organization of natural discourse in spoken and signed languages.” The search function entered closed testing in December 2019.

Search

Transcription

Abaza texts are presented in both the Abaza Cyrillic orthography and the Latin transcription developed by the Moscow research group for the study of Northwest Caucasian languages. The differences between the IPA and the system of transcription used in the corpus are as follows:

ejective consonants are marked by a dot above or below the symbol (for example, ṗ, ṭ, ḳ),
palatalized consonants are marked with an apostrophe (for example, g’, ʁ’, χ’),
consonants c, č, š, ʒ, ǯ, ž, λ correspond to t͡s, t͡ʃ, ʃ, d͡z, d͡ʒ, ʒ, ɬ in the IPA,
consonants ŝ, ẑ, ĉ (hissing-hushing sounds) are absent from the IPA.

In the Cyrillic orthography stress is indicated by a capital letter, and in the transcription by an acute accent mark.
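
The correspondences listed above can be sketched as a small conversion routine. This is only an illustration and not part of the corpus software: the function name `to_ipa` is hypothetical, and the use of ʼ for ejectives and ʲ for palatalization follows general IPA conventions rather than anything stated on this page. The hissing-hushing consonants ŝ, ẑ, ĉ are left unchanged because they have no IPA counterpart, and the acute stress mark is kept as is.

```python
import unicodedata

CARON, CIRCUMFLEX = "\u030c", "\u0302"
DOT_ABOVE, DOT_BELOW = "\u0307", "\u0323"
APOSTROPHES = ("'", "\u2019")

# Plain letters: c -> t͡s, ʒ -> d͡z, λ -> ɬ
PLAIN = {"c": "t\u0361s", "\u0292": "d\u0361z", "\u03bb": "\u026c"}
# Letters with a caron: č -> t͡ʃ, š -> ʃ, ž -> ʒ, ǯ -> d͡ʒ
WITH_CARON = {
    "c": "t\u0361\u0283",
    "s": "\u0283",
    "z": "\u0292",
    "\u0292": "d\u0361\u0292",
}


def to_ipa(text):
    """Convert a form in the corpus transcription to an IPA approximation."""
    chars = unicodedata.normalize("NFD", text)  # split letters from diacritics
    out, i = [], 0
    while i < len(chars):
        base = chars[i]
        i += 1
        marks = ""
        # Collect the combining marks that belong to this letter.
        while i < len(chars) and unicodedata.combining(chars[i]):
            marks += chars[i]
            i += 1
        # An apostrophe right after the letter marks palatalization.
        palatalized = i < len(chars) and chars[i] in APOSTROPHES
        if palatalized:
            i += 1
        # A dot above or below marks an ejective.
        ejective = DOT_ABOVE in marks or DOT_BELOW in marks
        marks = marks.replace(DOT_ABOVE, "").replace(DOT_BELOW, "")
        if CARON in marks and base in WITH_CARON:
            base, marks = WITH_CARON[base], marks.replace(CARON, "")
        elif CIRCUMFLEX not in marks and base in PLAIN:
            base = PLAIN[base]
        out.append(
            base
            + marks
            + ("\u02b2" if palatalized else "")  # ʲ
            + ("\u02bc" if ejective else "")     # ʼ
        )
    return unicodedata.normalize("NFC", "".join(out))
```

Under these assumptions, `to_ipa("apχ’arta")` gives `apχʲarta` and `to_ipa("ṗ")` gives `pʼ`. Any apostrophe that directly follows a letter is treated as a palatalization mark, so the sketch is meant for single transcribed wordforms, not running text with quotation marks.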

Instructions

This corpus is one of a group that uses the search platform tsakorpus. Instructions with a description of the general technical capabilities of the search function in a corpus of this type can be found in the “Information” section (link with a question mark in the upper right corner of the search page). Below are a few rules specific to this corpus.

Word search with Abaza Cyrillic orthography

Words and wordforms in the Cyrillic orthography are entered in the field “word (orth).” For example, a search for the word апхьарта ‘school’ will result in all instances of this wordform in the corpus. Capital Latin I is used in the orthography to indicate ejective consonants, for example, пI, тI, кI.

Words can be searched for using regular expressions (for more detail see “Information”). For example, the mark ? means “any single character,” and the mark * means “any number of characters.” Thus, a search for ла in the current version of the corpus will yield ла ‘dog’; a search for ла? will yield лан ‘her mother’; and a search for ла* will turn up ла ‘dog’, лан ‘her mother’, лара ‘she’, лаща ‘her brother’ and more. Letter case does not affect the search results.
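
The behaviour of the ? and * marks described above can be imitated in a few lines of Python. This is a minimal sketch of the described semantics, not the way the search platform is actually implemented; the function names and the four-word list are chosen for illustration, with the words taken from the examples above.

```python
import re


def wildcard_to_regex(pattern):
    """Translate ? (one character) and * (any number of characters) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    # IGNORECASE mirrors the statement that letter case does not matter.
    return re.compile("".join(parts), re.IGNORECASE)


def search(pattern, wordforms):
    """Return the wordforms that match the whole pattern."""
    rx = wildcard_to_regex(pattern)
    return [w for w in wordforms if rx.fullmatch(w)]


# Wordforms mentioned in the examples above.
WORDFORMS = ["ла", "лан", "лара", "лаща"]
```

With this list, `search("ла", WORDFORMS)` returns only `ла`, `search("ла?", WORDFORMS)` returns only `лан`, and `search("ла*", WORDFORMS)` returns all four wordforms.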

This field is also used for searches by Russian translation. Before entering a Russian word, switch the “Language/Layer” field from “Abaza” to “Russian”.

Word search with Latin transcription

In the field “word (trans)” it is possible to search by wordform using the Latin transcription. For example, a search for the word apχ’arta ‘school’ will turn up all occurrences of this word in the corpus. The presence or absence of the stress mark does not affect the search results. Searches using regular expressions work the same as in the field “word (orth).”
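
When comparing transcribed forms outside the search interface, the same insensitivity to stress can be approximated by removing the acute accent before comparison. The sketch below is an assumption about how one might do this in Python, not a description of the platform; `strip_stress` is a hypothetical helper name.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent, used as the stress mark


def strip_stress(form):
    """Remove the stress mark while keeping all other diacritics."""
    decomposed = unicodedata.normalize("NFD", form)
    return unicodedata.normalize("NFC", decomposed.replace(ACUTE, ""))


def same_wordform(a, b):
    """Compare two transcribed forms, ignoring stress marks."""
    return strip_stress(a) == strip_stress(b)
```

Dots marking ejectives and the carons and circumflexes on consonants are preserved, since only the combining acute accent is removed.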

Search with glosses

Searching by gloss returns results that show a wordform’s morpheme structure. To search by gloss, type the desired grammatical markers into the field “glosses” (if the order of glosses is unimportant, separate them with commas), or click on the button to the right of the field to open a window with a list of glosses to choose from, along with information on their position relative to the root and to one another.
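
The idea of an order-independent, comma-separated gloss query can be illustrated as follows. This is a simplified sketch under stated assumptions: the function names are hypothetical, the gloss sequence in `EXAMPLE_GLOSSES` is invented for illustration and does not come from the corpus, and the actual query syntax of the platform is richer than what is shown here.

```python
def parse_gloss_query(query):
    """Split a comma-separated query into individual gloss labels."""
    return [g.strip() for g in query.split(",") if g.strip()]


def has_all_glosses(query, wordform_glosses):
    """True if every queried gloss occurs in the wordform, in any order."""
    present = set(wordform_glosses)
    return all(g in present for g in parse_gloss_query(query))


# Invented gloss sequence for a single wordform (illustration only).
EXAMPLE_GLOSSES = ["3h.abs", "neg", "go", "pst"]
```

Here `has_all_glosses("neg, 3h.abs", EXAMPLE_GLOSSES)` is true even though the query lists the glosses in a different order from the wordform, while `has_all_glosses("neg, fut", EXAMPLE_GLOSSES)` is false.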

The system of glosses used in the corpus is presented in the following tables.

1. Personal prefixes

orthography	transcription	gloss
с-	s-	1sg.abs, 1sg.io, 1sg.erg
у-	w-	2m.abs, 2m.io, 2m.erg
б-	b-	2f.abs, 2f.io, 2f.erg
д-	d-	3h.abs
й-	j-	3n.abs, 3pl.abs, 3m.io, 3m.erg, rel.abs
хI-	h-	1pl.abs, 1pl.io, 1pl.erg
шв-	ŝ-	2pl.abs, 2pl.io, 2pl.erg
л-	l-	3f.io, 3f.erg
на- / а-	na- / a-	3n.io, 3n.erg
р- / д-	r- / d-	3pl.io, 3pl.erg
з-	z-	rel.io, rel.erg

2. Other prefixes

orthography	transcription	gloss	name
а-	a-	dat	dative
а-	a-	def	definiteness
а(й)-	(a)j-	soc	sociative
аба-, а(й)-	aba-, a(j)-	rec.io, rec.erg	reciprocal
ан-	an-	rel.tmp	temporal relativizer
ата-	ata-	rep	repetitive
ба-/па-	ba-/pa-	qadv	adverbial interrogative
гIа-	ʕa-	csl	cislocative (directional preverb ‘hither’)
гь-	g’-	neg.emp	emphatic negation
дза-	ʒa-	lim	limitive
з-	z-	ben, pot	benefactive, potential
з-	z-	rel.rsn	reason relativizer
ла-	la-	ins	instrumental
м-	m-	neg	negation
ма-	ma-	jud	‘judicative’ (“from X’s point of view”)
мхъа-	mqa-	invol	involuntative (‘accidentally’)
на-	na-	trl	translocative (directional preverb ‘thither’)
other preverbs		loc, loc.elat	locative preverb, elative form of the preverb
р-	r-	caus	causative
тш-	č-	rfl.abs	reflexive
ц-	c-	com	comitative
чв-	ĉ-	mal	malefactive
ш-	š-	rel.mnr	manner relativizer
(ъ)а-	(ʔ)a-	rel.loc	locative relativizer

3. Suffixes

orthography	transcription	gloss	name
—	—	imp	imperative (no marker, is indicated in parentheses after the root)
—	—	aor	aorist (no marker, is indicated in parentheses after the root)
—	—	res	resultative (no marker, is indicated in parentheses after the root)
-ба/-па	-ba/-pa	cln	classifier of non-humans (with numerals)
-бырг	-bərg	just	converb (‘just as’)
-га	-ga	nins	instrument nominal
-гвыща	-gʷəš’a	dprc	depreciative (regret)
-(гь)и	-(g’)əj	add	additive
-гьашва	-g’aŝa	int	intensive
-гIа	-ʕa	elat	elative
-гIа	-ʕa	nml	nominalization
-гIв	-ʕʷ	clh	classifier of humans (with numerals)
-гIв	-ʕʷ	nag	agent nominalization
-гIвышва	-ʕʷəŝa	int	intensive
-гIваца	-ʕʷaca	immed	immediative, restrictive (‘just now’)
-да	-da	qh	human interrogative
-дза	-ʒa	ass	assertive-intensive
-дза	-ʒa	lim	limitive (‘[up] to’)
-дъа	-dʔa	car	caritive (‘without’)
-з(а)	-z(a)	pst.nfin	non-finite past tense
-за	-za	infr	inferential
-заджвыкI	-zaʒ̂əḳ	rstr	restrictive (‘only’)
-зара	-zara	cond.rsn	reason conditional (‘once’)
-зд / -зтI	-zd / -zṭ	prm.st	permissive (static verbs)
-запыт	-zapət	freq	frequentative
-(з)ла	-(z)la	dyn	dynamization of stative verbs
-зд.хIва	-zd.hʷa	cnc	concessive
-зтын	-ztən	cond.real	realis conditional
-и	-əj	prs	present tense
-ижьтара	-əjž’tara	since	converb (‘since’)
-ищтI	-əjš’ṭ	emp	emphasis
-йа	-ja	qn	non-human interrogative
-ква	-kʷa	pl	plural
-кI	-ḳ	indf	indefiniteness
-кI	-ḳ	unit	unit counting suffix (with numerals)
-кIва	-ḳʷa	cvb.neg	negative converb
-ла	-la	lat	lative
-ла	-la	hab	habitual
-ла	-la	ins	instrumental
-ла	-la	cnc	concessive
-ма	-ma	q	yes-no interrogative
-мгIва	-mʕʷa	npot	possibility nominalizer
-мца	-mca	cvb	converb
-мыгIва	-məʕʷa	dprc	depreciative (regret)
-н	-n	pst(.dcl)	past tense
-нацIкIьа(ра) / -ндзкIа(ра)	-nac̣ḳa(ra) / -nʒḳa(ra)	until	converb (‘until, while’)
-нда	-nda	opt	optative
-пхьадза	-pχ’aʒa	each	converb (‘every time [when]’)
-пI/-б	-ṗ/-b	npst.dcl	nonpast tense
-р(а)	-r(a)	fut.nfin	non-finite future
-ра	-ra	msd	masdar
-ра	-ra	nml	nominalization
-ргIа	-rʕa	aspl	associative plural
-ргIад	-rʕad	prm.dyn	permissive for dynamic verbs
-ркIва	-rḳʷa	cnt	continuative
-р(ы)квын / -р(ы)кIвын	-r(ə)kʷən / -r(ə)ḳʷən	cond	general conditional
-рныс	-rnəs	purp	purposive converb
-рта	-rta	nloc	place nominalization
-с	-s	nondum	cunctative (‘not yet’, only with negation)
-стI	-sṭ	emp	emphasis (with imperative)
-тI/-д	-ṭ/-d	dcl	declarative
-та	-ta	adv	adverbial
-у	-əw	prs.nfin	non-finite present tense (static verbs only)
-уа /-у	-wa /-əw	ipf	imperfective (dynamic verbs only)
-уачва	-waĉa	dfcl	difficilitive (‘difficult’)
-х	-χ	re	refactive (‘again’)
-ха	-χa	inc	inceptive
-хвы	-χʷə	fcl	facilitive (‘easy’)
-хьа	-χ’a	iam	iamitive (‘already’)
-чва	-ĉa	exc	nimifactive, excessive (‘too much’)
-чва	-ĉa	plh	human plural
-ш	-š	fut	future tense
-ша	-ša	circ	circumferentive (‘around’)
-шва	-ŝa	sml	similative (‘as if’)
-ща	-š’a	nmnr	manner nominalizer
-ъа	-ʔa	ploc	locative
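
Several suffixes in the table share one form but carry different glosses (for example, -ла / -la), which matters when searching by wordform as opposed to gloss. The following sketch shows one way to list such homonymous markers; it uses only a handful of rows copied from the table, and the function names are hypothetical.

```python
from collections import defaultdict

# A few (transcription, gloss) rows copied from the suffix table above.
SUFFIX_ROWS = [
    ("-la", "lat"), ("-la", "hab"), ("-la", "ins"), ("-la", "cnc"),
    ("-ra", "msd"), ("-ra", "nml"),
    ("-kʷa", "pl"),
]


def glosses_by_form(rows):
    """Group gloss labels by the form of the suffix."""
    table = defaultdict(list)
    for form, gloss in rows:
        table[form].append(gloss)
    return dict(table)


def ambiguous_forms(rows):
    """Keep only the forms that correspond to more than one gloss."""
    return {f: g for f, g in glosses_by_form(rows).items() if len(g) > 1}
```

For the rows above, `ambiguous_forms(SUFFIX_ROWS)` lists `-la` with four glosses and `-ra` with two, while `-kʷa` is left out because it has a single gloss.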

4. Some grammatical roots

orthography	transcription	gloss	name
-кIв(а)-	-ḳʷ(a)-	cop	copula
-чIвы-	-ĉ̣ə-	npro	nominal proform
ари	arəj	prox	proximal demonstrative
ани	anəj	med	medial demonstrative
ауи	awəj	dist	distal demonstrative
хIва	hʷa	quot	quotative (< say)
дына	dəna	filler	filler
		ptcl	particle

Corpus composition

Currently the corpus consists of 25 texts with a total running time of 53 minutes. The corpus contains 3,636 tokens.

The texts were recorded from 8 speakers born between 1930 and 1961. All the texts are spontaneous stories about the speakers’ lives, village traditions, tales and legends. More detailed information about each text can be found in the table which appears when choosing the subcorpus.

Project participants

The corpus texts were prepared as part of research project No. 18-05-0014, carried out within the Academic Fund Program of the National Research University Higher School of Economics in 2019 and funded by the Russian Academic Excellence Project ‘5-100’.

Help in analyzing the texts was provided by Abaza language speakers T. M. Abazova, O. R. Adzhieva, F. B. Aysanova, F. M. Asanayeva, A. A. Bidzhev, S. Z. Dzhandubayeva, K. M. Dzhuzhuyev, F. M. Kopsergenova, S. M. Ozova, D. O. Usha, O. Sh. Usha, T. O. Usha, Z. M. Chukova, O. M. Chukova, and A. Sh. Tsekov.

The following students and faculty of HSE University and RSUH took part in the project: D. A. Arakelova, P. M. Arkadiev, S. P. Durneva, E. S. Klyagina, A. G. Koshevoy, Yu. A. Lander, A. B. Panova, K. I. Romanova, A. A. Rossius, A. D. Sorokina, Ya. G. Testelets, and A. I. Fedorenko.

The texts were prepared for publication in the form of the corpus by A. B. Panova and A. D. Sorokina. Further technical work was done by E. O. Sokur.

The Spoken corpus of Abaza is supported by the Linguistic Convergence Laboratory at the Higher School of Economics. The corpus was created within the framework of the HSE University Basic Research Program and funded by the Russian Academic Excellence Project '5-100'.

Contacts

You may contact us with questions about the corpus:
Anastasia Panova: anastasia.b.panova@gmail.com

Or with questions about the search platform:
Elena Sokur: elena.o.sokur@gmail.com

How to cite the corpus

If you use data from the Spoken corpus of Abaza in your research, please cite as follows:

Anastasia Panova, Anna Sorokina, Peter Arkadiev, Elena Sokur. Spoken corpus of Abaza. Moscow: School of Linguistics, HSE University; Linguistic Convergence Laboratory, HSE University. (Available online at: https://lingconlab.ru/spoken_abaza/, accessed on .)