
1 What is TALD?

The Typological Atlas of the Languages of Daghestan (TALD) is a tool for the visualization of information about linguistic structures typical of Daghestan. The scope of the project currently covers all East Caucasian languages and several other languages spoken in Daghestan, Chechnya, Ingushetia and adjacent territories.

The Atlas consists of:

2 Daghestan as a linguistic area

Daghestan is the most linguistically diverse part of the Caucasus, with at least 40 different languages (and many more highly divergent idioms) spoken on a territory of 50,300 km² that consists mostly of mountainous terrain. The majority of the languages spoken there belong to the East Caucasian (or Nakh-Daghestanian) language family: one of the three language families indigenous to the Caucasus. For the most part, the languages of the East Caucasian family are spoken in the eastern Caucasus area (with the exception of some relatively recent diasporic communities). They have no proven genealogical relationship to any other languages or language families.

Other languages spoken in Daghestan include three Turkic languages: Nogai, Kumyk (Kipchak) and Azerbaijani (Oghuz); and three Indo-European languages: Russian (Slavic, the major language of administration, education, and urban areas), Armenian (Armenic), and Tat (Iranian). Arabic is the language of religion, as most people in Daghestan are Sunni Muslims. The official languages of Daghestan (in alphabetical order) are Agul, Avar, Azerbaijani, Chechen, Dargwa, Kumyk, Lak, Lezgian, Nogai, Russian, Rutul, Tabasaran, Tat, and Tsakhur.

Historically there was no single lingua franca for the whole area. As a result, Daghestanians were known for having a command of multiple locally important languages, which they picked up in the course of seasonal labor migration, trading at cardinal markets, and other types of contact. Currently these patterns are disappearing fast due to the expansion of Russian.

One of the aims of TALD is to chart the genealogical and geographical distribution of linguistic features and to facilitate multi-faceted analyses of language contact in Daghestan by comparing the presence of shared features with known patterns of bilingualism and lexical convergence.

3 Map visualizations

The Atlas currently offers three different types of map visualizations:

  1. General datapoints
  2. Extrapolated data
  3. Data granularity

Each of these visualizations has its benefits and drawbacks, so we allow the user to toggle between the different options.

Below are some examples from the chapter on Morning greetings, which describes the two main ways to greet someone in the morning in the languages of Daghestan: wishing them a good morning or asking them whether they woke up.

For map visualizations we use the Lingtypology package (Moroz, 2017) for R.

3.1 General datapoints

This is the most basic visualization, which shows one dot on the map for each language in the sample. By unticking the box “show languages” you can remove the inner dots and visualize the distribution of different values in the area without the distraction of genealogical information.

3.2 Extrapolated data

This visualization represents each language as a cluster of dots, which correspond to villages where a certain language is spoken.¹ The inside of each dot is colored by language. Languages from the same group have similar colors (e.g., all Lezgic languages have some shade of green). Hover over a dot to see the name of the language, and click to view a popup with a link to the language’s page in the Glottolog database and the name of the village. The color of the outer dots indicates the value of a linguistic feature. By unticking the box “show languages” you can remove the inner dots and visualize the distribution of different values in the area without the distraction of genealogical information.

A benefit of this type of visualization is that it shows the size and boundaries of speech communities (as opposed to maps based on abstract general datapoints). Its main drawback is that it involves a lot of generalization. We do not have information on each village variety of the languages in our sample, so we extrapolate the information we have on a language or dialect to all the villages where they are spoken. In doing so, we risk overgeneralizing information and erasing possible dialectal differences.
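The extrapolation step described above can be illustrated with a minimal Python sketch. It is not the Atlas's actual implementation (the maps themselves are produced with Lingtypology in R); the village, dialect, and language names, the feature values, and the function `extrapolate` are all hypothetical. The sketch assumes that each village receives the most specific value available: its own if attested, otherwise that of its dialect, otherwise that of its language.

```python
from typing import Optional

# Hypothetical village records: (village, language, dialect).
VILLAGES = [
    ("Village A", "Language X", "Dialect X1"),
    ("Village B", "Language X", "Dialect X1"),
    ("Village C", "Language X", "Dialect X2"),
    ("Village D", "Language Y", "Dialect Y1"),
]

# Hypothetical feature values, known at different levels of detail.
VILLAGE_VALUES = {"Village A": "value 1"}
DIALECT_VALUES = {("Language X", "Dialect X2"): "value 2"}
LANGUAGE_VALUES = {"Language X": "value 1"}


def extrapolate(village: str, language: str, dialect: str) -> Optional[str]:
    """Return the most specific value available for a village, or None."""
    if village in VILLAGE_VALUES:
        return VILLAGE_VALUES[village]
    if (language, dialect) in DIALECT_VALUES:
        return DIALECT_VALUES[(language, dialect)]
    # Fall back to the language-level value; None if nothing is known.
    return LANGUAGE_VALUES.get(language)


extrapolated = {v: extrapolate(v, lang, dia) for v, lang, dia in VILLAGES}
for village, value in extrapolated.items():
    print(village, value)
```

In this toy data, Village B inherits its value from Language X as a whole, which is exactly where an undocumented dialectal difference could be erased.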

3.3 Data granularity

The data granularity visualization shows the level of accuracy for each datapoint in the previous visualization, e.g., “village dialect” indicates that we had information about the feature for this specific village variety, while “language” means that we only had information for the language in general, from which we extrapolated information for this point. This allows the user to see what kind of data underlies the default visualization.
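As a rough illustration of how granularity labels can be used, the Python sketch below counts how many datapoints on a map rest on village-level information and how many were extrapolated from the language in general. The records and the field names (`village`, `value`, `granularity`) are hypothetical; only the labels “village dialect” and “language” are taken from the description above.

```python
from collections import Counter

# Hypothetical datapoints with a granularity label for each village.
DATAPOINTS = [
    {"village": "Village A", "value": "value 1", "granularity": "village dialect"},
    {"village": "Village B", "value": "value 1", "granularity": "language"},
    {"village": "Village C", "value": "value 2", "granularity": "language"},
    {"village": "Village D", "value": "value 2", "granularity": "village dialect"},
    {"village": "Village E", "value": "value 1", "granularity": "language"},
]


def summarize_granularity(datapoints):
    """Count datapoints per granularity level."""
    return Counter(point["granularity"] for point in datapoints)


counts = summarize_granularity(DATAPOINTS)
# Share of datapoints backed by information on the specific village variety.
attested_share = counts["village dialect"] / len(DATAPOINTS)
print(dict(counts), attested_share)
```

A low share of “village dialect” datapoints would suggest that the corresponding map should be read with the extrapolation caveat in mind.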

Our goal for the Atlas is to continue adding new data to existing datasets and thus gradually improve its coverage and accuracy.

4 Contribute to the Atlas

The chapters and datasets in the Atlas are created by researchers specializing in the languages of Daghestan as well as by students of linguistics with no prior knowledge of the area and the languages spoken there.

If you would like to contribute a chapter and/or data to the Atlas because you are studying a certain topic in the languages of Daghestan, or you are a student looking for an internship, do not hesitate to contact us! You can find our contact info under Team.

To get a better idea of our methodology and what you will have to do if you decide to become a contributor, see our Contributor Manual.

5 Access to data

The data can be accessed through the atlas interface or downloaded directly from our GitHub page. For reasons of space, the atlas interface shows filtered versions of the original databases, which include only the main information displayed on maps. Both filtered and full versions of the databases are available for download. The full versions contain more detailed information for each observation (e.g., specific morphemes or wordforms, examples of their occurrence in texts with glosses and translations) and can be downloaded by clicking on the download button or from our GitHub page.
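The relationship between a full and a filtered version can be sketched in Python as follows. The file layout of the actual databases is not described here, so the CSV content, the column names, and the function `filter_columns` are assumptions made purely for illustration.

```python
import csv
import io

# Hypothetical "full" table: map-relevant columns plus detailed ones.
FULL_CSV = """language,village,value,wordform,gloss
Language X,Village A,value 1,form-a,gloss-a
Language Y,Village D,value 2,form-b,gloss-b
"""

# Hypothetical set of columns needed to draw a map.
MAP_COLUMNS = ["language", "village", "value"]


def filter_columns(full_csv_text, columns):
    """Keep only the given columns from each row of a CSV text."""
    reader = csv.DictReader(io.StringIO(full_csv_text))
    return [{column: row[column] for column in columns} for row in reader]


filtered = filter_columns(FULL_CSV, MAP_COLUMNS)
for row in filtered:
    print(row)
```

The filtered rows keep what a map needs, while details such as wordforms and glosses remain only in the full table.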

6 How to cite

6.1 Plain text

Daniel, M., K. Filatov, T. Maisak, G. Moroz, T. Mukhin, C. Naccarato, and S. Verhees (2022). Typological Atlas of the Languages of Daghestan (TALD), v. 1.0.0. Moscow: Linguistic Convergence Laboratory, NRU HSE. DOI: 10.5281/zenodo.6807070. http://lingconlab.ru/dagatlas.

6.2 BibTeX

@book{tald2022,
  title = {Typological Atlas of the Languages of Daghestan (TALD), v. 1.0.0},
  author = {Michael Daniel and Konstantin Filatov and Timur Maisak and George Moroz and Timofey Mukhin and Chiara Naccarato and Samira Verhees},
  year = {2022},
  publisher = {Linguistic Convergence Laboratory, NRU HSE},
  address = {Moscow},
  url = {http://lingconlab.ru/dagatlas},
  doi = {10.5281/zenodo.6807070},
}

References

Moroz, G. (2017). Lingtypology: Easy mapping for linguistic typology. Retrieved from https://CRAN.R-project.org/package=lingtypology

  1. This visualization makes use of the East Caucasian villages dataset.