Karata area, Akhvakh district

1 What is TALD?

The Typological Atlas of the Languages of Daghestan (TALD) is a tool for visualizing information about linguistic structures typical of Daghestan. The project currently covers all East Caucasian languages and several other languages spoken in Daghestan, Chechnya, Ingushetia, and adjacent territories.

The Atlas consists of:

2 Daghestan as a linguistic area

Daghestan is the most linguistically diverse part of the Caucasus. At least 40 different languages (and many more highly divergent idioms) are spoken on a territory of 50,300 km², most of it mountainous. The majority of these languages belong to the East Caucasian (or Nakh-Daghestanian) family, one of the three language families indigenous to the Caucasus. Apart from some relatively recent diaspora communities, East Caucasian languages are spoken only in the eastern Caucasus. They have no proven genealogical relationship to any other language or language family.

Other languages spoken in Daghestan include three Turkic languages: Nogai, Kumyk (both Kipchak), and Azerbaijani (Oghuz); and three Indo-European languages: Russian (Slavic; the main language of administration, education, and urban areas), Armenian (Armenic), and Tat (Iranian). Arabic is the language of religion, as most people in Daghestan are Sunni Muslims. The official languages of Daghestan (in alphabetical order) are Agul, Avar, Azerbaijani, Chechen, Dargwa, Kumyk, Lak, Lezgian, Nogai, Russian, Rutul, Tabasaran, Tat, and Tsakhur.

Historically, there was no single lingua franca for the whole area. As a result, Daghestanians were known for their command of several locally important languages, which they acquired through seasonal labor migration, trade at major markets, and other types of contact. These patterns are now disappearing rapidly with the spread of Russian.

One of the aims of TALD is to chart the genealogical and geographical distribution of linguistic features and to facilitate multi-faceted analyses of language contact in Daghestan by comparing the presence of shared features with known patterns of bilingualism and lexical convergence.

3 Map visualizations

The Atlas currently offers three different types of map visualizations:

  1. General datapoints
  2. Extrapolated data
  3. Data granularity

Each of these visualizations has its benefits and drawbacks, so we allow the user to toggle between the different options.

Below are some examples from the chapter on Morning greetings, which describes the two main ways to greet someone in the morning in the languages of Daghestan: wishing them a good morning or asking them whether they woke up.

We use the R package lingtypology (Moroz, 2017) for the map visualizations.

3.1 General datapoints

This is the most basic visualization: it shows one dot on the map for each language in the sample. By unticking the “show languages” box, you can remove the inner dots and view the distribution of feature values across the area without the distraction of genealogical information.

3.2 Extrapolated data

This visualization represents each language as a cluster of dots, one for each village where the language is spoken.¹ Each dot has an inner and an outer part. The inner dot is colored by language, and languages from the same group have similar colors (e.g., all Lezgic languages are shown in shades of green). Hover over a dot to see the name of the language, or click it to open a popup with the name of the village and a link to the language’s page in the Glottolog database. The color of the outer dot indicates the value of the linguistic feature. By unticking the “show languages” box, you can remove the inner dots and view the distribution of feature values across the area without the distraction of genealogical information.
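
As a rough illustration of how group-based shades could be derived, the Python sketch below gives every language in a group the same hue and varies only the lightness. TALD's actual palette is not described here. The group-to-hue table `GROUP_HUES` and the helper `group_shades` are hypothetical, and only the pairing of Lezgic with green comes from the text. The maps themselves are drawn in R with lingtypology, and Python is used here only for illustration.

```python
import colorsys

# Hypothetical base hues in degrees; only "Lezgic -> green" (120) reflects the text.
GROUP_HUES = {"Lezgic": 120, "Avar-Andic": 210}


def group_shades(languages_by_group):
    """Map each language to a hex color: one hue per group, varying lightness."""
    colors = {}
    for group, languages in languages_by_group.items():
        hue = GROUP_HUES[group] / 360  # colorsys expects hue in [0, 1]
        ordered = sorted(languages)
        n = len(ordered)
        for i, lang in enumerate(ordered):
            # Spread lightness over 0.3..0.7 so sibling languages stay distinguishable;
            # a group with a single language gets the midpoint 0.5.
            lightness = 0.3 + 0.4 * (i / (n - 1)) if n > 1 else 0.5
            r, g, b = colorsys.hls_to_rgb(hue, lightness, 0.6)
            colors[lang] = "#{:02x}{:02x}{:02x}".format(
                round(r * 255), round(g * 255), round(b * 255)
            )
    return colors


# Usage: three Lezgic languages receive three distinct shades of green.
shades = group_shades({"Lezgic": ["Lezgian", "Tabasaran", "Agul"]})
```

Fixing the hue per group while varying lightness keeps related languages visually close. Each language still receives its own distinguishable color.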

A benefit of this type of visualization is that, unlike maps based on abstract general datapoints, it shows the size and boundaries of speech communities. Its main drawback is that it involves a lot of generalization. We do not have information on every village variety of the languages in our sample, so we extrapolate the information we have on a language or dialect to all the villages where it is spoken. In doing so, we risk overgeneralizing and erasing possible dialectal differences.
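
The extrapolation step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not TALD's pipeline. The `Village` record, the `extrapolate` function, and the rule that a dialect-level value overrides the language-level one are all hypothetical. The village, language, and dialect names are placeholders rather than real data. The two feature values are borrowed from the Morning greetings chapter only as labels.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Village:
    name: str
    language: str
    dialect: Optional[str] = None  # None if the dialect is unknown


def extrapolate(
    villages: List[Village],
    language_values: Dict[str, str],
    dialect_values: Optional[Dict[Tuple[str, str], str]] = None,
) -> Dict[str, str]:
    """Copy a language- or dialect-level feature value to every village.

    Hypothetical rule: a dialect-level value takes precedence over the
    language-level one; villages whose language has no data are left out.
    """
    dialect_values = dialect_values or {}
    result = {}
    for v in villages:
        if v.dialect is not None and (v.language, v.dialect) in dialect_values:
            result[v.name] = dialect_values[(v.language, v.dialect)]
        elif v.language in language_values:
            result[v.name] = language_values[v.language]
    return result


# Placeholder data: two dialects of "Language A", one village of "Language B",
# and one village of "Language C" for which no value is recorded.
villages = [
    Village("Village 1", "Language A", "Dialect 1"),
    Village("Village 2", "Language A", "Dialect 1"),
    Village("Village 3", "Language A", "Dialect 2"),
    Village("Village 4", "Language B"),
    Village("Village 5", "Language C"),
]
values = extrapolate(
    villages,
    language_values={"Language A": "good morning wish", "Language B": "woke-up question"},
    dialect_values={("Language A", "Dialect 2"): "woke-up question"},
)
```

Villages 1 and 2 both inherit the language-level value, even if one of them actually differs. This is the overgeneralization risk the paragraph describes. Village 5 is omitted rather than guessed.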