Help


Basic concepts
Search engine

Corpus search is based on CQL syntax and is accessed through a graphical user interface. Every request is automatically converted into CQL and shown in the CQL Search field. Corpus data is divided into two basic units: words and fragments (a fragment is roughly equivalent to a sentence). The purpose of a corpus search is to find all fragments that contain a given word.
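As an illustration only, the conversion of a simple one-word request into CQL might look like the sketch below. The function name to_cql and the attribute name word are assumptions made for this example; the attributes actually available in this corpus may differ, and the interface may build its queries differently.

```python
def to_cql(pattern: str, attribute: str = "word") -> str:
    """Build a single-token CQL query such as [word="cat"].

    Hypothetical helper: the attribute name "word" is an assumption,
    not a confirmed attribute of this corpus.
    """
    # Escape double quotes so the pattern cannot break out of the CQL string.
    escaped = pattern.replace('"', '\\"')
    return f'[{attribute}="{escaped}"]'


print(to_cql("cat"))       # [word="cat"]
print(to_cql("ca[tpn]"))   # [word="ca[tpn]"]
```

The string produced this way has the same general shape as the text that appears in the CQL Search field after a request is submitted.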


Metadata

Each informant in the corpus is described by a set of parameters, such as gender, year of birth and education. This metadata can be used in two ways:

  • in searching: query results can be restricted by a particular metadata parameter, e.g. the age of the informants;
  • in search results: information about the author of a statement can be displayed.

The data about informants is anonymised, so each informant is identified by a code consisting of letters and digits.
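The two uses of metadata described above can be pictured with a small sketch. Everything in it is hypothetical: the class names, the field names, the informant codes and the sample fragments are invented for illustration and do not reflect the corpus's real data model.

```python
from dataclasses import dataclass


@dataclass
class Informant:
    code: str        # anonymised identifier (invented example values below)
    gender: str
    birth_year: int


@dataclass
class Fragment:
    text: str
    informant: Informant


def filter_by_birth_year(fragments, earliest, latest):
    """Keep only fragments whose informant was born within the given years."""
    return [f for f in fragments if earliest <= f.informant.birth_year <= latest]


# Invented sample data.
a = Informant(code="ab12", gender="f", birth_year=1950)
b = Informant(code="cd34", gender="m", birth_year=1985)
fragments = [Fragment("The cat is asleep.", a), Fragment("I can call later.", b)]

# Use 1: restrict the results by a metadata parameter.
older = filter_by_birth_year(fragments, 1940, 1960)

# Use 2: show who produced each statement, using the anonymised code only.
for f in older:
    print(f.informant.code, f.text)
```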


Regular expressions

Every string entered in a search field is interpreted as a regular expression. A regular expression is a pattern that describes a set of strings, so a single query can match many different words. For example, the query ca[tpn] finds cat, cap and can. The pattern ca.* means: find every string that starts with ca and continues with any sequence of characters. The results may include cat, call, cataclysm, capability and so on.
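The two patterns above can be tried outside the corpus with Python's standard re module. This is only a sketch: the word list is invented, and it assumes that a pattern must match the whole word, which may not be exactly how the corpus engine applies it.

```python
import re

# Invented word list for illustration.
words = ["cat", "cap", "can", "car", "call", "cataclysm", "capability", "dog"]


def matching_words(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    regex = re.compile(pattern)
    # fullmatch requires the pattern to cover the entire word
    # (an assumption about how the search field treats patterns).
    return [w for w in candidates if regex.fullmatch(w)]


print(matching_words("ca[tpn]", words))
# ['cat', 'cap', 'can']
print(matching_words("ca.*", words))
# ['cat', 'cap', 'can', 'car', 'call', 'cataclysm', 'capability']
```

Note that car is excluded by ca[tpn] because r is not one of the listed letters, while ca.* accepts every word that begins with ca.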

Regular expressions are an extremely useful tool, and no specialist knowledge is needed to start using them. More about regular expressions: