The database “Imperium. Semantic field” was developed within the project “‘Empire’ semantic field in Russian, English and Czech languages” (supported by the Russian Foundation for Basic Research, Project No. 18-012-00474).
The concept of ‘empire’ is part of the historical memory and worldview of the Russian, English and Czech peoples. We can study it through texts, since texts reflect our knowledge of the world. We therefore set out to analyse the concept of ‘empire’ using the methods of computational and corpus linguistics.
The project aimed to build the semantic field of ‘empire’ from corpora of Russian, English and Czech using distributional and statistical methods.
In linguistics, the term “semantic field” denotes a set of linguistic units grouped by a common semantic attribute, that is, sharing a common component of meaning. In this project, these units are words and phrases, including both common nouns and proper names. A semantic field consists of a core, whose elements have the full set of attributes characteristic of the given semantic group, and a periphery, whose elements are also connected with other semantic fields. The elements of a semantic field are connected to each other, and the strength of these connections can be measured quantitatively.
We have developed a technique for building a semantic field from corpora by means of distributional statistical analysis. Using Sketch Engine (https://www.sketchengine.eu), we compiled corpora of Russian (32.6 million words), English (25.6 million words) and Czech (19.6 million words). After analysing lexicographic resources, we built the semantic fields of ‘empire’ for these languages with the help of the “Thesaurus” tool in Sketch Engine. We also used the tools and corpora of Aranea Corpora (https://www.juls.savba.sk/sem%C3%A4, by Radovan Garabík), Corpus.Byu.Edu (now English-Corpora.org, https://www.english-corpora.org/), Wortschatz (https://wortschatz.uni-leipzig.de/de) and the InterCorp parallel corpus (https://treq.korpus.cz/).
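The idea that the strength of connections between words can be calculated from their distribution in a corpus can be illustrated with a generic distributional measure. The following is a minimal sketch, assuming each word is represented by a vector of co-occurrence counts within a small window and that strength is measured by cosine similarity. This is a standard distributional technique chosen for illustration, not necessarily the score computed by the Sketch Engine “Thesaurus” tool; the toy corpus and function names are hypothetical.

```python
import math
from collections import Counter


def cooccurrence_vectors(sentences, window=2):
    """Count context words within +/- `window` tokens of each target word."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            ctx = vectors.setdefault(word, Counter())
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse co-occurrence vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector has no measurable connection
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus (tokenised, lower-cased); not project data.
corpus = [
    "the empire expanded its colonies".split(),
    "the kingdom expanded its territory".split(),
    "the emperor ruled the empire".split(),
    "the king ruled the kingdom".split(),
]
vecs = cooccurrence_vectors(corpus)
print(round(cosine(vecs["empire"], vecs["kingdom"]), 3))   # 1.0
print(round(cosine(vecs["empire"], vecs["colonies"]), 3))  # 0.535
```

Words that occur in the same contexts ("empire", "kingdom") receive a high score, while words sharing only part of their contexts receive a lower one; on real corpora such scores would be computed over millions of words and many more contexts.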
As a result, we compiled Russian, English and Czech thesauri that describe the semantic field of ‘empire’ in each language. The thesauri are available on the main “Index” page, which lists the words of the core zone and of the periphery zone. For each word, users can see the zone it belongs to (1 = core zone, 2 = periphery zone) and its frequency in instances per million (ipm).
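The ipm value is a normalised frequency: the raw number of occurrences divided by the corpus size and multiplied by one million, which makes frequencies comparable across the three corpora of different sizes. A minimal sketch of one “Index” entry follows, assuming normalisation by the word counts given above (the exact denominator, words or tokens, depends on the corpus tool). The record layout, function name and raw count are hypothetical.

```python
from dataclasses import dataclass

# Corpus sizes (in words) as reported for the project's Sketch Engine corpora.
CORPUS_SIZE = {"ru": 32_600_000, "en": 25_600_000, "cs": 19_600_000}

CORE, PERIPHERY = 1, 2  # zone codes shown on the "Index" page


def per_million(raw_freq: int, corpus_size: int) -> float:
    """Occurrences per 1,000,000 words; multiply first to limit rounding error."""
    return raw_freq * 1_000_000 / corpus_size


@dataclass
class IndexEntry:
    """Hypothetical record for one word listed on the "Index" page."""
    word: str
    lang: str
    zone: int  # CORE or PERIPHERY
    raw_freq: int

    @property
    def ipm(self) -> float:
        return per_million(self.raw_freq, CORPUS_SIZE[self.lang])


# Hypothetical raw count, for illustration only.
entry = IndexEntry(word="empire", lang="en", zone=CORE, raw_freq=12_800)
print(entry.zone, entry.ipm)  # 1 500.0
```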
Each word from the core zone is further described by:
- one or more definitions related to the concept of ‘empire’;
- examples from our Sketch Engine corpora;
- collocations identified with the “Word Sketch” tool in Sketch Engine (for each collocate, its logDice score and corpus frequency are given, together with the full form of the collocation);
- thesaurus entries retrieved with the “Thesaurus” tool in Sketch Engine (the logDice score and frequency are given for each element);
- translations, with the percentage share of each translation equivalent in InterCorp (https://treq.korpus.cz/).
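The collocation scores listed above are logDice values. logDice, as defined by Rychlý (2008) and used in Sketch Engine, equals 14 + log2(2·f_xy / (f_x + f_y)), where f_xy is the co-occurrence frequency of the two words and f_x, f_y are their individual frequencies; its maximum is 14, and it does not depend on corpus size. The sketch below computes it from raw frequencies and also shows how percentage shares of translation equivalents can be derived from aligned occurrences. The function names and counts are hypothetical; this is not the code used by Sketch Engine or Treq.

```python
import math
from collections import Counter


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y)).

    f_xy: co-occurrence frequency of node word and collocate
    f_x, f_y: corpus frequencies of node word and collocate
    The maximum, 14, is reached when f_xy == f_x == f_y.
    """
    if f_xy <= 0:
        raise ValueError("logDice is undefined for zero co-occurrence")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def translation_shares(aligned: list[str]) -> dict[str, float]:
    """Percentage of each translation equivalent among aligned occurrences."""
    counts = Counter(aligned)
    total = sum(counts.values())
    return {word: 100 * n / total for word, n in counts.most_common()}


# Hypothetical counts, for illustration only.
print(log_dice(f_xy=500, f_x=1000, f_y=1000))  # 13.0
print(translation_shares(["říše", "říše", "říše", "impérium"]))
# {'říše': 75.0, 'impérium': 25.0}
```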
The “Statistics” page gives the total number of words in the database and further statistics for each language.