------------------------------
Middle Russian Corpus (RNC)
------------------------------

The Middle Russian Corpus included on the HistCorp platform is retrieved from the Old Russian section of the Universal Dependencies treebanks. It contains a subset of the Middle Russian corpus (1300-1700), which is part of the Russian National Corpus.

HistCorp inclusion date
------------------------
November 10, 2020

Website
--------
https://github.com/UniversalDependencies/UD_Old_Russian-RNC/blob/master/README.md

Licence
--------
Creative Commons BY-NC-SA 4.0 (https://creativecommons.org/licenses/by-nc-sa/4.0/)

The HistCorp files
-------------------
On the HistCorp page, the Russian texts from 'The Middle Russian Corpus' are provided in a plain text format, a tokenised format and a linguistically annotated CoNLL-U format.

The linguistically annotated files ('anno') contain part-of-speech tags, lemmas, morphological features and syntax (expressed as dependency relations). They follow the same CoNLL-U format as on the Universal Dependencies site from which the files were extracted, except that metadata has been added in a TEI-compatible format at the top of each file. The metadata was mainly extracted from the README file of the Old Russian section of the Universal Dependencies site (https://github.com/UniversalDependencies/UD_Old_Russian-RNC/blob/master/README.md).

The plain text files ('txt') contain one sentence on each line. The sentences were automatically extracted from the CoNLL-U files.

In the tokenised files ('tok'), the texts are split into one token on each line. The tokenised files were automatically created by extracting only the first and second columns (word id and word form) from the CoNLL-U files.

Size: 25,822 tokens.
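
The relation between the three formats can be illustrated with a short Python sketch. It is not the script that was used to build the HistCorp files; it only shows one way that 'tok' and 'txt' style output could be derived from an 'anno' file. The function names are hypothetical. The sketch assumes that token rows have exactly ten tab-separated columns, that every other non-blank line (comments and the TEI-compatible metadata) can be skipped, that 'tok' lines are tab-separated, and that a 'txt' sentence is formed by joining word forms with single spaces.

```python
def read_sentences(lines):
    """Yield each sentence as a list of (word_id, form) pairs."""
    sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        # Assumption: comment lines and the metadata header are not
        # ten-column rows, so they are skipped here.
        if line.startswith("#") or len(fields) != 10:
            continue
        # Columns 1 and 2 of CoNLL-U are the word id and the word form.
        sentence.append((fields[0], fields[1]))
    if sentence:
        yield sentence


def to_tok(sentences):
    """One token per line: word id and word form (tab-separated by assumption)."""
    return "\n".join(
        f"{word_id}\t{form}" for sent in sentences for word_id, form in sent
    )


def to_txt(sentences):
    """One sentence per line, joining forms with spaces (an assumption).

    Rows whose id is not a plain integer (multiword ranges such as 1-2,
    empty nodes such as 1.1) are left out of the sentence text.
    """
    return "\n".join(
        " ".join(form for word_id, form in sent if word_id.isdigit())
        for sent in sentences
    )
```

To use the sketch, open an 'anno' file with encoding="utf-8", pass the file object to read_sentences, collect the result in a list, and hand that list to to_tok or to_txt.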