------------------------------------------------------
Syntactic Reference Corpus of Medieval French (SRCMF)
------------------------------------------------------

The Syntactic Reference Corpus of Medieval French (SRCMF) included on the HistCorp platform is retrieved from the Old French section of the Universal Dependencies treebanks, which contains a subset of the SRCMF corpus. This subset contains 10 texts spanning the 9th to the 13th century, with a total of 17,678 sentences and 170,741 tokens.

HistCorp inclusion date
------------------------
November 6, 2020

Websites
---------
http://srcmf.org/
https://github.com/UniversalDependencies/UD_Old_French-SRCMF

Licence
--------
Creative Commons BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0/legalcode)

The HistCorp files
-------------------
On the HistCorp page, the French texts from the 'Syntactic Reference Corpus of Medieval French' are provided in three formats: plain text ('txt'), tokenised ('tok') and linguistically annotated CoNLL-U ('anno').

The linguistically annotated files ('anno') contain part-of-speech tags, morphological features and syntactic annotation (expressed as dependency relations). They follow the same CoNLL-U format as the files on the Universal Dependencies site from which they were extracted, except that TEI-compatible metadata has been added at the top of each file. This metadata was mainly taken from the README file of the SRCMF section on the Universal Dependencies site (https://github.com/UniversalDependencies/UD_Old_French-SRCMF).

The plain text files ('txt') contain one sentence per line. The sentences were automatically extracted from the CoNLL-U files.

In the tokenised files ('tok'), the texts are split into one token per line. These files were created automatically by extracting only the first and second columns (word ID and word form) from the CoNLL-U files.

Size: 10 texts, with a total of 170,741 tokens.
Time period: 9th to 13th century
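
The derivation of the 'txt' and 'tok' files from the CoNLL-U data, as described above, can be sketched in Python. This is a minimal sketch under stated assumptions, not the script HistCorp actually used. It assumes that a token line has exactly ten tab-separated fields, and it skips comment lines and any other non-token lines, such as the TEI metadata header. For the 'txt' line it joins word forms with single spaces, whereas the real files might instead be built from the `# text =` sentence comments. It writes 'tok' lines as ID<TAB>FORM with a blank line between sentences. The function names and the sample sentence are hypothetical and invented for illustration.

```python
from typing import Iterable, List, Tuple

# (ID, FORM): the first two CoNLL-U columns, the ones kept in the 'tok' files.
Token = Tuple[str, str]


def read_conllu_sentences(lines: Iterable[str]) -> List[List[Token]]:
    """Group CoNLL-U token lines into sentences (hypothetical helper).

    Assumption: a token line has exactly 10 tab-separated fields. Comment
    lines ('#') and any other non-token lines, such as a metadata header,
    are skipped. A blank line ends the current sentence.
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            if current:  # a blank line closes a non-empty sentence
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 10:
            continue  # not a token line
        current.append((fields[0], fields[1]))
    if current:  # the file may not end with a blank line
        sentences.append(current)
    return sentences


def to_txt_lines(sentences: List[List[Token]]) -> List[str]:
    """One sentence per line, word forms joined by single spaces (assumption).

    Only integer IDs are used, so multiword-token ranges ('1-2') and empty
    nodes ('1.1') are left out.
    """
    return [" ".join(form for tid, form in sent if tid.isdigit()) for sent in sentences]


def to_tok_lines(sentences: List[List[Token]]) -> List[str]:
    """One 'ID<TAB>FORM' line per token, blank line between sentences (assumption)."""
    out: List[str] = []
    for sent in sentences:
        out.extend(f"{tid}\t{form}" for tid, form in sent)
        out.append("")
    return out


# Invented example, not taken from SRCMF; the first line stands in for the metadata header.
sample = "\n".join([
    "<teiHeader/>",
    "# sent_id = example-1",
    "1\tLi\tli\tDET\t_\t_\t2\tdet\t_\t_",
    "2\trois\troi\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\tvint\tvenir\tVERB\t_\t_\t0\troot\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
    "",
])

sents = read_conllu_sentences(sample.splitlines())
print(to_txt_lines(sents))  # ['Li rois vint .']
print(to_tok_lines(sents))
```

Joining forms with spaces leaves a space before punctuation ('vint .'), which is one reason the actual 'txt' files may differ from this sketch's output.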