------------------------------------------------------
Syntactic Reference Corpus of Medieval French (SRCMF)
------------------------------------------------------

The Syntactic Reference Corpus of Medieval French (SRCMF) included on
the HistCorp platform was retrieved from the Old French section of the
Universal Dependencies treebanks, which contains a subset of the SRCMF
corpus. This subset comprises 10 texts spanning the 9th to the
13th century, with a total of 17,678 sentences and 170,741 tokens.


HistCorp inclusion date
------------------------
November 6, 2020


Websites
---------
http://srcmf.org/
https://github.com/UniversalDependencies/UD_Old_French-SRCMF


Licence
--------
Creative Commons BY-SA 4.0
(http://creativecommons.org/licenses/by-sa/4.0/legalcode) 


The HistCorp files
-------------------
On the HistCorp page, the French texts from the 'Syntactic Reference
Corpus of Medieval French' are provided in a plain text format
('txt'), a tokenised format ('tok') and a linguistically annotated
CoNLL-U format ('anno').

The linguistically annotated files ('anno') contain information on
part-of-speech tags, morphology and syntax (expressed as dependency
relations), following the same CoNLL-U format as on the Universal
Dependencies site from which the files were extracted, except that
metadata has been added in a TEI-compatible format at the top of each
file. The metadata was mainly taken from the README file of the SRCMF
section of the Universal Dependencies site
(https://github.com/UniversalDependencies/UD_Old_French-SRCMF).
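As a rough illustration of how the annotation columns might be read, the
following Python sketch parses CoNLL-U token lines into dictionaries. The
ten column names are those of the CoNLL-U format; the function name, the
sample sentence and its annotation are invented for this example, and the
sketch assumes that the metadata header at the top of each file contains
no line with exactly ten tab-separated columns.

```python
# Column names defined by the CoNLL-U format, in order.
CONLLU_FIELDS = (
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
)


def parse_token_lines(lines):
    """Yield one dict per CoNLL-U token line, skipping all other lines."""
    for line in lines:
        line = line.rstrip("\n")
        columns = line.split("\t")
        # Assumption: comments, blank lines and the metadata header never
        # have exactly ten tab-separated columns, so they are skipped here.
        if line.startswith("#") or len(columns) != len(CONLLU_FIELDS):
            continue
        yield dict(zip(CONLLU_FIELDS, columns))


# Invented sample for illustration only, not taken from the corpus.
sample = [
    "# sent_id = example-1\n",
    "# text = li reis vient\n",
    "1\tli\tle\tDET\t_\t_\t2\tdet\t_\t_\n",
    "2\treis\troi\tNOUN\t_\t_\t3\tnsubj\t_\t_\n",
    "3\tvient\tvenir\tVERB\t_\t_\t0\troot\t_\t_\n",
    "\n",
]

tokens = list(parse_token_lines(sample))
for token in tokens:
    # Print word id, form, part of speech, head id and dependency relation.
    print(token["id"], token["form"], token["upos"],
          token["head"], token["deprel"])
```

To read a real file, the same function can be given an open file handle
instead of the sample list.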

The plain text files ('txt') contain one sentence on each line. The
sentences were automatically extracted from the CoNLL-U files.
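The exact extraction procedure is not documented here. One plausible way
to obtain one sentence per line is sketched below in Python, under the
assumption that each sentence in the CoNLL-U files carries a '# text = '
comment line. The function name and the sample lines are invented for
illustration.

```python
# Assumption: sentence text is stored in "# text = ..." comment lines.
TEXT_PREFIX = "# text = "


def extract_sentences(lines):
    """Return the sentence text of every '# text = ' comment line."""
    sentences = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(TEXT_PREFIX):
            sentences.append(line[len(TEXT_PREFIX):])
    return sentences


# Invented sample for illustration only, not taken from the corpus.
sample_conllu = [
    "# sent_id = example-1\n",
    "# text = li reis vient\n",
    "1\tli\tle\tDET\t_\t_\t2\tdet\t_\t_\n",
    "2\treis\troi\tNOUN\t_\t_\t3\tnsubj\t_\t_\n",
    "3\tvient\tvenir\tVERB\t_\t_\t0\troot\t_\t_\n",
    "\n",
    "# sent_id = example-2\n",
    "# text = il dort\n",
    "1\til\til\tPRON\t_\t_\t2\tnsubj\t_\t_\n",
    "2\tdort\tdormir\tVERB\t_\t_\t0\troot\t_\t_\n",
    "\n",
]

sentences = extract_sentences(sample_conllu)
# One sentence per line, as in the 'txt' files.
print("\n".join(sentences))
```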

The tokenised files ('tok') contain one token on each line. They were
created automatically by extracting only the first and second columns
(word id and word form) from the CoNLL-U files.
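The column extraction can be sketched in Python as follows. This is only
an approximation of the procedure described above: the function name and
sample lines are invented, and the sketch assumes that word id and word
form are separated by a tab in the output and that comment lines and
blank lines are dropped.

```python
def extract_id_and_form(lines):
    """Return 'id<TAB>form' strings for every CoNLL-U token line."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        # CoNLL-U token lines have ten columns; skip anything else
        # (assumed here to cover the metadata header as well).
        if len(columns) != 10:
            continue
        # Keep only the first two columns: word id and word form.
        pairs.append("\t".join(columns[:2]))
    return pairs


# Invented sample for illustration only, not taken from the corpus.
sample_lines = [
    "# sent_id = example-1\n",
    "# text = li reis vient\n",
    "1\tli\tle\tDET\t_\t_\t2\tdet\t_\t_\n",
    "2\treis\troi\tNOUN\t_\t_\t3\tnsubj\t_\t_\n",
    "3\tvient\tvenir\tVERB\t_\t_\t0\troot\t_\t_\n",
    "\n",
]

tok_lines = extract_id_and_form(sample_lines)
print("\n".join(tok_lines))
```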


Size: 10 texts, with a total of 170,741 tokens.

Time period: 9th to 13th century