-------------------------------
Late Latin Charter Treebank 2
-------------------------------

Version 1.2 of the Late Latin Charter Treebank 2 (LLCT2) contains a
number of minor corrections, and replaces the version 1.0 published at
Zenodo in 2019. The collection includes Early Medieval Latin
documentary texts from Italy between AD 774--897 with morphological
and syntactic annotation, in a Latin Dependency Treebank (LDT)
compatible linguistic annotation, and a CoNLL treebank format.

For a detailed description of the Late Latin Charter Treebanks, see
the pre-print of the paper 'Late Latin Charter Treebank: contents and
annotation', to be published in Corpora, 16:2 (2021), at the
institutional repository of the University of Helsinki. See also
Korkiakangas, T. and Lassila, M. (2013), Abbreviations, fragmentary
words, formulaic language: treebanking medieval charter material, in
Mambrini, F., Passarotti, M. and Sporleder, C., Proceedings of the
third workshop on annotation of corpora for research in the
humanities, pp. 61--72, and Korkiakangas, T. and Passarotti,
M. (2011), Challenges in Annotating Medieval Latin Charters, in
«Journal of Language Technology and Computational Linguistics», 26,
pp. 103--114. 


	Cited from https://zenodo.org/record/3633614#.X0PUCJMzbDJ
	August 24, 2020


HistCorp inclusion date
------------------------
August 24, 2020


Website
--------
https://zenodo.org/record/3633614#.X0PUCJMzbDJ


Licence
--------
Creative Commons Attribution 4.0 International
(https://creativecommons.org/licenses/by/4.0/legalcode)


The HistCorp files
-------------------
On the HistCorp page, the Latin texts from the Late Latin Charter
Treebank 2 are provided in a plain text format ('txt'), a tokenised
format ('tok'), and a CoNLL format with linguistic annotation
('anno'). 

The plain text file was created from the original CoNLL file by
extracting the words and sentences from the CoNLL structure and
writing one sentence per line. The metadata from the CoNLL file was
also added, in a TEI-compatible format, at the top of each file. In
addition, the number of tokens was calculated from the tokenised
version of the file.
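
As a rough illustration of the sentence extraction described above,
the following Python sketch turns CoNLL lines into one sentence per
line. It is not the script used for HistCorp. It assumes
tab-separated columns with the word form in the second column,
comment lines starting with '#', a blank line between sentences, and
words joined by a single space. The function names and the sample
rows are made up for the example, and the TEI-compatible metadata
header is left out.

```python
def conll_sentences(lines, form_column=1):
    """Yield each sentence as a list of word forms.

    Assumption: tab-separated columns, word form in column index 1,
    '#' comment lines, and a blank line between sentences.
    """
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            continue  # skip comment/metadata lines
        if not line.strip():
            if sentence:  # a blank line closes the current sentence
                yield sentence
                sentence = []
            continue
        sentence.append(line.split("\t")[form_column])
    if sentence:  # last sentence when the input has no trailing blank line
        yield sentence


def to_plain_text(lines):
    """Return the plain text version: one sentence per line."""
    return "".join(" ".join(words) + "\n" for words in conll_sentences(lines))


# Invented sample rows (ID, form, lemma), not taken from the treebank.
SAMPLE = (
    "# sent_id = 1\n"
    "1\tIn\tin\n"
    "2\tDei\tdeus\n"
    "3\tnomine\tnomen\n"
    "\n"
    "1\tRegnante\tregno\n"
    "2\tdomno\tdominus\n"
)

plain = to_plain_text(SAMPLE.splitlines())
print(plain, end="")
```

For a real file, the lines could be read with open(path,
encoding="utf-8") and passed to to_plain_text in the same way.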

Similarly, the tokenised file was created from the original CoNLL
file by extracting the words, one per line, with sentence boundaries
marked by an empty line.
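
The tokenised layout, and a token count derived from it, could be
sketched in Python as follows. This is only an illustration under
the same kind of assumptions (tab-separated CoNLL columns, word form
in the second column, '#' comment lines, blank lines between
sentences); the function names and sample rows are hypothetical and
do not come from the HistCorp tooling.

```python
def to_tokenised(lines, form_column=1):
    """Return a list of output lines: one word per line, with an
    empty string marking each sentence boundary.

    Assumption: tab-separated columns with the word form at index 1.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            continue  # skip comment/metadata lines
        if not line.strip():
            # Emit a single boundary marker, never two in a row
            # and never one before the first word.
            if out and out[-1] != "":
                out.append("")
            continue
        out.append(line.split("\t")[form_column])
    while out and out[-1] == "":
        out.pop()  # drop a trailing boundary marker
    return out


def count_tokens(tokenised_lines):
    """Count tokens as the non-empty lines of the tokenised version."""
    return sum(1 for line in tokenised_lines if line.strip())


# Invented sample rows (ID, form, lemma), not taken from the treebank.
SAMPLE_ROWS = [
    "# sent_id = 1",
    "1\tIn\tin",
    "2\tDei\tdeus",
    "3\tnomine\tnomen",
    "",
    "1\tRegnante\tregno",
    "2\tdomno\tdominus",
    "",
]

tokens = to_tokenised(SAMPLE_ROWS)
print("\n".join(tokens))
print("tokens:", count_tokens(tokens))
```

Counting non-empty lines keeps the empty boundary lines out of the
total, which matches the idea of deriving the token count from the
tokenised version rather than from the CoNLL rows.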

The linguistically annotated file is the same as the original CoNLL
file, except that metadata has been added at the top of the file.


Size: 257,918 tokens.