-------------------------------
Late Latin Charter Treebank 1
-------------------------------

Version 1.2 of the Late Latin Charter Treebank 1 (LLCT1) contains a
number of minor corrections, and replaces the version 1.0 published at
Zenodo in 2018. The collection includes Early Medieval Latin
documentary texts from Italy between AD 714--869 with morphological
and syntactic annotation, in a Latin Dependency Treebank (LDT)
compatible linguistic annotation, and a Prague style treebank format
(PML).

For a detailed description of the Late Latin Charter Treebanks, see
the pre-print of the paper 'Late Latin Charter Treebank: contents and
annotation', to be published in Corpora, 16:2 (2021), at the
institutional repository of the University of Helsinki. See also
Korkiakangas, T. and Lassila, M. (2013), Abbreviations, fragmentary
words, formulaic language: treebanking medieval charter material, in
Mambrini, F., Passarotti, M. and Sporleder, C., Proceedings of the
third workshop on annotation of corpora for research in the
humanities, pp. 61--72, and Korkiakangas, T. and Passarotti, M. (2011),
Challenges in Annotating Medieval Latin Charters, in «Journal of
Language Technology and Computational Linguistics», 26, pp. 103--114. 

	Cited from https://zenodo.org/record/3633607#.X0PF15MzbDJ
	August 24, 2020


HistCorp inclusion date
------------------------
August 24, 2020


Website
--------
https://zenodo.org/record/3633607#.X0PF15MzbDJ


Licence
--------
Creative Commons Attribution 4.0 International
(https://creativecommons.org/licenses/by/4.0/legalcode)


The HistCorp files
-------------------
On the HistCorp page, the Latin texts from the Late Latin Charter
Treebank 1 are provided in a plain text format ('txt') and a tokenised
format ('tok').

The plain text files were created from the original XML files by
extracting the words and sentences from the XML structure and writing
one sentence per line. The metadata from each XML file is included at
the top of the corresponding plain text file in a TEI-compatible
format. In addition, the number of tokens for each file has been
calculated from the tokenised version of that file.

Similarly, the tokenised files were created from the original XML
files by extracting the words and writing one word per line, with
sentence boundaries marked by an empty line.
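
As an illustration of the tokenised format described above, the
following minimal Python sketch groups the tokens of a 'tok' file into
sentences and counts them. It is not the script used to build the
HistCorp files. The function names (read_tok, count_tokens) and the
sample tokens are invented for this example, and the sketch assumes
that a 'tok' file contains only tokens and empty lines, with no
metadata header.

```python
def read_tok(lines):
    """Group tokens (one per line) into sentences.

    An empty line marks a sentence boundary. Assumption: the input
    holds only tokens and empty lines, with no metadata header.
    """
    sentences = []
    current = []
    for line in lines:
        token = line.strip()
        if token:
            current.append(token)
        elif current:
            # Empty line: close the sentence collected so far.
            sentences.append(current)
            current = []
    if current:
        # The last sentence may not be followed by an empty line.
        sentences.append(current)
    return sentences


def count_tokens(sentences):
    """Return the total number of tokens over all sentences."""
    return sum(len(sentence) for sentence in sentences)


# Invented sample in the 'tok' layout, not taken from the corpus.
sample = "In\nDei\nnomine\n\nRegnante\ndomno\nnostro\n"
sentences = read_tok(sample.splitlines())
total = count_tokens(sentences)

# Joining each sentence with spaces gives a one-sentence-per-line
# layout similar to the body of the 'txt' files.
plain_lines = [" ".join(sentence) for sentence in sentences]
```

For a real file, the lines could be read with
open(path, encoding="utf-8"), where both the path and the UTF-8
encoding are assumptions to be checked against the downloaded files.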


Size: 519 texts, with a total of 225,825 tokens.