-------------------------------
Late Latin Charter Treebank 1
-------------------------------

Version 1.2 of the Late Latin Charter Treebank 1 (LLCT1) contains a number of minor corrections, and replaces the version 1.0 published at Zenodo in 2018. The collection includes Early Medieval Latin documentary texts from Italy between AD 714--869 with morphological and syntactic annotation, in a Latin Dependency Treebank (LDT) compatible linguistic annotation, and a Prague style treebank format (PML).

For a detailed description of the Late Latin Charter Treebanks, see the pre-print of the paper 'Late Latin Charter Treebank: contents and annotation', to be published in Corpora, 16:2 (2021), at the institutional repository of the University of Helsinki.

See also Korkiakangas, T. and Lassila, M. (2013), Abbreviations, fragmentary words, formulaic language: treebanking medieval charter material, in Mambrini, F., Passarotti, M. and Sporleder, C., Proceedings of the third workshop on annotation of corpora for research in the humanities, pp. 61--72, and Korkiakangas, T. and Passarotti, M. (2011), Challenges in Annotating Medieval Latin Charters, in «Journal of Language Technology and Computational Linguistics», 26, pp. 103--114.

Cited from https://zenodo.org/record/3633607#.X0PF15MzbDJ August 24, 2020

HistCorp inclusion date
------------------------
August 24, 2020

Website
--------
https://zenodo.org/record/3633607#.X0PF15MzbDJ

Licence
--------
Creative Commons Attribution 4.0 International (https://creativecommons.org/licenses/by/4.0/legalcode)

The HistCorp files
-------------------
On the HistCorp page, the Latin texts from the Late Latin Charter Treebank 1 are provided in a plain text format ('txt') and a tokenised format ('tok').
The plain text files were created from the original XML files by extracting the words and sentences from the XML structure, placing one sentence on each line of the resulting plain text file, and adding the metadata from the XML file in a TEI-compatible format at the top of each file. In addition, the number of tokens for each file has been calculated from the tokenised version of the file. Similarly, the tokenised files were created from the original XML files by extracting the words, one per line, with sentence boundaries marked by an empty line.

Size: 519 texts, with a total of 225,825 tokens.
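
As an illustration of the tokenised layout described above (one word per line, with an empty line between sentences), the following Python sketch groups such lines back into sentences and counts the tokens. The function name read_tok_sentences and the sample words are invented for this example and are not taken from the corpus files; the sketch also assumes that a 'tok' file contains nothing but tokens and blank lines, which should be checked against the actual files.

```python
from typing import Iterable, List


def read_tok_sentences(lines: Iterable[str]) -> List[List[str]]:
    """Group one-token-per-line input into sentences.

    Assumption: a blank line marks a sentence boundary and every
    non-blank line holds exactly one token.
    """
    sentences: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        token = line.strip()
        if token:
            current.append(token)
        elif current:
            # Blank line: close the sentence collected so far.
            sentences.append(current)
            current = []
    if current:
        # The last sentence may not be followed by a blank line.
        sentences.append(current)
    return sentences


# Hypothetical sample input, not an excerpt from LLCT1.
sample = "In\nDei\nnomine\n\nRegnante\ndomno\n"
sentences = read_tok_sentences(sample.splitlines())
token_count = sum(len(sentence) for sentence in sentences)
```

To apply this to a real file, the lines of an opened 'tok' file can be passed in place of sample.splitlines(). Joining each sentence with spaces should give roughly the one-sentence-per-line layout of the 'txt' files, although the metadata placed at the top of those files is not reproduced by this sketch.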