-------------------------------
Late Latin Charter Treebank 2
-------------------------------

Version 1.2 of the Late Latin Charter Treebank 2 (LLCT2) contains a number of minor corrections and replaces version 1.0, published on Zenodo in 2019. The collection consists of Early Medieval Latin documentary texts from Italy, dated between AD 774 and 897. The texts have morphological and syntactic annotation compatible with the Latin Dependency Treebank (LDT) and are distributed in CoNLL treebank format.

For a detailed description of the Late Latin Charter Treebanks, see the pre-print of the paper 'Late Latin Charter Treebank: contents and annotation' (to be published in Corpora, 16:2, 2021), available at the institutional repository of the University of Helsinki.

See also:
Korkiakangas, T. and Lassila, M. (2013), 'Abbreviations, fragmentary words, formulaic language: treebanking medieval charter material', in Mambrini, F., Passarotti, M. and Sporleder, C. (eds.), Proceedings of the Third Workshop on Annotation of Corpora for Research in the Humanities, pp. 61–72.
Korkiakangas, T. and Passarotti, M. (2011), 'Challenges in Annotating Medieval Latin Charters', «Journal of Language Technology and Computational Linguistics», 26, pp. 103–114.

Cited from https://zenodo.org/record/3633614#.X0PUCJMzbDJ on August 24, 2020.

HistCorp inclusion date
------------------------
August 24, 2020

Website
--------
https://zenodo.org/record/3633614#.X0PUCJMzbDJ

Licence
--------
Creative Commons Attribution 4.0 International (https://creativecommons.org/licenses/by/4.0/legalcode)

The HistCorp files
-------------------
On the HistCorp page, the Latin texts from the Late Latin Charter Treebank 2 are provided in three formats: plain text ('txt'), tokenised ('tok'), and CoNLL with linguistic annotation ('anno').
The plain text file was created from the original CoNLL file by extracting the words and sentences from the CoNLL structure and writing one sentence per line. The metadata from the CoNLL file was added in a TEI-compatible format at the top of each file. The number of tokens was calculated from the tokenised version of the file. Similarly, the tokenised file was created from the original CoNLL files by extracting the words, one per line, with sentence boundaries marked by an empty line. The linguistically annotated file is identical to the original CoNLL file, except that metadata has been added at the top.

Size: 257,918 tokens.
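The conversion from CoNLL to the 'txt' and 'tok' formats described above can be sketched in Python roughly as follows. This is a minimal illustration, not the HistCorp conversion script. It assumes tab-separated token lines with the word form in the second column, comment lines starting with '#', and sentences separated by blank lines (CoNLL-X / CoNLL-U conventions). It also assumes that tokens in the plain text are joined by single spaces. The function names are hypothetical, and the TEI-compatible metadata header is not reproduced.

```python
from typing import Iterable, List

# Hypothetical sample in CoNLL style (ID, FORM, LEMMA, POS columns, tab-separated).
SAMPLE = (
    "# sent_id = 1\n"
    "1\tIn\tin\tADP\n"
    "2\tnomine\tnomen\tNOUN\n"
    "3\tDomini\tdominus\tNOUN\n"
    "\n"
    "1\tEgo\tego\tPRON\n"
)


def read_conll_sentences(lines: Iterable[str]) -> List[List[str]]:
    """Group the word forms (second column) of a CoNLL file into sentences."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # skip comment / metadata lines
        cols = line.split("\t")
        token_id = cols[0]
        # Skip CoNLL-U multiword ranges ("3-4") and empty nodes ("5.1"), if any.
        if "-" in token_id or "." in token_id:
            continue
        current.append(cols[1])
    if current:  # the file may not end with a blank line
        sentences.append(current)
    return sentences


def to_plain_text(sentences: List[List[str]]) -> str:
    """One sentence per line, tokens joined by single spaces (assumption)."""
    return "\n".join(" ".join(s) for s in sentences) + "\n"


def to_tokenised(sentences: List[List[str]]) -> str:
    """One token per line, with an empty line marking each sentence boundary."""
    return "".join("\n".join(s) + "\n\n" for s in sentences)


def count_tokens(tokenised: str) -> int:
    """Count tokens as the non-empty lines of the tokenised text."""
    return sum(1 for line in tokenised.splitlines() if line.strip())


if __name__ == "__main__":
    sents = read_conll_sentences(SAMPLE.splitlines(keepends=True))
    print(to_plain_text(sents), end="")
    print(count_tokens(to_tokenised(sents)))
```

For the sample above, the plain text contains the two lines "In nomine Domini" and "Ego", and the tokenised text yields four tokens. Counting tokens from the tokenised output, rather than by splitting the plain text on spaces, mirrors the approach described above.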