-------------------------------
The Index Thomisticus Treebank
-------------------------------

The Index Thomisticus Treebank contains Latin texts of Thomas Aquinas (Medieval Latin) enhanced with complex and interlinked morphological, syntactic (around 450,000 nodes and more than 26,000 sentences) and semantic/pragmatic annotation (around 28,000 nodes and 2,000 sentences).

The texts of Thomas Aquinas are taken from the Index Thomisticus corpus. Built by Father Roberto Busa SJ, the Index Thomisticus is considered to be a pathfinder resource in humanities computing and computational linguistics. The texts are excerpted from Summa contra Gentiles (entirely annotated), Scriptum super Sententiis Magistri Petri Lombardi and Summa Theologiae.

The annotation guidelines for the syntactic and the semantic/pragmatic levels of annotation resemble those for the so-called "analytical" and "tectogrammatical" layers of the Prague Dependency Treebank, respectively. The theoretical framework that motivates the annotation style is Functional Generative Description (P. Sgall, E. Hajičová, and J. Panevová. 1986. The Meaning of the Sentence in its Semantic and Pragmatic Aspects, D. Reidel, Dordrecht, NL).

The Index Thomisticus Treebank can be browsed through the PML-TQ web interface.

Cited from https://itreebank.marginalia.it/view/ittb.php, August 24, 2020

HistCorp inclusion date
------------------------
August 24, 2020

Website
--------
https://itreebank.marginalia.it/view/ittb.php

Licence
--------
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported (https://creativecommons.org/licenses/by-nc-sa/3.0/)

The HistCorp files
-------------------
On the HistCorp page, the Latin texts from the Index Thomisticus Treebank are provided in a plain text format ('txt'), a tokenised format ('tok'), and a tab-separated CoNLL format with linguistic annotation ('anno').
The plain text files were created from the original CoNLL files by extracting the words and sentences from the CoNLL structure, writing one sentence per line in the resulting plain text file, and adding the metadata from the README file in a TEI-compatible format at the top of each file. In addition, the number of tokens in each file has been calculated from the tokenised version of the file. Similarly, the tokenised files were created from the original CoNLL files by extracting the words, one per line, with sentence boundaries marked by an empty line. The linguistically annotated files are identical to the original CoNLL files, except that metadata has been added at the top of each file.

Size: 5 texts, with a total of 469,306 tokens.
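
For readers who want to reproduce a similar conversion, the derivation of the 'txt' and 'tok' formats from a CoNLL file might be sketched in Python as follows. This is an illustration only, not the script used to build the HistCorp files. It assumes that the word form is in the second tab-separated column, that sentences are separated by empty lines, and that any line starting with '#' or lacking a tab is not a token row. The function names are hypothetical, and the TEI-compatible metadata header is left out. Joining the tokens of a sentence with single spaces is a further simplification.

```python
from typing import List

# Assumption: the word form is in the second tab-separated column.
FORM_COLUMN = 1


def read_sentences(conll_text: str) -> List[List[str]]:
    """Collect word forms per sentence from CoNLL-style text."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for line in conll_text.splitlines():
        if not line.strip():
            # An empty line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # assumed comment line
        fields = line.split("\t")
        if len(fields) <= FORM_COLUMN:
            continue  # assumed non-token line, e.g. metadata
        current.append(fields[FORM_COLUMN])
    if current:
        sentences.append(current)
    return sentences


def to_plain_text(sentences: List[List[str]]) -> str:
    """One sentence per line, tokens separated by single spaces."""
    return "\n".join(" ".join(s) for s in sentences) + "\n"


def to_tokenised(sentences: List[List[str]]) -> str:
    """One token per line, with an empty line between sentences."""
    return "\n\n".join("\n".join(s) for s in sentences) + "\n"


def count_tokens(sentences: List[List[str]]) -> int:
    """Total number of tokens over all sentences."""
    return sum(len(s) for s in sentences)


if __name__ == "__main__":
    sample = "1\tomnis\tomnis\n2\thomo\thomo\n\n1\tdeus\tdeus\n2\test\tsum\n"
    sents = read_sentences(sample)
    print(to_plain_text(sents), end="")
    print(to_tokenised(sents), end="")
    print(count_tokens(sents))
```

The token count is taken from the same sentence list that produces the tokenised output, which mirrors the statement above that token counts are based on the tokenised version of each file.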