-------------------------------
The Index Thomisticus Treebank
-------------------------------

The Index Thomisticus Treebank contains Latin texts of Thomas Aquinas
(Medieval Latin) enhanced with complex and interlinked morphological,
syntactic (around 450,000 nodes and more than 26,000 sentences) and
semantic/pragmatic annotation (around 28,000 nodes and 2,000
sentences). The texts of Thomas Aquinas are taken from the Index
Thomisticus corpus. Built by father Roberto Busa SJ, the Index
Thomisticus is considered to be a pathfinder resource in humanities
computing and computational linguistics. 

The texts are excerpted from Summa contra Gentiles (entirely
annotated), Scriptum super Sententiis Magistri Petri Lombardi and
Summa Theologiae.

The annotation guidelines for the syntactic and the semantic/pragmatic
levels of annotation resemble those for the so-called "analytical"
and "tectogrammatical" layers of the Prague Dependency Treebank
respectively. The theoretical framework that motivates the annotation
style is Functional Generative Description (P. Sgall, E. Hajičová, and
J. Panevová. 1986. The Meaning of the Sentence in its Semantic and
Pragmatic Aspects, D. Reidel, Dordrecht, NL). The Index Thomisticus
Treebank can be browsed through the PML-TQ web interface.

	Cited from https://itreebank.marginalia.it/view/ittb.php
	August 24, 2020


HistCorp inclusion date
------------------------
August 24, 2020


Website
--------
https://itreebank.marginalia.it/view/ittb.php


Licence
--------
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported
(https://creativecommons.org/licenses/by-nc-sa/3.0/) 


The HistCorp files
-------------------
On the HistCorp page, the Latin texts from the Index Thomisticus
Treebank are provided in a plain text format ('txt'), a tokenised
format ('tok'), and a tab-separated CoNLL format with linguistic
annotation ('anno').

The plain text files were created from the original CoNLL files by
extracting the words and sentences from the CoNLL structure, writing
one sentence per line in the resulting plain text file, and adding
the metadata from the README file in a TEI-compatible format at the
top of each file. In addition, the number of tokens for each file
has been calculated based on the tokenised version of the file.
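
The extraction of one sentence per line can be illustrated with a
minimal Python sketch. It is not the script used to build the HistCorp
files. It assumes that token lines are tab-separated, start with a
numeric ID and carry the word form in the second column, and that
sentences are separated by blank lines; the function name and the
sample lines are hypothetical.

```python
def conll_to_sentences(lines):
    """Yield one space-joined sentence string per CoNLL sentence block."""
    words = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if words:
                yield " ".join(words)
                words = []
            continue
        fields = line.split("\t")
        # Assumption: token lines start with a numeric ID and hold the
        # word form in the second column. Anything else (comments,
        # metadata header lines) is skipped.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        words.append(fields[1])
    if words:
        # Flush the last sentence if the input has no trailing blank line.
        yield " ".join(words)


# Hypothetical sample input, not taken from the treebank.
sample = [
    "# hypothetical metadata line\n",
    "1\tveritas\tveritas\tNOUN\n",
    "2\test\tsum\tAUX\n",
    "\n",
    "1\tdeus\tdeus\tNOUN\n",
]
sentences = list(conll_to_sentences(sample))
```

To process a real file, the same function could be called on an open
file handle instead of the sample list.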

Similarly, the tokenised files were created from the original CoNLL
files, by extracting the words, one on each line, with sentence
boundaries marked by an empty line.
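
As an illustration of the tokenised layout described above, the
following Python sketch reads one word per line, treats an empty line
as a sentence boundary, and counts the tokens. It assumes that the
input contains no metadata header; the function name and the sample
data are hypothetical.

```python
def read_tok(lines):
    """Group a one-word-per-line token stream into sentences."""
    sentences, current = [], []
    for line in lines:
        word = line.strip()
        if word:
            current.append(word)
        elif current:
            # An empty line marks a sentence boundary.
            sentences.append(current)
            current = []
    if current:
        # Keep the last sentence if the input does not end with a blank line.
        sentences.append(current)
    return sentences


# Hypothetical sample input, not taken from the treebank.
sample_tok = ["veritas\n", "est\n", "\n", "deus\n"]
tok_sentences = read_tok(sample_tok)
token_count = sum(len(sentence) for sentence in tok_sentences)
```

Summing the sentence lengths in this way gives a per-file token count
of the kind reported in the metadata, under the stated assumptions.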

The linguistically annotated files are the same as the original CoNLL
files, except that metadata has been added at the top of each file.


Size: 5 texts, with a total of 469,306 tokens.