--------------------------
Latin Dependency Treebank
--------------------------

The Ancient Greek and Latin Dependency Treebank (AGLDT) is the earliest treebank for Ancient Greek and Latin. The project started at Tufts University in 2006 and is currently under development and maintenance at Leipzig University-Tufts University.

The Ancient Greek and Latin Dependency Treebanks are built from the work of dedicated students and researchers from across the world. Over 200 people have annotated texts; the hard work of those who have contributed their annotations as part of the official treebanks are within the data.

Cited from https://perseusdl.github.io/treebank_data/ on September 13, 2017

For Latin, the following texts are included:

Author       Text
---------------------------------------
Augustus     Res Gestae
Caesar       Commentarii de Bello Gallico
Cicero       In Catilinam
Jerome       Vulgata
Vergil       Aeneid
Ovid         Metamorphoses
Petronius    Satyricon
Phaedrus     Fabulae
Propertius   Elegiae
Sallust      Bellum Catilinae
Suetonius    Life of Augustus
Tacitus      Historiae

HistCorp inclusion date
------------------------
January 30, 2017

Website
--------
https://perseusdl.github.io/treebank_data/

Licence
--------
Creative Commons Attribution-ShareAlike 3.0 United States
https://creativecommons.org/licenses/by-sa/3.0/us/

The HistCorp files
-------------------
On the HistCorp page, the Latin texts from the AGLDT corpus are provided in a plain text format ('txt'), a tokenised format ('tok'), and a morphologically and syntactically annotated format ('anno').

The plain text files were created from the original AGLDT XML files by extracting the text parts of the XML files and adding metadata from the XML files, in a TEI-compatible format, at the top of each file. In addition, the number of tokens for each file has been calculated based on the tokenised version of the file.

The tokenised files were created by extracting the words and sentence boundaries from the original XML files.
The tagged and parsed files are unchanged from the ones found on the AGLDT webpage.

Information from the README file in the AGLDT package:

-----

The data have been semi-automatically annotated. The full tagset can be consulted in TAGSET.xml. Each word is specified for a number of attributes describing it. The @pos attribute is a 9-character long string where each character has a particular meaning depending on its position. In TAGSET.xml this logic is documented in all detail (the file is derived from the one used in Arethusa, the online annotation environment used for annotation). In TAGSET.txt there is a more easily readable version of the tagset.

Data have been annotated using the following guidelines:

* [Guidelines for the Syntactic Annotation of Latin Treebanks (1.3)](http://nlp.perseus.tufts.edu/syntax/treebank/ldt/1.5/docs/guidelines.pdf) (GSALT)

In the present release the following new texts have been added:

* Res Gestae
* Historiae

Res Gestae, which were treebanked following a different annotation scheme (as for syntactic labels), have been automatically converted to the common annotation scheme of the GSALT (aporiae in the conversion may be present, in that). The original syntactic labels (see Harrington-tagset.pdf and Harrington-tagset-instructions.pdf) have been preserved in the attribute @hrngtn.

The following texts have undergone a major revision in order to improve their form and consistency within themselves and with the most recently annotated texts, i.e., Fabulae, Life of Augustus, and Historiae:

* In Catilinam
* Aeneis
* Commentarii de Bello Gallico
* Elegiae

More precisely, these texts have been modified thus:

* Addition of punctuation
* Addition of missing sentences and paragraphs
* Sentences restored in their correct order
* Enclitic particles (-que, -ve, -ne) restored in their correct position
* Univerbated coordinating elements (neque, nec)
* Part of speech chosen on the basis of Lewis-Short's A Latin Dictionary and, if problems arise, Allen and Greenough’s A New Latin Grammar
* Tagset correction for gerund and gerundive
* Some corrections related to the distinction adjective/pronouns and deponent/passive
* APOS is annotated as appositive and not as apposition (i.e., the label is on the noun considered to be the appositive)
* Some corrections related to verbal valency, the distinction between adverbial/attributive participles, personal constructions (e.g., videor)
* Normalization of the use of AuxY and AuxZ

The following texts lack the preceding modifications, but punctuation has been added:

* Bellum Catilinae
* Metamorphoses
* Satyricon
* Vulgata

The structure of the original XML files (i.e., the one according to the XML schema which is digested in the Perseids platform, where annotations are performed) has been changed in order to make it more informative and easier to query. The treebank root element identifies the version of the release (@version) and the cts for each text (@cts). The (pseudo-TEI) header element contains information/credits about the creation of the file. The biblStruct element contains information about the ancient author and text, which helps interpretation of @cts.

The original structure of sentence and word elements is preserved with some normalization concerning non-linguistically relevant nodes: @span has been deleted and some normalization has been applied to the display of cts:urn values within sentence (these values are available on a sentence level, and sometimes also on a word level).

-----

Size: 12 texts, with a total of 79,121 tokens.
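
Example: reading the annotated files
-------------------------------------
The XML layout and the 9-character positional tag described in the AGLDT README can be illustrated with a short Python sketch. It is a sketch under stated assumptions, not the project's own tooling. The element names `treebank`, `sentence` and `word` and the attributes `version` and `cts` follow the README text. The word attributes `id` and `form`, the names given to the nine tag positions, and the sample XML are assumptions made for illustration. The tag attribute name is passed as a parameter (default `postag`, also an assumption) because the README refers to it as @pos and the name in the released files should be checked. TAGSET.xml remains the authoritative description of the tag positions.

```python
import xml.etree.ElementTree as ET

# Assumed names for the nine tag positions; TAGSET.xml is the authoritative source.
POS_FIELDS = ("pos", "person", "number", "tense", "mood",
              "voice", "gender", "case", "degree")

# Hypothetical, hand-written sample; it is NOT taken from the released data.
SAMPLE = """<treebank version="x.y" cts="urn:cts:example">
  <sentence id="1">
    <word id="1" form="Gallia" postag="n-s---fn-"/>
    <word id="2" form="est" postag="v3spia---"/>
  </sentence>
</treebank>"""


def decode_tag(tag):
    """Map each character of a 9-character tag to its (assumed) field name.

    Positions filled with '-' are treated as unspecified and left out.
    """
    if len(tag) != len(POS_FIELDS):
        raise ValueError(f"expected {len(POS_FIELDS)} characters, got {tag!r}")
    return {name: ch for name, ch in zip(POS_FIELDS, tag) if ch != "-"}


def read_treebank(xml_text, tag_attr="postag"):
    """Return the root attributes and one decoded record per word element."""
    root = ET.fromstring(xml_text)
    info = {"version": root.get("version"), "cts": root.get("cts")}
    words = []
    for sentence in root.iter("sentence"):
        for word in sentence.iter("word"):
            tag = word.get(tag_attr)
            words.append({
                "sentence": sentence.get("id"),
                "form": word.get("form"),
                "features": decode_tag(tag) if tag else {},
            })
    return info, words


if __name__ == "__main__":
    info, words = read_treebank(SAMPLE)
    print(info)
    for w in words:
        print(w["sentence"], w["form"], w["features"])
```

To try this on a downloaded 'anno' file, read the file contents into a string and pass it to `read_treebank`, adjusting `tag_attr` if the files use a different attribute name.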