-------------------------
Perseus Digital Library
-------------------------

The Perseus Digital Library at Tufts University provides access to a collection of Greek and Roman materials, including many of the canonical texts read today.

Cited from http://www.circe.be/perseus-digital-library/, August 24, 2020.

HistCorp inclusion date
------------------------

August 24, 2020

Website
--------

http://www.perseus.tufts.edu/hopper/

Licence
--------

Creative Commons Attribution-ShareAlike 3.0 United States (https://creativecommons.org/licenses/by-sa/3.0/us/)

The HistCorp files
-------------------

On the HistCorp page, the Latin texts from the Perseus Digital Library are provided in two formats: plain text ('txt') and tokenised ('tok').

The plain text files were created from the original Perseus XML files by extracting their text content. Metadata from the XML files was added at the top of each file in a TEI-compatible format. In addition, the number of tokens in each file was calculated from the tokenised version of that file.

In the tokenised files, each line contains a single token. Tokenisation was performed with the UDPipe tokeniser (https://ufal.mff.cuni.cz/udpipe), using the Latin baseline model provided for the CoNLL 2017 Shared Task (latin-ud-2.0-conll17-170315.udpipe).

Size: 422 texts, with a total of 7,625,906 tokens.
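Because each line of a tokenised file holds one token, per-file token counts and corpus totals like those given under Size can, in principle, be recomputed by counting non-empty lines. The following is a minimal sketch under stated assumptions: the tokenised files use a `.tok` extension and sit in a single local directory, and blank lines (for example sentence separators, if any are present) are not counted as tokens. The function names `count_tokens` and `corpus_size` are hypothetical helpers for illustration, not part of HistCorp or UDPipe.

```python
from pathlib import Path


def count_tokens(tok_path: Path) -> int:
    """Count tokens in a one-token-per-line file.

    Assumption: blank lines are separators, not tokens, so they are skipped.
    """
    with tok_path.open(encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def corpus_size(tok_dir: Path, pattern: str = "*.tok") -> tuple[int, int]:
    """Return (number of files, total tokens) for matching files in tok_dir.

    Assumption: all tokenised files live directly in tok_dir and end in '.tok'.
    """
    files = sorted(tok_dir.glob(pattern))
    total = sum(count_tokens(p) for p in files)
    return len(files), total
```

For a hypothetical local copy in a directory named `perseus_tok`, `corpus_size(Path("perseus_tok"))` would return a (texts, tokens) pair that can be compared with the figures above. Whether it matches exactly depends on how the original counts treated blank lines, which is an assumption here.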