-------------------------
Perseus Digital Library
-------------------------

The Perseus Digital Library at Tufts University provides access to a collection of Greek and Roman materials, including many of the canonical texts read today.

Cited from http://www.circe.be/perseus-digital-library/, August 21, 2020

HistCorp inclusion date
------------------------
August 21, 2020

Website
--------
http://www.perseus.tufts.edu/hopper/

Licence
--------
Creative Commons Attribution-ShareAlike 3.0 United States (https://creativecommons.org/licenses/by-sa/3.0/us/)

The HistCorp files
-------------------
On the HistCorp page, the Greek texts from the Perseus Digital Library are provided in a plain text format ('txt') and a tokenised format ('tok').

The plain text files were created from the original Perseus XML files by extracting the text parts and adding metadata from the XML files, in a TEI-compatible format, at the top of each file. In addition, the number of tokens in each file has been calculated from the corresponding tokenised file.

In the tokenised files, each text is split so that every line holds exactly one token. Tokenisation was performed using the UDPipe tokeniser (https://ufal.mff.cuni.cz/udpipe) with the Greek language model provided as a baseline model in the CoNLL17 Shared Task (greek-ud-2.0-conll17-170315.udpipe).

Size: 798 texts, with a total of 11,906,120 tokens.
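
Because the tokenised files hold one token per line, a token count can be reproduced by counting lines. The following is a minimal sketch in Python, not the script used to build the HistCorp files. It assumes that the files are UTF-8 encoded, that they carry a '.tok' extension, and that blank lines are not tokens; the function names count_tokens and count_corpus are hypothetical.

```python
from pathlib import Path


def count_tokens(tok_path):
    """Count tokens in a one-token-per-line file.

    Assumption: blank lines are separators, not tokens, so they are skipped.
    """
    with open(tok_path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def count_corpus(tok_dir, pattern="*.tok"):
    """Return a {file name: token count} mapping for a directory.

    Assumption: tokenised files match the glob pattern '*.tok'.
    """
    return {
        path.name: count_tokens(path)
        for path in sorted(Path(tok_dir).glob(pattern))
    }
```

Calling count_corpus on the directory that holds the downloaded 'tok' files and summing the resulting values gives a corpus-wide total. That total may differ from the figure stated above if any of the assumptions listed here do not hold.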