------------------------
Swedish Gutenberg texts
------------------------

The Swedish Gutenberg texts on the HistCorp page are a subset of the texts provided by Project Gutenberg (http://www.gutenberg.org).

HistCorp inclusion date
------------------------
May 4, 2017

Website
--------
http://www.gutenberg.org

Contact information
--------------------
http://www.gutenberg.org/wiki/Gutenberg:Contact_Information

Licence
--------
http://www.gutenberg.org/license

The HistCorp files
-------------------
The texts are provided in two formats: plain text ('txt') and tokenised ('tok').

The plain text files have been semi-automatically stripped of Gutenberg-specific metadata and of extratextual information such as page numbering, footnotes and the underscore signs that mark emphasis. Metadata is instead given in a TEI-compatible format at the top of each file. The token count stated in the metadata was calculated from the tokenised version of the file.

In the tokenised files, the texts are split so that each line holds one token. Tokenisation was performed using the UDPipe tokeniser (https://ufal.mff.cuni.cz/udpipe) with the Swedish language model provided as a baseline model in the CoNLL17 Shared Task (swedish-ud-2.0-conll17-170315.udpipe).

Size: 14 texts, with a total of 942,011 tokens.

Genre: books (see the metadata of each file for more detailed information on the genres included).
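
Because the tokenised files hold one token per line, a token count can be checked by counting non-empty lines. The Python sketch below illustrates this; it is not the procedure used to produce the figures above. The function names, the UTF-8 encoding and the handling of blank lines are assumptions. The sketch also makes no attempt to skip a metadata header, since the exact header layout is not described here; if a 'tok' file starts with one, those lines would need to be removed first.

```python
from typing import Iterable


def count_tokens(lines: Iterable[str]) -> int:
    """Count tokens in one-token-per-line text.

    Assumption: blank or whitespace-only lines are not tokens.
    """
    return sum(1 for line in lines if line.strip())


def count_tokens_in_file(path: str, encoding: str = "utf-8") -> int:
    """Count tokens in a 'tok' file (hypothetical helper; UTF-8 is assumed)."""
    with open(path, encoding=encoding) as handle:
        return count_tokens(handle)


# Small in-memory example: five tokens followed by a blank line.
sample = "Det\nvar\nen\ngång\n.\n\n"
print(count_tokens(sample.splitlines()))  # prints 5
```

Summing the per-file counts over all 14 files should then be comparable to the total given under Size, provided any header lines are excluded.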