----------------------
Dutch Gutenberg texts
----------------------

The Dutch Gutenberg texts on the HistCorp page are a subset of the texts provided by Project Gutenberg (http://www.gutenberg.org).

HistCorp inclusion date
------------------------
June 14, 2017

Website
--------
http://www.gutenberg.org

Contact information
--------------------
http://www.gutenberg.org/wiki/Gutenberg:Contact_Information

Licence
--------
http://www.gutenberg.org/license

The HistCorp files
-------------------
The files are provided in two formats: plain text ('txt') and tokenised ('tok').

The plain text files have been semi-automatically stripped of Gutenberg-specific metadata and of extratextual information such as page numbering, footnotes, and the underscore signs marking emphasis. Metadata is instead given in a TEI-compatible format at the top of each txt file. The metadata was mainly extracted from the information stated on the corpus website. In addition, the number of tokens for each file has been calculated from the tokenised version of the file.

In the tokenised files, the texts are split so that each line holds one token. Tokenisation was performed using the UDPipe tokeniser (https://ufal.mff.cuni.cz/udpipe) with the Dutch language model provided as a baseline model in the CoNLL17 Shared Task (dutch-ud-2.0-conll17-170315.udpipe).

Size: 34 texts, with a total of 2,548,401 tokens.

Genre: books (see the metadata in each file for more detailed information on the genres included).
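
Since the tokenised files hold one token per line, a per-file token count can be approximated by counting lines. The following is a minimal sketch, not the procedure actually used for HistCorp. It assumes that the tok file is UTF-8 encoded, that it contains only tokens (no metadata header), and that blank lines, if any, are separators rather than tokens. The function name `count_tokens` is hypothetical.

```python
def count_tokens(tok_path):
    """Count the non-empty lines of a one-token-per-line file.

    Assumptions (not confirmed by the corpus description):
    UTF-8 encoding, no metadata header inside the tok file,
    and blank lines are separators that should not be counted.
    """
    with open(tok_path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())
```

For a hypothetical file `example.tok`, calling `count_tokens("example.tok")` returns the number of tokens under these assumptions. If the tok files do carry a header, its lines would have to be skipped before counting.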