---------------------------------------------------
The Lampeter Corpus of Early Modern English Tracts
---------------------------------------------------

The Lampeter Corpus of Early Modern English Tracts is a collection of texts on various subject matter published between 1640 and 1740 - a time that is marked by the rise of mass publication, the development of a public discourse in many areas of everyday life and, last but not least, the standardisation of British English.

Cited from http://www.helsinki.fi/varieng/CoRD/corpora/LC/ November 8, 2017

HistCorp inclusion date
------------------------
November 8, 2017

Website
--------
http://ota.ox.ac.uk/desc/2400

Licence
--------
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License (http://creativecommons.org/licenses/by-nc-sa/3.0/)

The HistCorp files
-------------------
On the HistCorp page, the texts from the Lampeter Corpus of Early Modern English Tracts are provided in a plain text format ('txt'), and in a tokenised format ('tok').

The plain text files were created from the original Lampeter SGML files, by extracting the text parts of the SGML files.

In the tokenised files, the texts are split into one token on each line. Tokenisation was performed using the UDPipe tokeniser (https://ufal.mff.cuni.cz/udpipe) with the English language model provided as a baseline model in the CoNLL17 Shared Task (english-ud-2.0-conll17-170315.udpipe).

Metadata has also been added in a TEI-compatible format at the top of each txt file. The metadata information was mainly extracted from the original SGML files and metadata stated on the corpus website. In addition, the number of tokens for each file has been calculated based on the tokenised version of the file.

Size: 120 texts, with a total of 1,316,404 tokens.
Genre: pamphlets/tracts (with different subgenres stated in the metadata)
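
Because the tokenised files hold one token per line, a per-file token count can be reproduced by counting lines. The following is a minimal sketch in Python, not the script used to build HistCorp. The directory name "lampeter/tok", the "*.tok" file pattern, the UTF-8 encoding, and the choice to skip blank lines are all assumptions made for illustration and should be adjusted to the files as actually downloaded.

```python
from pathlib import Path
from typing import Iterable


def count_tokens(lines: Iterable[str]) -> int:
    """Count non-empty lines, treating each one as a single token.

    Assumption: blank lines (if any) are separators, not tokens.
    """
    return sum(1 for line in lines if line.strip())


def count_tokens_in_file(path: Path) -> int:
    """Count the tokens in one tokenised file (one token per line)."""
    # Assumption: the files are UTF-8 encoded.
    with path.open(encoding="utf-8") as handle:
        return count_tokens(handle)


if __name__ == "__main__":
    # Hypothetical location and extension of the downloaded 'tok' files.
    tok_dir = Path("lampeter/tok")
    total = 0
    for tok_file in sorted(tok_dir.glob("*.tok")):
        n_tokens = count_tokens_in_file(tok_file)
        total += n_tokens
        print(f"{tok_file.name}\t{n_tokens}")
    print(f"total\t{total}")
```

If the assumptions hold, the total printed at the end can be compared against the corpus size stated above; a mismatch would suggest that blank lines or other non-token lines are handled differently in the published counts.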