---------------------------------------------------
The Lampeter Corpus of Early Modern English Tracts
---------------------------------------------------

The Lampeter Corpus of Early Modern English Tracts is a collection of
texts on various subject matter published between 1640 and 1740 -
a time that is marked by the rise of mass publication, the development
of a public discourse in many areas of everyday life and, last but not
least, the standardisation of British English.

	Cited from http://www.helsinki.fi/varieng/CoRD/corpora/LC/
	November 8, 2017


HistCorp inclusion date
------------------------
November 8, 2017


Website
--------
http://ota.ox.ac.uk/desc/2400


Licence
--------
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported
License (http://creativecommons.org/licenses/by-nc-sa/3.0/) 


The HistCorp files
-------------------
On the HistCorp page, the texts from the Lampeter Corpus of Early
Modern English Tracts are provided in a plain text format ('txt'), and
in a tokenised format ('tok').

The plain text files were created by extracting the text content from
the original Lampeter SGML files.
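
As a rough illustration of what extracting the text parts of an SGML
file can involve, the sketch below strips anything that looks like a
tag and collapses the remaining whitespace. It is not the procedure
used to produce the HistCorp files, which is not documented here. The
function name strip_sgml_tags is hypothetical, and the sketch does not
resolve SGML entities or treat the header separately from the body.

```python
import re

# Anything between '<' and '>' is treated as a tag (an assumption that
# holds for simple markup, but not for every SGML construct).
TAG_RE = re.compile(r"<[^>]*>")


def strip_sgml_tags(sgml: str) -> str:
    """Remove tag-like spans and collapse runs of whitespace.

    Tags are replaced by a space so that words separated only by a tag
    do not run together; a tag in the middle of a word will therefore
    split that word in two.
    """
    text = TAG_RE.sub(" ", sgml)
    return re.sub(r"\s+", " ", text).strip()


print(strip_sgml_tags("<P>The <HI>Lampeter</HI> Corpus</P>"))
```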

In the tokenised files, each token is placed on its own line.
Tokenisation was performed using the UDPipe tokeniser
(https://ufal.mff.cuni.cz/udpipe) with the English language model
provided as a baseline model in the CoNLL17 Shared Task
(english-ud-2.0-conll17-170315.udpipe).
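
Because the tokenised files hold one token per line, they can be read
with a few lines of code. The sketch below is a minimal example under
some assumptions: the files are taken to be UTF-8 encoded, blank lines
(if any occur) are skipped rather than treated as tokens, and the
function name read_tokens and the file name in the usage comment are
hypothetical.

```python
from typing import Iterator


def read_tokens(path: str) -> Iterator[str]:
    """Yield the tokens of a 'tok' file, one per non-empty line.

    Assumes UTF-8 encoding; adjust the encoding argument if the files
    turn out to use a different one.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            token = line.strip()
            if token:  # skip blank lines
                yield token


# Usage (hypothetical file name):
# for token in read_tokens("example.tok"):
#     print(token)
```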

Metadata has also been added in a TEI-compatible format at the top of
each txt file. The metadata was mainly extracted from the original
SGML files and from the information given on the corpus website. In
addition, the number of tokens for each file has been calculated based
on the tokenised version of the file.
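
A token count of this kind can be reproduced by counting the lines of
a tokenised file. The sketch below assumes that every non-empty line
is exactly one token; whether the published counts treat blank lines
the same way is an assumption, and the function name count_tokens is
hypothetical.

```python
from typing import Iterable


def count_tokens(lines: Iterable[str]) -> int:
    """Count the non-empty lines, i.e. one token per line."""
    return sum(1 for line in lines if line.strip())


# In-memory example; an open 'tok' file object can be passed the same way.
sample = ["The\n", "Lampeter\n", "\n", "Corpus\n"]
print(count_tokens(sample))
```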


Size: 120 texts, with a total of 1,316,404 tokens.

Genre: pamphlets/tracts (with different subgenres stated in the metadata)