--------
RIDGES
--------

The RIDGES project (Register in Diachronic German Science) is an
investigation into the development of the German scientific language
in the early modern and modern periods, ranging from the mid 15th to
the 20th century. 

Within the RIDGES project we analyze scientific texts on all
linguistic levels (syntax, word formation, lexis, phraseology, textual
structure, etc.), so as to be able to identify and describe
developments and trends in the data. 

	Cited from https://www.linguistik.hu-berlin.de/en/institut-en/professuren-en/korpuslinguistik/research/ridges-projekt
	May 23, 2018


HistCorp inclusion date
------------------------
May 23, 2018


Website
--------
https://www.linguistik.hu-berlin.de/en/institut-en/professuren-en/korpuslinguistik/research/ridges-projekt


Licence
--------
Creative Commons Attribution 3.0 Unported License
(http://creativecommons.org/licenses/by/3.0/)


Citation
---------
Lüdeling, Anke; Odebrecht, Carolin; Perlitz, Laura; Zeldes, Amir;
RIDGES-Herbology (Version 8.0), Humboldt-Universität zu Berlin.
https://korpling.org/ridges/. http://hdl.handle.net/11022/0000-0007-C6A3-1


The HistCorp files
-------------------
On the HistCorp page, the German texts from 'RIDGES' are provided in
four formats: a plain text format with the original historical spelling
preserved ('txt'), a tokenised format with the original historical
spelling preserved ('tok'), a plain text format where the words have
been manually normalised to a standardised modern spelling ('norm'),
and, for a subset of the corpus, a parsed tab-separated format based on
the normalised spelling ('anno').

The plain text files (both in the original spelling and in the
normalised spelling) were created by extracting each token (words and
punctuation marks) from the original PAULA XML files and inserting a
space between tokens, except around some punctuation marks. In
addition, metadata has been added in a TEI-compatible format at the top
of each file. The metadata was mainly extracted from the
PAULA-formatted files that are part of the RIDGES package. The number
of tokens for each file has been calculated from the tokenised file.
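
As an illustration only, the joining of tokens into plain text could be
sketched as follows in Python. The README does not list which
punctuation marks are exempt from spacing, so the two sets below, and
the function name join_tokens, are assumptions and not the rules
actually used to build the files.

```python
# Hypothetical rule sets: the README does not state which punctuation
# marks are attached without a space.
NO_SPACE_BEFORE = {",", ".", ";", ":", "!", "?", ")"}
NO_SPACE_AFTER = {"("}


def join_tokens(tokens):
    """Join tokens with single spaces, except around some punctuation."""
    parts = []
    for tok in tokens:
        # Insert a space unless this token attaches to the previous one.
        if parts and tok not in NO_SPACE_BEFORE and parts[-1] not in NO_SPACE_AFTER:
            parts.append(" ")
        parts.append(tok)
    return "".join(parts)


# Made-up token sequence, not taken from the corpus.
print(join_tokens(["Der", "Garten", "(", "klein", ")", ",", "gut", "."]))
```

Historical texts often use other punctuation (for example the virgule
'/'), which a real rule set would need to cover.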

In the tokenised files, the texts are split into one token per line,
following the tokenisation in the CoNLL file.
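
Since each line holds one token, a token count can be obtained by
counting lines. The Python sketch below assumes that blank lines, if
any occur, are not tokens; the function name count_tokens is
hypothetical.

```python
def count_tokens(lines):
    """Count non-empty lines; each is assumed to hold exactly one token."""
    return sum(1 for line in lines if line.strip())


# Made-up sample in the one-token-per-line layout.
sample = ["Das\n", "Kraut\n", "\n", ".\n"]
print(count_tokens(sample))
```

For a real file, the open file object can be passed directly, for
example count_tokens(open(path, encoding="utf-8")), where path is the
location of a 'tok' file.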

The parsed files are part of the original RIDGES package, and are
presented in a tab-separated column format (CoNLL format), with
information on sentence boundaries, word forms, lemmas, part-of-speech
tags, morphological tags, and dependency information.
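
A minimal Python sketch for reading such files is given below. It
assumes the common CoNLL convention that sentences are separated by
blank lines and that the word form is in the second column; the exact
column layout of the RIDGES files is not stated here and should be
checked against the data. The function name read_conll is hypothetical.

```python
def read_conll(lines):
    """Group tab-separated token rows into sentences.

    Assumes blank lines mark sentence boundaries. Each sentence is a
    list of rows, and each row is a list of column values.
    """
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(line.split("\t"))
    if current:
        sentences.append(current)
    return sentences


# Made-up, truncated rows (ID, form, lemma only), not taken from the corpus.
sample = "1\tDas\tder\n2\tKraut\tKraut\n\n1\tEs\tes\n"
sentences = read_conll(sample.splitlines())
forms = [row[1] for row in sentences[0]]  # assumes column 2 is the word form
print(len(sentences), forms)
```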

Size: 58 texts, with a total of 254,630 tokens.

Genres: alchemy, astronomy, botany, gardening, kitchen, linguistics,
medicine, and religion.