--------
RIDGES
--------

The RIDGES project (Register in Diachronic German Science) is an investigation into the development of the German scientific language in the early modern and modern periods, ranging from the mid 15th to the 20th century. Within the RIDGES project we analyze scientific texts on all linguistic levels (syntax, word formation, lexis, phraseology, textual structure, etc.), so as to be able to identify and describe developments and trends in the data.

Cited from https://www.linguistik.hu-berlin.de/en/institut-en/professuren-en/korpuslinguistik/research/ridges-projekt on May 23, 2018

HistCorp inclusion date
------------------------
May 23, 2018

Website
--------
https://www.linguistik.hu-berlin.de/en/institut-en/professuren-en/korpuslinguistik/research/ridges-projekt

Licence
--------
Creative Commons Attribution 3.0 Unported License (http://creativecommons.org/licenses/by/3.0/)

Citation
---------
Lüdeling, Anke; Odebrecht, Carolin; Perlitz, Laura; Zeldes, Amir; RIDGES-Herbology (Version 8.0), Humboldt-Universität zu Berlin. https://korpling.org/ridges/. http://hdl.handle.net/11022/0000-0007-C6A3-1

The HistCorp files
-------------------
On the HistCorp page, the German texts from 'RIDGES' are provided in a plain text format with the original historical spelling preserved ('txt'), a tokenised format with the original historical spelling preserved ('tok'), a plain text format where the words have been manually normalised to a standardised modern spelling ('norm'), and, for a subset of the corpus, a parsed tab-separated format based on the normalised spelling ('anno').

The plain text files (both in the original spelling and in the normalised spelling) were created by extracting each token (words and punctuation marks) from the original Paula XML files and adding a space between tokens, except around some punctuation marks. In addition, metadata has been added in a TEI-compatible format at the top of each file.
The metadata information was mainly extracted from the metadata stated in the Paula-formatted files that are part of the RIDGES package. In addition, the number of tokens for each file has been calculated based on the tokenised file.

In the tokenised files, the texts are split into one token on each line, following the tokenisation in the CoNLL file.

The parsed files are part of the original RIDGES package, and are presented in a tab-separated column format (CoNLL format), with information on sentence boundaries, word forms, lemmas, part-of-speech tags, morphological tags, and dependency information.

Size: 58 texts, with a total of 254,630 tokens.

Genres: alchemy, astronomy, botany, gardening, kitchen, linguistics, medicine, and religion.
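
As an illustration of how the parsed ('anno') files described above might be read, here is a minimal Python sketch. It rests on assumptions that are not confirmed by this description: that sentences are separated by blank lines, as is usual for CoNLL-style files, and that every non-blank line is a tab-separated token row. The function name read_conll_sentences is hypothetical, and no particular column order is assumed. If a file begins with a metadata header, that header would need to be skipped first.

```python
from typing import Iterable, List


def read_conll_sentences(lines: Iterable[str]) -> List[List[List[str]]]:
    """Group tab-separated token rows into sentences.

    Assumption: a blank line marks a sentence boundary. Each token row is
    returned as a list of its column values, in file order.
    """
    sentences: List[List[List[str]]] = []
    current: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(line.split("\t"))
    if current:
        # The last sentence may not be followed by a blank line.
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    # Made-up rows for illustration only; not taken from the corpus.
    sample = "1\tDie\tdie\tART\n2\tRose\tRose\tNN\n\n1\tSie\tsie\tPPER\n"
    for sentence in read_conll_sentences(sample.splitlines()):
        print(len(sentence), [columns[1] for columns in sentence])
```

For a real file, the function can be given an open file handle, for example read_conll_sentences(open(path, encoding="utf-8")), where path is whichever 'anno' file has been downloaded. Which column holds the word form, lemma, or part-of-speech tag should be checked against the file itself.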