----------------------------------------------------------------------------
The Nottingham Corpus of Early Modern German Midwifery and Women's Medicine
----------------------------------------------------------------------------

The Nottingham Corpus of Early Modern German Midwifery and Women’s Medicine (ca. 1500-1700), or the GeMi Corpus, provides a representative sample of the earliest German-language medical writing to appear in print, particularly in the areas of midwifery and women’s medicine related to pregnancy and childbirth. It is the first corpus devoted exclusively to Fachsprache, or specialised language usage, thus complementing extant corpora for German that focus more on several genres or just literary discourse.

This is a particularly interesting time in the history of midwifery and gynaecology, for it is the period when there is increased vernacularization of European languages in scientific/medical writing, scholastic-based models of medicine are being abandoned, and practising midwives such as Louise Bourgeois and Justine Siegemund – frustrated with the perceived inadequacies of texts written by learned physicians – begin writing their own midwifery treatises.

This corpus has been developed as part of the research project “Evidentiality and Genre in the Histories of English and German”: A corpus-driven investigation into the connection between genre and the development of evidential markers in the histories of English and German from the early modern period to the present. After some initial broad investigations, it was decided to focus on the development of evidential markers in the domain of scientific writing, specifically medical writing. However, it quickly became apparent that there were far more corpus resources available for early modern English than for early modern German, and thus the current corpus was created to partly remedy this.

The electronic corpus was created between August 2015 and August 2016.
The resource is a text corpus of digital text files, available in plain text and TEI XML versions. Texts are taken primarily from digital facsimile copies available on-line via the University of Würzburg’s web-based library interface, Digitale Volltexte zur Geschichte der deutschen Fach- und Wissenschaftssprachen (http://www.fachtexte.germanistik.uni-wuerzburg.de/), particularly from the subcategory of Gynäkologie ‘gynaecology’ (http://www.fachtexte.germanistik.uni-wuerzburg.de/suche.php?suche=sachbereich&formular=go&sachbereich=30). Where this is not the case (mainly in the texts by Rösslin and Rüff), a 1910 facsimile copy (ed. Gustav Klein) was used as the basis of transcription instead. The texts were prepared by diplomatic transcription, double keying, and proofreading. The data represent partial copies of the works in question (sometimes the majority of text was keyed in, but there are always some omissions in one form or another).

Cited from https://ota.bodleian.ox.ac.uk/repository/xmlui/handle/20.500.12024/2562?show=full March 13, 2020

HistCorp inclusion date
------------------------
March 13, 2020

Website
--------
https://ota.bodleian.ox.ac.uk/repository/xmlui/handle/20.500.12024/2562

Cite
-----
Richard J Whitt, 2016, The Nottingham Corpus of Early Modern German Midwifery and Women's Medicine (ca. 1500-1700), Oxford Text Archive, http://hdl.handle.net/20.500.12024/2562.

Licence
--------
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) (http://creativecommons.org/licenses/by-nc-sa/3.0/)

The HistCorp files
-------------------
On the HistCorp page, the German texts from the 'GeMi' corpus are provided in a plain text format ('txt') and a tokenised format ('tok'). The plain text files are the same as in the original GeMi 'RAW Files' package, except that metadata has been added in a TEI-compatible format at the top of each txt file. The metadata was mainly extracted from the TEI files that are part of the GeMi package.
In addition, the number of tokens for each file has been calculated based on the tokenised version of the file. In the tokenised files, the texts are split so that each line holds exactly one token. Tokenisation was performed using the UDPipe tokeniser (https://ufal.mff.cuni.cz/udpipe) with the German language model provided as a baseline model in the CoNLL17 Shared Task (german-ud-2.0-conll17-170315.udpipe). Before tokenisation, all text within angle brackets was removed, since this is the way comments seem to be coded in the files.

Size: 20 texts, with a total of 128,915 tokens.
Genre: medicine (in particular midwifery).
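
The preprocessing and token counting described under 'The HistCorp files' could be sketched roughly as follows. This is a minimal illustration, not the actual HistCorp script: the function names, the regular expression, and the decision to skip blank lines when counting are all assumptions. The UDPipe tokenisation step itself is not reproduced here.

```python
import re

# Assumption: a comment is any span enclosed in angle brackets,
# possibly running over several lines.
ANGLE_COMMENT = re.compile(r"<[^<>]*>")


def strip_angle_comments(text: str) -> str:
    """Remove every <...> span from the text before tokenisation."""
    return ANGLE_COMMENT.sub("", text)


def count_tokens(tok_text: str) -> int:
    """Count tokens in one-token-per-line text.

    Assumption: blank lines, if present, are not tokens.
    """
    return sum(1 for line in tok_text.splitlines() if line.strip())


if __name__ == "__main__":
    # Hypothetical sample line with an inline comment.
    sample = "Von der <fol. 3r> Geburt"
    print(strip_angle_comments(sample))
    print(count_tokens("Von\nder\nGeburt\n"))
```

Removing a comment in this way leaves the surrounding whitespace in place, which should not matter to a tokeniser that splits on whitespace.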