-----------------------------------------------------
The (open part of the) EDGeS Diachronic Bible Corpus
-----------------------------------------------------

The EDGeS Diachronic Bible Corpus is a diachronically and synchronically
parallel corpus of Bible translations in Dutch, English, German and Swedish,
with texts from the 14th century until today. On the HistCorp platform, the
publicly licensed subset of the corpus is provided.

HistCorp inclusion date
------------------------
October 15, 2020

Website
--------
https://spraakbanken.gu.se/en/resources/openedges

Licence
--------
Creative Commons BY-NC-SA 4.0 (https://creativecommons.org/licenses/by-nc-sa/4.0/)

Cite
-----
Bouma et al. (2020), The EDGeS Diachronic Bible Corpus. In Proceedings of the
12th Conference on Language Resources and Evaluation (LREC 2020), pages
5232--5239, Marseille, 11--16 May 2020.

The HistCorp files
-------------------
On the HistCorp page, the Swedish texts from 'EDGeS Diachronic Bible Corpus'
are provided in a plain text format ('txt'), a tokenised format ('tok') and an
annotated format ('anno') in which Bible chapters and verses are annotated.

The 'anno' files are the same as in the original EDGeS package, except that
metadata has been added in a TEI-compatible format at the top of each txt file.
The numbers of sentences and words are calculated on the basis of the
segmentation in the tokenised file.

The plain text files are identical to the 'anno' files, except that the chapter
and verse annotation has been removed.

In the tokenised files, the texts are split into tokens, one token per line.
Tokenisation was performed using the UDPipe tokeniser
(https://ufal.mff.cuni.cz/udpipe) with the Swedish language model provided as a
baseline model in the CoNLL17 Shared Task
(swedish-ud-2.0-conll17-170315.udpipe).

Size: 142 texts, with a total of 1,528,010 tokens.
Genre: religion/biblical.
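
Reading the tokenised files
----------------------------
Because the 'tok' files hold one token per line, token counts like the one given under Size can be checked with a short script. The following is only a minimal sketch, not part of the HistCorp or EDGeS tooling. It assumes that the files are UTF-8 encoded and that sentences are separated by blank lines; neither is stated in this README. The function names and the example file name are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Tuple


def count_tokens_and_sentences(lines: Iterable[str]) -> Tuple[int, int]:
    """Count tokens and sentences in one-token-per-line input.

    ASSUMPTION (not confirmed by the README): a blank line marks a
    sentence boundary. Every non-blank line counts as exactly one token.
    """
    tokens = 0
    sentences = 0
    in_sentence = False
    for raw in lines:
        token = raw.strip()
        if token:
            tokens += 1
            if not in_sentence:
                # First token after a boundary opens a new sentence.
                sentences += 1
                in_sentence = True
        else:
            # Blank line: close the current sentence, if any.
            in_sentence = False
    return tokens, sentences


def count_file(path: str) -> Tuple[int, int]:
    """Count tokens and sentences in one 'tok' file.

    ASSUMPTION: the file is UTF-8 encoded.
    """
    with Path(path).open(encoding="utf-8") as handle:
        return count_tokens_and_sentences(handle)
```

For example, count_file("example.tok") (a hypothetical file name) returns a (tokens, sentences) pair for that file. Summing the first value over all files gives a corpus-wide token count to compare against the figure above, provided the blank-line assumption holds.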