Language Technology: Research and Development

Note that this page has been migrated from a previous server, so some links may not work correctly.

Credits: 15 hp
Syllabus: 5LN714
Teachers: Sara Stymne, Eva Pettersson, Harald Hammarström, Fabienne Cap
Examiner and course coordinator: Sara Stymne

News

  • 2018-01-04: The schedule for the final workshop on January 12 is now available. Note that the workshop will run from 10.00 (sharp) to 16.15, followed by a social event.
  • 2017-12-06: Do not forget to register on EasyChair and send Sara an email with your name and the email address you used in EasyChair. This is vital for the reviewing to work!
  • 2017-11-13: Note that there have been schedule changes for the seminars on Wednesday November 15. Earlier there was a mismatch between the TimeEdit schedule and the schedule on this webpage.
  • 2017-09-28: The deadlines for reexamination have been updated; see below.
  • 2017-09-22: The take-home exam is now available in Studentportalen.
  • 2017-09-12: Note that the schedule for the first seminar has changed!

Schedule

Session | Date | Time | Room | Content | Reading
L1 | 30/8 | 10-12 | Turing | Introduction, UD (Sara), Historical texts (Eva), Morphology (Harald) | -
L2 | 6/9 | 10-12 | Turing | Science and research | Okasha
S3 | 13/9 | 10-12 | 2-0023, 2-0028, 3-0012 | Seminar - research papers | See list below
S3 | 13/9 | 15-17 | 2-0023 (2) | Seminar - research papers | See list below
S3 | 14/9 | 10-12 | 2-0028 (1), 9-1017 (3) | Seminar - research papers | See list below
L4 | 21/9 | 10-12 | Turing | Language technology research and development | Cunningham, Hovy & Spruit, Lee
S5 | 27/9 | 10-12 | 2-0023 (1) | Seminar - research papers | See list below
S5 | 27/9 | 15-17 | 3-0012 (2) | Seminar - research papers | See list below
S5 | 28/9 | 10-12 | 2-0025 (3) | Seminar - research papers | See list below
T | 3/10 | 10-12 | Chomsky | LaTeX tutorial 1 | -
L6 | 4/10 | 15-17 | Turing | R&D projects - from proposal to implementation | Zobel 10-11, 13
S7 | 10/10 | 10-12 | 2-0024 (3) | Seminar - research papers | See list below
S7 | 11/10 | 10-12 | Turing (1) | Seminar - research papers | See list below
S7 | 12/10 | 10-12 | 2-0023 (2) | Seminar - research papers | See list below
S8 | 16/10 | 12-14 | 9-2029 (3) | Seminar - project proposals | -
S8 | 18/10 | 10-12 | 2-0028 (2), 3-0012 (1) | Seminar - project proposals | -
S10 | 31/10 | 10-12 | 2-0027 (1) | Seminar - progress report | -
S10 | 1/11 | 14-16 | 2-0028 (2) | Seminar - progress report | -
S10 | 3/11 | 10-12 | 9-1061 (3) | Seminar - progress report | -
T | 7/11 | 10-12 | Chomsky | LaTeX tutorial 2 | -
L11 | 8/11 | 10-12 | Turing | Dissemination of research results | Zobel 1-9, 14
S12 | 15/11 | 10-12 | 2-0022 (1) | Seminar - progress report | -
S12 | 15/11 | 12-14 | 3-0012 (3) | Seminar - progress report | -
S12 | 15/11 | 14-16 | 3-0012 (2) | Seminar - progress report | -
L13 | 22/11 | 10-12 | Turing | Review of scientific articles | Zobel 12
S14 | 29/11 | 10-12 | 7-0017 (1), 7-0015 (3) | Seminar - progress report | -
S14 | 29/11 | 16-18 | 2-0028 (2) | Seminar - progress report | -
S15 | 6/12 | 12-14 | 2-0028 (2) | Seminar - progress report | -
S15 | 6/12 | 14-16 | 2-0023 (1) | Seminar - progress report | -
S15 | 8/12 | 13-15 | 9-0029 (3) | Seminar - progress report | -
S16 | 12/1 | 10.00-16.15 | 7-0043 | Seminar - term paper presentations | -
S16 | 12/1 | 16.15- | Department lunchroom (9-30??) | Social event | -

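In the Session column, L marks lectures, S seminars, and T tutorials. All lectures will be given by Sara. Each seminar will be led by the seminar leader of the respective research group. When a seminar lists several rooms without group numbers, group 1 is in the first room, group 2 in the second, and group 3 in the third; a number in parentheses after a room gives the group that meets there.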

Content

The course gives a theoretical and practical introduction to research and development in language technology. The theoretical part covers basic philosophy of science, research methods in language technology, project planning, and writing and reviewing of scientific papers. The practical part consists of a small project within a research area common to a subgroup of course participants, including a state-of-the-art survey in a reading group, the planning and implementation of a research task, and the writing of a paper according to the standards for scientific publications in language technology. The research areas for 2017 are:
  1. Universal Dependencies
  2. Morphology
  3. Historical texts

Examination

The course is examined by means of five assignments with different weights (see below). To pass the course, a student must pass all of these assignments. To pass the course with distinction, a student must achieve distinction on at least 50% of the graded assignments, counted by weight.

Assignments

  1. Take-home exam on philosophy of science (15%)
    • This assignment is based on your reading of Okasha's book. You will be asked to discuss issues in the philosophy of science and, in some questions, relate them to language technology. The questions will be handed out on September 22, and the report should be handed in on September 29.
  2. Research paper presentation and discussion (15%)
    • You will present one of the papers discussed in the seminars. The task is to introduce the paper and lead the discussion, not to give a formal presentation. In addition, you are expected to take an active part in the discussion of all other papers in the seminars, so attendance at the seminars is obligatory. This assignment is not graded and does not qualify for distinction.
  3. Project proposal (15%)
    • You will put together a 3-page proposal describing the project you are going to work on for the rest of the course, using the Swedish Research Council's guidelines for research programs, Appendix A (ignore other parts of the application as well as optional sections of the research program). You will also give a short presentation of the proposal in a seminar (8 minutes with slides). The deadline for the proposal is October 13, and the seminars will take place October 17-18.


  4. Review of term papers (15%)
  5. Term paper (40%)
    • You will report your project in a paper following the guidelines of Transactions of the Association for Computational Linguistics (except that the page limit for your paper is 4-7 pages plus references). The deadline for the first version, which should be a complete, finished paper, is December 13, and the deadline for the revised version is January 12. On January 12, you will also give an obligatory oral presentation of the paper. In addition, attendance is obligatory at all group seminars related to the project, where you will give a short progress report at each seminar.

      Your grade for this assignment will be based mainly on your final report, but your oral presentation, the first version of the report, and how well you address the review comments will also be taken into account.

      In addition, at the beginning of the course you should state which topic/project group you want to join. Email your first and second choice of topic to Sara by September 4. If it is not possible to give everyone one of their preferred topics, we will make a random selection among students with the same preference. Students who do not submit a preference will be assigned to any topic.

Deadlines

Here is a summary of all deadlines in the course.

Task | Deadline | Extra deadline
Choose your preferred topics | September 4 | -
Hand in take-home exam | September 29 | October 27
Project proposal | October 13 | November 3
Present project proposal | October 17-18 | -
First version of project report | December 13 | January 12
Reviews of peers' project papers | December 22 | January 12
Final seminar | January 12 | -
Final project report | January 12 | February 2

In addition, attendance is obligatory at all seminars listed in the schedule above! If you miss a seminar, contact the course coordinator to discuss how to compensate for it.

We use Studentportalen for all submissions, unless otherwise notified. All submissions are due at 23.59. Studentportalen then closes for submissions, and we will not accept later submissions via email. If you miss a submission deadline, you will have another chance to submit around one month after the original deadline; see the table above for details. Reexamination for the oral tasks will be organized in connection with each written deadline. We strongly recommend that you meet the original deadlines! If you also miss the extra deadline, the next opportunity is the next instance of the course, in autumn 2018.

Please take note of our general course assessment and examination policy. If special circumstances make a regular submission impossible, you should inform us in good time before the deadline in order to have a chance of being granted an extra assessment opportunity. We decide on each case individually, and extraordinary circumstances must apply.

Final Seminar/Workshop

The final seminar will be organized as a workshop with term paper presentations. Each paper will have a 15-minute slot: 12 minutes for the presentation and 3 minutes for discussion. The session chairs will enforce the times strictly.

Research Groups

Below are preliminary groups and articles for the research seminars. The articles for the September 13 seminar are already decided. The articles for the remaining seminars are subject to change. The group division may also change, but we will contact each student individually if this is the case.

Note that only students who have passed the prerequisite courses may take the course!

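Group | Member | Seminar | Paper
1: Universal Dependencies | Vivian (Yang) | Sep 13 | Nivre et al. 2016
1: Universal Dependencies | Paula P | Sep 13 | Wisniewski and Lacroix 2017
1: Universal Dependencies | Greta | Sep 27 | Tiedemann 2015
1: Universal Dependencies | Jenny | Sep 27 | Dozat and Manning 2017
1: Universal Dependencies | Zulipiye | Sep 27 | Nivre and Fang 2017
1: Universal Dependencies | Paola M | Oct 11 | Futrell et al. 2015
1: Universal Dependencies | Mary | Oct 11 | Östling 2015
2: Morphology | Hao | Sep 13 | Moon et al. 2009
2: Morphology | Joana | Sep 13 | Chrupala 2008, chapter 6
2: Morphology | Sheffield (Xuefeng) | Sep 27 | Kann and Schütze 2016
2: Morphology | Sven | Sep 27 | Chan 2006
2: Morphology | Renfun | Oct 11 | Ma et al. 2016
2: Morphology | Gabi | Oct 11 | Östling and Tiedemann 2017
3: Historical texts | Oliver | Sep 13 | Piotrowski 2012, chapter 6
3: Historical texts | Yuhan | Sep 13 | Piotrowski 2012, chapter 7
3: Historical texts | Gongbo | Sep 28 | Pettersson et al. 2014
3: Historical texts | Max | Sep 28 | Korchagina 2017
3: Historical texts | Yuchan | Oct 10 | Hardmeier 2016
3: Historical texts | Samina | Oct 10 | Clark and Araki 2011
3: Historical texts | Yiwen | Oct 10 | Bollmann 2011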

Reading

Below is a list of course literature. Note that the literature for each topic is largely a suggestion to get you started; you will also need to find your own literature related to your personal project. A subset of the articles below for each topic will be studied in detail at the seminars.

Science and Research

Universal Dependencies

Morphology

  • Erwin Chan. Learning probabilistic paradigms for morphology in a latent class model. In Proceedings of the Eighth Meeting of the ACL Special Interest Group on Computational Phonology and Morphology at HLT-NAACL 2006, pages 69–78. Association for Computational Linguistics, New York City, USA, 2006.
  • Grzegorz Chrupala. Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. PhD thesis, Dublin City University, 2008.
  • Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391–407, 1990.
  • Harald Hammarström and Lars Borin. Unsupervised learning of morphology. Computational Linguistics, 37(2):309–350, 2011.
  • Mans Hulden. Foma: a finite-state compiler and library. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pages 29–32. Association for Computational Linguistics, 2009.
  • Katharina Kann and Hinrich Schütze. MED: The LMU system for the SIGMORPHON 2016 shared task on morphological reinflection. In Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 62–70. Association for Computational Linguistics, Berlin, Germany, August 2016.
  • Bilal Khaliq. Unsupervised Learning of Arabic Non-Concatenative Morphology. PhD thesis, University of Sussex, 2015.
  • Jianqiang Ma, Verena Henrich, and Erhard Hinrichs. Letter sequence labeling for compound splitting. In Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 76–81, Berlin, Germany, August 2016. Association for Computational Linguistics.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. In Christopher J. C. Burges, Léon Bottou, Zoubin Ghahramani, and Kilian Q. Weinberger, editors, Advances in Neural Information Processing Systems 26 (NIPS 2013), pages 3111–3119. Neural Information Processing Systems, Lake Tahoe, Nevada, 2013.
  • Taesun Moon, Katrin Erk, and Jason Baldridge. Unsupervised morphological segmentation and clustering with document boundaries. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 668–677. Association for Computational Linguistics, Singapore, August 2009.
  • Robert Östling and Jörg Tiedemann. Continuous multilinguality with language vectors. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 644–649. Association for Computational Linguistics, 2017.
  • Radim Řehůřek and Petr Sojka. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50, Valletta, Malta, May 2010. ELRA.
  • Benjamin Snyder and Regina Barzilay. Unsupervised multilingual learning for morphological segmentation. In Proceedings of ACL-08: HLT, pages 737–745, Columbus, Ohio, June 2008. Association for Computational Linguistics.

Historical texts

  • Bollmann, Marcel (2013). POS Tagging for Historical Texts with Sparse Training Data. In Proceedings of the 7th Linguistic Annotation Workshop & Interoperability with Discourse, pages 11–18, Sofia, Bulgaria, August 8-9, 2013. Association for Computational Linguistics.
  • Bollmann, Marcel; Petran, Florian; and Dipper, Stephanie (2011). Rule-Based Normalization of Historical Texts. In Proceedings of the International Workshop on Language Technologies for Digital Humanities and Cultural Heritage, pages 34–42, Hissar, Bulgaria.
  • Clark, Eleanor and Araki, Kenji (2011). Text Normalization in Social Media: Progress, Problems and Applications for a Pre-Processing System of Casual English. In Procedia - Social and Behavioral Sciences 27 (2011) 2–11.
  • Hammarström, Harald; Virk, Shafqat Mumtaz; and Forsberg, Markus (2017). Poor Man's OCR Post-Correction: Unsupervised Recognition of Variant Spelling Applied to a Multilingual Document Collection. In Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage, Göttingen, Germany, June 1-2, 2017.
  • Hardmeier, Christian (2016). A Neural Model for Part-of-Speech Tagging in Historical Texts. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 922–931, Osaka, Japan, December 11-17, 2016.
  • Jurish, Bryan (2008). Finding canonical forms for historical German text. In A. Storrer, A. Geyken, A. Siebert, and K.-M. Würzner, editors, Text Resources and Lexical Knowledge: Selected Papers from the 9th Conference on Natural Language Processing (KONVENS 2008), pages 27–37. Mouton de Gruyter, Berlin, 2008.
  • Korchagina, Natalia (2017). Normalizing Medieval German Texts: from rules to deep learning. In Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language.
  • Pettersson, Eva; Megyesi, Beáta; and Nivre, Joakim (2014). A Multilingual Evaluation of Three Spelling Normalisation Methods for Historical Text. In: Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH) @ EACL 2014, pages 32–41, Gothenburg, Sweden, April 26, 2014.
  • Pettersson, Eva; Megyesi, Beáta; and Tiedemann, Jörg (2013). An SMT Approach to Automatic Annotation of Historical Text. In: Proceedings of the Workshop on Computational Historical Linguistics at NODALIDA 2013. NEALT Proceedings Series 18; Linköping Electronic Conference Proceedings 87:54–69.
  • Piotrowski, Michael (2012). Natural Language Processing for Historical Texts, chapter 3: Spelling in Historical Texts. (freely accessible in digital format when logged in at the department)
  • Piotrowski, Michael (2012). Natural Language Processing for Historical Texts, chapter 6: Handling Spelling Variation. (freely accessible in digital format when logged in at the department)
  • Piotrowski, Michael (2012). Natural Language Processing for Historical Texts, chapter 7: NLP Tools for Historical Languages. (freely accessible in digital format when logged in at the department)
  • Sánchez-Marco, Cristina; Boleda, Gemma; and Padró, Lluís (2011). Extending the tool, or how to annotate historical language varieties. In Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pages 1–9, Portland, OR, USA, 24 June 2011. Association for Computational Linguistics.