Language Technology: Research and Development
This page has been migrated from a previous server, so some links may not work correctly.
Credits: 15 hp
Syllabus: 5LN714
Teachers: Sara Stymne, Eva Pettersson, Harald Hammarström, Fabienne Cap
Examiner and course coordinator: Sara Stymne
News
- 2018-01-04: The schedule for the final workshop on January 12 is now available. Note that the workshop will run from 10.00 (sharp) to 16.15, followed by a social event.
- 2017-12-06: Do not forget to register for EasyChair and send Sara an email with your name and the email address you used in EasyChair. This is vital for the reviewing to work!
- 2017-11-13: Note that there have been schedule changes for the seminars on Wednesday November 15. Previously, the TimeEdit schedule and the schedule on this page did not match.
- 2017-09-28: The deadlines for reexamination have been updated, see below.
- 2017-09-22: The take-home exam is now available in Studentportalen.
- 2017-09-12: Note that the schedule for the first seminar has changed!
Schedule
| Session | Date | Time | Room | Content | Reading |
|---|---|---|---|---|---|
| L1 | 30/8 | 10-12 | Turing | Introduction, UD (Sara), Historical texts (Eva), Morphology (Harald) | |
| L2 | 6/9 | 10-12 | Turing | Science and research | Okasha |
| S3 | 13/9 | 15-17 | 2-0023 (2) | Seminar - research papers | See list below |
| S3 | 14/9 | 10-12 | 2-0028 (1), 9-1017 (3) | Seminar - research papers | See list below |
| L4 | 21/9 | 10-12 | Turing | Language technology research and development | Cunningham, Hovy & Spruit, Lee |
| S5 | 27/9 | 10-12 | 2-0023 (1) | Seminar - research papers | See list below |
| S5 | 27/9 | 15-17 | 3-0012 (2) | Seminar - research papers | See list below |
| S5 | 28/9 | 10-12 | 2-0025 (3) | Seminar - research papers | See list below |
| T | 3/10 | 10-12 | Chomsky | LaTeX tutorial 1 | |
| L6 | 4/10 | 15-17 | Turing | R&D projects - from proposal to implementation | Zobel 10-11, 13 |
| S7 | 10/10 | 10-12 | 2-0024 (3) | Seminar - research papers | See list below |
| S7 | 11/10 | 10-12 | Turing (1) | Seminar - research papers | See list below |
| S7 | 12/10 | 10-12 | 2-0023 (2) | Seminar - research papers | See list below |
| S8 | 16/10 | 12-14 | 9-2029 (3) | Seminar - project proposals | |
| S8 | 18/10 | 10-12 | 2-0028 (2), 3-0012 (1) | Seminar - project proposals | |
| S10 | 31/10 | 10-12 | 2-0027 (1) | Seminar - progress report | |
| S10 | 1/11 | 14-16 | 2-0028 (2) | Seminar - progress report | |
| S10 | 3/11 | 10-12 | 9-1061 (3) | Seminar - progress report | |
| T | 7/11 | 10-12 | Chomsky | LaTeX tutorial 2 | |
| L11 | 8/11 | 10-12 | Turing | Dissemination of research results | Zobel 1-9, 14 |
| S12 | 15/11 | 10-12 | 2-0022 (1) | Seminar - progress report | |
| S12 | 15/11 | 12-14 | 3-0012 (3) | Seminar - progress report | |
| S12 | 15/11 | 14-16 | 3-0012 (2) | Seminar - progress report | |
| L13 | 22/11 | 10-12 | Turing | Review of scientific articles | Zobel 12 |
| S14 | 29/11 | 10-12 | 7-0017 (1), 7-0015 (3) | Seminar - progress report | |
| S14 | 29/11 | 16-18 | 2-0028 (2) | Seminar - progress report | |
| S15 | 6/12 | 12-14 | 2-0028 (2) | Seminar - progress report | |
| S15 | 6/12 | 14-16 | 2-0023 (1) | Seminar - progress report | |
| S15 | 8/12 | 13-15 | 9-0029 (3) | Seminar - progress report | |
| S16 | 12/1 | 10.00-16.15 | 7-0043 | Seminar - term paper presentations | |
| S16 | 12/1 | 16.15- | Department lunchroom (9-30?) | Social event | |
All lectures will be given by Sara. The seminars will be led by the seminar leader of each research group. The number in parentheses after a room is the group that meets there; unless otherwise stated, group 1 will be in the first room, group 2 in the second room, and group 3 in the third room.
Content
The course gives a theoretical and practical introduction to research and development in language technology. The theoretical part covers basic philosophy of science, research methods in language technology, project planning, and the writing and reviewing of scientific papers. The practical part consists of a small project within a research area common to a subgroup of course participants, including a state-of-the-art survey in a reading group, the planning and implementation of a research task, and the writing of a paper according to the standards for scientific publications in language technology. The research areas for 2017 are:
- Universal Dependencies
- Morphology
- Historical texts
Examination
The course is examined by means of five assignments with different weights (see below). To pass the course, a student must pass all of these assignments. To pass the course with distinction, a student must receive distinction on at least 50% (by weight) of the graded assignments.
Assignments
- Take-home exam on philosophy of science (15%)
  - This assignment is based on your reading of Okasha's book. You will be asked to discuss issues in the philosophy of science and, in some cases, relate them to language technology. The questions will be handed out on September 22, and the report should be handed in by September 29.
- Research paper presentation and discussion (15%)
  - You will present one of the papers discussed in the seminars. The task is to introduce the paper and lead the discussion, not to give a formal presentation. In addition, you are expected to take an active part in the discussion of all the other papers. Attendance at the seminars is therefore obligatory. This assignment is graded pass/fail only and does not qualify for distinction.
- Project proposal (15%)
  - You will write a 3-page proposal describing the project you will work on for the rest of the course, following Appendix A of the Swedish Research Council's guidelines for research programs (ignore the other parts of the application as well as the optional sections of the research program). You will also give a short presentation of the proposal at a seminar (8 minutes, with slides). The deadline for the proposal is October 13, and the seminars will take place October 17-18.
- Review of term papers (15%)
  - Following the guidelines of Transactions of the Association for Computational Linguistics, you will review two term papers written by your course mates. You will receive the papers on December 14, and the reviews are due on December 22.
- Term paper (40%)
  - You will report on your project in a paper following the guidelines of Transactions of the Association for Computational Linguistics (except that the page limit for your paper is 4-7 pages plus references).
  The deadline for the first version, which should be a complete, finished paper, is December 13, and the deadline for the revised version is January 12. On January 12, you will also give an obligatory oral presentation of the paper. In addition, attendance is obligatory at all group seminars related to the project, where you will give a short progress report at each seminar.
  Your grade for this assignment will be based mainly on your final report, but your oral presentation, the first version of the report, and how well you addressed the review comments will also be taken into account.
In addition, at the beginning of the course you should state which topic/project group you want to join. Send your first and second preference to Sara by email no later than September 4. If it is not possible to give everyone one of their preferred topics, we will select randomly among students with the same preference. Anyone who fails to hand in a preference will be assigned to any topic.
Deadlines
Here is a summary of all deadlines in the course.
| Task | Deadline | Extra deadline |
|---|---|---|
| Choose your preferred topics | September 4 | - |
| Hand in take-home exam | September 29 | October 27 |
| Project proposal | October 13 | November 3 |
| Present project proposal | October 17-18 | - |
| First version of project report | December 13 | January 12 |
| Reviews of peers' project papers | December 22 | January 12 |
| Final seminar | January 12 | - |
| Final project report | January 12 | February 2 |
In addition, attendance is obligatory at all seminars listed in the schedule above! If you miss a seminar, contact the course coordinator to discuss how to compensate for it.
We use Studentportalen for all submissions unless otherwise notified. All submissions are due at 23.59, after which Studentportalen closes for submissions; we will not accept late submissions via email. If you miss a submission deadline, you will have another chance to submit around one month after the original deadline; see the extra deadlines in the table above. Reexamination for the oral tasks will be organized in connection with the corresponding written deadline. We strongly recommend that you meet the original deadlines! If you also miss the extra deadline, the next opportunity is the next instance of the course, in autumn 2018.
Please take note of our general course assessment and examination policy. If special circumstances make a regular submission impossible, you should inform us well before the deadline to have a chance of being granted an extra assessment opportunity. Each case is decided individually, and extraordinary circumstances must apply.
Final Seminar/Workshop
The final seminar will be organized as a workshop with term paper presentations. Each paper is expected to get a 15-minute slot, divided into 12 minutes for the presentation and 3 minutes for discussion. The session chairs will enforce the time limits strictly.
Research Groups
Below are the preliminary groups and articles for the research seminars. The articles for the September 13 seminar have already been decided, while the articles for the remaining seminars are subject to change. The group division may also change, but we will contact each affected student individually if this is the case. Note that only students who have passed the prerequisite courses may take the course!
| Groups | Members | Papers |
|---|---|---|
| 1: Universal Dependencies | Vivian (Yang) | Sep 13: Nivre et al. 2016 |
| | Paula P | Sep 13: Wisniewski and Lacroix 2017 |
| | Greta | Sep 27: Tiedemann 2015 |
| | Jenny | Sep 27: Dozat and Manning 2017 |
| | Zulipiye | Sep 27: Nivre and Fang 2017 |
| | Paola M | Oct 11: Futrell et al. 2015 |
| | Mary | Oct 11: Östling 2015 |
| 2: Morphology | Hao | Sep 13: Moon et al. 2009 |
| | Joana | Sep 13: Chrupala 2008, chapter 6 |
| | Sheffield (Xuefeng) | Sep 27: Kann and Schütze 2016 |
| | Sven | Sep 27: Chan 2006 |
| | Renfun | Oct 11: Ma et al. 2016 |
| | Gabi | Oct 11: Östling and Tiedemann 2017 |
| 3: Historical texts | Oliver | Sep 13: Piotrowski 2012, chapter 6 |
| | Yuhan | Sep 13: Piotrowski 2012, chapter 7 |
| | Gongbo | Sep 28: Pettersson et al. 2014 |
| | Max | Sep 28: Korchagina 2017 |
| | Yuchan | Oct 10: Hardmeier 2016 |
| | Samina | Oct 10: Clark and Araki 2011 |
| | Yiwen | Oct 10: Bollmann 2011 |
Reading
Below is a list of course literature. Note that the literature for each topic is largely a suggestion to get you started; you will also need to find your own literature related to your personal project. For each topic, a subset of the articles below will be studied in detail at the seminars.
Science and Research
- Okasha, S. (2002) Philosophy of Science: A Very Short Introduction. Oxford University Press. Chapters 1-3 and 5. (Obligatory)
- Zobel, J. (2004) Writing for Computer Science. Second Edition. Springer.
- Cunningham, H. (1999) A definition and short history of Language Engineering. Natural Language Engineering 5 (1), 1-16.
- Hovy, D. and Spruit, S. L. (2016) The Social Impact of Natural Language Processing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 591-598.
- Lee, L. (2004) "I'm sorry Dave, I'm afraid I can't do that": Linguistics, Statistics, and Natural Language Processing circa 2001. In Computer Science: Reflections on the Field, Reflections from the Field, 111-118.
Universal Dependencies
- Berzak, Y., Kenney, J., Spadine, C., Wang, J.-X., Lam, L., Mori, K.S., Garza, S. and Katz, B. (2016) Universal Dependencies for Learner English. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 737-746.
- De Marneffe, M.-C., Dozat, T., Silveira, N., Haverinen, K., Ginter, F., Nivre, J. and Manning, C.D. (2014) Universal Stanford Dependencies: A Cross-Linguistic Typology. In Proceedings of the Ninth International Conference on Language Resources and Evaluation, 4585-4592.
- Dozat, T., Qi, P. and Manning, C. D. (2017) Stanford's Graph-based Neural Dependency Parser at the CoNLL 2017 Shared Task. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, 20-30.
- Futrell, R., Mahowald, K. and Gibson, E. (2015) Quantifying Word Order Freedom in Dependency Corpora. In Proceedings of the Third International Conference on Dependency Linguistics, 91–100.
- Levshina, N. (2017) Does Syntactic Informativity Predict Word Length? A Cross-linguistic Study Based on the Universal Dependencies Corpora. In Proceedings of the NoDaLiDa Workshop on Universal Dependencies (UDW 2017).
- McDonald, R., Nivre, J., Quirmbach-Brundage, Y., Goldberg, Y., Das, D., Ganchev, K., Hall, K., Petrov, S., Zhang, H., Täckström, O., Bedini, C., Bertomeu Castelló, N. and Lee, J. (2013) Universal Dependency Annotation for Multilingual Parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 92-97.
- McDonald, R., Petrov, S. and Hall, K. (2011) Multi-Source Transfer of Delexicalized Dependency Parsers. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 62-72.
- Nivre, J. (2015) Towards a Universal Grammar for Natural Language Processing. In Computational Linguistics and Intelligent Text Processing, 3-16.
- Nivre, J., de Marneffe, M.-C., Ginter, F., Goldberg, Y., Hajic, J., Manning, C.D., McDonald, R., Petrov, S., Pyysalo, S., Silveira, N., Tsarfaty, R. and Zeman, D. (2016) Universal Dependencies v1: A Multilingual Treebank Collection. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016).
- Nivre, J. and Fang, C. (2017) Universal Dependency Evaluation. In Proceedings of the NoDaLiDa Workshop on Universal Dependencies (UDW 2017).
- Östling, R. (2015) Word Order Typology through Multilingual Word Alignment. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Short Papers), 205–211.
- Straka, M., Hajic, J. and Straková, J. (2016) UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016).
- Swanson, B. and Charniak, E. (2014) Data Driven Language Transfer Hypotheses. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 169–173.
- Tiedemann, J. (2015) Cross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels. In Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015), 340-349.
- Ammar, W., Mulcaire, G., Ballesteros, M., Dyer, C. and Smith, N. A. (2016) Many Languages, One Parser. Transactions of the Association for Computational Linguistics 4, 431-444.
- Wisniewski, G. and Lacroix, O. (2017) A Systematic Comparison of Syntactic Representations of Dependency Parsing. In Proceedings of the NoDaLiDa Workshop on Universal Dependencies (UDW 2017).
- Zeman, D., Marecek, D., Popel, M., Ramasamy, L., Stepánek, J., Zabokrtský, Z. and Hajic, J. (2012) HamleDT: To Parse or Not to Parse? In Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2735-2741.
Morphology
- Chan, E. (2006) Learning probabilistic paradigms for morphology in a latent class model. In Proceedings of the Eighth Meeting of the ACL Special Interest Group on Computational Phonology and Morphology at HLT-NAACL 2006, New York City, USA, 69-78.
- Chrupala, G. (2008) Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. PhD thesis, Dublin City University.
- Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K. and Harshman, R. (1990) Indexing by latent semantic analysis. Journal of the American Society for Information Science 41 (6), 391-407.
- Hammarström, H. and Borin, L. (2011) Unsupervised learning of morphology. Computational Linguistics 37 (2), 309-350.
- Hulden, M. (2009) Foma: a finite-state compiler and library. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, 29-32.
- Kann, K. and Schütze, H. (2016) MED: The LMU system for the SIGMORPHON 2016 shared task on morphological reinflection. In Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, Berlin, Germany, 62-70.
- Khaliq, B. (2015) Unsupervised Learning of Arabic Non-Concatenative Morphology. PhD thesis, University of Sussex.
- Ma, J., Henrich, V. and Hinrichs, E. (2016) Letter sequence labeling for compound splitting. In Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, Berlin, Germany, 76-81.
- Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S. and Dean, J. (2013) Distributed representations of words and phrases and their compositionality. In Burges, C. J. C., Bottou, L., Ghahramani, Z. and Weinberger, K. Q. (eds.) Advances in Neural Information Processing Systems 26 (NIPS 2013), Lake Tahoe, Nevada, 3111-3119.
- Moon, T., Erk, K. and Baldridge, J. (2009) Unsupervised morphological segmentation and clustering with document boundaries. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Singapore, 668-677.
- Östling, R. and Tiedemann, J. (2017) Continuous multilinguality with language vectors. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, 644-649.
- Řehůřek, R. and Sojka, P. (2010) Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, Valletta, Malta, 45-50.
- Snyder, B. and Barzilay, R. (2008) Unsupervised multilingual learning for morphological segmentation. In Proceedings of ACL-08: HLT, Columbus, Ohio, 737-745.
Historical texts
- Bollmann, M. (2013) POS Tagging for Historical Texts with Sparse Training Data. In Proceedings of the 7th Linguistic Annotation Workshop & Interoperability with Discourse, Sofia, Bulgaria, 11-18.
- Bollmann, M., Petran, F. and Dipper, S. (2011) Rule-Based Normalization of Historical Texts. In Proceedings of the International Workshop on Language Technologies for Digital Humanities and Cultural Heritage, Hissar, Bulgaria, 34-42.
- Clark, E. and Araki, K. (2011) Text Normalization in Social Media: Progress, Problems and Applications for a Pre-Processing System of Casual English. Procedia - Social and Behavioral Sciences 27, 2-11.
- Hammarström, H., Virk, S. M. and Forsberg, M. (2017) Poor Man's OCR Post-Correction: Unsupervised Recognition of Variant Spelling Applied to a Multilingual Document Collection. In Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage, Göttingen, Germany.
- Hardmeier, C. (2016) A Neural Model for Part-of-Speech Tagging in Historical Texts. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan, 922-931.
- Jurish, B. (2008) Finding canonical forms for historical German text. In Storrer, A., Geyken, A., Siebert, A. and Würzner, K.-M. (eds.) Text Resources and Lexical Knowledge: Selected Papers from the 9th Conference on Natural Language Processing (KONVENS 2008), Mouton de Gruyter, Berlin, 27-37.
- Korchagina, N. (2017) Normalizing Medieval German Texts: from rules to deep learning. In Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language.
- Pettersson, E., Megyesi, B. and Nivre, J. (2014) A Multilingual Evaluation of Three Spelling Normalisation Methods for Historical Text. In Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH) @ EACL 2014, Gothenburg, Sweden, 32-41.
- Pettersson, E., Megyesi, B. and Tiedemann, J. (2013) An SMT Approach to Automatic Annotation of Historical Text. In Proceedings of the Workshop on Computational Historical Linguistics at NODALIDA 2013. NEALT Proceedings Series 18; Linköping Electronic Conference Proceedings 87, 54-69.
- Piotrowski, M. (2012) Natural Language Processing for Historical Texts, chapter 3: Spelling in Historical Texts. (Freely accessible in digital format when logged in at the department.)
- Piotrowski, M. (2012) Natural Language Processing for Historical Texts, chapter 6: Handling Spelling Variation. (Freely accessible in digital format when logged in at the department.)
- Piotrowski, M. (2012) Natural Language Processing for Historical Texts, chapter 7: NLP Tools for Historical Languages. (Freely accessible in digital format when logged in at the department.)
- Sánchez-Marco, C., Boleda, G. and Padró, L. (2011) Extending the tool, or how to annotate historical language varieties. In Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, Portland, OR, USA, 1-9.