Language Technology: Research and Development
Note that this page has been migrated from a previous server, so there is a risk that not all links work correctly.
Credits: 15 hp
Syllabus: 5LN714
Staff
Course coordinator and examiner: Sara Stymne
Lectures: Sara Stymne
Group leaders: Meriem Beloucif, Beáta Megyesi, Johan Sjons
News
- 230109: The final presentation schedule is available
- 221007: Note that there are a couple of schedule changes for the November lectures
- 220627: First preliminary version of the course web page
General information
This page contains the general information for the course Language Technology: Research and Development, autumn 2022. Course information is available here, and slides will be posted in the annotated schedule available here. We will also use the Studium system for handing in assignments, keeping track of your progress, posting Zoom links, and similar information.
Schedule
Preliminary schedule.
| | Date | Time | Room | Content | Reading |
|---|---|---|---|---|---|
| L1 | 30/8 | 10-12 | Zoom and 22-1017 | Introduction, Digging, NN & Language, Low-resource | |
| L2 | 5/9 | 8-10 | 22-1017 | Science, research, and development | Okasha, Cunningham, Lee |
| L3 | 6/9 | 10-12 | Chomsky+Turing | Science, research, and development 2: debate session | Okasha, Hovy and Spruit |
| S1 | 7/9 | 15-17 | 2-0025 (lrl), 2-0026 (nnl), 2-0027 (dig) | Seminar - research papers | |
| Lx | 13/9 | 13-15 | Ångström: Polhemssalen 10134 | UPPMAX lecture | |
| S2 | 19/9 | 13-15 | 2-0023 (lrl), 2-0028 (nnl), 2-0025 (dig) | Seminar - research papers | |
| L4 | 23/9 | 8-10 | 22-1017 | R&D projects - from proposal to implementation | Zobel 10-11, 13 |
| S3 | 27/9 | 10-12 | 22-1008 (lrl), 2-K1072 (nnl), 22-0025 (dig) | Seminar - research papers | |
| S4 | 11/10 | 13-16 | Blåsenhus 12:232 (lrl), Blåsenhus 12:234 (nnl), Blåsenhus 13:132 (dig) | Seminar - project proposals | |
| S5 | 26/10 | 10-12 | 2-0028 (lrl), 2-0026 (nnl), 2-0025 (dig) | Seminar - progress report | |
| L5 | 3/11 | 10-12 | Blåsenhus 21:136 | Dissemination of research results | Zobel 1-9, 14 |
| S6 | 9/11 | 10-12 | 22-1008 (lrl), 9-3068 (nnl), 22-1005 (dig) | Seminar - progress report, theme: ethics | Hovy and Spruit; Bender et al. |
| Lab | 16/11 | 10-12 | Chomsky+Turing | LaTeX tutorial | |
| S8 | 23/11 | 10-12 | 22-1008 (lrl), 22-1005 (nnl), 9-3068 (dig) | Seminar - progress report | |
| L6 | 29/11 | 10-12 | Blåsenhus: Bertil Hammer (24:K104) | Review of scientific articles | Zobel 14 |
| S9 | 7/12 | 10-12 | 22-1008 (lrl), 9-3068 (nnl), 22-1005 (dig) | Seminar - progress report | |
| FS | 12/1 | 8-16 | 7-0042, 7-0043 | Final workshop - term paper presentations | |
| | 12/1 | 16- | ? | Social event? | |
All lectures will be given by Sara. The seminars will be led by the seminar leader for each research group. Note that attendance is obligatory at all seminars. The course is campus-based.
Content
The course gives a theoretical and practical introduction to research and development in language technology. The theoretical part covers basic philosophy of science, research methods in language technology, project planning, and writing and reviewing of scientific papers. The practical part consists of a small project within a research area common to a subgroup of course participants, including a state-of-the-art survey in a reading group, the planning and implementation of a research task, and the writing of a paper according to the standards for scientific publications in language technology. The research areas, with teachers, for 2022 are:
- Digging the past: Digital Philology and the Analysis of Historical Sources (dig) - Beáta Megyesi
- Low-resource languages (lrl) - Meriem Beloucif
- Neural networks and language (nnl) - Johan Sjons
Examination
The course is examined by means of five assignments with different weights (see below). In order to pass the course, a student must pass each one of these. In order to pass the course with distinction, a student must pass at least 50% of the weighted graded assignments with distinction.
Assignments
- Take home exam on philosophy of science (15%)
- This assignment will be based on your reading of Okasha's book. You will be asked to discuss issues in the philosophy of science and (sometimes) relate them to the area of language technology. The questions will be handed out September 8, and the report should be handed in September 14.
- Research paper presentation and discussion (15%)
- You will present one of the papers discussed in the seminars. The task is to introduce the paper and lead the discussion, not to give a formal presentation: briefly summarize the paper (~2 min), discuss the main points it makes, bring up parts that are difficult to understand, and initiate a discussion by proposing themes to discuss. In addition, you shall take an active part in the discussion of all other papers discussed in the seminars. The seminars have obligatory attendance; if you miss a seminar, you have to write a short report instead. This assignment is not graded and does not qualify for distinction.
- Project proposal (15%)
- You will put together a research proposal consisting of two parts, using an adapted version of the Swedish Research Council's guidelines for research plans. The major part is a 3-page scientific proposal describing the project you are going to work on for the rest of the course. In addition, you should write a short popular science abstract of at most 2000 characters, describing your proposal in such a way that it is accessible to the general public. See further instructions.
You will also give a short presentation of the proposal in a seminar (8 minutes with slides, plus time for questions and discussion). The deadline for the written proposal is October 6, and the seminars will take place October 11.
- Review of term papers (15%)
- You will review two term papers written by your course mates. You will use a set of guidelines which will be specified later. You will receive the papers on December 14 and the reviews are due December 22.
- Term paper (40%)
- You will report your project in a paper following the guidelines of Transactions of the Association for Computational Linguistics (except that the page limit for your papers is 4-7 pages + references). The term paper should be written as a standard computational linguistics paper in TACL, and contain research questions, an introduction motivating and describing the work, an overview of related work and a description of how it relates to your work, a description of your experiments, a presentation and analysis of the results, and a conclusion.
The deadline is December 13 for the first version and January 13 for the revised final version. The first version should be a complete version of the project, without any missing sections or parts, which should then be revised, taking review comments into account, for the final version. On January 12, you should give an oral presentation of the paper. As part of your work on the project, it is obligatory to attend the progress report seminars as well as the final workshop. If you miss a seminar, you have to write a short report instead.
The main grade for the term paper is based on the final report, but we also take into account the first version, how well you incorporate the review comments, and your oral presentation.
Note that you will also practice writing and presenting for different audiences during the course. The final paper and presentation are targeted at experts in language technology (but not necessarily experts in your particular research theme). Your scientific proposal and presentation are targeted at academics, not necessarily in language technology but potentially in neighboring fields such as linguistics or computer science (which typically make up reviewing boards at agencies such as the Swedish Research Council). Your popular science abstract is targeted at the general public, and should not require any prior knowledge of language technology to understand.
All assignments you hand in during the course are individual. We welcome discussions between members of each theme (and between the themes as well), but the final reports you hand in should be your own work, written in your own words. You should use community standards for citing work that is related to your work. Note that you should always write about other work in your own words; changing a few words in each sentence from a paper you read is not acceptable. You should also remember to give credit for images (and only reproduce images if they are published under a permissive license like Creative Commons) and code. If you use or build on code by someone else, you should clearly state that in your report, and provide an appropriate citation and/or link to the code.
Submitting and Reviewing Term papers
We plan on using a real conference management system for submission and review of papers. Detailed instructions will be provided.
Final Seminar/Workshop
The final seminar will be organized as a workshop with term paper presentations. The plan is for the final workshop to be on campus only. Detailed instructions will be provided before the seminar.
Research Groups
Your first task in the course is to state a wish for which research topic to work on. Send a ranked list of your preferences for the three topics by email to Sara, at the latest on Thursday September 1, at 13:00. You may indicate if your preference for your first choice is a very strong one. If you fail to make a wish by this deadline, you will be arbitrarily assigned to a topic. We will try our best to respect everyone's (strong) wishes, but if that turns out not to be possible, we will resort to random decisions.
| Groups | Members | Papers |
|---|---|---|
| Digging the past - Bea | Iliana | Sep 7: Piotrowski, Chap 3, 2012 |
| | Matilde | Sep 7: Piotrowski, Chap 6, 2012 |
| | Yaru | Sep 7: Van Strien et al., 2020 |
| | Zhaorui | Sep 19: Piotrowski, Chap 7, 2012 |
| | Ebba | Sep 19: Bollmann, 2019 |
| | Félicien | Sep 19: Hedderich et al., 2021 |
| | (Bea) | Sep 27: Hammond et al., 2013 |
| | Yahui | Sep 27: Hovy and Lavid, 2010 |
| | Mathias | Sep 27: Peng et al., 2021 |
| Low-resource languages - Meriem | Kristóf | Sep 7: Hedderich et al., 2021 |
| | Hengyu | Sep 7: Kann et al., 2019 |
| | Micaella | Sep 7: Chapter 2 from B.P. King, 2015 |
| | Jie | Sep 19: Zoph et al., 2016 |
| | Aleksandra | Sep 19: Gupta et al., 2018 |
| | Moa | Sep 19: Li et al., 2020 |
| | Ingrid | Sep 27: Cruz and Cheng, 2019 |
| | Alex | Sep 27: Yang et al., 2022 |
| | (Meriem) | Sep 27: Abdaoui et al., 2021 |
| Neural networks and language - Johan | Maya | Sep 7: Linzen and Baroni, 2021 |
| | Yiu Kei | Sep 7: Van Schijndel, Mueller and Linzen, 2019 |
| | Ali | Sep 7: McCoy, Frank and Linzen, 2020 |
| | Yini | Sep 19: Marvin and Linzen, 2018 |
| | Björn | Sep 19: Warstadt, Singh and Bowman, 2019 |
| | Nicole | Sep 19: McCoy et al., 2021 |
| | Agnieszka | Sep 27: Lakretz, Dehaene and King, 2020 |
| | Thea | Sep 27: Sinha et al., 2020 |
| | (Johan) | Sep 27: Jaeger and Buz, 2017 |
Computational resources
Those who need access to a cluster for their computational needs will get access to the Snowy cluster at UPPMAX.
In order to use the UPPMAX cluster, you will first have to apply for an account. You should then apply to two projects:
- UPPMAX 2022/2-17: This is a course project, and it gives you access to storage under the folder /proj/uppmax2022-2-17. You can create a personal folder there and store your data. Please remember that this is a shared space, though, so try to remove files that are no longer needed.
- UPPMAX 2020/2-2: Use this project when you run GPU jobs, since its members get priority in the GPU queue. For CPU jobs, please use the course project.
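For example, after logging in you might set up and maintain a personal folder like this (a minimal sketch; `myname` and the file names are placeholders):

```bash
# Create a personal folder in the shared course storage
# ("myname" is a placeholder for your own user name).
mkdir -p /proj/uppmax2022-2-17/myname

# Copy data there ("mydata.txt" is a placeholder)...
cp mydata.txt /proj/uppmax2022-2-17/myname/

# ...and remove files you no longer need, since the space is shared.
rm /proj/uppmax2022-2-17/myname/old-run-output.txt
```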
Information about using Snowy is available here. Note that you log in to Rackham. You can only run light jobs directly on Rackham (like copying files, looking at files, etc.). In order to run heavy jobs, you need to write a Slurm script and execute it on Snowy. See the UPPMAX SLURM user guide to learn more. Here is an example Slurm script, from last year's MT course.
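As an illustration, here is a minimal sketch of what a batch script for Snowy might look like (the account strings, partition, module version, and file names below are assumptions; check the SLURM user guide for the exact flags your job needs):

```bash
#!/bin/bash
#SBATCH -A uppmax2022-2-17    # project to charge (assumed to match the course project above)
#SBATCH -M snowy              # route the job to Snowy; you submit it from Rackham
#SBATCH -p core -n 4          # 4 cores on the shared "core" partition
#SBATCH -t 02:00:00           # wall-clock time limit
#SBATCH -J my-experiment      # job name (placeholder)
# For GPU jobs, charge the GPU project and request a GPU instead, e.g.:
#   #SBATCH -A uppmax2020-2-2
#   #SBATCH --gres=gpu:1

# Load the software the job needs, then run it (module version and
# script name are placeholders).
module load python/3.9.5
python run_experiment.py --data /proj/uppmax2022-2-17/myname
```

You would submit the script from Rackham with `sbatch myscript.sh` and check its status with `squeue -M snowy -u $USER` (or UPPMAX's `jobinfo` tool).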
This year we will also have a lecture with UPPMAX staff, including a visit to the server hall.
Deadlines
Here is a summary of all deadlines in the course.
| Task | Deadline | Extra deadline |
|---|---|---|
Choose your preferred topics | September 1, 13:00 | - |
Take home exam | September 14 | November 11 |
Literature seminars | In class | October 21 |
Project proposal | October 6 | November 4 |
Present project proposal | October 11 | By agreement |
First version of project report | December 13 | January 13 |
Reviews on peer's project papers | December 22 | January 20 (February 17) |
Final seminar | January 12 | By agreement |
Final project report | January 13 | February 17 |
All deadlines are at 23.59 on the respective date unless otherwise noted.
Note that it is important for you to finish the course on time, since it is a requirement for starting your master thesis. So try to avoid resorting to the backup deadlines, since it will likely mean you cannot finish the course on time!
Reading
Science and Research
- Okasha, S. (2002) Philosophy of Science: A Very Short Introduction. Oxford University Press. Obligatory. Either the first or the second edition can be used.
- Zobel, J. (2004) Writing for Computer Science. Second Edition. Springer.
- Bender, E. M., Gebru, T., McMillan-Major, A. and Shmitchell, S. (2021) On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? FAccT'21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency
- Cunningham, H. (1999) A definition and short history of Language Engineering. Natural Language Engineering 5 (1), 1-16.
- Hovy, D. and Spruit, S. L. (2016) The Social Impact of Natural Language Processing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 591-598.
- Lee, L. (2004) "I'm sorry Dave, I'm afraid I can't do that": Linguistics, Statistics, and Natural Language Processing circa 2001. In Computer Science: Reflections on the Field, Reflections from the Field, 111-118.
Digging the past
- Gustavo Aguilar, Sudipta Kar, Thamar Solorio (2020) LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation. In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 1803–1813
- Yuri Bizzoni, Stefania Degaetano-Ortlieb, Peter Fankhauser, and Elke Teich (2020) Linguistic Variation and Change in 250 Years of English Scientific Writing: A Data-Driven Approach. Front. Artif. Intell.
- Marcel Bollmann (2013) POS Tagging for Historical Texts with Sparse Training Data. In Proceedings of the 7th Linguistic Annotation Workshop & Interoperability with Discourse, pages 11–18, Sofia, Bulgaria, August 8-9, 2013. Association for Computational Linguistics.
- Marcel Bollmann, Florian Petran, and Stefanie Dipper (2011) Rule-Based Normalization of Historical Texts. In Proceedings of the International Workshop on Language Technologies for Digital Humanities and Cultural Heritage, pages 34–42, Hissar, Bulgaria.
- Marcel Bollmann, Florian Petran, Stefanie Dipper, Julia Krasselt. (2014) CorA: A web-based annotation tool for historical and other non-standard language data. Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH) @ EACL 2014, pages 86–90, Gothenburg, Sweden, April 26 2014.
- Marcel Bollmann (2019) A Large-Scale Comparison of Historical Text Normalization Systems. In Proceedings of NAACL-HLT 2019, pages 3885–3898, Minneapolis, Minnesota, June 2 - June 7, 2019. [SEM 2]
- Claire Bowern (2019) Semantic change and semantic stability: Variation is key. In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, pages 48–55, Florence, Italy, August 2, 2019. Association for Computational Linguistics.
- Julian Brooke, Adam Hammond and Graeme Hirst. (2015) GutenTag: An NLP-driven Tool for Digital Humanities Research in the Project Gutenberg Corpus. In Proceedings of NAACL-HLT Fourth Workshop on Computational Linguistics for Literature, pages 42–47.
- Mark Davies. 2012. Expanding Horizons in Historical Linguistics with the 400-Million Word Corpus of Historical American English. Corpora, 7(2):121–157.
- Rob van der Goot and Özlem Çetinoğlu (2021) Lexical Normalization for Code-switched Data and its Effect on POS Tagging. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics.
- Adam Hammond, Julian Brooke, and Graeme Hirst (2013). A Tale of Two Cultures: Bringing Literary Analysis and Computational Linguistics Together. In Proceedings of the Second Workshop on Computational Linguistics for Literature, pages 1–8. [SEM 3]
- Christian Hardmeier (2016) A Neural Model for Part-of-Speech Tagging in Historical Texts. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 922–931, Osaka, Japan, December 11-17 2016.
- Michael A. Hedderich, Lukas Lange, Heike Adel, Jannik Strötgen, and Dietrich Klakow (2021) A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2545–2568, June 6–11, 2021. [SEM 2]
- Mark J Hill and Simon Hengchen. 2019. Quantifying the impact of dirty OCR on historical text analysis: Eighteenth Century Collections Online as a case study. Digital Scholarship in the Humanities, 34(4):825–843.
- Eduard Hovy and Julia Lavid (2010). Towards a ‘Science’ of Corpus Annotation: A New Methodological Challenge for Corpus Linguistics. International Journal of Translation Vol. 22, No. 1, Jan-Jun 2010 [SEM 3]
- Gerhard Jäger (2018) Computational historical linguistics. De Gruyter
- Natalia Korchagina (2017) Normalizing Medieval German Texts: from rules to deep learning. In Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language.
- Andrey Kutuzov, Lilja Øvrelid, Terrence Szymanski, and Erik Velldal. 2018. Diachronic word embeddings and semantic shifts: A survey. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1384–1397, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
- Xutan Peng, Yi Zheng, Chenghua Lin and Advaith Siddharthan. (2021) Summarising Historical Text in Modern Languages. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics. [SEM 3]
- Eva Pettersson, Beáta Megyesi, and Joakim Nivre (2014) A Multilingual Evaluation of Three Spelling Normalisation Methods for Historical Text. In Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH) @ EACL 2014, pages 32–41, Gothenburg, Sweden, April 26 2014.
- Michael Piotrowski (2012) Natural Language Processing for Historical Texts, chapter 3: Spelling in Historical Texts (freely accessible in digital format when logged in at the department) [SEM 1]
- Michael Piotrowski (2012) Natural Language Processing for Historical Texts, chapter 6: Handling Spelling Variation (freely accessible in digital format when logged in at the department) [SEM 1]
- Michael Piotrowski (2012) Natural Language Processing for Historical Texts, chapter 7: NLP Tools for Historical Languages (freely accessible in digital format when logged in at the department) [SEM 2]
- Cristina Sánchez-Marco, Gemma Boleda, and Lluís Padró (2011) Extending the tool, or how to annotate historical language varieties. In Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pages 1–9, Portland, OR, USA, 24 June 2011. Association for Computational Linguistics. [https://www.aclweb.org/anthology/W11-1501/]
- Martin Schmitt and Hinrich Schütze (2021) Language Models for Lexical Inference in Context. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics.
- Daniel van Strien, Kaspar Beelen, Mariona Coll Ardanuy, Kasra Hosseini, Barbara McGillivray, and Giovanni Colavizza. (2020) Assessing the impact of OCR quality on downstream NLP tasks. In Proceedings of ICAART (1), pages 484–496. [SEM 1]
Low-resource languages
- Amine Abdaoui, Mohamed Berrimi, Mourad Oussalah, and Abdelouahab Moussaoui (2021) DziriBERT: a Pre-trained Language Model for the Algerian Dialect. arXiv preprint. [SEM 3]
- Jan Christian Blaise Bombio Cruz and Charibeth Cheng (2019) Evaluating Language Model Finetuning Techniques for Low-resource Languages. arXiv preprint. [SEM 3]
- Rahul Gupta, Saurabh Sahu, Carol Espy-Wilson, and Shrikanth Narayanan (2018) Semi-supervised and Transfer Learning Approaches for Low Resource Sentiment Classification. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5109–5113. [SEM 2]
- Michael A. Hedderich, Lukas Lange, Heike Adel, Jannik Strötgen, and Dietrich Klakow (2021) A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 2545–2568. [SEM 1]
- Katharina Kann, Kyunghyun Cho, and Samuel R. Bowman (2019) Towards Realistic Practices In Low-Resource Natural Language Processing: The Development Set. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 3342–3349. [SEM 1]
- Benjamin King (2015) Practical Natural Language Processing for Low-Resource Languages. PhD thesis, University of Michigan. [SEM 1]
- Xiuhong Li, Zhe Li, Jiabao Sheng, and Wushour Slamu (2020) Low-Resource Text Classification via Cross-lingual Language Model Fine-tuning. In Proceedings of the 19th Chinese National Conference on Computational Linguistics, pp. 994–1005. [SEM 2]
- Ziqing Yang, Zihang Xu, Yiming Cui, Baoxin Wang, Min Lin, Dayong Wu, and Zhigang Chen (2022) CINO: A Chinese Minority Pre-trained Language Model. arXiv preprint. [SEM 3]
- Zihan Wang, Karthikeyan K, Stephen Mayhew, and Dan Roth (2020) Extending Multilingual BERT to Low-Resource Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 2649–2656.
- Barret Zoph, Deniz Yuret, Jonathan May, and Kevin Knight (2016) Transfer Learning for Low-Resource Neural Machine Translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 1568–1575. [SEM 2]
Neural networks and language
- Abdou, M., V. Ravishankar, A. Kulmizev, and A. Søgaard (2022). Word order does matter and shuffled language models know it. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 6907–6919.
- Brouwer, H., F. Delogu, N. J. Venhuizen, and M. W. Crocker (2021). Neurobehavioral correlates of surprisal in language comprehension: A neurocomputational model. Frontiers in Psychology 12, 615538.
- Chaves, R. P. (2020). What don’t RNN language models learn about filler-gap dependencies? Proceedings of the Society for Computation in Linguistics 3 (1), 20–30.
- Church, K. (2011). A pendulum swung too far. Linguistic Issues in Language Technology — LiLT 6.
- Gibson, E. and J. Thomas (1999). Memory limitations and structural forgetting: The perception of complex ungrammatical sentences as grammatical. Language and Cognitive Processes 14 (3), 225–248.
- Gupta, A., G. Kvernadze, and V. Srikumar (2021). BERT & family eat word salad: Experiments with text understanding. In Proceedings of the AAAI Conference on Artificial Intelligence, Volume 35, pp. 12946–12954.
- Harmon, Z. and V. Kapatsinski (2021). A theory of repetition and retrieval in language production. Psychological Review 128 (6), 1112.
- Jacobs, C. L. (2021). Quantifying context with and without statistical language models. Handbook of Cognitive Mathematics, 1–29.
- Jaeger, T. F. and E. Buz (2017). Signal reduction and linguistic encoding. The handbook of psycholinguistics, 38–81. [SEM 3]
- Lakretz, Y., S. Dehaene, and J.-R. King (2020). What limits our capacity to process nested long-range dependencies in sentence comprehension? Entropy 22 (4), 446. [SEM 3]
- Linzen, T. (2019). What can linguistics and deep learning contribute to each other? Response to Pater. Language 95 (1), e99–e108.
- Linzen, T. and M. Baroni (2021). Syntactic structure from deep learning. Annual Review of Linguistics 7, 195–212. [SEM 1]
- Marvin, R. and T. Linzen (2018). Targeted syntactic evaluation of language models. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, pp. 1192–1202. Association for Computational Linguistics. [SEM 2]
- McCoy, R. T., R. Frank, and T. Linzen (2020). Does syntax need to grow on trees? Sources of hierarchical inductive bias in sequence-to-sequence networks. Transactions of the Association for Computational Linguistics 8, 125–140. [SEM 1]
- McCoy, R. T., P. Smolensky, T. Linzen, J. Gao, and A. Celikyilmaz (2021). How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN. arXiv preprint arXiv:2111.09509 [SEM 2]
- Niu, J. and G. Penn (2020). Grammaticality and language modelling. In Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems, pp. 110–119.
- Sinha, K., P. Parthasarathi, J. Pineau, and A. Williams (2020). Unnatural language inference. arXiv preprint arXiv:2101.00010 [SEM 3]
- Van Schijndel, M. and T. Linzen (2018). Modeling garden path effects without explicit hierarchical syntax. In Proceedings of the 40th Annual Conference of the Cognitive Science Society, ed. T Rogers, M Rau, J Zhu, C Kalish, pp. 2603–8. Austin, TX: Cogn. Sci. Soc.
- Van Schijndel, M. and T. Linzen (2021). Single-stage prediction models do not explain the magnitude of syntactic disambiguation difficulty. Cognitive Science 45 (6), e12988.
- Van Schijndel, M., A. Mueller, and T. Linzen (2019). Quantity doesn’t buy quality syntax with neural language models. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 5831–5837. [SEM 1]
- Warstadt, A., A. Singh, and S. R. Bowman (2019). Neural network acceptability judgments. Transactions of the Association for Computational Linguistics 7, 625–641. [SEM 2]
- Wilcox, E. G., J. Gauthier, J. Hu, P. Qian, and R. Levy (2020). On the predictive power of neural language models for human real-time comprehension behavior. arXiv preprint arXiv:2006.01912