Language Technology: Research and Development
Note that this page has been migrated from a previous server, so there is a risk that not all links work correctly.
Credits: 15 hp
Syllabus: 5LN714
Staff
Course coordinator and examiner: Sara Stymne
Lectures: Sara Stymne
Group leaders: Meriem Beloucif, Beáta Megyesi, Johan Sjons
News
- 230109: The final presentation schedule is available
- 221007: Note that there are a couple of schedule changes for the November lectures
- 220627: First preliminary version of the course web page
General information
This page contains the general information for the course Language Technology: Research and Development, autumn 2022. Course information is available here, and slides will be posted in the annotated schedule available here. We will also use the Studium system for handing in assignments, keeping track of your progress, posting Zoom links, and similar information.
Schedule
Preliminary schedule.
| | Date | Time | Room | Content | Reading |
|---|---|---|---|---|---|
| L1 | 30/8 | 10-12 | Zoom and 22-1017 | Introduction, Digging, NN & Language, Low-resource | |
| L2 | 5/9 | 8-10 | 22-1017 | Science, research, and development | Okasha, Cunningham, Lee |
| L3 | 6/9 | 10-12 | Chomsky+Turing | Science, research, and development 2: debate session | Okasha, Hovy and Spruit |
| S1 | 7/9 | 15-17 | 2-0025 (lrl), 2-0026 (nnl), 2-0027 (dig) | Seminar - research papers | |
| Lx | 13/9 | 13-15 | Ångström: Polhemssalen 10134 | UPPMAX lecture | |
| S2 | 19/9 | 13-15 | 2-0023 (lrl), 2-0028 (nnl), 2-0025 (dig) | Seminar - research papers | |
| L4 | 23/9 | 8-10 | 22-1017 | R&D projects - from proposal to implementation | Zobel 10-11, 13 |
| S3 | 27/9 | 10-12 | 22-1008 (lrl), 2-K1072 (nnl), 22-0025 (dig) | Seminar - research papers | |
| S4 | 11/10 | 13-16 | Blåsenhus 12:232 (lrl), Blåsenhus 12:234 (nnl), Blåsenhus 13:132 (dig) | Seminar - project proposals | |
| S5 | 26/10 | 10-12 | 2-0028 (lrl), 2-0026 (nnl), 2-0025 (dig) | Seminar - progress report | |
| L5 | 3/11 | 10-12 | Blåsenhus 21:136 | Dissemination of research results | Zobel 1-9, 14 |
| S6 | 9/11 | 10-12 | 22-1008 (lrl), 9-3068 (nnl), 22-1005 (dig) | Seminar - progress report, theme: ethics | Hovy and Spruit; Bender et al. |
| Lab | 16/11 | 10-12 | Chomsky+Turing | LaTeX tutorial | |
| S8 | 23/11 | 10-12 | 22-1008 (lrl), 22-1005 (nnl), 9-3068 (dig) | Seminar - progress report | |
| L6 | 29/11 | 10-12 | Blåsenhus: Bertil Hammer (24:K104) | Review of scientific articles | Zobel 14 |
| S9 | 7/12 | 10-12 | 22-1008 (lrl), 9-3068 (nnl), 22-1005 (dig) | Seminar - progress report | |
| FS | 12/1 | 8-16 | 7-0042, 7-0043 | Final workshop - term paper presentations | |
| | 12/1 | 16- | ? | Social event? | |
All lectures will be given by Sara. The seminars will be led by the seminar leader for each research group. Note that attendance is obligatory at all seminars. The course is campus-based.
Content
The course gives a theoretical and practical introduction to research and development in language technology. The theoretical part covers basic philosophy of science, research methods in language technology, project planning, and writing and reviewing of scientific papers. The practical part consists of a small project within a research area common to a subgroup of course participants, including a state-of-the-art survey in a reading group, the planning and implementation of a research task, and the writing of a paper according to the standards for scientific publications in language technology. The research areas, with teachers, for 2022 are:
- Digging the past: Digital Philology and the Analysis of Historical Sources (dig) - Beáta Megyesi
- Low-resource languages (lrl) - Meriem Beloucif
- Neural networks and language (nnl) - Johan Sjons
Examination
The course is examined by means of five assignments with different weights (see below). In order to pass the course, a student must pass each one of these. In order to pass the course with distinction, a student must pass at least 50% of the weighted graded assignments with distinction.
Assignments
- Take home exam on philosophy of science (15%)
- This assignment will be based on your reading of Okasha's book. You will be asked to discuss issues in the philosophy of science and (sometimes) relate them to the area of language technology. The questions will be handed out September 8, and the report should be handed in September 14.
- Research paper presentation and discussion (15%)
- You will present one of the papers discussed in the seminars. The task is to introduce the paper and lead the discussion, not to give a formal presentation: briefly summarize the paper (~2 min), discuss the main points it makes, bring up parts that are difficult to understand, and initiate a discussion by proposing themes to discuss. In addition, you shall take an active part in the discussion of all other papers discussed in the seminars. The seminars have obligatory attendance; if you miss a seminar, you have to write a short report instead. This assignment is not graded and does not qualify for distinction.
- Project proposal (15%)
- You will put together a research proposal consisting of two parts, using an adapted version of the Swedish Research Council's guidelines for research plans. The major part is a 3-page scientific proposal describing the project you are going to work on for the rest of the course. In addition, you should write a short popular science abstract of at most 2000 characters, describing your proposal in such a way that it is accessible to the general public. See further instructions.
You will also give a short presentation of the proposal in a seminar (8 minutes with slides, plus time for questions and discussion). The deadline for the written proposal is October 6, and the seminars will take place October 11.
- Review of term papers (15%)
- You will review two term papers written by your course mates. You will use a set of guidelines which will be specified later. You will receive the papers on December 14 and the reviews are due December 22.
- Term paper (40%)
- You will report your project in a paper following the guidelines of Transactions of the Association for Computational Linguistics (except that the page limit for your papers is 4-7 pages + references). The term paper should be written as a standard computational linguistics paper in TACL, and contain research questions, an introduction motivating and describing the work, an overview of related work and a description of how it relates to your work, a description of your experiments, a presentation and analysis of the results, and a conclusion.
The deadline is December 13 for the first version and January 13 for the revised final version. The first version should be a complete version of the project, without any missing sections or parts, which should then be revised, taking review comments into account, for the final version. On January 12, you should give an oral presentation of the paper. As part of your work on the project, it is obligatory to attend the progress report seminars as well as the final workshop. If you miss a seminar, you have to write a short report instead.
The main grade for the term paper is based on the final report, but we also take into account the first version, how well you incorporate the review comments, and your oral presentation.
Note that you will also practice writing and presenting for different audiences during the course. The final paper and presentation are targeted at experts in language technology (but not necessarily experts in your particular research theme). Your scientific proposal and presentation are targeted at academics, not necessarily in language technology but potentially in neighboring fields such as linguistics or computer science (which typically make up reviewing boards at agencies such as the Swedish Research Council). Your popular science abstract is targeted at the general public, and should not require any prior knowledge of language technology to understand.
All assignments you hand in during the course are individual. We welcome discussions between members of each theme (and between the themes as well), but the final reports you hand in should be your own work, written in your own words. You should use community standards for citing work that is related to your work. Note that you should always write about other work in your own words; changing a few words in each sentence from a paper you read is not acceptable. You should also remember to give credit for images (and only reproduce images if they are published under a permissive license like Creative Commons) and code. If you use or build on code by someone else, you should clearly state that in your report, and provide an appropriate citation and/or link to the code.
Submitting and Reviewing Term papers
We plan on using a real conference management system for submission and review of papers. Detailed instructions will be provided.
Final Seminar/Workshop
The final seminar will be organized as a workshop with term paper presentations. The plan is for the final workshop to be on campus only. Detailed instructions will be provided before the seminar.
Research Groups
Your first task in the course is to state a wish for which research topic to work on. Send a ranked list of your preferences for the three topics by email to Sara, at the latest on Thursday September 1, at 13:00. You may indicate if your preference for your first choice is a very strong one. If you fail to make a wish by this deadline, you will be arbitrarily assigned to a topic. We will try our best to respect everyone's (strong) wishes, but if that turns out not to be possible, we will resort to random decisions.
| Groups | Members | Papers |
|---|---|---|
| Digging the past - Bea | Iliana | Sep 7: Piotrowski, Chap 3, 2012 |
| | Matilde | Sep 7: Piotrowski, Chap 6, 2012 |
| | Yaru | Sep 7: Van Strien et al., 2020 |
| | Zhaorui | Sep 19: Piotrowski, Chap 7, 2012 |
| | Ebba | Sep 19: Bollmann, 2019 |
| | Félicien | Sep 19: Hedderich et al., 2021 |
| | (Bea) | Sep 27: Hammond et al., 2013 |
| | Yahui | Sep 27: Hovy and Lavid, 2010 |
| | Mathias | Sep 27: Peng et al., 2021 |
| Low-resource languages - Meriem | Kristóf | Sep 7: Hedderich et al., 2021 |
| | Hengyu | Sep 7: Kann et al., 2019 |
| | Micaella | Sep 7: Chapter 2 from B.P. King, 2015 |
| | Jie | Sep 19: Zoph et al., 2016 |
| | Aleksandra | Sep 19: Gupta et al., 2018 |
| | Moa | Sep 19: Li et al., 2020 |
| | Ingrid | Sep 27: Cruz and Cheng, 2019 |
| | Alex | Sep 27: Yang et al., 2022 |
| | (Meriem) | Sep 27: Abdaoui et al., 2021 |
| Neural networks and language - Johan | Maya | Sep 7: Linzen and Baroni, 2021 |
| | Yiu Kei | Sep 7: Van Schijndel, Mueller and Linzen, 2019 |
| | Ali | Sep 7: McCoy, Frank and Linzen, 2020 |
| | Yini | Sep 19: Marvin and Linzen, 2018 |
| | Björn | Sep 19: Warstadt, Singh and Bowman, 2019 |
| | Nicole | Sep 19: McCoy et al., 2021 |
| | Agnieszka | Sep 27: Lakretz, Dehaene and King, 2020 |
| | Thea | Sep 27: Sinha et al., 2020 |
| | (Johan) | Sep 27: Jaeger and Buz, 2017 |
Computational resources
Those who need access to a cluster for their computational needs will get access to the Snowy cluster at UPPMAX.
In order to use the UPPMAX cluster, you will first have to apply for an account. You should then apply to two projects:
- UPPMAX 2022/2-17: This is a course project, and it gives you access to storage under the folder /proj/uppmax2022-2-17. You can create a personal folder there and store your data. Please remember that this is a shared space, though, so try to remove files that are no longer needed.
- UPPMAX 2020/2-2: Use this project when you run GPU jobs, since its members get priority in the GPU queue. For CPU jobs, please use the course project.
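For example, after logging in you might set up and maintain a personal folder like this (a minimal sketch; `myname` and the file names are placeholders):

```bash
# Create a personal folder in the shared course storage
# ("myname" is a placeholder for your own user name).
mkdir -p /proj/uppmax2022-2-17/myname

# Copy data there ("mydata.txt" is a placeholder)...
cp mydata.txt /proj/uppmax2022-2-17/myname/

# ...and remove files you no longer need, since the space is shared.
rm /proj/uppmax2022-2-17/myname/old-run-output.txt
```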
Information about using Snowy is available here. Note that you log in to Rackham. You can only run light jobs directly on Rackham (like copying files, looking at files, etc.). In order to run heavy jobs, you need to write a Slurm script and execute it on Snowy. See the UPPMAX SLURM user guide to learn more. Here is an example Slurm script, from last year's MT course.
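As an illustration, here is a minimal sketch of what a batch script for Snowy might look like (the account strings, partition, module version, and file names below are assumptions; check the SLURM user guide for the exact flags your job needs):

```bash
#!/bin/bash
#SBATCH -A uppmax2022-2-17    # project to charge (assumed to match the course project above)
#SBATCH -M snowy              # route the job to Snowy; you submit it from Rackham
#SBATCH -p core -n 4          # 4 cores on the shared "core" partition
#SBATCH -t 02:00:00           # wall-clock time limit
#SBATCH -J my-experiment      # job name (placeholder)
# For GPU jobs, charge the GPU project and request a GPU instead, e.g.:
#   #SBATCH -A uppmax2020-2-2
#   #SBATCH --gres=gpu:1

# Load the software the job needs, then run it (module version and
# script name are placeholders).
module load python/3.9.5
python run_experiment.py --data /proj/uppmax2022-2-17/myname
```

You would submit the script from Rackham with `sbatch myscript.sh` and check its status with `squeue -M snowy -u $USER` (or UPPMAX's `jobinfo` tool).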
This year we will also have a lecture with UPPMAX staff, including a visit to the server hall.
Deadlines
Here is a summary of all deadlines in the course.
| Task | Deadline | Extra deadline |
|---|---|---|
Choose your preferred topics | September 1, 13:00 | - |
Take home exam | September 14 | November 11 |
Literature seminars | In class | October 21 |
Project proposal | October 6 | November 4 |
Present project proposal | October 11 | By agreement |
First version of project report | December 13 | January 13 |
Reviews on peer's project papers | December 22 | January 20 (February 17) |
Final seminar | January 12 | By agreement |
Final project report | January 13 | February 17 |
All deadlines are at 23.59 on the respective date unless otherwise noted.
Note that it is important for you to finish the course on time, since it is a requirement for starting your master thesis. So try to avoid resorting to the backup deadlines, since it will likely mean you cannot finish the course on time!
Reading
Science and Research
- Okasha, S. (2002) Philosophy of Science: A Very Short Introduction. Oxford University Press. Obligatory. Either the first or the second edition can be used.
- Zobel, J. (2004) Writing for Computer Science. Second Edition. Springer.
- Bender, E. M., Gebru, T., McMillan-Major, A. and Shmitchell, S. (2021) On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? FAccT'21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency
- Cunningham, H. (1999) A definition and short history of Language Engineering. Natural Language Engineering 5 (1), 1-16.
- Hovy, D. and Spruit, S. L. (2016) The Social Impact of Natural Language Processing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 591-598.
- Lee, L. (2004) "I'm sorry Dave, I'm afraid I can't do that": Linguistics, Statistics, and Natural Language Processing circa 2001. In Computer Science: Reflections on the Field, Reflections from the Field, 111-118.
Digging the past
- Gustavo Aguilar, Sudipta Kar, Thamar Solorio (2020) LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation. In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 1803–1813
- Yuri Bizzoni, Stefania Degaetano-Ortlieb, Peter Fankhauser, and Elke Teich (2020) Linguistic Variation and Change in 250 Years of English Scientific Writing: A Data-Driven Approach. Front. Artif. Intell.
- Marcel Bollmann (2013) POS Tagging for Historical Texts with Sparse Training Data. In Proceedings of the 7th Linguistic Annotation Workshop & Interoperability with Discourse, pages 11–18, Sofia, Bulgaria, August 8-9, 2013. Association for Computational Linguistics.
- Marcel Bollmann, Florian Petran, and Stefanie Dipper (2011) Rule-Based Normalization of Historical Texts. In Proceedings of the International Workshop on Language Technologies for Digital Humanities and Cultural Heritage, pages 34–42, Hissar, Bulgaria.
- Marcel Bollmann, Florian Petran, Stefanie Dipper, Julia Krasselt. (2014) CorA: A web-based annotation tool for historical and other non-standard language data. Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH) @ EACL 2014, pages 86–90, Gothenburg, Sweden, April 26 2014.
- Marcel Bollmann (2019) A Large-Scale Comparison of Historical Text Normalization Systems. In Proceedings of NAACL-HLT 2019, pages 3885–3898, Minneapolis, Minnesota, June 2 - June 7, 2019. [SEM 2]
- Claire Bowern (2019) Semantic change and semantic stability: Variation is key. In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, pages 48–55, Florence, Italy, August 2, 2019. Association for Computational Linguistics.
- Julian Brooke, Adam Hammond and Graeme Hirst. (2015) GutenTag: An NLP-driven Tool for Digital Humanities Research in the Project Gutenberg Corpus. In Proceedings of NAACL-HLT Fourth Workshop on Computational Linguistics for Literature, pages 42–47.
- Mark Davies. 2012. Expanding Horizons in Historical Linguistics with the 400-Million Word Corpus of Historical American English. Corpora, 7(2):121–157.
- Rob van der Goot and Özlem Çetinoğlu (2021) Lexical Normalization for Code-switched Data and its Effect on POS Tagging. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics.
- Adam Hammond, Julian Brooke, and Graeme Hirst (2013). A Tale of Two Cultures: Bringing Literary Analysis and Computational Linguistics Together. In Proceedings of the Second Workshop on Computational Linguistics for Literature, pages 1–8. [SEM 3]
- Christian Hardmeier (2016) A Neural Model for Part-of-Speech Tagging in Historical Texts. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 922–931, Osaka, Japan, December 11-17 2016.
- Michael A. Hedderich, Lukas Lange, Heike Adel, Jannik Strötgen, and Dietrich Klakow (2021) A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2545–2568, June 6–11, 2021. [SEM 2]
- Mark J Hill and Simon Hengchen. 2019. Quantifying the impact of dirty OCR on historical text analysis: Eighteenth Century Collections Online as a case study. Digital Scholarship in the Humanities, 34(4):825–843.
- Eduard Hovy and Julia Lavid (2010). Towards a ‘Science’ of Corpus Annotation: A New Methodological Challenge for Corpus Linguistics. International Journal of Translation Vol. 22, No. 1, Jan-Jun 2010 [SEM 3]
- Gerhard Jäger (2018) Computational historical linguistics. De Gruyter
- Natalia Korchagina (2017) Normalizing Medieval German Texts: from rules to deep learning. In Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language.
- Andrey Kutuzov, Lilja Øvrelid, Terrence Szymanski, and Erik Velldal. 2018. Diachronic word embeddings and semantic shifts: A survey. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1384–1397, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
- Xutan Peng, Yi Zheng, Chenghua Lin and Advaith Siddharthan. (2021) Summarising Historical Text in Modern Languages. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics. [SEM 3]
- Eva Pettersson, Beáta Megyesi, and Joakim Nivre (2014) A Multilingual Evaluation of Three Spelling Normalisation Methods for Historical Text. In Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH) @ EACL 2014, pages 32–41, Gothenburg, Sweden, April 26 2014.
- Michael Piotrowski (2012) Natural Language Processing for Historical Texts, chapter 3: Spelling in Historical Texts (freely accessible in digital format when logged in at the department) [SEM 1]
- Michael Piotrowski (2012) Natural Language Processing for Historical Texts, chapter 6: Handling Spelling Variation (freely accessible in digital format when logged in at the department) [SEM 1]
- Michael Piotrowski (2012) Natural Language Processing for Historical Texts, chapter 7: NLP Tools for Historical Languages (freely accessible in digital format when logged in at the department) [SEM 2]
- Cristina Sánchez-Marco, Gemma Boleda, and Lluís Padró (2011) Extending the tool, or how to annotate historical language varieties. In Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pages 1–9, Portland, OR, USA, 24 June 2011. Association for Computational Linguistics. [https://www.aclweb.org/anthology/W11-1501/]
- Martin Schmitt and Hinrich Schütze (2021) Language Models for Lexical Inference in Context. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics.
- Daniel van Strien, Kaspar Beelen, Mariona Coll Ardanuy, Kasra Hosseini, Barbara McGillivray, and Giovanni Colavizza. (2020) Assessing the impact of OCR quality on downstream NLP tasks. In Proceedings of ICAART (1), pages 484–496. [SEM 1]
Low-resource languages
- Amine Abdaoui, Mohamed Berrimi, Mourad Oussalah, and Abdelouahab Moussaoui (2021) DziriBERT: a Pre-trained Language Model for the Algerian Dialect. arXiv preprint. [SEM 3]
- Jan Christian Blaise Bombio Cruz and Charibeth Cheng (2019) Evaluating Language Model Finetuning Techniques for Low-resource Languages. arXiv preprint. [SEM 3]
- Rahul Gupta, Saurabh Sahu, Carol Espy-Wilson, and Shrikanth Narayanan (2018) Semi-supervised and Transfer Learning Approaches for Low Resource Sentiment Classification. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5109–5113. [SEM 2]
- Michael A. Hedderich, Lukas Lange, Heike Adel, Jannik Strötgen, and Dietrich Klakow (2021) A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 2545–2568. [SEM 1]
- Katharina Kann, Kyunghyun Cho, and Samuel R. Bowman (2019) Towards Realistic Practices In Low-Resource Natural Language Processing: The Development Set. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 3342–3349. [SEM 1]
- Benjamin King (2015) Practical Natural Language Processing for Low-Resource Languages. PhD thesis, University of Michigan. [SEM 1]
- Xiuhong Li, Zhe Li, Jiabao Sheng, and Wushour Slamu (2020) Low-Resource Text Classification via Cross-lingual Language Model Fine-tuning. In Proceedings of the 19th Chinese National Conference on Computational Linguistics, pp. 994–1005. [SEM 2]
- Ziqing Yang, Zihang Xu, Yiming Cui, Baoxin Wang, Min Lin, Dayong Wu, and Zhigang Chen (2022) CINO: A Chinese Minority Pre-trained Language Model. arXiv preprint. [SEM 3]
- Zihan Wang, Karthikeyan K, Stephen Mayhew, and Dan Roth (2020) Extending Multilingual BERT to Low-Resource Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 2649–2656.
- Barret Zoph, Deniz Yuret, Jonathan May, and Kevin Knight (2016) Transfer Learning for Low-Resource Neural Machine Translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 1568–1575. [SEM 2]
Neural networks and language
- Abdou, M., V. Ravishankar, A. Kulmizev, and A. Søgaard (2022). Word order does matter and shuffled language models know it. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 6907–6919.
- Brouwer, H., F. Delogu, N. J. Venhuizen, and M. W. Crocker (2021). Neurobehavioral correlates of surprisal in language comprehension: A neurocomputational model. Frontiers in Psychology 12, 615538.
- Chaves, R. P. (2020). What don’t RNN language models learn about filler-gap dependencies? Proceedings of the Society for Computation in Linguistics 3 (1), 20–30.
- Church, K. (2011). A pendulum swung too far. Linguistic Issues in Language Technology — LiLT 6.
- Gibson, E. and J. Thomas (1999). Memory limitations and structural forgetting: The perception of complex ungrammatical sentences as grammatical. Language and Cognitive Processes 14 (3), 225–248.
- Gupta, A., G. Kvernadze, and V. Srikumar (2021). BERT & family eat word salad: Experiments with text understanding. In Proceedings of the AAAI Conference on Artificial Intelligence, Volume 35, pp. 12946–12954.
- Harmon, Z. and V. Kapatsinski (2021). A theory of repetition and retrieval in language production. Psychological Review 128 (6), 1112.
- Jacobs, C. L. (2021). Quantifying context with and without statistical language models. Handbook of Cognitive Mathematics, 1–29.
- Jaeger, T. F. and E. Buz (2017). Signal reduction and linguistic encoding. The handbook of psycholinguistics, 38–81. [SEM 3]
- Lakretz, Y., S. Dehaene, and J.-R. King (2020). What limits our capacity to process nested long-range dependencies in sentence comprehension? Entropy 22 (4), 446. [SEM 3]
- Linzen, T. (2019). What can linguistics and deep learning contribute to each other? Response to Pater. Language 95 (1), e99–e108.
- Linzen, T. and M. Baroni (2021). Syntactic structure from deep learning. Annual Review of Linguistics 7, 195–212. [SEM 1]
- Marvin, R. and T. Linzen (2018). Targeted syntactic evaluation of language models. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, pp. 1192–1202. Association for Computational Linguistics. [SEM 2]
- McCoy, R. T., R. Frank, and T. Linzen (2020). Does syntax need to grow on trees? Sources of hierarchical inductive bias in sequence-to-sequence networks. Transactions of the Association for Computational Linguistics 8, 125–140. [SEM 1]
- McCoy, R. T., P. Smolensky, T. Linzen, J. Gao, and A. Celikyilmaz (2021). How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN. arXiv preprint arXiv:2111.09509 [SEM 2]
- Niu, J. and G. Penn (2020). Grammaticality and language modelling. In Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems, pp. 110–119.
- Sinha, K., P. Parthasarathi, J. Pineau, and A. Williams (2020). Unnatural language inference. arXiv preprint arXiv:2101.00010 [SEM 3]
- Van Schijndel, M. and T. Linzen (2018). Modeling garden path effects without explicit hierarchical syntax. In Proceedings of the 40th Annual Conference of the Cognitive Science Society, ed. T Rogers, M Rau, J Zhu, C Kalish, pp. 2603–8. Austin, TX: Cogn. Sci. Soc.
- Van Schijndel, M. and T. Linzen (2021). Single-stage prediction models do not explain the magnitude of syntactic disambiguation difficulty. Cognitive Science 45 (6), e12988.
- Van Schijndel, M., A. Mueller, and T. Linzen (2019). Quantity doesn’t buy quality syntax with neural language models. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 5831–5837. [SEM 1]
- Warstadt, A., A. Singh, and S. R. Bowman (2019). Neural network acceptability judgments. Transactions of the Association for Computational Linguistics 7, 625–641. [SEM 2]
- Wilcox, E. G., J. Gauthier, J. Hu, P. Qian, and R. Levy (2020). On the predictive power of neural language models for human real-time comprehension behavior. arXiv preprint arXiv:2006.01912