Language Technology: Research and Development
Note that this page has been migrated from a previous server, so some links may not work correctly.
Credits: 15 hp
Syllabus: 5LN714
Staff
Course coordinator and examiner: Sara Stymne
Teachers: Sara Stymne, Beáta Megyesi, Paola Merlo
Assistant: Samuel Douglas
Alumni guests: Allison Adams, Luise Dürlich, Elena Fano
News
- 220107: A preliminary schedule for the final workshop is now available. Note that it is still subject to change, since a few more students may need to present by Zoom, which will lead to adjustments.
- 211018: You can now find information about how to use computational resources at Uppmax for your project.
- 210928: Student papers from previous years are now available in Studium
- 210902: Note that the second lecture will also be online only due to health reasons. The same Zoom link as for the first lecture will be used.
- 210830: Note that the first lecture will be online only due to health reasons. If you did not receive the Zoom link by email, please contact Sara!
- 210624: First tentative and preliminary version of the course web page.
General information
This page contains the general information for the course Language Technology: Research and Development, autumn 2021. Course information is available here, and slides will be posted in the annotated schedule available here. We will also use the Studium system for handing in assignments, keeping track of your progress, posting Zoom links, and similar information.
Schedule
Preliminary schedule.
Session | Date | Time | Room | Content | Reading |
---|---|---|---|---|---|
L1 | 31/8 | 14-16 | Zoom | Introduction, Digging, Beyond, Xling | |
L2 | 3/9 | 10-12 | Zoom | Science, research, and development | Okasha, Cunningham, Lee |
L3 | 7/9 | 14-16 | 16-0042 (limited access by Zoom) | Science, research, and development 2: debate session | Okasha, Hovy and Spruit |
S1 | 10/9 | 10-12 | 9-1016 (xling), Zoom (btb) | Seminar - research papers | |
S1 | 14/9 | 10-12 | 7-0017 (dig) | Seminar - research papers | |
S2 | 20/9 | 10-12 | 7-1020 (dig), 7-1013 (xling), Zoom (btb) | Seminar - research papers | |
L4 | 23/9 | 10-12 | 2-0076 | R&D projects - from proposal to implementation | Zobel 10-11, 13 |
S3 | 27/9 | 10-12 | 7-1013 (dig), Zoom (btb) | Seminar - research papers | |
S3 | 28/9 | 10-12 | 9-1016 (xling) | Seminar - research papers | |
L5 | 30/9 | 10-12 | 1-0062 | Alumni lecture | Zobel 10-11, 13 |
S4 | 13/10 | 9-12 | 7-1013 (dig), 6-0022 (xling), Zoom (btb) | Seminar - project proposals | |
S5 | 25/10 | 10-12 | 7-1013 (dig), Zoom (btb) | Seminar - progress report | |
S5 | 26/10 | 13-15 | 2-0028 (xling) | Seminar - progress report | |
L6 | 3/11 | 10-12 | 22-1017 | Dissemination of research results | Zobel 1-9, 14 |
S6 | 10/11 | 10-12 | 2-0026 (dig), Zoom (xling) | Seminar - progress report, theme: ethics | Hovy and Spruit; Bender et al. |
S6 | 10/11 | 16-18 | Zoom (btb) | Seminar - progress report, theme: ethics | Hovy and Spruit; Bender et al. |
Lab | 23/11 | 14-16 | Chomsky+Turing | LaTeX tutorial | |
S8 | 24/11 | 10-12 | 2-0027 (dig), 2-0026 (xling) | Seminar - progress report | |
S8 | 24/11 | 16-18 | Zoom (btb) | Seminar - progress report | |
L7 | 1/12 | 10-12 | 16-2043 | Review of scientific articles | Zobel 14 |
S9 | 8/12 | 10-12 | Blåsenhus 21:237 (dig) | Seminar - progress report | |
S9 | 8/12 | 16-18 | Zoom (btb) | Seminar - progress report | |
S9 | 9/12 | 10-12 | 9-1017 (xling) | Seminar - progress report | |
FS | 13/1 | 8-16 | 6-1023 (Geigersalen), 7-0043 | Final workshop - term paper presentations | TBA |
All lectures will be given by Sara. The seminars will be led by the seminar leader for each research group. Note that attendance is obligatory at all seminars.
Teaching mode, Covid-related information
The aim is for the course to be mainly campus-based, if the situation permits. This may change on very short notice, though. We aim for the majority of lectures to be held on campus; if there is a need, due to special circumstances, we will also use Zoom during lectures. The seminar group led by Paola will mostly meet on Zoom. The other seminar groups will mainly meet on campus when possible. We will avoid hybrid seminars, since they have not worked well in the past, which means that some other seminars may also be held entirely online. For all seminars given remotely, we require that students have their cameras turned on.
Please respect the current regulations: stay home if you are not feeling well, and maintain social distancing! Note that this also applies to teachers, so any campus activities may be moved entirely online on short notice. Regulations may also change on short notice. Please always check Studium and your email before going to campus!
This information will be continually updated throughout the term.
Content
The course gives a theoretical and practical introduction to research and development in language technology. The theoretical part covers basic philosophy of science, research methods in language technology, project planning, and writing and reviewing of scientific papers. The practical part consists of a small project within a research area common to a subgroup of course participants, including a state-of-the-art survey in a reading group, the planning and implementation of a research task, and the writing of a paper according to the standards for scientific publications in language technology. The research areas, with teachers, for 2021 are:
- Digging the past: Digital Philology and the Analysis of Historical Sources (dig) - Beáta Megyesi
- Beyond the benchmarks: Linguistically-oriented analysis and generalisations in Neural Networks (btb) - Paola Merlo
- Cross-lingual natural language processing (xling) - Sara Stymne
Examination
The course is examined by means of five assignments with different weights (see below). In order to pass the course, a student must pass each of these. In order to pass the course with distinction, a student must pass at least 50% of the weighted graded assignments with distinction.
Assignments
- Take home exam on philosophy of science (15%)
- This assignment will be based on your reading of Okasha's book. You will be asked to discuss issues in the philosophy of science and (sometimes) relate them to the area of language technology. The questions will be handed out September 8, and the report should be handed in September 16.
- Research paper presentation and discussion (15%)
- You will present one of the papers discussed in the seminars. The task is to introduce the paper and lead the discussion, not to give a formal presentation: briefly summarize the paper (~2 minutes), discuss the main points being made, bring up parts that are difficult to understand, and initiate a discussion by proposing themes. In addition, you should take an active part in the discussion of all the other papers in the seminars. The seminars have obligatory attendance; if you miss a seminar, you have to write a short report instead. This assignment is not graded and does not qualify for distinction.
- Project proposal (15%)
- You will put together a research proposal consisting of two parts, using an adapted version of the Swedish Research Council's guidelines for research plans. The major part is a 3-page scientific proposal describing the project you will work on for the rest of the course. In addition, you should write a short popular science abstract (at most 2000 characters) that describes your proposal in a way that is accessible to the general public. You will also give a short presentation of the proposal in a seminar (8 minutes with slides, plus time for questions and discussion). The deadline for the written proposal is October 8, and the seminars will take place October 13.
- Review of term papers (15%)
- You will review two term papers written by your course mates. You will use a set of guidelines which will be specified later. You will receive the papers on December 14 and the reviews are due December 22.
- Term paper (40%)
- You will report your project in a paper following the guidelines of
Transactions of the Association for Computational Linguistics (except that the page limit for your papers is 4-7 pages + references).
The deadline is December 13 for the first version and January 14 for the revised version. On January 13, you will also give an oral presentation of the paper. As part of your work on the project, it is obligatory to attend the progress report seminars as well as the final workshop. If you miss a seminar, you have to write a short report instead.
Note that you will also practice writing and presenting for different audiences during the course. The final paper and presentation are targeted at experts in language technology (but not necessarily experts in your particular research theme). Your scientific proposal and its presentation are targeted at academics who are not necessarily in language technology, but potentially in neighboring fields such as linguistics or computer science (which typically make up reviewing boards at agencies such as the Swedish Research Council). Your popular science abstract is targeted at the general public and should not require any prior knowledge of language technology to understand.
All assignments you hand in during the course are individual. We welcome discussions between members of each theme (and between the themes as well), but the final reports you hand in should be your own work, written in your own words. You should use community standards for citing work that is related to your work. Note that you should always write about other work in your own words; changing a few words in each sentence from a paper you read is not acceptable. You should also remember to give credit to images (and only reproduce images if they are published under a permissive license like Creative Commons) and code. If you use or build on code by someone else, you should clearly state that in your reports.
Submitting and Reviewing Term papers
We will use EasyChair for submission and review of papers. Detailed instructions have been provided by email and on lecture slides.
Final Seminar/Workshop
The final seminar will be organized as a workshop with term paper presentations. The plan is for the final workshop to be on campus only (given that restrictions allow). If you are not able to attend on campus (e.g., due to medical conditions or travel bans), let Sara know beforehand, so that Zoom talks can be arranged in such cases. The time slot for each paper is 15 minutes, divided into 12 minutes of presentation and 3 minutes of discussion. The session chairs will enforce the times strictly.
Research Groups
Your first task in the course is to state your preference for which research topic to work on. Send a ranked list of your preferences for the three topics by email to Sara, at the latest Friday September 3, at 13.00. Please also specify whether you prefer to have the seminars online or on campus (or whether you are fine with either option). It is especially important that you let us know if you have an approved reason for following the teaching online (e.g., medical reasons or travel restrictions stopping you from travelling to Uppsala). You may also indicate if your first choice is a very strong preference. If you fail to state a preference by this deadline, you will be arbitrarily assigned to a topic. We will try our best to respect everyone's wishes, but if that turns out not to be possible, we will resort to random decisions. This applies to both topic choice and campus/online preference (unless you have an approved reason for being online).
Groups | Members | Papers |
---|---|---|
Digging the past - Bea | Emma | Sep 14: Piotrowski, Chap 3, 2012 |
 | Jae Eun | Sep 14: Piotrowski, Chap 6, 2012 |
 | Flor | Sep 14: Van Strien et al., 2020 |
 | Claire | Sep 20: Piotrowski, Chap 7, 2012 |
 | Ziming | Sep 20: Hedderich et al., 2021 |
 | Martina | Sep 20: Hedderich et al., 2021 |
 | Laura | Sep 20: Bollmann, 2014 |
 | Chen | Sep 27: Hammond et al., 2013 |
 | Kai-Yan | Sep 27: Hovy and Lavid, 2010 |
 | Nikolina | Sep 27: Peng et al., 2021 |
Beyond the benchmarks - Paola | Paloma | Sep 10: Baroni, 2021 |
 | Eirini | Sep 10: Linzen and Baroni, 2020 |
 | Klaudia | Sep 10: Linzen, 2021 |
 | Jiayi | Sep 20: Futrell et al., 2019 |
 | Viktorija | Sep 20: Kann et al., 2019 |
 | Lingqing | Sep 20: Gulordava et al., 2018 |
 | Eva | Sep 27: Rodriguez and Merlo, 2020 |
 | Yongchao | Sep 27: Thrush et al., 2020 |
 | Chuchu | Sep 27: Thrush et al., 2020 |
 | Justyna | Sep 27: Wilcox et al., 2018 |
Cross-lingual NLP - Sara | Kätriin | Sep 10: Yarowsky et al., 2001 |
 | Marek | Sep 10: Artetxe et al., 2020 |
 | James | Sep 10: Wu and Dredze, 2019 |
 | Rafal | Sep 20: Smith et al., 2018 |
 | Kris | Sep 20: Kondratyuk and Straka, 2019 |
 | Yifan | Sep 20: Üstün et al., 2020 |
 | Zhe | Sep 28: Turc et al., 2021 |
 | Oreen | Sep 28: Chaudhary et al., 2019 |
 | Angeliki | Sep 28: Lin et al., 2019 |
 | Siyi | Sep 28: Lin et al., 2019 |
Computational resources
If you need access to a cluster for your computational needs, you will get access to the Snowy cluster at UPPMAX.
In order to use the UPPMAX cluster, you will first have to apply for an account. You should then apply to two projects:
- UPPMAX 2021/2-13: This is a course project, and it gives you access to storage under the folder /proj/uppmax2021-2-13. You can create a personal folder there and store your data. Please remember that this is a shared space, though, so remove files that are no longer needed.
- UPPMAX 2020/2-2: Use this project when you run your jobs, since its members get priority in the queue.
Information about using Snowy is available here. Note that you log in to Rackham. You can only run light jobs directly on Rackham (like copying files, looking at files, etc.). In order to run heavy jobs, you need to write a Slurm script and execute it on Snowy. See the UPPMAX SLURM user guide to learn more about it. Here is an example Slurm script, from last year's MT course.
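As a rough illustration, a minimal Slurm script for Snowy might look like the sketch below. The core count, time limit, module version, and the training script and data path are placeholder assumptions; adapt them to your own project and check the UPPMAX user guide for details.

```shell
#!/bin/bash
#SBATCH -A uppmax2020-2-2      # run under the priority project listed above
#SBATCH -M snowy               # send the job to the Snowy cluster
#SBATCH -p core -n 4           # request 4 cores
#SBATCH -t 04:00:00            # wall-clock time limit (hh:mm:ss)
#SBATCH -J my-experiment       # job name (placeholder)

# Load the software you need via the module system
# (module name and version are examples).
module load python/3.9.5

# Run the actual job; train.py and the data path are placeholders.
python train.py --data /proj/uppmax2021-2-13/yourname/data
```

You would then submit the script from a Rackham login node with something like `sbatch myjob.sh` and monitor it with `squeue -M snowy -u $USER`.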
Deadlines
Here is a summary of all deadlines in the course.
Task | Deadline | Extra deadline |
---|---|---|
Choose your preferred topics | September 3, 13:00 | - |
Take home exam | September 16 | November 12 |
Project proposal | October 8 | November 5 |
Present project proposal | October 13 | By agreement |
First version of project report | December 13 | January 14 |
Reviews on peer's project papers | December 22 | February 18 |
Final seminar | January 13 | By agreement |
Final project report | January 14 | February 18 |
All deadlines are at 23.59 on the respective date unless otherwise noted.
Note that it is important for you to finish the course on time, since it is a requirement for starting your master thesis. So try to avoid resorting to the backup deadlines, since it will likely mean you cannot finish the course on time!
Reading
Science and Research
- Okasha, S. (2002) Philosophy of Science: A Very Short Introduction. Oxford University Press. Chapters 1-3 and 5. (Obligatory)
- Zobel, J. (2004) Writing for Computer Science. Second Edition. Springer.
- Bender, E. M., Gebru, T., McMillan-Major, A. and Shmitchell, S. (2021) On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In FAccT '21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency.
- Cunningham, H. (1999) A definition and short history of Language Engineering. Natural Language Engineering 5 (1), 1-16.
- Hovy, D. and Spruit, S. L. (2016) The Social Impact of Natural Language Processing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 591-598.
- Lee, L. (2004) "I'm sorry Dave, I'm afraid I can't do that": Linguistics, Statistics, and Natural Language Processing circa 2001. In Computer Science: Reflections on the Field, Reflections from the Field, 111-118.
Digging the past
- Gustavo Aguilar, Sudipta Kar, Thamar Solorio (2020) LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation. In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 1803–1813
- Yuri Bizzoni, Stefania Degaetano-Ortlieb, Peter Fankhauser, and Elke Teich (2020) Linguistic Variation and Change in 250 Years of English Scientific Writing: A Data-Driven Approach. Frontiers in Artificial Intelligence.
- Marcel Bollmann (2013) POS Tagging for Historical Texts with Sparse Training Data. In Proceedings of the 7th Linguistic Annotation Workshop & Interoperability with Discourse, pages 11–18, Sofia, Bulgaria. Association for Computational Linguistics.
- Marcel Bollmann, Florian Petran, and Stefanie Dipper (2011) Rule-Based Normalization of Historical Texts. In Proceedings of the International Workshop on Language Technologies for Digital Humanities and Cultural Heritage, pages 34–42, Hissar, Bulgaria.
- Marcel Bollmann, Florian Petran, Stefanie Dipper, Julia Krasselt. (2014) CorA: A web-based annotation tool for historical and other non-standard language data. Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH) @ EACL 2014, pages 86–90, Gothenburg, Sweden, April 26 2014. [SEM 2]
- Claire Bowern (2019). Semantic Change and Semantic Stability: Variation is Key. In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, pages 48–55, Florence, Italy. Association for Computational Linguistics.
- Julian Brooke, Adam Hammond and Graeme Hirst. (2015) GutenTag: An NLP-driven Tool for Digital Humanities Research in the Project Gutenberg Corpus. In Proceedings of NAACL-HLT Fourth Workshop on Computational Linguistics for Literature, pages 42–47.
- Mark Davies. 2012. Expanding Horizons in Historical Linguistics with the 400-Million Word Corpus of Historical American English. Corpora, 7(2):121–157.
- Rob van der Goot and Özlem Çetinoğlu (2021) Lexical Normalization for Code-switched Data and its Effect on POS Tagging. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics.
- Adam Hammond, Julian Brooke, and Graeme Hirst (2013). A Tale of Two Cultures: Bringing Literary Analysis and Computational Linguistics Together. In Proceedings of the Second Workshop on Computational Linguistics for Literature, pages 1–8. [SEM 3]
- Christian Hardmeier (2016) A Neural Model for Part-of-Speech Tagging in Historical Texts. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 922–931, Osaka, Japan, December 11-17 2016.
- Michael A. Hedderich, Lukas Lange, Heike Adel, Jannik Strötgen, and Dietrich Klakow (2021) A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2545–2568. [SEM 2]
- Mark J Hill and Simon Hengchen. 2019. Quantifying the impact of dirty OCR on historical text analysis: Eighteenth Century Collections Online as a case study. Digital Scholarship in the Humanities, 34(4):825–843.
- Eduard Hovy and Julia Lavid (2010). Towards a ‘Science’ of Corpus Annotation: A New Methodological Challenge for Corpus Linguistics. International Journal of Translation Vol. 22, No. 1, Jan-Jun 2010 [SEM 3]
- Gerhard Jäger (2018) Computational historical linguistics. De Gruyter
- Natalia Korchagina (2017) Normalizing Medieval German Texts: from rules to deep learning. In Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language.
- Andrey Kutuzov, Lilja Øvrelid, Terrence Szymanski, and Erik Velldal. 2018. Diachronic word embeddings and semantic shifts: A survey. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1384–1397, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
- Xutan Peng, Yi Zheng, Chenghua Lin and Advaith Siddharthan. (2021) Summarising Historical Text in Modern Languages. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics. [SEM 3]
- Eva Pettersson, Beáta Megyesi, and Joakim Nivre (2014) A Multilingual Evaluation of Three Spelling Normalisation Methods for Historical Text. In Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH) @ EACL 2014, pages 32–41, Gothenburg, Sweden, April 26 2014.
- Michael Piotrowski (2012) Natural Language Processing for Historical Texts, chapter 3: Spelling in Historical Texts (freely accessible in digital format when logged in at the department) [SEM 1]
- Michael Piotrowski (2012) Natural Language Processing for Historical Texts, chapter 6: Handling Spelling Variation (freely accessible in digital format when logged in at the department) [SEM 1]
- Michael Piotrowski (2012) Natural Language Processing for Historical Texts, chapter 7: NLP Tools for Historical Languages (freely accessible in digital format when logged in at the department) [SEM 2]
- Cristina Sánchez-Marco, Gemma Boleda, and Lluís Padró (2011) Extending the tool, or how to annotate historical language varieties. In Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pages 1–9, Portland, OR, USA, 24 June 2011. Association for Computational Linguistics. [https://www.aclweb.org/anthology/W11-1501/]
- Martin Schmitt and Hinrich Schütze (2021) Language Models for Lexical Inference in Context. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics.
- Daniel van Strien, Kaspar Beelen, Mariona Coll Ardanuy, Kasra Hosseini, Barbara McGillivray, and Giovanni Colavizza. (2020) Assessing the impact of OCR quality on downstream NLP tasks. In Proceedings of ICAART (1), pages 484–496. [SEM 1]
Cross-Lingual NLP
- Wasi Uddin Ahmad, Zhisong Zhang, Xuezhe Ma, Eduard Hovy, Kai-Wei Chang, and Nanyun Peng. (2019) On Difficulties of Cross-Lingual Transfer with Order Differences: A Case Study on Dependency Parsing. In arXiv preprint, arXiv:1811.00570v3
- Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer, and Noah Smith. (2016) Many languages, one parser. In TACL, 4:431-444.
- Mikel Artetxe, Sebastian Ruder, Dani Yogatama, Gorka Labaka, and Eneko Agirre. (2020) A Call for More Rigor in Unsupervised Cross-lingual Learning. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics [SEM 1]
- Mikel Artetxe and Holger Schwenk. (2020) Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond. In TACL 7:597-610.
- Lauriane Aufrant. (2018) Training parsers for low-resourced languages: improving cross-lingual transfer with monolingual knowledge. PhD thesis, Paris Saclay.
- Aditi Chaudhary, Jiateng Xie, Zaid Sheikh, Graham Neubig, Jaime Carbonell. (2019) A Little Annotation does a Lot of Good: A Study in Bootstrapping Low-resource Named Entity Recognizers. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 5164-5174. [SEM 3]
- A. Conneau, G. Lample, L. Denoyer, MA. Ranzato, and H. Jégou. (2017) Word Translation Without Parallel Data. arXiv preprint arXiv:1710.04087
- Goran Glavaš, Robert Litschko, Sebastian Ruder, and Ivan Vulić. (2019) How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 710-721.
- Rob van der Goot, Nikola Ljubešić, Ian Matroos, Malvina Nissim, and Barbara Plank. (2018) Bleaching Text: Abstract Features for Cross-lingual Gender Prediction. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 383-389.
- Jiang Guo, Wanxiang Che, Haifeng Wang, and Ting Liu. (2016) A universal framework for inductive transfer parsing across multi-typed treebanks. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, 12-22.
- Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, and Jeffrey Dean. (2017) Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. In TACL, 5:339-351.
- Dan Kondratyuk, Milan Straka. (2019) 75 Languages, 1 Model: Parsing Universal Dependencies Universally. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2779-2795. [SEM 2]
- Miryam de Lhoneux, Johannes Bjerva, Isabelle Augenstein, and Anders Søgaard. (2018) Parameter sharing between dependency parsers for related languages. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 4992-4997.
- Yu-Hsiang Lin, Chian-Yu Chen, Jean Lee, Zirui Li, Yuyan Zhang, Mengzhou Xia, Shruti Rijhwani, Junxian He, Zhisong Zhang, Xuezhe Ma, Antonios Anastasopoulos, Patrick Littell, and Graham Neubig. (2019) Choosing Transfer Languages for Cross-Lingual Learning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 3125-3135. [SEM 3]
- Stephen Mayhew, Chen-Tse Tsai and Dan Roth. (2017) Cheap Translation for Cross-Lingual Named Entity Recognition. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2536-2545.
- Phoebe Mulcaire, Swabha Swayamdipta, and Noah A. Smith. (2018) Polyglot Semantic Role Labeling. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Short Papers), 667-672.
- Tahira Naseem, Regina Barzilay, and Amir Globerson. (2012) Selective sharing for multilingual dependency parsing. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1, 629-637.
- Robert Östling and Jörg Tiedemann. (2017) Continuous multilinguality with language vectors. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, 644-649.
- Barbara Plank and Željko Agić. (2018) Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 614-620.
- Edoardo Maria Ponti, Roi Reichart, Anna Korhonen, and Ivan Vulić. (2018) Isomorphic Transfer of Syntactic Structures in Cross-Lingual NLP. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1531-1542.
- Aaron Smith, Bernd Bohnet, Miryam de Lhoneux, Joakim Nivre, Yan Shao, and Sara Stymne. (2018) 82 Treebanks, 34 Models: Universal Dependency Parsing with Multi-Treebank Models. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, 113-123. [SEM 2]
- Jörg Tiedemann. (2012) Character-Based Pivot Translation for Under-Resourced Languages and Domains. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, 141-151.
- Jörg Tiedemann. (2015) Cross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels. In Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015), 340-349.
- Iulia Turc, Kenton Lee, Jacob Eisenstein, Ming-Wei Chang, Kristina Toutanova (2021) Revisiting the Primacy of English in Zero-shot Cross-lingual Transfer. arXiv preprint arXiv:2106.16171 [SEM 3]
- Ahmet Üstün, Arianna Bisazza, Gosse Bouma, Gertjan van Noord (2020) UDapter: Language Adaptation for Truly Universal Dependency Parsing. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2302-2315. [SEM 2]
- Shijie Wu and Mark Dredze. (2019) Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 833-844. [SEM 1]
- David Yarowsky, Grace Ngai, and Richard Wicentowski. (2001) Inducing multilingual text analysis tools via robust projection across aligned corpora. In Proceedings of the first international conference on Human language technology research, 1-8. [SEM 1]
- Daniel Zeman, Jan Hajič, Martin Popel, Martin Potthast, Milan Straka, Filip Ginter, Joakim Nivre, and Slav Petrov. (2018) CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, 1-22.
- Barret Zoph, Deniz Yuret, Jonathan May, and Kevin Knight. (2016) Transfer Learning for Low-Resource Neural Machine Translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 1568-1575.
- Yanyan Zou and Wei Lu. (2018) Learning Cross-lingual Distributed Logical Representations for Semantic Parsing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 673-679.
Beyond the benchmarks
- Marco Baroni (2021) On the proper role of linguistically-oriented deep net analysis in linguistic theorizing. arXiv preprint arXiv: 2106.08694. [SEM 1]
- Samuel R. Bowman and George Dahl (2021), What Will it Take to Fix Benchmarking in Natural Language Understanding? Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
- M Finlayson, A Mueller, S Shieber, S Gehrmann, T Linzen, Y Belinkov, (2021), Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models, ArXiv preprint arXiv:2106.06087.
- Richard Futrell, Ethan Wilcox, Takashi Morita, Miguel Ballesteros, and Roger Levy. (2019) Neural Language Models as Psycholinguistic Subjects: Representations of Syntactic State. NAACL-HLT 2019. [SEM 2]
- Katharina Kann, Alex Warstadt, Adina Williams and Samuel R. Bowman (2019) Verb Argument Structure Alternations in Word and Sentence Embeddings. Proceedings of the Society for Computation in Linguistics (SCiL) 2019. [SEM 2]
- Jennifer Hu, Jon Gauthier, Ethan Wilcox, Peng Qian, and Roger Levy. (2020) A systematic assessment of syntactic generalization in neural language models. ACL 2020.
- Kristina Gulordava, Piotr Bojanowski, Edouard Grave, Tal Linzen and Marco Baroni. (2018). Colorless Green Recurrent Networks Dream Hierarchically. In Proceedings of NAACL, 1195-1205. [SEM 2]
- Beth Levin. (1993) English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press.
- Tal Linzen and Marco Baroni. (2020) Syntactic Structure from Deep Learning. Annual Reviews of Linguistics, Vol. 6. [SEM 1]
- Tal Linzen (2020) How Can We Accelerate Progress Towards Human-like Linguistic Generalization? ACL 2020 [SEM 1]
- Paola Merlo. (2019). Probing Word and Sentence Embeddings for Long-Distance Dependencies Effects in French and English. In Proceedings of the Second Blackbox NLP Workshop on Analyzing and Interpreting Neural Networks for NLP, 158-172
- Paola Merlo and Maria A. Rodriguez. (2019) Cross-Lingual Word Embeddings and the Structure of the Human Bilingual Lexicon. CoNLL 2019.
- Paola Merlo and Suzanne Stevenson. (2001). Automatic Verb Classification Based on Statistical Distribution of Argument Structure. Computational Linguistics 27(3), 373-408.
- Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajic, Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis Tyers and Daniel Zeman. (2020). Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection. In Proceedings of LREC.
- Shauli Ravfogel, Grusha Prasad, Tal Linzen, Yoav Goldberg (2021), Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction, ArXiv preprint arXiv:2105.06965.
- Luigi Rizzi. (2013). Locality. Lingua 130, 169-186
- Rodriguez and Merlo. (2020) Word associations and the distance properties of context-aware word embeddings. CoNLL 2020. [SEM 3]
- Tristan Thrush, Ethan Wilcox and Roger Levy (2020) Investigating Novel Verb Learning in BERT: Selectional Preference Classes and Alternation-Based Syntactic Generalization, Blackbox NLP. [SEM 3]
- Ethan Wilcox, Roger Levy, Takashi Morita and Richard Futrell. (2018). What do RNN Language Models Learn about the Filler-Gap Dependency? Proceedings of Blackbox NLP at EMNLP 2018, 211-221. [SEM 3]
- Ethan Wilcox, Peng Qian, Richard Futrell, Ryosuke Kohita, Roger P. Levy, and Miguel Ballesteros. (2020) Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models. EMNLP 2020
- Ethan Wilcox, Jon Gauthier, Jennifer Hu, Peng Qian and Roger Levy. (2020). On the Predictive Power of Neural Language Models for Human Real-Time Comprehension Behavior. CogSci 2020.