UPPSALA UNIVERSITY : Dept. of Linguistics and Philology : STP




Language Technology: Research and Development

Note that this page has been migrated from a previous server, so some links may not work correctly.

Credits: 15 hp
Syllabus: 5LN714
Staff
Course coordinator and examiner: Sara Stymne
Lectures: Sara Stymne
Group leaders: Meriem Beloucif, Beáta Megyesi, Johan Sjons

News

General information

This page contains the general information for the course Language Technology: Research and Development, autumn 2022. Course information is available here, and slides will be posted in the annotated schedule available here. We will also use the Studium system for handing in assignments, keeping track of your progress, posting Zoom links, and similar information.

Schedule

Preliminary schedule.
ID | Date | Time | Room | Content | Reading
L1 | 30/8 | 10-12 | Zoom and 22-1017 | Introduction, Digging, NN & Language, Low-resource |
L2 | 2/9 | 13-15 | 2-K1024 | Science, research, and development | Okasha, Cunningham, Lee
L2 | 5/9 | 8-10 | 22-1017 | Science, research, and development | Okasha, Cunningham, Lee
L3 | 6/9 | 10-12 | Chomsky+Turing | Science, research, and development 2: debate session | Okasha, Hovy and Spruit
S1 | 7/9 | 15-17 | 2-0025 (lrl), 2-0026 (nnl), 2-0027 (dig) | Seminar - research papers |
Lx | 13/9 | 13-15 | Ångström: Polhemssalen 10134 | UPPMAX lecture |
S2 | 19/9 | 13-15 | 2-0023 (lrl), 2-0028 (nnl), 2-0025 (dig) | Seminar - research papers |
L4 | 23/9 | 8-10 | 22-1017 | R&D projects - from proposal to implementation | Zobel 10-11, 13
S3 | 27/9 | 10-12 | 22-1008 (lrl), 2-K1072 (nnl), 22-0025 (dig) | Seminar - research papers |
S4 | 11/10 | 13-16 | Blåsenhus 12:232 (lrl), Blåsenhus 12:234 (nnl), Blåsenhus 13:132 (dig) | Seminar - project proposals |
S5 | 26/10 | 10-12 | 2-0028 (lrl), 2-0026 (nnl), 2-0025 (dig) | Seminar - progress report |
L5 | 3/11 | 10-12 | Blåsenhus 21:136 | Dissemination of research results | Zobel 1-9, 14
S6 | 9/11 | 10-12 | 22-1008 (lrl), 9-3068 (nnl), 22-1005 (dig) | Seminar - progress report, theme: ethics | Hovy and Spruit; Bender et al.
Lab | 16/11 | 10-12 | Chomsky+Turing | LaTeX tutorial |
S8 | 23/11 | 10-12 | 22-1008 (lrl), 22-1005 (nnl), 9-3068 (dig) | Seminar - progress report |
L6 | 29/11 | 10-12 | Blåsenhus: Bertil Hammer (24:K104) | Review of scientific articles | Zobel 14
S9 | 7/12 | 10-12 | 22-1008 (lrl), 9-3068 (nnl), 22-1005 (dig) | Seminar - progress report |
FS | 12/1 | 8-16 | 7-0042, 7-0043 | Final workshop - term paper presentations |
 | 12/1 | 16- | ? | Social event? |

All lectures will be given by Sara. The seminars will be led by the seminar leader for each research group. Note that attendance is obligatory at all seminars. The course is campus-based.

Content

The course gives a theoretical and practical introduction to research and development in language technology. The theoretical part covers basic philosophy of science, research methods in language technology, project planning, and writing and reviewing of scientific papers. The practical part consists of a small project within a research area shared by a subgroup of course participants, including a state-of-the-art survey in a reading group, the planning and implementation of a research task, and the writing of a paper according to the standards for scientific publications in language technology. The research areas for 2022, with their teachers, are:
  1. Digging the past: Digital Philology and the Analysis of Historical Sources (dig) - Beáta Megyesi
  2. Low-resource languages (lrl) - Meriem Beloucif
  3. Neural networks and language (nnl) - Johan Sjons

Examination

The course is examined by means of five assignments with different weights (see below). In order to pass the course, a student must pass each one of these. In order to pass the course with distinction, a student must pass at least 50% of the weighted graded assignments with distinction.

Assignments

  1. Take-home exam on philosophy of science (15%)
    • This assignment will be based on your reading of Okasha's book. You will be asked to discuss issues in the philosophy of science and (sometimes) relate them to the area of language technology. The questions will be handed out September 8, and the report should be handed in September 14.
  2. Research paper presentation and discussion (15%)
    • You will present one of the papers discussed in the seminars. The task is to introduce the paper and lead the discussion, not to give a formal presentation: briefly summarize the paper (~2 min), discuss its main points, bring up parts that are difficult to understand, and initiate a discussion by proposing themes to discuss. In addition, you should take an active part in the discussion of all other papers in the seminars. The seminars have obligatory attendance; if you miss a seminar, you have to write a short report instead. This assignment is not graded and does not qualify for distinction.
  3. Project proposal (15%)
    • You will put together a research proposal consisting of two parts, using an adapted version of the Swedish Research Council's guidelines for research plans. The major part is a 3-page scientific proposal describing the project you are going to work on for the rest of the course. In addition, you should write a short popular science abstract of at most 2000 characters, describing your proposal in a way that is accessible to the general public. See further instructions.
      You will also give a short presentation of the proposal in a seminar (8 minutes with slides, plus time for questions and discussions). The deadline for the written proposal is October 6, and the seminars will take place October 11.
  4. Review of term papers (15%)
    • You will review two term papers written by your course mates. You will use a set of guidelines which will be specified later. You will receive the papers on December 14 and the reviews are due December 22.
  5. Term paper (40%)
    • You will report your project in a paper following the guidelines of Transactions of the Association for Computational Linguistics (TACL), except that the page limit for your paper is 4-7 pages + references. The term paper should be written as a standard computational linguistics paper in TACL format, and contain research questions, an introduction motivating and describing the work, an overview of related work and a description of how it relates to your work, a description of your experiments, a presentation and analysis of the results, and a conclusion. The deadline is December 13 for the first version and January 13 for the revised final version. The first version should be a complete paper, with no missing sections or parts; for the final version, you should revise it, taking the review comments into account. On January 12, you should give an oral presentation of the paper. As part of your work on the project, it is obligatory to attend the progress report seminars as well as the final workshop. If you miss a seminar, you have to write a short report instead.

      The main grade for the term paper is based on the final report, but we also take into account the first version, how well you incorporate the review comments, and your oral presentation.

Note that you will also practice writing and presenting for different audiences during the course. The final paper and presentation are targeted at experts in language technology (but not necessarily experts in your particular research theme). Your scientific proposal and its presentation are targeted at academics who are not necessarily in language technology but may be in neighboring fields such as linguistics or computer science (fields that typically make up the review panels at agencies such as the Swedish Research Council). Your popular science abstract is targeted at the general public and should not require any prior knowledge of language technology to understand.

All assignments you hand in during the course are individual. We welcome discussions between members of each theme (and between the themes as well), but the final reports you hand in should be your own work, written in your own words. You should follow community standards for citing work that is related to yours. Note that you should always write about other work in your own words; changing a few words in each sentence of a paper you read is not acceptable. You should also give credit for images and code, and only reproduce images that are published under a permissive license such as Creative Commons. If you use or build on code by someone else, you should clearly state that in your report, and provide an appropriate citation and/or link to the code.

Submitting and Reviewing Term papers

We plan on using a real conference management system for submission and review of papers. Detailed instructions will be provided.

Final Seminar/Workshop

The final seminar will be organized as a workshop with term paper presentations. The plan is for the final workshop to be on campus only.

Detailed instructions will be provided before the seminar.

Research Groups

Your first task in the course is to state which research topic you would like to work on. Send a ranked list of your preferences for the three topics by email to Sara no later than Thursday, September 1, at 13.00. You may indicate whether your preference for your first choice is very strong. If you fail to state your preferences by this deadline, you will be arbitrarily assigned to a topic. We will try our best to respect everyone's (strong) wishes, but if that turns out not to be possible, we will resort to random decisions.

Group | Member | Seminar | Paper
Digging the past - Bea | Iliana | Sep 7 | Piotrowski, Chap 3, 2012
 | Matilde | Sep 7 | Piotrowski, Chap 6, 2012
 | Yaru | Sep 7 | Van Strien et al., 2020
 | Zhaorui | Sep 19 | Piotrowski, Chap 7, 2012
 | Ebba | Sep 19 | Bollmann, 2019
 | Félicien | Sep 19 | Hedderich et al., 2021
 | (Bea) | Sep 27 | Hammond et al., 2013
 | Yahui | Sep 27 | Hovy and Lavid, 2010
 | Mathias | Sep 27 | Peng et al., 2021
Low-resource languages - Meriem | Kristóf | Sep 7 | Hedderich et al., 2021
 | Hengyu | Sep 7 | Kann et al., 2019
 | Micaella | Sep 7 | Chapter 2 from B.P. King, 2015
 | Jie | Sep 19 | Zoph et al., 2016
 | Aleksandra | Sep 19 | Gupta et al., 2018
 | Moa | Sep 19 | Li et al., 2020
 | Ingrid | Sep 27 | Cruz and Chen, 2019
 | Alex | Sep 27 | Yang et al., 2022
 | (Meriem) | Sep 27 | Abdaoui et al., 2021
Neural networks and language - Johan | Maya | Sep 7 | Linzen and Baroni, 2021
 | Yiu Kei | Sep 7 | Van Schijndel, Mueller and Linzen, 2020
 | Ali | Sep 7 | McCoy, Frank and Linzen, 2020
 | Yini | Sep 19 | Marvin and Linzen, 2018
 | Björn | Sep 19 | Warstadt, Singh and Bowman, 2019
 | Nicole | Sep 19 | McCoy et al., 2021
 | Agnieszka | Sep 27 | Lakretz, Dehaene and King, 2020
 | Thea | Sep 27 | Sinha et al., 2020
 | (Johan) | Sep 27 | Jaeger and Buz, 2017

Computational resources

If you need access to a computing cluster, you will get access to the Snowy cluster at UPPMAX.

In order to use the UPPMAX cluster, you will first have to apply for an account. You should then apply to two projects:

Information about using Snowy is available here. Note that you log in to Rackham. You can only run light jobs directly on Rackham (such as copying or viewing files). To run heavy jobs, you need to write a Slurm script and submit it to Snowy. See the UPPMAX Slurm user guide to learn more. Here is an example Slurm script from last year's MT course.

This year we will also have a lecture with UPPMAX staff, including a visit to the server hall.

Deadlines

Here is a summary of all deadlines in the course.

Task | Deadline | Extra deadline
Choose your preferred topics | September 1, 13:00 | -
Take-home exam | September 14 | November 11
Literature seminars | In class | October 21
Project proposal | October 6 | November 4
Present project proposal | October 11 | By agreement
First version of project report | December 13 | January 13
Reviews of peers' project papers | December 22 | January 20 (February 17)
Final seminar | January 12 | By agreement
Final project report | January 13 | February 17

All deadlines are at 23.59 on the respective date unless otherwise noted.

Note that it is important for you to finish the course on time, since it is a requirement for starting your master's thesis. Try to avoid relying on the extra deadlines, since doing so will likely mean that you cannot finish the course on time!

Reading

Science and Research

Digging the past

Low-resource languages

Neural networks and language