Assignment 3: Cross-Lingual Dependency Parsing

In this assignment you will get to try out cross-lingual parsing with a neural parser. The parser you will use is uuparser, which is developed by the Uppsala parsing team, with Miryam de Lhoneux as the main developer. uuparser is similar to the Kiperwasser and Goldberg parser you read about in literature seminar 2. The variant you will use is the transition-based parser.

The lab is set up to be performed on our Linux system, futurum. UUparser is installed there, as well as UD and all the scripts needed. Paths mentioned in this description refer to locations on futurum.

Your task

The goal of this assignment is to see how parsing for a low-resource language (which you may simulate) can be aided by using data from another language as well. To start with, pick three languages and treebanks:

  • A target language treebank (TGT)
    This is the language that you are attempting to parse. It is meant to be treated as a low resource language, but it is OK to use a high resource language and simulate such a scenario by limiting the training data. It is helpful, but not required, to choose a target language that you know.
  • A transfer language treebank that you believe will be good (GTRF)
    This should be a language with more resources than your (simulated) low-resource target language, which you think might help in parsing the target language, for instance because it is (closely) related, shares some important linguistic features, is a contact language, or for some other reason.
  • A transfer language treebank that you do not believe will be good (or at least not as good as GTRF) (BTRF)
    This should be a language with more resources than your (simulated) low-resource target language, which you think might not help in parsing the target language, for instance because it is not related, has different important linguistic features, maybe uses a different script, or for some other reason(s).

For cases where a language has more than one treebank, you can pick any that fulfils the criteria. If possible you can try to match genres, but that is not required, so focus more on language choice than treebank choice.

Make sure that your target language has at least 200 sentences in total in its training and development data, and that your transfer languages have at least 500 sentences training data each.

You should then run the following three experiments. Detailed instructions on commands and how to handle data is given below.

  1. Monolingual baseline: train a monolingual parsing model on 100 sentences from TGT and note the scores on the TGT development set recorded during training.
  2. Few-shot transfer:
    1. Train a multilingual model on 100 sentences from TGT and 500 sentences from GTRF and note the scores on the TGT development set recorded during training.
    2. Train a multilingual model on 100 sentences from TGT and 500 sentences from BTRF and note the scores on the TGT development set recorded during training.

Note that in all cases you are using a limited amount of data, in order to keep the run time of your experiments reasonable. In a real setting it is quite likely that you would use more data for the transfer languages. Also note that in this assignment, your two transfer languages do not actually need to have more data than the target language, but this can be simulated by not using all available data.

Motivate your choice of languages in the report.

Evaluation

You should evaluate your results in three parts.
  1. Present the UAS and LAS scores for the TGT development set for the best iteration for each of your three systems. For 2a and 2b, give both the scores for the epoch with the best average development score of TGT and TRF, and the score for the epoch with the best TGT score.
  2. Draw learning curves showing how the scores on the TGT development set develop over the epochs for systems 1, 2a, and 2b. Preferably, draw all three curves in the same plot.
  3. Do a small qualitative evaluation where you compare the errors for a few sentences that have different parses across systems. (Depending on how many errors and differences between systems there are, how long your sentences are, and how detailed your discussion is, somewhere around 1-4 sentences is suitable.)
For all three parts, you should discuss your findings and if possible try to explain them.
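If you want to compute UAS and LAS yourself for the comparison (rather than relying only on the scores printed during training), the calculation is straightforward: UAS is the fraction of words whose head is correct, and LAS additionally requires the dependency label to be correct. A minimal Python sketch of the idea (the function names are our own, not part of uuparser; it assumes plain CoNLL-U input with gold and predicted files covering the same sentences):

```python
def read_trees(conllu_text):
    """Parse CoNLL-U text into a list of sentences; each sentence is a
    list of (head, deprel) pairs, one per word line."""
    sentences, current = [], []
    for line in conllu_text.strip().split("\n"):
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            # plain integer IDs only: skip multiword-token ranges ("1-2")
            # and empty nodes ("2.1")
            if cols[0].isdigit():
                current.append((cols[6], cols[7]))
    if current:
        sentences.append(current)
    return sentences

def uas_las(gold_text, pred_text):
    """Unlabelled and labelled attachment scores over all words."""
    correct_head = correct_both = total = 0
    for gold, pred in zip(read_trees(gold_text), read_trees(pred_text)):
        for (gh, gd), (ph, pd) in zip(gold, pred):
            total += 1
            if gh == ph:
                correct_head += 1
                if gd == pd:
                    correct_both += 1
    return correct_head / total, correct_both / total
```

This is only an illustration of the metrics; for the assignment you can simply read the scores from uuparser's output.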

For the qualitative evaluation you may use MaltEval, which you have already used in the NLP course. Note, however, that you will need to convert to CoNLL-X format for it to work, which can be done with this script: /common/student/courses/parsing-5LN713/assign3/conllu_to_conllx.perl (Note that MaltEval is not installed; for instructions, see the MaltEval webpage.)
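For reference, the core of such a conversion is simple: CoNLL-X also uses ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), but has no comment lines, multiword-token ranges, or empty nodes. A rough Python sketch of the idea (use the provided perl script for the actual assignment; this is only an illustration, and real converters may make other choices, e.g. for the POSTAG column):

```python
def conllu_to_conllx(conllu_text):
    """Rough CoNLL-U -> CoNLL-X conversion: drop comment lines,
    multiword-token ranges and empty nodes, and remap the columns."""
    out = []
    for line in conllu_text.split("\n"):
        if not line.strip():
            out.append("")          # keep sentence boundaries
            continue
        if line.startswith("#"):    # CoNLL-X has no comment lines
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():   # ranges like "1-2", empty nodes like "2.1"
            continue
        idx, form, lemma, upos, xpos, feats, head, deprel = cols[:8]
        # CPOSTAG <- UPOS; POSTAG <- XPOS, falling back to UPOS
        postag = xpos if xpos != "_" else upos
        out.append("\t".join([idx, form, lemma, upos, postag, feats,
                              head, deprel, "_", "_"]))
    return "\n".join(out)
```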

Data

Use the Universal Dependencies (UD) data, version 2.13, in this assignment. The data is available on the Linux system at: /common/student/corpora/ud-treebanks-v2.13 You need to prepare your own directory for the three languages that you are interested in, and copy the relevant parts of the data there. Note that you should keep the naming convention, i.e. the folder for each language should have the same name as in the original structure (e.g. UD_Swedish-LinES), and the training and development files should also have the same names (e.g. sv_lines-ud-train.conllu and sv_lines-ud-dev.conllu), but their sizes should be modified. You do not need to copy the test files, or any additional files, since you will not use them.

For the TGT language, you should have a training set of 100 sentences and a development set of at least 100 sentences. If your language has both these sets, copy the first 100 sentences of the train set and keep the full development set. If your language does not have a development set, copy 100 sentences from the train set to your train set, and another 100 sentences to your development set.

For the two transfer languages, you should create train sets with 500 sentences. Note that you also need dev sets here, as for the TGT language.

To select the first N sentences from a CoNLL-U file, you can use the following script:
/common/student/courses/parsing-5LN713/assign3/select-n-conllu-sentences.perl N input-file output-file
where input-file is the original CoNLL-U file you are reading from, output-file is the file you write to, and N is the number of sentences you want to copy.
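The selection itself is simple, since a CoNLL-U sentence is just a block of lines (including its comment lines) terminated by a blank line; selecting the first N sentences means copying the first N such blocks. A rough Python equivalent of the provided script, purely for illustration (the function name is our own):

```python
def select_n_sentences(conllu_text, n):
    """Return the first n sentences of a CoNLL-U file as text.
    A sentence is a block of lines (comments included) terminated
    by a blank line."""
    sentences, current = [], []
    for line in conllu_text.split("\n"):
        if line.strip():
            current.append(line)
        elif current:
            sentences.append(current)
            current = []
        if len(sentences) == n:
            break
    if current and len(sentences) < n:  # file may lack a final blank line
        sentences.append(current)
    # each sentence is followed by one blank line, as in CoNLL-U
    return "".join("\n".join(s) + "\n\n" for s in sentences)
```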

Parser

The parser, uuparser, is available on the Linux computer system. You can run the parser using the command uuparser. Treebanks should be given using the ISO id, i.e. the short name for each treebank, which is used in the names of the data files (for instance sv_lines or en_ewt). After running uuparser the first time, take some time to inspect its output and make sure you understand what is what.

To train uuparser for a single language, use the command:
uuparser --outdir [results directory] --datadir [your directory containing UD directories with the structure UD_**/iso_id-ud-train/dev.conllu] --include [treebank to train on denoted by its ISO id] --disable-rlmost --json-isos /common/student/courses/parsing-5LN713/assign3/ud2.13_iso.json

To run uuparser for multiple languages, use the command:
uuparser --outdir [results directory] --datadir [your directory containing UD directories with the structure UD_**/iso_id-ud-train/dev.conllu] --disable-rlmost --include ["treebanks to train on denoted by their ISO id"] --multiling --json-isos /common/student/courses/parsing-5LN713/assign3/ud2.13_iso.json

Note that you need to have quotes around the treebanks when you have more than one treebank (e.g. "sv_lines en_ewt")!

Note: use different output directories for the different experiments, so that the parser does not overwrite previous output that you may still need. Also note that it takes some time to run these experiments. Each experiment should run for the default 30 epochs. Each epoch probably takes less than a minute for 100 sentences, and somewhere around 3-6 minutes for 500-600 sentences. Take this into account when planning your time, since you will have to wait a while for your results!

In all experiments, use the default settings for uuparser, except for the flag --disable-rlmost which disables what is called the extended feature set, i.e. information about children of items in the stack.

One of the specification files used by the parser currently targets UD 2.2. Since we are using UD 2.13, you need to add the following flag to all commands:
--json-isos /common/student/courses/parsing-5LN713/assign3/ud2.13_iso.json

For the G task, you are not required to test uuparser on a language other than the one it was trained on. But if you need to do so for a VG task or for your project, the command is:
uuparser --predict --outdir [results directory] --modeldir [model directory in the form model_dir/TRF-iso-id] --datadir [directory of UD files with the structure UD_**/iso_id-ud-train/dev.conllu] --multiling --include [TGT-iso-id:TRF-iso-id] --json-isos /common/student/courses/parsing-5LN713/assign3/ud2.13_iso.json

The parser automatically chooses the model with the best development score. The model directory should be the one specified when training the parser, followed by the treebank name. Check that this directory contains the model "barchybrid.model". Note that the include flag needs to specify the ISO of the treebank you want to test on and the ISO for the treebank that you trained the model on (or one of them if you trained on multiple treebanks), with a colon in between.

For Distinction (VG)

For distinction, you have to design and motivate a set of experiments in which you investigate one of the following issues:

  • Try out other transfer languages than the two for the basic assignment. Make your choice in some principled way. Discuss the effects of this choice.
  • Vary the size of the training data for the target and/or transfer language, and discuss the effects.
  • If any of your languages have more than one treebank, explore the effect of using data from different treebanks, and possibly also of using different parts of the treebanks.
  • Experiment with using more than one transfer language at a time, by training models with three or more languages in them (multi-source models). Think carefully about the size of the data here.

It is also possible to earn a VG by exploring some other issue. If you want to do so, it is obligatory to contact Sara beforehand, in order to get your specific extension approved. If you pursue an individual idea without approval, no VG grade will be granted.

Note that, due to the time it takes to run experiments, you are not required to run many more of them. Up to five experiments beyond the G task should normally be enough. For the same reason, you are not required to use much more data than in the proposed experiments. Carefully design the experiments you run, though, so that they make sense and you can analyse them in a useful way! In all cases you should discuss and motivate your experimental design.

Note that doing a VG task is no guarantee for earning a VG. The full assignment must also be performed with an overall high quality.

Report

Report by uploading a pdf report in Studium, describing your experiments and discussing your findings.

Deadline

2024-03-11 (This was moved from March 4, due to student requests, since there was an erroneous deadline on the assignment page. Note that it is still recommended to do the assignment by March 4 in order to have enough time for your projects.)

Problems

Please do not hesitate to contact Sara via email in case you encounter any problems or have any questions.

Enjoy!

History

This assignment was developed by Sara Stymne, 2020; updated 2024.