Word-based SMT -- Part 1

This document describes the tasks you need to perform for part 1 of lab 2. This lab is examined in class, on April 12, 9-12. Note that you have to attend the full session and be active in order to pass the lab. If you miss this session, you will instead have to write a report, see the bottom of this page.

The purpose of this lab is to gain insight into how a word-based SMT system works, and into the importance of the language model (LM) and the translation model (TM). You will change the probabilities in these models by hand, to explore what happens. Normally you would not do this, but would instead train the probabilities on a corpus. The exercise is thus a bit artificial, but it is intended to give you a better idea of how these models work.

The lab is set up to do translation both from English to Swedish and from Swedish to English. Try out both translation directions in the beginning! Later in the lab you may choose to focus on one translation direction. Focus on translation into a language you speak well, i.e. if you are Swedish or speak Swedish well, focus on translation into Swedish; otherwise, focus on translation into English. If you do not know Swedish, there is a grammar sketch of Swedish.

Take notes during the lab. During the last hour of the session, each group will get a few minutes to talk about their experiments: what they did, what surprised them, what they learned, etc. I will also ask each group to report their final score(s) with their modified models.

Slides shown during the lab.

1 - Familiarize yourself with the system and run with uniform probabilities

The translation system is described here. Listen to the brief description of it by your teacher and/or read the description of it. You will probably have to go back to the description during the lab as well!

Copy all the files needed for the lab:


mkdir lab2  
cd lab2
cp /local/kurs/mt/lab2/data/* .

The model files are given twice, so that you can keep a copy of the original files when you start modifying them.

In the given model files all probabilities are equal. This likely gives bad translations. Run the sample sentences through the translation system, study the translation suggestions, and use the automatic evaluation to explore the overall results and find out the average rank. Feel free to add some more sentences if you want to explore something you find interesting. The commands for running the decoder are:


# for translation from Swedish to English

# show the translation results:
/local/kurs/mt/lab2/simple_decoder/translate -lm lm.eng -tmw tmw.sweeng -tmf tmf.swe -o 2  -in test_meningar.swe 
# show the ranking of results:
/local/kurs/mt/lab2/simple_decoder/translate -lm lm.eng -tmw tmw.sweeng -tmf tmf.swe -o 2  -in test_meningar.swe -eval test_meningar.eng 


# for translation from English to Swedish

# show the translation results:
/local/kurs/mt/lab2/simple_decoder/translate -lm lm.swe -tmw tmw.engswe -tmf tmf.eng -o 2  -in test_meningar.eng 
# show the ranking of results:
/local/kurs/mt/lab2/simple_decoder/translate -lm lm.swe -tmw tmw.engswe -tmf tmf.eng -o 2  -in test_meningar.eng -eval test_meningar.swe

The following questions are some things worth thinking about and discussing with your lab partner:

2 - Manipulate the translation models

In this task you should adjust the probabilities for fertilities and word translations, to achieve better translations. You should not set any probabilities to 0, and they should all be real probabilities, i.e. 0 < p ≤ 1. In some cases you may improve some translations at the cost of others. There are also problems you cannot solve by manipulating the translation model alone. In the end, try to choose changes that make linguistic sense!

Start by changing one or a few things, and investigate what effect that has on the results. You may compare changes that seem reasonable given your linguistic intuition with seemingly "stupid" changes. An example of a linguistically motivated change is to give a higher probability to the translation of "en" and "ett" into "a" than into "an", with the motivation that "a" is much more common in English than "an". Try modifying both the word translation model (tmw) and the fertility model (tmf).

Here are some questions worth thinking about:

(For the TM to be a proper probability model it should contain proper probability distributions, i.e. the probabilities should sum to 1 for each word: the fertilities for each word should sum to 1, and the word translation probabilities p(s|t) should sum to 1 for each t. The given models are correct in this respect, except for rounding (which is OK), e.g. 0.33*3=0.99 rather than 1. You do not have to worry about this issue in the current lab, though.)
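If you want to sanity-check your hand-edited model files against this constraint, a small script along the following lines can verify that the probabilities still sum to (roughly) 1 per conditioning word. This is only a sketch under an assumed format: it treats each line as whitespace-separated fields where the first field is the conditioning word and the last field is a probability — check the actual file format in the system description before relying on it.

```python
from collections import defaultdict

def check_sums(path, tol=0.02):
    """Sum the probabilities per conditioning word and report any word
    whose total deviates from 1 by more than tol (the tolerance allows
    for rounding, e.g. 3 * 0.33 = 0.99)."""
    sums = defaultdict(float)
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 2:
                continue
            word, prob = fields[0], float(fields[-1])
            sums[word] += prob
    # return only the words whose distribution no longer sums to ~1
    return {w: s for w, s in sums.items() if abs(s - 1.0) > tol}
```

Run on, say, your edited tmf.swe, it would list any words whose fertility probabilities you forgot to renormalize after editing.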

3 - Manipulate the language model

Go on to manipulate the language model as well, in order to further improve the translations. In the given files there are 1-grams and 2-grams with equal probabilities. Try to just adjust these first, and see if you can solve some problems. Again, you should not set any probabilities to 0, and they should all be real probabilities, i.e. 0 < p ≤ 1. You may also want to add n-grams that are missing, or remove ungrammatical n-grams. You might also want to add some 3-grams. If you do, remember to change the decoder's n-gram order flag to "-o 3".

An easy way to improve the translations is to add a lot of 3-grams from the test sets. This is not all that meaningful, and would just result in over-fitting. Instead, try to make some principled changes based both on the test sentences, and on your knowledge of English/Swedish. Try to think about what makes the uniform model bad.

Here are some issues worth thinking about.

In addition to changing the probabilities for n-grams in the file, you may also change the backoff weight for unknown n-grams. A very simple backoff strategy is used in the decoder. If, for example, a 3-gram is missing, it backs off to the 2-gram, but with a penalty that can be set on the command line. This penalty is simply multiplied by the 2-gram probability. If the 2-gram is missing too, it backs off to the 1-gram, and multiplies by the penalty yet another time. The backoff penalty is set on the command line with the flag "-b WEIGHT"; the default value is 0.01.
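The decoder's source is not part of the lab, but the backoff strategy just described is simple enough to sketch. The function below is a hypothetical illustration, not the decoder's actual code: it assumes an n-gram table mapping tuples of words to probabilities, and multiplies in the penalty once per backoff step.

```python
def lm_prob(ngram, probs, penalty=0.01):
    """Score an n-gram with simple penalized backoff.

    `probs` maps word tuples to probabilities; `ngram` is a tuple such
    as ("the", "red", "house"). Each time the current n-gram is not in
    the table, drop the oldest history word and multiply in the penalty.
    """
    weight = 1.0
    while ngram:
        if ngram in probs:
            return weight * probs[ngram]
        ngram = ngram[1:]   # back off to the shorter history
        weight *= penalty   # one penalty factor per backoff step
    return 0.0  # not even the unigram is known
```

With the default penalty of 0.01, a missing 3-gram that backs off all the way to a unigram of probability 0.1 scores 0.1 * 0.01 * 0.01, so unknown n-grams are punished heavily; raising or lowering "-b" changes how strongly the decoder prefers n-grams it has actually seen.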

You may also modify the LM and TM in parallel, since changes made in one model will affect which changes are good in the other.

4 - Wrapping up

At the end of the session you will get a new command, which will evaluate your final system on a new set of secret sentences from the same domain. If your changes are very specific to the known test set, this evaluation can be expected to go badly, whereas it should hopefully go well if your changes are general!

Commands for running the blind evaluation are found below! Make sure that you point to your own TM and LM files, that you set the correct n-gram order (normally 2 or 3), and that you set the backoff penalty if you changed it! If you only worked actively on one translation direction, run this only in that direction.


# for translation from Swedish to English
/local/kurs/mt/lab2/simple_decoder/translate -lm lm.eng -tmw tmw.sweeng -tmf tmf.swe -o N  -evalBlind sweeng

# for translation from English to Swedish
/local/kurs/mt/lab2/simple_decoder/translate -lm lm.swe -tmw tmw.engswe -tmf tmf.eng -o N  -evalBlind engswe

Then there will be a joint session where everyone shares their findings and discusses the issues of the lab. For this it is good to keep in mind that the main purpose of this lab is to learn more about how word based SMT models work, and the role of the LM and TM!

Each pair should:

Some general questions to think about:

Reporting

The lab is supposed to be examined in class, on April 12, 9-12. You need to be present and active during the whole session. If you failed to attend the oral session, you instead have to write a lab report. If both persons in the pair failed to attend, it should be included in the report for part 2 of the lab; if only one person failed to attend, it should be a separate individual report. Spend around 2 hours experimenting with the weights in the TM and LM before writing your report. In the report, give the score(s) for your final system(s), compare them to the uniform system, discuss what you did, and state what conclusions you can draw from your work. Also discuss at least a subset of the questions asked in the lab text. Your report should be around 2 A4 pages and should be handed in via the student portal as a PDF. The deadline is April 28, 2017.