Syntactic Analysis (5LN455): Assignment 4

In this assignment, you will learn how to use a state-of-the-art system for dependency parsing (MaltParser) and evaluate its performance.

The assignment is structured into smaller tasks; detailed instructions on each of these tasks are given below. These instructions also specify how to report your work on that task in the lab report.

Setup

To get started, go to the MaltParser website and download the latest release of the parser, either in tar.gz or in zip format. Then, follow the installation instructions. Once you have tested your installation and know that everything works, read the Start Using MaltParser section of the User Guide. In that section you will learn how to train a parsing model on a data set (the training data), and use that model to parse unseen data (the testing data).
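
For concreteness, the basic workflow from that section looks roughly as follows; the jar name depends on the version you downloaded, and the data file names here are just placeholders:

% java -jar maltparser-1.7.2.jar -c test -i train.conll -m learn
% java -jar maltparser-1.7.2.jar -c test -i input.conll -o out.conll -m parse

The first command trains a parsing model named test (option -c) on the training file given with -i; the second command loads that model, parses the input file, and writes the resulting analyses to the file given with -o.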

Please note that the testing data serves two purposes at the same time: It contains the sentences (tagged with part-of-speech information) that the parser should assign dependency analyses to, and it also contains gold-standard analyses that you can use to evaluate the performance of the parser. These gold-standard analyses are not visible to the parser during parsing. (If they were, the parser could just assign the gold-standard analysis to the sentence and would receive a perfect score.)

Task 1: Train a Baseline Model

Your first task is to train a useful parsing model on realistic data. The workflow for this is exactly the same as the one that you used in the setup phase. The only thing that changes is the data: instead of the small example files from the setup phase, you now train and parse on the Swedish dependency data provided for this assignment.

Note that training a model with this data will take quite a bit longer than training the dummy model from the setup phase.
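
A simple way to measure the training time (which you will need for the report) is to prefix the training command with the standard Unix time command. The jar and file names below are again placeholders, and the -Xmx option, which gives the Java virtual machine more heap space, may be needed for the larger data set:

% time java -Xmx1024m -jar maltparser-1.7.2.jar -c baseline -i swedish_dep_train.conll -m learn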

Reporting: Report the time it took to train and parse with the model, as well as the hardware configuration of the computer that you used for this experiment (processor type, amount of memory).

Task 2: Evaluate the Baseline Model

Now that you have trained a parsing model and used it to parse the testing data, your next task is to evaluate the performance of your system. The relevant evaluation measure for this is the labelled attachment score (LAS) of the parser’s output relative to the testing data.

You can read more about LAS in section 6.1 of the KMN book. You are free to implement either the word-based or the sentence-based version of LAS, but the word-based version is easier to compute.

While there are several tools available for computing LAS, you are asked to implement your own evaluator. You can use any programming language you want; the only requirement is that the evaluator should be callable from the command line. It should accept exactly two arguments: the file with the gold-standard data, and the file with the system output. For example, you should be able to do something like this:

% ./eval swedish_dep_dev.conll out.conll
Total number of edges: 9339
Number of correct edges: 6678
LAS: 0.715066

In order to write the evaluator, you need to know about the format of the data files; please see this page for a detailed description of that format.
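
To make the expected computation concrete, here is a minimal sketch of a word-based LAS evaluator written as a shell script. It assumes tab-separated CoNLL files with the head in column 7 and the dependency label in column 8, and it counts every token, including punctuation; treat it as a starting point rather than a finished solution.

#!/bin/sh
# Usage: ./eval gold.conll system.conll
# A token counts as correct if both its head (column 7) and its
# dependency label (column 8) agree with the gold standard.
paste "$1" "$2" | awk -F'\t' '
    $1 != "" {                        # skip the blank lines between sentences
        total++
        if ($7 == $17 && $8 == $18)   # gold HEAD/DEPREL vs. system HEAD/DEPREL
            correct++
    }
    END {
        printf "Total number of edges: %d\n", total
        printf "Number of correct edges: %d\n", correct
        printf "LAS: %f\n", correct / total
    }'

Since the two files are pasted line by line, the system's head and label end up in fields 17 and 18. You are of course free to write your evaluator in any other language, as long as it can be called from the command line as shown above.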

Reporting: Include the code for your evaluator in the lab report (or send it by email if the code is longer than one page).

Task 3: Selecting a Good Parsing Algorithm

MaltParser supports several parsing algorithms; these are described in the Parsing Algorithm section of the User Guide. Your next task is to select the best algorithm for the data at hand, where the ‘best’ algorithm is the one that gives the highest LAS. To make this choice, you need to train a separate parsing model for each algorithm, use it to parse the testing data, and evaluate the performance of the parser as in Task 2. You can restrict your search to the algorithms in the Nivre and Stack families.
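
A sketch of how such an experiment might be scripted is given below. The algorithm is selected with the -a option (see the Parsing Algorithm section for the full list of values); the jar and file names are placeholders for whatever you are using:

# Train, parse, and evaluate one model per parsing algorithm.
for alg in nivreeager nivrestandard stackproj stackeager stacklazy; do
    java -jar maltparser-1.7.2.jar -c model_$alg -a $alg \
        -i swedish_dep_train.conll -m learn
    java -jar maltparser-1.7.2.jar -c model_$alg \
        -i swedish_dep_dev.conll -o out_$alg.conll -m parse
    echo "$alg:"
    ./eval swedish_dep_dev.conll out_$alg.conll
done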

Reporting: Report the LAS scores for all algorithms that you tried, and write down which algorithm you picked in the end.

Task 4: Feature Engineering

MaltParser processes a sentence from left to right, and at each point takes one of a small set of possible transitions. (Details will be presented in the lectures.) In order to predict the next action, MaltParser uses feature models. Up to now, you have been using the baseline feature model for whatever algorithm you were experimenting with. However, one can often do much better than that.

Your next task is to improve the feature model by exploiting the fact that the training and testing data contain morphological features such as case, tense, and definiteness. These are specified in the FEATS column (column 6) of the CoNLL format. Here is an example:

7 hemmet _ NOUN NN NEU|SIN|DEF|NOM 6 PA _ _

This line specifies that the word hemmet has neuter gender (NEU), singular number (SIN), definite form (DEF), and nominative case (NOM). This is useful information during parsing.

Read the Feature Model section of the User Guide to find out how to extract the value of the FEATS column and split it into a set of atomic features using the delimiter | (pipe). Then, create a copy of the file that holds the feature model used by the algorithm that you selected in Task 3 and make the necessary modifications. Finally, train a new parsing model with the extended feature model, use it to parse the testing data, and evaluate its performance.

Note: If you are using an algorithm from the Nivre family, then you should extract the features for Input[0] and Stack[0]. If you are using an algorithm from the Stack family, then you should use Stack[0] and Stack[1].
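
As a hypothetical illustration, suppose that you picked an algorithm from the Nivre family and that its baseline feature model is stored in a file called NivreEager.xml (the actual file name depends on your setup). The workflow might then look as follows; check the Feature Model section of the User Guide for the exact syntax of the Split function and for the option (assumed here to be -F) that points MaltParser at a custom feature model file:

# Copy the baseline feature model and add FEATS features to the copy,
# i.e. lines of roughly the following form inside the feature model
# specification:
#
#   <feature>Split(InputColumn(FEATS, Stack[0]), |)</feature>
#   <feature>Split(InputColumn(FEATS, Input[0]), |)</feature>
#
cp NivreEager.xml my_features.xml   # then edit my_features.xml as described above

# Retrain with the extended feature model, parse, and re-evaluate.
java -jar maltparser-1.7.2.jar -c model_feats -a nivreeager -F my_features.xml \
    -i swedish_dep_train.conll -m learn
java -jar maltparser-1.7.2.jar -c model_feats \
    -i swedish_dep_dev.conll -o out_feats.conll -m parse
./eval swedish_dep_dev.conll out_feats.conll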

Reporting: Write down the lines that you added to the baseline feature model, and how this affected the LAS of the parser relative to the score that you got for Task 3. (Note that you need to retrain the parser with the new feature model in order to see changes.)

Task 5: Gold-Standard Tags Versus Predicted Tags

You only need to work on this task if you want to get the grade Pass With Distinction (VG).

In the training and testing data that you have been using up to now, the part-of-speech (POS) tags are gold-standard tags, in the sense that they were assigned manually. In this task you will be exploring what happens in the more realistic scenario where the tags are assigned automatically.

Your specific task is to produce alternative versions of the training and testing data where the gold-standard POS tags have been replaced with automatically-assigned tags. To obtain these, you can use Hunpos, a state-of-the-art part-of-speech tagger. Proceed as follows:

  • Download and install Hunpos on your computer.
  • Read the User Manual to learn how to use Hunpos.
  • Use the training data for the parser to produce training data for the tagger.
  • Train the tagger on this training data.
  • Use the trained tagger to tag the sentences in the parser data (training and testing).
  • Produce new parser data by replacing the gold standard tags in the original data with the automatic tags.
  • Re-train and re-evaluate the parser using the new data.

To carry out these steps, you may be tempted to write code that modifies CoNLL files. However, all the necessary manipulations can also be done using standard Unix commands such as cut and paste; one possible command sequence is sketched below.
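
The sketch assumes tab-separated CoNLL files with the word form in column 2 and the fine-grained POS tag in column 5, and the usual hunpos-train and hunpos-tag programs; all file names are placeholders, and the commands use bash process substitution. Only the tag in column 5 is replaced here; depending on your feature model, you may want to treat the coarse tag in column 4 in the same way.

# 1. Tagger training data: word form and gold tag; the blank lines between
#    sentences survive cut, which is exactly what hunpos expects.
cut -f 2,5 swedish_dep_train.conll > tagger_train.txt
hunpos-train swedish.model < tagger_train.txt

# 2. Tag the words of the parser data (one word per line).
cut -f 2 swedish_dep_dev.conll | hunpos-tag swedish.model > dev_tagged.txt

# 3. Splice the predicted tags (column 2 of the tagger output) back in as
#    column 5 of the original file; the sed command restores the blank
#    sentence-separator lines, which would otherwise contain stray tabs.
paste <(cut -f 1-4 swedish_dep_dev.conll) \
      <(cut -f 2 dev_tagged.txt) \
      <(cut -f 6-10 swedish_dep_dev.conll) \
    | sed 's/^[[:space:]]*$//' > swedish_dep_dev_auto.conll

# Repeat steps 2 and 3 for the training file to obtain training data
# with automatically assigned tags.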

Reporting: Report the LAS of the parser trained and tested on data with automatically-assigned tags. If you are feeling extra ambitious, you can experiment with mixed scenarios where you train the parser on gold-standard tags but test on automatically assigned tags. Describe the conclusions that you draw from your results.

Submission and Grading

The assignment will be graded based on your written lab report.

The submission deadline for this assignment is 15 January 2014.

Problems

If you encounter any problems, please do not hesitate to contact me as soon as possible, either in person or by email.

Enjoy!

History

This assignment was developed for 5LN455 by Marco Kuhlmann, 2011-2012.