Gold-Standard Tags Versus Predicted Tags

This document describes one of the possible tasks for the grade Pass With Distinction (VG).

In the training and testing data that you have been using up to now, the part-of-speech (POS) tags are gold-standard tags, in the sense that they were assigned manually. In this task you will be exploring what happens in the more realistic scenario where the tags are assigned automatically.

Your specific task is to produce alternative versions of the training and testing data where the gold-standard POS tags have been replaced with automatically assigned tags. To obtain these, you can use Hunpos, a state-of-the-art part-of-speech tagger. Proceed as follows:

To complete this task, you may be tempted to write code that modifies CoNLL files. However, all of the necessary manipulations can also be done with standard Unix commands such as cut and paste.
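If you do decide to write code, the manipulations come down to two steps: reduce a CoNLL file to its word-form column so that the tagger can read it (what `cut -f 2` would do), and then write the predicted tags back in place of the gold ones, line by line (what `paste` plus `cut` would do). The Python sketch below illustrates both steps under several assumptions. It assumes the CoNLL-X column layout, with the word form in column 2 and the tags in columns 4 (CPOSTAG) and 5 (POSTAG). It also assumes that Hunpos reads one token per line with a blank line between sentences and writes one `word<TAB>tag` line per token, in the same order. The script name `merge_tags.py` and the functions `extract_words` and `merge_tags` are hypothetical.

```python
import sys
from typing import Iterable, List, Sequence

# Assumption: CoNLL-X layout, 10 tab-separated columns per token, where
# column 2 is FORM and columns 4 and 5 are CPOSTAG and POSTAG (1-based).
FORM_COLUMN = 1        # 0-based index of FORM
POS_COLUMNS = (3, 4)   # 0-based indices of CPOSTAG and POSTAG


def _clean(lines: Iterable[str]) -> List[str]:
    """Strip line endings and drop trailing blank lines so files align cleanly."""
    out = [line.rstrip("\r\n") for line in lines]
    while out and out[-1].strip() == "":
        out.pop()
    return out


def extract_words(conll_lines: Iterable[str]) -> List[str]:
    """One word form per line, blank line between sentences (like `cut -f 2`)."""
    words = []
    for line in _clean(conll_lines):
        words.append("" if line.strip() == "" else line.split("\t")[FORM_COLUMN])
    words.append("")  # blank line terminating the last sentence
    return words


def merge_tags(conll_lines: Iterable[str], tagged_lines: Iterable[str],
               pos_columns: Sequence[int] = POS_COLUMNS) -> List[str]:
    """Overwrite the gold POS columns with predicted tags, line by line.

    Assumes the tagger output has one `word<TAB>tag` line per token and a
    blank line after each sentence, aligned with the CoNLL file.
    """
    gold = _clean(conll_lines)
    tagged = _clean(tagged_lines)
    if len(gold) != len(tagged):
        raise ValueError(f"line counts differ: {len(gold)} vs {len(tagged)}")
    merged = []
    for n, (g, t) in enumerate(zip(gold, tagged), start=1):
        if g.strip() == "" or t.strip() == "":
            # Sentence boundaries must coincide in both files.
            if g.strip() != t.strip():
                raise ValueError(f"sentence boundary mismatch at line {n}")
            merged.append("")
            continue
        cols = g.split("\t")
        parts = t.split("\t")
        if len(parts) < 2:
            raise ValueError(f"no tag found on tagger output line {n}")
        word, tag = parts[0], parts[1].strip()
        if word != cols[FORM_COLUMN]:
            raise ValueError(f"token mismatch at line {n}: {cols[FORM_COLUMN]!r} vs {word!r}")
        for i in pos_columns:
            cols[i] = tag  # predicted tag replaces the gold-standard tag
        merged.append("\t".join(cols))
    merged.append("")
    return merged


def main(argv: Sequence[str]) -> None:
    """Usage: extract GOLD.conll | merge GOLD.conll TAGGED.txt (writes to stdout)."""
    if len(argv) == 2 and argv[0] == "extract":
        with open(argv[1], encoding="utf-8") as f:
            result = extract_words(f)
    elif len(argv) == 3 and argv[0] == "merge":
        with open(argv[1], encoding="utf-8") as g, open(argv[2], encoding="utf-8") as t:
            result = merge_tags(g, t)
    else:
        print(main.__doc__, file=sys.stderr)
        return
    sys.stdout.write("\n".join(result) + "\n")


if __name__ == "__main__":
    main(sys.argv[1:])
```

A possible workflow: run `python merge_tags.py extract train.conll > words.txt`, tag `words.txt` with Hunpos to obtain `tagged.txt`, then run `python merge_tags.py merge train.conll tagged.txt > train.predicted.conll`, and repeat for the test data. The script writes the single predicted tag into both CPOSTAG and POSTAG; if your setup uses only one of these columns, pass a different `pos_columns`. The alignment checks are there so that a tokenization or line-count mismatch fails loudly rather than silently shifting tags onto the wrong tokens.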