Training a dependency parser on gdbank

A very quick note to say that I’ve trained MaltParser, a dependency parser, on the current gdbank sentences (a mere 1223 tokens spread across 70-odd sentences), using the Universal POS tagging scheme and the current Universal-ish gdbank dependency annotation scheme. I then tested it on an unseen set of 8 sentences containing 276 tokens, taken from an article in The Scotsman from a few years ago.

It got 196 (71%) of the heads right and 207 (75%) of the dependency types right, and both the head and the dependency type were right in 187 (68%) of cases. My initial impression is that the main problems are subordinators and my having mis-POS-tagged a few words, but there will be a confusion matrix soon.
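For anyone wanting to reproduce that sort of count, here is a minimal sketch in Python. It is not the script I used. It assumes both the gold file and the parser output are in CoNLL-X layout (tab-separated, HEAD in the seventh column, DEPREL in the eighth, blank lines between sentences), and the function names are made up for illustration.

```python
def read_heads_and_labels(text):
    """Return one (head, deprel) pair per token line.

    Assumption: CoNLL-X layout, tab-separated, with HEAD in column 7
    and DEPREL in column 8 (counting from 1).
    """
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank line marks a sentence boundary
        cols = line.split("\t")
        pairs.append((cols[6], cols[7]))
    return pairs


def attachment_counts(gold_text, parsed_text):
    """Count tokens with the right head, the right label, and both."""
    gold = read_heads_and_labels(gold_text)
    parsed = read_heads_and_labels(parsed_text)
    if len(gold) != len(parsed):
        raise ValueError("gold and parsed files have different token counts")
    return {
        "tokens": len(gold),
        "heads": sum(g[0] == p[0] for g, p in zip(gold, parsed)),
        "labels": sum(g[1] == p[1] for g, p in zip(gold, parsed)),
        "both": sum(g == p for g, p in zip(gold, parsed)),
    }


def percent(count, total):
    """Whole-number percentage, as quoted in the text."""
    return round(100 * count / total)
```

With the figures above, `percent(196, 276)` gives 71, `percent(207, 276)` gives 75 and `percent(187, 276)` gives 68.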

MaltParser cheat mode

If you train MaltParser using the learnwo flowchart in place of learn, it does all the same things, except that it writes out the sentences as it reads them in.

This means that if you have, ahem, misformatted any of your input, you can see exactly which misformatting MaltParser is complaining about, because it will be in the first sentence that hasn’t been written to stdout.
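If the training file is long, hunting for that first missing sentence by eye gets tedious. The following is a rough sketch of how one might automate it, on the assumption that MaltParser echoes each sentence exactly as it appears in the input and that you have captured stdout to a string. Both of those are assumptions on my part, and the function names are hypothetical.

```python
def split_sentences(text):
    """Split CoNLL-style text into sentence blocks separated by blank lines."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.rstrip())
        elif current:
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))
    return blocks


def first_unwritten_sentence(input_text, echoed_text):
    """Return (sentence number, sentence) for the first input sentence that
    does not appear in the echoed output, or None if all of them do.

    Assumption: sentences are echoed verbatim; other log lines in the
    captured output are tolerated because this is a substring check.
    """
    echoed = "\n".join(line.rstrip() for line in echoed_text.splitlines())
    for number, block in enumerate(split_sentences(input_text), start=1):
        if block not in echoed:
            return number, block
    return None
```

If the echoed sentences turn out to be reformatted in any way, the verbatim comparison will flag the very first sentence, in which case counting the echoed blocks instead would be the safer route.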

Installing MaltParser on Mac OS X 10.6.8

MaltParser is a dependency parser and it’s available here: http://www.maltparser.org/download.html

If you try to run the ready-built jar under Mac OS X 10.6.8 and you haven’t updated to Java 1.7, you’ll get a major.minor version number error, because the jar was compiled for a newer Java than the one you’re running. However, if you simply edit the Java version references in the build.xml file to read 1.6, and type

ant dist

to build with ant, then it will whirr away for a bit and build fine.
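The major.minor number in that error is the class-file version stamped into every compiled class: major 50 means Java 1.6 and major 51 means Java 1.7. If you want to check which one a jar was built for, before and after rebuilding, something like the sketch below should do it. The function names are mine and the jar path is whatever yours happens to be.

```python
import struct
import zipfile

# Class-file major versions for the two Java releases discussed here.
JAVA_BY_MAJOR = {50: "1.6", 51: "1.7"}


def class_file_version(header):
    """Return (major, minor) from the first 8 bytes of a .class file."""
    magic, minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor


def jar_major_versions(jar_path):
    """Return the set of class-file major versions found in a jar."""
    majors = set()
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if name.endswith(".class"):
                with jar.open(name) as handle:
                    majors.add(class_file_version(handle.read(8))[0])
    return majors
```

A jar that reports only 50 should run under Java 1.6; one that reports 51 is what triggers the error.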