An editor for dependency treebanks

I was pleased to meet Johannes Heinecke at the International Congress of Celtic Studies in Bangor last week. As well as producing a dependency treebank for Welsh, he has written a rather smart editor for CoNLL-U files, which are pretty much the standard these days for dependency trees.

Screengrab of Johannes Heinecke's CoNLL-U editor. The tree is for the sentence "Cuir d' ainm ri seo."
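If you haven't met the format before: a CoNLL-U file has one token per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). Comment lines start with # and sentences are separated by a blank line. Below is a minimal Python sketch of a reader for a single sentence. The names `Token` and `read_conllu` are made up for illustration. The analysis of the screengrab sentence in the sample is a hypothetical one chosen to exercise the code, not necessarily the analysis shown in the editor.

```python
from dataclasses import dataclass


@dataclass
class Token:
    id: int
    form: str
    upos: str
    head: int    # 0 means the token hangs off the artificial root
    deprel: str


def read_conllu(text: str) -> list[Token]:
    """Read one CoNLL-U sentence into a list of Tokens.

    Comment lines ('#'), multiword-token ranges ('3-4') and empty nodes ('5.1')
    are skipped in this sketch.
    """
    tokens = []
    for line in text.strip().splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue
        tokens.append(Token(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7]))
    return tokens


# Hypothetical analysis of the screengrab sentence, for illustration only.
rows = [
    ["1", "Cuir", "_", "VERB", "_", "_", "0", "root", "_", "_"],
    ["2", "d'", "_", "PRON", "_", "_", "3", "nmod:poss", "_", "_"],
    ["3", "ainm", "_", "NOUN", "_", "_", "1", "obj", "_", "_"],
    ["4", "ri", "_", "ADP", "_", "_", "5", "case", "_", "_"],
    ["5", "seo", "_", "PRON", "_", "_", "1", "obl", "_", "_"],
    ["6", ".", "_", "PUNCT", "_", "_", "1", "punct", "_", "_"],
]
sample = "# text = Cuir d' ainm ri seo.\n" + "\n".join("\t".join(r) for r in rows)

# Print each arc as head -> dependent.
for tok in read_conllu(sample):
    print(f"{tok.head} -> {tok.id} ({tok.form}, {tok.deprel})")
```

The HEAD column is what the editor draws as arcs. Every other column is just a label on a node.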

I managed to get it working this morning on a Mac running macOS Mojave 10.14.6 with a minimum of hassle. You will need Java, Apache Maven, and Homebrew (to install wget). One small surprise is that if the file you are editing is in a git repository, then by default every change you make to the tree is committed, which makes the commit history look a bit busy.

The second best bit is that you can see non-projective relations at a glance, which I certainly can’t do in Emacs.
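One standard definition: an arc from head h to dependent d is non-projective if some word strictly between h and d is not a descendant of h. Here is a small Python sketch of that check. It assumes the heads come as a list indexed by token ID, with a dummy entry at index 0 and 0 meaning the root, as in the CoNLL-U HEAD column. The function names are illustrative, not the editor's.

```python
def dominates(heads: list, a: int, b: int) -> bool:
    """True if token a is a proper ancestor of token b.

    heads[i] is the head of token i (1-based IDs, heads[0] unused, 0 = root).
    Assumes the heads form a tree, so following them always reaches the root.
    """
    while b != 0:
        b = heads[b]
        if b == a:
            return True
    return False


def non_projective_arcs(heads: list) -> list:
    """Return (head, dependent) pairs whose span covers a word the head does not dominate."""
    arcs = []
    for d in range(1, len(heads)):
        h = heads[d]
        if h == 0:          # the arc from the artificial root is not checked here
            continue
        lo, hi = min(h, d), max(h, d)
        if any(not dominates(heads, h, k) for k in range(lo + 1, hi)):
            arcs.append((h, d))
    return arcs


# Token 4 depends on token 2, but token 3 in between hangs off token 1,
# so the arcs 1 -> 3 and 2 -> 4 cross.
print(non_projective_arcs([None, 0, 1, 1, 2]))   # prints [(2, 4)]
```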

The best bit, as someone who recently wrote a paper where all the arrows in the dependency diagrams pointed the wrong way and didn’t notice until the referees pointed it out, is that there is a wee button you can click on to get a TikZ version of the tree for pasting into LaTeX.
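The editor's exact output isn't shown here, so this is only a sketch of the general idea. It assumes the tikz-dependency LaTeX package, loaded with \usepackage{tikz-dependency}. In that package, \depedge{h}{d}{label} draws an arc from word h to word d with the arrowhead on d. Always writing the head first and the dependent second keeps the arrows pointing from head to dependent. The function name `to_tikz_dependency` is hypothetical, and the sample analysis is illustrative.

```python
def to_tikz_dependency(forms, heads, deprels):
    r"""Render a dependency tree as LaTeX for the tikz-dependency package.

    forms, heads and deprels are parallel lists for tokens 1..n; a head of 0
    marks the root. \depedge{h}{d}{label} is emitted head first, dependent
    second. Special LaTeX characters in the word forms are not escaped here.
    """
    lines = [r"\begin{dependency}", r"  \begin{deptext}"]
    lines.append("    " + r" \& ".join(forms) + r" \\")
    lines.append(r"  \end{deptext}")
    for d, (h, rel) in enumerate(zip(heads, deprels), start=1):
        if h == 0:
            lines.append(rf"  \deproot{{{d}}}{{{rel}}}")
        else:
            lines.append(rf"  \depedge{{{h}}}{{{d}}}{{{rel}}}")
    lines.append(r"\end{dependency}")
    return "\n".join(lines)


print(to_tikz_dependency(["Cuir", "d'", "ainm"], [0, 3, 1], ["root", "nmod:poss", "obj"]))
```

Generating the edges from the HEAD column, rather than drawing them by hand, is what keeps the arrow direction consistent.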

Fronting

Unless I indicate otherwise, all these examples are taken from Gareth King’s Intermediate Welsh (London: Routledge, 1996). The analyses are mine, as are the errors.

I don’t think I ever mastered the word mai, and reading up on it, I think it’s because I never mastered changes of word order. The verb doesn’t have to go first in the sentence. Take the title of Menna Elfyn’s Ibsen translation Y Fenyw Ddaeth o’r Môr, where the NP, ‘the woman’, comes before the dependent form of the verb, ddaeth not daeth. The opening stage directions have lots of PPs before the independent form of the verb, like this:

  • Ar y chwith mae feranda dan do llydan. ‘On the left there is a veranda under a broad roof.’
  • Yn y tu blaen, ac o gwmpas y tŷ, mae gardd. ‘In front, and around the house, is a garden.’
  • Islaw’r feranda, mae polyn baner. ‘Below the veranda, there is a flagpole.’

and so on. Now ordinary subordinate clauses, which I did get the hang of, look like this:

Dw i'n meddwl          fod                  Ron  yn dod yfory
       ------          ---                  ---  ------------
       S[n]/NP/S[sub]  S[sub]/S[asp]/NP/NP  NP   S[asp]/NP

which is the same as a declarative clause, except that it can be an argument to meddwl, credu or another verb of thinking, feeling and so on. But what if we want to emphasize Ron? Then mai comes before Ron, and Ron comes before the dependent form of mae, which in this case is sy. So how do we handle this? There is a back door in CCG: the unary type-changing rule. Using it is not the done thing, but if I gather examples, hopefully someone who understands these things better can refactor the grammar into a cleverer shape. Here are three type-changing rules, which add a feature FRONTED:

  • S[dep]/NP → S[dcl, +FRONTED]\NP (blocked for mae)
  • S[dep]/NP → S[dcl, +FRONTED]\S[n]/NP (not blocked for mae). Example: Gwaethygu mae’r sefyllfa yn Ne Ewrop.
  • PP → S[+FRONTED]/S. Example: Menna Elfyn’s scene setting above.

The idea here is that mai (and its South Walian counterpart taw) has the type S[sub]/S[dcl, +FRONTED], which is to say that it only takes a declarative clause if there’s something in front of the verb.
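Here is a minimal Python sketch of how the first and third rules and the category of mai could interact. It assumes that an argument slot accepts any category carrying at least the features it asks for. Under that assumption, the S[dcl, +FRONTED] argument of mai rejects a plain S[dcl]. The class and function names are hypothetical. The blocking for mae is modelled crudely by passing in the verb. The second rule is left out because the flat notation does not say how S[dcl, +FRONTED]\S[n]/NP should be bracketed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    base: str                          # "S", "NP", "PP", ...
    feats: frozenset = frozenset()     # e.g. {"dcl", "+FRONTED"}


@dataclass(frozen=True)
class Fn:
    result: object
    slash: str                         # "/" wants its argument on the right, "\\" on the left
    arg: object


def show(c) -> str:
    if isinstance(c, Atom):
        feats = sorted(c.feats, key=lambda f: (f.startswith("+"), f))
        return c.base + (f"[{', '.join(feats)}]" if feats else "")
    arg = show(c.arg) if isinstance(c.arg, Atom) else f"({show(c.arg)})"
    return f"{show(c.result)}{c.slash}{arg}"


def accepts(slot, given) -> bool:
    """An argument slot accepts a category of the same shape with at least its features."""
    if isinstance(slot, Atom) and isinstance(given, Atom):
        return slot.base == given.base and slot.feats <= given.feats
    if isinstance(slot, Fn) and isinstance(given, Fn):
        return (slot.slash == given.slash and accepts(slot.result, given.result)
                and accepts(slot.arg, given.arg))
    return False


def forward(f, a):
    """X/Y  Y  =>  X"""
    return f.result if isinstance(f, Fn) and f.slash == "/" and accepts(f.arg, a) else None


def backward(a, f):
    """Y  X\\Y  =>  X"""
    return f.result if isinstance(f, Fn) and f.slash == "\\" and accepts(f.arg, a) else None


NP, PP, S = Atom("NP"), Atom("PP"), Atom("S")
S_DEP = Atom("S", frozenset({"dep"}))
S_DCL = Atom("S", frozenset({"dcl"}))
S_FRONTED = Atom("S", frozenset({"dcl", "+FRONTED"}))
S_SUB = Atom("S", frozenset({"sub"}))


def front_np(cat, verb: str):
    """Rule 1: S[dep]/NP => S[dcl, +FRONTED]\\NP, blocked when the verb is mae."""
    if cat == Fn(S_DEP, "/", NP) and verb != "mae":
        return Fn(S_FRONTED, "\\", NP)
    return None


def front_pp(cat):
    """Rule 3: PP => S[+FRONTED]/S."""
    return Fn(Atom("S", frozenset({"+FRONTED"})), "/", S) if cat == PP else None


MAI = Fn(S_SUB, "/", S_FRONTED)        # mai (and taw): S[sub]/S[dcl, +FRONTED]

# Treating "Y Fenyw" as NP and "ddaeth o'r Môr" as S[dep]/NP, fronted by rule 1.
clause = backward(NP, front_np(Fn(S_DEP, "/", NP), "ddaeth"))
print(show(clause))                # S[dcl, +FRONTED]
print(show(forward(MAI, clause)))  # S[sub]
print(forward(MAI, S_DCL))         # None: an ordinary declarative is rejected
```

Feature checking by subset is one simple way to get the "only if something is fronted" behaviour. A full unification-based treatment would be more flexible.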

That feels as if I’ve learnt something.

Every one’s a clitic: a general treatment of one family of fused words in Welsh

I’ve been starting to look at Welsh through the lens of CCG, largely because if I did manage to learn how to use words like mai, sydd, sef and bod (as a conjunction) correctly in my youth, I have forgotten now.

I have to know what’s going on in the simpler clauses that these words join together first, though. So far the analysis from Scottish Gaelic carries through: the word order, for example, verbal nouns being clauses of type S[n]/NP/NP or S[n]/NP, and particles like yn or wedi being type-changers. That is partly because I made sure to read up beforehand on how people have treated the verbal noun in Welsh. However, the example sentences I’ve been looking at have pronouns attached to clitic particles (hi’n), to articles (e’r) and to possessive pronouns (fe’ch).

This needn’t be a problem for dependency grammars, where you can have as many edges coming out of a single node as you like. It looks tricky for constituency parsers, though, where you expect the sentence to be of the form VP NP but part of the fused word is in the VP and part of it is in the NP. At this stage it would be very easy to change the tokenization rules so that e and ’r are separate words. But one thing CCG is good at is assigning words categories, possibly baroque and frightening ones, that reflect what the words do in a sentence.

Let’s take Rydyn nhw’n dod, ‘they are coming’. dod is an intransitive verbal noun, which I take to be S[n]/NP. Rydyn is the independent form of the verb ‘to be’, present tense, third person. It expects an NP for the subject and either an adjectival phrase or an aspectual phrase, so I’ve written it as S[dcl]/S[asp]/NP/NP. On their own, nhw ‘they’ and yn (the aspect marker) are NP and S[asp]/NP/S[n]/NP respectively. But what are they when combined? The way to answer this is to treat parsing the sentence as a mathematical puzzle. We know the solution is S[dcl]. At each step of the proof we may make one of the moves CCG allows (application, substitution, type-raising or composition), and we solve for the unknown category Q in the derivation below. I had a hunch that backward crossed composition combined with type-raising would be the way to go here. Let’s try type-raising dod first. We want a backslash so that we can try backward crossed composition, Y/Z X\Y → X/Z:

Rydyn               nhw'n              dod
S[dcl]/S[asp]/NP/NP Q                  S[n]/NP
                                       ------------T
                                       D\D/S[n]/NP
                                       (try D = S[asp]/NP)
                                       S[asp]/NP\S[asp]/NP/S[n]/NP

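So X = S[asp]/NP and Y = S[asp]/NP/S[n]/NP, which is exactly the category of yn. Backward crossed composition needs Q = Y/Z. We know that X/Z = S[asp]/NP/NP, so Z = NP and Q = S[asp]/NP/S[n]/NP/NP: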

Y/Z
S[asp]/NP/S[n]/NP/NP  S[asp]/NP\S[asp]/NP/S[n]/NP
-------------------------------------------------<Bx
S[asp]/NP/NP
-------------------------------->
S[dcl] ?
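The composition step in this derivation can be checked mechanically. This Python sketch writes out the brackets that the flat notation leaves implicit. It reads yn's category S[asp]/NP/S[n]/NP as (S[asp]/NP)/(S[n]/NP). It reads D\D/S[n]/NP as D\(D/(S[n]/NP)), the usual backward type-raising X ⇒ T\(T/X). Those readings and the helper names are assumptions of the sketch. The code solves for Q and then confirms that <Bx gives S[asp]/NP/NP. It does not check the final application with Rydyn, because whether that goes through depends on how S[dcl]/S[asp]/NP/NP is bracketed.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Fn:
    result: "Cat"
    slash: str      # "/" takes its argument from the right, "\\" from the left
    arg: "Cat"


Cat = Union[str, Fn]   # atomic categories are plain strings like "NP" or "S[asp]"


def show(c: Cat) -> str:
    if isinstance(c, str):
        return c
    arg = c.arg if isinstance(c.arg, str) else f"({show(c.arg)})"
    return f"{show(c.result)}{c.slash}{arg}"


def type_raise_backward(x: Cat, t: Cat) -> Fn:
    """X  =>  T\\(T/X)"""
    return Fn(t, "\\", Fn(t, "/", x))


def backward_crossed_compose(left: Cat, right: Cat) -> Optional[Fn]:
    """Y/Z  X\\Y  =>  X/Z  (<Bx); None if the categories don't fit."""
    if (isinstance(left, Fn) and left.slash == "/" and isinstance(right, Fn)
            and right.slash == "\\" and left.result == right.arg):
        return Fn(right.result, "/", left.arg)
    return None


def solve_q(right: Cat, target: Cat) -> Optional[Fn]:
    """Given X\\Y on the right and the wanted result X/Z, return Q = Y/Z."""
    if (isinstance(right, Fn) and right.slash == "\\" and isinstance(target, Fn)
            and target.slash == "/" and target.result == right.result):
        return Fn(right.arg, "/", target.arg)
    return None


dod = Fn("S[n]", "/", "NP")                    # S[n]/NP
asp = Fn("S[asp]", "/", "NP")                  # S[asp]/NP, used as D
yn = Fn(asp, "/", dod)                         # (S[asp]/NP)/(S[n]/NP)

raised = type_raise_backward(dod, asp)         # X\Y with X = S[asp]/NP, Y = yn's category
q = solve_q(raised, Fn(asp, "/", "NP"))        # want X/Z = S[asp]/NP/NP
print(show(q))                                 # S[asp]/NP/(S[n]/NP)/NP
print(q == Fn(yn, "/", "NP"))                  # True: Q is yn's category over nhw's
print(show(backward_crossed_compose(q, raised)))  # S[asp]/NP/NP
```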

The first thing I want to observe is that this would be clearer if everything were coloured in. The second is that if you take the type of nhw to be A and the type of ’n to be B, then the type of nhw’n is B/A. This feels like the sort of result that is obvious to someone more proficient than me. But does it generalize? Let’s try the simpler construction in Gwerthodd e’r oergell, ‘he sold the fridge’. Here e is NP, ’r, the article, is NP/N, and oergell, an indefinite fridge, is N, so A = NP, B = NP/N and B/A = NP/N/NP.

Gwerthodd      e'r     oergell
S[dcl]/NP/NP   NP/N/NP N
                       ---------T
                       try type-raising with NP
                       NP\NP/N
               ---------------<Bx
               Y = NP/N, X = NP, Z = NP
               NP/NP
------------------->
S[dcl] ?
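The fridge example and the B/A pattern can be checked with a short Python sketch that writes brackets explicitly. The function `fuse` simply encodes the hypothesis from the text: a fused word whose parts have categories A and B gets B/A. All helper names are hypothetical. The final application with Gwerthodd is not checked, since it depends on how the flat category S[dcl]/NP/NP is bracketed.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Fn:
    result: "Cat"
    slash: str      # "/" takes its argument from the right, "\\" from the left
    arg: "Cat"


Cat = Union[str, Fn]   # atomic categories are plain strings like "NP" or "N"


def show(c: Cat) -> str:
    if isinstance(c, str):
        return c
    arg = c.arg if isinstance(c.arg, str) else f"({show(c.arg)})"
    return f"{show(c.result)}{c.slash}{arg}"


def fuse(a: Cat, b: Cat) -> Fn:
    """Hypothesis from the text: a pronoun of category A fused with a clitic B gets B/A."""
    return Fn(b, "/", a)


def type_raise_backward(x: Cat, t: Cat) -> Fn:
    """X  =>  T\\(T/X)"""
    return Fn(t, "\\", Fn(t, "/", x))


def backward_crossed_compose(left: Cat, right: Cat) -> Optional[Fn]:
    """Y/Z  X\\Y  =>  X/Z  (<Bx); None if the categories don't fit."""
    if (isinstance(left, Fn) and left.slash == "/" and isinstance(right, Fn)
            and right.slash == "\\" and left.result == right.arg):
        return Fn(right.result, "/", left.arg)
    return None


article = Fn("NP", "/", "N")                   # 'r: NP/N
e_r = fuse("NP", article)                      # e'r: B/A = NP/N/NP
oergell = type_raise_backward("N", "NP")       # N => NP\(NP/N)
print(show(e_r), show(oergell))                # NP/N/NP NP\(NP/N)
print(show(backward_crossed_compose(e_r, oergell)))  # NP/NP

# The same fuse gives nhw'n from nhw (NP) and yn ((S[asp]/NP)/(S[n]/NP)).
yn = Fn(Fn("S[asp]", "/", "NP"), "/", Fn("S[n]", "/", "NP"))
print(show(fuse("NP", yn)))                    # S[asp]/NP/(S[n]/NP)/NP
```

In both cases the fused word supplies Y/Z and the type-raised word supplies X\Y, so the composition lines up exactly when the clitic's own argument is what the type-raised word is waiting for.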

I think that’s a result. Next up: look into Lambek’s product operator and sort out what’s going on with the eich… chi construction.