I’ve started looking at Welsh through the lens of CCG, largely because, if I ever managed to learn how to use words like mai, sydd, sef and bod (as a conjunction) correctly in my youth, I have since forgotten.
First, though, I need to know what’s going on in the simpler clauses that these words join together. So far the analysis from Scottish Gaelic carries through: the word order, verbal nouns as clauses of type S[n]/NP/NP or S[n]/NP, and particles like yn or wedi as type-changers. This is partly because I made sure to read up beforehand on how people have treated the verbal noun in Welsh. However, the example sentences I’ve been looking at have pronouns attached to clitic particles (hi’n), to articles (e’r) and to possessive pronouns (fe’ch).
This needn’t be a problem for dependency grammars, where a single node can have as many outgoing edges as you like, but it looks tricky for constituency parsers: you expect the sentence to have the form VP NP, yet part of the fused word belongs to the VP and part of it to the NP. At this stage it would be very easy to change the tokenization rules so that e and ’r are separate words, but one thing CCG is good at is assigning words categories, possibly baroque and frightening ones, that reflect what the words do in a sentence.
Let’s take Rydyn nhw’n dod ‘they are coming’. The verbal noun dod is intransitive, and I take it to be S[n]/NP. Rydyn is the independent verb ‘to be’, present tense, third person, and it expects an NP for the subject and either an adjectival phrase or an aspectual phrase. I’ve written this as S[dcl]/S[asp]/NP/NP. On their own, nhw ‘they’ and yn (the aspect marker) are NP and S[asp]/NP/S[n]/NP respectively. But what are they when combined? The way to answer this is to treat parsing the sentence as a mathematical puzzle. We know the solution is S[dcl]; at each step of the proof we may make one of the moves CCG allows (application, substitution, type-raising or composition), and we solve for Q, the category of nhw’n, in the derivation below. I had a hunch that type-raising combined with backwards crossed composition would be the way to go here, so let’s try type-raising dod first. We want a backslash so that we can apply backwards crossed composition, Y/Z X\Y -> X/Z.
Rydyn: S[dcl]/S[asp]/NP/NP
nhw'n: Q
dod: S[n]/NP
(try type-raising dod with S[asp]/NP)
So, X = S[asp]/NP and Y = S[asp]/NP/S[n]/NP. Q = Y/Z. We know that X/Z = S[asp]/NP/NP, so…
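Spelled out with the brackets that the flat notation leaves implicit, the step runs as below. This is my reading rather than anything stated above: it assumes yn is (S[asp]/NP)/(S[n]/NP) and that the type-raising target is T = S[asp]/NP.

```latex
\begin{align*}
&\text{type-raise } \textit{dod},\ T = S[asp]/NP: && S[n]/NP \;\Rightarrow\; (S[asp]/NP)\backslash\big((S[asp]/NP)/(S[n]/NP)\big)\\
&\text{match against } X\backslash Y: && X = S[asp]/NP,\quad Y = (S[asp]/NP)/(S[n]/NP)\\
&\text{backward crossed composition:} && Y/Z \quad X\backslash Y \;\Rightarrow\; X/Z\\
&\text{given } X/Z = (S[asp]/NP)/NP: && Z = NP\\
&\text{hence:} && Q = Y/Z = \big((S[asp]/NP)/(S[n]/NP)\big)/NP
\end{align*}
```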
The first thing I want to observe is that this would be clearer if everything were coloured in. The second is that if you take the type of nhw to be A and the type of ’n to be B, the type of nhw’n is B/A. This feels like the sort of result that is obvious to someone more proficient than me. But does it generalize? Let’s try the simpler construction in Gwerthodd e’r oergell ‘he sold the fridge’. Here e is NP, the article ’r is NP/N, and oergell, an indefinite fridge, is N, so A = NP, B = NP/N and B/A = NP/N/NP.
Gwerthodd: S[dcl]/NP/NP
e'r: NP/N/NP
oergell: N
(try type-raising oergell with NP)
Y = NP/N, X = NP, Z = NP
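Both puzzles can be checked mechanically. Below is a minimal Python sketch, not a CCG parser: it implements only the two moves used here, backward type-raising (X ⇒ T\(T/X)) and backward crossed composition (Y/Z X\Y ⇒ X/Z), and solves for Q given the type-raised word on the right and the X/Z we want to end up with. The names (Atom, Fn, over, show, raise_backward, compose_backward_crossed, solve_for_q) are my own, hypothetical ones. It assumes the bracketing the flat notation leaves implicit, in particular that yn is (S[asp]/NP)/(S[n]/NP), and it takes the X/Z targets as given: S[asp]/NP/NP for the first example and, as an assumption, NP/NP (that is, Z = NP) for the fridge.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Atom:
    """A basic category such as NP, N or S[n]."""
    name: str


@dataclass(frozen=True)
class Fn:
    """A functor category: result/arg looks right, result\\arg looks left."""
    result: Cat
    slash: str  # "/" or "\\"
    arg: Cat


Cat = Union[Atom, Fn]


def over(result: Cat, arg: Cat) -> Fn:
    """Shorthand for the forward-slash category result/arg."""
    return Fn(result, "/", arg)


def show(c: Cat) -> str:
    """Print with left-associative slashes, bracketing only functor arguments."""
    if isinstance(c, Atom):
        return c.name
    arg = f"({show(c.arg)})" if isinstance(c.arg, Fn) else show(c.arg)
    return f"{show(c.result)}{c.slash}{arg}"


def raise_backward(x: Cat, t: Cat) -> Fn:
    """Backward type-raising: X => T\\(T/X)."""
    return Fn(t, "\\", over(t, x))


def compose_backward_crossed(left: Cat, right: Cat) -> Optional[Fn]:
    """Backward crossed composition: Y/Z  X\\Y  =>  X/Z, or None if it doesn't apply."""
    if (isinstance(left, Fn) and left.slash == "/"
            and isinstance(right, Fn) and right.slash == "\\"
            and right.arg == left.result):
        return over(right.result, left.arg)
    return None


def solve_for_q(right: Cat, target: Cat) -> Optional[Fn]:
    """Find Q such that  Q  right  =>  target  by backward crossed composition."""
    if not (isinstance(right, Fn) and right.slash == "\\"):
        return None
    x, y = right.result, right.arg  # right is X\Y
    if isinstance(target, Fn) and target.slash == "/" and target.result == x:
        return over(y, target.arg)  # target is X/Z, so Q = Y/Z
    return None


NP, N = Atom("NP"), Atom("N")
S_N, S_ASP = Atom("S[n]"), Atom("S[asp]")

# Rydyn nhw'n dod. Assumed bracketing: yn = (S[asp]/NP)/(S[n]/NP).
dod = over(S_N, NP)
yn = over(over(S_ASP, NP), dod)
raised_dod = raise_backward(dod, over(S_ASP, NP))        # T = S[asp]/NP
q1 = solve_for_q(raised_dod, over(over(S_ASP, NP), NP))  # X/Z = S[asp]/NP/NP

# Gwerthodd e'r oergell. Assumed target: X/Z = NP/NP, i.e. Z = NP.
article = over(NP, N)                                     # 'r
raised_oergell = raise_backward(N, NP)                    # T = NP
q2 = solve_for_q(raised_oergell, over(NP, NP))

print("nhw'n:", show(q1))  # expected to be B/A with B = yn, A = NP (nhw)
print("e'r:", show(q2))    # expected to be B/A with B = 'r, A = NP (e)
```

Running it prints `nhw'n: S[asp]/NP/(S[n]/NP)/NP` and `e'r: NP/N/NP`, which is B/A in both cases. The one bracket show() keeps is around the verbal-noun argument S[n]/NP; every other bracket is redundant once slashes are read left-associatively.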
I think that’s a result. Next up: look into Lambek’s product operator and sort out what’s going on with the eich… chi construction.