Call for Participation: 3rd Celtic Language Technology Workshop (@MT Summit 2019)

Workshop: 3rd Celtic Language Technology Workshop

Date: 19th August 2019

Location: Dublin, Ireland



We invite you to participate in the Third Celtic Language Technology Workshop, sponsored by Mozilla and the Irish Government Department of Culture, Heritage and the Gaeltacht.

  • invited talks by Claudia Soria, Italian National Research Council, “The Digital Language Survival Kit”, and Kelly Davis, Mozilla, “Common Voice”
  • oral presentations on a range of technological advances and explorations for Celtic languages (machine translation, treebanking, CALL, etc.)
  • a CLTW community discussion
  • social excursion and networking event

Please visit the workshop webpage for details on accepted papers:

The full programme will be announced soon.

This workshop is co-located with MT Summit 2019; registration is available on the conference website.

Workshop Organizers and Program Committee Chairs

Teresa Lynn, Dublin City University

Delyth Prys, University of Bangor

Colin Batchelor, Royal Society of Chemistry

Francis M. Tyers, Indiana University and Higher School of Economics

Posted in conferences

An editor for dependency treebanks

I was pleased to meet Johannes Heinecke at the International Congress of Celtic Studies in Bangor last week. As well as producing a dependency treebank for Welsh, he has written a rather smart editor for CoNLL-U files, which are pretty much the standard these days for dependency trees.

Screengrab of Johannes Heinecke's CoNLL-U editor. The tree is for the sentence "Cuir d' ainm ri seo."

I managed to get it working this morning on a Mac running macOS Mojave 10.14.6 with a minimum of hassle. You will need Java and Apache Maven, plus Homebrew in order to install wget. One small surprise is that if you edit a file in a git repository then, by default, the new file is committed every time you edit the tree, which makes the commit history look a bit busy.

The second best bit is that you can see non-projective relations at a glance, which I certainly can’t do in emacs.

The best bit, as someone who recently wrote a paper where all the arrows in the dependency diagrams pointed the wrong way and didn’t notice until the referees pointed it out, is that there is a wee button you can click on to get a tikz version of the tree for pasting into LaTeX.
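For anyone without the editor to hand, the non-projectivity check can be roughed out in a few lines. This is only a sketch, not how the editor does it: it reads the ID and HEAD columns of one CoNLL-U sentence (columns 1 and 7) and reports pairs of arcs that cross, counting the arc from the artificial root 0 as well. The function names and the toy sentences are made up for illustration.

```python
from itertools import combinations


def read_heads(conllu_sentence):
    """Map each word ID to its HEAD for one CoNLL-U sentence."""
    heads = {}
    for line in conllu_sentence.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        heads[int(cols[0])] = int(cols[6])  # HEAD is the seventh column
    return heads


def crossing_arcs(heads):
    """Return the pairs of arcs whose spans cross (non-projectivity)."""
    arcs = sorted(tuple(sorted((dep, head))) for dep, head in heads.items())
    return [(a, b) for a, b in combinations(arcs, 2)
            if a[0] < b[0] < a[1] < b[1]]


def toy_sentence(head_list):
    """Build a placeholder CoNLL-U sentence (hypothetical data, not a real tree)."""
    rows = []
    for i, head in enumerate(head_list, start=1):
        rows.append("\t".join(
            [str(i), f"w{i}", "_", "_", "_", "_", str(head), "_", "_", "_"]))
    return "\n".join(rows)


projective = read_heads(toy_sentence([2, 0, 2, 3]))
non_projective = read_heads(toy_sentence([2, 0, 1, 2]))
print(crossing_arcs(projective))
print(crossing_arcs(non_projective))
```

An empty list means the tree is projective under this definition.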

Posted in dependency parsing, not gaelic, other people's code, welsh


Unless I indicate otherwise, all these examples are taken from Gareth King’s Intermediate Welsh (London: Routledge, 1996). The analyses are mine, as are the errors.

I don’t think I ever mastered the word mai, and reading up on it, I think it’s because I never mastered changes of word order. The verb doesn’t have to go first in the sentence. Take the title of Menna Elfyn’s Ibsen translation Y Fenyw Ddaeth o’r Môr, where the NP, ‘the woman’, comes before the dependent form of the verb, ddaeth not daeth. The opening stage directions have lots of PPs before the independent form of the verb, like this:

  • Ar y chwith mae feranda dan do llydan. ‘On the left there is a veranda under a broad roof.’
  • Yn y tu blaen, ac o gwmpas y tŷ, mae gardd. ‘In front, and around the house, is a garden.’
  • Islaw’r feranda, mae polyn baner. ‘Below the veranda, there is a flagpole.’

and so on. Now ordinary subordinate clauses, which I did get the hang of, look like this:

Dw i'n  meddwl          fod                  Ron  yn dod yfory
        ------          ---                  ---  ------------
        S[n]/NP/S[sub]  S[sub]/S[asp]/NP/NP  NP   S[asp]/NP

which is the same as a declarative clause, except it can be an argument to meddwl or credu or another verb of thinking, feeling and so on. But what if we’re emphasizing Ron? Then we have the word mai before Ron before the dependent form of mae, which in this case is sy. So how do we handle this? There is a back door in CCG which is the unary type-changing rule. It’s not the done thing, but if I gather examples of them hopefully someone who understands these things better can refactor the grammar into a cleverer shape. Here are three type-changing rules, which add a feature FRONTED:

  • S[dep]/NP → S[dcl, +FRONTED]\NP (blocked for mae)
  • S[dep]/NP → S[dcl, +FRONTED]\S[n]/NP (not blocked for mae). Example: Gwaethygu mae’r sefyllfa yn Ne Ewrop.
  • PP → S[+FRONTED]/S. Example: Menna Elfyn’s scene setting above.

The idea here is that mai (and its South Walian counterpart taw) has the type S[sub]/S[dcl, +FRONTED], which is to say that it only takes a declarative clause if there’s something in front of the verb.
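Since the plan is to gather these rules up for later refactoring, here is one way they might be kept in machine-readable form. It is a minimal sketch: the category strings are copied as written above, while the table layout, the function names and the idea of keying the "blocked" remark on a verb lemma are assumptions of mine rather than part of any CCG toolkit.

```python
# Each rule: (input category, output category, lemmas for which it is blocked).
# Raw strings are needed because the categories contain backslashes.
TYPE_CHANGING_RULES = [
    ("S[dep]/NP", r"S[dcl, +FRONTED]\NP", {"mae"}),
    ("S[dep]/NP", r"S[dcl, +FRONTED]\S[n]/NP", set()),
    ("PP", "S[+FRONTED]/S", set()),
]

# Category proposed for mai (and taw): it wants a fronted declarative clause.
MAI = "S[sub]/S[dcl, +FRONTED]"


def type_change(category, lemma=None):
    """List the categories that `category` may be changed into.

    `lemma` is the (assumed) lemma of the word carrying the category,
    used only to honour the "blocked for mae" restriction.
    """
    return [out for inp, out, blocked in TYPE_CHANGING_RULES
            if inp == category and lemma not in blocked]


def is_fronted(category):
    """Crude string test for the FRONTED feature."""
    return "+FRONTED" in category


print(type_change("S[dep]/NP", lemma="mae"))
print(type_change("PP"))
```

Every output of the table carries +FRONTED, which is the property mai is meant to select for.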

That feels as if I’ve learnt something.

Posted in welsh

Every one’s a clitic: a general treatment of one family of fused words in Welsh

I’ve been starting to look at Welsh through the lens of CCG, largely because, if I did manage to learn how to use words like mai, sydd, sef and bod (as a conjunction) correctly in my youth, I have forgotten now.

I have to know what’s going on in the simpler clauses that these words are joining together first, though. So far the analysis from Scottish Gaelic carries through (for example word order, verbal nouns being clauses of type S[n]/NP/NP or S[n]/NP, and particles like yn or wedi being type-changers), partly because I made sure I read up beforehand on how people have treated the verbal noun in Welsh. However, the example sentences I’ve been looking at have pronouns attached to clitic particles (hi’n), to articles (e’r) and to possessive pronouns (fe’ch).

This needn’t be a problem for dependency grammars, where you can have as many edges coming out of a single node as you like, but it looks tricky for constituency parsers where you expect the sentence to be of the form VP NP, but part of the fused word is in the VP and part of it is in the NP. At this stage it would be very easy to decide to change the tokenization rules so that e and ‘r are separate words, but one thing CCG is good at is assigning words categories, possibly baroque and frightening ones, that reflect what the words do in a sentence.

Let’s take Rydyn nhw’n dod ‘they are coming’. dod is an intransitive verbal noun which I take to be S[n]/NP. Rydyn is the independent verb ‘to be’, present tense, third person, and expects an NP for the subject and either an adjectival phrase or an aspectual phrase. I’ve written this as S[dcl]/S[asp]/NP/NP. On their own, nhw ‘they’ and yn (aspect marker) are NP and S[asp]/NP/S[n]/NP respectively. But what are they when combined? The way to answer this is to treat parsing the sentence as a mathematical puzzle. We know the solution is S[dcl], and at each stage of the proof we may make one of the allowed moves in CCG: application, substitution, type-raising or composition. Then we solve for Q in the layout below. I had a hunch that backwards crossed composition combined with type-raising would be the way to go here. Let’s try type-raising dod first. We want a backslash so we can try backwards crossed composition, Y/Z X\Y -> X/Z.

Rydyn                nhw'n  dod
S[dcl]/S[asp]/NP/NP  Q      S[n]/NP
(try type-raising dod with D = S[asp]/NP)

So, X = S[asp]/NP and Y = S[asp]/NP/S[n]/NP. Q = Y/Z. We know that X/Z = S[asp]/NP/NP, so Z = NP, which gives these types for nhw’n and the type-raised dod:

nhw'n                 dod
S[asp]/NP/S[n]/NP/NP  S[asp]/NP\S[asp]/NP/S[n]/NP
S[dcl] ∎
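The type-raising and backwards crossed composition steps can be checked mechanically. What follows is a rough sketch rather than a real CCG implementation: the Atom and Fn classes and the function names are invented for illustration, and I have had to make the bracketing explicit, reading S[asp]/NP/S[n]/NP as (S[asp]/NP)/(S[n]/NP), which is an assumption about the intended grouping. It covers only the steps involving nhw’n and dod, not the final combination with Rydyn.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    name: str

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Fn:
    result: "Cat"
    slash: str  # "/" looks right for its argument, "\\" looks left
    arg: "Cat"

    def __str__(self):
        def wrap(cat):
            return f"({cat})" if isinstance(cat, Fn) else str(cat)
        return f"{wrap(self.result)}{self.slash}{wrap(self.arg)}"


Cat = Union[Atom, Fn]


def forward_apply(left, right):
    # Forward application: X/Y  Y  =>  X
    if isinstance(left, Fn) and left.slash == "/" and left.arg == right:
        return left.result
    return None


def type_raise_backward(cat, target):
    # Backward type-raising: X  =>  T\(T/X)
    return Fn(target, "\\", Fn(target, "/", cat))


def backward_crossed_compose(left, right):
    # Backwards crossed composition: Y/Z  X\Y  =>  X/Z
    if (isinstance(left, Fn) and left.slash == "/"
            and isinstance(right, Fn) and right.slash == "\\"
            and right.arg == left.result):
        return Fn(right.result, "/", left.arg)
    return None


NP, S_n, S_asp = Atom("NP"), Atom("S[n]"), Atom("S[asp]")
dod = Fn(S_n, "/", NP)        # S[n]/NP
pred = Fn(S_asp, "/", NP)     # S[asp]/NP, the X of the puzzle
yn = Fn(pred, "/", dod)       # (S[asp]/NP)/(S[n]/NP), the Y of the puzzle
fused = Fn(yn, "/", NP)       # candidate Q = Y/Z with Z = NP

raised_dod = type_raise_backward(dod, pred)
print(raised_dod)
print(backward_crossed_compose(fused, raised_dod))
```

With these definitions the composition returns X/Z, that is (S[asp]/NP)/NP, which matches the value of X/Z assumed in the puzzle.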

The first thing I want to observe is that this would be clearer if everything were coloured in. The second thing is that the type of nhw’n, if you take the type of nhw to be A and ‘n to be B, is B/A. This feels like the sort of result that is obvious to someone who is more proficient than me. But is it generalizable? Let’s try the simpler construction in Gwerthodd e’r oergell – ‘he sold the fridge’. Here e is NP, ‘r, the article, is NP/N, and oergell, an indefinite fridge, is N, so A = NP, B = NP/N and B/A = NP/N/NP.

Gwerthodd     e'r      oergell
S[dcl]/NP/NP  NP/N/NP  N
(try type-raising oergell with NP)
X = NP, Y = NP/N, Z = NP
S[dcl] ∎

I think that’s a result. Next up: look into Lambek’s product operator and sort out what’s going on with the eich… chi construction.

Posted in welsh

Geàrr Ghràmar na Gàidhlig

Tha mi air a bhith a’ leughadh Geàrr Ghràmar na Gàidhlig le Richard A. V. Cox. Tha e glè dhlùth, mhionaideach is 492 duilleagan a dh’fhaide is e anns a’ Ghàidhlig air fad. Mar sin tha sanas bhriathar ann is tha na teirmichean teicnigeach nas soilleire na anns a’ Bheurla. Dè tha apocope, syncope is aphaeresis a’ ciallachadh? Teasgadh deiridh, teasgadh meadhain is teasgadh toisich.

I have been reading Richard A. V. Cox’s Geàrr Ghràmar na Gàidhlig (‘Short Grammar of Gaelic’). It’s very dense, very detailed and 492 pages long, not to mention entirely in Gaelic. Accordingly there is a glossary of the technical vocabulary, which is generally easier to work out than the corresponding vocabulary in English: apocope, syncope and aphaeresis are teasgadh deiridh, teasgadh meadhain and teasgadh toisich.

Posted in grammar


The words for immediate family members in Scottish Gaelic are màthair, athair, bràthair, all of which are clearly related to words in other familiar European languages, and piuthar, “sister”, which looks odd. Irish is yet odder at first glance, with deartháir meaning “brother” and deirfiúr meaning “sister”.

I’ve been reading David Stifter’s Sengoidelc, a readable and reassuring text about Old Irish, the written Irish of the 8th and 9th centuries, which contains at least part of the explanation. It turns out that in Old Irish there were two letters s. One of them lenited by turning into an h, a bit like s in Gaelic becoming sh, pronounced /h/, but the other one turned into an f, and the main word that began with that sort of an s was siur, meaning “sister”.

What seems to have happened in Scotland is that the nominative case form was back-derived from the lenited form phiur and assumed to be piur. Conversely in Ireland the nominative form won out, and they say siúr, but mainly, I think, for non-biological sisters, like nurses and nuns. A further difference here: Scotland retains the disyllabic form, whereas in Ireland it’s been simplified to a long vowel.

But why in Ireland do they say deartháir and deirfiúr for your biological siblings? Enter eDIL, the Electronic Dictionary of the Irish Language, which has entries for derbráthair and derbsiur, “true brother” and “true sister” respectively.

Posted in Uncategorized

Second Celtic Language Technology Workshop revised deadline April the 20th

… which is next Wednesday rather than this Friday. Or if you’re in the UK or Ireland it’s very early next Thursday, but clearly nobody reading this would leave submission till the last moment. No.

Posted in conferences


(1) Dìreach aona mìos deug roimhe sin…

“Just eleven months before that”. In my annotation guidelines I have blithely stated “Attributive numbers are N/N”, which is fine for aona, but less so for deug, which I am going to treat as N\N. And yet in trì deug mìle it seems fair enough.

(2) Bha Gàidhlig ga bruidhinn air feadh Alba anns an aona linn deug.

(3) Bha sin ann an naoi ceud deug, fichead ‘s a ceithir.

Years and centuries are interesting. In (2), anns an aona … deug means “in the eleventh”, with deug contributing “tenth”, as opposed to the other examples where deug means “ten”. In (3) the heads look like ceud, fichead and ceithir, so each of these can be N too.

Different rules apply, however, for the personal numbers: aonar, dithis, triùir and so on, because if they are not standing on their own, they are followed by a noun in the genitive, for example dithis chloinne (“two children”) where dithis is N and chloinne is N\N.

Posted in grammar

Second Celtic Language Technology Workshop deadline April the 15th

I have partly been quiet here because I have been hard at work putting together something for this workshop, and clearly I should not prejudice the double-blindness of the refereeing too much. Ahem.

Posted in conferences

Resumptivity resumed

I said (four years ago) that Gaelic doesn’t have resumptive pronouns. However, while scouring William Lamb’s Scottish Gaelic for unusual uses of agus, I found these examples. The resumptive bits are air in the first and the possessive a (in a mhàthair) in the second.

  • sin an gille a shuidh Cèit air (that is the boy who Kate sat upon) (do not try this at home)
  • sin an gille a tha a mhàthair bochd (that is the boy whose mother is ill)

Now, in dictionaries air in the first example is indeed treated as a pronoun, though for subcategorization purposes I prefer to treat it as a PP. The second case, a as possessive pronoun, I’ve been treating as a pronoun, so on my own account what I said about Gaelic was wrong. It may of course be a determiner. The evidence for this off the top of my head is that, unlike the small class of prenominal adjectives deagh, droch, sàr and so on, the possessives mo, do, a and so on can’t co-occur with the article an or with gach, and that, unlike nouns in the genitive, they go before the possessed noun rather than after it. Pronoun or determiner, they have type N/N in categorial grammar.

Apparently there are resumptive pronouns in Irish, but I don’t have enough Irish to make sense of the literature I’ve seen on the subject, so I shall stop here.

Posted in grammar