(1) Dìreach aona mìos deug roimhe sin…

“Just eleven months before that.” In my annotation guidelines I have blithely stated “Attributive numbers are N/N”, which is fine for aona but less so for deug, which I am going to treat as N\N. And yet in trì deug mìle it seems fair enough.

(2) Bha Gàidhlig ga bruidhinn air feadh Alba anns an aona linn deug. (“Gaelic was spoken throughout Scotland in the eleventh century.”)

(3) Bha sin ann an naoi ceud deug, fichead ’s a ceithir. (“That was in nineteen twenty-four.”)

Years and centuries are interesting. In (2), anns an aona … deug means “in the eleventh”, as opposed to the other examples, where deug means “ten”. In (3) the heads look like ceud, fichead and ceithir, so each of these can be N too.

Different rules apply, however, to the personal numbers aonar, dithis, triùir and so on: when they do not stand on their own, they are followed by a noun in the genitive, for example dithis chloinne (“two children”), where dithis is N and chloinne is N\N.
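The slash categories above can be made concrete with the two basic combination rules of categorial grammar, forward application (X/Y followed by Y gives X) and backward application (Y followed by X\Y gives X). What follows is a minimal sketch under some assumptions. It uses result-first notation, so X\Y looks for its Y on the left. The `Functor`, `combine` and `reduce_left_to_right` names are hypothetical and are not part of gdbank or the annotation guidelines. The greedy left-to-right reduction is only adequate for short phrases like these.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Functor:
    """A complex category: result/arg wants arg on the right, result\\arg on the left."""
    result: "Cat"
    slash: str  # "/" or "\\"
    arg: "Cat"

    def __str__(self) -> str:
        return f"{self.result}{self.slash}{self.arg}"


Cat = Union[str, Functor]  # atomic categories are plain strings such as "N"

N = "N"
N_FWD = Functor(N, "/", N)    # N/N, e.g. aona before its noun
N_BACK = Functor(N, "\\", N)  # N\N, e.g. deug, or chloinne after dithis


def combine(left: Cat, right: Cat) -> Optional[Cat]:
    """Try forward application, then backward application; None if neither fits."""
    if isinstance(left, Functor) and left.slash == "/" and left.arg == right:
        return left.result  # X/Y  Y  =>  X
    if isinstance(right, Functor) and right.slash == "\\" and right.arg == left:
        return right.result  # Y  X\Y  =>  X
    return None


def reduce_left_to_right(cats: List[Cat]) -> Optional[Cat]:
    """Greedy left-to-right reduction: not a general parser, just enough here."""
    current = cats[0]
    for nxt in cats[1:]:
        current = combine(current, nxt)
        if current is None:
            return None
    return current


# aona mìos deug: N/N N N\N, reduced as ((aona mìos) deug)
print(reduce_left_to_right([N_FWD, N, N_BACK]))
# dithis chloinne: N N\N
print(reduce_left_to_right([N, N_BACK]))
```

Both phrases reduce to a single N. A category in the wrong place, such as N\N at the start of a phrase, makes `combine` return None.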

Resumptivity resumed

I said (four years ago) that Gaelic doesn’t have resumptive pronouns. However, while scouring William Lamb’s Scottish Gaelic for unusual uses of agus, I found these examples. The resumptive element is air in the first and the possessive a before mhàthair in the second.

  • sin an gille a shuidh Cèit air (that is the boy who Kate sat upon) (do not try this at home)
  • sin an gille a tha a mhàthair bochd (that is the boy whose mother is ill)

Now, in dictionaries, air in the first example is indeed treated as a pronoun, though for subcategorization purposes I prefer to treat it as a PP. I’ve been treating the second case, a as a possessive pronoun, as a pronoun, so on my own account what I said about Gaelic was wrong. It may, of course, be a determiner. The evidence for this, off the top of my head, is as follows. Unlike the small class of prenominal adjectives (deagh, droch, sàr and so on), the possessives mo, do, a and so on can’t co-occur with the article an or with gach. Also, unlike nouns in the genitive, they go before the possessed noun rather than after it. Pronoun or determiner, they have type N/N in categorial grammar.

Apparently there are resumptive pronouns in Irish, but I don’t have enough Irish to make sense of the literature I’ve seen on the subject, so I shall stop here.

Interrogative frequencies in DASG

One aspect of Gaelic I want to look at more closely is interrogatives. All the wh- words in English (who, when, why, what, how) go to the front of the sentence. So do all the c- words in Gaelic, and the word order in the rest of the sentence changes as well. This is not universal, however. In Chinese, one simply puts the word for ‘what’ where the questioned element would go in the ordinary sentence order. English does the same when we’re particularly surprised and say “You ate what?”.

To see exactly how they work, we need example sentences, so I’ve been looking in DASG. One easy first step is to look at frequencies in this table:

Interrogative  Count  English  Observations
cò              9122  who      noisy; lots of prefixes and parts of words
ciod            4587  what
cia             2363  how      also cia mar in older texts, cia fhad ‘how long’, cia mhòr ‘how big’
dè               403  what     also ‘God’
ciamar           273  how
càit             182  where    also genitive of cat meaning ‘cat’
carson           133  why
càite             90  where
cuin              59  when
cuine             15  when

These are the results of accent-insensitive searches, because the older texts haven’t had their spelling modernized or made consistent. The results surprised me a great deal for a number of reasons.

Firstly, ciod ‘what’ is the most numerous interrogative apart from the noisy cò, even though I don’t recall seeing it terribly often in present-day Gaelic. Most of its hits come from a single document, a history of Scotland. One of the very first words you learn in Gaelic is its modern counterpart dè, yet it has only about 200 (judged by eye) instances as an interrogative in DASG. This is a similar number to càit(e), carson, cuin(e) and ciamar, ‘where’, ‘why’, ‘when’ and ‘how’.

Secondly, there is the enormous number of hits for cia ‘how’. On a cursory inspection these are often exclamations (‘how swift’, ‘how long’, ‘how horrible’) or an old spelling of ciamar, in addition to the more familiar cia mheud ‘how many’.

Thirdly, nearly all of the instances of dè meaning ‘what’ are from a single work, Saoghal Bana-mharaiche, describing the Gaelic of the coast of Easter Ross.
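The source does not describe how DASG performs its accent-insensitive matching. What follows is only a sketch of one common approach, assuming Unicode decomposition followed by dropping the combining accents. The `fold` and `count_hits` helpers are hypothetical. The sketch also shows where some of the noise in the table could come from, because càit and cait fold to the same string.

```python
import unicodedata


def fold(text: str) -> str:
    """Lower-case text and drop combining accents, so 'Càit' becomes 'cait'."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def count_hits(tokens, query: str) -> int:
    """Count whole tokens that match query once both are accent-folded."""
    target = fold(query)
    return sum(1 for tok in tokens if fold(tok) == target)


# Illustrative tokens: a folded search for càit also matches cait
# (genitive of cat), but not the longer form càite.
tokens = ["Càit", "càit", "cait", "càite"]
print(count_hits(tokens, "càit"))  # 3
```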

I’ll leave you with a meaning of gu that I’d never seen before. Gu can be the preposition, the subordinator (as in gu bheil), the aspect marker or the adverbializer. However, Gu dè tha thu? from DASG31, Ugam agus bhuam, is clearly none of these. As explained here, what is going on is this: the Gaelic for ‘what’ used to be ciod e, like the Irish cad é, and over time this became dè. Gu dè is a variant of this. It’s another one of those pesky multiword expressions.
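One common way to handle multiword expressions like gu dè in a treebank pipeline is to merge them into a single token before tagging and parsing. This is a swapped-in technique for illustration, not necessarily what gdbank does. The sketch below uses greedy longest-match merging. `MWES` is a hypothetical two-entry lexicon built from the forms in the paragraph above.

```python
# Hypothetical MWE lexicon; both entries come from the discussion of gu dè.
MWES = {("gu", "dè"), ("ciod", "e")}


def merge_mwes(tokens, mwes):
    """Greedily join the longest known multiword expression at each position."""
    max_len = max(len(m) for m in mwes)
    out = []
    i = 0
    while i < len(tokens):
        for n in range(max_len, 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if len(window) == n and window in mwes:
                out.append("_".join(tokens[i:i + n]))  # keep original casing
                i += n
                break
        else:  # no MWE starts here: keep the single token
            out.append(tokens[i])
            i += 1
    return out


print(merge_mwes(["Gu", "dè", "tha", "thu", "?"], MWES))
# ['Gu_dè', 'tha', 'thu', '?']
```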

[Edit 2015-01-03 to clarify reason for looking at interrogatives and add another meaning of gu.]

DASG and the second comparative

If you haven’t come across Dachaigh airson Stòras na Gàidhlig/Digital Archive of Scottish Gaelic (DASG), you should stop reading this and go straight there.

Welcome back. It contains eight and a half million words and is a resource I keep coming back to. In my first investigation, I’m looking for the second comparative, which I had never seen before last weekend. Here’s an example:

Is feairrde na stamagan srubag dheth

(The stomachs are better for a wee drink in them.) It’s explained in Gillie’s Elements of Scottish Gaelic Grammar as differing from the normal comparative (“Xer”) in that it means “Xer by that” or “Xer because of that”. If you search for a word, DASG gives you a concordance, so you can look at the local context of each occurrence.

Some second comparatives in DASG: feairrd, feairrde, misd, bigid, lughaid. An ambiguous word that might be a second comparative: mòid. I look forward to a POS-tagged version of DASG.
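A concordance of this kind is essentially a keyword-in-context (KWIC) listing. What follows is a minimal sketch of the idea, not DASG’s implementation. It assumes whitespace tokenization with no punctuation handling and uses the second comparative forms listed above as search terms. The `kwic` function and `SECOND_COMPARATIVES` set are hypothetical names.

```python
# Forms taken from the list of second comparatives found in DASG.
SECOND_COMPARATIVES = {"feairrd", "feairrde", "misd", "bigid", "lughaid"}


def kwic(tokens, forms, width=3):
    """Yield (left context, hit, right context) for tokens whose form is in forms."""
    wanted = {f.lower() for f in forms}
    for i, tok in enumerate(tokens):
        if tok.lower() in wanted:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            yield left, tok, right


tokens = "Is feairrde na stamagan srubag dheth".split()
for left, hit, right in kwic(tokens, SECOND_COMPARATIVES, width=2):
    print(f"{left:>20} [{hit}] {right}")
```

Right-aligning the left context lines up the hits in a column, which makes it easy to scan many occurrences at once.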

Training a dependency parser on gdbank

A very quick note to say that I’ve trained MaltParser, a dependency parser, on the current gdbank sentences (a mere 1223 tokens spread across 70-odd sentences). I used the Universal POS tagging scheme and the current Universal-ish gdbank dependency annotation scheme. I then tested it on an unseen set of 8 sentences containing 276 tokens, taken from an article in The Scotsman from a few years ago.

It got 196 (71%) of the heads right and 207 (75%) of the dependency types right, and both the head and the dependency type were right in 187 (68%) of cases. My initial impression is that the main problems are subordinators and my having mis-POS-tagged a few words, but there will be a confusion matrix soon.
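The three figures above correspond to the usual dependency-parsing scores. Unlabelled attachment score (UAS) counts correct heads, label accuracy (LA) counts correct dependency types, and labelled attachment score (LAS) requires both to be correct. Below is a minimal sketch of how they can be computed from gold and predicted (head, label) pairs. The `attachment_scores` name and the toy labels are illustrative assumptions, not gdbank’s or MaltParser’s evaluation code.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency label) for one token


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> dict:
    """Return UAS, LA and LAS as fractions of all tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted token counts differ")
    n = len(gold)
    heads = sum(g[0] == p[0] for g, p in zip(gold, pred))   # right head
    labels = sum(g[1] == p[1] for g, p in zip(gold, pred))  # right label
    both = sum(g == p for g, p in zip(gold, pred))          # both right
    return {"UAS": heads / n, "LA": labels / n, "LAS": both / n}


# Toy three-token example: the third token gets the wrong head.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (1, "obj")]
print(attachment_scores(gold, pred))

# The reported counts out of 276 test tokens give the percentages above.
for name, hits in [("heads", 196), ("types", 207), ("both", 187)]:
    print(name, round(100 * hits / 276))
```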

MaltParser cheat mode

If you train MaltParser using the learnwo flowchart in place of learn, it does all the same things, except that it writes out the sentences as it reads them in.

This means that if you have, ahem, misformatted any of your input, you can see exactly which misformatting MaltParser is complaining about, because it will be in the first sentence that hasn’t been written to stdout.
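An alternative to the learnwo trick is to check the file before training. MaltParser’s default input format is, to my understanding, CoNLL-X: ten tab-separated columns per token, with a blank line between sentences. Treat that as an assumption and check it against your own configuration. The hypothetical `check_conllx` helper below reports the line numbers of common misformattings, such as spaces instead of tabs, IDs that don’t count up from 1, or a non-numeric HEAD in training data.

```python
def check_conllx(lines):
    """Return (line number, problem) pairs for CoNLL-X style training data.

    Assumes 10 tab-separated columns per token line, a blank line between
    sentences, token IDs counting up from 1, and a numeric HEAD (column 7).
    """
    problems = []
    expected_id = 1
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            expected_id = 1  # a blank line ends the sentence
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            problems.append((lineno, f"expected 10 columns, found {len(cols)}"))
            continue
        if cols[0] != str(expected_id):
            problems.append((lineno, f"expected ID {expected_id}, found {cols[0]!r}"))
        if not cols[6].isdigit():
            problems.append((lineno, f"HEAD {cols[6]!r} is not a number"))
        expected_id += 1
    return problems


sample = [
    "\t".join(["1", "Tha", "_", "VERB", "V", "_", "0", "ROOT", "_", "_"]) + "\n",
    "2 e _ PRON P _ 1 SUBJ _ _\n",  # spaces instead of tabs
    "\n",
]
for lineno, problem in check_conllx(sample):
    print(lineno, problem)
```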

Installing MaltParser on Mac OS X 10.6.8

MaltParser is a dependency parser, available from its website.

If you try to run the ready-built jar under Mac OS X 10.6.8 and you haven’t updated to Java 1.7, you’ll get a major.minor version number error. However, you can simply edit the Java version references in the build.xml file to read 1.6 and type

ant dist

to build with Ant. It will whirr away for a bit and build fine.
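The major.minor error refers to the class-file version stamped into every compiled Java class. Major version 50 corresponds to Java 6 (1.6) and 51 to Java 7 (1.7). What follows is a hedged diagnostic sketch for checking which Java version a jar was built for. It is not something MaltParser ships. It reads the 8-byte class-file header (magic number, minor version, major version) of each class in the jar. The helper names are hypothetical.

```python
import struct
import sys
import zipfile

# Class-file major versions for the two Java releases discussed above.
JAVA_RELEASE = {50: "1.6", 51: "1.7"}


def class_major_version(data: bytes) -> int:
    """Read the major version from the header of a .class file."""
    magic, _minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major


def jar_class_versions(jar_path: str) -> set:
    """Collect the distinct major versions of all classes inside a jar."""
    with zipfile.ZipFile(jar_path) as jar:
        return {
            class_major_version(jar.read(name))
            for name in jar.namelist()
            if name.endswith(".class")
        }


if __name__ == "__main__":
    # Usage: python check_jar.py path/to/some.jar  (file names are hypothetical)
    for path in sys.argv[1:]:
        versions = sorted(jar_class_versions(path))
        print(path, [JAVA_RELEASE.get(v, str(v)) for v in versions])
```

If the jar reports 1.7 and the installed runtime is Java 6, rebuilding for 1.6 as described above is one way round it.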

Headline passive

I read the news today. To be precise, I’ve been looking at the BBC website’s news in Gaelic, and I’ve spotted a grammatical theme among a large proportion of the headlines and standfirsts:

  • Fiosrachadh ga shireadh mu ghoid charbad phoilis “Information sought about the theft of a police car”
  • Ceathrar gan toirt far Beinn Nibheis “Four people taken from the top of Ben Nevis”
  • Teaghlach de cheathrar gan toirt far Beinn Nibheis […] (standfirst for the foregoing) “Family of four taken from the top of Ben Nevis”
  • Duine ga lorg air a’ Chliseam “Person found on Clisham [mountain on Harris]”
  • Leasachadh Beinn Uais ga dhiùltadh “Ben Wyvis development turned down”

Here the aspect marker ag, which precedes a verbal noun, has merged with the possessive pronoun that is the direct object of that verbal noun (sireadh, toirt, lorg and diùltadh). When the pronoun in ga is masculine, it lenites the verbal noun. Put a form of bi at the front and you have a full sentence, but then it need not be read as passive. The headlines could be, maybe absurdly:

  • Information seeks him about the theft of a police car
  • Four people take them from the top of Ben Nevis
  • Family of four take them from the top of Ben Nevis
  • Person finds him on Clisham or Person finds it on Clisham
  • Ben Wyvis development turns him down

These have a look of machine translation about them, don’t they?
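The ga/gan forms in these headlines can be generated mechanically from the aspect marker, the pronoun and the verbal noun. Below is a minimal sketch, assuming a simplified written-lenition rule: insert h after b, c, d, f, g, m, p, t, and after s unless it is followed by g, m, p or t. The person labels and function names are hypothetical. The sketch ignores h- before vowels after feminine ga and the gam variant of gan before labials.

```python
LENITABLE = set("bcdfgmpt")


def lenite(word: str) -> str:
    """Insert h after the initial consonant (simplified spelling rule).

    l, n, r and vowels are left alone, as are sg, sm, sp, st and words
    that already have h as their second letter.
    """
    if len(word) < 2 or word[1].lower() == "h":
        return word
    first, second = word[0].lower(), word[1].lower()
    if first in LENITABLE or (first == "s" and second not in "gmpt"):
        return word[0] + "h" + word[1:]
    return word


def ag_plus_possessive(person: str, verbal_noun: str) -> str:
    """Merge ag with a third-person possessive before a verbal noun."""
    if person == "3sg.m":
        return "ga " + lenite(verbal_noun)  # masculine a lenites
    if person == "3sg.f":
        return "ga " + verbal_noun  # no lenition (h- before vowels ignored)
    if person == "3pl":
        return "gan " + verbal_noun  # gam before labials ignored here
    raise ValueError(f"unsupported person: {person}")


for person, vn in [("3sg.m", "sireadh"), ("3pl", "toirt"),
                   ("3sg.m", "lorg"), ("3sg.m", "diùltadh")]:
    print(ag_plus_possessive(person, vn))
```

This reproduces ga shireadh, gan toirt, ga lorg and ga dhiùltadh from the headlines. Whether the result should then be read as passive or active is exactly the ambiguity discussed above.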