piuthar

The words for immediate family members in Scottish Gaelic are màthair, athair and bràthair, all of which are clearly related to their counterparts in other familiar European languages, and piuthar, “sister”, which looks odd. Irish is odder still at first glance, with deartháir meaning “brother” and deirfiúr meaning “sister”.

I’ve been reading David Stifter’s Sengoidelc, a readable and reassuring text about Old Irish (the written Irish of the 8th and 9th centuries), and it contains at least part of the explanation. It turns out that Old Irish had two kinds of s. One of them lenited to h, much as in Gaelic, where lenited s is written sh and pronounced /h/, but the other lenited to f, and the main word that began with that sort of s was siur, meaning “sister”.

What seems to have happened in Scotland is that the lenited form, spelled phiur and pronounced with /f/, was taken to be a lenited p, so the nominative was back-derived as piur. Conversely, in Ireland the nominative form won out, and they say siúr, but mainly, I think, for non-biological sisters, such as nurses and nuns. A further difference: Scotland retains the disyllabic form, whereas in Ireland it has been simplified to a long vowel.

But why in Ireland do they say deartháir and deirfiúr for your biological siblings? Enter eDIL, the Electronic Dictionary of the Irish Language, which has entries for derbráthair and derbsiur, “true brother” and “true sister” respectively.

Posted in Uncategorized

Second Celtic Language Technology Workshop revised deadline April the 20th

… which is next Wednesday rather than this Friday. Or if you’re in the UK or Ireland it’s very early next Thursday, but clearly nobody reading this would leave submission till the last moment. No.

Posted in conferences

Numbers

(1) Dìreach aona mìos deug roimhe sin…

“Just eleven months before that.” In my annotation guidelines I have blithely stated “Attributive numbers are N/N”, which is fine for aona, but less so for deug, which I am going to treat as N\N. And yet in trì deug mìle (“thirteen thousand”) treating deug as N/N seems fair enough.

(2) Bha Gàidhlig ga bruidhinn air feadh Alba anns an aona linn deug. (“Gaelic was spoken throughout Scotland in the eleventh century.”)

(3) Bha sin ann an naoi ceud deug, fichead ‘s a ceithir. (“That was in nineteen twenty-four.”)

Years and centuries are interesting. In (2), anns an aona … deug means “in the eleventh”, an ordinal, as opposed to the other examples, where deug contributes a cardinal “ten”. In (3) the heads look like ceud, fichead and ceithir, so each of these can be N too.

Different rules apply, however, to the personal numbers (aonar, dithis, triùir and so on): when they are not standing on their own, they are followed by a noun in the genitive, for example dithis chloinne (“two children”), where dithis is N and chloinne is N\N.
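
To make the N/N and N\N categories concrete, here is a minimal Python sketch of categorial-grammar function application. It assumes a toy lexicon of my own in which each word form has exactly one category; the names Atom, Slash, combine, reduce_greedily and LEXICON are hypothetical, not from any library. Forward application combines X/Y with a following Y, and backward application combines a preceding Y with X\Y. The greedy left-to-right reducer is only enough for short phrases like these; it is not a general parser.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Atom:
    """An atomic category such as N."""
    name: str

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Slash:
    """A functor category: result/arg looks right, result\\arg looks left."""
    result: "Cat"
    direction: str  # "/" or "\\"
    arg: "Cat"

    def __str__(self) -> str:
        return f"({self.result}{self.direction}{self.arg})"


Cat = Union[Atom, Slash]
N = Atom("N")

# Hypothetical toy lexicon: one category per word form.
LEXICON = {
    "aona": Slash(N, "/", N),       # attributive number before the noun
    "mìos": N,
    "deug": Slash(N, "\\", N),      # follows the noun it modifies
    "dithis": N,                    # personal number as the head
    "chloinne": Slash(N, "\\", N),  # genitive noun after the head
}


def combine(left: Cat, right: Cat) -> Optional[Cat]:
    """Forward (X/Y Y => X) or backward (Y X\\Y => X) application."""
    if isinstance(left, Slash) and left.direction == "/" and left.arg == right:
        return left.result
    if isinstance(right, Slash) and right.direction == "\\" and right.arg == left:
        return right.result
    return None


def reduce_greedily(words: list[str]) -> list[Cat]:
    """Shift each word's category; combine the top two stack items while possible."""
    stack: list[Cat] = []
    for word in words:
        stack.append(LEXICON[word])
        while len(stack) >= 2:
            combined = combine(stack[-2], stack[-1])
            if combined is None:
                break
            stack[-2:] = [combined]
    return stack
```

Both aona mìos deug and dithis chloinne reduce to a single N. Treating deug in trì deug mìle as N/N would need a second lexical entry, which a one-category-per-word lexicon like this cannot express.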

Posted in grammar

Second Celtic Language Technology Workshop deadline April the 15th

I have partly been quiet here because I have been hard at work putting together something for this:
http://www.lattice.cnrs.fr/CLTW/index-en.html
and clearly I should not prejudice the double-blindness of the refereeing too much. Ahem.

Posted in conferences

Resumptivity resumed

I said (four years ago) that Gaelic doesn’t have resumptive pronouns. However, while scouring William Lamb’s Scottish Gaelic for unusual uses of agus, I found these examples, in which the resumptive elements are air in the first and a in the second.

  • sin an gille a shuidh Cèit air (that is the boy who Kate sat upon) (do not try this at home)
  • sin an gille a tha a mhàthair bochd (that is the boy whose mother is ill)

Now, in dictionaries air in the first example is indeed treated as a pronoun, though for subcategorization purposes I prefer to treat it as a PP. The second case, possessive a, I’ve been treating as a pronoun, so on my own account what I said about Gaelic was wrong. It may, of course, be a determiner instead. The evidence for this, off the top of my head, is that unlike the small class of prenominal adjectives deagh, droch, sàr and so on, the possessives mo, do, a and so on can’t co-occur with the article an or with gach, and that unlike nouns in the genitive, they go before the possessed noun rather than after it. Pronoun or determiner, they have type N/N in categorial grammar.

Apparently there are resumptive pronouns in Irish, but I don’t have enough Irish to make sense of the literature I’ve seen on the subject, so I shall stop here.

Posted in grammar

Interrogative frequencies in DASG

One aspect of Gaelic I want to look at more closely is interrogatives. Just as all the wh- words in English (who, when, why, what, how) go to the front of the sentence, so do all the c- words in Gaelic, and the word order in the rest of the sentence changes as well. This is not universal, however. In Chinese, one simply puts the word for ‘what’ in place of the questioned item in the ordinary sentence order, just as, when we’re particularly surprised in English, we might say “You ate what?”.

In order to see how they work exactly, we need example sentences, so I’ve been looking in DASG. One easy first step is to look at frequencies in this table:

Interrogative   Count  English  Observations
cò               9122  who      noisy; lots of prefixes and parts of words
ciod             4587  what
cia              2363  how      also cia mar in older texts, cia fhad ‘how long’, cia mhòr ‘how big’
dè                403  what     also ‘God’
ciamar            273  how
càit              182  where    also cait, genitive of cat ‘cat’
carson            133  why
càite              90  where
cuin               59  when
cuine              15  when

These are the results of accent-insensitive searches, as the older texts haven’t had their spelling modernized or made consistent. The results surprised me a great deal, for several reasons. Firstly, ciod ‘what’, which I don’t recall seeing terribly often in the present day, is the most numerous interrogative once the noisy cò count is set aside, mostly occurring in a single document, a history of Scotland. One of the very first words you learn in Gaelic is its modern counterpart dè, which only has about 200 (judged by eye) instances as an interrogative in DASG. This is a similar number to càit(e), carson, cuin(e) and ciamar, ‘where’, ‘why’, ‘when’ and ‘how’. Secondly, there is the enormous number of hits for cia ‘how’, which on a cursory inspection are often exclamations (‘how swift’, ‘how long’, ‘how horrible’) or an old spelling of ciamar, in addition to the more familiar cia mheud ‘how many’. Thirdly, nearly all of the instances of dè meaning ‘what’ are from a single work, Saoghal Bana-mharaiche, describing the Gaelic of the coast of Easter Ross.
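
The counts above come from DASG’s own search. For anyone wanting to reproduce accent-insensitive counting on a local plain-text corpus, here is a minimal Python sketch; it is not how DASG works internally, and fold_accents and count_words are hypothetical helpers. It lower-cases the text, decomposes it with Unicode NFD and drops the combining diacritics, so that càit, cait and CÀIT all match. It then matches whole tokens rather than substrings, which should cut down on (though, with hyphenated prefixes such as co-, not eliminate) the partial-word noise seen for cò.

```python
import re
import unicodedata
from collections import Counter


def fold_accents(text: str) -> str:
    """Lower-case and strip combining diacritics, so 'Càite' becomes 'caite'."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def count_words(text: str, targets: list[str]) -> Counter:
    """Count accent-insensitive, whole-token occurrences of each target word.

    Targets that fold to the same string (e.g. 'càit' and 'cait') share one
    key, so only the last such target in the list is reported.
    """
    folded_to_target = {fold_accents(t): t for t in targets}
    counts: Counter = Counter()
    for token in re.findall(r"\w+", fold_accents(text)):
        if token in folded_to_target:
            counts[folded_to_target[token]] += 1
    return counts
```

Because folding merges càit with cait, the genitive of cat is counted along with the interrogative, which is exactly the ambiguity noted in the table.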

I’ll leave you with a new meaning of gu that I’d never seen before. This can be gu the preposition, gu the subordinator (as in gu bheil), gu the aspect marker or gu the adverbializer, but Gu dè tha thu? from DASG31, Ugam agus bhuam, is clearly none of these. As explained here, what is going on is this: the Gaelic for ‘what’ used to be ciod e, like the Irish cad é, and over time this became dè. Gu dè is a variant of this. It’s another one of those pesky multiword expressions.

[Edit 2015-01-03 to clarify reason for looking at interrogatives and add another meaning of gu.]

Posted in grammar, preliminaries

DASG and the second comparative

If you haven’t come across Dachaigh airson Stòras na Gàidhlig/Digital Archive of Scottish Gaelic, you should stop reading this and go straight there.

Welcome back. It contains eight and a half million words and is a resource I keep coming back to. In my first investigation, I’m looking for the second comparative, which I had never seen before last weekend. Here’s an example:

Is feairrde na stamagan srubag dheth

(The stomachs are better for a wee drink in them.) It’s explained in Gillie’s Elements of Scottish Gaelic Grammar, as differing from the normal comparative (“Xer”) in that it means “Xer by that” or “Xer because of that”. If you search for a word, DASG gives you a concordance so you can look at the local context of words.

Some second comparatives in DASG: feairrd, feairrde, misd, bigid, lughaid. An ambiguous word that might be a second comparative: mòid. I look forward to a POS-tagged version of DASG.
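
DASG’s concordance is a keyword-in-context (KWIC) display. As a rough illustration of the idea, and not DASG’s implementation, here is a minimal Python sketch that finds each occurrence of the second comparatives listed above in a plain-text string and returns a few words of context on either side; kwic, its width parameter and SECOND_COMPARATIVES are my own hypothetical names. Matching is on exact lower-cased word forms, so an ambiguous form such as mòid would still need checking by eye.

```python
import re

# Second-comparative forms listed above.
SECOND_COMPARATIVES = ["feairrd", "feairrde", "misd", "bigid", "lughaid"]


def kwic(text: str, keywords: list[str], width: int = 3) -> list[tuple[str, str, str]]:
    """Return (left context, keyword, right context) for each keyword hit.

    Context is up to `width` tokens on each side; matching ignores case.
    """
    tokens = re.findall(r"\w+", text)
    targets = {k.lower() for k in keywords}
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() in targets:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


if __name__ == "__main__":
    sample = "Is feairrde na stamagan srubag dheth"
    for left, word, right in kwic(sample, SECOND_COMPARATIVES):
        # Right-align the left context so the keywords line up in a column.
        print(f"{left:>30}  [{word}]  {right}")
```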

Posted in grammar

Training a dependency parser on gdbank

A very quick note to say that I’ve trained MaltParser, a dependency parser, on the current gdbank sentences (a mere 1223 tokens spread across 70-odd sentences), using the Universal POS tagging scheme and the current Universal-ish gdbank dependency annotation scheme. I then tested it on an unseen set of 8 sentences containing 276 tokens, taken from an article in The Scotsman from a few years ago.

It got 196 (71%) of the heads right, 207 (75%) of the dependency types right, and both the head and the dependency type right in 187 (68%) of cases. My initial impression is that the main problems are subordinators and a few words I mis-POS-tagged, but there will be a confusion matrix soon.
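
These three figures are what are usually called unlabelled attachment score (head right), label accuracy (dependency type right) and labelled attachment score (both right). Dedicated evaluation tools exist; purely to illustrate the arithmetic, here is a minimal Python sketch assuming gold and parsed files in CoNLL-X format (HEAD in column 7, DEPREL in column 8) with the same tokens in the same order. read_heads and attachment_scores are hypothetical helper names.

```python
def read_heads(path: str) -> list[tuple[str, str]]:
    """Read (HEAD, DEPREL) for every token line of a CoNLL-X file."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            cols = line.rstrip("\n").split("\t")
            if len(cols) >= 8:  # blank sentence separators have one column
                rows.append((cols[6], cols[7]))
    return rows


def attachment_scores(gold, pred) -> dict[str, float]:
    """UAS (head right), LA (label right) and LAS (both right) as fractions."""
    if len(gold) != len(pred):
        raise ValueError("gold and parsed files must contain the same tokens")
    n = len(gold)
    heads = sum(g[0] == p[0] for g, p in zip(gold, pred))
    labels = sum(g[1] == p[1] for g, p in zip(gold, pred))
    both = sum(g == p for g, p in zip(gold, pred))
    return {"UAS": heads / n, "LA": labels / n, "LAS": both / n}
```

Usage would be something like attachment_scores(read_heads("gold.conll"), read_heads("parsed.conll")), with hypothetical file names. With 276 test tokens, the counts above give 196/276 ≈ 0.710, 207/276 = 0.75 and 187/276 ≈ 0.678, which round to the quoted percentages.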

Posted in dependency parsing, maltparser

MaltParser cheat mode

If you train MaltParser using the learnwo flowchart in place of learn, it does all the same things, except that it writes out the sentences as it reads them in.

This means that if you have, ahem, misformatted any of your input, you can see exactly which misformatting MaltParser is complaining about, because it will be in the first sentence that hasn’t been written to stdout.
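
A complementary check, sketched here in Python under my own assumptions rather than as anything MaltParser provides, is to validate the training file before MaltParser sees it. In CoNLL-X every token line should have ten tab-separated columns, IDs should count up from 1 within each sentence, and HEAD should be 0 or the ID of a token in the same sentence. check_conllx is a hypothetical helper that returns (sentence number, line number, message) for each problem; the first problem reported is usually the one to fix.

```python
def _check_sentence(rows, sent_no, problems):
    """Check one sentence: column count, consecutive IDs, HEAD in range."""
    n = len(rows)
    for expected_id, (line_no, cols) in enumerate(rows, start=1):
        if len(cols) != 10:
            problems.append((sent_no, line_no,
                             f"expected 10 tab-separated columns, found {len(cols)}"))
            continue
        if cols[0] != str(expected_id):
            problems.append((sent_no, line_no,
                             f"ID is {cols[0]!r}, expected {expected_id}"))
        head = cols[6]
        if not head.isdigit() or int(head) > n:
            problems.append((sent_no, line_no, f"HEAD {head!r} is not in 0..{n}"))


def check_conllx(lines):
    """Validate CoNLL-X lines; return a list of (sentence, line, message)."""
    problems = []
    rows = []
    sent_no = 1
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.strip():
            rows.append((line_no, line.split("\t")))
        elif rows:  # a blank line ends the current sentence
            _check_sentence(rows, sent_no, problems)
            rows = []
            sent_no += 1
    if rows:  # last sentence without a trailing blank line
        _check_sentence(rows, sent_no, problems)
    return problems
```

Since a file object yields lines, check_conllx(open("train.conll", encoding="utf-8")) would work directly (the file name is hypothetical). A line typed with spaces instead of tabs, for instance, shows up as a one-column line.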

Posted in dependency parsing, maltparser

Installing MaltParser on Mac OS X 10.6.8

MaltParser is a dependency parser and it’s available here: http://www.maltparser.org/download.html

If you try to run the ready-built jar under Mac OS X 10.6.8 and you haven’t updated to Java 1.7, you’ll get an “Unsupported major.minor version” error. However, if you simply edit the Java version references in the build.xml file to read 1.6, and type

ant dist

to build with ant, then it will whirr away for a bit and build fine.
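
The “major.minor” in that error refers to the class-file format version, which is stored in the first eight bytes of every compiled .class file: the magic number 0xCAFEBABE, then the minor and major versions as big-endian 16-bit integers. Major version 50 corresponds to Java 1.6 and 51 to Java 1.7. As a minimal sketch (class_file_version and jar_required_java are hypothetical helpers, and the version table only covers the releases relevant here), this Python code reports the Java release needed by the newest class file inside a jar, so you can tell before running it whether your installed Java is new enough.

```python
import struct
import zipfile

# Class-file major version -> Java release (only the releases relevant here).
JAVA_RELEASES = {49: "1.5", 50: "1.6", 51: "1.7", 52: "1.8"}


def class_file_version(data: bytes) -> tuple[int, int]:
    """Return (major, minor) from the header of a compiled .class file."""
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor


def jar_required_java(path: str) -> str:
    """Report the Java release needed by the newest class file in a jar."""
    with zipfile.ZipFile(path) as jar:
        majors = {
            class_file_version(jar.read(name))[0]
            for name in jar.namelist()
            if name.endswith(".class")
        }
    if not majors:
        raise ValueError("no .class files found")
    newest = max(majors)
    return JAVA_RELEASES.get(newest, f"class-file version {newest}")
```

Running print(jar_required_java("maltparser.jar")), substituting the name of the jar you downloaded, should print 1.7 for a jar built for Java 7, which is what the error implies; after rebuilding with the edited build.xml, the jar produced by ant dist should report 1.6.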

Posted in dependency parsing, maltparser