Why do we bother with grammatical frameworks?

Most natural languages, like English, French, Chukchi, Basque, Gaelic, Italian, Russian, Latgalian, Finnish, Tamil and so forth, can be reasonably well modelled by a context-free grammar, which is the sort of grammar that people write computer languages in. Parsers for these are ten-a-penny. They have to be, otherwise you couldn’t run C, Perl, PHP, Python, Haskell or whatever. So a question you might be asking is why people don’t use these parsers for natural languages, and instead go off and invent grammatical frameworks like HPSG, LFG, CCG and so on.

One important reason is agreement, by which I mean that verbs in English, say, agree for number and in a limited way for person. What does this mean in practice? Well, if you’re writing a context-free grammar to handle sentences like “The lady vanishes”, then you can’t just say:

S → NP VP

because that overgenerates. That would allow “The lady vanish”, “The ladies vanishes”, “I vanishes” and so on, because each of these has the form NP VP. “The lady” is an NP (noun phrase), as are “The ladies” and “I”. The rest of each sentence is a VP (verb phrase). So our grammar also has to say:

S → NP_3rdsg VP_3rdsg

S → NP_non3rdsg VP_non3rdsg

and the same applies to every rule you have in the grammar. Modern grammatical frameworks use feature structures to look after all of this. They let you insist that whatever features words have, like number (singular, plural, and in Slovene dual) or person (I, you, he/she), have to agree, so you can write rules like this:

S → NP VP

and let the lexicon, the collection of the words themselves, handle the details.
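
To make the difference concrete, here is a minimal Python sketch of the idea. It is a toy, not how HPSG, LFG or CCG actually implement feature structures: the lexicon, the single "agr" feature and the function names (unify, cfg_accepts, feature_accepts) are all made up for illustration. The plain rule only checks categories, while the feature version also asks the agreement features to unify.

```python
# Toy lexicon: each entry is (category, feature dict).
# The "agr" feature and its values are illustrative assumptions.
LEXICON = {
    "the lady":   ("NP", {"agr": "3sg"}),
    "the ladies": ("NP", {"agr": "non3sg"}),
    "I":          ("NP", {"agr": "non3sg"}),
    "vanishes":   ("VP", {"agr": "3sg"}),
    "vanish":     ("VP", {"agr": "non3sg"}),
}


def unify(a, b):
    """Merge two feature dicts; return None if a shared feature clashes."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            return None
        merged[key] = value
    return merged


def cfg_accepts(subject, verb):
    """Plain S -> NP VP: only the categories are checked."""
    return LEXICON[subject][0] == "NP" and LEXICON[verb][0] == "VP"


def feature_accepts(subject, verb):
    """S -> NP VP, with the features of NP and VP required to unify."""
    if not cfg_accepts(subject, verb):
        return False
    return unify(LEXICON[subject][1], LEXICON[verb][1]) is not None
```

With this lexicon, cfg_accepts("the lady", "vanish") is true, which is the overgeneration problem, whereas feature_accepts("the lady", "vanish") is false and feature_accepts("the lady", "vanishes") is true. The rule itself stays as a single S → NP VP; the lexicon carries the agreement information.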

A first attempt at the copula

Having got OpenCCG working, we can now start doing what we’re here for. To say “Calum is a teacher”, or “I am a teacher”, you have to say the at-first-glance rather odd:

  • ‘S e tidsear a th’ann Calum.
  • ‘S e tidsear a th’annam.

The unwary might translate those as “It is a teacher that is in Calum” and “It is a teacher that is in me”, but really tha + ann means “there is”. annam is a preposition marked for person, which I don’t think I’ve mentioned before. I’ve kind of implemented this, but it does overgenerate like mad. Overgeneration is when your grammar allows sentences that aren’t grammatical.

copula.ccg contains the grammar so far.

Getting OpenCCG to work on the Mac

OpenCCG is a Java/Python toolkit for working on combinatory categorial grammar, so it is ideal for this exercise.

Update 2014-07-14: if you’re using OpenCCG 0.95, the latest version, on Mac OS X 10.6.8, then as long as you have Python 2.x and Java installed and you follow the build instructions exactly, it should Just Work.

It comes with instructions for getting it to work under Unix and Windows, but on the Mac, or at least on the one I’m using, there’s a small amount of fiddling needed. Here it is:

  • You may not already have a recent version of Python, which you can get from http://www.python.org/download/releases/2.7.1/ as a .dmg, which has a friendly hand-holdy installation process.
  • Environment variables (set OPENCCG_HOME first, because the PATH line uses it):
    • cd to the directory that you’ve downloaded openccg to, type pwd, and set OPENCCG_HOME to it using export.
    • export PATH="$PATH:$OPENCCG_HOME/bin"
    • export JAVA_HOME=/usr (this surprised me, but it works on Mac OS X 10.4.11)
  • You will also need to fetch lex.py and yacc.py from sourceforge: http://openccg.cvs.sourceforge.net/viewvc/openccg/openccg/bin/ and put them in the bin folder in your OpenCCG installation.
  • If you then follow the instructions in the README file and get an error about the wrong class number you’ll have to rebuild it. Try typing ant at the command line and see what happens. I don’t remember installing ant, which means that it might come on the Mac by default. If not, you’ll have to go to http://ant.apache.org/. Good luck! Update 2014-07-13: do NOT attempt to build by typing ‘ant’ at the command line. This does not work. Make sure you type ‘ccg-build’. Only issue the ‘ant’ command if you want to see whether ant is installed on your machine.
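
If you want to check the fiddling above before running ccg-build, something like the following Python sketch might help. It is not part of OpenCCG: check_openccg_setup is a hypothetical helper of my own, and it only tests the things listed above (OPENCCG_HOME and JAVA_HOME are set, the bin folder is on PATH, and lex.py and yacc.py are in that folder).

```python
import os


def check_openccg_setup(env, exists=os.path.exists):
    """Return a list of problems with the environment described by `env`.

    `env` is a dict such as os.environ; `exists` can be swapped out for testing.
    This is a hypothetical helper, not something shipped with OpenCCG.
    """
    problems = []
    home = env.get("OPENCCG_HOME")
    if not home:
        # Nothing else can be checked without the installation directory.
        return ["OPENCCG_HOME is not set"]
    if not env.get("JAVA_HOME"):
        problems.append("JAVA_HOME is not set")
    bin_dir = os.path.join(home, "bin")
    if bin_dir not in env.get("PATH", "").split(os.pathsep):
        problems.append("PATH does not include " + bin_dir)
    # The two parser helper files fetched separately in the steps above.
    for name in ("lex.py", "yacc.py"):
        if not exists(os.path.join(bin_dir, name)):
            problems.append(name + " is missing from " + bin_dir)
    return problems
```

Calling check_openccg_setup(os.environ) in the shell you intend to build from returns an empty list if everything it looks for is in place, and a list of complaints otherwise. It says nothing about whether the build itself will succeed.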

It comes with some minuscule test grammars including Basque and Turkish.