What is the simplest parser that could possibly work?

Behind this blog is a happy half hour or so I spent on Friday evening writing a bit of code to do forward composition in a really simple-minded way. Forward composition, in categorial grammar, is applying the following rule:

X/Y Y → X

So I, being simple minded, wrote a bit of code that

  • Tokenized a sentence on spaces, and separated out punctuation marks (actually just the question mark).
  • Converted that tokenlist into a long string of types.
  • Applied forward composition to remove any like types next to each other and the slash that preceded them.
  • Printed out what remains.

My sentences are:

  • Dè tha dol? What goes? What’s happening?
  • An robh fhios agaibh? Did you know?

Let’s try the first one. The output of the program looks like this:

S[int]/?/VPVP/VNVN?
S[int]/?/VPVP?
S[int]/??
S[int]

based on the following types:

  • → S[int]/?/VP
    • S[int] indicates that the sentence is interrogative.
    • /? is there to digest the question mark at the end.
  • tha → VP/VN
    • give me a verbal noun, of which more soon, and I will give you a VP
  • dol → VN
  • ? → ?

I’m not happy with this grammar, which I wrote on Friday evening. But I leave it here for demonstration purposes.

Observations:

  1. This is a bit of a cheat, because the code doesn’t actually give you a proper bracketed parse.
  2. Also, it only allows for each lexical item to have a single type, which is wrong.
  3. But, despite all that and it being almost the simplest piece of code I’ve ever written, it identifies that those sentences are well-formed. If you try feeding it a sentence with the same words in a different order, it doesn’t give you S[int] out at the end.
  4. Something’s gone horribly wrong with the character encoding. You will just have to imagine the letter è in Dè tha dol? But I’m not going to investigate at this time of night.
  5. Code in haste, blog at leisure. It’s taken longer to write these 300-odd words than it did to produce the code in the first place.

Available here: ForwardComposition.java

Coming soon: To be and to be (2).

This entry was posted in code. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *