<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>CGGblog</title>
	<atom:link href="http://www.tantallon.org.uk/cggblog/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://www.tantallon.org.uk/cggblog</link>
	<description>Categorial Grammar of Gaelic</description>
	<lastBuildDate>Fri, 10 Jun 2011 20:26:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Why do we bother with grammatical frameworks?</title>
		<link>http://www.tantallon.org.uk/cggblog/?p=58</link>
		<comments>http://www.tantallon.org.uk/cggblog/?p=58#comments</comments>
		<pubDate>Fri, 10 Jun 2011 20:26:42 +0000</pubDate>
		<dc:creator>Colin Batchelor</dc:creator>
				<category><![CDATA[grammar]]></category>
		<category><![CDATA[preliminaries]]></category>

		<guid isPermaLink="false">http://www.tantallon.org.uk/cggblog/?p=58</guid>
		<description><![CDATA[Most natural languages, like English, French, Chukchi, Basque, Gaelic, Italian, Russian, Latgalian, Finnish, Tamil and so forth, can be reasonably well modelled by a context-free grammar, which is the sort of grammar that people write computer languages in. Parsers for &#8230; <a href="http://www.tantallon.org.uk/cggblog/?p=58">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Most natural languages, like English, French, Chukchi, Basque, Gaelic, Italian, Russian, Latgalian, Finnish, Tamil and so forth, can be reasonably well modelled by a <strong>context-free grammar</strong>, which is the sort of grammar that people write computer languages in. Parsers for these are ten-a-penny. They have to be, otherwise you couldn&#8217;t run C, Perl, PHP, Python, Haskell or whatever. So a question you might be asking is why people don&#8217;t use these parsers for natural languages, and instead go off and invent grammatical frameworks like HPSG, LFG, CCG and so on.</p>
<p>One important reason is <strong>agreement</strong>, by which I mean that verbs in English, say, agree for number and in a limited way for person. What does this mean in practice? Well, if you&#8217;re writing a context-free grammar to handle sentences like &#8220;The lady vanishes&#8221;, then you can&#8217;t just say:</p>
<p>S → NP VP</p>
<p>because that <strong>overgenerates</strong>. That would allow &#8220;The lady vanish&#8221;, &#8220;The ladies vanishes&#8221;, &#8220;I vanishes&#8221; and so on, because each of these has the form NP VP. &#8220;The lady&#8221; is an NP (noun phrase), as are &#8220;The ladies&#8221; and &#8220;I&#8221;, and the rest of each sentence is a VP (verb phrase). So our grammar has to say instead:</p>
<p>S → NP_3rdsg VP_3rdsg</p>
<p>S → NP_non3rdsg VP_non3rdsg</p>
<p>and the same applies to every rule you have in the grammar. Modern grammatical frameworks use <strong>feature structures</strong> to look after all of this. They let you insist that whatever <strong>features</strong> words carry, like <strong>number</strong> (singular, plural, and in Slovene dual) or <strong>person</strong> (I, you, he/she), have to agree, so you can write rules like this:</p>
<p>S → NP VP</p>
<p>and let the <strong>lexicon</strong>, the collection of the words themselves, handle the details.</p>
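<p>Here&#8217;s a toy sketch of the difference in Python. The phrases, the single <code>agr</code> feature and the function names are my own illustrative choices, not any framework&#8217;s API. The plain rule pairs every NP with every VP. The feature-based version keeps the one rule S → NP VP and uses a flat unification check on the lexical entries to throw out the mismatches. Real frameworks like HPSG and LFG use much richer, nested feature structures than this.</p>
<pre>
```python
from itertools import product

# Toy lexicon (hypothetical): each phrase carries one flat agreement feature,
# "agr", standing in for the 3rdsg / non3rdsg split in the rules above.
NPS = {
    "The lady": {"agr": "3sg"},
    "The ladies": {"agr": "non3sg"},
    "I": {"agr": "non3sg"},
}
VPS = {
    "vanishes": {"agr": "3sg"},
    "vanish": {"agr": "non3sg"},
}


def unify(f, g):
    """Merge two flat feature structures, or return None if they clash."""
    merged = dict(f)
    for feature, value in g.items():
        if merged.get(feature, value) != value:
            return None
        merged[feature] = value
    return merged


def plain_cfg():
    """S → NP VP with no features: every NP combines with every VP."""
    return [f"{np} {vp}" for np, vp in product(NPS, VPS)]


def with_features():
    """S → NP VP, licensed only when the NP's and VP's features unify."""
    return [
        f"{np} {vp}"
        for (np, f), (vp, g) in product(NPS.items(), VPS.items())
        if unify(f, g) is not None
    ]


if __name__ == "__main__":
    print("Plain CFG:", plain_cfg())
    print("With features:", with_features())
```
</pre>
<p><code>plain_cfg()</code> returns all six combinations, including &#8220;The lady vanish&#8221;. <code>with_features()</code> returns only &#8220;The lady vanishes&#8221;, &#8220;The ladies vanish&#8221; and &#8220;I vanish&#8221;.</p>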
]]></content:encoded>
			<wfw:commentRss>http://www.tantallon.org.uk/cggblog/?feed=rss2&#038;p=58</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A first attempt at the copula</title>
		<link>http://www.tantallon.org.uk/cggblog/?p=55</link>
		<comments>http://www.tantallon.org.uk/cggblog/?p=55#comments</comments>
		<pubDate>Mon, 06 Jun 2011 22:05:23 +0000</pubDate>
		<dc:creator>Colin Batchelor</dc:creator>
				<category><![CDATA[grammar]]></category>

		<guid isPermaLink="false">http://www.tantallon.org.uk/cggblog/?p=55</guid>
		<description><![CDATA[Having got OpenCCG working, we can now start doing what we&#8217;re here for. To say &#8220;Calum is a teacher&#8221;, or &#8220;I am a teacher&#8221;, you have to say the at-first-glance rather odd: &#8216;S e tidsear a th&#8217;ann Calum. &#8216;S e &#8230; <a href="http://www.tantallon.org.uk/cggblog/?p=55">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Having got OpenCCG working, we can now start doing what we&#8217;re here for. To say &#8220;Calum is a teacher&#8221;, or &#8220;I am a teacher&#8221;, you have to say the at-first-glance rather odd:</p>
<ul>
<li><em>&#8216;S e tidsear a th&#8217;ann Calum.</em></li>
<li><em>&#8216;S e tidsear a th&#8217;annam.</em></li>
</ul>
<p>The unwary might translate those as &#8220;It is a teacher that is in Calum&#8221; and &#8220;It is a teacher that is in me&#8221;, but really <em>tha</em> + <em>ann</em> means &#8220;there is&#8221;. <em>annam</em> (in me) is a preposition marked for person, which I don&#8217;t think I&#8217;ve mentioned before. I&#8217;ve kind of implemented this, but it does <strong>overgenerate</strong> like mad. Overgeneration is when your grammar allows sentences that aren&#8217;t grammatical.</p>
<p><a href="http://www.tantallon.org.uk/cggblog/Downloads/copula.ccg">copula.ccg</a> contains the grammar so far. Here are some highlights:<span id="more-55"></span>This is my feature set so far. EMPH distinguishes the emphatic and unemphatic forms of the pronoun. EXIS, which doesn&#8217;t quite work yet, marks existential clauses.<br />
<code><br />
feature {<br />
NUM&lt;2&gt;: sg pl;<br />
GEND&lt;2&gt;: masc fem;<br />
PERS&lt;2&gt;: 1st 2nd 3rd;<br />
EMPH&lt;2&gt;: emph unemph;<br />
PROP&lt;2&gt;: proper common;<br />
EXIS&lt;2&gt;: exis+ exis-;<br />
}<br />
</code><br />
I have a rather small word list: <em>tidsear</em> (teacher) and <em>ball-parlamaid</em> (member of parliament) as common nouns; Calum, Somhairle and Sine (pronounced Sheena) as personal names; <em>is</em> and <em>s</em> as the copula; and <em>tha</em> and <em>th</em> as <em>Bi</em>, a special type for a peculiar verb:<br />
<code><br />
word is:Cop;<br />
word s:Cop;<br />
word tidsear:N;<br />
word ball-parlamaid:N;<br />
word annam:PP;<br />
word Calum:Name;<br />
word Somhairle:Name;<br />
word Sine:Name;<br />
word th:Bi;<br />
word tha:Bi;<br />
word a:Rel;<br />
word pre:P {<br />
ann: exis+;<br />
air: exis-;<br />
}<br />
word pro1:Pro {<br />
mi: 1st sg unemph;<br />
thu: 2nd sg unemph;<br />
e: 3rd sg unemph masc;<br />
i: 3rd sg unemph fem;<br />
}<br />
word pro2:ProEmph {<br />
mise: 1st sg emph;<br />
thusa: 2nd sg emph;<br />
esan: 3rd sg emph masc;<br />
}<br />
</code><br />
and the actual grammatical machinery to get at least some examples working:<br />
<code><br />
family Bi(V) {<br />
 entry: s<1>/pp<2>;<br />
}<br />
family P { entry: pp<2>/np<2>; }<br />
family Pro {entry: np<2>[unemph];}<br />
family ProEmph { entry: np<2>[emph]; }<br />
family N { entry: np<2>[common]; }<br />
family Name { entry: np<2>[proper]; }<br />
family Cop(V) {<br />
 entry: (s<1>/np<2>[emph])/np<2>[proper];<br />
 entry: (s<1>/np<2>[unemph 3rd masc sg])/np<2>[exis+];<br />
}<br />
family NP { entry: np<2>/np<2>; }<br />
family PP { entry: pp<2>; }<br />
family Rel { entry: (np\np)/s; }<br />
</code><br />
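I don&#8217;t have a particularly smart account of NPs here, but it&#8217;s a start. Also, I can&#8217;t work out how to get apostrophes into word entries, which is why the lexicon has <em>s</em> and <em>th</em> rather than <em>&#8216;s</em> and <em>th&#8217;</em>.</p>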
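<p>To see why it overgenerates, it helps to check by hand which NPs satisfy the feature restrictions in the <code>Cop</code> entries. The Python below is a sketch, not OpenCCG code. It flattens each NP&#8217;s features into a dictionary. It also assumes, as I believe OpenCCG&#8217;s unification does, that a feature left unspecified on one side is compatible with any value on the other. The names <code>fs</code>, <code>unifies</code> and so on are made up for illustration.</p>
<pre>
```python
# Map each feature value to its feature name, following the feature block above.
FEATURE_OF = {
    "sg": "NUM", "pl": "NUM",
    "masc": "GEND", "fem": "GEND",
    "1st": "PERS", "2nd": "PERS", "3rd": "PERS",
    "emph": "EMPH", "unemph": "EMPH",
    "proper": "PROP", "common": "PROP",
    "exis+": "EXIS", "exis-": "EXIS",
}


def fs(*values):
    """Build a flat feature structure from bare values such as 'sg' or 'emph'."""
    return {FEATURE_OF[v]: v for v in values}


# NP features as the word entries and families above would assign them:
# Pro adds unemph, ProEmph adds emph, Name adds proper, N adds common.
NP_FEATURES = {
    "mi": fs("1st", "sg", "unemph"),
    "e": fs("3rd", "sg", "unemph", "masc"),
    "i": fs("3rd", "sg", "unemph", "fem"),
    "mise": fs("1st", "sg", "emph"),
    "esan": fs("3rd", "sg", "emph", "masc"),
    "Calum": fs("proper"),
    "tidsear": fs("common"),
}


def unifies(required, actual):
    """Assumed unification check: fail only if both sides give a feature different values."""
    return all(actual.get(feat, val) == val for feat, val in required.items())


# The NP restrictions used in the two Cop entries above (np[exis+] left out).
COP_SLOTS = {
    "np[proper]": fs("proper"),
    "np[emph]": fs("emph"),
    "np[unemph 3rd masc sg]": fs("unemph", "3rd", "masc", "sg"),
}

if __name__ == "__main__":
    for slot, required in COP_SLOTS.items():
        accepted = [w for w, f in NP_FEATURES.items() if unifies(required, f)]
        print(slot, "accepts", accepted)
```
</pre>
<p>Under that assumption, <code>np[proper]</code> accepts every pronoun as well as <em>Calum</em>, because the pronoun entries say nothing about <code>PROP</code>. Likewise, <code>np[unemph 3rd masc sg]</code> accepts <em>tidsear</em> and <em>Calum</em> alongside <em>e</em>.</p>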
<p>To do: make sure that it doesn&#8217;t overgenerate for non-existential clauses. I&#8217;m not sure, for example, that <em>tha air Calum</em> is an S, although <em>tha ann Calum</em> is.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.tantallon.org.uk/cggblog/?feed=rss2&#038;p=55</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting OpenCCG to work on the Mac</title>
		<link>http://www.tantallon.org.uk/cggblog/?p=48</link>
		<comments>http://www.tantallon.org.uk/cggblog/?p=48#comments</comments>
		<pubDate>Sun, 05 Jun 2011 16:15:46 +0000</pubDate>
		<dc:creator>Colin Batchelor</dc:creator>
				<category><![CDATA[other people's code]]></category>
		<category><![CDATA[preliminaries]]></category>

		<guid isPermaLink="false">http://www.tantallon.org.uk/cggblog/?p=48</guid>
		<description><![CDATA[OpenCCG is a java/python toolkit for working on combinatory categorial grammar, so is ideal for this exercise. It comes with instructions for getting it to work under Unix and Windows, but on the Mac, or at least on the one &#8230; <a href="http://www.tantallon.org.uk/cggblog/?p=48">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://openccg.sourceforge.net/">OpenCCG</a> is a java/python toolkit for working on combinatory categorial grammar, so is ideal for this exercise. It comes with instructions for getting it to work under Unix and Windows, but on the Mac, or at least on the one I&#8217;m using, there&#8217;s a small amount of fiddling needed. Here it is:</p>
<ul>
<li>You may not already have a recent version of Python. You can get one from <a href="http://www.python.org/download/releases/2.7.1/">http://www.python.org/download/releases/2.7.1/</a> as a .dmg with a friendly, hand-holdy installation process.</li>
<li>Environment variables:
<ul>
<li><code>export JAVA_HOME=/usr</code> (this surprised me, but it works on Mac OS X 10.4.11)</li>
<li><code>cd</code> to the directory that you&#8217;ve downloaded OpenCCG to, type <code>pwd</code>, and set <code>OPENCCG_HOME</code> to it using <code>export</code>.</li>
<li><code>export PATH="$PATH:$OPENCCG_HOME/bin"</code> (this has to come after <code>OPENCCG_HOME</code> is set)</li>
</ul>
</li>
<li>You will also need to fetch <code>lex.py</code> and <code>yacc.py</code> from SourceForge: <a href="http://openccg.cvs.sourceforge.net/viewvc/openccg/openccg/bin/">http://openccg.cvs.sourceforge.net/viewvc/openccg/openccg/bin/</a> and put them in the <code>bin</code> folder of your OpenCCG installation.</li>
<li>If you then follow the instructions in the README file and get an error about the wrong class version number, you&#8217;ll have to rebuild it. Try typing <code>ant</code> at the command line and see what happens. I don&#8217;t remember installing Ant, so it may come with the Mac by default. If not, you&#8217;ll have to go to <a href="http://ant.apache.org/">http://ant.apache.org/</a>. Good luck!</li>
</ul>
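<p>Once all that is done, a quick sanity check can save some head-scratching. This is a hypothetical helper of my own in Python, not something that ships with OpenCCG. It only checks the things listed above: that <code>JAVA_HOME</code> and <code>OPENCCG_HOME</code> are set, that <code>$OPENCCG_HOME/bin</code> is on <code>PATH</code>, and that <code>lex.py</code> and <code>yacc.py</code> are in that <code>bin</code> folder.</p>
<pre>
```python
import os


def check_openccg_env(env=None):
    """Report problems with the setup steps above (hypothetical helper, not part of OpenCCG)."""
    env = os.environ if env is None else env
    problems = []
    for var in ("JAVA_HOME", "OPENCCG_HOME"):
        if not env.get(var):
            problems.append(f"{var} is not set")
    home = env.get("OPENCCG_HOME")
    if home:
        bin_dir = os.path.normpath(os.path.join(home, "bin"))
        # Compare normalised paths so a trailing slash on PATH entries doesn't matter.
        on_path = [os.path.normpath(p) for p in env.get("PATH", "").split(os.pathsep) if p]
        if bin_dir not in on_path:
            problems.append(f"{bin_dir} is not on PATH")
        for script in ("lex.py", "yacc.py"):
            if not os.path.isfile(os.path.join(bin_dir, script)):
                problems.append(f"{script} is missing from {bin_dir}")
    return problems


if __name__ == "__main__":
    for problem in check_openccg_env() or ["Looks OK"]:
        print(problem)
```
</pre>
<p>Run it from the same shell you did the <code>export</code>s in. Environment variables don&#8217;t carry over to new Terminal windows unless you also put them in your shell profile.</p>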
<p>It comes with some minuscule test grammars including Basque and Turkish.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.tantallon.org.uk/cggblog/?feed=rss2&#038;p=48</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>But what can we tell from the 100 top word tokens?</title>
		<link>http://www.tantallon.org.uk/cggblog/?p=44</link>
		<comments>http://www.tantallon.org.uk/cggblog/?p=44#comments</comments>
		<pubDate>Thu, 26 May 2011 20:07:05 +0000</pubDate>
		<dc:creator>Colin Batchelor</dc:creator>
				<category><![CDATA[grammar]]></category>
		<category><![CDATA[preliminaries]]></category>

		<guid isPermaLink="false">http://www.tantallon.org.uk/cggblog/?p=44</guid>
		<description><![CDATA[26 are prepositions of some sort 23 are nouns 10 are conjunctions 10 are verbs 5 are articles 7 are adjectives 7 are pronouns 4 are preverbal particles 2 are adverbs The number of prepositions is unusually high and indicates &#8230; <a href="http://www.tantallon.org.uk/cggblog/?p=44">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<ul>
<li>26 are prepositions of some sort</li>
<li>23 are nouns</li>
<li>10 are conjunctions</li>
<li>10 are verbs</li>
<li>5 are articles</li>
<li>7 are adjectives</li>
<li>7 are pronouns</li>
<li>4 are preverbal particles</li>
<li>2 are adverbs</li>
</ul>
<p>The number of prepositions is unusually high and indicates that PPs (prepositional phrases) do an awful lot of the work in a Gaelic sentence. The number of verbs seems pretty low, and in fact many of them are forms of the verbs &#8220;to be&#8221; that we&#8217;ve seen earlier. This is because the verb &#8220;to be&#8221; typically does much of the rest of the work. More examples of this to come.</p>
<p>The article mostly doesn&#8217;t mark gender (of which there are two, masculine and feminine), but it does mark the two numbers (singular and plural). So how come there are five articles listed?</p>
<p>Well, <em>an</em> is the singular, <em>na</em> does double duty for &#8220;of the&#8221; (with feminine nouns) and &#8220;the&#8221; plural, and <em>nan</em> does &#8220;of the&#8221; plural. Before a labial consonant (b, f, m, p), <em>an</em> becomes <em>am</em> and <em>nan</em> becomes <em>nam</em>. This warns us that our system will have to take the initial consonant of the following word into account to get this right.</p>
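<p>Those rules are small enough to write down directly. Here&#8217;s a Python sketch that picks one of the five forms from the number, whether we want &#8220;of the&#8221;, and the first letter of the noun. The function name and arguments are made up for illustration. It deliberately covers only what&#8217;s described above. It ignores gender and variants like <em>a&#8217;</em>, <em>an t-</em> and the <em>h-</em> in <em>na h-Alba</em>.</p>
<pre>
```python
LABIALS = set("bfmp")


def article(noun, plural=False, genitive=False):
    """Choose among an, am, na, nan and nam using only the rules in this post.

    Deliberately ignores gender, a', an t- and na h- (as in na h-Alba).
    """
    if plural and genitive:
        form = "nan"   # "of the" plural
    elif plural or genitive:
        form = "na"    # "the" plural, or "of the" singular
    else:
        form = "an"    # "the" singular
    # an and nan become am and nam before a labial consonant
    if form != "na" and noun[:1].lower() in LABIALS:
        form = form[:-1] + "m"
    return form
```
</pre>
<p>So <code>article("baile")</code> gives <em>am</em> (as in <em>am baile</em>), and <code>article("daoine", plural=True, genitive=True)</code> gives <em>nan</em>.</p>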
<p>There are also some duplicates, in the sense of different forms of the same word. &#8220;Scotland&#8221; is <em>Alba</em> normally and <em>h-Alba</em> after <em>na</em>, as in <em>Banca na h-Alba</em> &#8220;Bank of Scotland&#8221;. <em>duine</em> (person) has a weird-looking plural, <em>daoine</em>. <em>dùthaich</em> (country) has the <strong>genitive</strong> form <em>dùthcha</em>. <em>baile</em> (town) has a <strong>lenited</strong> form (I will come to this, but not today) <em>bhaile</em>. So we see that Gaelic is morphologically rich. And instead of adding case endings and whatnot to the ends of words, like Hungarian or Turkish, it modifies the beginnings and insides of words.</p>
<p>That will do for the now.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.tantallon.org.uk/cggblog/?feed=rss2&#038;p=44</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What kind of language is this? The top 100 word tokens in Gaelic</title>
		<link>http://www.tantallon.org.uk/cggblog/?p=30</link>
		<comments>http://www.tantallon.org.uk/cggblog/?p=30#comments</comments>
		<pubDate>Sat, 21 May 2011 15:14:50 +0000</pubDate>
		<dc:creator>Colin Batchelor</dc:creator>
				<category><![CDATA[grammar]]></category>
		<category><![CDATA[preliminaries]]></category>

		<guid isPermaLink="false">http://www.tantallon.org.uk/cggblog/?p=30</guid>
		<description><![CDATA[I downloaded all of the Gaelic wikipedia. This is not hard. It is at http://dumps.wikimedia.org/gdwiki/latest/ and you probably want gdwiki-latest-pages-articles.xml.bz2, which contains all the text. Now I can do word-token counts on it, using terrible code like the following: #!/usr/bin/perl &#8230; <a href="http://www.tantallon.org.uk/cggblog/?p=30">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I downloaded all of the Gaelic wikipedia. This is not hard. It is at <a href="http://dumps.wikimedia.org/gdwiki/latest/">http://dumps.wikimedia.org/gdwiki/latest/</a> and you probably want <a href="http://dumps.wikimedia.org/gdwiki/latest/gdwiki-latest-pages-articles.xml.bz2">gdwiki-latest-pages-articles.xml.bz2</a>, which contains all the text.</p>
<p>Now I can do word-token counts on it, using terrible code like the following:</p>
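<p><code>#!/usr/bin/perl -w<br />
use strict;<br />
use open qw(:std :utf8); # treat STDIN/STDOUT as UTF-8, so lc() also lowercases À, È and so on<br />
my %list;<br />
while (&lt;&gt;) {<br />
# split on whitespace plus an ad hoc set of punctuation<br />
my @tokens = split(/[\s\)\(\.=\,\?]/);<br />
for my $token (@tokens)<br />
{<br />
next if $token eq ''; # adjacent delimiters produce empty strings<br />
$list{lc($token)}++;<br />
}<br />
}<br />
foreach my $key (sort { $list{$b} &lt;=&gt; $list{$a} } (keys %list)) {<br />
print " $key $list{$key}\n";<br />
}<br />
</code></p>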
<p>Note the entirely <em>ad hoc</em> collection of characters to split on.</p>
<p>The list is <a href="http://www.tantallon.org.uk/cggblog/?page_id=35">here</a>. The first noun is <em>baile</em> (town) at number 25, which tells you more about Wikipedia than it does about Gaelic. The list also shows that <em>an</em>, <em>e</em> and <em>is</em> are, as we have seen, ambiguous between parts of speech, and that I can&#8217;t quite work out what to do with <em>a</em>.</p>
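<p>For comparison, here&#8217;s roughly the same count as a Python sketch. It reads the <code>.bz2</code> dump directly with the standard <code>bz2</code> module and uses the same <em>ad hoc</em> delimiters. Because it decodes the input as UTF-8, it also lowercases accented capitals like <em>À</em> correctly. The script name in the comment is hypothetical.</p>
<pre>
```python
import bz2
import re
import sys
from collections import Counter

# The same ad hoc delimiters as the Perl: whitespace ) ( . = , ?
DELIMITERS = re.compile(r"[\s)(.=,?]+")


def count_tokens(lines):
    """Count lower-cased tokens, skipping the empty strings split() can produce."""
    counts = Counter()
    for line in lines:
        counts.update(tok.lower() for tok in DELIMITERS.split(line) if tok)
    return counts


def count_dump(path):
    """Stream a .bz2 dump, decompressing on the fly rather than unpacking it first."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        return count_tokens(f)


if __name__ == "__main__":
    # e.g. python count.py gdwiki-latest-pages-articles.xml.bz2
    for token, n in count_dump(sys.argv[1]).most_common(100):
        print(f" {token} {n}")
```
</pre>
<p>Like the Perl, it counts whatever is in the XML, markup and all, so expect tags and wiki syntax among the tokens.</p>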
]]></content:encoded>
			<wfw:commentRss>http://www.tantallon.org.uk/cggblog/?feed=rss2&#038;p=30</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Arabic</title>
		<link>http://www.tantallon.org.uk/cggblog/?p=27</link>
		<comments>http://www.tantallon.org.uk/cggblog/?p=27#comments</comments>
		<pubDate>Wed, 16 Mar 2011 22:27:45 +0000</pubDate>
		<dc:creator>Colin Batchelor</dc:creator>
				<category><![CDATA[grammar]]></category>
		<category><![CDATA[not gaelic]]></category>
		<category><![CDATA[other people's code]]></category>

		<guid isPermaLink="false">http://www.tantallon.org.uk/cggblog/?p=27</guid>
		<description><![CDATA[A better-resourced language that is VSO is Arabic, and I noticed today that Chris Brew&#8217;s group have a paper on converting the Penn Arabic Treebank into CCG. I didn&#8217;t know that Arabic had resumptive pronouns. Gaelic doesn&#8217;t, but they might &#8230; <a href="http://www.tantallon.org.uk/cggblog/?p=27">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>A better-resourced language that is VSO is Arabic, and I noticed today that Chris Brew&#8217;s group have a <a href="http://www.lrec-conf.org/proceedings/lrec2010/summaries/623.html">paper on converting the Penn Arabic Treebank into CCG</a>. I didn&#8217;t know that Arabic had <a href="http://en.wikipedia.org/wiki/Resumptive_pronoun">resumptive pronouns</a>. Gaelic doesn&#8217;t, but they might be useful in posts explaining gapping later on.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.tantallon.org.uk/cggblog/?feed=rss2&#038;p=27</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>To be and to be (2)</title>
		<link>http://www.tantallon.org.uk/cggblog/?p=25</link>
		<comments>http://www.tantallon.org.uk/cggblog/?p=25#comments</comments>
		<pubDate>Thu, 10 Mar 2011 22:51:56 +0000</pubDate>
		<dc:creator>Colin Batchelor</dc:creator>
				<category><![CDATA[grammar]]></category>

		<guid isPermaLink="false">http://www.tantallon.org.uk/cggblog/?p=25</guid>
		<description><![CDATA[And there is another verb &#8220;to be&#8221;, like this: Positive: Is mise Calum. (I am Calum.) Interrogative: An tusa Ealasaid? (Are you Elizabeth?) Negative: Cha mise Calum. (I amn&#8217;t Calum.) Negative interrogative: Nach esan Uilleam? (Isn&#8217;t he William?) However, is &#8230; <a href="http://www.tantallon.org.uk/cggblog/?p=25">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>And there is another verb &#8220;to be&#8221;, like this:</p>
<ul>
<li>Positive: <em>Is mise Calum</em>. (I am Calum.)</li>
<li>Interrogative: <em>An tusa Ealasaid?</em> (Are you Elizabeth?)</li>
<li>Negative: <em>Cha mise Calum.</em> (I amn&#8217;t Calum.)</li>
<li>Negative interrogative: <em>Nach esan Uilleam?</em> (Isn&#8217;t he William?)</li>
</ul>
<p>However, <em>is</em> doesn&#8217;t have type (S/NP)/NP because you can&#8217;t say</p>
<p>*<em>Is Calum tidsear.</em> (Calum is a teacher.)</p>
<p>(The star indicates that a sentence is ungrammatical.) You also can&#8217;t say</p>
<p>*<em>Is mise tidsear.</em> (I am a teacher.)</p>
<p>So we can, for now, assign it type (S/PN)/Nper, where Nper is a personal pronoun and PN is a proper noun: <em>is</em> takes the pronoun to its right first, then the proper noun. But let me assure you it gets much harder.</p>
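<p>Here&#8217;s a minimal Python sketch of that check using forward application alone. The lexicon and function names are mine. It makes one detail explicit. In the usual CCG convention, a functor consumes its outermost (rightmost) argument first. So for <em>is</em> to combine with <em>mise</em> and then <em>Calum</em>, its type has to be written (S/PN)/Nper. Folding forward application left to right is enough for these three-word VSO examples, though not for parsing in general.</p>
<pre>
```python
# Categories are atoms (strings) or (result, "/", argument) tuples.
# Atoms: Nper = personal pronoun, PN = proper noun, N = common noun.
LEXICON = {
    "is": (("S", "/", "PN"), "/", "Nper"),  # (S/PN)/Nper
    "mise": "Nper",
    "tusa": "Nper",
    "esan": "Nper",
    "calum": "PN",
    "ealasaid": "PN",
    "tidsear": "N",
}


def forward_apply(x, y):
    """X/Y followed by Y gives X; anything else fails (None)."""
    if isinstance(x, tuple) and x[1] == "/" and x[2] == y:
        return x[0]
    return None


def parse(sentence):
    """Fold forward application left to right, which suffices for these VSO examples."""
    words = sentence.lower().rstrip(".?").split()
    result = LEXICON[words[0]]
    for word in words[1:]:
        result = forward_apply(result, LEXICON[word])
        if result is None:
            return None
    return result


if __name__ == "__main__":
    for s in ["Is mise Calum.", "Is Calum tidsear.", "Is mise tidsear."]:
        print(s, "→", parse(s) or "no parse")
```
</pre>
<p><em>Is mise Calum</em> comes out as S. <em>Is Calum tidsear</em>, <em>Is mise tidsear</em> and <em>Is Calum mise</em> all fail.</p>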
<p>Coming soon: How to say that I am a teacher.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.tantallon.org.uk/cggblog/?feed=rss2&#038;p=25</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What is the simplest parser that could possibly work?</title>
		<link>http://www.tantallon.org.uk/cggblog/?p=22</link>
		<comments>http://www.tantallon.org.uk/cggblog/?p=22#comments</comments>
		<pubDate>Tue, 08 Mar 2011 23:00:42 +0000</pubDate>
		<dc:creator>Colin Batchelor</dc:creator>
				<category><![CDATA[code]]></category>

		<guid isPermaLink="false">http://www.tantallon.org.uk/cggblog/?p=22</guid>
		<description><![CDATA[Behind this blog is a happy half hour or so I spent on Friday evening writing a bit of code to do forward application in a really simple-minded way. Forward application, in categorial grammar, is the following rule: X/Y &#8230; <a href="http://www.tantallon.org.uk/cggblog/?p=22">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Behind this blog is a happy half hour or so I spent on Friday evening writing a bit of code to do forward application in a really simple-minded way. Forward application, in categorial grammar, is the following rule:</p>
<blockquote><p>X/Y Y → X</p></blockquote>
<p>So I, being simple-minded, wrote a bit of code that<span id="more-22"></span></p>
<ul>
<li>Tokenized a sentence on spaces, and separated out punctuation marks (actually just the question mark).</li>
<li>Converted that tokenlist into a long string of types.</li>
<li>Applied forward application: wherever a type ending in /Y was followed by the type Y, deleted the /Y and the Y.</li>
<li>Printed out what remains.</li>
</ul>
<p>My sentences are:</p>
<ul>
<li><em>Dè tha dol?</em> What goes? What&#8217;s happening?</li>
<li><em>An robh fhios agaibh?</em> Did you know?</li>
</ul>
<p>Let&#8217;s try the first one. The output of the program looks like this:</p>
<pre>S[int]/?/VPVP/VNVN?
S[int]/?/VPVP?
S[int]/??
S[int]
</pre>
<p>based on the following types:</p>
<ul>
<li><em>Dè </em>→ S[int]/?/VP
<ul>
<li>S[int] indicates that the sentence is interrogative.</li>
<li>/? is there to digest the question mark at the end.</li>
</ul>
</li>
<li><em>tha </em>→ VP/VN
<ul>
<li>give me a verbal noun, of which more soon, and I will give you a VP</li>
</ul>
</li>
<li><em>dol</em> → VN</li>
<li>? → ?</li>
</ul>
<p>I&#8217;m not happy with this grammar, which I wrote on Friday evening. But I leave it here for demonstration purposes.</p>
<p>Observations:</p>
<ol>
<li>This is a bit of a cheat, because the code doesn&#8217;t actually give you a proper bracketed parse.</li>
<li>Also, it only allows for each lexical item to have a single type, which is wrong.</li>
<li>But, despite all that and it being almost the simplest piece of code I&#8217;ve ever written, it identifies that those sentences are well-formed. If you try feeding it a sentence with the same words in a different order, it doesn&#8217;t give you S[int] out at the end.</li>
<li>Something&#8217;s gone horribly wrong with the character encoding. You will just have to imagine the letter è in <em>Dè tha dol?</em> But I&#8217;m not going to investigate at this time of night.</li>
<li>Code in haste, blog at leisure. It&#8217;s taken longer to write these 300-odd words than it did to produce the code in the first place.</li>
</ol>
<p>Available here: <a href="http://www.tantallon.org.uk/cggblog/Downloads/ForwardComposition.java">ForwardComposition.java</a></p>
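<p>For anyone who doesn&#8217;t want to read the Java, here&#8217;s a Python re-sketch of the same four steps. It is not a translation of <code>ForwardComposition.java</code>. It just follows the list above, treating each type as a string and cancelling a trailing <code>/Y</code> against an adjacent <code>Y</code>. Like the original, it&#8217;s purely string-based. Given A/B/C next to B/C, it will happily cancel them, as if the first were A/(B/C).</p>
<pre>
```python
# A re-sketch of the procedure above (not the contents of ForwardComposition.java).
LEXICON = {
    "Dè": "S[int]/?/VP",
    "tha": "VP/VN",
    "dol": "VN",
    "?": "?",
}


def tokenize(sentence):
    """Split on spaces, separating out the question mark."""
    return sentence.replace("?", " ?").split()


def forward_apply(left, right):
    """X/Y followed by Y gives X: strip '/Y' from the end of the left type."""
    suffix = "/" + right
    if left.endswith(suffix):
        return left[: -len(suffix)]
    return None


def reduce_all(categories):
    """Repeatedly apply forward application to the leftmost adjacent pair that allows it."""
    steps = ["".join(categories)]
    changed = True
    while changed:
        changed = False
        for i in range(len(categories) - 1):
            result = forward_apply(categories[i], categories[i + 1])
            if result is not None:
                categories = categories[:i] + [result] + categories[i + 2 :]
                steps.append("".join(categories))
                changed = True
                break
    return categories, steps


if __name__ == "__main__":
    categories, steps = reduce_all([LEXICON[t] for t in tokenize("Dè tha dol?")])
    print("\n".join(steps))
```
</pre>
<p>Run on <em>Dè tha dol?</em>, it prints the same four lines as the output above. <em>dol tha Dè?</em> gets stuck without reaching S[int].</p>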
<p>Coming soon: To be and to be (2).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.tantallon.org.uk/cggblog/?feed=rss2&#038;p=22</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>To be and to be (1)</title>
		<link>http://www.tantallon.org.uk/cggblog/?p=16</link>
		<comments>http://www.tantallon.org.uk/cggblog/?p=16#comments</comments>
		<pubDate>Mon, 07 Mar 2011 18:49:33 +0000</pubDate>
		<dc:creator>Colin Batchelor</dc:creator>
				<category><![CDATA[grammar]]></category>
		<category><![CDATA[open questions]]></category>

		<guid isPermaLink="false">http://www.tantallon.org.uk/cggblog/?p=16</guid>
		<description><![CDATA[First up, &#8220;to be&#8221;. Bi has three forms in the present tense, according to whether it&#8217;s positive, interrogative, negative or negative interrogative, all thanks to particles which I don&#8217;t know whether they&#8217;re a VMOD or a P or a what. &#8230; <a href="http://www.tantallon.org.uk/cggblog/?p=16">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>First up, &#8220;to be&#8221;.</p>
<p><em>Bi</em> has three forms in the present tense, used across four kinds of clause (positive, interrogative, negative and negative interrogative). This is all thanks to particles whose category I&#8217;m not sure of: a VMOD, a P, or something else.</p>
<ul>
<li>Positive: <em>Tha mi sgìth</em> (I am tired). <em>Tha</em> is <strong>independent</strong>, so it is (S/ADJ)/NP: it takes the subject NP to its right first, then the adjective.</li>
<li>Negative: <em>Chan eil mi sgìth</em> (I am not tired). <em>Eil</em> is <strong>dependent</strong>, so its type will be ((S\VMOD)/ADJ)/NP.</li>
<li>Negative interrogative: <em>Nach eil thu sgìth?</em> (Are you not tired?) Same as above.</li>
<li>Interrogative: <em>A bheil thu sgìth?</em> (Are you tired?) <em>Bheil</em> is <strong>dependent</strong> again but whereas most verbs just have a dependent and an independent form in any one tense, <em>bi</em> has two dependent forms. So, what to do?</li>
</ul>
<p>Whatever happens, we have a tree at the top which says something like (Vspec Vbar). I don&#8217;t especially like this because <em>eil thu sgìth</em>, the Vbar, isn&#8217;t a <a href="http://en.wikipedia.org/wiki/Constituent_(linguistics)">constituent</a>. I&#8217;m willing to lay money that there are no song titles that begin &#8220;Eil&#8221;. A coordination test for constituency, incidentally, isn&#8217;t decisive in English because one thing that categorial grammar is good at is non-constituent coordination, say in &#8220;Mary loves pizza and Tim rice&#8221;.</p>
<p>So either we commit ourselves to a type S/Vbar for <em>a</em> or <em>chan</em> or <em>nach</em>, which is potentially good because you could parse the entire sentence with forward composition, of which more tomorrow, or we assign <em>eil</em> type ((S\Vspec_neg)/ADJ)/NP and <em>bheil</em> type ((S\Vspec_int)/ADJ)/NP. More types will be needed for all of these forms, because <em>bi</em> isn&#8217;t just for adjectives!</p>
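<p>To try the second option out, here&#8217;s a small CKY chart parser in Python that uses only forward and backward application. The lexicon is my own illustration of that option. The argument order is written so that the subject NP is consumed first, so <em>eil</em> is ((S\Vspec_neg)/ADJ)/NP. Each word gets a list of types, which also gets round the one-type-per-word restriction. The question mark is simply stripped.</p>
<pre>
```python
from itertools import product


def fn(result, slash, arg):
    """A complex category: result, slash ('/' or '\\'), argument."""
    return (result, slash, arg)


S_NEG = fn("S", "\\", "Vspec_neg")  # S\Vspec_neg
S_INT = fn("S", "\\", "Vspec_int")  # S\Vspec_int

# Illustrative lexicon for the second option; each word gets a list of types.
LEXICON = {
    "tha": [fn(fn("S", "/", "ADJ"), "/", "NP")],      # (S/ADJ)/NP
    "eil": [fn(fn(S_NEG, "/", "ADJ"), "/", "NP")],    # ((S\Vspec_neg)/ADJ)/NP
    "bheil": [fn(fn(S_INT, "/", "ADJ"), "/", "NP")],  # ((S\Vspec_int)/ADJ)/NP
    "chan": ["Vspec_neg"],
    "nach": ["Vspec_neg"],
    "a": ["Vspec_int"],
    "mi": ["NP"],
    "thu": ["NP"],
    "sgìth": ["ADJ"],
}


def combine(x, y):
    """Forward (X/Y Y gives X) and backward (Y X\\Y gives X) application."""
    results = []
    if isinstance(x, tuple) and x[1] == "/" and x[2] == y:
        results.append(x[0])
    if isinstance(y, tuple) and y[1] == "\\" and y[2] == x:
        results.append(y[0])
    return results


def top_categories(sentence):
    """CKY: chart[(i, j)] holds every category derivable for words i..j-1."""
    words = sentence.lower().rstrip("?.").split()
    n = len(words)
    chart = {(i, i + 1): set(LEXICON[w]) for i, w in enumerate(words)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):
                for x, y in product(chart[(i, k)], chart[(k, j)]):
                    cell.update(combine(x, y))
            chart[(i, j)] = cell
    return chart[(0, n)]


if __name__ == "__main__":
    for s in ["Chan eil mi sgìth", "A bheil thu sgìth?", "A eil thu sgìth?"]:
        print(s, "S" in top_categories(s))
```
</pre>
<p><em>Chan eil mi sgìth</em>, <em>Nach eil thu sgìth</em>, <em>A bheil thu sgìth</em> and <em>Tha mi sgìth</em> all reach S. <em>A eil thu sgìth</em>, <em>Chan tha mi sgìth</em> and <em>Tha sgìth mi</em> don&#8217;t. Note that with these types the chart does build <em>eil mi sgìth</em> as a unit, of category S\Vspec_neg. That is exactly the (Vspec Vbar) shape discussed above.</p>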
<p>Tomorrow: what is the simplest parser that could possibly work?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.tantallon.org.uk/cggblog/?feed=rss2&#038;p=16</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The slashes</title>
		<link>http://www.tantallon.org.uk/cggblog/?p=12</link>
		<comments>http://www.tantallon.org.uk/cggblog/?p=12#comments</comments>
		<pubDate>Sun, 06 Mar 2011 12:26:24 +0000</pubDate>
		<dc:creator>Colin Batchelor</dc:creator>
				<category><![CDATA[grammar]]></category>

		<guid isPermaLink="false">http://www.tantallon.org.uk/cggblog/?p=12</guid>
		<description><![CDATA[In most frameworks you quickly get familiar with notation like PP (prepositional phrase), NP (noun phrase), VT (transitive verb), ADJ (adjective) and so forth. Categorial grammar, however, bristles with things like (S\NP)\((S\NP)/(S[adj]\NP)). What&#8217;s going on here? Aside: hopefully these are &#8230; <a href="http://www.tantallon.org.uk/cggblog/?p=12">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>In most frameworks you quickly get familiar with notation like PP (prepositional phrase), NP (noun phrase), VT (transitive verb), ADJ (adjective) and so forth. Categorial grammar, however, bristles with things like (S\NP)\((S\NP)/(S[adj]\NP)). What&#8217;s going on here?</p>
<p><em>Aside: hopefully these are the last examples I give in English.</em></p>
<p>&#8220;Mary loves pizza&#8221;. &#8220;Mary&#8221; is a singular personal name, &#8220;pizza&#8221; is a mass noun, and those are both sorts of NP. What about &#8220;loves&#8221;? It&#8217;s (S\NP)/NP. A simpler example is &#8220;Ice melts&#8221;. &#8220;Ice&#8221; is an NP, and &#8220;melts&#8221; here is S\NP. The backslash in <em>Y</em>\<em>X</em> means &#8220;give me something of type <em>X</em> to my left and I&#8217;ll give you a <em>Y</em>&#8221;.</p>
<p>So (S\NP)/NP, with a forward slash and a backslash, first takes an NP to its right (the object, &#8220;pizza&#8221;), then an NP to its left (the subject, &#8220;Mary&#8221;), and gives you an S, or a sentence.</p>
<p>In principle Gaelic verbs should have type (S/NP)/NP, but I have never seen a Gaelic sentence exactly like this. &#8220;Mary loves pizza&#8221;, after all, is only OK in the simple present because &#8220;loves&#8221; is stative. Unless you do the marketing for McDonald&#8217;s.</p>
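<p>The two rules at work here are forward application (X/Y followed by Y gives X) and backward application (Y followed by X\Y gives X). Here&#8217;s a small Python sketch of them, with a parser that turns category strings into nested tuples. The function names are mine. The parser handles plain atoms like S and NP, but not features like [adj].</p>
<pre>
```python
import re


def parse(cat):
    """Parse e.g. '(S\\NP)/NP' into nested (result, slash, argument) tuples.

    Slashes associate to the left, so S\\NP/NP means (S\\NP)/NP.
    """
    tokens = re.findall(r"[A-Za-z]+|[()/\\]", cat)
    pos = 0

    def atom():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            inner = expr()
            pos += 1  # skip the matching ")"
            return inner
        return token

    def expr():
        nonlocal pos
        left = atom()
        while pos != len(tokens) and tokens[pos] in ("/", "\\"):
            slash = tokens[pos]
            pos += 1
            left = (left, slash, atom())
        return left

    return expr()


def forward(x, y):
    """X/Y  Y  =>  X"""
    if isinstance(x, tuple) and x[1] == "/" and x[2] == y:
        return x[0]
    return None


def backward(y, x):
    """Y  X\\Y  =>  X"""
    if isinstance(x, tuple) and x[1] == "\\" and x[2] == y:
        return x[0]
    return None


if __name__ == "__main__":
    loves = parse("(S\\NP)/NP")
    vp = forward(loves, "NP")              # loves pizza: S\NP
    print(vp, backward("NP", vp))          # Mary [loves pizza]: S
    print(backward("NP", parse("S\\NP")))  # Ice melts: S
```
</pre>
<p>For &#8220;Mary loves pizza&#8221;, forward application of <em>loves</em> to <em>pizza</em> gives S\NP. Backward application with <em>Mary</em> then gives S.</p>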
]]></content:encoded>
			<wfw:commentRss>http://www.tantallon.org.uk/cggblog/?feed=rss2&#038;p=12</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

