Monthly Archives: May 2011

But what can we tell from the 100 top word tokens?

26 are prepositions of some sort; 23 are nouns; 10 are conjunctions; 10 are verbs; 5 are articles; 7 are adjectives; 7 are pronouns; 4 are preverbal particles; 2 are adverbs. The number of prepositions is unusually high and indicates …

Posted in grammar, preliminaries

What kind of language is this? The top 100 word tokens in Gaelic

I downloaded all of the Gaelic Wikipedia. This is not hard: the dumps are at http://dumps.wikimedia.org/gdwiki/latest/ and you probably want gdwiki-latest-pages-articles.xml.bz2, which contains all the text. Now I can do word-token counts on it, using terrible code like the following: #!/usr/bin/perl …
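
The Perl script itself is cut off in this excerpt, so it is not reproduced here. As a rough stand-in, here is a minimal Python sketch of the same idea: read the compressed dump line by line and count word tokens. The function names, the token pattern and the crude tag stripping are all assumptions made for illustration, not the original code, and wiki markup left in the text will still pollute the counts.

```python
import bz2
import re
from collections import Counter

# Assumed token pattern: runs of letters (accented vowels such as "ù" included),
# optionally joined by an internal apostrophe or hyphen.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")

# Crude removal of XML tags that open and close on the same line.
TAG_RE = re.compile(r"<[^>]*>")


def count_tokens(lines):
    """Count lower-cased word tokens over an iterable of text lines."""
    counts = Counter()
    for line in lines:
        text = TAG_RE.sub(" ", line).lower()
        counts.update(TOKEN_RE.findall(text))
    return counts


def top_tokens_from_dump(path, n=100):
    """Return the n most frequent tokens in a bz2-compressed dump file."""
    with bz2.open(path, mode="rt", encoding="utf-8") as dump:
        return count_tokens(dump).most_common(n)
```

Calling `top_tokens_from_dump("gdwiki-latest-pages-articles.xml.bz2")` on the downloaded file would then give a list of (token, count) pairs, most frequent first. Streaming the file through `bz2.open` avoids having to decompress the whole dump to disk.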

Posted in grammar, preliminaries