The 100 top word tokens in Gaelic

Here they are, along with their counts in the Gaelic Wikipedia (as of the 25th of May 2011), their parts of speech (these are slightly fluid while I work out the tagset for the difficult ones), and an English gloss where one is possible.

01 an 42853 DT, COP, PRON, PARTICLE "the", "Is the?", "their", "?"
02 a 27516 REL, INFINITIVIZER, PRON "that", "to", "his", "her"
03 na 21328 DT "the", "of the"
04 ann 17545 P "in"
05 tha 16505 COP "am", "are", "is"
06 e 16141 COP "he", "is"
07 a' 15155 AG (does not translate)
08 agus 14273 CONJ "and"
09 air 13113 P3S/P "on"
10 am 7038 DT "the"
11 's 6717 COP "is"
12 anns 6091 P "in"
13 bha 5385 COP "was"
14 is 5022 COP/CONJ "is", "and"
15 gu 4715 P/COMPLEMENTIZER "to", "that"
16 aig 4317 P "at"
17 le 4209 P "with"
18 de 4121 P "of"
19 mar 3597 CONJ "as", "like"
20 seo 2999 PRON "this"
21 sin 2848 PRON "that"
22 ri 2834 P "to"
23 nan 2716 DT "of the"
24 as 2682 P or COMPARATIVE MARKER "from"
25 baile 2648 N "town"
26 chaidh 2456 V "went"
27 ach 2310 CONJ "but"
28 iad 2242 PRON "they"
29 airson 2158 P "for"
30 do 2005 P/PRON/PAST TENSE PARTICLE "to", "your"
31 bho 1939 P "from"
32 i 1795 PRON "she"
33 a-mach 1792 ADV "out"
34 san 1791 P+DT "in the"
35 daoine 1781 N "people"
36 eadar 1772 P "between"
37 bàsaichean 1741 N "deaths"
38 neo 1636 CONJ "or"
39 tachartasan 1634 N "events"
40 h-alba 1604 N "Scotland"
41 brèithean 1582 N "births"
42 mu 1550 P "about"
43 linn 1549 N "century"
44 leis 1538 P3S or P "with it" or "with" before article
45 bhaile 1484 N "town"
46 no 1472 CONJ "or"
47 ceanglaichean 1380 N "links"
48 den 1375 P+DT "of the"
49 eile 1371 JJ "other"
50 dhe 1355 P "off"
51 bheil 1339 COP "is", "are"
52 suidhichte 1303 JJ "situated"
53 sa 1297 P+DT "in the"
54 gun 1255 P "without"
55 ris 1249 P3S or P "to him" or "to" before article
56 aige 1237 P3S "at him"
57 cuideachd 1231 ADV "also"
58 robh 1219 COP "was"
59 iomraidhean 1194 N "references"
60 tuath 1192 N "north"
61 fuireach 1187 N "stay"
62 dùthcha 1149 N "of the country"
63 aonaichte 1099 JJ "united"
64 taobh 1072 N "side"
65 duais 1048 N "prize"
66 nam 1029 DT "of the"
67 motha 1022 JJR "larger"
68 roinn 1016 N "region"
69 às 1006 P "from"
70 nuair 1004 CONJ "when"
71 iar 997 N "west"
72 far 968 CONJ "where"
73 tachartan 967 N "events"
74 eil 961 COP "is", "are"
75 aon 948 NUM "one"
76 duine 946 N "person"
77 bhith 936 COP "to be", "being"
78 eilean 921 N "island"
79 fhèin 920 PRON "oneself"
80 alba 917 N "Scotland"
81 stàitean 905 N "states"
82 breithean 887 N "births"
83 deas 885 N "south"
84 bhliadhna 880 N "year"
85 chan 872 NEGP/COP "not"
86 mòr 871 JJ "big"
87 dùthaich 869 N "country"
88 ainm 820 N "name"
89 th' 813 COP "am", "are", "is"
90 dè 805 WH "what"
91 gach 805 JJ "every"
92 prìomh-bhaile 802 N "capital"
93 ag 785 AG (does not translate)
94 nach 771 COP "is not"
95 è 764 PRON "he"
96 ainmeil 764 JJ "famous"
97 bhon 761 P+DT "from the"
98 b' 751 COP "was"
99 nas 741 COMPARATIVE MARKER "more"
100 cho 732 ADV "as", "so"
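
For anyone wanting to reproduce a list like this, the counting itself is just a frequency table over word tokens. The post doesn't say how the text was tokenized, so the sketch below is an assumption throughout: it supposes the Wikipedia markup has already been stripped to plain text, and it uses a hypothetical regular expression, `TOKEN_RE`, that keeps hyphens and apostrophes inside tokens so that forms like "a-mach", "h-alba", "a'", "'s" and "th'" are counted as single words. `top_tokens` is likewise a hypothetical helper name.

```python
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not the method behind the list above):
# an optional leading apostrophe, a run of letters (accented ones included),
# further letter runs joined by hyphens or apostrophes, and an optional
# trailing apostrophe. [^\W\d_] matches any Unicode letter.
TOKEN_RE = re.compile(r"'?[^\W\d_]+(?:[-'][^\W\d_]+)*'?")

def top_tokens(text, n=100):
    """Return the n most frequent lowercased word tokens as (token, count) pairs."""
    # Treat the typographic apostrophe as a straight one, then lowercase.
    text = text.replace("\u2019", "'").lower()
    counts = Counter(TOKEN_RE.findall(text))
    # Ties keep the order in which tokens were first seen.
    return counts.most_common(n)

sample = "Tha e a' dol a-mach. Tha i ann an h-Alba 's tha e ann."
for rank, (token, count) in enumerate(top_tokens(sample, n=5), start=1):
    print(f"{rank:02d} {token} {count}")
```

The tokenizer is where the real choices are. A plain whitespace split would leave full stops and commas glued to words, while stripping all punctuation would reduce "a'" to "a" and merge it with a different word. This sketch lowercases but does not fold accents, so "e" and "è", or "brèithean" and "breithean", are counted separately, just as they are in the list.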

Source for the bits I didn’t know: Stòr-dàta. Source for the mistakes: Colin Batchelor.
