Analysis of Thai dictionaries
In this post, we are looking at the size of various dictionaries and considering overlaps and differences.
We processed a September 2025 dump of Thai Wikipedia to produce a frequency list based on a relatively neutral corpus. Throughout this blog, the resulting frequency list is referred to as the 'thwiki' list. The dump holds 500,000 articles and more than 150 million words/tokens. We processed it so you don't have to.
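For readers curious what producing a frequency list involves, here is a minimal sketch of the counting step. It is not our actual pipeline. It assumes each article has already been segmented into words, which is the hard part for Thai since the script does not put spaces between words. The function name build_frequency_list and the sample data are hypothetical, chosen only for illustration.

```python
from collections import Counter


def build_frequency_list(tokenized_articles):
    """Count tokens across all articles; return (token, count) pairs, most frequent first."""
    counts = Counter()
    for tokens in tokenized_articles:
        counts.update(tokens)
    # Sort by descending count, then by token so that ties have a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy input (assumption): each article is already a list of segmented words.
sample_articles = [
    ["ภาษา", "ไทย", "เป็น", "ภาษา"],
    ["ประเทศ", "ไทย"],
]

frequency_list = build_frequency_list(sample_articles)
for token, count in frequency_list:
    print(token, count)
```

On a real dump, the list of articles would come from a streaming reader rather than an in-memory list, and the choice of word segmenter would affect the counts.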
As L2 learners of the Thai language, our needs are not always served by general public resources. They also differ by individual and situation, and they change over time as learning progresses. Resources for the Thai language exist, adequate if sparse, but many are produced and maintained (or not) by individuals whose lives may get in the way. More community cooperation is needed. Here is our bit.
What is available?
A Thai word frequency list of ~20k words used in primary and secondary school textbooks for Thai children
The แจ่มไพบูลย์/แรช Frequency List for Thai Learners v2.4
The first 2,500-2,700 words roughly correspond to primary school level, and the whole list to secondary school level.
The original frequency list is the 2016 work of Dr. Tantong Champaiboon (Ph.D. from Chulalongkorn University, Linguistics Department). She studied a corpus of textbooks for Thai students aged 3 to 16. The list is organised along several dimensions: measures of vocabulary complexity, and comparisons across 4 age ranges and 4 historical and current curricula.
The แจ่มไพบูลย์/แรช Frequency List for Thai Learners v2 is the enhanced version of the list as adapted for (English-speaking) Thai learners.
This strategy-game-style hexagon map highlights the space occupied by the frequency list in the overall dictionary space.
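The idea of one word list occupying part of a larger dictionary space can also be expressed as numbers. The sketch below uses plain set operations to count shared and unique entries. It is an illustration only, not the method used to draw the map. The function name compare_word_lists and the toy word lists are hypothetical.

```python
def compare_word_lists(frequency_words, dictionary_words):
    """Summarise the overlap between a frequency list and a dictionary headword list."""
    freq = set(frequency_words)
    dictionary = set(dictionary_words)
    shared = freq & dictionary
    return {
        "shared": len(shared),
        "only_in_frequency_list": len(freq - dictionary),
        "only_in_dictionary": len(dictionary - freq),
        # Fraction of dictionary entries that also appear in the frequency list.
        "share_of_dictionary": len(shared) / len(dictionary) if dictionary else 0.0,
    }


# Toy data (assumption): real lists would hold thousands of headwords.
toy_frequency_list = ["บ้าน", "น้ำ", "กิน", "โรงเรียน"]
toy_dictionary = ["บ้าน", "น้ำ", "กิน", "ภูเขา", "ทะเล", "แม่น้ำ", "ดาว", "เมฆ"]

summary = compare_word_lists(toy_frequency_list, toy_dictionary)
print(summary)
```

Exact string matching like this treats spelling variants as different words, so real comparisons usually need some normalisation first.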