Thai Language Textbook Vocabulary Development: A Corpus Study
To better understand the source of the แจ่มไพบูลย์/แรช frequency list of Thai words for second-language learners, it is useful to review Dr. T. Chaempaiboon's thesis in depth.
Part One: Overview of the Thesis
The thesis by Dr. T. Chaempaiboon (2016), titled "Development of High-Frequency Words in Thai Language Textbooks: A Corpus Linguistics Study", provides a comprehensive analysis of vocabulary development in Thai educational materials. Dr. Thanthong Chaempaiboon completed this work for a Doctor of Philosophy degree in Linguistics at Chulalongkorn University, with Associate Professor Dr. Wirot Arunmanakul serving as the main thesis advisor.
Context and Methodology
The research follows a corpus linguistics approach, relying on the Thai Textbook Corpus (TTC), which was created specifically for this study.
- Corpus Size and Scope: The TTC encompasses 3,037,772 tokens (words in occurrence) and 19,494 word forms (types). The data was collected from 354 textbooks and supplementary reading materials for Thai language learning, ranging from pre-primary level up to Mathayom 6 (upper secondary).
- Dimensions of Comparison: The study compares vocabulary development across two major dimensions:
  - Levels of Education (or grade levels/ช่วงชั้น): Divided into four levels (L1: lower primary, L2: upper primary, L3: lower secondary, L4: upper secondary).
  - Curricula: Covering four basic education curricula announced in 1960, 1978, 1990, and 2001 (C1-C4).
- High-Frequency Words: "High-frequency words" are the most frequent words that together account for the first 50% of the cumulative frequency. The study anticipated this set to contain around 90–100 words, which were selected for in-depth analysis of semantic development.
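The token/type counts and the 50% cumulative-frequency cutoff described above can be sketched as follows. This is a minimal illustration, not the thesis's actual procedure: it assumes the text has already been segmented into words (Thai is written without spaces, so segmentation is a separate step), and it assumes that the word whose frequency crosses the 50% mark is included in the list. The function names and the toy token list are hypothetical.

```python
from collections import Counter


def corpus_stats(tokens):
    """Return (number of tokens, number of types) for a list of word tokens."""
    return len(tokens), len(set(tokens))


def high_frequency_words(tokens, coverage=0.5):
    """Return the most frequent types whose cumulative share of all tokens
    first reaches `coverage` (the word that crosses the threshold is kept;
    this boundary rule is an assumption)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    selected, running = [], 0
    # Sort by descending frequency; break ties by code point for determinism.
    for word, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if running >= coverage * total:
            break
        selected.append(word)
        running += freq
    return selected


# Toy, already-segmented input (invented for illustration only).
toy_tokens = ["ที่", "ไป", "ที่", "บ้าน", "ไป", "ที่", "กิน", "ข้าว"]
n_tokens, n_types = corpus_stats(toy_tokens)
top_words = high_frequency_words(toy_tokens)
```

On a real corpus of roughly three million tokens, a cutoff of this kind yields a short list because a small number of function words account for a large share of running text.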
Research Objectives and Hypotheses
The primary objectives were to compare, within each grouping (education level and curriculum), the similarities, differences, and expansion of the vocabulary list, as well as the expansion of the meanings and functions of words.
The hypotheses guiding the research were:
- Levels of Education:
  - It was hypothesized that higher levels would feature a greater and more diverse number of words and that the vocabulary of higher levels would cover (or include) the word forms found in lower levels.
  - It was also assumed that the meanings and functions of words would increase in higher education levels.
- Curricula:
  - It was hypothesized that there would be more than 90% shared word forms between textbooks from each curriculum.
  - It was assumed that the variety of meanings and functions would be the same across all curricula; that is, the same words would be used with the same scope, meaning, and function.
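Both word-form hypotheses reduce to set comparisons, which can be sketched as below. The exact definitions used in the thesis are not given in this summary, so the two measures here are assumptions: coverage is taken as the share of lower-level word forms that also occur at the higher level, and the shared share is taken as the word forms common to all vocabularies divided by their union. The function names and the placeholder word forms are hypothetical.

```python
def coverage(lower_types, higher_types):
    """Share of lower-level word forms that also occur at the higher level."""
    lower, higher = set(lower_types), set(higher_types)
    if not lower:
        return 1.0  # nothing to cover
    return len(lower & higher) / len(lower)


def shared_share(vocabularies):
    """Share of all word forms (the union) that occur in every vocabulary."""
    sets = [set(v) for v in vocabularies]
    union = set().union(*sets)
    if not union:
        return 0.0
    common = set.intersection(*sets)
    return len(common) / len(union)


# Placeholder word forms, invented for illustration only.
level_1 = {"a", "b", "c", "d"}
level_2 = {"a", "b", "c", "e", "f"}
curriculum_1 = {"a", "b", "c"}
curriculum_2 = {"a", "b", "c", "d"}

l2_covers_l1 = coverage(level_1, level_2)             # 3 of 4 forms covered
shared = shared_share([curriculum_1, curriculum_2])   # 3 common of 4 total
```

Under these definitions, full coverage corresponds to a value of 1.0, and the curriculum hypothesis corresponds to a shared share above 0.9.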
Key Findings and Conclusions
The study’s findings challenge some hypotheses while confirming others, providing an extensive overview of vocabulary characteristics in Thai textbooks.
Vocabulary Expansion and Corpus Characteristics
- Corpus Size and Word Forms: The study confirmed that the corpus generally grew larger as grade level increased, both in the number of words in occurrence (tokens) and in the number of word forms (types).
- Coverage: The hypothesis that vocabulary at higher levels would cover all word forms found at lower levels was refuted; vocabulary at higher levels did not cover all the word forms at lower levels.
- Highest Growth: Vocabulary did not increase at the same rate across all levels. It showed a significant increase from lower primary (L1) to upper primary (L2), after which the number of word forms remained relatively stable. This indicates that upper primary (L2) is the period of highest vocabulary growth.
- Type of Words Added: The increase in vocabulary consisted mainly of content words. This rise was attributed to the diverse content presented in lessons and reading passages.
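The growth pattern described above can be checked by counting word forms per level and comparing consecutive levels. The sketch below assumes a mapping from level label to the set of word forms attested at that level; the function name is hypothetical, and the integers stand in for word forms with sizes chosen purely for illustration, not taken from the thesis.

```python
def type_growth(types_by_level):
    """Return (level, type count, relative change vs. previous level) rows.

    The relative change is None for the first level or when the previous
    level has no word forms.
    """
    rows = []
    previous = None
    for level, types in types_by_level.items():
        count = len(set(types))
        if previous is None or previous == 0:
            change = None
        else:
            change = (count - previous) / previous
        rows.append((level, count, change))
        previous = count
    return rows


# Invented placeholder vocabularies; dicts preserve insertion order.
toy_levels = {
    "L1": set(range(100)),
    "L2": set(range(150)),
    "L3": set(range(10, 165)),
    "L4": set(range(20, 178)),
}
rows = type_growth(toy_levels)
steepest = max(rows[1:], key=lambda row: row[2])[0]
```

With these toy sets, the largest relative jump falls at L2 and later levels change only slightly, mirroring the shape of the reported finding rather than its actual figures.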
Difficulty Analysis
Since relying solely on the quantity of content words was insufficient to judge difficulty, the study analyzed vocabulary difficulty across four dimensions:
- Syllable Count: Words with a higher number of syllables were considered more difficult.
- Word Formation: Compound words and words transliterated from English were considered more difficult than simple (non-compound) words.
- Grapheme/Pronunciation Possibilities: Words with more than one possible pronunciation were considered more difficult.
- Semantic Opacity: Opaque words (whose meaning cannot be derived from their internal components, e.g. "absorb") were considered more difficult than transparent words.
The results showed that both vocabulary difficulty and text difficulty differed across grade levels. For instance, higher levels contained more word forms that can be pronounced in more than one way, suggesting greater complexity in the grapheme-to-sound relationship. Overall, word forms tended to become more difficult as the grade level increased.
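One way to compare levels along the four dimensions is to summarize each level's word forms separately per dimension. The sketch below assumes every word form has already been annotated by hand or by another tool; the `WordFeatures` record, its field names, and the sample annotations are hypothetical, and the summary does not say that the thesis combined the dimensions into a single score, so none is computed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordFeatures:
    """Hypothetical per-word annotations for the four difficulty dimensions."""
    syllables: int
    is_compound_or_loan: bool   # compound or transliterated from English
    pronunciations: int         # number of possible pronunciations
    is_opaque: bool             # meaning not derivable from components


def difficulty_profile(words):
    """Summarize a list of annotated word forms, one figure per dimension."""
    n = len(words)
    if n == 0:
        return {}
    return {
        "mean_syllables": sum(w.syllables for w in words) / n,
        "share_compound_or_loan": sum(w.is_compound_or_loan for w in words) / n,
        "share_multi_pronunciation": sum(w.pronunciations > 1 for w in words) / n,
        "share_opaque": sum(w.is_opaque for w in words) / n,
    }


# Invented annotations for two levels, for illustration only.
lower_level = [
    WordFeatures(1, False, 1, False),
    WordFeatures(2, False, 1, False),
]
higher_level = [
    WordFeatures(2, True, 1, False),
    WordFeatures(3, False, 2, True),
    WordFeatures(1, False, 1, False),
    WordFeatures(2, True, 2, False),
]
lower_profile = difficulty_profile(lower_level)
higher_profile = difficulty_profile(higher_level)
```

Keeping the dimensions separate makes it possible to see, for example, whether a rise in difficulty comes from longer words or from more ambiguous spellings.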
Meaning and Function Expansion
- Levels of Education: The study found no increase in the number of meanings or functions of high-frequency words at higher education levels. These words generally had similar meanings and functions across all levels, implying that their various meanings and functions are taught from the initial level (L1).
- Curricula Comparison: The numbers of word forms and tokens were found to be similar across curricula, and more than 90% of the word forms were shared. Furthermore, no expansion in the scope of meaning or function was found across curricula, suggesting that the patterns of word use remained consistent over time.
In summary, the research found that Thai language textbooks exhibit a sharp increase in vocabulary at the upper primary level, after which the number of word forms remains relatively stable. Although the overall number of words, and with it general difficulty, increases with grade level, the core meanings and usage patterns of high-frequency words are established early in a student's education.
References
The thesis full text is hosted by Chulalongkorn University.
The thesis (website) is, as far as I can tell, in the public domain.