Thai Language Textbook Vocabulary Development: A Corpus Study part 2

This note follows the review and goes into more detail about the grade levels.

Part Two: Review of the Central Goal

The Central Goal of the thesis by Thanthong Chaempaiboon (2016), titled "Development of high-frequency words in Thai language textbooks: A corpus linguistics study", is explicitly defined by its aims and objectives: to study the development of high-frequency words in Thai language textbooks by comparing vocabulary expansion, including the expansion of meaning and function, across two main dimensions.

This Central Goal is situated within a larger Research Overview that acknowledges the critical need for a standardised Thai vocabulary list suitable for teaching. The study functions as a continuation of prior research on basic Thai vocabulary carried out by the Ministry of Education, but it distinguishes itself by using written language data from textbooks, rather than spoken language, and covers all levels of basic education.

Dual Dimensions of Comparison (The Core Goal)

The research employs a corpus linguistics approach to achieve its objectives. The methodology involves analysing the Thai Textbook Corpus (TTC), which was created for this research and comprises 3,037,772 running words (tokens) and 19,494 distinct word forms (types).
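To make the token/type distinction concrete, here is a minimal sketch in Python. It assumes the text has already been segmented into words (Thai is written without spaces between words, so a segmentation step is needed first). The function name and the toy sample are hypothetical and are not taken from the thesis or the TTC.

```python
from collections import Counter

def count_tokens_and_types(words):
    """Return (tokens, types) for a list of already-segmented words.

    tokens = total number of running words
    types  = number of distinct word forms
    """
    freq = Counter(words)
    return sum(freq.values()), len(freq)

# Hypothetical toy sample, NOT data from the Thai Textbook Corpus.
sample = ["แม่", "ไป", "ตลาด", "แม่", "ซื้อ", "ปลา", "แม่", "ไป", "บ้าน"]
tokens, types = count_tokens_and_types(sample)
print(tokens, types)  # 9 tokens, 6 types
```

In the toy sample the word แม่ occurs three times, so it contributes three tokens but only one type.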

The comparison is conducted along two primary dimensions:

  1. Comparison between Grade Levels (ช่วงชั้น): The vocabulary is analysed across four educational levels: lower primary (Level 1), upper primary (Level 2), lower secondary (Level 3), and upper secondary (Level 4).
  2. Comparison between Curricula (หลักสูตร): The vocabulary is also analysed across four fundamental education curricula: 1960, 1978, 1990, and 2001.

Specific Research Objectives

To fulfil the Central Goal of studying vocabulary development, the research sets out specific objectives, which focus on both quantitative and qualitative aspects of the vocabulary lists:

  • Similarities, Differences, and Expansion of the Vocabulary List: This objective aims to quantify and compare the words found across the various levels and curricula.
  • Expansion of Meaning and Function of Words: This objective involves an in-depth analysis of high-frequency words (expected to be around 90–100 words, representing the first 50% of the cumulative frequency) to see how their meanings and grammatical functions change or expand across different contexts (levels/curricula).
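The "first 50% of the cumulative frequency" cutoff can be sketched roughly as follows. This is only an illustration under my own assumptions: the function name, the tie-breaking order, and the toy data are hypothetical, and the thesis may define the cutoff slightly differently.

```python
from collections import Counter

def high_frequency_words(words, coverage=0.5):
    """Return the most frequent word types whose combined frequency
    first reaches `coverage` (a fraction) of all tokens."""
    freq = Counter(words)
    total = sum(freq.values())
    selected, running = [], 0
    # most_common() yields types from most to least frequent;
    # ties keep first-seen order (an arbitrary choice made here).
    for word, count in freq.most_common():
        if running >= coverage * total:
            break
        selected.append(word)
        running += count
    return selected

# Hypothetical toy data: 10 tokens, 4 types.
sample = ["ที่"] * 5 + ["และ"] * 3 + ["ปลา"] + ["บ้าน"]
print(high_frequency_words(sample, coverage=0.5))  # ['ที่']
print(high_frequency_words(sample, coverage=0.8))  # ['ที่', 'และ']
```

On a real corpus the same idea explains why only about 90 to 100 words are enough: a small number of very frequent words account for half of all running words.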

Research Hypotheses (Guiding Assumptions)

The Central Goal is tested against several specific hypotheses:

Regarding Grade Levels:

  • It was hypothesised that higher levels of education would contain a greater and more diverse number of words than lower levels.
  • It was hypothesised that the vocabulary in higher levels would cover all the words found in lower levels.
  • It was hypothesised that the meanings and functions of words would increase as the grade level increased.
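The second hypothesis (higher levels cover all the words of lower levels) amounts to a set-inclusion check. A minimal sketch, assuming each level's word list is available as a set of types; the function names and the toy vocabularies are hypothetical:

```python
def coverage(lower_vocab, higher_vocab):
    """Share of lower-level word types that also occur at the higher level."""
    lower, higher = set(lower_vocab), set(higher_vocab)
    if not lower:
        return 1.0  # an empty vocabulary is trivially covered
    return len(lower & higher) / len(lower)

def missing_words(lower_vocab, higher_vocab):
    """Lower-level word types that do not reappear at the higher level."""
    return sorted(set(lower_vocab) - set(higher_vocab))

# Hypothetical toy vocabularies, NOT taken from the TTC.
level_1 = {"แม่", "ปลา", "บ้าน", "ไป"}
level_2 = {"แม่", "บ้าน", "ไป", "ตลาด", "โรงเรียน"}

print(coverage(level_1, level_2))       # 0.75
print(missing_words(level_1, level_2))  # ['ปลา']
```

The hypothesis holds only if the coverage is exactly 1.0, that is, if no lower-level word is missing from the higher level.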

Regarding Curricula:

  • It was hypothesised that curricula would share more than 90% of their vocabulary.
  • It was hypothesised that the meaning and function of high-frequency words would be the same across all curricula.
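One possible way to test the 90% claim is to compare the words common to all curricula against the words found in any of them. This note does not say how the thesis measured the overlap, so the ratio below (intersection over union) is my own assumption, and the function name and toy data are hypothetical.

```python
def shared_vocabulary_ratio(vocab_by_curriculum):
    """Fraction of all word types (union) that occur in every
    curriculum (intersection). Returns 0.0 if there are no words."""
    vocabularies = [set(v) for v in vocab_by_curriculum.values()]
    union = set().union(*vocabularies)
    if not union:
        return 0.0
    common = set.intersection(*vocabularies)
    return len(common) / len(union)

# Hypothetical toy vocabularies keyed by curriculum year.
toy = {
    "1960": {"แม่", "บ้าน", "ไป", "ปลา"},
    "1978": {"แม่", "บ้าน", "ไป", "ตลาด"},
    "1990": {"แม่", "บ้าน", "ไป"},
    "2001": {"แม่", "บ้าน", "ไป", "โรงเรียน"},
}
ratio = shared_vocabulary_ratio(toy)
print(ratio)        # 0.5 (3 shared types out of 6 in total)
print(ratio > 0.9)  # False for this toy data
```

Other definitions are equally plausible, for example measuring the overlap pairwise or weighting words by their token frequency, and they can give quite different percentages.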

Key Findings Relating to the Central Goal

The findings show that the size of the corpus increased with grade level, which supports part of the first grade-level hypothesis, but vocabulary expansion did not occur uniformly.

  • The number of word types did not consistently increase across higher levels (L3 and L4 had similar counts, slightly lower than L2), and vocabulary in higher levels did not cover all words found in lower levels.
  • The expansion in vocabulary was most significant between lower primary (L1) and upper primary (L2).
  • Crucially, regarding the expansion of meaning and function (a core objective), the study found that high-frequency words had similar meanings and functions across all grade levels and curricula. This suggests that the basic meanings and functions of these words are taught from the earliest education level.

The analysis of the Central Goal thus shows that the research aimed to systematically map the evolution of Thai vocabulary in textbooks, providing an objective corpus-based foundation for assessing which words are appropriate for instruction at various stages.


Analogy: The Central Goal of this research is like charting the growth of a student's linguistic toolbox (vocabulary) over time. Instead of just guessing what tools (words) should be added or how complex they should be, the researcher uses a massive collection of blueprints (textbooks) from different periods and grades to empirically determine which tools were actually present (vocabulary scope) and how those foundational tools were consistently used (meaning and function), revealing where the biggest leaps in complexity occurred in the curriculum design.

Details: Thai Textbook Corpus Educational Segments

The author of the study defines the levels L1 through L4 (referred to as TTC-L1 to TTC-L4 within the corpus) as four educational period segments used for analysing high-frequency words found in the Thai Textbook Corpus (TTC).
The corpus itself draws data from Thai language textbooks and supplementary reading materials covering all educational levels, from the beginner level up to Mathayom Suksa 6 (M.6), across four different curricula.
Here is a breakdown of these levels and their corresponding educational audiences:

| Level | Thai Name | Definition / Educational Segment | Audience |
| --- | --- | --- | --- |
| L1 | ช่วงชั้นที่ 1 (TTC-L1) | Primary School Lower Section (ชั้นประถมศึกษาตอนต้น) | Targets students in the lower primary grades. |
| L2 | ช่วงชั้นที่ 2 (TTC-L2) | Primary School Upper Section (ประถมศึกษาตอนปลาย) | Targets students in the upper primary grades. |
| L3 | ช่วงชั้นที่ 3 (TTC-L3) | Secondary School Lower Section (มัธยมศึกษาตอนต้น) | Targets students in the lower secondary grades. |
| L4 | ช่วงชั้นที่ 4 (TTC-L4) | Secondary School Upper Section (มัธยมศึกษาตอนปลาย) | Targets students in the upper secondary grades (up to Mathayom Suksa 6). |

The purpose of creating these segmented word lists (L1–L4) was to analyse and compare the number and diversity of words and meanings across these different educational stages. The hypothesis related to these levels was that higher levels would feature a greater and more diverse number of words, with an increase in the number of word meanings. The subsequent analysis found that the number of tokens increased with the level of education, while the number of types grew mainly between L1 and L2 and did not rise consistently after that. The vocabulary in higher levels also did not cover all the words found in lower levels.

References

The thesis hosted at U.Chula: document full text
The thesis (website), as far as I can tell, is in the public domain.