The present study was conducted to improve and secure the reliability of performance assessment and grading in a Korean speaking test. As part of core research to develop the speaking section of the 'Test of Proficiency in Korean', it seeks to increase the validity and reliability of the speaking assessment. This study explored the effectiveness of rater training sessions. To uncover the factors affecting grading, the raters were divided into three distinct groups: native experienced, native non-experienced, and non-native raters. All of the participants were graduate students majoring in Korean language education. The training proceeded in three stages. After the first session, all raters identified their problems and continued with their grading. The results were analyzed with a many-facet Rasch model using the FACETS program. The results showed that experienced raters scored assessments using relatively strict grading criteria. The three groups showed different tendencies in scoring items, constructs, and measurement use. However, neither experience nor nativeness affected the internal consistency of individual raters. Their internal consistency of scoring was influenced instead by the process of establishing their criteria or by eventual grading fatigue. Future studies should examine experienced Korean teachers under different types of training sessions. Despite its limitations, this study is significant as the first paper on rater training for the Korean speaking performance test.
(Yonsei University, Ewha Womans University)