Adaptation of the Token Test in Standard Indonesian

Aphasia is a language disorder caused by focal brain injury. The Token Test is a tool to detect aphasic symptoms and measure aphasic severity in individuals with brain damage that causes language impairment. While Indonesia has a diagnostic test battery for aphasia (TADIR), that battery cannot yet quantify aphasic severity. In this study, we tested 49 individuals: 26 healthy adults, 7 non-aphasic post-stroke individuals, and 16 aphasic individuals. A series of tests was administered: the TADIR, the Token Test, and the Verb and Sentence Test. The Token Test was sensitive enough to distinguish between the three groups and was also correlated with all other language tests, including the TADIR.


Introduction
Aphasia is a language impairment caused by focal brain damage (most commonly from stroke) that affects single or multiple channels of language, including the comprehension and production of language, as well as reading and writing (National Aphasia Association, 2017). Language deficiencies caused by aphasia depend on the area and extent of damage (Ibanescu & Pescariu, 2010). One third of stroke patients suffer from aphasia (Brady, Kelly, Godwin, Enderby, & Campbell, 2016; Croquelois & Bogousslavsky, 2011). Because the disabling language problems in aphasia have a significant impact on the patient's quality of life and on communicative and social functioning, and add to the costs of stroke care, it is essential that aphasia be recognized as soon as possible among stroke patients (Hachioui, Visch-Brink, Lau, Sandt-Koenderman, Nouwens, Koudstaal, & Dippel, 2017; Papanathanasiou, Coppens, & Davidson, 2017).
Stroke is considered a global burden because it is a major cause of death and disability. Asia accounts for more than two-thirds of the global incidence of stroke (Suwanwela, Poungvarin, & the Asian Stroke Advisory Panel, 2016). According to Ministry of Health data, stroke is also considered the leading cause of death in Indonesia, with a stroke prevalence of 12.1/1000 recorded in 2013 (Pusdatin Kemenkes RI, 2014). Given that about one third of stroke patients suffer from aphasia, it can be estimated that roughly 4 out of 1000 Indonesians are at risk of aphasia. Unfortunately, neurorehabilitation in Indonesia does not always include speech therapy. This is especially true in smaller hospitals or more remote areas, where stroke is treated only by primary care physicians. Stroke patients often receive neither aphasia assessment nor intervention.
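The figure of roughly 4 per 1000 follows from combining two numbers cited above: the stroke prevalence of 12.1 per 1000 and the proportion of about one third of stroke patients who have aphasia. The arithmetic can be sketched as follows, assuming that the one-third proportion applies uniformly to Indonesian stroke patients (an assumption that the cited sources do not establish); the variable names are illustrative only.

```python
# Figures taken from the text; variable names are illustrative only.
stroke_prevalence_per_1000 = 12.1   # Pusdatin Kemenkes RI (2014)
aphasia_share_of_stroke = 1 / 3     # about one third of stroke patients

# Assumption: the one-third share applies uniformly to Indonesian stroke patients.
aphasia_per_1000 = stroke_prevalence_per_1000 * aphasia_share_of_stroke
print(f"{aphasia_per_1000:.1f} per 1000")  # about 4 per 1000
```

This is a rough order-of-magnitude estimate rather than an epidemiological measurement.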
Aphasia can be assessed using a series of tests; one of the globally used tests to assess aphasia is the Token Test. The Token Test (De Renzi & Vignolo, 1962) consists of a series of commands that progress in complexity and length. Participants are requested to identify and interact with tokens of various shapes, colors, and sizes. Its ease of use, quantifiable scores, and sensitivity to milder forms of aphasia make the Token Test a widely used tool to diagnose comprehension impairments (Boller & Dennis, 1979). The Token Test is used in standardized aphasia batteries like the Aachen Aphasia Test (AAT; Huber, Poeck, & Willmes, 1983) and has been translated into 40 different languages.
The Token Test can assess language comprehension in relative isolation from the influence of visuospatial factors, general cognitive abilities, nonverbal memory capacity, and sociolinguistic context (De Renzi & Faglioni, 1978). As Whitaker & Whitaker (1979, in Boller & Dennis, 1979) state, the Token Test "avoids all unusual syntactic constructions, rare words, and linguistic redundancies all of which contribute to its credibility and usefulness as an instrument for assessing language impairments following brain damage." With regard to rare words, most of the words in the Token Test are frequently used and are therefore less prone to individual differences in vocabulary levels. With the exception of two words, all the words in the English Token Test are among the 1000 most frequently used words of the language (Whitaker & Whitaker, 1979). Thus, the performance of participants in the Token Test should be accounted for by factors other than lexical frequency.
In analyzing the results of the Token Test, one has to identify the aphasic deficits that underlie poor performance in the task. While Lesser (1976) argued that attentional aspects and sequencing span affected the performance of speakers with aphasia in the Token Test, other studies have shown that these factors contribute minimally to the scores (Kreindler, Gheorghita, & Voinescu, 1971). The type of aphasia seems to have no effect on the overall Token Test scores (Mack & Boller, 1979), but Token Test scores have been observed to correlate with the severity of comprehension deficits (Kreindler et al., 1971). The Token Test is also able to distinguish between stroke patients with aphasia, non-aphasic stroke patients (for example, right-handed patients with a right-hemisphere stroke), and healthy participants (Swisher & Sarno, 1969; Spellacy & Spreen, 1969). The Token Test is also used to measure language comprehension in children, usually between 4 and 15 years of age. When applied to children, it shows significant differences across age groups, with primary school students scoring better than preschoolers (Gallardo, Guàrdia, Villaseñor, & McNeil, 2011).
Additionally, the Token Test manipulates linguistic variables systematically. The first four sections of the Token Test are very similar syntactically. The level of complexity is varied by the adjectival content of the object noun phrases in each section. In the second and fourth sections, the object noun phrases are also compounded (referring to both the shape and the colour of the token). The fifth part of the Token Test has the most elaborate syntactic structures: aside from the imperative sentence, which is present in every section, it contains subordinate clauses, adverbs, and locative prepositional phrases. An example from the fifth section is "Pick up the squares, except the yellow one." All the verbs in the Token Test are transitive. When grouped into semantic classes, most of them, with the exception of show, involve the manipulation of objects. In sum, while the Token Test avoids redundancy, its increasing complexity is attributable to adjectival content rather than syntax, with the exception of the fifth section. Additionally, the use of relatively frequent and simple nouns and verbs minimizes the risk of individual differences in vocabulary or concepts.
The TADIR (Tes Afasia untuk Diagnosis, Informasi, dan Rehabilitasi or Aphasia Test for Diagnosis, Information, and Rehabilitation) is the first aphasia test battery in Standard Indonesian. In general, the TADIR has four aims that are fulfilled by combinations of the subtests (Dharmaperwira-Prins, 1996): (1) to diagnose whether an individual has aphasia, (2) to diagnose which aphasia syndrome is present, (3) to provide information to patients, their environment, and other individuals or instances, and (4) to provide a basis for therapy and rehabilitation. The tasks used for (1) are object naming and verbal fluency (saying as many words of a category such as 'animal' as possible in one minute). The subtests used for (2) are speech rate from the individual's spontaneous speech (elicited by a set of questions), auditory comprehension with picture pointing, and word and sentence repetition. All the subtests are used for purposes (3) and (4). These include auditory comprehension at the sentence and word level, word and sentence repetition, reading comprehension, writing to dictation, writing (filling in one's own personal information), speech rate, and picture naming (objects and more complex pictures for sentences). Administering the TADIR takes one hour, and the manual recommends splitting the testing into two separate sessions of thirty minutes. All the individuals with aphasia in this study were tested with the TADIR, though only with the subtests for purpose (2).
There are several reasons for the adaptation of the Token Test in Standard Indonesian (SI). First and foremost, the Token Test can be used to assess aphasic severity, which is useful when analyzing other aphasia test scores. Secondly, the Token Test can serve as a complement to the aphasia diagnosis provided by the TADIR, as the TADIR does not provide a readily quantifiable measure of diagnosis (aphasic/non-aphasic). Finally, the adaptation of the Token Test and the scores of both aphasic and NBD (non-brain-damaged) individuals in this study can be utilized in future studies involving aphasic SI speakers.
Aside from its uses in the present study, the adaptation of the Token Test can contribute further to both research and clinical contexts of Indonesian aphasiology. The present norm data have already been used in a study of sentence comprehension in Indonesian speakers with Broca's aphasia (Jap, Martinez-Ferreiro, & Bastiaanse, 2016). Once norms on healthy participants and individuals with aphasia have been attained, the Token Test in Indonesian can be used, if needed, to compare scores crosslinguistically with many standardized aphasia batteries like the Aachen Aphasia Test (Dutch version by Graetz, De Bleser, & Willmes, 1992). Moreover, it can be used to help distinguish between individuals with and without aphasia. Unlike the current aphasia battery in Indonesia (TADIR), the Token Test can be administered to a relatively wider population of individuals with aphasia and can detect subtler forms of aphasia. This is particularly true because the Token Test does not require any speech production, which is an advantage when it is used with individuals with verbal apraxia (a motor speech disorder that disrupts language production), a disorder that commonly co-occurs with aphasia. Another advantage of the Token Test is that the scores can be compared quantitatively as a measure of aphasic severity, which the TADIR currently lacks. The main aim of the study is to provide a preliminary case of usage for the Token Test in Indonesia and to contribute towards a sample pool that could eventually be large enough to be used as norm data. The study also provides an initial validation of the Token Test as a tool to detect language impairments that cause comprehension problems.

Methods
A total of 49 individuals participated in this study. The group consisted of 16 individuals with aphasia, 7 post-stroke individuals without aphasia, and 26 non-brain-damaged Standard Indonesian speakers. Aphasic participants were recruited from 6 nursing homes in several cities of Central Java Province, Indonesia (Surakarta, Brebes, Semarang, and Yogyakarta). The stroke participants were selected in consultation with the clinical staff at the nursing homes. They generally lived in the immobility/isolation wards or with the other residents. The inclusion criteria for the stroke participants were vision sufficient to look at pictures, hearing sufficient to listen to and comprehend sentences, and at least some ability to communicate or produce words. Aphasic participants' demographic profiles were partially obtained from the caretakers of the nursing homes. When relevant, this information was completed by means of individual interviews. The NBD/healthy group consisted of university students and staff from Jakarta, who were in a different age group from the post-stroke participants. Although the age difference may influence the results somewhat, we used the available age-adjustment procedures from the original Aachen Aphasia Test. The three groups were distinguished by using the TADIR and by consulting medical records provided by the clinical staff on site. The NBD group had never had a stroke or any other neurological disease; the non-aphasic stroke patients had experienced a stroke but were identified as non-aphasic by the TADIR; and the aphasic patients had experienced a stroke and were identified as aphasic by the TADIR.
In addition to noting individual characteristics such as sensory problems and hemiparesis, we obtained written informed consent from the participants. The demographic details, including the time post-onset of stroke for the individuals with aphasia, are given in Table 1.
The Standard Indonesian Token Test was adapted from the Dutch Token Test, which is part of the Dutch Aachen Aphasia Test (Graetz, De Bleser, & Willmes, 1992). There are 50 items in total, divided into 5 sections of 10 items each. The difficulty of the sections rises progressively. Participants are asked to (1) point at one shape without specifying size (out of 10 objects), (2) point at one shape of a certain size and color (out of 20 objects), (3) point at two shapes without specifying size and color (out of 10 objects), (4) point at two shapes of a certain size and color (out of 20 objects), and finally (5) manipulate the tokens (moving, touching, and taking; out of 10 objects). While the adaptation of the Token Test attempts to remain as close as possible to the original, linguistic differences in translation can give rise to differences in stimuli. First, because Indonesian words tend to be multisyllabic and relatively long, a direct translation of rectangle (1 word; 3 syllables) as persegi panjang (2 words; 5 syllables) may not be a viable equivalent: the length differences would accumulate and make the auditory stimulus substantially longer in subtests 3, 4, and 5. The solution was to call the shape simply a four-sided figure, persegi (3 syllables). While this is not an optimal translation, as it could refer to other shapes such as a square, none of the participants had problems using this term to refer to the rectangle. Secondly, regional differences greatly affect the adaptation of the stimuli at the word level. In Central and East Java, a circle is called bundaran (/bundəran/), while Standard Indonesian usually refers to it as lingkaran. Bundaran does exist as an alternative to lingkaran in Standard Indonesian, but it is pronounced /bundaran/. The experimenter always used the regional variant familiar to the participant so that the stimuli were equally familiar across subjects.
Before the Token Test, the participants were asked whether they could see each shape clearly, and whether they could see all ten shapes of the first subtest and their different colours. Afterwards, the following instruction was read aloud:

"Saya akan membaca beberapa kalimat. Tunjuklah kepada keping yang menurut anda sesuai. (subtest 2) Ada keping yang besar dan ada yang kecil. (subtest 3) Saya akan sebut dua sekaligus, anda boleh menunjuk dengan kedua tangan atau satu tangan, dan urutan menunjuknya bebas, bisa sesuai urutan yang saya baca bisa juga yang lainnya (subtest 5) Saya akan membacakan beberapa kalimat, mohon diikuti instruksinya"
"I will read several sentences. Point to the matching token. (subtest 2) There are large tokens and small ones. (subtest 3) I will say two (tokens) at once, you may point with both hands or one hand, and the order does not matter, you can point according to the order in which I say it, or in other orders. (subtest 5) I will read several sentences, please follow the instructions." The test used 5 colored sheets of A4 paper from the Dutch AAT (Graetz et al., 1992). Scoring was done by indicating whether the participant had chosen the matching token. Choosing the correct token provided 1 point while choosing the non-matching picture was not awarded points. Repetition was generally discouraged unless the participant insisted, in which the item would be repeated but would still be marked as incorrect. Selfcorrections were mostly allowed unless done repeatedly and could be seen as a form of guessing.

Results
The total scores were out of 50. All participants were right-handed, with education ranging from 6 to 18 years. All aphasic participants were categorized as 'chronic', being at least 6 months post-onset of stroke.
The mean of the NBD group was 49.19 (SD=0.90) with a range of 47-50. The mean of the non-aphasic stroke group was 38.14 (SD=4.95) with a range of 32-47, while the mean of the aphasic group was 25 (SD=6.69) with a range of 9-34. A cut-off point of below 35 may be established to distinguish individuals with aphasia from those without aphasia. One non-aphasic stroke participant (number 31) scored 32 on the Token Test, but this participant was relatively old at 86, and there are no norms for age adjustment in SI yet. Compared to the two non-aphasic groups, the aphasic group has a higher variance in Token Test scores, which suggests that aphasic severity can be measured through this task.
A one-way between-subjects ANOVA was conducted to compare the Token Test scores of the three groups: the post-stroke aphasic group, the post-stroke non-aphasic group, and the non-brain-damaged group. There was a significant difference between the three groups [F(2, 46) = 159.49, p < .001]. Post hoc comparisons with the Tukey HSD test show that the scores of the NBD group (M=49.19, SD=0.90) are significantly higher than those of both the post-stroke non-aphasic group (M=38.14, SD=4.95) and the post-stroke aphasic group (M=25, SD=6.69). The post-stroke non-aphasic group scored significantly higher than the post-stroke aphasic group (p=.001).
A bivariate correlation analysis was conducted to see how the Token Test scores relate to the raw comprehension score of the TADIR and to two other tests adapted from the Verb and Sentence Test (VAST; Bastiaanse, Edwards, Maas, & Rispens, 2003): the verb comprehension and the sentence comprehension tests. Both are word/sentence-picture matching tasks in which participants have to point to the picture that matches the verb or sentence. The correlation table can be seen below.
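The one-way ANOVA reported above can be approximately reproduced from the group sizes, means, and standard deviations alone. The sketch below is not the analysis script used in the study; it is a plain recomputation from the rounded summary statistics, so small deviations from the reported F value are to be expected.

```python
# Group summary statistics as reported in the text: (n, mean, SD).
groups = {
    "NBD": (26, 49.19, 0.90),
    "post-stroke non-aphasic": (7, 38.14, 4.95),
    "post-stroke aphasic": (16, 25.0, 6.69),
}


def anova_from_summary(stats):
    """One-way between-subjects ANOVA computed from (n, mean, SD) per group."""
    n_total = sum(n for n, _, _ in stats)
    grand_mean = sum(n * m for n, m, _ in stats) / n_total
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in stats)
    # Within-groups sum of squares, recovered from each group's sample SD.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in stats)
    df_between = len(stats) - 1
    df_within = n_total - len(stats)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


f_value, df_b, df_w = anova_from_summary(list(groups.values()))
print(f"F({df_b}, {df_w}) = {f_value:.1f}")
```

The recomputed value comes out at about 159.5 with 2 and 46 degrees of freedom, in line with the reported F(2, 46) = 159.49; the remaining difference is attributable to rounding in the published means and standard deviations.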
The TADIR comprehension raw score is significantly correlated with the Token Test score (r(16)=.534, p=.033). Sentence comprehension is also significantly correlated with the Token Test (r(16)=.586, p=.022). Additionally, the verb comprehension score is significantly correlated with the Token Test (r(10)=.646, p=.044); this is the highest of the three correlation coefficients.
Note: Token Test is the total score for the Token Test. TADIR is the raw score of the comprehension subtest of the TADIR (range=1-7). Verb C is the total score for the verb comprehension test (range=1-48). Sentence C is the total score for the sentence comprehension test (range=1-40).
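The significance of a Pearson correlation can be checked by converting r to a t statistic with n - 2 degrees of freedom. The sketch below does this for two of the reported coefficients using only the standard library. It assumes that the number in parentheses after r is the number of participants n rather than the degrees of freedom; this reading is an assumption, and the helper names are illustrative. The p-value is obtained by numerically integrating the t density with Simpson's rule, which is a convenience for this sketch and not the procedure of any particular statistics package.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


def r_to_t(r: float, n: int) -> float:
    """t statistic for a Pearson correlation r based on n pairs (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# (r, n) as reported in the text; reading the parenthesized number as n is an assumption.
reported = {"TADIR comprehension": (0.534, 16), "Verb comprehension": (0.646, 10)}
for name, (r, n) in reported.items():
    t = r_to_t(r, n)
    print(f"{name}: t({n - 2}) = {t:.2f}, p = {two_tailed_p(t, n - 2):.3f}")
```

Under this assumption, the resulting p-values are close to the reported .033 and .044.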

Discussion
There were 3 groups (NBD and stroke patients with or without aphasia) with 49 participants in total. The Token Test performance of the 3 groups was significantly different from one another. The NBD group scored at ceiling, the non-aphasic stroke patients scored lower, and the aphasic stroke patients performed the poorest of the three groups. Thus, the adapted Token Test can be used to identify individuals with aphasia even among stroke patients, and healthy individuals can complete it with high accuracy.
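The cut-off of below 35 proposed in the Results can be written as a simple decision rule. The sketch below applies it to the lowest and highest score reported for each group; the function name is hypothetical, and the rule is provisional because no Indonesian age norms exist yet.

```python
CUTOFF = 35  # scores below 35 suggest aphasia, per the cut-off proposed in the Results


def below_cutoff(token_test_score: int, cutoff: int = CUTOFF) -> bool:
    """Return True when a Token Test total (0-50) falls below the cut-off."""
    if not 0 <= token_test_score <= 50:
        raise ValueError("Token Test totals range from 0 to 50")
    return token_test_score < cutoff


# Reported group ranges (min, max) from the Results section.
ranges = {
    "NBD": (47, 50),
    "post-stroke non-aphasic": (32, 47),
    "post-stroke aphasic": (9, 34),
}
for group, (lowest, highest) in ranges.items():
    print(group, below_cutoff(lowest), below_cutoff(highest))
```

With these ranges, every aphasic score falls below the cut-off and every NBD score falls above it, while the lowest non-aphasic score (32, the 86-year-old participant) falls below it. This illustrates why age-adjusted norms would improve the rule.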
The adapted Token Test is significantly correlated with the raw score of the TADIR comprehension section (which tests auditory comprehension at the word and sentence level). This is a crucial finding, as the Token Test in the present study is newly adapted and had not yet been compared with a standardized test battery like the TADIR. Additionally, the scores of the Token Test were significantly correlated with the results of the sentence comprehension and verb comprehension tests. These results suggest that the adaptation could be used for further studies involving aphasic samples.
There are, however, several limitations regarding the Token Test in the present study and its potential implementation and use in Indonesia. First of all, the differences due to age are not fully adjusted to the Indonesian context. In the present study, we used the default age-adjustment values from the original Aachen Aphasia Test (Orgass, 1976), which are currently used in clinical settings. This is certainly not optimal, as the context of health, well-being, and education of individuals of varying ages differs not only between the countries involved (i.e. Indonesia and Germany) but also along other dimensions such as socio-economic status. The sensitivity of the tool would be greatly improved if norm data were available for the target population of the Token Test. In this light, the data that the current study provides may also be used in future studies to improve the adaptation and subsequently establish age-adjustment values for Indonesia.
The second limitation concerns how widely this assessment tool can be applied in Indonesia. Although the adaptation is designed to test speakers of Standard Indonesian, the materials were designed to be as linguistically simple as possible, so participants require only a moderate level of proficiency to understand the tasks. This is a serious consideration in any adaptation of cognitive assessment tools because the majority of the population in Indonesia speaks SI as a second language, learning it in formal settings such as schools and receiving exposure to it from the media and government. There are approximately 23 million 'native' speakers of SI and 140 million L2 speakers (Lewis, Simons, & Fennig, 2013). Another concern is that some regions use a non-standard variety of Indonesian. In this case, the materials can be further adapted until they suit the lexicon most commonly used in the region. There should be minimal differences between the regions if the tasks are sufficiently adapted to mitigate unfamiliarity with the materials. However, further research on the implementation of the Token Test in different regions with different variants of Indonesian would be a required supplement to this adaptation.
There are only a few studies on Indonesian aphasic speakers in general, and only some of those used the Token Test (e.g. Anjarningsih et al., 2012; Jap et al., 2016). As such, future research with the tool could not only improve its sensitivity via norm establishment, but also test its validity in the context of other language and cognitive tasks.

Conclusion
The significant correlations of the Token Test with the other three tests, namely the TADIR, the verb comprehension test, and the sentence comprehension test, suggest that Token Test scores can be used as an indicator of severity. When individuals perform poorly in the Token Test, they also score poorly in the other tests, despite the fact that the Token Test involves a substantially different linguistic manipulation. This would also suggest that the Token Test measures certain linguistic processes common to the other comprehension tests and the TADIR. Future studies can add to the pool of samples to create a more reliable approximation of Token Test norms for aphasic as well as non-aphasic subjects. All in all, the Indonesian adaptation of the Token Test aids in the assessment and diagnosis of aphasic symptoms in Indonesia, in particular by distinguishing aphasic from non-aphasic individuals and by providing information regarding aphasic severity.