A Training-Model Scale’s Validity and Reliability Coefficients: Expert Evaluation in Indonesian Professional Psychology Programs

Very little information has been available on training models in professional psychology programs in Indonesia, despite the Indonesian National Accreditation Body recommending that the scientist-practitioner model be applied in the education of psychologists. By contrast, research abounds on such training models in Western countries. This discrepancy underscores the importance of developing a measurement tool appropriate for assessing training models in Indonesian professional psychology programs. This article describes the process of testing the validity and reliability of such a training-model measuring tool in the Indonesian context. The authors used the expert evaluation method and the Aiken formulas to calculate coefficients of content validity and items' internal-consistency reliability. This process produced a training-model scale comprising 77 items with satisfactory validity and reliability indexes for measuring training models in Indonesian professional psychology programs.

Pengujian Validitas Isi dan Reliabilitas Alat Ukur Model Pendidikan pada Program Profesi Psikologi di Indonesia: Penggunaan Metode Evaluasi oleh Panel Ahli


Introduction
In a professional psychology program, the training model is an important educational component for aspiring psychologists, playing a significant role in determining educational direction (Horn et al., 2007) by guiding the formation of program objectives and determining the learning experiences needed to achieve them. Academic discussion on training models in professional psychology education dates from 1948, when American scientists and practitioners in psychology formulated a training model to provide standards for the implementation of education for professional psychologists (Baker & Benjamin, 2000; Cautin & Baker, 2014). The first such training model, the scientist-practitioner model, was developed in that era, and many professional psychology schools worldwide have adopted it (Baker & Benjamin, 2000; Belar & Perry, 1992; Bell & Hausman, 2014; Horn et al., 2007). The scientist-practitioner model balances emphasis on the practice and science/research components in educating future psychologists and endorses educative integration activities that connect science and practice (Belar & Perry, 1992).
Dissatisfaction with the scientist-practitioner training model led to the formation of others, the most common being the practitioner model, which emerged in 1973 (Korman, 1974). Its derivatives include the practitioner-scholar model (Bell & Hausman, 2014; Ellis, 1992; Rodolfa, Kaslow, Stewart, Keilin, & Baker, 2005), the local-clinical scientist model (Stricker, 1997; Stricker & Trierweiler, 1995), and the Clinical-Science model created in the 1990s (McFall, 1991, 2006). The practitioner model emphasizes the practice component (Korman, 1974). Deriving from the practitioner model, the practitioner-scholar model gives greater attention to the science/research aspect than its prototype, while still emphasizing the practice component and the practical application of scientific knowledge over science/research (Ellis, 1992). The local-clinical scientist model, in addition to emphasizing psychological practice, places greater importance on delivering psychological services according to clients' specific needs (Bell & Hausman, 2014). By contrast, the Clinical-Science model emphasizes the science and research components over practice (McFall, 2006).
At the end of the 1980s, training programs began placing importance on graduates' competency in conducting psychological practice, rather than merely emphasizing learning materials and students' practice hours during their professional education. The training models above focused more on which learning content should be emphasized in professional programs (McEvoy et al., 2005), thus relating to the classical debate over whether practice or science/research should be deemed more important in educating future psychologists. Accordingly, these previous training models are classified as content-based models. Recent discussion has shifted to the set of competencies graduates must attain on completing a professional program. In turn, target competencies are now the main driver determining the necessary learning experiences in professional education. The determination of target competencies and of indicators of attainment of those competencies, along with competency measuring tools, is central to competency-based models in professional psychology education. The competency culture continues to grow and develop to the point that some scholars (e.g., Rodolfa et al., 2014) and professional organizations (British Psychological Society, 2015; Canadian Psychological Association, 2001, 2011; National Council of Schools and Programs of Professional Psychology, 2014) have succeeded in formulating sets of target competencies accompanied by behavioral indicators for each competency, supplemented by measuring tools and methods to determine each learner's level of competence (Fouad et al., 2009). For a more complete discussion of training models in professional psychology education and their respective characteristics, please refer to Ningdyah, Greenwood, Kidd, Helmes, and Thompson (2016).
In contrast to Western countries, professional psychology program providers and educators in Indonesia are relatively unfamiliar with the notion of training models, information on which is only rarely available. Of the 19 professional psychology programs in Indonesia, only one explicitly mentions its training model (Universitas Surabaya, 2015). General information about learning content is available for other programs, but the specific training model applied in these programs is not available to the public, either in brochures or on official websites. Statements on specific training models are useful in providing an overview not only of the learning content a program provides and emphasizes, but also of the nature of the internship as an important component of professional psychology education (Sullivan & Conoley, 2001). In its current guidelines on the accreditation mechanism for professional psychology programs, the Indonesian National Accreditation Board (BAN-PT, 2013a), the only institution implementing accreditation for these programs, has stated that the scientist-practitioner model should be used in Indonesian professional psychology programs. Accreditation instruments designed and used by BAN-PT (2013b) were also structured along the lines of the scientist-practitioner model.
Besides the lack of information on training models in Indonesian programs, research profiling Indonesian professional psychology programs is scarce, if not absent. Such research is abundant in Western countries, including on professional programs' basic profiles (e.g., Pachana, O'Donovan, & Helmes, 2006), characteristics of graduates (e.g., Cherry et al., 2000), and students' views of their experience during professional education (e.g., Merlo, Collins, & Bernstein, 2008). The expert evaluation of a training-models instrument discussed in this article is part of a comprehensive study that attempts to overview the basic profile and curricular characteristics of Indonesian professional psychology programs.

July 2018 | Vol. 22 | No. 1

In researching training models and other related elements of professional psychology programs, previous researchers (e.g., Merlo et al., 2008; Nixon, 1994; Pachana et al., 2006) have mostly used cross-sectional surveys with questionnaires designed specifically to answer research questions. In those studies, questions to detect the training models applied in target programs were formulated in a closed-ended or mixed-question format, thus providing respondents with multiple-choice options and an opportunity to add responses other than those already provided. In the Indonesian context, this questioning technique may not be appropriate due to the prevailing lack of familiarity with the training-model concept. There is concern that study results might be hampered by respondents being unable to identify accurately the type of training model actually used in their program. Leong and Zachar (1991) used another item-generation approach in their study, constructing items based on the characteristics of a specific training model. This approach was deemed more appropriate for this research, since it overcomes the weakness of direct questioning techniques.
Results of a literature review on the main training models in professional psychology education (Ningdyah et al., 2016) formed the basis of item development for the current investigation's training-model scale. Items in the training-model questionnaire were classified into six groups corresponding with the six dominant training models identified in the literature: 1) scientist-practitioner; 2) practitioner; 3) practitioner-scholar; 4) local-clinical scientist; 5) clinical-science; and 6) competency-based. An evaluation of the validity and reliability of the training-model scale, the subject of this article, was conducted to test the scale's effectiveness in the Indonesian context. A content validity test was applied in evaluating the training-model scale, to assess how well the measuring instrument represents the relevant content areas (Haynes, Richard, & Kubany, 1995; Rosnow & Rosenthal, 2002) and to ensure that material not relevant to the measurement purposes was not included (Azwar, 2012). Reliability testing of the training-model scale, drawing on the concept of internal-consistency reliability, was intended to ascertain that items measuring the same general construct actually produce similar scores. Moreover, the measuring tool's face validity was assessed to ensure that the training-model scale's format was such that respondents would be motivated to participate (Rosnow & Rosenthal, 2002) and that its words, sentences, and terms were appropriate for the Indonesian context.
The availability of a valid, reliable training-model scale adapted specifically for use in Indonesia is indispensable, considering the absence of such an instrument and the lack of relevant studies in professional psychology education there. The training-model scale developed from this study's results is assumed to be beneficial for identifying the characteristics of the training model(s) applied in Indonesian professional psychology programs. The description of a program's training-model profile obtained from the scale's use can serve as input for the development of educational processes in a particular professional program. This applies especially to the implementation of science and practice integration, an important requirement for application of the scientist-practitioner model, the model of professional psychology education that the Indonesian government requires.

Methods
The expert evaluation method was used to determine the training-model items' content validity and internal-consistency reliability. Experts in professional psychology education in Indonesia were invited to assess each item's degree of relevance to the training-model component it was intended to represent.
Two popular methods can be used to calculate content validity based on expert judgment. Lawshe (1975) first proposed the content-validity ratio (CVR) for quantifying experts' judgment of items. The CVR yields a value from −1.00 to +1.00. Lawshe provides a table of significance containing critical CVR values, which determine the degree of content validity of an obtained CVR in accordance with the number of experts involved, at the 0.05 and 0.01 significance levels.
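Lawshe's ratio is commonly written as CVR = (n_e − N/2) / (N/2), where n_e is the number of experts rating an item essential and N is the panel size. The following minimal sketch (the function name is ours, not from the article) illustrates the calculation:

```python
def content_validity_ratio(n_essential: int, n_total: int) -> float:
    """Lawshe's (1975) content-validity ratio.

    n_essential: number of experts rating the item as essential/relevant.
    n_total: total number of experts on the panel.
    Returns a value from -1.00 (no expert rates it essential) to +1.00 (all do).
    """
    half = n_total / 2
    return (n_essential - half) / half

# With seven experts, six of whom deem an item essential:
# CVR = (6 - 3.5) / 3.5 ≈ 0.71, which still falls below Lawshe's
# critical value of 0.99 for seven raters at the 0.05 level.
print(round(content_validity_ratio(6, 7), 2))
```

The example shows why small panels are problematic under Lawshe's approach: even near-unanimous agreement can fail to reach the critical value.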
Another approach to calculating items' content-validity values based on expert evaluation was proposed by Aiken (1980, 1985). Similar to Lawshe's, Aiken's content-validity coefficient is calculated from experts' assessments of an item's relevance in measuring the intended construct, with rating categories arranged in a Likert-scale format.
The Lawshe (1975) formula tends to be difficult to implement because it requires a large number of experts in order for an item to be deemed significant at an acceptable CVR value. The fewer the appraisers, the greater the CVR required. As an illustration, in the table of significance provided by Lawshe (1975, p. 568), if there are only seven experts, an item requires a CVR of 0.99 to be significant and have adequate content validity at a significance level of 0.05. Employing a large number of experts to ensure that the demanded critical value is not too high is unrealistic in this study, since the availability of experts in professional psychology education is limited. Azwar (2012) proposes interpreting the CVR value within its relative range, spanning from −1 to +1. Items with negative values are considered to have very low content validity, so they need to be removed from the measuring instrument, while items with positive CVR values are considered to have content validity at a certain level. However, such interpretation is susceptible to subjectivity and thus risks generating subjective results and lowering the consistency of interpretation standards (Yu, 1993, in Yang, 2011). Accordingly, the content validity formula proposed by Aiken (1980, 1985) was used in this study. To assess the experts' degree of consistency in evaluating items, Aiken's (1985) homogeneity coefficient was also used. The homogeneity coefficient (H) serves as an internal-consistency reliability coefficient for rating data (Aiken, 1985).
Participants and procedure. Approval from the university's human research ethics committee was obtained before data collection commenced. Experts were recommended by the Indonesian Psychological Association (HIMPSI), the sole professional psychology organization in Indonesia involved in the accreditation of professional programs. Selection of experts was based on the following criteria: 1) they were HIMPSI members who had been actively involved in the preparation of accreditation instruments for professional psychology programs; or 2) they were HIMPSI members who had been involved in the accreditation of professional programs with the national accreditation body; or 3) they were academics with current or past involvement in the management of professional psychology programs, but not serving as program directors at the time of data collection. HIMPSI recommended eight experts. Via email, the researcher invited each expert to participate in the training-model evaluation study, sending details of the research objectives and statements about the importance of their participation in developing a measuring tool to identify training models in Indonesian professional psychology programs. Six experts responded to this invitation, expressed their willingness to participate in the research, and signed informed consent forms. They were then emailed the questionnaire. By the end of the data collection period, five questionnaires had been returned and all were considered valid for further analysis.

Measures.
A specific questionnaire was developed for this study, the Expert Evaluation Form for the Training-Models Scale. This measuring tool consists of six item clusters arranged according to the six previously specified types of training models identified in the literature review (Ningdyah et al., 2016), abbreviated as follows: 1) the scientist-practitioner model (SP); 2) the practitioner model (P); 3) the practitioner-scholar model (PS); 4) the local-clinical scientist model (LCS); 5) the Clinical-Science model (CS); and 6) the competency-based model (CB). To assess items' content validity, respondents were asked to determine the extent to which each item was relevant to the training model it represented, with ratings from 1 (completely irrelevant) to 5 (extremely relevant). In addition, experts were invited to write specific comments regarding items or the measuring tool, to improve the quality of the items in particular and of the measuring tool as a whole.

Data analysis. Aiken's (1980, 1985) content validity index (V coefficient) was calculated for all items on each training-model component by applying the following formula:

V = S / [n(c − 1)] (Aiken, 1985, p. 133)

where S is the sum, across raters, of s = r − lo (the difference between each appraiser's rating r and the lowest rating category lo), n is the number of raters, and c is the number of rating categories.
The V coefficient ranges from 0 to 1: the greater the V, the higher an item's content validity. Aiken (1985) provides a table of significance for determining the value of V that can be considered significant, at levels closest to .05 and .01, for a given number of raters and rating categories.
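As a minimal sketch of the V calculation above (the function name and the example ratings are ours, not from the article):

```python
def aiken_v(ratings, c=5, lo=1):
    """Aiken's (1980, 1985) content-validity coefficient V for one item.

    ratings: one relevance rating per expert (integers from lo to lo + c - 1).
    c: number of rating categories; lo: lowest rating category.
    """
    s = sum(r - lo for r in ratings)      # S: summed distances above the lowest category
    return s / (len(ratings) * (c - 1))   # V = S / [n(c - 1)]

# Five experts rating an item 5, 5, 4, 5, 4 on a 1-5 scale:
# S = 4 + 4 + 3 + 4 + 3 = 18, so V = 18 / (5 * 4) = 0.90
print(aiken_v([5, 5, 4, 5, 4]))  # → 0.9
```

With the critical value of 0.80 reported below for five raters and five categories, this hypothetical item would be retained.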
To determine the extent to which experts agreed regarding an item's relevance, Aiken's (1985) homogeneity coefficient (H) was calculated for each item with the following formula:

H = 1 − 4S / [(c − 1)(n² − j)] (Aiken, 1985, p. 140)

where S is the sum of the absolute differences between the ratings of each pair of appraisers, c is the number of rating categories, n is the number of raters, and j = 0 if n is an even number and j = 1 if n is an odd number.
The Aiken's H coefficient presented above quantifies the degree of experts' consistency in assessing an item. The H value ranges from 0 to 1. As with the value of V, Aiken also provides a table of statistical significance to determine the critical value of the H coefficient for a given number of rating categories and raters, at significance levels closest to 0.05 and 0.01. Table 1 (appendix) presents a summary of the statistical calculations of the experts' evaluation results, including the mean, standard deviation, Aiken's content-validity coefficient (V), and the homogeneity coefficient (H).
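The H calculation can be sketched the same way (again, the function name and example ratings are ours): S is summed over all pairs of raters, and perfect agreement yields H = 1 while a maximal split yields H = 0.

```python
from itertools import combinations

def aiken_h(ratings, c=5):
    """Aiken's (1985) homogeneity coefficient H for a single item.

    ratings: one rating per expert; c: number of rating categories.
    """
    n = len(ratings)
    # S: sum of absolute rating differences over all pairs of raters
    s = sum(abs(a - b) for a, b in combinations(ratings, 2))
    j = 0 if n % 2 == 0 else 1            # parity correction for odd panels
    return 1 - 4 * s / ((c - 1) * (n ** 2 - j))

# Unanimous ratings give H = 1.0; the ratings 5, 5, 4, 5, 4 give H = 0.75
print(aiken_h([5, 5, 5, 5, 5]), aiken_h([5, 5, 4, 5, 4]))  # → 1.0 0.75
```

Note that in the second example H lands exactly on the critical value of 0.75 reported below, so under the strict "greater than" rule such an item would be flagged.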

Results
The V coefficients in the SP component range from 0.65 to 1.00. In the P component, V coefficients range from 0.85 to 1.00. In the PS group, V coefficients range from 0.95 to 1.00. V coefficients in the LCS group range from 0.90 to 1.00. For the CS component, V coefficients range from 0.85 to 1.00. Lastly, V coefficients for the CB component range from 0.75 to 1.00.
Checking Aiken's significance table (1985, p. 134) for the critical V value shows that, for five experts and a five-category evaluation rating, an item's validity coefficient (V value) must be greater than 0.80 to have sufficient content validity (p = 0.05). Five items had V values below this critical value.

Makara Hubs-Asia

Checking against the same significance table shows that, for five experts and a five-category evaluation rating, the homogeneity coefficient (H value) of an item must be greater than 0.75 to be deemed significant (p = 0.05).
Of the 195 items, 28 had H values lower than the critical H demanded (these 28 items included the five items with low content-validity values mentioned above). All items with low V or H coefficients were eliminated from the training-model scale.
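The item-selection rule can be sketched as a simple filter over per-item (V, H) statistics; the cutoffs are the critical values reported above, while the function name and item identifiers are hypothetical:

```python
def retain_items(item_stats, v_critical=0.80, h_critical=0.75):
    """Keep only items whose V and H both exceed the critical values
    for five raters and five rating categories (p = 0.05).

    item_stats: mapping of item id -> (V, H) tuple.
    """
    return {item: (v, h) for item, (v, h) in item_stats.items()
            if v > v_critical and h > h_critical}

# Hypothetical statistics for three items: only "SP-01" survives both cutoffs,
# "SP-02" fails on V and "P-03" fails on H.
stats = {"SP-01": (0.95, 0.85), "SP-02": (0.70, 0.80), "P-03": (0.90, 0.70)}
print(sorted(retain_items(stats)))  # → ['SP-01']
```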
Experts provided useful comments suggesting how the measuring instruments could be improved. Their comments on particular item(s) or the measuring instrument as a whole are presented in Table 2 (appendix).
Experts' specific, item-related comments suggested using more appropriate words and expressions for the Indonesian context. Items' wording was improved in direct response to experts' comments and advice. Furthermore, as shown in Table 2 (appendix), experts' general comments related primarily to the scale's length, the repetition of some items, and the overlap among others; experts therefore suggested shortening the scale by removing similar and overlapping items. Subsequent comprehensive analysis included re-examining each item's content and the training-model domain of a number of repetitive items. Accordingly, several items were combined, repetitive items were deleted, and the use of words and sentences was again reviewed to produce easier-to-understand items. The final result was a set of 77 items with the following psychometric properties: content-validity coefficients range from 0.85 to 1.00; internal-consistency reliability coefficients range from 0.75 to 1.00.

Discussion
A measuring tool for training models in professional psychology programs is needed in Indonesia where little, if any, research on these programs has been undertaken. The expert evaluation study discussed in this article was part of a project to develop a valid, reliable training-model scale for the Indonesian context.
The scale's original design included 195 items. Analysis of the expert evaluations using Aiken's V and H coefficients showed that 28 items had lower V and H coefficients than required, and these were deleted. Typically, deleted items were too general or vague and were deemed unsuitable by experts for distinguishing different types of training models. Some were also affirmative items that referred to common characteristics or facts initially thought applicable to Indonesian professional psychology programs, for example, "Basic psychological practice content is taught in the program's early years" (SP component, item 59).
Our study's results demonstrate that use of expert judgment methods, including careful selection of experienced experts, is a very beneficial part of item selection. Due to their experience in professional psychology program accreditation and management, experts could provide valuable input when evaluating items in the training-model scale. The overall judgment process identified items that could have lowered the scale's efficacy in discriminating among training models, so these items were eliminated.
Expert respondents' comments on specific items included suggestions on word selection, and their more precise wording was adopted. Rigorous effort was also directed towards modifying or eliminating items they regarded as repetitive and overlapping. The re-examination procedure also included reviewing each training model's characteristics, particularly their classification among content-based models.
Content-based models emphasize programs' educational content and classify models accordingly, in terms of the practice component, the science/research component, or both. A different emphasis on learning content leads to a different training model. As previously mentioned, for example, emphasis on the science/research component is one of the CS model's main characteristics. However, thorough examination of training models' characteristics shows that content-based training models are not mutually exclusive; emphasizing one content component does not necessarily eliminate other content components completely. In the practitioner model, for example, the dominant emphasis on the psychological practice component does not automatically preclude delivery of the science component. Korman (1974), a supporter of this model, states that, although the practitioner model emphasizes the practice component and the delivery of psychological services, educational experiences were delivered to students "…without abandoning comprehensive psychological science as the substantive and methodological root of any educational or training enterprise in the field of psychology and without depreciating the value of scientist or scientist-professional training programs for certain specific objectives" (p. 442, original emphasis). The same phenomenon applies in other content-based training models, leading to the new understanding that relative emphasis is key in classifying content-based training models. Figure 1 illustrates the hypothetical relative positions of content-based models along a continuum, with the practice component at one end and the science/research component at the other.
The number of repeating and overlapping items in the initial training-model scale is, to some degree, attributable to characteristics that different training models share. For example, "Giving a wide range of practical experiences" is a feature of the SP model, but also typifies the practitioner (Korman, 1974) and practitioner-scholar models (Bell & Hausman, 2014). Furthermore, "Teaching staff performing psychological practices" is required for the scientist-practitioner (Belar & Perry, 1992) and practitioner-scholar models (Bell & Hausman, 2014). "The use of scientifically based interventions" is a feature explicitly attributed to the scientist-practitioner (Belar & Perry, 1992) and CS models (McFall, 1991), but it implies the use of a scientific approach in psychological practice, which all training models in professional psychology advocate, even if not explicitly stated. Clearly, then, multiple training models share some characteristics, possibly leading to repetition of items representing those characteristics. This phenomenon provides evidence that the classification of training models in professional psychology education is not mutually exclusive, a point that previous researchers have expressed (Helmes, 2015, personal communication).

Figure 1. Content-based Models' Relative Position in the Range of Practice and Science Components in Professional Psychology Education
Results from this study, both the validity and reliability coefficients and the qualitative comments provided by the expert respondents, have provided significant input for improving the training-model scale. The training-model scale's final version consists of 77 items grouped into five clusters based on the main components of training-model classifications. This arrangement differs from the item grouping applied before the instrument testing, which was done on the basis of the titles of the training models (the Scientist-Practitioner/SP cluster, the Clinical-Science/CS cluster, and so on). The main training-model classification, the basis for item grouping in the new scale, includes the following components: 1) practice; 2) science/research; 3) integration of science and practice; 4) local-clinical scientist; and 5) competency. Items on the revised training-model scale are rated on a Likert scale with five alternative answers, from "Not at all" (1) to "Very high degree" (5), based on respondents' judgment of whether the stated condition applies in their professional program. The higher the score on a certain group of items, the higher the program's incidence of the characteristics represented by that item cluster.
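The scoring logic described above can be sketched as a per-cluster sum of Likert ratings; the item identifiers and cluster names below are hypothetical stand-ins, and only two of the five components are shown:

```python
def cluster_scores(responses, clusters):
    """Sum Likert ratings (1-5) per training-model component.

    responses: item id -> respondent's rating for that item.
    clusters: component name -> list of item ids in that cluster.
    """
    return {name: sum(responses[i] for i in items)
            for name, items in clusters.items()}

# Two-component toy example: a higher cluster total indicates a stronger
# presence of that component's characteristics in the program.
responses = {"prac-1": 5, "prac-2": 4, "sci-1": 2, "sci-2": 3}
clusters = {"practice": ["prac-1", "prac-2"],
            "science/research": ["sci-1", "sci-2"]}
print(cluster_scores(responses, clusters))  # → {'practice': 9, 'science/research': 5}
```

In this toy profile the practice component dominates, which is the kind of pattern the scale is designed to surface for a given program.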