A Summative Translation Quality Assessment Model for Undergraduate Student Translations: Objectivity Versus Manageability

Scholars from different schools of thought have proposed criteria and/or models for translation assessment. Surprisingly, almost none of them is tailor-made for a manageable summative evaluation of student translations. That is why most translation teachers still draw on holistic and traditional methods of translation evaluation in their exams. The available methods are either too holistic or too detailed (and complex) for translation evaluation in educational settings. The holistic approaches, which verge on subjectivity, are quite manageable for a teacher who has to evaluate a score of students, whereas the detailed quantitative models, which are considered highly objective, place heavy demands on the limited resources of a classroom teacher. Recognizing the need for a model that is both manageable and objective, this study aims to reach a compromise between the subjectivity and the complexity of these approaches to translation evaluation. Our proposed model draws on and combines the five linguistic equivalences introduced by Koller (1979) and the five-level holistic scheme for translation evaluation proposed by Waddington (2001). DOI: http://dx.doi.org/10.5755/j01.sal.0.26.12421


Introduction
It is assumed that the assessment models proposed for published translations and the quantitative scales developed for assessing professional translators' competence can be readily adopted and applied for pedagogical purposes. However, applying such models, which may contain over a hundred parameters, is not a viable option for a classroom teacher who is supposed to correct a pile of examination papers in a limited time at the end of the term. Without evaluation and assessment, translation trainees will not know how to enhance the quality of their translations systematically (Honig, 1997), and teachers will not have a reliable road map for allocating objective scores to their students. If translation trainees do not improve, we may not have the high-quality literary translations and high-profile professionals for whom translation quality assessment models are proposed. Some questions come to mind regarding the adoptability, in educational settings, of translation quality assessment models proposed for literary works and professional translators. Do teachers have enough resources (time and energy) to score student translations objectively using the criteria set for professional translators? Is it fair to expect translation trainees to match the competence of senior, experienced translators? Can we assess the quality of student translations without considering the source texts (as non-academic readers do in evaluating the quality of literary and religious works)? The answer to these questions cannot be 'yes'.
In a study carried out by Waddington (2001), a questionnaire was distributed among European and Canadian translation teachers to find out about the assessment methods they employ. It was found that 38.5% of the respondents still employ a holistic (subjective) method to judge student translations and, to the author's surprise, base their judgment on the requirements for professional translators. Thus, there is a need to develop a new practical model for assessing student translations based on the requirements of translation training programs. Waddington (2001) notes that teachers use three methods to evaluate student translations: error analysis, holistic judgment, and a combination of the two. He believes that the sum of errors may not directly reflect the quality of a translation and, as Pym (1993) notes, the distinction is not between black and white but between various levels of grey. Teachers still employ traditional and subjective criteria to assess student translations because the supposedly objective criteria proposed for evaluation are not manageable in educational contexts.
Feeling the need for further research in the area of evaluating translation trainees, Martinez Melis (1997, p.156) outlines various objects, types, functions, aims and means of student translation evaluation:
_ Objects: student translator competence, study plans, programs
_ Types: product assessment, process assessment, qualitative assessment
_ Functions: diagnostic, summative, formative
_ Aims: academic, pedagogical, speculative
_ Means: translations, evaluation criteria, correcting criteria, grading scales, tests, exercises, questionnaires, etc.
Given the limited scope of this paper, the authors focus only on the product-based summative evaluation of student translations in undergraduate programs. The paper is organized as follows. First, it scrutinizes the merits and demerits of the various theories and models of translation to assess their potential as the basis for a manageable but objective evaluation of student translations. Then, we propose our model, which includes a correcting scale and a grading scale for the summative evaluation of student translations.

Approaches to Translation Evaluation
In order to develop an evaluation scenario, a road map is needed. Such a map can be provided by various translation theories. In fact, one of the important objectives (if not the most important) of translation theorization has been to tell high-quality translations from low-quality ones. According to House (2009b, p.222), every "translation quality assessment presupposes a theory of translation". Different theories offer different quality yardsticks for translation evaluation purposes.

Pre-systematic views
Early translators mainly used dichotomous criteria such as word-for-word versus sense-for-sense, literal versus free, and faithful versus unfaithful to tell 'good' translations from 'bad' ones, but can a translation be absolutely good or bad, faithful or unfaithful? Some emphasized the preservation of abstract constructs such as 'spirit' and 'truth' to assess translations and translators (Kelly, 1979, p.205). According to Munday (2008, p.24), some even considered concepts like "the creative energy of a text" as the cornerstone of an adequate translation. The early theorists mainly used metaphorical language to describe and explain translation. Some theorists proposed only a few parameters for evaluation. For example, Larson (1984), in her meaning-based approach to translation, proposed only three parameters (accuracy, naturalness and faithfulness) as the determining factors of a good translation, but she did not operationalize these criteria properly. Munday (2008, p.26) mentions an early theorist, Dryden, who expressed his criteria for translation via a few reductionist rules but did not follow those criteria even in his own practice. The reason for such a deviation from one's own prescription could be nothing but the subjectivity of the proposed rules.

Post-modernist views
Postmodern views on translation strove to challenge the existence of a stable meaning in a text and held that language is relative. As a matter of fact, subjective and relative criteria for 'adequate' translation are legion in postmodern translation theories. For example, Pound (1918/2004) and Benjamin (1969/2004) both regarded highly innovative and experimental translations as 'good'. Benjamin called for highly literal translations in order to let the foreign text 'shine' through in translation. He believed that a 'pure language' would be revealed as a result of direct interaction between the two languages. Stolze (1992) states that a 'good' translation comes into existence when a translator is able to identify fully with the source text. However, Stolze does not specify any means by which one can objectively measure the degree of the translator's identification with the source text.
According to the postmodernists, readers may potentially have infinite interpretations of the same original text. This fuzzy definition of meaning and of the interpretation of the source text, though attractive and thought-provoking, defies objectivity, which is one of the most desirable features of an evaluation scenario in an educational context where students are compared with each other in terms of their scores. Another main drawback of postmodern views is that they do not put the criteria they prescribe into measurable terms, so any evaluation scenario arising from them lacks objectivity.

Semi-systematic response-based views
The early systematic views stemmed from the leading linguistic theories like Structuralism and Transformational Generative Grammar. Taking a behaviorist stance, Nida (1964)  There are some key limitations in the response-based views of translation evaluation. Firstly, they are highly target-text and target-reader oriented; the intrinsic values of the ST are largely disregarded. Secondly, the immeasurability of the ST and TT readers' responses turns equivalent response into an indefinitely vague and subjective criterion. In addition, employing a single master criterion (equivalent response) to evaluate a complicated phenomenon like translation is very reductionist indeed. Given these limitations, equivalent response cannot be a reliable foundation for an objective evaluation of student translations.

Functionalist and skopos-based views
The early functionalist approaches focus on only one master criterion when explaining the translation phenomenon. According to Reiss (1971/2000), text types and their corresponding functions are the most influential factors in determining translation strategies. Subsequently, Reiss and Vermeer (1984) considered the dominant function (skopos) of the text as the criterion against which the quality of a translation can be measured. To them, all the decisions made by translators are subordinate to the primary purpose of the target text, but they did not propose any procedures for translation evaluation. In the target-text-oriented framework of skopos theory, the purpose of the target text justifies the tools (translation strategies), whatever they may be.
Working within the functionalist paradigm, Nord (1991) considers translation a communicative act and believes that, in translation evaluation, micro-textual error analysis may not be a reliable tool if applied alone, because a translator may omit or add something in the translation to meet the requirements of the target text's macro-level function. She tries to devise an overall text analysis model, but most of the time she considers only one specific parameter in her analysis and maintains that it is not possible to have a comprehensive evaluation of a translation.
The formalist polysystem theory (Even-Zohar, 1978/2004) investigates a translation and its quality in isolation from its source to determine the status and function of the translation in the target literary system. However, it does not provide a scenario for translation evaluation. This theory treats a translation and its function as an independent, stand-alone work, much like an original. Polysystem theory largely disregards the intrinsic value and status of the ST itself. Borrowing ideas from polysystem theory, theorists like Toury (2012) and Chesterman (1997) try to establish various translation norms and standards based on observed regularities in translated texts and in translators' behavior, but they fall short of proposing any practical procedures for translation evaluation.
To them, adherence to the prevalent norms of translation in the target culture is the cornerstone of a 'good' translation. Naturally, a flexible notion like norm entails a lot of subjectivity on the part of the translation evaluator. Only if the norms of translation were already established for a specific society at a specific time could we use them as criteria to judge the quality of a student translation.
The main concern of functionalist and polysystemic views of translation is the evaluation of published (literary and religious) translations. They either offer no procedures for translation evaluation or offer highly complicated procedures for textual analysis. In addition, much like response-based theories, they are highly target-text oriented. Following skopos-based views on evaluation, every student translation could be judged faultless on the excuse that the translation serves a certain purpose (skopos), which might differ from the original purpose of the ST. These major limitations rule out the adoption of these views as the basis for correcting and grading scales for the evaluation of student translations.

Complex and multi-parameter views
These views mainly draw on the different dimensions of language introduced and explained by theories such as (critical) discourse analysis, speech act theory, pragmatics, argumentation theory, etc. At times, eclectic mixtures are used to compare and contrast STs and TTs and to make quality statements. The distinguishing feature of these views is that they consider both ST and TT in their real-life contexts and analyze the texts at supra-sentential levels.
House (1997 and 2009a), borrowing ideas from theories such as speech act theory and pragmatics, developed a highly complicated model for translation quality assessment. Although working within a functionalist framework, she criticizes the target-oriented nature of the previous functional theories of translation and maintains a more balanced appreciation of the roles of ST and TT in her model. According to House (1997, pp.66-69), ST and TT must be analyzed along various dimensions (semantically, pragmatically, textually, etc.) to look for a relative "functional equivalence". She distinguishes between two basic types of translation. Whereas an overt translation reads, symbolically, like a translation and its readers know that they are reading a translation, a covertly translated text seems like an original in the target culture because it has been passed through a "cultural filter". The cultural filter is a highly vague construct and entails a subjective judgment that is far from desirable in a translation quality assessment model that claims objectivity. For House, the word 'culture' encompasses the textual and linguistic norms and conventions embedded in every linguistic community. She admits that "translation is at its core a linguistic-textual phenomenon" and must be evaluated as such. As a matter of fact, House (2001, p.254) advocates a combination of objective linguistic evaluation of translation (which is inter-subjectively verifiable) and a holistic value judgment that takes into account the macro-level (social and cultural) qualities of ST/TT.
Although House's model (1997) has addressed many of the criticisms leveled against the previous views, it is highly impractical for real-life educational purposes. If we consider the previous theories and their promises for quality assessment highly subjective and reductionist, House's model is objective but far too unmanageable for educational purposes. House (1997, p.64) admits that it sounds "unlikely that translation quality assessment can ever be objectified in the manner of natural science". That is why she incorporates the cultural and social loads of ST and TT in her analysis. This is something that must be considered in every evaluation scenario. House's model is mainly used for corpus-based studies within the covert translation project at the University of Hamburg. Its adoption in educational settings, given teachers' limited resources, does not seem viable.
Al-Qinai (2000) attempted to develop an 'empirical' framework for quality assessment drawing on parameters like textual typology, formal correspondence, thematic coherence, reference cohesion, lexical-syntactical properties, etc. Such an eclectic model only adds to the number of parameters involved in the assessment process, something that is to be avoided in order to have manageable evaluation criteria for student translations.
Williams (2004, pp.21-30) draws on argumentation theory, which is one aspect of discourse analysis, and the New Rhetoric to build a general, whole-text framework for non-literary (instrumental) translation evaluation. He argues that argumentation is present in every text, ranging from the most scientific to the most literary, and thus that his proposed model can be applied to all text types. Williams (2004, pp.32-65) explains that every argument is composed of six components (claim(s), ground(s), warrant(s), backing(s), qualifier(s), and rebuttal), which can be compared cross-linguistically. The model checks whether the translation renders these components accurately. Later on, Williams turns to eclecticism, borrows ideas from various scholars and, under the designation 'rhetorical typology', adds more parameters like organizational relations, conjunctives, inference indicators, propositional functions, argument types, figures of speech and narrative strategy to his "multiple-parameter grid" for translation evaluation. There are serious limitations if one wants to adopt Williams's model to evaluate student translations. Firstly, identifying these components does not seem to be an easy task at all, and not every piece of text includes all of them. Secondly, the 'accuracy' of rendering each parameter is not objectively measurable.
The multi-parameter and eclectic models may be of great academic interest and may help researchers understand the translation phenomenon and evaluate published translations or professional translators' competence, but they cannot be applied to evaluating students without considering the specific requirements of the educational context. It seems that the authors of these models are under the influence of eclecticism as an 'enlightened' method that has strong support in the second language learning literature (see Brown, 2007, and Swain & Lapkin, 1995). However, even if eclecticism has worked for second language teaching, this does not justify its application in translation evaluation.

Requirements of a Practical Model for the Evaluation of Student Translations
Trying to explore the advantages and disadvantages of the existing translation evaluation scenarios, Williams (2004) divides all the existing models into two broad categories: qualitative and quantitative. Quantitative models are mainly developed and used by organizations and bureaus for certification and examination purposes in professional contexts. They mainly focus on a small random sample of a text and conduct sub-sentence error analysis and error counts to score translations based on predefined correcting and grading scales. The use of a short sample of a professional translator's work is quite similar to the summative evaluation of student translations, where a usually short text is translated and then evaluated in the final exams.
One of the important factors considered in translation evaluation in various models is error gravity. According to Nord (1996), the higher the level of an error, the stronger its impact on the whole text under evaluation. Thus, to her, the errors with the highest gravity are pragmatic errors. They are followed by cultural errors, and the least important errors are the linguistic ones. Larose (1989) emphasizes the importance of the context in which an error occurs as a determining factor in its seriousness. Some scholars maintain that the degree to which an error violates the effectiveness or the functioning of the target text determines the error gravity (Dancette, 1989, and Hurtado Albir, 1995). According to Martinez Melis and Hurtado Albir (2001), error gravity must be analyzed from a functionalist stance to see: the effect of the error on the text as a whole, its effect on the cohesion and coherence of the target text, the extent of departure from the meaning of the original text, the degree of violation of the target text's communicative function (including text type conventions) and its impact on the translation skopos.
The main purpose of evaluation in the educational context is to measure student translators' skills. According to Stansfield et al. (1992), there are two basic translation skills: accuracy and expression. Accuracy is related to source text processing, and expression deals with target text production. Hatim and Mason (1997) also emphasize the inclusion of source text processing skills, transfer skills and target text processing skills in every evaluation scenario.
Alongside error gravity, error typology is another important aspect in the translation evaluation literature. Hurtado Albir (1995) identifies two broad types of errors: errors dealing with the understanding of the source text (language errors) and those dealing with the production or expression of the target text (transfer errors). Hurtado Albir also considers the faulty transfer of the primary or secondary function of the source text as another main source of error in translation. Following this error typology, two gravity levels are defined for errors. For every serious error the translation receives -2, and for every minor error -1. A good translation solution earns +1, and an exceptionally good translation solution earns +2.
However, there are cases in which one cannot decide between major and minor error.Thus, it seems that more than two levels must be defined for error gravity.
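The two-level scheme attributed to Hurtado Albir (1995) above can be illustrated with a minimal Python sketch. The point values come from the text; the label names, the function `net_adjustment` and the sample marks are hypothetical and serve only to show the arithmetic.

```python
# Point values as described in the text for the two-level scheme.
# The label strings are hypothetical names chosen for this sketch.
POINTS = {
    "serious_error": -2,
    "minor_error": -1,
    "good_solution": 1,
    "exceptional_solution": 2,
}


def net_adjustment(marks):
    """Sum the point values of all marks a rater placed on one translation."""
    return sum(POINTS[mark] for mark in marks)


# Hypothetical marks for a single student translation.
marks = ["serious_error", "minor_error", "minor_error", "good_solution"]
print(net_adjustment(marks))  # -2 - 1 - 1 + 1 = -3
```

Because every error must be forced into one of only two negative levels, a rater facing a borderline case has no intermediate value to assign, which is the limitation noted above.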
To sum up, for a model to work as a guide for the evaluation of student translations, it needs to be:
_ Objective: the postmodern and response-based views are highly impressionistic, and their application in educational contexts leaves much room for subjective judgment;
_ Practical and manageable: multi-parameter and complex models consider numerous parameters in their analysis for the sake of objectivity and validity, but they are far too unmanageable and complex for a real-life summative evaluation of student translations;
_ Non-reductionist: with the exception of multi-parameter models, the other views and models make use of only one master criterion (e.g., skopos or equivalent response) or a binary criterion (e.g., free/literal or overt/covert) to evaluate a relatively complex phenomenon. Reductionism is a serious threat to the validity of an evaluation model;
_ Tailor-made: the existing models cannot be employed to assess the quality of all three areas introduced by Martinez Melis. Most models focus on the evaluation of a published translation or a professional translator's competence. We need a tailor-made model to evaluate student translators' competence summatively;
_ Bi-directional: most theories and their implications for evaluation are either source-text or target-text oriented. We need a model that considers both language and translation skills;
_ Objective and holistic at the same time, because the sum of micro-textual errors may not determine the quality of a translation, and translation evaluation cannot reach the level of accuracy of the physical sciences.

Such a model needs a scoring system including:
_ A correcting scale that defines the error types hierarchically;
_ A grading scale that determines the levels of error gravity hierarchically.
It must be noted that if an evaluation model draws on only one master criterion or a dichotomy, the evaluation task becomes quite manageable, but the evaluation will be highly impressionistic. On the other hand, if the model draws on a complex analysis, objectivity can be guaranteed to some extent, but its application will be time-consuming and unmanageable for the summative evaluation of student translations. Complex models may be of great academic interest, but they are not used by practicing teachers.
Therefore, a model that fulfills both manageability and objectivity is required for the evaluation of student translations. Such a model must necessarily draw on a few manageable parameters. In the next section, the above-mentioned requirements will be put into the framework of a manageable multi-parameter model for translation evaluation.

Developing a Model for the Summative Evaluation of Student Translation
As we showed in the previous sections, the existing models for translation evaluation cannot be readily adopted for educational settings where texts are to be translated and evaluated. The existing models either draw on too many parameters in their analysis and are, thus, too unmanageable for the summative evaluation of student translations, or they are too reductionist, include only a couple of highly impressionistic criteria and thus verge on extreme subjectivity. According to Pollitt (1991), in judging tests it is optimistic to claim "five reliable bands" to measure the ability we are observing. Including five levels in the analysis can promise both objectivity and manageability when one is evaluating the translation of a usually short text by student translators in summative exams.
Koller (1979, pp.99-104) introduces exactly five equivalences for translation that can be readily adopted for translation evaluation purposes as the levels of a correcting scale. Koller's equivalences are denotative, connotative, text-normative, pragmatic and formal, and they are hierarchically ordered according to the level of task difficulty.
Denotative equivalence refers to the equivalence of the referential content of the source and target texts. This equivalence can be considered the lowest (the easiest and least demanding) level of a student's translation competence. It mainly has to do with the second language proficiency of a student translator (language skill). For example, any direct deletion or addition falls under this category. A student who does not know the denotative meaning of a source text item may resort to deletion in most cases. Connotative equivalence concerns the suitability of the translator's lexical choices and the ability to understand and transfer connotative meanings between the source and target texts. Realizing this equivalence demands higher understanding and production skills than the denotative level (both language and transfer skills). Text-normative equivalence has to do with text types and the suitability of the student's lexical choices for the text type under translation (transfer skill). Pragmatic equivalence is the ability to understand and produce pragmatically acceptable equivalences across semiotic borders. This entails familiarity with the pragmatic aspects of the source and target texts and the way they correspond to each other (transfer skill). Formal equivalence deals with the aesthetic properties of the texts, including word play, puns and other highly expressive features of the ST and TT. This is the most difficult level of equivalence, and it is highly demanding on the production skills of translator trainees. The type of equivalence one is looking for depends largely on the text type under translation.
If we consider these multiple equivalences as the basis for our model's error taxonomy, we will have denotative, connotative, text-normative, pragmatic and formal errors in the model. The order of equivalences from the easiest to the most difficult can serve as the basis for determining error gravity. Since languages agree more on the micro than on the macro level, it is naturally more difficult to find a pragmatic equivalent than a denotative one. Therefore, in the evaluation scenario, the higher the level of equivalence required, the higher the demand on the student translator's skill and, thus, the higher the gravity of the error in the grading scale (see Nord, 1996). Thus, there are five main types of errors in the correcting scale and five corresponding levels of error gravity in the grading scale.
If we assume that the total score is 100, 70 points are allocated to linguistic error analysis based on the correcting and grading scales proposed earlier according to Koller's equivalences. Thus, our correcting scale would be as follows:
_ Every denotative error deducts 1 point;
_ Every connotative error deducts 2 points;
_ Every text-normative error deducts 3 points;
_ Every pragmatic error deducts 4 points;
_ Every formal error deducts 5 points.
Some exceptions are added to the preceding scoring scale, because translation is rarely done in a word-by-word manner:
_ An error is penalized only the first time it occurs. If it is repeated, no further points are deducted;
_ If students change something in order to achieve an equivalent at a higher level of language, no points are deducted;
_ No points are deducted if an error is compensated for elsewhere in the translation.
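The correcting scale and its three exceptions can be sketched in Python as follows. The deduction weights and the 70-point allocation come from the text; the `MarkedError` record, its field names, the function `linguistic_score` and the floor at zero are assumptions introduced for illustration, since the text does not say how a rater records errors or what happens when deductions exceed 70.

```python
from dataclasses import dataclass

# Points deducted per error type, following the correcting scale in the text.
DEDUCTION = {
    "denotative": 1,
    "connotative": 2,
    "text-normative": 3,
    "pragmatic": 4,
    "formal": 5,
}

LINGUISTIC_MAX = 70  # points allocated to linguistic error analysis


@dataclass(frozen=True)
class MarkedError:
    """One error marked by the rater (hypothetical record layout)."""
    error_id: str              # repeats of the same error share one id
    kind: str                  # one of the DEDUCTION keys
    compensated: bool = False  # compensated for elsewhere in the translation
    justified: bool = False    # change made to reach a higher-level equivalent


def linguistic_score(errors):
    """Return the error-analysis score out of 70 after applying the exceptions."""
    seen = set()
    deducted = 0
    for error in errors:
        if error.compensated or error.justified:
            continue  # exceptions 2 and 3: no points deducted
        if error.error_id in seen:
            continue  # exception 1: a repeated error is penalized only once
        seen.add(error.error_id)
        deducted += DEDUCTION[error.kind]
    # Assumption: the score is not allowed to drop below zero.
    return max(0, LINGUISTIC_MAX - deducted)


errors = [
    MarkedError("e1", "denotative"),
    MarkedError("e1", "denotative"),            # repeat, not penalized again
    MarkedError("e2", "pragmatic"),
    MarkedError("e3", "formal", compensated=True),
]
print(linguistic_score(errors))  # 70 - (1 + 4) = 65
```

Whether two occurrences count as "the same error" and whether a change is compensated or justified remain rater judgments; the sketch only shows how those judgments would feed into the deduction.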
Because translation is a relative activity by nature and translations with an equal number of errors may still be of different quality levels (see House, 2001; Waddington, 2001; Pym, 1993), the sum of linguistic translation errors (both language and transfer errors) may not directly indicate the quality of the translation, and there is a need to add a holistic parameter to our evaluation scenario. Therefore, 30 points (out of 100) are allocated to the rater's overall holistic appreciation of translation quality.
We equip our model with the holistic band proposed by Waddington (2001) to capture raters' holistic appreciation of student translations (Table 1). The marks allocated to each level are tripled to give a maximum score of 30 (in the original band, the maximum possible score is 10 for a translation rated as level 1, while a translation rated as level 5 receives a mark of 1 or 2). Waddington's band considers three holistically judged criteria: accuracy of transfer of the ST content (language skill), quality of expression in the TL (transfer skill) and degree of task completion. Thus, in our proposed model, a translation that satisfies the qualities listed for level 1 receives a maximum of 30 points, whereas one that satisfies the level-5 qualities receives no more than 6 points. As can be seen, a level-1 translation can receive a score ranging from 24 to 30, which allows a rater to distinguish more finely between quite similar student translations.
This scenario is then applied to a student translation of about 300 words comprising two texts of about 150 words each (two different texts are chosen in order to increase the reliability of the evaluation). If possible, two raters can judge each student translation in order to increase inter-subjective reliability. A student translation that earns at least 50 out of 100 passes the summative final exam. Scores higher than 80 are marked as excellent, those ranging from 60 to 80 as good, and those ranging from 50 to 60 as acceptable. Translations that fall below 50 are marked as unacceptable in this evaluation model.
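The combination of the two components and the grade bands can be sketched as below. The 70/30 split, the tripling of Waddington's mark and the band limits come from the text. Three points are assumptions of this sketch rather than part of the model: the marks of several raters are averaged, a holistic mark is taken to lie between 0 and 10, and a score of exactly 60 is counted as good. The function names are hypothetical.

```python
LINGUISTIC_MAX = 70   # points for error analysis, per the text
HOLISTIC_FACTOR = 3   # Waddington's holistic mark out of 10 is tripled


def final_score(linguistic, holistic_marks):
    """Combine the error-analysis score with the tripled holistic mark.

    linguistic: error-analysis score out of 70.
    holistic_marks: one mark out of 10 per rater (original Waddington band).
    Averaging across raters is an assumption made for this sketch.
    """
    if not 0 <= linguistic <= LINGUISTIC_MAX:
        raise ValueError("linguistic score must lie between 0 and 70")
    if not holistic_marks or any(not 0 <= m <= 10 for m in holistic_marks):
        raise ValueError("each holistic mark must lie between 0 and 10")
    mean_mark = sum(holistic_marks) / len(holistic_marks)
    return linguistic + HOLISTIC_FACTOR * mean_mark


def classify(score):
    """Map a score out of 100 to the verbal bands given in the text."""
    if score > 80:
        return "excellent"
    if score >= 60:   # assumption: exactly 60 counts as good
        return "good"
    if score >= 50:
        return "acceptable"
    return "unacceptable"


score = final_score(58, [8, 7])   # two raters give holistic marks of 8 and 7
print(score, classify(score))     # 58 + 3 * 7.5 = 80.5 -> excellent
```

A translation with few linguistic errors but a weak overall impression, or the reverse, can thus land in a different band than error counting alone would suggest, which is the purpose of the holistic component.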

Conclusion
Various criteria and models have been proposed for translation evaluation. Almost none of them is tailor-made for the summative evaluation of student translations at the undergraduate level. They are targeted at evaluating literary works and the competence of practicing professional translators. Applying most of these models is quite time-consuming and demanding on the classroom teacher's limited resources. That is why most teachers still draw on purely subjective methods of translation evaluation in final exams at the undergraduate level. If a model is to gain popularity among practicing teachers for translation evaluation, it must offer both objectivity and manageability.
We proposed a model that includes five types of equivalence at various linguistic levels as a guideline for a correcting scale and five corresponding error gravities in the grading scale to judge the quality of student translations quantitatively. As has been rightly argued by some researchers (e.g., House, 2001; Waddington, 2001; Pym, 1993), the total sum of linguistic errors cannot directly reflect the quality of a translation. Therefore, 70 percent of the total score is determined by error analysis (following Koller, 1979) and the remaining 30 percent by the evaluator's holistic appreciation of the quality of the translation (Waddington, 2001).
The main rationale for choosing this combination was to arrive at a manageable model to evaluate student translation in pedagogical contexts.
Besides employing the model for summative assessment, one can also use it for the formative evaluation of student translations in order to identify the weak and strong points of student translators during a translation course. It must be remembered that, for purposes other than the evaluation of student translations, there are other highly reliable and objective models in the literature that can be employed for the evaluation of literary and professional translations.
Further research is required to see the model in action and to determine its workability and compatibility with various institutional contexts. It is recommended that the model be applied to various text types. The validity and reliability of the model could be demonstrated better by evaluating translations of texts that involve all five equivalences proposed by Koller (1979). It is believed that this model could be applied to evaluate the translation of all text types in educational settings.