Likert Scale: Explored and Explained

The Likert scale is one of the most fundamental and frequently used psychometric tools in educational and social sciences research. At the same time, it is the subject of considerable debate and controversy regarding its analysis and the number of points to include on the scale. Against this background, by reviewing the available literature and combining the information gathered with coherent scientific reasoning, this paper attempts to build, step by step, a construct around the Likert scale. This analytical review begins with the necessity of psychometric tools such as the Likert scale and its variants, and focuses on some convoluted issues such as validity, reliability and analysis of the scale.


INTRODUCTION
Nothing is more than a fear you cannot name.
-Cornelia Funke, Inkheart
Since the inception of the human race, there has been an inclination to capture the ethereal attributes of human behaviour and performance. From the same time, it has been a challenge to quantify what cannot be measured through conventional measurement techniques. The perceived need for this quantification lies in the necessity to transform an individual's subjectivity into an objective reality. Attitudes, perceptions and opinions are such qualitative attributes, amenable to quantitative transformation for the above-mentioned reason. Qualitative research techniques do try to compensate by depicting the complexity of human thoughts, feelings and outlooks through several social science techniques; still, the quantification of these traits remains a requirement, and that is how psychometric techniques come into the picture.

PSYCHOMETRICS AND LIKERT SCALE
Psychometric techniques are being developed, instituted and refined to meet a requirement of social science and educational research: the quantification of traits such as ability, perceptions, qualities and outlooks [1,2]. Psychometrics operates in two ways. The first is to formulate approaches (a theoretical construct) for measurement, followed by the development of measuring instruments and their validation. The Stanford-Binet test (which measures human intelligence) and the Minnesota Multiphasic Personality Inventory (which measures human personality) are examples. The content of such instruments is rather 'pre-fixed' [3,4,5]. The second path is the same up to the formulation of the theoretical construct for measurement. This conceptualization is then followed by the operational assembly of the abstract ideas, experiences or issues under investigation into statements (items), largely guided by the aim of the study. This allows the contents (items) of such scales or models to be rather flexible and need based. The Rasch measurement model (used for estimation of ability) and the Likert scale (which measures human attitude) are examples of such scales, used widely in social science and educational research [3,4,5].
The Likert scale was devised in 1932 to measure 'attitude' in a scientifically accepted and validated manner [6,7]. An attitude can be defined as a preferential way of behaving or reacting in a specific circumstance, rooted in a relatively enduring organization of beliefs and ideas (around an object, a subject or a concept) acquired through social interactions [8]. It is clear from this definition that thinking (cognitive), feeling (affective) and action (psychomotor), in various combinations, together constitute the expression of attitude in a specified condition. The issue is how to quantify this subjective, preferential thinking, feeling and action in a valid and reliable manner; the Likert scale offers help here [9,10].
The original Likert scale is a set of statements (items) offered for a real or hypothetical situation under study. Participants are asked to show their level of agreement (from strongly disagree to strongly agree) with each given statement (item) on a metric scale. Here all the statements in combination reveal a specific dimension of the attitude towards the issue and are hence necessarily inter-linked with each other [11].
With this context, this exploratory article attempts to describe two confusing issues related to the Likert scale: the preferable number of points on the scale and the analysis of the scale. During one of the contributing authors' participation in a web-based conversational learning forum on medical education, these two issues emerged as thrust areas amenable to further exploration and lucid explanation for educational researchers. An initial literature search by the authors led to an aggregation of mutually conflicting evidence, which compelled us to re-explore the issues and construct further arguments based upon the accumulated knowledge.

LIKERT SCALE AND ITS VARIATION
Before proceeding further, let us briefly look at several constructional diversities of a Likert scale, as the analytical treatment and interpretation of a Likert scale largely depend upon them.
Symmetric versus asymmetric Likert scale - If the position of neutrality (neutral/don't know) lies exactly in between the two extremes of strongly disagree (SD) and strongly agree (SA), it gives a participant the independence to choose any response in a balanced and symmetric way in either direction. This construction is known as a symmetric scale. On the other hand, an asymmetric Likert scale offers fewer choices on one side of neutrality (average) than on the other. In some cases an asymmetric scale also indicates ipsative (forced) choices, where the researcher perceives no value in indifference/neutrality [12,13,14].
Seven/ten point scale - These are variations of the 5 point scale in which adjacent options are less radically different (or more gradually different) from each other than on a 5 point scale. This larger (step by step) spectrum of choices offers a participant more independence to pick the 'exact' option (the one he prefers most) rather than some 'nearby' or 'close' option [15]. These variations are discussed in more detail (with reference to validity and reliability) later in this paper.
Likert and Likert type scale - The construction of a Likert (or Likert type) scale is rooted in the aim of the research. Sometimes the purpose of the research is to understand the opinions/perceptions of participants related to a single 'latent' variable (the phenomenon of interest). This 'latent' variable is expressed by several 'manifested' items in the questionnaire. Each of these constructed items addresses, in a mutually exclusive manner, a specific dimension of the phenomenon under inquiry, and in cohesion they measure the whole phenomenon. Here, during analysis, the scores of all the items of the questionnaire are combined (summed) to generate a composite score, which logically, in totality, measures a uni-dimensional trait. This instrument is known as a Likert scale.
Sometimes the primary interest of the researcher is not to synthesize the stance of the participants per se but to capture the feelings, actions and pragmatic opinions of the participants about mutually exclusive issues around the phenomenon or phenomena under study. This demands individual analysis of each item to ascertain the participants' collective degree of agreement on that issue. A scale used in this way can be labeled as Likert type and not as a Likert scale [16]. A word of caution: this 'direction of enquiry' must be decided during the planning phase, or at the latest during the designing of the questionnaire, and not at the time of analysis.
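As a minimal sketch of the summation described above, the following Python fragment builds one composite score per participant. The response matrix, the 1 to 5 coding and the function names are assumptions made purely for illustration and are not data from any study.

```python
# Hypothetical responses: each row is one participant, each column one item
# expressing the same latent variable, coded 1 (strongly disagree) to
# 5 (strongly agree). These values are invented for illustration only.
responses = [
    [4, 5, 4],
    [2, 1, 2],
    [3, 3, 4],
]


def composite_scores(rows):
    """Sum the item scores of each participant into one composite score."""
    return [sum(row) for row in rows]


def composite_range(n_items, low=1, high=5):
    """Smallest and largest composite score attainable with n_items items."""
    return n_items * low, n_items * high


scores = composite_scores(responses)
print(scores)              # [13, 5, 10]
print(composite_range(3))  # (3, 15)
```

With three items coded 1 to 5, the composite can only lie between 3 and 15, so the lowest attainable composite is 3 rather than zero.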

IS 7 POINT LIKERT SCALE BETTER THAN 5 POINT LIKERT SCALE? -A PERSPECTIVE CONTROVERSY OR ESTABLISHED WITH A CONSENSUS?
Since the advent of the Likert scale in 1932, there have been debates among its users about its best possible usability in terms of the reliability and validity of the number of points on the scale [17,18,19,20].
Likert (1932,7), in his original paper, discussed the infinite number of definable attitudes existing in a given person, with the possibility of grouping them into "clusters" of responses. He further discussed the assumptions of his "survey of opinions", on which he based his results and psychological interpretations [21].
The key assumptions of his survey were, first, that the items are presented on the scale in such a way as to allow the participants to choose between clearly opposed alternatives and, second, that the conflicting issues chosen were empirically important, so that the results themselves constitute an empirical check on the degree of success.

ANALYSIS OF THE ITEM RESPONSE
Before we proceed to the methods of analysis available for the Likert scale, a very fundamental but equally controversial question should be addressed: which type of scale is the Likert scale?
There are two schools of thought: one considers the Likert scale ordinal and the other treats it as an interval scale. This conflict is primarily rooted in the question of whether the points on an item are equivalent and equidistant. Points on the scale are not close enough to be considered equal (in other words, strongly agree is definitely away from agree, and agree is away from neutral), so they should be considered non-equivalent entities. Both schools agree on this fact. To understand this concept, let us assume a scenario in which the aim of the researcher is to measure the attitude towards classroom lectures and to make out the relative preferences for library reading and small group teaching compared with lectures. He designs the following survey instrument on a 5 point Likert scale for the stated aim (Fig. 1). The first question of importance is: 'Can these items be clubbed (seen together) in order to generate a composite index for measuring the attitude?' In order to evaluate their appropriateness for transformation into a single composite index, the following points can be considered:
1. Are the items arranged in a logical sequence?
2. Are the items closely interrelated while also providing some independent information?
3. Is there some element of 'coherence/expectedness' between responses (can the next response be predicted to some extent from the previous one)?
4. Does each item measure a distinct element of the issue?

Fig. 2. Choice of Analysis of Likert Items: Aim and Construct of Research
If the answer to all the above questions is affirmative for all the items of a set, they may be combined to construct a composite index which measures the collective stance of the participant towards the phenomenon under study. In the above example, as items 1, 2 and 3 fulfill all four criteria with respect to each other, they may be combined and treated further in unison.
On the other hand, item 4 and item 5 offer separate and sovereign (mutually exclusive) preferences regarding two different teaching-learning methods: self-directed reading and small group teaching. Hence they cannot be combined, and they should be analyzed independently of items 1, 2 and 3 and even of each other.
After this assertion of eligibility for combination, the next question arises: on what scale can items 1, 2 and 3 be treated, and what is the appropriate measurement scale for item 4 and item 5?
The answer to the above question lies in another question, asked by Stevens in his famous paper: 'what are the rules (if any) under which numerals are assigned?' Here we see that (a) the minimum score one can secure for the first three items is 3 (and not an absolute zero). The reason for this apparent dislike of zero lies in the fact that in psychometrics attitude is preferably measured in positive degrees, and 'strongly disagree' cannot be equated with 'absolute disagreement'; there is always something below strongly disagree. Zero also gives the notion of neutrality rather than disagreement (the attitude is zero, meaning one is apathetic to the issue). (b) Each numeral conveys the same meaning in all three items (i.e. 3 denotes neutral in all three items). (c) As mentioned above, all three items can be clubbed while satisfying content and criterion validity. This last point needs a little more explanation. The idea behind framing items 1, 2 and 3 is to capture the opinion of participants about the lecture. How well this theoretical construct can be transformed into operating reality can be ascertained by looking at the relevant content domains (content validity/reflection of the construct), the ability to distinguish opinion on lectures from opinion on other teaching modalities (concurrent validity), and the similarities among items 1 to 3 and their dissimilarities from items 4 and 5 (convergent and discriminant validity). Concurrent, convergent and discriminant validities are the domains of criterion validity. Before deciding on any statistical treatment of the items, all the items must be scrutinized for validity issues.
If we look at points (a), (b) and (c) together for the set of items 1, 2 and 3, the composite score of one individual on items 1, 2 and 3 can be compared with the composite score of another individual on an interval scale. A 'rank order' among the composite scores can be presumed, and equality of intervals among related composite scores can also be postulated. A specific point on a particular item conveys the same meaning for all individuals (for item 2, point 3 on the Likert scale denotes 'neutral' for all individuals). Moreover, a specific point (say 2 for disagree) conveys the same meaning (the same extent of disagreement) in all the items, and there is no absolute zero in the scale (the minimum achievable score is 3). From this discourse it can be safely assumed (after going through all these mathematical characteristics with due consideration of validity-related issues) that the composite data obtained for items 1, 2 and 3 for all the participants can be treated on an interval scale.
The situation is different in the case of item 4 and item 5. Items 4 and 5, being observations that are mutually exclusive of each other (opinion on self-directed reading versus small group teaching) and of items 1, 2 and 3, should be treated differently. They may not be combined for an individual (validity restriction), as they do not provide complementary observations.
Still, items 4 and 5 can be treated on a certain measurement scale. The arguments for this assumption are, first, that a specific point (say point 4) on a particular item (say item 4) conveys the same meaning (agree) for all individuals responding to that item and, second, that any order-preserving transformation (such as squaring, multiplying by a positive constant or taking the square root) can be applied to the responses obtained on a single item from all the individuals while the rank order remains unaffected. So the assumptions and treatment of an ordinal scale are applicable to this subset of items (4 and 5).
Once it is clear under which rules the items are categorized and what the direction of inquiry is, the further statistical treatment follows from their assignment to an ordinal or an interval scale.
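The claim that an order-preserving transformation leaves the rank order untouched can be checked with a short sketch. The responses below are hypothetical values for a single item, and the `ranks` helper is an illustrative function written for this example (tied responses share the average of their positions), not part of any cited method.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = mean_rank
        pos = end + 1
    return result


# Hypothetical responses of six participants to one item (coded 1 to 5).
item4 = [4, 2, 5, 3, 4, 1]

squared = [x ** 2 for x in item4]   # order preserving for positive scores
rooted = [x ** 0.5 for x in item4]  # order preserving for positive scores

print(ranks(item4))                    # [4.5, 2.0, 6.0, 3.0, 4.5, 1.0]
print(ranks(squared) == ranks(item4))  # True
print(ranks(rooted) == ranks(item4))   # True
```

The numerals change under each transformation, yet every participant keeps the same rank, which is why only rank-based (ordinal) statements about such an item are safe.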

CONCLUSION
The crux that can be extracted from the above inductive arguments and logical interpretation is that the methods adopted for Likert scale analysis largely depend on the assignment of the item response variable to an ordinal or an interval scale, which in turn depends on the construct of the research instrument. This construct of the research instrument can be derived from the objectives of the study, and the objectives are the operational form of the theoretical construct of the phenomenon under inquiry. In other words, designing the instrument on the basis of the objectives and framework of the study decides the further statistical treatment.
Hence, if one wishes to combine the items in order to generate a composite score (Likert scale) of a set of items for different participants, then the assigned scale will be an interval scale (Fig. 2 above). The measures of central tendency and dispersion for an interval scale are the mean and the standard deviation. Further, this data set can be statistically treated with Pearson's correlation coefficient (r), analysis of variance (ANOVA) and regression analysis.
Conversely, if the researcher wishes to analyze separate items (no composite score; Likert type scale), the assigned scale for such a data set will be ordinal (Fig. 2 above). The recommended measures of central tendency and dispersion for ordinal data are the median (or the mode) and the frequency (or range). An ordinal data set can further be statistically tested by non-parametric techniques such as the Chi-square test or the Kendall tau-b or tau-c test.
Before wrapping up, it is imperative to transform an abstract issue into a figurative shape in order to measure it to the best possible extent. At the same time, this is an intricate process, being influenced by the perspective and subjectivity of the researcher. Still, all attempts should be directed towards quantification of such qualitative attributes, as 'what gets measured, gets managed' (Peter Drucker).
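A minimal sketch of this interval-scale treatment is given below, using only the Python standard library. The composite scores and the second variable (`exam_marks`) are hypothetical values invented for illustration, and `pearson_r` is a hand-written helper that follows the textbook definition of the coefficient.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical composite scores (sum of three items, so between 3 and 15)
# and a hypothetical second interval variable for the same five participants.
composite = [13, 5, 10, 12, 8]
exam_marks = [70, 40, 60, 68, 52]

print("mean:", mean(composite))
print("standard deviation:", stdev(composite))  # sample standard deviation
print("Pearson r:", pearson_r(composite, exam_marks))
```

Here `stdev` computes the sample standard deviation (dividing by n - 1); `statistics.pstdev` would be the choice if the participants were regarded as the whole population.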
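For a single Likert type item, the ordinal summaries named above can be sketched as follows. The responses and the category labels are assumptions chosen for illustration; only the median, the mode, the frequencies and the range are computed, with no arithmetic mean.

```python
from collections import Counter
from statistics import median, mode

# Assumed coding of the five response categories.
LABELS = {
    1: "strongly disagree",
    2: "disagree",
    3: "neutral",
    4: "agree",
    5: "strongly agree",
}

# Hypothetical responses of seven participants to one item.
item5 = [4, 5, 4, 2, 3, 4, 5]

frequency = Counter(item5)
item_median = median(item5)
item_mode = mode(item5)
item_range = (min(item5), max(item5))

print("median:", LABELS[item_median])  # median: agree
print("mode:", LABELS[item_mode])      # mode: agree
print("range:", item_range)            # range: (2, 5)
for point in sorted(LABELS):
    print(LABELS[point], frequency.get(point, 0))
```

An odd number of responses is used here so that the median falls on an actual scale point; with an even number, `statistics.median_low` or `median_high` keeps the result on the scale.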