A practical discussion to avoid common pitfalls when constructing multiple choice question items

This paper is an attempt to produce a guide for improving the quality of multiple choice questions (MCQs) used in undergraduate and postgraduate assessment. The MCQ is the most frequently used type of assessment worldwide. Well-constructed, context-rich MCQs have a high reliability per hour of testing. Avoiding technical item flaws is essential to improve the validity evidence of MCQs. Technical item flaws are essentially of two types: (i) those related to testwiseness and (ii) those related to irrelevant difficulty. A list of such flaws is presented, together with a discussion of each flaw and examples, to make the paper learner friendly. The paper is designed to be interactive, with self-assessment exercises followed by the answer key with explanations.


INTRODUCTION
Assessment as an important educational tool for both the teacher and the learner is at the top of the agenda for many educators since it is the key to changes in medical education. MCQs are the most frequently used assessment tool worldwide. They have a high reliability per hour of testing and are easy to administer and mark. Without evidence of validity, assessments in medical education have little or no intrinsic meaning. The internal structure of an MCQ item is an important source of validity evidence [1] and therefore necessitates the avoidance of technical item flaws. Such flaws are basically of two main types: (i) related to testwiseness, (ii) related to irrelevant difficulty. [2][3][4][5] A list of such flaws is attached [Table 1].

OBJECTIVE
With the recent expansion of both the undergraduate and postgraduate medical programs in Saudi Arabia and the wide use of multiple choice questions (MCQs) in assessment, the quality of MCQs has to be improved. This paper is an attempt to do this.

Grammatical cues
One or more distracters do not follow grammatically from the stem.

Example:
The easiest way to get an update on medical information is: a. Books. b. Medical Journals. c. Guidelines. d. The internet. e. Newspapers.
Because an item writer tends to pay more attention to the correct answer than to the distracters, grammatical cues are more likely to occur in the distracters. In this example, testwise students would eliminate a, b, c and e as options because they do not follow grammatically or logically from the stem (plural). Testwise students then have to choose d only, which is the correct answer.

Logical cues
A subset of the options is collectively exhaustive. In this item, options a, b, and c include all logical possibilities in any patient (a human being has either increased blood sugar, normal blood sugar or decreased blood sugar). The testwise student knows that the correct answer must be a, b, or c, whereas the non-testwise student spends time considering d and e.

Use of absolute terms
Terms such as "always" or "never" are used in options.

Example
In deep venous thrombosis (DVT) of the lower extremity, which of the following statements is the MOST appropriate? a. 70% of patients will have physical findings on examination. b. The gold standard test for the diagnosis of DVT is the radiofibrinogen leg scanning. c. Anticoagulant is the treatment of choice for distal lower extremity DVT. d. Pulmonary emboli always come from lower extremity DVT.
Testwise students usually do not select options containing the words "always" or "never" (like option d); instead, they tend to choose options containing "usually" or "frequently", knowing that nothing is absolute in medicine.
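As a rough illustration, an item bank could be screened for this flaw automatically. The following is a minimal sketch, not a validated tool: the function name `flag_absolute_terms` and the two-word term list are assumptions made here for illustration, and the list would need to be extended to match local item-writing guidelines.

```python
import re

# Assumed, illustrative list: only the two terms named in the text.
ABSOLUTE_TERMS = ("always", "never")


def flag_absolute_terms(options):
    """Return {label: [terms]} for every option that contains an absolute term."""
    flagged = {}
    for label, text in options.items():
        # Split into lowercase words so that matching is whole-word only.
        words = re.findall(r"[a-z]+", text.lower())
        hits = [term for term in ABSOLUTE_TERMS if term in words]
        if hits:
            flagged[label] = hits
    return flagged


# Options of the DVT example above.
dvt_options = {
    "a": "70% of patients will have physical findings on examination.",
    "b": "The gold standard test for the diagnosis of DVT is the radiofibrinogen leg scanning.",
    "c": "Anticoagulant is the treatment of choice for distal lower extremity DVT.",
    "d": "Pulmonary emboli always come from lower extremity DVT.",
}

print(flag_absolute_terms(dvt_options))  # {'d': ['always']}
```

A flagged option is only a prompt for the item writer to review the wording; it does not by itself show that the item is flawed.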

Long correct answer
Correct answer is longer, more specific, or more complete than other options.

Example:
The parent of a 2-year-old boy reported that he shows in-toeing when walking. On examination, the child exhibits femoral anteversion. The MOST appropriate treatment is: a. Reassuring the parents that the condition usually corrects itself as the child grows older. b. Referral to an orthopedist. c. Referral to a physical therapist. d. Bracing to correct internal rotation of the femurs.
In this item, option a is longer than the other options and it is the correct one. Item writers tend to pay more attention to the correct answer than to the distractors. Because they are teachers, they tend to write long correct answers that include additional instructional material, parenthetical information, caveats, etc. Sometimes this can be quite extreme.
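The length cue can also be checked mechanically. The sketch below is a minimal illustration under stated assumptions: the function name `key_is_conspicuously_long` is hypothetical, length is measured crudely in words, and the threshold of 1.5 times the longest distractor is an arbitrary choice made here, not a value taken from the literature.

```python
def key_is_conspicuously_long(options, key, ratio=1.5):
    """Return True if the keyed option has at least `ratio` times as many
    words as the longest distractor (assumed, arbitrary threshold)."""
    lengths = {label: len(text.split()) for label, text in options.items()}
    longest_distractor = max(n for label, n in lengths.items() if label != key)
    return lengths[key] >= ratio * longest_distractor


# Options of the in-toeing example above; option a is the keyed answer.
intoeing_options = {
    "a": "Reassuring the parents that the condition usually corrects itself as the child grows older.",
    "b": "Referral to an orthopedist.",
    "c": "Referral to a physical therapist.",
    "d": "Bracing to correct internal rotation of the femurs.",
}

print(key_is_conspicuously_long(intoeing_options, key="a"))  # True
```

Here the keyed option has 14 words against 8 for the longest distractor, so the item is flagged; shortening the key or lengthening the distractors to a similar size would remove the cue.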

Word repeats
A word or phrase is included in the stem and in the correct answer.

Example [3]
A 58-year-old man with a history of heavy alcohol use and previous psychiatric hospitalization is confused and agitated. He speaks of experiencing the world as unreal. This symptom is called: a. Depersonalization. b. Derailment. c. Derealization. d. Focal memory deficit. e. Signal anxiety.
This item uses the word "unreal" in the stem, and "derealization" is the correct answer. Sometimes, a word is repeated only in a metaphorical sense, e.g., a stem mentioning bone pain, with the correct answer beginning with the prefix "osteo".

Convergence strategy
The correct answer includes the most elements in common with the other options.
The correct answer is option b. The underlying premise is that the correct answer is the option that has the most elements in common with the other options; it is not likely to be an outlier.

Table 1 (list of technical item flaws)
Issues related to testwiseness
• Absolute terms such as always or never.
• A correct answer that is longer, more complete and more specific than other options.
• A word or phrase included in both the stem and the correct answer.
• Inclusion in the correct answer of most elements in common with the other options.
Issues related to irrelevant difficulty
• The options are long (ideally, the stems should be longer than the options).
• Numeric data in different formats (i.e. ranges or %).
• Interaction between two options (e.g. more than 50%; 70%).
• Vague frequency terms (e.g. rarely, usually, frequently, etc.).
• The options differ in a single continuum or dimension (e.g., which of the following statements concerning disease x is correct).
• Negatively phrased items (those with except or not) in the lead-in statement.
• None of the above or options (c + d) or (All of the above are correct) is used as an option.
• Tricky stem or options or numerically complicated.
• Illogical order of the options (e.g. not in alphabetical sequence).

ISSUES RELATED TO IRRELEVANT DIFFICULTY
Several flaws introduce into the MCQ item irrelevant difficulty that is not necessary for the assessment.

Options are long, complicated, or double

Example
A team of health care planners wishes to estimate the prevalence and incidence of AIDS in a particular community, which of the following is the most appropriate: a. The prospective cohort study design is the most suitable for estimating the prevalence of AIDS. b. The cross-sectional study design is the most suitable for estimating incidence of AIDS and the study group should represent not less than 30% of the community under study. c. Incidence is estimated as the proportion of study subjects, who test positive for AIDS from a random sample of individuals selected from the community. d. Information on the incidence of AIDS helps to plan and evaluate health care needs and services to assess the health care burden imposed on the community. e. A difference between the incidence rates in two groups of subjects exposed and not exposed to a risk factor may be used to test the relative strength of the association between the factor and the occurrence of AIDS.
The options are very long and complicated. Trying to decide among these options requires a significant amount of reading because of the number of elements in each option. This can shift what is measured in an item from knowledge of the content to reading speed. The second flaw in this item is that only options a and b follow logically from the stem, while options c, d, and e are not directly related to the purpose of the health care planners in the stem. A careful look at option c shows that it contains two facts (double facts) that the student must judge. Double facts in an option introduce difficulty when one fact is true and the other is false. As a rule, if you need to test knowledge of two facts, each one should be made into a separate option.

Numeric data are not stated consistently
When numeric options are used, they should be listed in numeric order and in a single format (i.e., as single terms or as ranges). Confusion occurs when formats are mixed and when the options are listed in an illogical order or in an inconsistent format.

Example
A 24-year-old woman delivered a boy with Down syndrome; she is depressed and asks you: what is the likelihood of her having another child with a similar condition? Your answer will be: a. 2% b. 5% c. 25% d. Less than 10% e. 15 to 18%
In this example, options d and e are expressed as ranges (not specific percentages). All options should be expressed either as ranges or as specific percentages. Absolute numbers should be used when there is consensus on their value. Otherwise, it is better to use ranges to avoid differences between sources that may confuse the student. In addition, the range for option d includes options a and b, which is not advisable.
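The two checks described above (a single numeric format and a consistent numeric order) lend themselves to a simple automated screen. The following is a minimal sketch under assumptions: the helper names `classify`, `mixed_formats` and `in_numeric_order` are hypothetical, and the pattern matching is deliberately crude, recognizing only ranges written with "to", a hyphen, "less than", "more than", "<" or ">".

```python
import re


def classify(option):
    """Label a numeric option as 'range' or 'single' (crude pattern match)."""
    text = option.lower()
    if re.search(r"\d\s*(to|-)\s*\d", text) or re.search(r"less than|more than|<|>", text):
        return "range"
    return "single"


def mixed_formats(options):
    """True if the options mix single values and ranges."""
    return len({classify(option) for option in options}) > 1


def first_number(option):
    """First number appearing in the option, as a float."""
    return float(re.search(r"\d+(?:\.\d+)?", option).group())


def in_numeric_order(options):
    """True if the options are listed in ascending or descending order."""
    values = [first_number(option) for option in options]
    return values == sorted(values) or values == sorted(values, reverse=True)


# Options of the Down syndrome example above, in their original order.
down_options = ["2%", "5%", "25%", "Less than 10%", "15 to 18%"]

print(mixed_formats(down_options))     # True
print(in_numeric_order(down_options))  # False
```

Both checks fail for this item: single percentages are mixed with ranges, and the leading values (2, 5, 25, 10, 15) are neither ascending nor descending. The sketch does not detect overlap between options, such as option d covering options a and b.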

Frequency terms in the options are vague (e.g. rarely, usually)
Research has shown that vague frequency adverbs are not consistently defined, even by experts.

Example
A one-month-old infant presents with sudden onset of paroxysms of loud crying lasting several hours, a tense abdomen, drawn up legs, cold feet and clenched fists. Which of the following is a true statement about his problem: a. It is caused by excessive swallowing of air. b. It is due to cow's milk protein allergy. c. It is related to insufficient fluid intake. d. It rarely lasts beyond 3 months of age. e. Sedation is usually effective therapy.
In this example, the definitions of the words "rarely" and "usually" used in options d and e, respectively, should be stated or replaced with numbers.

Options are not parallel (heterogeneous)
The options following a stem belong to different categories. This is one of the frequently encountered mistakes.
Al-Faris, et al.: Common pitfalls in MCQs

"None of the above", "all of the above" or "option a + b" is used as an option
The phrase "None of the above" is problematic in items where judgment is involved and where the options are not absolutely true or false. Use of "none of the above" as the true option may turn the item into a true/false item; each option has to be evaluated as more or less true than the universe of unlisted options.

Example
Chicken pox is caused by the same virus that causes: a. Herpes simplex labialis b. Herpes zoster c. Herpetic stomatitis d. All of the above e. None of the above
If the student knows that options a and b are true but has no idea about option c, it is easy to figure out that the correct answer is d.

Stems are tricky or unnecessarily complicated
Sometimes, item writers can take a perfectly easy question and turn it into something so convoluted that only the most determined student will read it. The following item is a sample of that kind of item. Items are sometimes complicated by the use of uncommon abbreviations that are not spelled out.

Example
A 75-year-old woman complains of severe neck and shoulder pain. You suspect polymyalgia. Which one of the following is the MOST appropriate? a. Shoulder girdle muscle tenderness is not related to the condition. b. Loss of weight and fever would be expected. c. Her urine is likely to contain Bence-Jones protein.

Options are not in a logical order
It is good practice to be consistent in the use of alphabetical order. Whenever options fall along a spectrum, it is preferable to list them in either a logical order or alphabetically. For instance, numeric options should be listed in ascending or descending order. Logical order is more convenient for the students and eliminates human bias.

Example
End arteries are seen in the: a. Adrenals b. Testicles c. Thyroid gland d. Thymus e. Brain
In this case, it is easier to follow and read if the options are arranged from top down in this order: brain, thyroid, thymus, adrenals, and then testicles.

Negatively phrased stem
It is not advisable to use a negative statement in the stem, particularly if one or more of the options contain negatives (double negative). It is confusing for the examinee and serves no purpose.

Example
Which of the following is NOT a feature of neurofibromatosis? a. Skin pigmentation b. No increased risk of brain tumors c. Hearing loss d. No renal anomalies e. Positive family history
While the stem is negative, options b and d are also negative.

SELF ASSESSMENT EXERCISE OF MCQ ITEM FLAWS
Self-assessment is a valuable educational tool. It has been given a higher profile in the last few years because of the switch of emphasis towards more self-directed learning and the popularity of distance learning. The following is a self-assessment exercise to help recognize poorly constructed MCQs. Read the questions and list the problems before looking at the correct answer. Table 1 may be found helpful as a checklist and reminder. It must be noted that this exercise does not intend to cover all possible flaws. The discussion section will concentrate on the major critiques.
There may be problems in some questions that are not addressed in the discussion.

7. Which of the following statements about Polymyalgia rheumatica (PMR) is true? a. PMR can affect any age-group, but is most often seen in geriatric patients. b. PMR responds to corticosteroids treatment within several days. c. If untreated, the symptoms of PMR will spontaneously resolve within 2 weeks. d. Muscle weakness is found on physical examination. e. Abdominal symptoms are present.

8. A 17-year-old lady is brought into your office with a 3-day history of nausea, vomiting, generalized abdominal pain and lethargy. She appears acutely sick. On examination, the patient's blood pressure is 170/70 mmHg, and her pulse is 140 and regular. Her respirations are 45/minute and regular. She has no history of significant disease. Her random blood sugar is 40 mmol/L (720 mg%). Her urine has +4 ketones. Which of the following statements is (are) TRUE of this patient? a. This patient is at risk for going into hyperosmolar diabetic coma. b. This patient should be treated with intravenous fluids, intravenous insulin, and intravenous potassium. c. This patient can be safely treated as an outpatient. d. a and b. e. b and c.

9. A 51-year-old lady who is worried about developing osteoporosis because her elderly mother broke her hip