Measuring competence: Validating a new Pelvic Assessment Tool (The PEAT Study)

Medical students are required by the General Medical Council (GMC) to be proficient in performing a full female pelvic examination (FPE) prior to graduating as doctors. However, many UK medical schools do not conduct a formal summative assessment. Exploration of three electronic databases found no citations evaluating an assessment tool for pelvic examination. The 'PEAT' was developed following an evidence-based literature search and consists of six domains, evaluating attitudes; inspection; bimanual palpation; adnexal examination; speculum; communication; and a global score using a 10cm visual analogue scale. 60 participants, including final year medical students and junior doctors, were divided into three groups of 20 according to their experience in performing gynaecological examination. Each participant performed a pelvic examination with a role player and pelvic model, which was video recorded. 20 consultant assessors, who were blinded to the level of experience, used the PEAT for 15 video clips with a mix of participants from each group. Assessors also completed a questionnaire about the utility, relevance and face validity of the PEAT. The inter-assessor reliability of the PEAT was calculated using Cronbach's alpha. Statistical analysis was performed for continuous variables using the Student t-test. Construct validity was not established: a difference was seen in the mean scores in the 'inspection' domain, and acceptable consistency (α ≥ 0.7) was seen in 4/8 assessors (50%). The PEAT appears to be easy to use and to have face validity. However, further refinements are necessary for the PEAT to establish construct validity and improve reliability.


Introduction
Medical students are required by the GMC to be proficient in performing a full female pelvic examination prior to graduating as doctors. 1 Iyengar et al. found that some students were graduating without having performed a female pelvic examination on a conscious patient. 2 At present, many UK medical schools do not undertake a formal summative assessment of gynaecological examination. At medical schools where one is undertaken, it is based upon the use of manikins and non-validated assessment tools. In some UK medical schools, such as the University of Birmingham (UoB), evaluation of the medical student's competence at pelvic examination is done by subjective faculty assessment during the obstetrics and gynaecology (O&G) placement. 3 This assessment is typically performed at the end of the teaching placement and is based on the assessors' recollection of the students' performance. This kind of assessment has been shown to have poor validity and reliability. 3 This is in contrast to objective structured assessments of technical skill (OSATS), which use task-specific and global rating scales and have been used to evaluate a wide range of clinical and surgical skills. 4,5,6,7 After searching three electronic databases (Medline, Embase and CINAHL), no citations were identified that evaluated an assessment tool for pelvic examination. Thus, there is a need to develop an assessment tool for female clinical pelvic examination. Van der Vleuten described five criteria that determine the usefulness of an assessment method: namely reliability, validity, acceptability to learners and faculty, impact on future learning and practice, as well as costs. 8 We designed a pilot study with the objective of evaluating the first two stated criteria, reliability and validity, of a pelvic examination assessment tool (PEAT).

Study population
We designed a cross-sectional study to obtain provisional data pertaining to the potential reliability, utility and validity of a new PEAT for use in undergraduate medical education. The 60 study participants were divided into three groups of 20 according to their experience in performing gynaecological examination: (i) 'novice student' (defined as a medical student in week one of their O&G clinical placement); (ii) 'experienced student' (defined as a medical student having completed their five week clinical placement in O&G with signed-off competency to perform pelvic examination); and (iii) 'competent practitioner'. This latter group consisted of junior doctors in their 1st and 2nd year of speciality training in O&G. For the purposes of our study we restricted competent practitioners to relatively junior trainees because it was felt that they were closer in age and appearance to the target student population. Participants were made aware that their outcome and assessment as part of the trial would not be incorporated into their final academic grade or postgraduate performance reviews.
Each participant was asked to fill in a pre-assessment questionnaire to assess the comparability of participant groups.
The following baseline demographic data were collected: age; gender; ethnicity; confidence in examination assessed using a 10cm VAS score; number of previously performed examinations; and an interest in a future career in O&G.

Clinical scenario and pelvic examination
Each participant was presented with the same clinical scenario, which required the performance of a pelvic gynaecological examination. The written information provided instructions on the tasks to be completed and asked participants to provide a summary of findings to the examiner at the completion of the examination (Figure 1). A role player was used to simulate the verbal responses of a real patient. The written instructions informed the participants that they were to perform the pelvic examination on the manikin provided after positioning the role player as if they were a real patient. The scenario (including examination of the manikin) was video recorded and each recording was sent to three assessors unfamiliar with the participants to ensure that they were blinded to the participants' level of experience. Each assessor received 15 recordings. Within these there was an equal mixture of recordings, in a random order, from all three participant groups.

Assessment
The PEAT (Figure 2) was formulated using points that were considered important by members of the clinical undergraduate O&G teaching faculty at the BWH and UoB, after consultation with gynaecologists, patients and the available medical literature. It consisted of six domains assessing: (1) attitudes; (2) inspection; (3) bimanual examination; (4) adnexal examination; (5) speculum examination; (6) post examination communication; and an additional overall global score. Each domain was assessed using a 10cm visual analogue scale (VAS). The assessors were sent a pack with a cover letter explaining the purpose of the PEAT study, including the instructions given to the candidate. They were advised to use the PEAT to assess the examination skills of the 15 participants, with the videos provided on a memory stick. To ensure confidentiality, the memory sticks were locked, requiring a password that was allocated by the researcher and sent in the post. The assessors were also sent 15 assessment tool marking sheets, a feedback form and a stamped addressed envelope in which to return all study material. The feedback form used VAS scoring to investigate whether the PEAT was easy to use, and whether the assessor considered it an effective tool to assess competence. There was a space for other comments that allowed assessors to give any additional feedback on the PEAT. Themes were considered to have emerged where more than one respondent provided similar comments.

Janjua A, Chu J, Smith P, Clark T

The validity of an assessment tool provides an indication as to whether the test is measuring what it is intended to measure. 9 We hypothesised that higher PEAT scores would be obtained by the participant groups with greater experience, i.e. junior doctors would score highest in all facets of the pelvic examination compared with novice medical students and possibly compared with experienced medical students.
The feedback form completed by the examiners after the assessments was intended to determine face validity. The reliability of an assessment tool refers to its ability to give reproducible results. We examined inter-assessor reliability by observing agreement between two or more assessors evaluating the same video assessment.

Statistics
The sample size was influenced by the constraints imposed by the number of trainees in O&G at the Birmingham Women's Hospital (BWH), which is the main academic teaching institution delivering undergraduate medical education in O&G on behalf of the UoB.
The validity and reliability of the PEAT were considered to be of equal importance and were the primary outcomes assessed. Validity was tested using the proxy measure of construct validity; this would be demonstrated by scores derived from the PEAT reflecting participant experience. Reliability was measured using inter-assessor reliability between assessors who were blinded to the other assessors' marks. Statistical analysis was performed for continuous variables using mean, standard deviation and standard error of the mean. Cronbach's alpha was used to test reliability (inter-assessor variability) using a two-way mixed, consistency, average-measures intra-class correlation for each element of the PEAT (attitudes; inspection; bimanual examination; adnexal examination; speculum examination; post examination communication; and an additional overall global score), with α ≥ 0.7 considered good. 10,11 We also tested whether the baseline characteristics of the medical students (novice and experienced) were predictive of student global performance as scored by the independent assessors. The independent variables (gender, age, ethnicity, confidence in examination and expressed interest in a future career in O&G) were tested in univariable and multivariable analyses.
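For a two-way mixed model, the consistency, average-measures intra-class correlation described above is equivalent to Cronbach's alpha computed with the raters treated as "items". A minimal illustrative sketch of this calculation (not the study's actual analysis code; the scores shown are invented for illustration) might look like this, with one row per video and one column per assessor:

```python
import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha across raters (columns) rating the same cases (rows).
    For a two-way mixed model this equals the consistency, average-measures
    intra-class correlation, ICC(3,k)."""
    r = np.asarray(ratings, dtype=float)
    k = r.shape[1]                         # number of raters
    rater_vars = r.var(axis=0, ddof=1)     # variance of each rater's scores
    total_var = r.sum(axis=1).var(ddof=1)  # variance of per-video score sums
    return (k / (k - 1)) * (1 - rater_vars.sum() / total_var)

# Hypothetical 10cm VAS scores from two raters on five videos: rater B is
# consistently 1cm more generous, so rank ordering (consistency) is perfect
# and alpha is 1.0, even though absolute agreement is not.
scores = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
print(cronbach_alpha(scores))  # -> 1.0
```

Because consistency ICC ignores systematic rater offsets, a uniformly harsher or more lenient assessor does not lower alpha; only disagreement about the relative ranking of candidates does.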

Results
The background demographics are shown in Table 1. The distributions of gender and ethnicity were comparable across the three participant groups. The level of interest in a career in O&G was comparable between the two student groups, as was the age distribution, whereas both were higher in the specialist trainees in O&G. Confidence in pelvic examination was highest in junior doctors in O&G, followed by experienced students, and lowest in inexperienced students. No student had performed more than 10 gynaecological examinations prior to the study, in contrast to the specialist trainees in O&G, who had all done so (Table 1).

Main findings
In our study we were unable to establish construct validity for the PEAT. There were no consistent differences between the three groups representing different levels of experience in female pelvic examination. The PEAT demonstrated construct validity for adnexal examination, where the most experienced group, namely junior doctors, performed this aspect of the examination better compared with novice and experienced medical students. However, no differences were observed between the two student groups within this domain. Furthermore, in contrast to our expectation, novice students scored more highly in the inspection domain compared with their more experienced counterparts, the experienced students and junior doctors.
Face validity was demonstrated for the PEAT, with examiners considering the tool effective in measuring competence in gynaecological examination. Moreover, examiners reported the PEAT to be comprehensible, quick and easy to use. The reliability of the PEAT appeared to be excellent for the global score, but poor for the individual elements of the assessment. The individual elements of the PEAT could be removed from the tool, but consideration of each element may be important for informing the assessor when deciding the global score of a participant.
This study demonstrates that when considering reliability, the global score should be used in preference to the other elements for hypothesis testing, although inferences should be made with caution due to the small number of respondents. The PEAT in its current form appears to have some strengths, especially when considering its utility and face validity. However, revision is needed to demonstrate construct validity and improve reliability before the PEAT can be routinely used in clinical education and formative assessment.
The usability of the PEAT expressed by the assessors may reflect its design. By incorporating a VAS for marking each domain we hoped to harness the sensitivity of a continuous scale and produce an easy, rapid way of scoring performance. We expected most clinicians to be familiar with VAS for measuring clinical outcomes and so anticipated that proficiency in using the PEAT would be quickly achieved. In addition, by explicitly defining the components to be considered when deciding where to score each domain within the PEAT, we hoped to optimise its user-friendliness, thereby aiding its validity and reliability. In a clinical examination setting, standardised assessment instruments need to be clear and easily completed so that candidates can be scored in real time, reducing the likelihood of recall bias and enhancing their practical use, especially where a large number of candidates are to be assessed.
The failure of the PEAT to consistently discriminate between the relative experience of candidates in gynaecological examination suggests that the tool requires considerable refinement before it can be widely adopted. However, it is possible that the design of our study was biased in favour of the medical students for several reasons. Firstly, undergraduates in their final year have generally become well versed in OSCE style assessments because they are commonly used across different specialties. In contrast, junior doctors at ST1 and ST2 levels may have become less familiar with this type of assessment. Secondly, whilst our choice of role players combined with manikins for examination is representative of the assessment methods for intimate examination used by the majority of UK medical schools (Janjua and Clark, unpublished), it may have biased against the more experienced junior doctors, who undoubtedly would be more familiar with examining real patients. Thirdly, the pelvic manikin used in our chosen scenario had no genital tract pathology. It is possible that the presence of pathology may have yielded different results, with more experienced participants displaying higher levels of competence. 12 Fourthly, our students were in their fifth and final year, and so many may have already gained some experience of pelvic examinations, e.g. during their sexual health attachments in their fourth year, within general practice placements or during their student selected modules. The baseline experience of students will therefore have varied, and a 'novice student' may potentially have had more experience in performing gynaecological examinations than some deemed 'experienced students' who had completed their O&G placement.
Another explanation for the apparent lack of construct validity of the PEAT could relate to examiner training. A greater familiarity with the assessment tool would lead to improvements when assessing pelvic examinations. 13 Face-to-face training to improve understanding of the tool would have been ideal. 14 A final bias to consider is that of selection. Participation in this study was voluntary, and so it is plausible that more self-confident students were recruited, thereby overestimating average performance. However, to some degree these arguments apply to the participating junior doctors and, moreover, confidence does not necessarily correlate with clinical skills. 15 It is therefore possible that our results are not externally valid in students who possess a lower level of initiative, and further studies may need to consider accounting for personality traits 16 or designs whereby the full student cohort could be evaluated.

Strengths and limitations
Our study is novel because, to our knowledge, it is the first to attempt to validate a PEAT for use in evaluating competence in gynaecological examination in either undergraduates or postgraduates. The methodology to establish construct and face validity, as well as reliability, was robust. A reasonable number of novice and experienced students, as well as junior doctors, were recruited to this study to establish the validity and reliability of the PEAT. The task to be undertaken by candidates was standardised, with identical instructions provided to all participants. We stipulated no time limit for the examination, and the task contained no inappropriate content that was unrelated to pelvic examination skills. 9 We recruited all assessors from outside our University Medical School and postgraduate training region in order to ensure blinding of assessors to the level of experience of the participants in the study, thereby enhancing objective, impartial assessment. 17 Limitations of our approach include the fact that pelvic examination was performed on a manikin rather than a real patient. We tried to make the encounter as realistic as possible with the presence of a role player to provide verbal responses during the examination of the manikin. Despite this, it is unlikely that this setting either replicated the experience of examining a real patient or induced similar levels of anxiety. 13 To create a more realistic examination, future studies could consider the use of expert or simulated patients (gynaecology teaching associates or 'GTAs'). However, given that most medical schools do not employ GTAs, any PEAT should be valid for use in simulated manikin-based scenarios and generalisable. Another limitation of our study is that we used a one-off assessment to establish competence.
One could argue that using serial assessments looking at improvements in the VAS scores would give a superior indicator of competence 18 as this would provide a formative method of assessment. However, summative assessments of core clinical examination skills are still common components of final medical school examinations and validated measuring instruments such as the PEAT should be developed for such purposes.
Our sample size was influenced by the constraints imposed by the number of undergraduate and postgraduate trainees in O&G at the BWH. This limited the number of participant assessments, thus limiting the videos we could send out to test construct validity. Furthermore, we had a low response rate, with only 40% of assessors returning their assessment sheets on the participants, adversely impacting the power of the study to establish validity. 9

Implications for practice and research
Assessment in postgraduate medical education has a better evidence base than in undergraduate education, although we are aware of no validated tools for evaluating proficiency in female pelvic examination. In the postgraduate years, performance is assessed not only by senior clinicians, but also by patients, peers and other members of the team, e.g. secretaries, in multi-source feedback (MSF). 19,20 The mini-clinical evaluation exercise (mini-CEX) is commonly used in postgraduate education for observation of history taking and clinical skills, as well as general attitude and professional behaviour. 21 Similarly, directly observed procedural skills (DOPS) are used to assess and provide feedback on particular skills, e.g. injections. 22 Although these methods were initially created for assessment within a particular specialty, they are now more widely available and applicable to other specialties such as O&G. 23 Previous studies in O&G have used objective structured assessments of technical skill (OSATS) for establishing competence in postgraduate surgical skills and have demonstrated construct validity. 24 The PEAT we developed is a type of OSATS and, with further revision, we hope to show validity and reliability in both an undergraduate and a postgraduate population. If developed, it could be used for both formative assessment (where competency can be attained, ideally over a period of time during the clinical placement) and summative assessment. 25 Many clinical competencies tested in undergraduate medical students are judged by methods which have not gone through rigorous psychometric testing. There is a need for valid, useable instruments for evaluating core clinical skills. In our study we attempted to develop a PEAT for assessing competence in gynaecological examination. However, qualitative work to obtain the views of students, patients, educationalists and clinicians may aid the development of a more valid and reliable PEAT. 13,26 Pilot studies, such as the current one, can then be undertaken to identify the potential reliability and validity of a refined PEAT. In addition, the derived data can be used to help inform the design and size of future, larger-scale studies. Such studies should incorporate more assessors to better evaluate the psychometric performance of the revised PEAT before it can be introduced into routine educational practice.

Conclusion
Training in obstetrics and gynaecology is integral to the undergraduate medical school curriculum. Achieving experience in gynaecological examination can be challenging given the time constraints of placements and the intimate nature of the clinical examination. Thus, innovations in training and assessment are urgently needed to ensure medical undergraduates qualify with the necessary competence in this core clinical skill. To our knowledge, this is the first study to try to develop and validate a pelvic examination assessment tool which ideally can be used in both undergraduate and postgraduate medical education, both formatively and summatively. The PEAT we tested demonstrated utility and face validity, but further refinements are needed to establish reliability and construct validity in larger scale studies, including strategies to optimise the response rates of assessors.
Take Home Messages
1. Medical students are required by the General Medical Council to be proficient in performing a full female pelvic examination prior to graduating as doctors.
2. Exploration of three electronic databases found no citations evaluating an assessment tool for female pelvic examination.
3. The new PEAT (PElvic Assessment Tool) consists of six domains, evaluating attitudes; inspection; bimanual palpation; adnexal examination; speculum; communication; and a global score.