Observer Agreement on Computed Tomography Perfusion Imaging in Acute Ischemic Stroke

Supplemental Digital Content is available in the text.

T he use of computed tomography (CT) perfusion (CTP) imaging to assess patients with acute ischemic stroke can improve patient selection for revascularisation therapy 1,2 and may have particular value in selection beyond current time windows. Widespread use of CT perfusion in routine clinical practice is hindered by methodological differences in scan acquisition, scan processing, and interpretation, 3 as well as inconsistency in perfusion parameters used for patient selection. Previous studies assessing interobserver reliability of CTP have been limited to mostly single center studies with 2 to 4 observers [4][5][6][7][8][9][10][11] (Table I in the online-only  Data Supplement).
In this study, we used an online platform to assess observer agreement on qualitative interpretation of processed CT perfusion maps, using observers of different specialty and experience.

Methods
The data that support the findings of this study are available from the corresponding author upon reasonable request We used the Systematic Image Review System 2 platform, provided by the University of Edinburgh, an established method similar to that used in the ACCESS (Acute Cerebral CT Evaluation of Stroke Study) of observer reliability of plain CT in stroke 12 and to assess >7000 brain scans in the IST-3 (Third International Stroke Trial) 13,14 and other ongoing trials.
We selected 24 cases from 2 clinical trials: IST-3 (imaging substudy) 15 and the ATTEST trial (Alteplase Versus Tenecteplase for Thrombolysis After Ischaemic Stroke) 16 and 1 observational study, the POSH study (Post Stroke Hyperglycaemia). 17 Patients with ischemic stroke in all of these studies had CTP performed within 6 hours of symptom onset. All 3 studies were approved by relevant research ethics committees, and all participants had given consent for their anonymized scans to be used in further research.
For standardization, all raw perfusion imaging data were postprocessed by one researcher on a commercially available software platform (MiStar, Apollo Medical Imaging Technology, Melbourne, Australia) to produce maps of cerebral blood volume (CBV), cerebral blood flow (CBF), mean transit time (MTT), delay time (DT), and thresholded penumbra maps (PM). PMs dichotomized the lesion into core, defined as tissue with CBF of <40% and relative DT of >2 s compared with homologous tissue in the contralesional hemisphere, and penumbra, defined as tissue with relative DT of >2 s but relative CBF of ≥40% of that in the contralesional hemisphere. 18 Scans varied in their range of z-axis coverage and in the extent of the visible perfusion lesions. All scans covered the level of the basal ganglia and supraganglionic level required for calculation of Alberta Stroke Program Early CT (ASPECT) score.
Clinicians were invited to participate in the study through advertisement at stroke meetings and emails sent through professional organizations' and trials' mailing lists. Scan readers' details were collected at study registration including age, specialization, country and hospital where they worked, years of experience in their specialties, and frequency of reviewing stroke imaging and perfusion imaging. Participants were allowed to review as many scans as they could, in any order. Observers were presented with noncontrast CT (NCCT) and perfusion maps. A short clinical vignette which included age, National Institutes of Health Stroke Scale, and time from onset to scan was presented to observers after scans were fully reported. Participants could alter window settings for the NCCT, but not for the perfusion maps.
A structured questionnaire, adapted from a previously validated ischemic stroke questionnaire, 12,14,19 asked observers to comment on presence of an acute ischemic lesion on NCCT, its size, location, swelling, presence of a hyper-attenuated artery, background brain changes (old infarct, leukoaraiosis, brain atrophy), and on the presence of any perfusion deficit, its extent compared with ischemic changes on NCCT reflecting a previously used approach for assessing mismatch of perfusion lesions relative to the visible CT (or magnetic resonance imaging) lesion (no perfusion deficit <20% difference, equal volumes, >20% difference compared with the NCCT hypodensity), and the ASPECT score for hypodensity on NCCT and CTP maps. ASPECT score is a well-established semiquantitative scoring system for assessing the anatomic extent of acute ischemic changes in anterior circulation stroke 20 (http://www.aspectsinstroke.com) initially developed for NCCT but adapted for analysis of other imaging modalities including CTP. 21 At the start of the study, the questionnaire only allowed the observer to calculate an ASPECT score if acute ischemic changes were reported in the middle cerebral artery territory, but this restriction was later removed. Observers were also allowed to indicate if an ASPECT score could not be calculated for perfusion maps when the relevant region was not included. The total ASPECT score, ranging from 0 to 10 was used for analyses.
To assess intraobserver agreement, observers who completed all scans were invited to perform repeat review of 6 scans blind to their initial scoring, including 1 with no perfusion deficit, 2 with small to moderate perfusion deficit, and 3 with large perfusion deficit.
Statistical analysis was performed using IBM SPPS software, version 21. Descriptive statistics were used for scan and observer baseline characteristics and χ 2 tests to compare categorical data between groups. Krippendorff α, 22 was used to assess inter and intraobserver agreement, using an appropriate macro for SPSS. 23 Possible Krippendorff-α values range from −1.0 to +1.0, where +1.0 equates to perfect agreement, 0.0 means no agreement, and −1.0 implies perfect disagreement. Levels of agreement were interpreted as follows: 0 to 0.2 =slight, 0.21 to 0.4 =fair, 0.41 to 0.6 =moderate, 0.61 to 0.8 =substantial, 0.81 to 1.00 =almost perfect agreement. 24 Analysis was performed for all observers and repeated for readers who reported all scans and for different reader and scan subgroups.

Results
The clinical and radiological features of the 24 cases used in the study are summarized in Table 1.Scans were classified into 3 groups based on visual assessment of the perfusion deficit on PM: no perfusion deficit (4 scans), mild to moderate perfusion deficit (11 scans), and large perfusion deficit (9 scans). Examples are shown in Figure 1. Perfusion scans varied in their z coverage and slice thickness. Most scans (13/24) covered a 4 cm slab, acquired as 8×5 mm slices.
Fifty-seven observers participated in the study (Table 2), of whom 27 (47%) reviewed all scans, and 17 out of 27 (63 %) contributed repeat readings to determine intraobserver agreement ( Figure 2). Reviewers were from 10 countries, with most (36/57, 63%) working in the United Kingdom. There was no difference in experience, distribution of specialties, or frequency of viewing stroke scans between the observers who completed all scans and those who did not (Table II in  Interobserver agreement was fair to substantial for recognition of acute lesion on NCCT and on the different perfusion maps and fair to moderate for the extent of CTP-NCCT mismatch ( Table III in the online-only Data Supplement) with higher agreement for MTT and DT and PMs. Agreement on total ASPECT score was substantial to almost perfect, with agreement for MTT, DT, and PM higher than agreement for

Stroke
November 2019 NCCT, CBV, and CBF ( Figure 3). Agreement was slightly higher for the group that reviewed all scans compared with those who reviewed some scans only (Figures II, III, and IV in the online-only Data Supplement). Agreement varied among different observer and scan subtypes with the highest agreement among neuroradiologists (Tables IV through XV in the  online-only Data Supplement). Seventeen observers contributed to the intraobserver agreement part of the study (results shown in Figure 4). There were no differences in observer characteristics between those who did and did not undertake repeat readings (Table II in the online-only Data Supplement). The time between finishing the first set of readings and starting the repeat reading ranged from 36 to 314 days (median=233 days.). The shortest duration between finishing the first reading and starting repeat reading was 1 month. Levels of agreement varied widely between observer subgroups from slight to almost perfect (Tables XVI through XXVII in the online-only Data Supplement), with highest agreement for time-based parameters.

Discussion
The role of CTP in patients with acute stroke presenting within treatment windows of 4.5 hours for intravenous thrombolysis or 6 hours for endovascular thrombectomy remains uncertain, with some centers advocating routine use, 25 whereas others do not undertake CTP routinely due to concerns, such as delayed door to needle time or physician unfamiliarity. 26,27 Inconsistency in analysis software and proposed analysis parameters is likely to contribute to uncertainty and delay. Our study showed moderate to near-perfect interobserver and intraobserver agreement for time-based CTP parameters (DT and MTT) across observers with a wide range of experience and backgrounds.
Our results are consistent with previous studies that included few observers and fewer scans 4-11 and extend these previous reports by using a wider spectrum of observers and a larger number of scans, including repeat assessment to establish intraobserver agreement. Differences in the statistical measures used mean that direct comparisons with previous studies are difficult, but the main findings are in agreement, namely, that interobserver agreement is higher for perfusion parameters compared with NCCT, for time-based perfusion parameters (MTT, DT) and for penumbra/core dichotomized maps compared with CBV and CBF. More extensive brain coverage improved interobserver agreement. These trends were stable across all scan and reader subgroups. We also found higher agreement for neuroradiologists versus other specialties, although this was not explained by frequency of interpreting stroke or perfusion scans or years in speciality. Agreement was low in scans with limited z-axis coverage, but the size of the perfusion lesion had no effect.
The visual comparison of the perfusion lesion relative to the NCCT lesion size is analogous to the mismatch approach initially described for magnetic resonance imaging, with the hypoattenuation of brain on NCCT corresponding to ischemic core of largely irreversible damaged tissue 28 and more extensive hypoperfused tissue defined by prolonged MTT or DT corresponding to the potentially reversible penumbra. Because the conspicuity of acute ischemic tissue on NCCT is poor compared with diffusion-weighted magnetic resonance imaging, it is unsurprising that the agreement was lower for mismatch than for ASPECTS, which only depends on the perfusion component. Reduced CBF or CBV on CTP correlate more closely with the diffusion-weighted imaging lesion, but the discrimination of normal from reduced CBV or CBF is more difficult compared with time-based parameters. In general, MTT and DT are more uniform across gray and white matter, and between brain regions, than CBF and CBV which differ more between gray and white matter, and between brain regions, although all perfusion parameters are abnormal in old ischemic lesions and regions of leukoaraiosis. 29 CBV additionally has a narrow range of normal values. Interobserver and intraobserver agreement for CBF and CBV was similar to that for NCCT. Observer agreement for ASPECT scores applied to perfusion maps showed better agreement for time-based CTP and PMs compared with NCCT, CBF, or CBV, with agreement levels unaffected by the size of the ischemic lesion. This suggests that the visual interpretation of ischemic core on CT-based imaging is more consistent when readers use time-based perfusion parameters and a standardized scoring tool.
In previous studies, the volume of ischemic core tissue was weakly associated with higher risk of adverse outcomes in patients treated with reperfusion therapies, including for infarct swelling and symptomatic intracerebral haemorrhage 30,31 ; in several recent trials, a large core constituted an exclusion criterion, although an interaction of core volume and treatment effect could not be demonstrated. 15,32 The r esults of this study should be interpreted in view of its limitations. The observers were volunteers, presumably having some interest in CT perfusion imaging, and agreement may be different in the wider population of doctors routinely dealing with stroke imaging. The online platform experience deviates from real-life experience in several important aspects; all parameters are available to the observer simultaneously, so that interpretation of NCCT can be influenced by findings in CTP, as appears to have been the case here, and the time pressure related to reviewing a scan to make a treatment decision is not a factor. To reduce observer bias in the study through knowing stroke severity, the clinical information was introduced after the scans were reviewed to test readers' ability to detect scan findings without influence of clinical findings, but this is the reverse of clinical practice. The range of abnormalities in the scans we selected was limited to patients with mainly middle cerebral artery territory strokes. Finally, these results are limited to visual interpretation of perfusion maps produced by a single software and may not represent agreement with other software or means of presenting perfusion data.

Summary
Observer agreement on interpretation of CTP perfusion was fair to substantial, with better agreement for time-based sequences (MTT, DT) and threshold-based core/PMs than with CBF, CBV, or NCCT hypoattenuation. Agreement was higher when using the ASPECT score, compared with estimating the size of the perfusion lesion relative to the NCCT lesion, or to the presence versus absence of the perfusion lesion. Intraobserver agreement was also better with timebased maps, but varied significantly among individuals, showing opportunity for improvement.

Sources of Funding
This study was funded by the Efficacy and Mechanism Evaluation (EME) Programme (Grant EME 11/100/78), a partnership of the UK Medical Research Council and National Institute for Health Research. The views expressed in this publication are those of the authors and not necessarily those of the Medical Research Council (MRC), National Institute for Health Research or the UK Department of Health and Social Care; the IST-3 (IST-3 (Third International Stroke Trial)) was funded by the UK MRC (G0400069) and the UK Stroke Association among many sources, and the IST-3 perfusion substudy by Efficacy and Mechanism Evaluation programme (09-800-15) and support from many national funding agencies. 33