Comparison Between Lung Ultrasonography Score in the Emergency Department and Clinical Outcomes of Patients With or With Suspected COVID‐19

Chest CT is the reference test for assessing pulmonary injury in suspected or diagnosed COVID‐19 with signs of clinical severity. This study aimed to evaluate the association between a lung ultrasonography score and unfavorable clinical evolution at 28 days.

being mobilized to assess and determine the best management of suspected or confirmed COVID-19. 1,2 Learned radiology societies in France and the United States have considerably reduced the role of chest radiography in the initial assessment of patients with suspected or confirmed COVID-19. 3,4 The increasing availability of handheld ultrasound devices 5 and the non-irradiating nature of acoustic waves have led to ultrasonography (US) being considered as a quick triage tool for suspected or diagnosed COVID-19.
Between March 2020 and August 2020, several studies documented the pulmonary US semiology of COVID-19. 6,7 In July 2020, the WHO published a set of recommendations mentioning lung US (LUS) as a tool of interest in the diagnosis. 8 […] 11-15 These studies evaluated the prognostic value of LUS for severity outcomes such as the need for respiratory support, intensive care unit (ICU) admission, and death, with lung scores acquired over more than 10 chest zones. Most of these studies were mono-centric, with small samples (generally N < 100, with the exception of one study 16 with N = 447), rarely explored clinical outcomes with sufficient follow-up (which matters when death is an outcome), and relied on scores covering more than 10 lung zones. Some studies, again mono-centric with small samples, found insufficient clinical prognostic value. 17 There is still a need to add to the scientific literature on LUS as a risk stratification tool in COVID-19, perhaps for inclusion in future meta-analyses.
In the eChoVid study, we used an 8 chest-zone scoring system. Buessler et al 18 showed that, for detecting interstitial lung lesions in acute heart failure, a 28-point LUS examination carried no specific benefit over a 6- or 8-point examination. A simplified acquisition may be of interest when LUS is used as a quick triage tool, and its greater ease of use may favor wider adoption by non-expert operators.
Many studies have shown that short training significantly improves the ability to perform LUS. 19 Being non-invasive, point-of-care, and real-time, US could facilitate the initial assessment of patients and the management of patient flow.
We hypothesized that 1) a specific LUS score could be helpful in the emergency department for predicting the clinical evolution of COVID-19 at 28 days and 2) LUS could be safely performed by emergency physicians who were not experts in pulmonary US.
We performed a prospective study to evaluate 1) the association between an LUS-based global severity score (GS) for COVID-19 and unfavorable clinical course at 28 days; 2) the performance of the GS compared with the severity of lung damage on chest CT; and 3) the performance, based on the GS, of newly trained US operators compared with US experts.

Study Design and Patient Selection
This was a multicentric, observational, non-randomized study conducted in 8 emergency departments in France (Table 1). From March 19, 2020 to April 28, 2020, we enrolled patients referred to one of the recruitment centers because of an initial clinical suspicion of COVID-19 and with COVID-19 secondarily confirmed by RT-PCR. Exclusion criteria were conditions preventing LUS examination (eg, morbid obesity, extensive thoracic subcutaneous emphysema, absorbent subcutaneous infiltrations) and any comorbidity justifying immediate priority intensive care.
Clinicians involved in the LUS assessment were either emergency physicians expert in pulmonary US (US certification and more than 5 years' experience in US) or newly trained pulmonary US operators. All investigators had received the same clinical US curriculum.

Intervention
After inclusion, each patient underwent a clinical examination and LUS, performed by two different emergency physicians blinded to each other's findings. Emergency physicians who performed LUS were also blinded to clinical, imaging, and biological patient data.
For each quadrant, we defined 4 severity grades: Grade 0: at most 3 B-lines; Grade 1: 4 to 8 B-lines in the intercostal space at one of the pulmonary bases; Grade 2: B-lines in a "curtain sign" (>8 B-lines) and/or more than 4 B-lines spread over two-thirds of the pulmonary field; Grade 3: consolidation foci.
The US score used in our study was derived from the standard 12-point LUS score. 13,14 We simplified it to 8 points: the upper and lower parts of the anterior and posterior regions of the left and right chest walls. Our GS, the sum of the grades over the 8 quadrants, ranged from 0 to 24 points. This simplification was based on the French consensus on LUS practice derived from the work of Lichtenstein, 20 in which all investigators had been trained.
For US evaluation, physicians used the available US equipment, with no specific requirement for machine performance. All acquisitions were performed with curved probes; a handheld device (Vscan) was used at one of the sites.
Follow-up was performed on days 5, 15, and 28 to detect any of three outcomes: ICU admission, invasive mechanical ventilation, or death. If patients were not hospitalized, investigators contacted them by phone (or contacted their trusted person if they did not respond).
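As an illustration of how the GS is assembled, the following Python sketch sums operator-assigned grades (0-3) over the 8 quadrants. The zone labels and the `global_score` helper are hypothetical; assigning each grade still follows the criteria listed above and is left to the operator.

```python
# Minimal sketch of the 8-zone LUS global score (GS).
# The zone labels below are hypothetical; the study only specifies upper/lower,
# anterior/posterior, and left/right regions of the chest wall.

ZONES = [
    f"{side}-{region}-{level}"
    for side in ("right", "left")
    for region in ("anterior", "posterior")
    for level in ("upper", "lower")
]
VALID_GRADES = {0, 1, 2, 3}  # per-quadrant severity grades 0-3


def global_score(grades: dict) -> int:
    """Sum the per-zone severity grades over the 8 zones (range 0-24)."""
    if set(grades) != set(ZONES):
        raise ValueError("exactly the 8 predefined zones must be graded")
    for zone, grade in grades.items():
        if grade not in VALID_GRADES:
            raise ValueError(f"grade must be 0-3, got {grade!r} for {zone}")
    return sum(grades[zone] for zone in ZONES)


if __name__ == "__main__":
    exam = {zone: 0 for zone in ZONES}
    exam["right-posterior-lower"] = 2  # e.g. curtain sign at the right base
    exam["left-posterior-lower"] = 3   # e.g. consolidation at the left base
    print(global_score(exam))  # 5
```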

Training of Trainees
Four participants with no prior training in pulmonary US completed a standardized training course: 30 minutes of US theory with a review of pathological images from an image bank, followed by practice on 5 patients with suspected COVID-19. Each lung evaluation performed by a trainee was repeated by an expert blinded to the trainee's results.

Data Collection and Data Sources
For each patient, the local investigator used a standardized computerized form to collect demographic data, vital signs at emergency department admission, the grade for each of the 8 quadrants, and the GS.
The results of the COVID-19 RT-PCR test and chest CT scan were collected. Each chest CT scan was reviewed by an expert radiologist for a secondary analysis confirming COVID-19 severity. The severity of lung damage on CT was scored as normal (0%), minimal (0-10%), moderate (10-25%), or severe (>25%). 21
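A minimal Python sketch of the CT categories and of the three dichotomizations used later in the statistical analysis. The published ranges share their boundaries (10% and 25%), so assigning exactly 10% to "minimal" and exactly 25% to "moderate" is an assumption, as are the helper names.

```python
# Minimal sketch of the CT severity categories and the three dichotomizations
# used to evaluate the GS. Boundary handling at 10% and 25% is an assumption.

CT_CATEGORIES = ["normal", "minimal", "moderate", "severe"]


def ct_severity(percent_involved: float) -> str:
    """Map the percentage of lung involvement on CT to a severity category."""
    if not 0 <= percent_involved <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_involved == 0:
        return "normal"
    if percent_involved <= 10:
        return "minimal"
    if percent_involved <= 25:
        return "moderate"
    return "severe"


def is_positive(category: str, lowest_positive: str) -> bool:
    """Dichotomize a category: positive if at or above `lowest_positive`.

    lowest_positive="minimal"  -> normal vs pathological
    lowest_positive="moderate" -> normal/minimal vs moderate/severe
    lowest_positive="severe"   -> normal/minimal/moderate vs severe
    """
    return CT_CATEGORIES.index(category) >= CT_CATEGORIES.index(lowest_positive)


if __name__ == "__main__":
    for pct in (0, 8, 20, 40):
        cat = ct_severity(pct)
        print(pct, cat, is_positive(cat, "moderate"))
```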

Outcomes
The primary outcome was the association between GS at emergency department admission and clinical worsening, defined as the occurrence of at least one of the following within the 28-day follow-up: ICU admission, invasive mechanical ventilation, or death.
Secondary outcomes were 1) a comparison of the performance of the GS with the severity of lung damage on chest CT and 2) a comparison of newly trained US operators with US experts, based on the GS and on the grade of each quadrant in both groups.
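The composite outcome can be expressed as a simple rule. In this Python sketch, the `FollowUp` record, its field names, and counting days from the emergency department visit are all hypothetical.

```python
# Minimal sketch of the composite primary outcome (clinical worsening).
# The FollowUp record and its day-count fields are hypothetical; day 0 is
# assumed to be the emergency department visit.
from dataclasses import dataclass
from typing import Optional


@dataclass
class FollowUp:
    icu_admission_day: Optional[int] = None
    invasive_ventilation_day: Optional[int] = None
    death_day: Optional[int] = None


def clinical_worsening(record: FollowUp, horizon_days: int = 28) -> bool:
    """True if at least one qualifying event occurred within the follow-up window."""
    event_days = (
        record.icu_admission_day,
        record.invasive_ventilation_day,
        record.death_day,
    )
    return any(day is not None and day <= horizon_days for day in event_days)


if __name__ == "__main__":
    print(clinical_worsening(FollowUp()))                     # False
    print(clinical_worsening(FollowUp(icu_admission_day=4)))  # True
    print(clinical_worsening(FollowUp(death_day=35)))         # False (after day 28)
```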

Study Size and Statistical Analysis
A sample of 300 patients with a documented level of severity would allow an area under the receiver operating characteristic curve (AUC) ≥85% to be estimated with a precision of ±5% or better. All quantitative data are presented as mean ± SD or median (Q1-Q3; range), according to data distribution. Categorical data are summarized as number (%).
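To show how a precision target of this kind behaves, the Python sketch below uses the Hanley and McNeil approximation of the standard error of an AUC. The study does not report its calculation method or the assumed event rate, so both the choice of formula and the example splits of the 300 patients are assumptions.

```python
# Sketch of a precision check for an AUC estimate, using the Hanley-McNeil
# standard error. This is an assumed method; the study does not state its own.
import math


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Approximate standard error of an AUC with n_pos events and n_neg non-events."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def ci_half_width(auc: float, n_pos: int, n_neg: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% confidence interval."""
    return z * hanley_mcneil_se(auc, n_pos, n_neg)


if __name__ == "__main__":
    # Hypothetical splits of 300 patients with documented severity.
    for n_pos in (60, 150):
        n_neg = 300 - n_pos
        print(n_pos, n_neg, round(ci_half_width(0.85, n_pos, n_neg), 3))
```

The half-width depends strongly on how the 300 patients split between outcome classes: under these assumptions, a balanced split keeps it below 0.05, whereas a 20% event rate does not.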
For the primary outcome, the association between GS and clinical worsening was evaluated by univariate logistic regression, with GS as the predictor of clinical worsening. For each regression model, we calculated the AUC across the different values of GS.
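The primary analysis can be sketched in plain Python (no statistical library assumed): a univariate logistic regression fitted by Newton-Raphson, the AUC computed as the Mann-Whitney probability that a case outranks a non-case, and the GS cutoff maximizing the Youden index (sensitivity + specificity - 1). The toy data and helper names are hypothetical; this is not the study's code.

```python
# Minimal sketch of the primary analysis: univariate logistic regression of
# clinical worsening on GS, the AUC, and the Youden-optimal GS cutoff.
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, max_iter=100, tol=1e-10):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p          # score (gradient of the log-likelihood)
            g1 += (yi - p) * xi
            h00 += w              # observed information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0:
            break
        d0 = (h11 * g0 - h01 * g1) / det  # Newton step: information^-1 @ score
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


def auc(scores, y):
    """AUC as the Mann-Whitney probability that a case outranks a non-case."""
    pos = [s for s, t in zip(scores, y) if t]
    neg = [s for s, t in zip(scores, y) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, y):
    """Return (cutoff, sensitivity, specificity, J) for the rule 'score >= cutoff'."""
    n_pos = sum(1 for t in y if t)
    n_neg = len(y) - n_pos
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for s, t in zip(scores, y) if t and s >= c)
        tn = sum(1 for s, t in zip(scores, y) if not t and s < c)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[3]:
            best = (c, sens, spec, j)
    return best


if __name__ == "__main__":
    # Toy data (not study data): GS values and 28-day clinical worsening.
    gs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    worse = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
    b0, b1 = fit_logistic(gs, worse)
    probs = [sigmoid(b0 + b1 * g) for g in gs]
    print(round(auc(gs, worse), 2), round(auc(probs, worse), 2))  # 0.92 0.92
    print(youden_cutoff(gs, worse)[0])                            # 6
```

Because the model has a single predictor with a positive slope, its predicted probabilities rank patients exactly as the GS does, so the model's AUC equals the AUC of the raw GS.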
The performance of the GS was evaluated by univariate logistic regression, with GS predicting CT disease quantification dichotomized as normal versus pathological; normal or minimal versus moderate or severe; or normal, minimal, or moderate versus severe. For each regression model, we calculated the AUC, Brier score, and Youden index across the values of the scores. Calibration and discrimination were validated with bootstrap replications, and the degree of optimism was calculated for the C statistic and the Brier score. For comparisons between GS and CT severity scores (normal or minimal vs moderate or severe, or normal, minimal, or moderate vs severe), we excluded patients whose CT status (pathological/normal) was collected but whose CT severity score was missing. To compare newly trained US operators with US experts, a subset of patients had their GS evaluated by both an expert and a newly trained operator. We assessed agreement between them by calculating the weighted kappa 22 of US severity grades for each quadrant and used the Bland and Altman method to evaluate agreement on LUS.
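The agreement analysis can be sketched in Python as follows. The kappa uses disagreement weights on the ordered grades 0-3; linear weighting is an assumption, since the weighting scheme is not stated here, and a quadratic option is included. The Bland and Altman helper returns the mean difference and the 95% limits of agreement (mean ± 1.96 SD of the paired differences). Function names and example ratings are hypothetical.

```python
# Minimal sketch of the two agreement measures, assuming linear kappa weights.
from statistics import mean, stdev


def weighted_kappa(r1, r2, categories, weights="linear"):
    """Weighted kappa: 1 - sum(w*observed) / sum(w*expected), disagreement weights."""
    k, n = len(categories), len(r1)
    if k < 2 or n == 0 or len(r2) != n:
        raise ValueError("need >= 2 categories and paired, non-empty ratings")
    idx = {c: i for i, c in enumerate(categories)}
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        observed[idx[a]][idx[b]] += 1.0 / n
    rows = [sum(observed[i]) for i in range(k)]
    cols = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d  # otherwise quadratic

    po = sum(w(i, j) * observed[i][j] for i in range(k) for j in range(k))
    pe = sum(w(i, j) * rows[i] * cols[j] for i in range(k) for j in range(k))
    if pe == 0:
        raise ValueError("kappa undefined: no expected disagreement")
    return 1.0 - po / pe


def bland_altman(a, b, z=1.96):
    """Mean difference (bias) and 95% limits of agreement between paired scores."""
    diffs = [x - y for x, y in zip(a, b)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias, bias - z * sd, bias + z * sd


if __name__ == "__main__":
    expert = [0, 1, 2, 3]
    trainee = [0, 1, 2, 2]
    print(round(weighted_kappa(expert, trainee, [0, 1, 2, 3]), 3))
    print(bland_altman([10, 12, 14], [9, 12, 15]))
```

With two categories (eg, GS < 7 vs GS ≥ 7), both weighting schemes reduce to 0/1 disagreement weights, so the same function returns Cohen's unweighted kappa.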

Primary Analysis
Clinical status at day 28 was collected for 328 of 332 patients: 13 (4%) died, 16 (4.9%) underwent invasive mechanical ventilation, and 28 (8.5%) were referred to an ICU. In addition, 90 (27.4%) were hospitalized in a standard-care unit during the 28-day follow-up, and 181 (55.2%) returned home. Details of the GS for each status are in Table 4.
The optimal GS cutoff for clinical worsening was GS ≥6, with an AUC of 0.83 (Figure 2), sensitivity of 84.2%, and specificity of 76.4% (Youden index: 60.6%). The bootstrap optimism was −0.0003 for the C statistic and −0.0007 for the Brier score (0.12). Among patients with COVID-19, the optimal cutoff for clinical worsening was also GS ≥6 (Table 5). When the clinical outcomes were considered individually, the optimal cutoff was GS ≥6 for ICU admission, GS ≥6 for invasive mechanical ventilation, and GS ≥10 for death (Table 6).
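The bootstrap optimism correction can be sketched in Python as a Harrell-style procedure, assumed here rather than taken from the study's code: refit the model on each bootstrap sample, take the difference between its performance on that sample and on the original data, and average. Optimism is defined as apparent minus test performance for both the C statistic and the Brier score; the number of replications, the seed, and all names are illustrative. For a single predictor with a positive slope, the C statistic depends only on the ranking of GS, which is one reason its optimism tends to be very small.

```python
# Minimal sketch of bootstrap optimism for the C statistic and Brier score of a
# univariate logistic model (GS -> clinical worsening). Settings are assumptions.
import math
import random


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, max_iter=100, tol=1e-10):
    # Newton-Raphson for P(y=1) = sigmoid(b0 + b1*x)
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0, g1 = g0 + yi - p, g1 + (yi - p) * xi
            h00, h01, h11 = h00 + w, h01 + w * xi, h11 + w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:  # (near-)separated sample: stop iterating
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


def c_statistic(p, y):
    pos = [s for s, t in zip(p, y) if t]
    neg = [s for s, t in zip(p, y) if not t]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier(p, y):
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y)


def performance(coefs, x, y):
    b0, b1 = coefs
    p = [sigmoid(b0 + b1 * xi) for xi in x]
    return c_statistic(p, y), brier(p, y)


def bootstrap_optimism(x, y, n_boot=200, seed=1):
    """Return apparent, optimism (apparent - test), and corrected (C, Brier)."""
    rng = random.Random(seed)
    apparent = performance(fit_logistic(x, y), x, y)
    opt_c = opt_b = 0.0
    done = 0
    while done < n_boot:
        idx = rng.choices(range(len(x)), k=len(x))
        xb, yb = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # both outcome classes are needed to fit and score
        coefs = fit_logistic(xb, yb)
        c_app, b_app = performance(coefs, xb, yb)  # on the bootstrap sample
        c_test, b_test = performance(coefs, x, y)  # on the original sample
        opt_c += c_app - c_test
        opt_b += b_app - b_test
        done += 1
    opt_c, opt_b = opt_c / n_boot, opt_b / n_boot
    return {
        "apparent": apparent,
        "optimism": (opt_c, opt_b),
        "corrected": (apparent[0] - opt_c, apparent[1] - opt_b),
    }


if __name__ == "__main__":
    # Toy data (not study data).
    gs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] * 4
    worse = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1] * 4
    print(bootstrap_optimism(gs, worse))
```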

Secondary Analysis
The association between CT evaluation of disease (normal or minimal vs moderate or severe) and GS was optimal at GS ≥7, with an AUC of 0.84 (95% confidence interval [CI]: 0.77-0.92), sensitivity of 77.2%, and specificity of 83.7% (maximal Youden index) (Table 7). The AUCs for predicting clinical status (death, mechanical ventilation, or ICU admission) in patients with full CT data were 0.71 (95% CI: 0.62-0.79) for the GS and 0.74 (95% CI: 0.65-0.82) for the CT severity score (Figure 3). In total, 48 (14.6%) LUS evaluations were performed by newly trained operators. Quadrant-by-quadrant agreement between expert and newly trained operators (n = 48, 4 new trainees) was substantial, with weighted kappa values of 0.62 to 0.81 (Bland and Altman analysis, Table 8). Taking GS ≥7, the cutoff with maximal Youden index, as a reasonable threshold for discriminating lung injury severity, we found good agreement between experts and newly trained operators for distinguishing GS ≥7, with a kappa value of 0.85 (95% CI: 0.69-1.00; bias-corrected and accelerated bootstrap 95% CI: 0.60-0.96) (Table 9).

Discussion
In this multicentric, prospective, observational study, our GS was associated with clinical worsening at 28 days after presentation to an emergency department in patients with suspected or diagnosed COVID-19 (AUC 0.83). The GS, which sums severity grades over 8 chest points, also discriminated moderate or severe from normal or minimal lung damage on the CT severity score (AUC 0.84). Another key finding, although very preliminary, is the concordance in GS scoring between experts and newly trained US operators, with a kappa value of 0.85.
With regard to the existing literature on LUS as a risk stratification tool, our study had reasonable statistical power (N = 328), used both standard and handheld US devices, and relied on a simplified lung score. Although this point is essentially not discussed in the studies we referenced, we took into account "true hospitalization" and ICU-like hospitalization (defined in our investigation centers as patients requiring more than 5 L/min of oxygen therapy) because in the event of […]. eChoVid may serve as a supplementary input for meta-analyses (one meta-analysis 23 did not take into account many relevant studies), mitigating the small size, mono-centricity, or possibly negative findings 17 of many studies on the topic, and helping the medical community define the place of LUS as a risk stratification tool.
Our study supports the value of LUS in the initial clinical assessment of patients with suspected COVID-19 and in triaging patients at high vs low risk of clinical worsening. We merged the severity outcomes into a single composite outcome, which is relevant for measuring a patient's overall risk of clinical deterioration at a first US examination. In addition, when the study protocol was drafted (early March 2020), hospitals in France were overwhelmed with COVID-19 patients, so we added days 5 and 15 as intermediate time points for collecting clinical outcomes. We expected that interim results at days 5 and 15 might show a significant correlation between the US severity score and clinical severity outcomes, which could then be shared with the French medical authorities and learned radiology societies. However, this was not observed in the interim analysis.
The good sensitivity of LUS is not surprising. 12,24 Nevertheless, LUS findings must be interpreted with caution and should not lead clinicians to overlook other causes of dyspnea (eg, pulmonary embolism). 25 Outside a blinded study setting, the specificity of LUS may be improved by interpreting findings in light of the clinical context. LUS may also be useful for assessing the severity of lung lesions; however, especially given the rapidity of PCR testing, there is no substantial argument for using LUS as an initial diagnostic tool.
LUS should not be seen as competing with chest CT, especially when a patient requires a closer assessment of lung status. Indeed, the GS may be difficult to determine in patients with conditions that prevent its interpretation (morbid obesity, subcutaneous emphysema, etc) or with pre-existing lung disease (emphysema or fibrosis), which affects the reliability of the operator's interpretation. However, the GS and CT severity score performed comparably for predicting the primary outcome (death, mechanical ventilation, or ICU admission): AUCs were 0.72 (95% CI: 0.62-0.79) for the GS and 0.73 (95% CI: 0.65-0.82) for CT, and the difference between the two scores was nonsignificant (Mann-Whitney, P = .52).
We standardized the LUS evaluation criteria and simplified them following Buessler et al. 18,26 The GS for LUS severity could be improved. The score carries information about the severity of lung injuries but does not adequately capture their topographic distribution. Indeed, a high grade in a single quadrant and low grades in several quadrants may yield the same GS. Summaries such as the median, mean, or maximum grade are too crude to provide specific or sufficient information on the extent of tissue injury throughout the lung.
To improve the US scoring, an ongoing study is comparing CT and LUS at each chest point. It uses more refined statistics based on machine-learning techniques to explore linear and non-linear effects (such as the non-linear jump in condition between GS values of 0 and 1) and injury distribution patterns across chest zones. Although this approach may improve specificity, a more complex score may not be easy to compute in practice.
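To make the loss of topographic information concrete, the Python sketch below uses hypothetical grades: a single consolidated zone and three zones with grade 1 B-lines give the same GS. The `describe` helper and its descriptors are illustrative only, not a proposed score.

```python
# Illustration with hypothetical grades (not study data): two very different
# injury patterns over the 8 zones produce the same global score.

def describe(grades):
    """GS plus two simple distribution descriptors (illustrative only)."""
    return {
        "gs": sum(grades),
        "involved_zones": sum(1 for g in grades if g > 0),
        "max_grade": max(grades),
    }


focal = [3, 0, 0, 0, 0, 0, 0, 0]    # one zone with consolidation (grade 3)
diffuse = [1, 1, 1, 0, 0, 0, 0, 0]  # grade 1 B-lines in three zones

print(describe(focal))    # {'gs': 3, 'involved_zones': 1, 'max_grade': 3}
print(describe(diffuse))  # {'gs': 3, 'involved_zones': 3, 'max_grade': 1}
```

Simple descriptors such as these separate the two patterns in this toy case, but summaries of this kind remain too crude to describe the extent of injury across the lung.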
In our study, although the agreement between experts and newly trained operators seems promising, training protocols could be improved and tested with a larger pool of newly trained operators. 28,29 Agreement was weaker when assessed chest zone by chest zone, especially because the assessment of B-lines may vary among new trainees. This may be because, together with the study investigators who ran the training sessions (all expert ultrasonographers and experienced trainers), we limited the training protocol to a quick practice of 5 scans. This choice was grounded in experience, and the training focused on a few specific signs rather than the wide range of lung US semiology. Although US training protocols are not well standardized, 30 our protocol might have benefited from practices documented in the literature, such as training with more scans (eg, 25 scans in Reference 31).
We also compared performance for predicting clinical outcomes, and our results show that the global assessment was comparable between experts and new trainees. All images acquired and scored by new trainees were reviewed by two experts (both study investigators), and disagreements were resolved by discussion.
Our study has several limitations. First, we did not evaluate the treatment received by patients, which could substantially modify prognosis and bias our primary outcome. Second, we used a clinical approach and included all patients with suspected COVID-19 and with radiology or RT-PCR confirmation during a period of remarkably high COVID-19 incidence in France. This could have introduced recruitment bias. Our results need to be confirmed in other studies including patients with febrile dyspnea of any etiology.

Conclusion
The LUS severity GS could be used to assess the severity of lung injuries in patients with suspected or diagnosed COVID-19 and to predict clinical worsening at 28 days. LUS findings appeared consistent with chest CT findings. The assessment of training and learning curves requires an enhanced protocol in further studies. The point-of-care nature of the examination, the accessibility of the devices, real-time interpretation, and the non-invasive technology suggest that LUS may be a relevant screening tool for assessing lung injury severity.

Figure 1. Flow chart of included patients.

Figure 2. Receiver operating characteristic (ROC) curve and area under the curve for the association between clinical worsening within the 28-day follow-up and the lung ultrasonography global score (GS).

Table 1. Investigation Centers and Material

Table 3. Details of the Lung Ultrasound Score for Each Quadrant (n = 328)

Table 4. Details of the Lung Ultrasound Global Score for Each Status at 28-Day Follow-Up (n = 328)

Table 5. Association Between Clinical Worsening Within the 28-Day Follow-Up Period and Lung Ultrasound Global Score for Patients Diagnosed With COVID-19 a
a Optimal cutoff point as the point maximizing the Youden index.
Benchoufi et al - Lung Ultrasonography for Predicting COVID-19 Severity Outcomes. J Ultrasound Med 2023; 42:2883-2895

Table 6. Performance of the Lung Ultrasonography GS to Predict Each of the Severity Outcomes: Intensive Care Unit Hospitalization, Invasive Mechanical Ventilation, and Death

Table 7. Performance of the Lung Ultrasound Global Score (GS) to Predict the CT Severity Score, AUC (95% CI) a
a Optimal cutoff point as the point maximizing the Youden index.

Table 8. Bland and Altman Plot for Agreement Between Experts and New Trainees on Each Chest Zone (n = 48, 4 New Trainees)

Table 9. Agreement Between Experts and New Trainees for Discriminating GS ≥7 (N = 48 Pairs of Raters)