Introduction

Chronic obstructive pulmonary disease (COPD) is a respiratory condition characterized by persistent and progressive airflow limitation1. As one of the leading causes of disability and mortality globally, COPD accounted for nearly 3 million deaths in 20162. Owing to an aging population coupled with a high rate of cigarette smoking, the prevalence of COPD in China has increased by 67% in the last 10 years, and the total number of COPD patients is now close to 100 million3. As a result, COPD has become one of the major public health challenges in China, with a heavy attendant economic and social burden.

The 2021 Global Initiative for Chronic Obstructive Lung Disease (GOLD) document states that the pulmonary function test (PFT) is the gold standard for COPD diagnosis. It also identifies PFT as the reference basis for grading the severity of COPD and guiding follow-up treatment, which makes it key to the diagnosis, treatment, and management of COPD1. Studies suggest that the prevalence of undiagnosed and underdiagnosed COPD in primary care is substantial4, with most patients only being diagnosed after they have already lost lung function5,6. Gao and colleagues summarized the current status of PFT use in China and reported that these tests are under-used in primary care; indeed, some primary care centers did not provide them at all7. The low utilization of PFTs has been cited as the main reason for missed diagnosis and underdiagnosis of COPD8. Conducting traditional laboratory PFTs may not be feasible in primary care settings because of the prohibitive costs of acquiring, storing, and maintaining the instruments and the lack of professional technicians capable of operating them. However, referring all patients with suspected COPD to tertiary hospitals for PFTs would increase their costs of seeking medical care.

Portable spirometers are attractive for clinical practice because of their affordability, portability, and ease of operation. Several studies have shown that measurements obtained with portable spirometers are highly consistent with those of traditional spirometers9,10. Portable spirometers have therefore gained prominence in medical practice and clinical research and can offer a suitable alternative for the early detection of COPD in resource-limited healthcare settings. The purpose of this systematic review and meta-analysis was to quantitatively evaluate the diagnostic accuracy and feasibility of portable spirometers in the diagnosis of COPD.

Methods

Study identification and selection

Two authors independently searched the PubMed, Embase, CNKI, Wanfang, and Web of Science databases. The search strategy was based on the following keywords and text words: (“COPD” OR “chronic obstructive pulmonary disease”) AND (“portable spirometers” OR “handheld spirometry” OR “screening tool”), together with related synonym extensions. The search covered January 2000 to July 2021, with no language restrictions.

Inclusion and exclusion criteria

To be included, studies had to designate COPD as the target disease, and participants had to have completed respiratory examinations using both a portable and a traditional spirometer. Although peak flow meters are technically not spirometers, they have been used for COPD detection and pulmonary function evaluation in some studies. We therefore included peak flow meters in order to compare their sensitivity and specificity in COPD detection with those of the spirometers.

The following were excluded from the meta-analysis: (1) studies that did not report the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) or other relevant data needed to construct two-by-two contingency tables; (2) conference proceedings, expert forums, systematic reviews, translations, and similar articles.

Data extraction

Data were extracted and cross-checked independently by two researchers. In case of any discrepancy, a third researcher adjudicated so that a common decision was reached. Information extracted from the studies included: (1) basic information such as author name(s), date of publication, and sample size; (2) use of portable spirometers (including types, clinical setting, and operators); (3) the numbers of TP, FP, FN, and TN, and the thresholds used to define a positive result for the two tests. If a study reported more than one set of data (TP, FP, FN, and TN), the set with the best diagnostic performance was chosen.

Quality assessment

We divided the risk of bias of included studies into “high risk”, “low risk”, and “unclear risk”. The quality of each article included in this meta-analysis was assessed using the QUADAS-2 checklist as provided in Review Manager, version 5.2 (Copenhagen: The Nordic Cochrane Centre, The Cochrane Collaboration, 2012)11.

Ethics statement

Procedures and experiment protocols were performed in accordance with the National Institute of Health Guide for Care and were approved by the Ethics Committee of China Medical University in accordance with the Declaration of Helsinki.

Statistical analysis

Forest plots of sensitivity and specificity were constructed using Review Manager, version 5.2. These plots were used to visually explore the diagnostic accuracy of each test. Statistical analyses were conducted using Stata, version 13.1 (StataCorp, College Station, Texas, USA). The “midas” command was used to fit the bivariate mixed-effects model to estimate the coefficients and the variance-covariance matrix. The same command was used to calculate the pooled sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic odds ratio (DOR) with 95% confidence intervals (CI) for the included studies. A summary receiver operating characteristic (SROC) curve was drawn, and the area under the curve (AUC) was calculated to describe and compare the accuracy of portable spirometers in the diagnosis of COPD. The accuracy of the diagnostic test was evaluated according to the value of the AUC, which was divided into five categories: non-informative (AUC = 0.5), less accurate (0.5 < AUC < 0.7), moderately accurate (0.7 < AUC < 0.9), highly accurate (0.9 < AUC < 1), and perfect tests (AUC = 1)12.
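The per-study quantities behind these summary measures follow directly from a two-by-two table. The sketch below only illustrates that arithmetic and the AUC categories listed above; it does not reproduce the bivariate mixed-effects model fitted by midas. The names `TwoByTwo`, `accuracy_measures`, and `classify_auc` and the example counts are hypothetical, and assigning the boundary values 0.7 and 0.9 to the lower category is an assumption, because the text leaves them unassigned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from comparing an index test with the reference test."""
    tp: int
    fp: int
    fn: int
    tn: int


def accuracy_measures(t: TwoByTwo) -> dict:
    """Return sensitivity, specificity, PLR, NLR and DOR for one study.

    Assumption: no cell is zero (zero cells would need a continuity correction).
    """
    if min(t.tp, t.fp, t.fn, t.tn) <= 0:
        raise ValueError("all four cells must be positive in this sketch")
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    plr = sens / (1 - spec)
    nlr = (1 - sens) / spec
    return {"sensitivity": sens, "specificity": spec,
            "plr": plr, "nlr": nlr, "dor": plr / nlr}


def classify_auc(auc: float) -> str:
    """Map an AUC to the five categories used in the text.

    Assumption: the boundaries 0.7 and 0.9 fall into the lower category.
    """
    if not 0.5 <= auc <= 1.0:
        raise ValueError("AUC is expected to lie between 0.5 and 1")
    if auc == 0.5:
        return "non-informative"
    if auc <= 0.7:
        return "less accurate"
    if auc <= 0.9:
        return "moderately accurate"
    if auc < 1.0:
        return "highly accurate"
    return "perfect"


# Hypothetical single study, not taken from the included articles.
example = accuracy_measures(TwoByTwo(tp=90, fp=20, fn=10, tn=80))
print(example, classify_auc(0.91))
```

Pooled estimates from a bivariate model will generally differ from a simple average of such per-study values, so this is best read as a guide to the definitions rather than to the pooling.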

The I² statistic was used to estimate the heterogeneity among the included studies contributing to the pooled estimate. After the influence of the threshold effect was excluded using Meta-Disc version 1.4 (Unit of Clinical Biostatistics Team of the Ramón y Cajal Hospital, Madrid, Spain), a random-effects model was used to provide a conservative estimate of the summary statistics. Potential heterogeneity was then explored using subgroup analyses. Both were conducted using Stata 13.1. The subgroup analyses grouped studies by threshold selection method (fixed value or cutoff value), type of portable spirometer indicator (multi-index or single index), country (developed or developing), study setting (hospital or general population), place of execution (tertiary hospitals or primary care centers and communities), population (non-COPD or whole population), and year of publication (2000–2016 or 2017–2021). The level of significance α was adjusted for multiple comparisons. Sensitivity analyses were performed to determine the reliability of the results, and Deeks’ funnel plot was used to detect publication bias. Results were considered statistically significant when P < 0.05.
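The threshold-effect check and the heterogeneity statistic described above can be sketched as follows. This is a minimal standard-library illustration, not the Meta-Disc or Stata implementation. The function names and the example tables are hypothetical, the Spearman coefficient is computed as the Pearson correlation of average ranks, I² is computed from Cochran's Q as max(0, (Q − df)/Q) × 100, and applying it to the log DOR with inverse-variance weights is an illustrative choice. The tables are assumed to contain no zero cells.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def threshold_effect_rho(tables):
    """Spearman rho between logit(sensitivity) and logit(1 - specificity)."""
    logit_sens = [logit(tp / (tp + fn)) for tp, fp, fn, tn in tables]
    logit_fpr = [logit(fp / (fp + tn)) for tp, fp, fn, tn in tables]
    return pearson(average_ranks(logit_sens), average_ranks(logit_fpr))


def i_squared(effects, variances):
    """I² (in percent) from Cochran's Q with inverse-variance weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0


# Hypothetical (tp, fp, fn, tn) tables in which sensitivity and the
# false-positive rate rise together, the pattern a threshold effect produces.
tables = [(70, 10, 30, 90), (80, 20, 20, 80), (90, 30, 10, 70)]
rho = threshold_effect_rho(tables)

# Log DOR and its large-sample variance for each hypothetical study.
ln_dor = [math.log(tp * tn / (fp * fn)) for tp, fp, fn, tn in tables]
var_ln_dor = [1 / tp + 1 / fp + 1 / fn + 1 / tn for tp, fp, fn, tn in tables]
print(rho, i_squared(ln_dor, var_ln_dor))
```

A coefficient near zero, such as the one reported in the Results, argues against a threshold effect, whereas a strong positive coefficient, as in the toy tables here, would suggest one.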

Results

Search results

A total of 2578 related articles were obtained in the initial database inspection according to the previously described search strategy (Fig. 1). After removal of duplicate publications and screening of titles and abstracts, 244 articles were identified as being potentially suitable for inclusion. Subsequently, 188 articles were selected after reading the full text. Finally, 29 articles were included in the meta-analysis after applying the inclusion and exclusion criteria (2 in Chinese and 27 in English). One article used two portable spirometers in the same population, and another used one portable spirometer in two different populations; each of these articles was therefore split into two independent studies in the subsequent analysis. Consequently, 31 studies were included in this meta-analysis.

Fig. 1: Study selection for the meta-analysis.

COPD chronic obstructive pulmonary disease.

The studies included in this meta-analysis were conducted in 15 countries (6 in China, 5 in Spain, 3 in the UK, 3 in Japan, 2 in Australia, 2 in Korea, 2 in India, and 1 each in the United Arab Emirates, Germany, the Netherlands, Croatia, Sweden, Iran, Malaysia, and Greece), in tertiary hospitals (10 articles) or in primary care units or community settings (21 articles), and used nine types of portable spirometers. These devices were COPD-6 (n = 14), PIKO-6 (n = 6), PEF (n = 4), Hi-Checker (n = 2), IQ-Spiro (n = 1), Medikro SpiroStar (n = 1), MS01 Micro spirometer (n = 1), SP10BT (n = 1), and Spirobank Smart (n = 1) (Table 1).

Table 1 Characteristics of studies included in the meta-analysis.

Literature bias risk assessment and publication bias

Generally, the quality of the included studies ranged from medium to high. Assessment using the QUADAS-2 tool found that the “patient selection” and “flow and timing” domains were clear for most studies. The risk of bias mainly arose from the method used to select the threshold of the index test and from the lack of strict blinding between the index test and the reference test (Figs. 2 and 3).

Fig. 2: Summary map of risk of bias across domains of the included studies.

Assessed using the QUADAS-2 tool. Key domains: patient selection; index test; reference standard; study flow and timing. The risk of bias is indicated by three colors: red for high risk of bias, yellow for unclear risk of bias, and green for low risk of bias.

Fig. 3: Risk of bias and corresponding applicability concerns across included studies.

Assessed using the QUADAS-2 tool. Key domains: patient selection; index test; reference standard; flow and timing. The risk of bias is indicated by three colors: red for high risk of bias, yellow for unclear risk of bias, and green for low risk of bias.

There was no significant publication bias as determined by Deeks’ funnel plot, in which the angle between the regression line and the horizontal axis was close to 90° (P = 0.34) (Fig. 4).

Fig. 4: Deeks’ funnel plot asymmetry test for evaluation of publication bias.

The closer the angle between the regression line of the Deeks’ funnel plot and the horizontal axis (x) is to 90°, the less likely publication bias is.
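Deeks' asymmetry test, as it is commonly described, regresses the log diagnostic odds ratio on the inverse square root of the effective sample size (ESS), weighted by ESS; a slope close to zero corresponds to the near-vertical regression line in the plot. The sketch below is a minimal weighted least-squares illustration of that idea, not the midas implementation. The function name and the example tables are hypothetical, zero cells are assumed absent, and the P value is omitted because it would require a t distribution with k − 2 degrees of freedom, which the standard library does not provide.

```python
import math


def deeks_regression(tables):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS), with weights = ESS.

    tables: list of (tp, fp, fn, tn) with at least three studies.
    Returns (slope, standard_error).
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        diseased, non_diseased = tp + fn, fp + tn
        ess = 4 * diseased * non_diseased / (diseased + non_diseased)
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log((tp * tn) / (fp * fn)))
        ws.append(ess)

    w_total = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_total
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    standard_error = math.sqrt(rss / (len(tables) - 2) / sxx)
    return slope, standard_error


# Hypothetical studies in which smaller studies report larger DORs,
# the pattern that would tilt the regression line away from vertical.
tables = [(40, 10, 10, 40), (80, 30, 20, 70), (160, 80, 40, 120)]
slope, se = deeks_regression(tables)
print(slope, se)
```

Dividing the slope by its standard error gives the t statistic that underlies the reported P value.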

Diagnostic accuracy of spirometry

The numbers of TP, FP, FN, and TN in the diagnosis of COPD in each study are shown in Fig. 5. The Spearman correlation coefficient between the logit of sensitivity and the logit of 1 − specificity was 0.011 (P = 0.955), indicating that there was no threshold effect. However, the I² values were high (I² = 99%, P < 0.01), so we chose the random-effects model to conservatively estimate the summary statistics. The pooled sensitivity, specificity, PLR, NLR, and DOR with 95% CI were 0.85 (0.81–0.88), 0.85 (0.81–0.88), 5.6 (4.4–7.3), 0.18 (0.15–0.22), and 31 (21–46), respectively. The area under the SROC curve (AUC) was 0.91 (0.89–0.94), which places the portable spirometer in the highly accurate category and indicates performance very close to that of the reference test (Fig. 6).
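As a rough consistency check, the likelihood ratios and the DOR can be recomputed from the rounded pooled sensitivity and specificity of 0.85. The small differences from the reported 5.6 and 31 are to be expected if, as assumed here, the reported values come from the fitted model and unrounded estimates rather than from this back-of-the-envelope arithmetic.

```latex
\begin{align*}
\mathrm{PLR} &= \frac{\text{sensitivity}}{1-\text{specificity}} = \frac{0.85}{0.15} \approx 5.67,\\
\mathrm{NLR} &= \frac{1-\text{sensitivity}}{\text{specificity}} = \frac{0.15}{0.85} \approx 0.18,\\
\mathrm{DOR} &= \frac{\mathrm{PLR}}{\mathrm{NLR}} = \left(\frac{0.85}{0.15}\right)^{2} \approx 32.1.
\end{align*}
```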

Fig. 5: Forest plot of sensitivity and specificity of each screening test.

TP true positive, FP false positive, FN false negative, TN true negative.

Fig. 6: Bivariate summary estimates of sensitivity and specificity, with corresponding 95% confidence ellipse around the mean values, for all studies.

SENS sensitivity, SPEC specificity, SROC summary receiver operating characteristic, AUC area under the curve.

Subgroup analysis

The outcomes of the subgroup analyses are summarized in Table 2. PIKO-6 had the highest diagnostic accuracy, with an area under the SROC curve (AUC) of 0.95 (0.92–0.96). The AUC was 0.91 (0.88–0.93) for COPD-6 and 0.82 (0.78–0.85) for PEF. The differences in diagnostic accuracy indices among COPD-6, PIKO-6, and PEF were statistically significant (all P < 0.0167) after the level of significance α was adjusted for multiple comparisons (Fig. 7). When classified by detection indicator, portable spirometers using FEV1/FEV6 had an AUC of 0.92 (0.90–0.94). There were statistically significant differences in the diagnostic accuracy indices between PEF and FEV1/FEV6 (P < 0.001) and between FEV1/FEV6 and FEV1/FVC (P = 0.007) after the same adjustment.
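The adjusted threshold of 0.0167 matches a Bonferroni-style correction of α = 0.05 across the three pairwise comparisons among three groups. The text does not name the correction method, so treating it as Bonferroni is an assumption, and the helper names below are hypothetical.

```python
from itertools import combinations


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / n_comparisons


def is_significant(p_value: float, adjusted_alpha: float) -> bool:
    return p_value < adjusted_alpha


devices = ["COPD-6", "PIKO-6", "PEF"]
pairs = list(combinations(devices, 2))          # three pairwise comparisons
adjusted = bonferroni_alpha(0.05, len(pairs))   # 0.05 / 3

# P = 0.007 is the value reported for FEV1/FEV6 versus FEV1/FVC.
print(pairs, round(adjusted, 4), is_significant(0.007, adjusted))
```

Under this reading, a comparison with P = 0.02 would count as significant at the unadjusted 0.05 level but not after adjustment.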

Table 2 Subgroup analysis of all included studies.

Fig. 7: The SROC of portable spirometers classified by type.

All the other portable spirometers include IQ-Spiro, SP10BT, Medikro SpiroStar, MS01 Micro spirometer, and Spirobank Smart.

Based on the subgroup analyses, heterogeneity could not be traced to the threshold selection method, study setting, population, or year of publication. However, it could be attributed to the place of execution, the type of portable spirometer classified by indicators, and the country sorted by Human Development Index (HDI). When classified by indicator type, the AUC was 0.92 (0.89–0.94) for the multi-index group and 0.82 (0.78–0.85) for the single-index group; the differences in both sensitivity and AUC were statistically significant (both P < 0.001). When studies were grouped by the clinical setting in which spirometry was conducted, the AUC was 0.96 (0.94–0.97) for the tertiary hospital group and 0.89 (0.86–0.91) for the primary care and community group, with statistically significant differences in AUC and specificity (both P < 0.001). When studies were grouped by country, the AUC was 0.90 (0.88–0.93) for developed countries and 0.94 (0.91–0.95) for developing countries, with statistically significant differences in AUC (P = 0.015) and specificity (P = 0.023).

Discussion

COPD has been widely underdiagnosed so far. PFT has been recommended as the gold standard for COPD diagnosis and monitoring1. However, such tests are not readily available or applied to all patients in need, leading to the absence of standard diagnosis and treatment and, subsequently, the deterioration of COPD13. A decision-analytic model developed by Qu S et al. showed that the portable spirometer is likely the optimal option for early screening and follow-up of patients in China14. In addition, multiple screening questionnaires have been developed as active case-finding tools to identify potential COPD patients in primary care15. Haroon16 compared the diagnostic accuracy of screening tests in primary care in 2015 and found that portable spirometers had a sensitivity of 79.9% (74.2–84.7%) and a specificity of 84.4% (68.9–93.0%). That study concluded that portable spirometers demonstrated higher test accuracy than questionnaires for COPD screening in primary care. However, Haroon’s study included only three references concerning portable spirometers, too few to further analyze the clinical application effects and influencing factors of portable spirometers. To address this gap, we performed a more detailed and comprehensive meta-analysis by including 31 studies for systematic evaluation and quantitative analysis. Across the studies, nine types of portable spirometers were used in three different kinds of medical environments. We excluded the influence of threshold effects and used random-effects models to pool the data. The resulting AUC of 0.91 indicates that the portable spirometer has high accuracy and can be used as an alternative to traditional pulmonary function tests in COPD screening, primary diagnosis, and subsequent monitoring. COPD-6, PIKO-6, and PEF are three commonly used portable spirometers in clinical practice. From a diagnostic accuracy perspective, PIKO-6 has the highest AUC (0.95), followed by COPD-6 (0.91) and PEF (0.82), with statistically significant differences among them (all P < 0.0167).

The heterogeneity in these studies was explored by subgroup analyses. According to the GOLD guideline, post-bronchodilator FEV1/FVC < 0.70 is the criterion for diagnosing COPD1. However, a qualified FVC measurement based on the ATS guideline places high demands on both the subject and the operator. A growing number of studies have indicated that FEV1/FEV6 could serve as an alternative to FEV1/FVC17,18,19. FEV6 is easier to measure than FVC and reduces the probability of spirometry complications. Several portable spirometers, such as COPD-6 and PIKO-6, were designed to measure FEV6 instead of FVC. As mentioned above, some studies in this meta-analysis defined a fixed FEV1/FEV6 < 0.70 as airflow obstruction. However, because the reference formula for FEV6 originated from the lung function database of the Third National Health and Nutrition Examination Survey (NHANES III) conducted in the United States20, there is still debate on whether its application to populations of other countries and regions makes a difference. Some studies have changed the ratio from the fixed value to an optimal cutoff value based on the national population. Therefore, we compared the diagnostic accuracy of portable spirometers using the fixed value with that using the cutoff value. We found that differences in the diagnostic accuracy of portable spirometers were not related to the threshold selection method. Although the cutoff value gives the best diagnostic performance, a fixed value is also acceptable for primary diagnosis or community screening. We also investigated the diagnostic accuracy of all spirometers using the FEV1/FEV6 ratio. Our pooled estimates showed an AUC of 0.92 with FEV1/FEV6, compared with the gold standard using FEV1/FVC. This supports FEV1/FEV6 as an alternative to FEV1/FVC for the diagnosis of COPD. Still, Soares et al. compared the sensitivity of FEV1/FEV6 with that of FEV1/FVC and concluded that, although FEV1/FEV6 showed a good sensitivity of 85.6–95%, its sensitivity decreases in mild airway obstruction21.

We found that the heterogeneity in test accuracy between studies was likely to arise from differences in the type of portable spirometer index, place of execution, and country, but not from the threshold selection method, study setting, population, or year of publication.

Portable spirometers such as PIKO-6 and COPD-6 can provide multiple respiratory function indicators, such as FEV1, FEV6, and FVC. These indicators can help in diagnosing COPD, estimating the severity of the disease, and guiding the choice of an appropriate treatment plan. However, some studies22,23,24,25 still applied PEF for the screening and diagnosis of COPD. As a single-index measurement, PEF has been widely used to diagnose and monitor asthma patients26,27. Liu YN28 and Jackson et al.29 concluded that PEF has extremely high sensitivity (98.5–100%) in screening for moderate to severe COPD. To assess whether the type of indicator affects diagnostic accuracy, we compared multi-index and single-index devices. This meta-analysis showed that the AUC of multi-index spirometry (0.92) was higher than that of single-index spirometry (0.82) (P < 0.001), and multi-index spirometry also had higher sensitivity (85% versus 77%). From a clinical implementation perspective, PEF is inexpensive and easy for both operators and patients to use. Still, its diagnostic accuracy renders it unsuitable for diagnosing COPD; it is more appropriate for the follow-up of patients with stable COPD. Previous studies have shown that PEF could be regarded as a prediction tool for the prognosis of the disease30,31. Large variability in daily PEF indicated an unstable condition and susceptibility to acute exacerbations.

In our study, the PFTs in tertiary hospitals were all conducted by professional technicians, whereas in primary care centers and communities they were completed by trained doctors, nurses, or physician assistants. The AUC obtained with the portable spirometer in tertiary hospitals (0.96) was much higher than that in primary care centers or communities (0.89). These results show that PFTs performed by trained general practitioners, nurses, or laboratory assistants in primary care centers and communities can effectively identify persistent airflow limitation; however, compared with professional technicians, there is still room for improvement in diagnostic accuracy. Previous studies have demonstrated that at least 90% of subjects can obtain acceptable and reproducible results when tested by experienced professional technicians32, but this rate is much lower (58.5–71%) in primary care institutions33,34,35. Taken together, these observations suggest the need to strengthen the supervision of the normative diagnosis and treatment of COPD in resource-limited settings. The professional knowledge of practitioners in medical institutions, and the reproducibility and accuracy of their PFTs, can be significantly improved through standardized training33.

The prevalence of COPD differs across countries and regions. In this study, the diagnostic accuracy of portable spirometers was higher in developing countries than in developed countries. We examined the composition of the two groups and found that the difference may be explained by the place of execution: a larger proportion of studies in developed countries were conducted in primary care or the community (73.91%, 17 articles) than in developing countries (50%, 4 articles). As mentioned above, the quality of PFTs differed between trained general practitioners in primary care and professional technicians in tertiary hospitals. Generally speaking, spirometry conducted by professional technicians in tertiary hospitals reached an acceptable quality, whereas temporarily trained operators in primary care or communities have not yet reached it. Therefore, regardless of country or region, regular training of spirometer operators, especially those in primary care, needs to be strengthened to turn this situation around.

Although it provides useful insights, the present study has some limitations. First, although we used subgroup analyses to explore the sources of heterogeneity, the subgroup variables could not offer a complete explanation. This suggests that other confounding variables may be sources of heterogeneity. Second, only three of the included studies randomized the test order. A portable spirometry test usually precedes the traditional one, which may introduce a test-order bias, that is, a learning effect36. Our results show that portable spirometers exhibit high accuracy even in the presence of a learning effect. Third, in some studies, both PFTs were performed by the same operator, so blinding was not strictly achieved. Finally, the accuracy of the instrument itself, the choice of the target population, differences in research design, and the operating procedures may also affect the accuracy of the results.

In conclusion, portable spirometers have high accuracy in the diagnosis of COPD. Multi-index spirometers, such as COPD-6 and PIKO-6, show superior accuracy to single-index devices. FEV1/FEV6 can be regarded as a viable surrogate for FEV1/FVC in diagnosing COPD. It is worth noting that although portable spirometers are easier to operate than laboratory spirometers, they also need to be used under strict quality control. Standardized training for operators should be strengthened to ensure reliable and reproducible measurements. Portable spirometers are accurate, user-friendly, patient-friendly, inexpensive, and portable, making them suitable for primary care use and providing a feasible pathway for early diagnosis of COPD.