A non-contact camera-based method for respiratory rhythm extraction



Introduction
The analysis of the respiratory signal has been a topic of study in medical practice over the last century, with special relevance to the detection of sleep disorders and respiratory-related pathologies [1,2]. Nowadays, the analysis of the respiratory signal has expanded to fields beyond medical practice, such as the detection of respiratory frequency during exercise [3] or even the assessment of the parasympathetic activity of the autonomic nervous system related to breathing frequency [4].
Advanced driver-assistance systems (ADAS) have also been a persistent research topic in recent years. These systems are designed to increase road safety; one example is drowsiness detection while driving. Such algorithms are of special interest because falling asleep while driving has an average prevalence of 17% in Europe, and 7% of those affected suffered a crash as a direct consequence [5]. To address this issue, multiple systems have been proposed to detect attention [6] or to assess drowsiness by means of physiological variables such as the EEG [7] or the ECG [8]. Although all these systems were designed to acquire and analyse these variables, none of them is suitable for real-life use. Their limitations include a dependence on devices that must be placed on the driver, either on the head (EEG) or on the chest (ECG); moreover, both methods suffer from motion artefacts. Another issue is that the implemented algorithms are not suited to the measurement environment, which involves sudden movements of the subject and requires real-time performance. A novel approach to detect drowsiness based on the acquisition and processing of physiological variables has been presented [9,10]. This method uses the respiratory signal to detect the drowsiness of the subject while driving, and its algorithms are designed to perform in real-life conditions.
On the other hand, the acquisition of physiological variables through non-contact methods has also been a recurring topic in recent years. Nowadays, signals such as respiration or heart rate can be easily acquired with a wide variety of methods [11][12][13], ranging from ultrasound systems [11] to more advanced Doppler radar-based methods [12].
More recently, acquisition methods based on consumer-grade cameras have been proposed [14][15][16][17][18]. These methods comprise a wide variety of processing algorithms, ranging from Eulerian Video Magnification [19] to colour intensity variation analysis or Optical Flow approaches [20]. However, many of the studies found in the literature lack a validation of the obtained respiratory signal in terms of breath-to-breath error, and only a few of them can actually work in real time or under real-life conditions. For example, the methodology proposed in [14] can obtain the physiological variables of multiple subjects, but it does not provide any breath-to-breath error assessment and it relies on a manual ROI selection to work properly. Another respiratory signal extraction method is presented in [16]; although it does not require any ROI, since the algorithm computes the Optical Flow over the whole image to measure the respiratory signal, it may not be suitable for a moving environment where the background is constantly changing. Moreover, that study does not report the error on a breath-to-breath basis but as the mean respiratory rate over a 10 s window. Another non-contact method is [17], which uses an RGB-D camera (Microsoft Kinect) to obtain the respiratory signal, with the ROI obtained from the skeletal pose estimation provided by the Kinect camera. One downside of the method proposed in [17] is that, although it can perform in real time and the ROI is obtained automatically, it requires specialised hardware.
The method proposed in [18] performs in real time and does not require the subject to select a ROI, as it is preselected; however, since the subject needs to stand still in front of the camera, any postural change will negatively affect the extracted respiratory signal. The method in [15] requires a ROI to be selected by the subject before the algorithm can start measuring; this processing step could limit its use inside a vehicle, as multiple subjects may need to recalibrate the ROI before driving. One novelty of the method in [15] lies in its breath-to-breath comparison with a reference method.
The aim of this work is to present a non-contact method [21,22] to acquire the respiratory signal using a consumer-grade camera. The proposed algorithm detects the thoraco-abdominal movements by means of a custom-designed pattern placed on the chest of the subject. The algorithm has been designed to work in real time, as the extracted signal is intended (though not limited) to be used in the future for drowsiness detection [9,10] while driving. To validate the proposed method, the extracted signal has been compared on a breath-to-breath basis with a commercial inductive plethysmography system (Respiband from BioSignalsPlux™ [23]) used as the reference method. A car seat has been used to emulate a real cockpit, as the final objective of this method is to acquire the respiratory signal while driving. The reference signal and the camera video feed were acquired simultaneously on the same computer.
Fig. 1 shows the full diagram of the proposed algorithm, with all its steps and their relationships. Three distinct stages are defined in the diagram: Pattern Detection, Feature Tracking and Signal Extraction. The algorithm was designed to perform in real time, which implies that the execution time of the whole algorithm should not exceed the time interval between two consecutive frames. The proposed algorithm was built on OpenCV version 3.4.

Reference image
The generated pattern and the reference image used by the algorithm are presented in Fig. 2. The pattern consists of a series of vertical and horizontal lines, with three squares at the top and at the bottom of the image, to maximize both the number of features that can be extracted and the contrast between lines and background. An example of the generated pattern is depicted in Fig. 2a. Although the generated pattern maximizes the number of features and corners present in the image as well as the contrast between lines and background, it is a computer-generated image and therefore has no "texture". Since a printed pattern is used on the subject (Fig. 2c), the computer-generated pattern cannot be used as the reference image, as it would produce fewer matching features between the pattern in the scene and the reference image. Instead, a photograph of the printed pattern on a black background was taken and used as the reference image, as shown in Fig. 2b. This latter image provides more information about the real texture (features and characteristics) and is more similar to the pattern placed on the subject than the computer-generated one.
Once the patterns are placed on the subject, as depicted in Fig. 2c, the algorithm can start detecting the pattern in the frame, tracking the obtained features and subsequently extracting the respiratory signal.

Pattern detection
This section refers to the part of the proposed algorithm enclosed by the dotted region in Fig. 1, which comprises: extracting the features of both the reference image and the frame; matching the features to obtain the locations of the patterns in the image; clustering the detected features so that multiple patterns can be found in one frame; and, finally, computing a homography [24] for each pattern (cluster of features) to ensure that no false positives are found.
Feature extraction is performed by means of the ORB algorithm [25]. ORB (Oriented FAST and Rotated BRIEF) is an algorithm proposed by Rublee et al. and implemented in OpenCV, which extracts the most relevant features (characteristics of the image related to corners and texture) from both the reference pattern and the given frame. This algorithm was chosen because it has been shown to be robust to rotations and illumination changes [25] in comparison to other feature extractors such as SIFT [26]. Moreover, its computational efficiency has been demonstrated in different scenarios [25], making it well suited to real-time applications. The maximum number of extracted features is limited to 1000 for the frame and 400 for the reference image; this difference is due to the different image sizes, the frame being a 1080p image and the reference pattern a 108 × 108 px image. To match the features of the reference image with those obtained from the frame, the FLANN matching algorithm [27,28] has been used. This algorithm uses k-nearest neighbours [29] with an l2 norm to match each feature obtained in the previous step, forming feature pairs. After this step, the features that do not match those of the reference image are discarded.
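To illustrate the matching step, the sketch below replaces FLANN's approximate search with an exact brute-force nearest-neighbour search under the l2 norm, as the text describes. The toy two-dimensional descriptors and the `max_dist` rejection threshold are hypothetical (real ORB descriptors are 256-bit binary strings), so this only shows the pairing and discarding logic:

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def l2(a: Descriptor, b: Descriptor) -> float:
    """Euclidean (l2) distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_features(frame_desc: List[Descriptor],
                   ref_desc: List[Descriptor],
                   max_dist: float) -> List[Tuple[int, int]]:
    """Exact 1-nearest-neighbour matching (brute-force stand-in for FLANN).

    Returns (frame_index, reference_index) pairs. Frame features whose
    nearest reference descriptor is farther than the hypothetical
    max_dist threshold are discarded as non-matching.
    """
    pairs = []
    for i, d in enumerate(frame_desc):
        dists = [l2(d, r) for r in ref_desc]
        j = min(range(len(dists)), key=dists.__getitem__)  # nearest neighbour
        if dists[j] <= max_dist:
            pairs.append((i, j))
    return pairs


ref = [[0.0, 0.0], [10.0, 10.0]]
frame = [[0.5, 0.0], [9.0, 10.0], [50.0, 50.0]]
print(match_features(frame, ref, max_dist=2.0))  # [(0, 0), (1, 1)]
```

The third frame descriptor lies far from both reference descriptors and is therefore dropped, mirroring the removal of non-matching features.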
Once the matching features are extracted from the frame, and since more than one pattern can be placed on the subject, a k-means algorithm [30,31] is applied to the features to extract the different feature clusters belonging to each pattern. k-means clustering can be applied because the features that belong to a given pattern are likely to be close to each other while being far from the features that belong to other patterns. This step is crucial to discern the location of the different patterns in the frame and to perform their subsequent tracking, thus allowing multiple measuring points on the subject.
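A minimal sketch of the clustering step on 2-D keypoint coordinates (Lloyd's k-means). It assumes the number of patterns k is known in advance (three in the setup described later) and seeds the centroids with the first k points for reproducibility; in practice OpenCV's `cv2.kmeans` would be the natural choice, and the function below is only illustrative:

```python
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means on 2-D keypoint coordinates.

    Returns one cluster label per point. The first k points are used as
    initial centroids (an assumption made here for reproducibility).
    """
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each keypoint goes to its closest centroid.
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(p[0] for p in members) / len(members),
                                      sum(p[1] for p in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep empty clusters in place
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return labels


# Two well-separated groups of keypoints, as produced by two patterns.
pts = [(0, 0), (100, 100), (1, 1), (101, 99), (0, 2), (99, 101)]
print(kmeans(pts, k=2))  # [0, 1, 0, 1, 0, 1]
```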
After clustering, a homography [24] is estimated for each cluster using RANSAC (Random Sample Consensus) [29,32] to find true pairs between the features obtained in the previous steps and those in the reference image.
Homography, as defined by Vincent et al. [24], is the projective transformation of the same feature between two images; its formal definition is given in Eq. (1).
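In the notation defined in the next paragraph, the relations referred to as Eq. (1) are presumably the standard epipolar constraint and the planar homography mapping (this form is an assumption consistent with [24]):

$$
\mathbf{x}'^{T} F \, \mathbf{x} = 0, \qquad \mathbf{x}' = H \mathbf{x} \tag{1}
$$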
where x′ and x represent a pair of corresponding points in the two images (x′ᵀ denoting the transpose of x′), F is the fundamental matrix relating both points, and H is the projective transformation between the two images when both points are images of the same world point lying on a plane, as defined by Vincent et al. [24].
The basic operation of the RANSAC algorithm to find true pairs of features can be summarised in three simplified steps:
1. The variance-normalized correlation is computed for all feature pairs; if the correlation is sufficiently high, the pair is deemed a candidate.
2. Four points are selected from the candidate pairs and the homography (Eq. (1)) is computed.
3. If the l2 norm between Hx and x′ for a set of candidate pairs is below a certain threshold, the features used are selected as valid.
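Step 3 can be sketched as a reprojection-error test, assuming H has already been estimated from four candidate pairs in step 2 and using a hypothetical 3 px threshold. In practice OpenCV bundles the whole loop into `cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, ransacReprojThreshold)`; the plain-Python version below only shows the inlier criterion ||x′ − Hx|| < threshold:

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def project(H: Matrix3, p: Point) -> Point:
    """Apply a 3x3 homography to a 2-D point using homogeneous coordinates."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)  # back to inhomogeneous image coordinates


def inlier_mask(H: Matrix3, pairs: List[Tuple[Point, Point]],
                thresh: float = 3.0) -> List[bool]:
    """Mark a pair (x, x') as valid when ||x' - Hx||_2 is below thresh pixels."""
    mask = []
    for x, x_prime in pairs:
        hx = project(H, x)
        err = ((x_prime[0] - hx[0]) ** 2 + (x_prime[1] - hx[1]) ** 2) ** 0.5
        mask.append(err < thresh)
    return mask


# A pure translation by (5, -2) pixels; the last pair is an outlier.
H = [[1.0, 0.0, 5.0], [0.0, 1.0, -2.0], [0.0, 0.0, 1.0]]
pairs = [((0.0, 0.0), (5.0, -2.0)), ((10.0, 10.0), (15.0, 8.5)), ((3.0, 4.0), (20.0, 20.0))]
print(inlier_mask(H, pairs))  # [True, True, False]
```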
A more detailed explanation of how Vincent et al. apply the RANSAC algorithm to each pair of features can be found in [24]. These algorithms are used to find true matches between pairs of features (reference image vs. predicted pattern) in order to guarantee a one-to-one (bijective) correspondence between the features and to remove any possible outliers. This step is performed once for each cluster. As a result, each of the remaining features corresponds to a unique characteristic (corner or texture) present in both the reference image and the pattern in the scene.
Finally, once all the clusters are analysed, the remaining features in each cluster are marked as features to track and passed on to the next section of the algorithm.

Feature tracking
Once the pattern features are located in the frame and there are enough features to track, the next step is to follow the evolution of these features frame by frame. This tracking is performed by means of the pyramidal implementation of the Kanade-Lucas-Tomasi (KLT) optical flow algorithm [20]. Bouguet et al. define optical flow as the estimated movement of an object in a frame given two consecutive images. A more formal description of the algorithm, taken from [20], is given in Eq. (2). As Eq. (2) shows, two consecutive frames are needed for this algorithm to work: after the detection stage, the frame originally used to locate the features and the next frame from the camera are used. For clarity, the original frame will be named frame1 and the next frame from the camera frame2 from now on.
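For reference, the residual minimised by the pyramidal KLT tracker in Bouguet's formulation [20] is reproduced below as the likely content of Eq. (2) (an assumption): for a feature at u = (u_x, u_y) in frame1 (image I), the displacement d = (d_x, d_y) into frame2 (image J) is the vector that minimises the intensity mismatch over a (2ω_x + 1) × (2ω_y + 1) integration window:

$$
\varepsilon(\mathbf{d}) = \varepsilon(d_x, d_y) = \sum_{x=u_x-\omega_x}^{u_x+\omega_x} \; \sum_{y=u_y-\omega_y}^{u_y+\omega_y} \big( I(x, y) - J(x + d_x, \, y + d_y) \big)^2 \tag{2}
$$

The pyramidal variant solves this problem coarse-to-fine on downsampled copies of both frames, which allows larger displacements than the window size alone would permit.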
The tracking algorithm and the feature verification have been taken from the lk_track.py [33] example of the OpenCV library. The algorithm is implemented as follows:
1. First, the locations of each feature on frame2 are predicted with the KLT algorithm.
2. The predicted locations are then used to compute a prediction of the original locations on frame1 (i.e., the features are tracked back from frame2 to frame1).
3. The l2 norm between the original features and the back-predicted original features from the previous step is computed. Features whose norm exceeds 1 pixel are discarded. This threshold was chosen empirically, as it yielded the best results.
4. Finally, the features that were not discarded in the previous step are updated with their new locations (frame2) and used to compute the respiratory signal.
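The forward-backward check of steps 1 to 4 can be sketched as below. With OpenCV, `p1` would come from `cv2.calcOpticalFlowPyrLK(frame1, frame2, p0, None)` and `p0r` from `cv2.calcOpticalFlowPyrLK(frame2, frame1, p1, None)`; here the point lists are given directly so the filtering logic stays self-contained. The sketch follows the text in using the l2 norm, whereas the lk_track.py sample compares the largest absolute coordinate difference instead:

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def forward_backward_filter(p0: Sequence[Point], p1: Sequence[Point],
                            p0r: Sequence[Point], max_err: float = 1.0) -> List[Point]:
    """Keep tracked points whose round trip frame1 -> frame2 -> frame1 is consistent.

    p0:  original feature locations on frame1
    p1:  forward-tracked locations on frame2 (step 1)
    p0r: locations tracked back from frame2 to frame1 (step 2)
    A feature is discarded when ||p0 - p0r||_2 exceeds max_err pixels (step 3);
    survivors are returned at their new frame2 locations (step 4).
    """
    kept = []
    for orig, fwd, back in zip(p0, p1, p0r):
        err = ((orig[0] - back[0]) ** 2 + (orig[1] - back[1]) ** 2) ** 0.5
        if err <= max_err:
            kept.append(fwd)
    return kept


p0 = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]
p1 = [(11.0, 10.0), (21.0, 19.0), (35.0, 35.0)]
p0r = [(10.2, 10.1), (20.0, 21.5), (30.0, 30.5)]  # second point "wandered" 1.5 px
print(forward_backward_filter(p0, p1, p0r))  # [(11.0, 10.0), (35.0, 35.0)]
```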
Computing the distance between the original location and the back-predicted one is crucial to ensure the reliability of the pattern feature positions throughout the video feed [33], as this step prevents errors in the respiratory signal caused by features "wandering" off the region of interest.
It should be noted that the tracking stage does not update the position of the whole pattern but only that of its features, as the actual position of the pattern is not needed to obtain the respiratory signal. The next subsection shows how the tracked features, rather than the pattern, are used to obtain the respiratory signal.

Signal extraction
The respiratory signal is extracted from the features of each pattern by computing the centroid of their locations and obtaining the distance from this centroid to the origin of the image (its upper left corner). The centroid is computed by averaging the x and y coordinates of the features, as shown in Eq. (3).
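Assuming (x_i, y_i) denote the image coordinates of the i-th tracked feature of a cluster and d the resulting centroid-to-origin distance (symbols introduced here for illustration), Eqs. (3) and (4) take the following form:

$$
x_{avg} = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad y_{avg} = \frac{1}{N} \sum_{i=1}^{N} y_i \tag{3}
$$

$$
d = \sqrt{x_{avg}^2 + y_{avg}^2} \tag{4}
$$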
where N is the number of features in each cluster, x_avg is the averaged x component and y_avg is the averaged y component. Once the centroid is computed, its l2 norm (Eq. (4)) is calculated to obtain the distance between the centroid and the origin of coordinates.
To extract the respiratory signal from the pattern, the pattern must be placed on the chest of the subject. The variations in the computed distance are then proportional to the displacement of the thorax [13], and hence to the respiration of the subject. The concatenation of this distance over successive frames forms the Raw Respiratory Signal, as shown in Fig. 3.
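A minimal sketch of Eqs. (3) and (4) and of the concatenation into the raw signal, assuming each frame provides the list of tracked (x, y) feature coordinates of one pattern in pixels, with the origin at the upper left corner (function names are illustrative):

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def centroid_distance(features: Sequence[Point]) -> float:
    """Distance from the features' centroid to the image origin (top-left corner)."""
    n = len(features)
    x_avg = sum(x for x, _ in features) / n  # Eq. (3), x component
    y_avg = sum(y for _, y in features) / n  # Eq. (3), y component
    return math.hypot(x_avg, y_avg)          # Eq. (4), l2 norm


def raw_respiratory_signal(frames: Sequence[Sequence[Point]]) -> List[float]:
    """Concatenate one distance sample per frame into the raw respiratory signal."""
    return [centroid_distance(f) for f in frames]


frames = [
    [(3.0, 4.0), (3.0, 4.0)],  # centroid (3, 4) -> 5
    [(6.0, 8.0), (0.0, 0.0)],  # centroid (3, 4) -> 5
    [(2.0, 0.0), (4.0, 0.0)],  # centroid (3, 0) -> 3
]
print(raw_respiratory_signal(frames))  # [5.0, 5.0, 3.0]
```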
Once a respiratory point is extracted, the algorithm loops back to receive a new frame from the camera. If there are enough features left from the last iteration, the algorithm keeps tracking them and extracts a new point of the respiratory signal; otherwise, it performs a new pattern detection to extract new features from the given frame.
The proposed architecture performs pattern detection only when needed; for this reason, the speed of the proposed method while extracting the respiratory signal is limited only by the tracking and extraction stages, not by the detection of features in the frame.
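The detect-or-track control flow can be sketched as a small loop. The callables `detect`, `track` and `extract` and the `min_features` threshold are hypothetical placeholders for the three stages of Fig. 1 (the text does not state the minimum feature count), so the example only illustrates why the slow detector is invoked sparingly:

```python
from typing import Callable, Iterable, List, Optional, Tuple

Point = Tuple[float, float]


def run_pipeline(frames: Iterable[object],
                 detect: Callable[[object], List[Point]],
                 track: Callable[[object, object, List[Point]], List[Point]],
                 extract: Callable[[List[Point]], float],
                 min_features: int = 10) -> List[float]:
    """Detect-or-track loop: the slow detector runs only when tracking runs dry.

    detect, track, extract and min_features are hypothetical stand-ins for the
    Pattern Detection, Feature Tracking and Signal Extraction stages.
    """
    signal: List[float] = []
    features: List[Point] = []
    prev: Optional[object] = None
    for frame in frames:
        if prev is None or len(features) < min_features:
            features = detect(frame)                 # expensive path
        else:
            features = track(prev, frame, features)  # cheap path
        if len(features) >= min_features:
            signal.append(extract(features))         # one respiratory sample
        prev = frame
    return signal


calls = {"detect": 0}


def fake_detect(frame):
    calls["detect"] += 1
    return [(0.0, 0.0)] * 12


def fake_track(prev, frame, feats):
    return [] if frame == 3 else feats  # tracking is lost at frame 3


sig = run_pipeline(range(6), fake_detect, fake_track, lambda f: float(len(f)))
print(len(sig), calls["detect"])  # 5 2
```

Over six frames the detector runs only twice: once at start-up and once after tracking is lost, while every other frame goes through the cheap tracking path.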

Setup
The reference method used to validate the proposed method is the RespiBand inductive plethysmographic system from BioSignalsPlux™ [23] (PLUX wireless biosignals S.A., Portugal), which consists of a thoracic band and a Bluetooth transmitter. This system acquires the respiration of the user by sensing the volumetric changes of the thorax by means of an inductive band. The displacement is sampled at 40 Hz with a 12-bit ADC, and the signal is band-pass filtered with a first-order analogue filter between 0.058 Hz and 0.9 Hz. The sampled signal is sent to the computer via a Bluetooth classic serial port.
The consumer-grade camera used in the setup was the Logitech™ C920 (Logitech International S.A., Switzerland). A consumer-grade camera was chosen for this experiment to ensure that the proposed method could work without dedicated hardware. The camera was configured to acquire at 15 frames/s with full HD resolution (1080p). Although the camera can record at 30 fps in full HD, preliminary tests showed a systematic frame rate drop, which in turn produced a respiratory signal that was not sampled at a regular frequency. This is not a problem in real-life situations, where lost samples can be interpolated, but avoiding it eases the subsequent comparison with the reference method. Also, automatic exposure was disabled and white balance was locked in order to keep the frame rate constant. The field of view (FOV) of the camera is 74.42° × 43.30° (H × V).
The light source used to illuminate the subjects was a Verbatim LED bulb (reference 52130) with the following specifications: warm white colour (CCT: 3000 K), 6.5 W, luminous flux of 480 lm and a beam angle of 130°. The light source was placed 70 cm from the subject using a parabolic light holder.
Fig. 4a shows the placement of the camera within the setup. The camera was placed approximately 70 cm from the subject, as shown in Fig. 4b. Three patterns were placed on top of the seatbelt, inside the field of view of the camera. The Respiband system was placed on the subject's thorax, below the chest. The locations of the patterns and of the Respiband system are shown in Fig. 2c.
Prior to each test, the lighting in the room was conditioned and both the exposure and the white balance of the camera were fixed, to ensure the same level of illumination for each subject. Although the algorithm was conceived to run in real time, the video feed from the camera and the reference signal were recorded on a laptop PC to make the study more reliable and repeatable and to allow further analysis of the obtained signals. Both signals were later synchronized using the PC timestamps.
The PC used was an ASUS ROG gaming laptop with an Intel i7-4710HQ CPU, an Nvidia GeForce GTX 850M GPU and 8 GB of RAM. The OpenCV library was compiled without CUDA support, and only multithreading support (with default settings) was enabled.

Measurement protocol
Twenty-one healthy subjects (10 of them female) aged between 20 and 54 years (mean: 26.6 years, SD: 6.8 years), with heights between 160 cm and 190 cm (mean: 170.8 cm, SD: 7.4 cm) and chest perimeters between 74 cm and 110 cm (mean: 88.4 cm, SD: 10.2 cm), volunteered for the study. Each subject gave oral informed consent to participate freely in this study, which was performed in accordance with the principles of the Declaration of Helsinki [34]. All measurements complied with the regulations of the Universitat Politècnica de Catalunya (UPC).
Prior to the measurements, each subject was asked to put on the RespiBand system below the chest, near the abdominal region and on top of the belly, to sit on the seat, to fasten the seatbelt of the setup and to remain as still as possible during the test.
Each subject was asked to perform four tests, in each of which the subject breathed at a given frequency or under a given constraint: in two of the four tests the subject had to breathe at 0.1 Hz and at 0.3 Hz, respectively, and in the remaining two tests to breathe freely and to read a text out loud. To aid the subject in the first two tests (constant frequency), a custom visual aid was developed. The aid consisted of a moving bar, with 1/3 of the period for inhaling and 2/3 of the period for exhaling. Each test lasted 3 min, with a 30 s pause between tests. In total, the four tests took approximately 15 min per subject.
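The timing of the visual aid can be illustrated with a small function. The linear rise and fall of the bar and the function name are assumptions (the text only states the 1/3 inhale and 2/3 exhale split); at 0.1 Hz this split gives about 3.3 s of inhalation and 6.7 s of exhalation per cycle:

```python
def bar_position(t: float, f: float) -> float:
    """Hypothetical breathing-guide bar height in [0, 1] at time t (s) for frequency f (Hz).

    The bar rises linearly during the first third of each period (inhalation)
    and falls linearly during the remaining two thirds (exhalation).
    """
    period = 1.0 / f
    phase = (t % period) / period      # position within the current cycle, in [0, 1)
    if phase < 1.0 / 3.0:
        return phase * 3.0             # inhale: bar goes from 0 up to 1
    return (1.0 - phase) * 1.5         # exhale: bar goes from 1 back down to 0


# 0.1 Hz guide: top of the bar after 10/3 s, half-way down at 5 s.
print(bar_position(0.0, 0.1), bar_position(5.0, 0.1))  # 0.0 0.75
```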

Signal processing
Prior to the analysis of the acquired signals, a normalization must be performed. The same procedures and methodologies used in [22,35] have been applied in this article (Fig. 5). The normalization steps applied to both signals were the following:
1. The signals were interpolated at 40 Hz using a cubic spline, in order to homogenise the sampling frequencies of both methods.
2. A band-pass filter between 0.05 Hz and 1 Hz was applied to eliminate undesired components and to remove possible baseline drifts in the signal. The applied filter was a digital zero-phase, 2nd-order, bidirectional Butterworth filter.
3. A moving median filter [36] was applied to the signal to remove peaks induced by the previous stage, produced by transitory periods due to rapid involuntary movements of the subject (which do not trigger the pattern detection stage). The filter window was set to three seconds; this length proved sufficient to smooth the signal while being shorter than an average breath cycle. The resulting signal was obtained by subtracting the median-filtered signal from the original signal.
4. Finally, to compress the signal between −1 and 1, a non-linear function [37] was applied, as described in Eq. (5).

$$
S_n[n] = \arctan\left(\frac{S[n]}{\bar{S}}\right) \tag{5}
$$
where $S_n[n]$ is the discrete normalized respiratory signal, $S[n]$ is the raw respiratory signal after resampling and filtering, and $\bar{S}$ is the mean of $S[n]$.
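Steps 1 and 2 of the normalization would typically rely on library routines (for instance SciPy's `CubicSpline`, and a `butter` design applied with `filtfilt` for the zero-phase bidirectional filtering), so the sketch below covers only step 3. It is a minimal, standard-library illustration assuming a centred window of `window_s` seconds that is simply truncated at the signal edges; the edge handling and the function name are assumptions, not details given in the text:

```python
from statistics import median
from typing import List, Sequence


def median_detrend(signal: Sequence[float], fs: float, window_s: float = 3.0) -> List[float]:
    """Subtract a centred moving median from the signal (normalization step 3).

    fs is the sampling frequency (40 Hz after step 1) and window_s the window
    length in seconds (3 s in the text). Near the edges the window is truncated
    to the available samples (an assumption of this sketch).
    """
    half = int(round(window_s * fs)) // 2
    out = []
    for n in range(len(signal)):
        lo = max(0, n - half)
        hi = min(len(signal), n + half + 1)
        out.append(signal[n] - median(signal[lo:hi]))
    return out


# A linear drift is removed everywhere except at the truncated edges.
print(median_detrend([1.0, 2.0, 3.0, 4.0, 5.0], fs=1.0))  # [-0.5, 0.0, 0.0, 0.0, 0.5]
```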

Error characterization
To characterize the error between the signal obtained from the proposed method and that of the reference method, the respiratory cycle (RC) series has been computed for each measurement method, following the methodology described in [35]. Each RC series was obtained using the following steps:
1. First, both respiratory signals were aligned using the intra-class Fisher correlation (ICC) [38], iterating over a lag range of one period. Searching over one period ensures that the best alignment between both signals is found.
2. The first 10 s of each signal were cropped to avoid the initial transient of the filters.
3. The 65th percentile of the respiratory signal was computed to obtain a threshold.
4. This threshold was used to detect positive-slope crossings in the respiratory signal.
5. The time between consecutive positive-slope crossings was computed to obtain the length of each respiratory cycle, i.e., the RC series.
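Steps 3 to 5 can be sketched in plain Python as follows. The sketch assumes the signal has already been aligned and cropped (steps 1 and 2), uses the linear-interpolation percentile convention (NumPy's default), and counts a positive-slope crossing when a sample below the threshold is followed by one at or above it; these conventions and the function names are illustrative assumptions:

```python
from typing import List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile with linear interpolation between closest ranks."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def rc_series(signal: Sequence[float], fs: float, q: float = 65.0) -> List[float]:
    """Respiratory-cycle durations (s) from positive-slope threshold crossings."""
    thr = percentile(signal, q)                     # step 3
    # Step 4: sample indices where the signal crosses the threshold upwards.
    ups = [n for n in range(1, len(signal)) if signal[n - 1] < thr <= signal[n]]
    # Step 5: cycle length = time between consecutive upward crossings.
    return [(b - a) / fs for a, b in zip(ups, ups[1:])]


# Square-like breathing signal: period of 4 samples at fs = 2 Hz, i.e. 2 s cycles.
sig = [0.0, 0.0, 1.0, 1.0] * 4
print(rc_series(sig, fs=2.0))  # [2.0, 2.0, 2.0]
```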
To assess the accuracy of the respiratory cycle detection, a cycle-to-cycle comparison has been performed using the following statistics: mean absolute error (MAE) (6), mean absolute percentage error (MAPE) (7) and standard deviation of the error (SDE) (8). The intra-class Fisher correlation (ICC) has also been computed between the respiratory signals of the two studied methods.
where S[i] represents the RC series obtained from the proposed method, G[i] the RC series obtained from the reference method, N the total number of breath cycles per subject, and k the analysed subject.
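Since Eqs. (6) to (8) are defined per subject over matched cycle pairs, a minimal sketch of the three statistics could look like the following. It assumes the usual definitions (MAPE relative to the reference cycle G[i], SDE as the sample standard deviation of S[i] − G[i]); the exact normalisation used in the study may differ:

```python
import statistics
from typing import Dict, Sequence


def cycle_errors(S: Sequence[float], G: Sequence[float]) -> Dict[str, float]:
    """MAE (s), MAPE (%) and SDE (s) between matched RC series of one subject.

    S: cycle durations from the proposed method; G: from the reference method.
    SDE is taken here as the sample standard deviation of S[i] - G[i]
    (an assumption about Eq. (8)).
    """
    if len(S) != len(G) or len(S) < 2:
        raise ValueError("need two equally long series with at least 2 cycles")
    err = [s - g for s, g in zip(S, G)]
    n = len(err)
    return {
        "MAE": sum(abs(e) for e in err) / n,
        "MAPE": 100.0 * sum(abs(e) / g for e, g in zip(err, G)) / n,
        "SDE": statistics.stdev(err),
    }


print(cycle_errors([4.0, 5.0, 6.0], [4.0, 4.0, 4.0]))
```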
To verify the accuracy of the comparison between RC series, and to evaluate the performance of cycle detection on both the proposed and the reference methods, a confusion matrix has been computed with the following parameters: true positives (TP), false positives (FP), false negatives (FN), sensitivity (SEN) and positive predictive value (PPV).
Finally, a Bland-Altman [39] plot has been computed for each test, comparing all the respiratory cycles from the proposed method with those obtained from the reference method, without distinguishing between subjects. The Bland-Altman plots have been computed using the methodology described in [39]: the x axis contains the mean of the cycles of the proposed and reference methods, while the y axis contains the cycles of the proposed method minus those of the reference method. The limits of agreement have been computed as d̄ ± 1.96s, where d is the difference between the cycles of the proposed method and those of the reference method, d̄ is the mean of the differences and s is the standard deviation of the differences.
Fig. 6 shows an example of a processed respiratory signal obtained with the reference method compared with one obtained with the proposed method; the same figure also compares the respiratory cycles obtained from both methods.
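The confusion-matrix rates and the Bland-Altman limits of agreement reduce to a few lines. This sketch assumes the standard definitions SEN = TP/(TP + FN) and PPV = TP/(TP + FP); how a detected cycle is counted as TP, FP or FN (e.g. the matching tolerance between cycles) is not specified here and is left out:

```python
import statistics
from typing import Sequence, Tuple


def sen_ppv(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Sensitivity and positive predictive value, both in percent."""
    return 100.0 * tp / (tp + fn), 100.0 * tp / (tp + fp)


def bland_altman(proposed: Sequence[float],
                 reference: Sequence[float]) -> Tuple[float, float, float]:
    """Mean difference d_bar and 95% limits of agreement d_bar -/+ 1.96 s."""
    d = [p - r for p, r in zip(proposed, reference)]  # proposed minus reference
    d_bar = statistics.mean(d)
    s = statistics.stdev(d)
    return d_bar, d_bar - 1.96 * s, d_bar + 1.96 * s


print(sen_ppv(90, 10, 10))                            # (90.0, 90.0)
print(bland_altman([1.0, 2.0, 3.0], [0.0, 2.0, 4.0]))  # (0.0, -1.96, 1.96)
```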

Computational cost
The computational cost of the algorithm was obtained by timing the Pattern Detection stage and the combined Feature Tracking and Signal Extraction stage. The results (mean ± SD) were 254.7 ± 2.8 ms for the Pattern Detection stage and 12.4 ± 0.54 ms for the combined Feature Tracking and Signal Extraction stage; the latter translates into a maximum of about 80 frames/s at 1080p in the tracking stage.
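As a quick check of the real-time budget stated earlier (execution time below the inter-frame interval), the measured timings can be compared with the 15 frames/s acquisition rate used in the setup:

$$
f_{\max} = \frac{1}{12.4\ \mathrm{ms}} \approx 80.6\ \mathrm{frames/s}, \qquad
T_{\text{frame}} = \frac{1}{15\ \mathrm{frames/s}} \approx 66.7\ \mathrm{ms}
$$

$$
12.4\ \mathrm{ms} < T_{\text{frame}} \approx 66.7\ \mathrm{ms} < 254.7\ \mathrm{ms} \approx 3.8\, T_{\text{frame}}
$$

Under this reading, the tracking and extraction path fits comfortably within one frame interval, whereas a detection pass spans roughly four frame intervals, which is consistent with running detection only when too few features remain.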

Statistics
To clarify the following tables and figures, a naming convention has been adopted: the respiration test at 0.1 Hz is named "0.1 Hz", the test at 0.3 Hz "0.3 Hz", the free breathing test "Free", and the test in which the subject was asked to read a text out loud "Reading". Table 1 shows the results obtained after characterizing the error between methods for each of the four tests. Regarding the correlation between signals, the 0.1 Hz test presented the highest correlation (0.945) and the Reading test the lowest (0.85). The MAE and SDE results show that the 0.3 Hz test presented the lowest mean and standard deviation of the error and the 0.1 Hz test the highest. Finally, the MAPE results showed that the 0.1 Hz test had the highest accuracy and the Reading test the lowest. Differences between the 0.1 Hz, 0.3 Hz, Free and Reading tests were assessed with paired t-tests for all possible combinations; the only significant differences (p < 0.05) were found between the 0.3 Hz test and each of the other tests, while all other comparisons were non-significant.
Table 2 summarizes the aggregated confusion matrix over all subjects for each test. The SEN results showed that all tests had a sensitivity greater than 90%, the lowest being the Reading test with 94.02%. The PPV results show that the PPV was greater than 95% for the 0.1 Hz, 0.3 Hz and Free tests, the Reading test being the only one below 90%.
Fig. 7 contains four Bland-Altman plots, one per test, comparing all the respiratory cycles (RC) obtained from the reference method with those from the proposed method, without distinguishing between subjects. As shown in Table 3, the 0.1 Hz test had the highest mean error and the Reading and 0.3 Hz tests the lowest.
Regarding the standard deviation of the differences, the 0.1 Hz test presented the highest deviation and the 0.3 Hz test the lowest, in agreement with the results shown in Table 1.

Discussion
In this study, an algorithm to extract the respiratory signal of a subject using a consumer-grade camera has been presented. The proposed algorithm runs at up to 80 fps in its tracking stage, which makes it feasible for real-life environments and real-time situations. As can be seen in Fig. 6, the respiratory signal extracted with the proposed algorithm shows high agreement with the reference method.
According to the results in Table 1, the controlled respiration tests (0.1 Hz and 0.3 Hz) showed a higher ICC between signals than the uncontrolled tests. This could be due to involuntary movements of the subject, which produce artefacts and reduce the correlation between signals. In the case of the Reading test, it could also be explained by the changes induced in the respiratory signal by reading a text out loud.
Although there was a high correlation between methods, correlation alone does not indicate how accurate the proposed method is, or how close the detected cycles are to those detected with the reference method. For this reason, the MAE, SDE and MAPE of the respiratory cycles have also been computed.
Regarding the MAE results, the 0.3 Hz test showed the lowest mean and standard deviation. This can be explained by the shorter respiratory cycles of the 0.3 Hz test: the absolute error in the estimation of each cycle is lower than in the other tests, which reduces the mean and standard deviation of the error. The 0.1 Hz test showed the highest mean and standard deviation. This test also had the longest respiratory cycles (Table 2 shows 350 cycles for the 0.1 Hz test versus 1040 for the 0.3 Hz test, with practically the same number of FP and FN), which implies that any misalignment between the RC series produces a higher error (mean and standard deviation) than when the cycles are short. For the Free and Reading tests, the MAE results were practically the same, with the Free test showing the higher SD; this can be explained by the fact that both tests have approximately the same cycle length.
Regarding the SDE results, the 0.3 Hz test had the lowest SD and the 0.1 Hz test the highest. These results have the same interpretation as before: tests with shorter cycles presented lower errors. For both MAE and SDE, all tests had a mean and standard deviation below 0.5 s, which is a fairly low error.
The MAPE results for the 0.1 Hz and 0.3 Hz tests were very similar, the 0.1 Hz test having the lowest mean (highest accuracy). The Free and Reading tests, on the other hand, presented a higher mean and SD due to the more irregular nature of the respiratory signal in such tests, the Reading test having the highest error. It can also be appreciated that the MAPE results showed a higher error when
To assess the error when computing the cycles from the respiratory signal, the SEN and PPV results are shown in Table 2. The test with the lowest SEN was the Reading test. This can be interpreted as follows: when reading a text out loud, every subject had their own way of breathing, which translates into different artefacts for each subject and in turn increases the total number of FN cycles. Comparing the Free and Reading tests, both have practically the same number of TP, but for the reasons explained above the Reading test almost doubles the number of FP, with a significant increase in FN. For the 0.1 Hz test the SEN is 96.69%, which can be explained by its low number of TP compared with the rest of the tests. Finally, the 0.3 Hz test has the highest SEN (98.77%). From the SEN results it can be inferred that, even in the worst-case scenario (Reading), the proposed algorithm achieves a sensitivity above 90% when detecting cycles from the extracted respiratory signal.
Regarding the PPV results, the Reading test has the lowest value, which may be due to the same reasons that explain its low SEN. For the rest of the tests, the PPV is greater than 95%, which indicates accurate cycle detection by the proposed method in comparison with the reference method.
The Bland-Altman results in Fig. 7 and Table 3 show that the mean difference is practically zero for all the tests, the highest being 9 ms for the 0.1 Hz test. Regarding the constant frequency tests, the 0.3 Hz test presents narrower limits of agreement than the 0.1 Hz test, with the cycles concentrated in a narrower interval on both axes. The 0.1 Hz test is the one with the highest SD (534 ms), which can be explained by the fact that its low number of Avg. Cycles induces more variability. These results for the constant breathing tests agree with those presented in Table 1, where the 0.3 Hz test also presents lower MAE and SDE than the 0.1 Hz test. The Free and Reading tests both present a higher dispersion than the constant breathing tests, with comparable limits of agreement between them and a higher SD for the Reading test than for the Free test, as can be seen in Table 3. Finally, no bias can be appreciated in any case, and a high agreement between the reference method and the proposed method is found for all tests.
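For readers less familiar with the Bland-Altman analysis, the sketch below computes the quantities discussed above for paired cycle periods: the mean of both methods for each cycle (x axis), their difference (y axis), the bias (mean difference) and the conventional 95% limits of agreement, bias ± 1.96 SD. The function name `bland_altman` is hypothetical.

```python
import statistics


def bland_altman(ref, est):
    """Bland-Altman quantities for paired measurements (e.g. cycle periods in s).

    Returns the per-pair means (x axis), the differences est - ref (y axis),
    the bias (mean difference) and the 95% limits of agreement (bias +/- 1.96 SD).
    """
    diffs = [e - r for r, e in zip(ref, est)]
    means = [(e + r) / 2 for r, e in zip(ref, est)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)
```

A bias close to zero, as reported for all tests, means that neither method systematically over- or underestimates the cycle periods; the width of the limits of agreement reflects the dispersion compared across tests in Table 3.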
In order to compare the proposed algorithm with the one proposed by Massaroni et al. [15], the results of this study have been converted to the breaths/min metrics used in [15]; they are shown in Table 4. Only the results of the Free test can be compared, as [15] does not use controlled respiratory frequencies.
The mean MAE obtained for the Free test is 0.862 breaths/min. Compared with the best result in [15] (0.55 breaths/min) this is slightly worse, but compared with their worst result (1.53 breaths/min) it is better. For the Bland-Altman (BA) analysis, the proposed algorithm has a mean ± SD of 0.026 ± 1.425 breaths/min for the differences in the Free test, while [15] reports −0.03 ± 1.78 breaths/min for the best result and −0.06 ± 2.08 breaths/min for the worst. Taking both the MAE and BA results into account, the proposed algorithm performs slightly better in terms of breath-by-breath detection than the one presented in [15].
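The conversion to breaths/min mentioned above is not detailed in this section. A minimal sketch, assuming each cycle period T (in seconds) is converted to an instantaneous rate of 60 / T breaths/min before computing the MAE, could look like this; the function names are hypothetical.

```python
def to_breaths_per_min(periods_s):
    """Convert cycle periods in seconds to instantaneous rates in breaths/min."""
    return [60.0 / t for t in periods_s]


def rate_mae(ref_periods, cam_periods):
    """MAE in breaths/min between matched cycles (assumed conversion 60 / T)."""
    ref = to_breaths_per_min(ref_periods)
    cam = to_breaths_per_min(cam_periods)
    return sum(abs(c - r) for r, c in zip(ref, cam)) / len(ref)
```

Because 60 / T is nonlinear, the same period error maps to different rate errors: under this assumption, a 0.5 s error on a 3 s cycle (20 vs about 17.14 breaths/min) is about ten times larger in breaths/min than on a 10 s cycle (6 vs about 5.71 breaths/min).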
The main differences between this study and [15] are that [15] does not operate in real time and cannot compute the region of interest (ROI) automatically: it relies on prior information given by the subject to compute the ROI from which the respiratory signal is extracted.
Table 5 compares the proposed method with other methods in the literature in terms of the aforementioned parameters and whether each study computes breath-by-breath cycles. It serves as an overview of the novelties and limitations of those methods compared with the proposed method.
This study has several limitations. The first is the number of subjects: not all of them could be included, due to errors during the acquisition or signal extraction stages, so only 21 of the 23 subjects who participated in the study could be used. The two discarded subjects had less than one minute of valid signal because their continuous, excessive movement during the tests caused a loss of features that repeatedly triggered the pattern detection stage.
The second limitation is that all the tests were performed in a very controlled environment. For this reason, it cannot be assured that the proposed algorithm would achieve the same performance under changing lighting conditions or in environments with external vibrations, such as inside a vehicle, which would certainly affect the results of the tracking stage. To address this limitation, the proposed method will be tested in demanding environments that include external vibrations and changing lighting conditions.

Fig. 7. Bland-Altman plots of the computed periods for each frequency. For each plot, the central dashed line represents the mean of the differences, the upper and lower dashed lines represent the 95% limits of agreement (mean ± 1.96 SD), and the dashed grey line represents the zero-mean line.

Table 3
Mean ± SD of the differences in the Bland-Altman plot in Fig. 7.

Conclusion
A new non-contact video-based method to acquire respiratory signals using a consumer-grade camera has been presented. The proposed algorithm consists of detecting a known pattern inside the FOV of the camera; once the pattern is detected, a tracking stage updates its location in each frame, and the respiratory signal is formed from the tracked location of each pattern. The algorithm has been validated on 21 subjects under four breathing tests (0.1 Hz, 0.3 Hz, Free and Reading), using an inductive plethysmography system as the reference method. The results showed a high correlation between the proposed method and the reference method (≥ 0.85), low errors (MAE < 0.34 s) and high sensitivity (SEN ≥ 94%) when detecting respiratory cycles.
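As a rough illustration of the last steps summarised above, the sketch below assumes that the pattern detection and tracking stages have already produced, for each pattern, its vertical pixel coordinate in every frame. Averaging the patterns, removing the mean and segmenting cycles at local maxima separated by a minimum gap are hypothetical choices for illustration, not the authors' implementation; `respiratory_signal`, `cycle_periods` and `min_gap_s` are likewise hypothetical names.

```python
import math


def respiratory_signal(tracks):
    """Form a 1-D signal from per-pattern vertical positions (one value per frame).

    Hypothetical: averages all tracked patterns frame by frame and removes the mean.
    """
    n = len(tracks[0])
    sig = [sum(t[k] for t in tracks) / len(tracks) for k in range(n)]
    m = sum(sig) / n
    return [s - m for s in sig]


def cycle_periods(signal, fps, min_gap_s=1.0):
    """Periods (s) between successive positive local maxima at least min_gap_s apart."""
    min_gap = int(min_gap_s * fps)
    peaks = []
    for k in range(1, len(signal) - 1):
        # Strict rise on the left, non-strict fall on the right, above the mean.
        if signal[k] > 0 and signal[k - 1] < signal[k] >= signal[k + 1]:
            if not peaks or k - peaks[-1] >= min_gap:
                peaks.append(k)
    return [(b - a) / fps for a, b in zip(peaks, peaks[1:])]


if __name__ == "__main__":
    # Synthetic example: two patterns moving with a 0.25 Hz breathing rhythm at 30 fps.
    fps = 30
    n = 20 * fps
    tracks = [
        [offset + 5 * math.sin(2 * math.pi * 0.25 * k / fps) for k in range(n)]
        for offset in (100, 200)
    ]
    print(cycle_periods(respiratory_signal(tracks), fps))
```

On real recordings, a plain peak detector like this would be sensitive to the movement artefacts discussed in the limitations, so additional filtering would likely be needed.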
This article has shown that the proposed algorithm acquires the respiratory signal with high performance compared with a reference method, and that it could be applied to real-life situations. As future work, since the proposed method is built on the OpenCV library, it can potentially be used on other hardware platforms; its performance will therefore be assessed on embedded devices to broaden its field of application, e.g., to sleep apnoea studies or ICU monitoring. Moreover, as the proposed method has only been tested with healthy subjects, further studies will be needed to characterise its performance in the presence of different respiratory conditions, while also increasing the number and variety of measured subjects.