Influence of Ambient Factors on the Acquisition of 3-D Respiratory Motion Measurements in Infants—A Preclinical Assessment

In recent decades, great progress has been made in the contactless detection of infant and child respiratory motion. Systematic camera errors and interfering environmental influences can have a strong impact on the signal quality of depth cameras. The prevalence of respiratory diseases in children and the central role that respiratory activity plays in the treatment of these conditions necessitate a robust and nonintrusive way of quantifying nocturnal respiratory activity in children. The aim of our study was to assess the robustness of various depth cameras (Microsoft Kinect V2, photonic mixing device (PMD) CamBoard pico flexx, PMD CamBoard pico monstar, ORBBEC Astra Pro, Intel RealSense D435) with regard to ambient factors typically found during use in clinical and home settings. We investigated the influence of viewing angles, reflectivity, and infrared (IR) light on signal quality. By correlating depth images with respiratory-belt signals, our analyses show that three cameras achieved adequate performance and might be considered for use in the contactless assessment of infant or child respiration. Several cameras failed under some of the ambient conditions that are likely to be encountered in real-life settings. Therefore, careful camera selection with regard to environmental influences is pivotal.


Niklas Alexander Köhler, Claudius Nöh, Marcel Geis, Sebastian Kerzel, Jochen Frey, Member, IEEE, Volker Groß, and Keywan Sohrabi

I. INTRODUCTION
Respiratory diseases are among the most common health problems in children. They include infection-related diseases such as bronchitis and laryngitis ("croup") as well as several chronic diseases (such as asthma). They are associated with an immense burden of disease for the affected children and their families [1]. Physicians currently lack an objective diagnostic tool to continuously monitor respiratory work and potential respiratory disorders in infants and children. For nocturnal symptoms, which are a hallmark of many respiratory disorders, physicians rely entirely on anamnestic information provided by parents. However, these caregiver-reported findings are prone to significant bias, mostly because the parents themselves sleep at night. Furthermore, objective interpretation and assessment of symptoms are difficult [2]. As a result, there is a large diagnostic gap in the objective assessment of breathing in children, especially during sleep. Because of its high cost, polysomnography (PSG) in the sleep laboratory is reserved for specific indications and can hardly be performed at home. In addition, PSG requires direct contact with many wires and sensors (e.g., respiratory belts), which causes considerable stress for children. For this reason, and to minimize risks to patients (e.g., strangulation), contactless detection of respiratory motion has gained interest in the scientific community [3], [4], [5], [6], [7]. Numerous technical approaches exist, including thermographic imaging [8], [9], Doppler radar sensing [10], [11], and video acquisition [12], [13]. The availability of low-cost depth sensor systems has sparked growing interest in this field. Particular attention has been paid in the literature to the Microsoft Kinect (V1, V2) systems [14], [15], [16], [17], [18], [19], [20]. However, these systems have not been commercially available since 2017.
For this reason, this article compares state-of-the-art depth cameras with the Microsoft Kinect V2. Many studies in the literature focused on improving the recording quality under ideal measurement conditions (e.g., multiple cameras, a short distance to the region of interest (ROI), and the absence of blankets and interfering light sources). This impression is consistent with the results of the recent review by Cabon et al. [21]. However, since the monitoring system considered here is also intended for use in a home environment, the measurement conditions can diverge greatly [22]. Use by laypersons, direct sunlight, or a bedside lamp are realistic scenarios in the home environment. In addition, a sufficiently safe distance between the monitoring device and the patient should be maintained to reduce risk.
The aim of this study was to define and analyze the influence of realistic environmental conditions on the contactless detection of respiratory motion in infants and children. To this end, five different camera systems were compared in a preclinical setting. In addition, the limitations of the individual cameras were analyzed, such as the influence of infrared (IR) light and of various reflection factors. Furthermore, the potential interference of the camera systems with a clinical monitoring system was investigated.

A. Materials
All cameras were tested in a laboratory environment that mimicked a clinical setting. A neonatal simulation manikin C.H.A.R.L.I.E. (Nasco Healthcare Inc., Saugerties, NY, USA) was fitted with a 0.5 L reservoir bag (Intersurgical Ltd., Wokingham, U.K.), concealed in a baby sleeping bag, and placed in a cot, similar to Rehouma et al. [19]. The artificial lung was ventilated with a Hamilton C-2 ventilator (Hamilton Medical AG, Bonaduz, Switzerland) in a volume-controlled mode (tidal volume: 50 mL) to ensure regular, age-appropriate respiratory movement with constant volume plateaus.
To record the breathing motion, an SS5LB respiratory effort belt (BIOPAC Systems, Inc., Goleta, CA, USA) was strapped around the manikin's upper body and connected to a BIOPAC MP36 recording system (BIOPAC Systems, Inc., Goleta, CA, USA). A digital input channel of the MP36 was connected to a synchronization switch. Both signals (belt and synchronization channel) were sampled at 100 Hz. This system served as the reference for the contactless measurements of respiratory motion by the cameras.
The following five cameras were tested for their suitability for respiratory monitoring in the clinic or at home: Microsoft Kinect V2, PMD CamBoard pico flexx, PMD CamBoard pico monstar, ORBBEC Astra Pro, and Intel RealSense D435 (hereafter Kinect, flexx, monstar, Astra Pro, and D435). The cameras were mounted on a frame that allowed vertical (0°) and lateral (approx. 25°) measurements. The steep lateral acquisition angle of 25° for the measurement from the side was chosen to minimize occlusion of the hidden side of the child simulator in this particular camera setup while keeping the camera out of the patient's reach. The vertical measurement setup is shown in Fig. 1.
Four reflective markers were positioned in the corners of the ROI, a rectangular area above the manikin's ventilated torso. These markers were visible during each measurement so that approximately the same region of movement could be selected for all measurements during data analysis.
The infrared and depth images of the Kinect were captured using a custom program in LabVIEW 2015 [23], the Kinect for Windows Software Development Kit (SDK) 2.0 [24], and the Kinect-One-Toolbox-MakerHub Interface [25]. The PMD camera data were recorded with a Python 3.7.8 [26] script and a Python wrapper provided by the manufacturer as part of the Royale SDK [27]. The Astra Pro data were recorded using the OpenNI 2 framework [28], an OpenNI Python wrapper [29], and the Orbbec device driver [30]. The D435 data were recorded with the RealSense Viewer software [31], and the frame sequences were extracted afterward in Python using the pyrealsense2 package [32]. The camera sampling rates were reduced as far as possible while remaining suitable for capturing respiratory movements, in order to limit the amount of stored data. According to the Nyquist-Shannon sampling theorem, a sampling rate above 2 Hz is required to capture a respiratory motion signal with frequencies of up to 1 Hz (60 breaths/min). To better preserve the waveform, a sampling rate of about 10 Hz was targeted. The capture rates of the flexx and the monstar were set to 10 Hz, and that of the D435 was set to the nearest available rate of 15 Hz. The Kinect and the Astra Pro had fixed rates of 30 Hz; for these cameras, every other frame was discarded, resulting in a data rate of 15 Hz (see Table I). In this way, we aimed to balance comparability of performance against the amount of data generated.
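The rate choices above can be checked with a few lines of Python. This is a minimal sketch: the 1 Hz upper respiratory frequency and the per-camera rates are taken from the text, while `CAMERAS`, `effective_rate`, and `decimate` are hypothetical helper names, not part of any recording software used in the study.

```python
F_RESP_MAX_HZ = 1.0  # highest respiratory frequency considered in the text (60 breaths/min)

# Capture rate (Hz) and decimation factor per camera, as described in the text.
# A factor of 2 means that every other frame is discarded.
CAMERAS = {
    "Kinect V2": (30.0, 2),
    "Astra Pro": (30.0, 2),
    "RealSense D435": (15.0, 1),
    "pico flexx": (10.0, 1),
    "pico monstar": (10.0, 1),
}


def effective_rate(capture_hz: float, factor: int) -> float:
    """Data rate after keeping only every `factor`-th frame."""
    return capture_hz / factor


def decimate(frames, factor: int):
    """Keep every `factor`-th frame (factor=2 discards every other frame)."""
    return frames[::factor]


if __name__ == "__main__":
    for name, (fs, k) in CAMERAS.items():
        fs_eff = effective_rate(fs, k)
        # Nyquist-Shannon: the sampling rate must exceed twice the highest signal frequency.
        ok = fs_eff > 2 * F_RESP_MAX_HZ
        print(f"{name:15s} {fs_eff:5.1f} Hz  "
              f"{fs_eff / F_RESP_MAX_HZ:4.1f} samples per 1 Hz breath  Nyquist ok: {ok}")
```

All five effective rates (10 or 15 Hz) exceed the 2 Hz Nyquist limit by a wide margin, leaving at least ten samples per breath at the fastest respiratory rate considered.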
The software filter and acquisition settings were set to default values or, where available, to the recommended values and were held constant for all measurement settings. For the monstar and the flexx, the longest exposure time available at a capture rate of 10 Hz (monstar: 720 µs; flexx: 1000 µs) was chosen for optimal depth measurement conditions. The filtering was set to "legacy" to exclude pixels with low depth-measurement confidence. The IR emitter of the D435 is optional for operation but can be beneficial for 3-D measurements of target surfaces without texture [33]. For our study, the IR emitter was activated to increase the depth quality for blankets without strong texture and under weak IR background illumination. Additionally, the resolution was set to the recommended value, and a decimation filter was applied (see [33]). No temporal filtering was applied to any camera data during recording. Table I shows the settings for all cameras and the ventilator.

B. Experimental Setup and Execution
For some of the cameras, various influences on measurement quality have been identified in the literature. We included these factors in our comparison, and tested them for all cameras, if they are likely to occur and vary in a clinical or home monitoring setting as well:
1) Viewing Angle:
a) The influence on the signal-to-noise ratio of the Kinect for X (a structured light camera) and the Kinect for Xbox One (a time-of-flight (TOF) camera) was investigated by Sarbolandi et al. [34].
b) The influence on the robustness of the Kinect for Xbox One TOF camera was described by Fankhauser et al. [35].
2) Reflectivity and Prints on the Blankets:
a) The reflectivity of captured surfaces affects the depth measurements of the Kinect for Xbox One [34].
b) The stereo vision principle applied by the RealSense D435 camera depends on captured texture details [36].

3) Global Infrared Illumination:
a) Sunlight degrades the depth measurement quality of the Kinect for Xbox One TOF camera [34], [35].
b) Sunlight degrades the depth measurement quality of the Kinect for X structured light 3-D camera [34].
A sleeping bag and three different blankets (light-toned, dark-toned, and textured) were used to vary the surface reflectivity. In addition, the lighting conditions were modified using either a 10 W 850 nm infrared LED room floodlight RM25-F-120 (Raytec Global, Ashington, U.K.) or a halogen bulb (1600 lx at a distance of 0.45 m) operated with either direct current (dc) or alternating current (ac).
The quality of the depth measurements varies across the image. The best depth measurement accuracy is achieved in the central image area, as has already been shown for the Kinect [34] and the flexx [18]. For this reason, the ROI was placed in the central image area, and this factor was not investigated further. Two conditions were set as prerequisites for determining the distance from the camera system to the mattress. The first condition was that the field of view include the entire mattress area. This is essential for longitudinal monitoring of an infant based on nocturnal movement activity.
The second condition was to maintain a safe distance between the camera system and the mattress, so that the infant cannot touch the system. Together, both conditions resulted in a distance of approximately 1.3 m for the vertical measurement and 1.4 m for the angled placement.
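The first condition follows from simple pinhole geometry: a camera with a field-of-view angle θ at distance d covers an extent of 2d·tan(θ/2) on a flat surface perpendicular to its optical axis. The sketch below assumes a vertical (0°) camera centered above a flat mattress and ignores lens distortion; the mattress size and FOV in the example are hypothetical values, since neither is given here.

```python
import math


def covered_extent(distance_m: float, fov_deg: float) -> float:
    """Extent (m) seen on a flat plane perpendicular to the optical axis."""
    return 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)


def min_distance(extent_m: float, fov_deg: float) -> float:
    """Smallest distance (m) at which an extent of `extent_m` fits into the field of view."""
    return (extent_m / 2.0) / math.tan(math.radians(fov_deg) / 2.0)


if __name__ == "__main__":
    # Hypothetical example values (not from the study): 1.2 m mattress edge, 60 deg FOV.
    mattress_m, fov = 1.2, 60.0
    print(f"minimum distance: {min_distance(mattress_m, fov):.2f} m")
    print(f"extent covered at 1.3 m: {covered_extent(1.3, fov):.2f} m")
```

Under these assumptions, the mounting distance is the larger of the geometric minimum (evaluated for both mattress axes with the matching horizontal and vertical FOV) and the safety distance required by the second condition.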
For the vertical position, all cameras were tested under combinations of the following influences: IR room floodlight on and off; sleeping bag; and blankets (dark-colored, light-colored, and textured). Additionally, vertical measurements were made with different blankets under 50 Hz (ac) halogen lighting. Vertical measurements with a white blanket and a dark gray blanket were performed under dc halogen lighting. Measurements from an oblique viewing angle were performed with the textured blanket and the sleeping bag. For each setting, the reference data and camera data were recorded, and the measurements were repeated twice to check repeatability. A total of 180 measurements were taken. Although changes between measurements were kept to a minimum, slight changes in belt load, manikin position, or blanket position could not be excluded when switching between the different settings.
Warm-up times of the cameras were taken into account to minimize temperature-related errors. For the Kinect, published warm-up times range from 10 min [14] to 40 min, whereas the flexx does not require any warm-up time [18]. To our knowledge, no data have been reported for the other camera systems. In the preliminary analysis, the mean ROI distance signals of the monstar and the Astra Pro showed baseline drifts lasting about 15 and 120 s, respectively. The D435 did not exhibit any temperature-related errors. The final warm-up times used in this experiment, including safety margins, were
For each measurement, the reference signal had to be synchronized. For this purpose, a single switch turned an IR LED array on or off and pulled a digital input of the BIOPAC acquisition system to ground or 5 V, respectively.
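The text does not state how the drift durations of the monstar and the Astra Pro were determined. One hedged way to quantify such a warm-up drift is to average the mean ROI distance over whole breathing cycles, which cancels the respiratory component, and to find the time after which this baseline stays within a tolerance of its final value. The function name `settling_time`, the window length, and the tolerance are illustrative assumptions.

```python
import numpy as np


def settling_time(y, fs, window_s, tol_mm):
    """Time (s) after which the moving-average baseline of y stays within tol_mm of its final value.

    y        : mean ROI distance in mm, uniformly sampled at fs Hz
    window_s : averaging window; ideally a whole number of breathing periods
    tol_mm   : hypothetical tolerance band around the final baseline
    """
    n = int(round(window_s * fs))
    baseline = np.convolve(np.asarray(y, dtype=float), np.ones(n) / n, mode="valid")
    outside = np.abs(baseline - baseline[-1]) > tol_mm
    if not outside.any():
        return 0.0
    # baseline[i] averages samples i..i+n-1; report the start of the first window that stays settled
    return (np.flatnonzero(outside)[-1] + 1) / fs
```

For example, with a synthetic 15 Hz signal consisting of an exponential drift plus a 0.5 Hz breathing sine, a 2 s window (one breathing period) removes the respiratory component, and the function returns the time at which the remaining drift has decayed into the tolerance band.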
The following steps were performed for all measurements:
1) Ventilation of the artificial lung was started.
3) The end of the camera warm-up phase was awaited.
4) The first synchronization signal was set by switching on the LED for approx. 3 s.
5) The recording lasted 2 min.
6) The second synchronization signal was set by switching on the LED for approx. 3 s.
For the measurements with IR floodlight, the following steps were added:
1) The third synchronization signal was set by switching on the LED for approx. 3 s.
2) The recording was performed for 2 min under IR illumination.
3) The fourth synchronization signal was set.

A. Mean ROI Distance
All 3-D cameras captured an infrared image and a corresponding depth image at each sampling time. Both frames shared the same image resolution because they originated from the same imager. The infrared image gave the per-pixel intensity of the infrared illumination, while the depth image gave the per-pixel distance (in mm) from the camera plane to the corresponding projected world coordinate. All cameras marked pixels with a low-confidence depth measurement as invalid by assigning them a distance value of zero. The respiratory motion had to be extracted from the depth image sequences for all measurements. For a stationary ROI, where there is no movement other than respiratory motion, the mean distance can be used to capture the respiratory motion [3], [16]. Here, the mean distance over a quadrangle enclosing the ventilated torso of the child simulator was used. The quality of the extracted motion signal depends on the size of the ROI, which should mainly cover the moving pixel regions [3]. The ROI was defined as the torso area, as it contains all areas relevant to respiratory movements. Additionally, it is easily recognizable in the data, which makes it a suitable target for future automatic ROI detection using computer vision. In our case, the manual selection of the relevant region was guided by the reflective markers in the corners of the ROI, which ensured comparability across all cameras and settings. The pixel positions of the markers were manually selected once for each captured image sequence. In the automatic analysis, pixels were selected by applying two criteria: first, the pixel had to lie inside the selected region; second, its depth measurement had to be valid. Let d_{i,j}[x] denote the depth value of the pixel in row i and column j of image sample x. A binary function B was used to select the relevant pixels:
$$B_{i,j}[x] = \begin{cases} 1, & \text{if } (i,j) \in \mathrm{ROI} \text{ and } d_{i,j}[x] > 0,\\ 0, & \text{otherwise,} \end{cases} \tag{1}$$
so that the number of inlier pixels is
$$n[x] = \sum_{i}\sum_{j} B_{i,j}[x]. \tag{2}$$
The mean distance y of the n inlier pixels of image sample x was computed by
$$y[x] = \frac{1}{n[x]} \sum_{i}\sum_{j} B_{i,j}[x]\, d_{i,j}[x]. \tag{3}$$
Fig. 2. Inverted respiratory belt signal (blue) and synchronized mean ROI distance sequence of the monstar TOF camera (red).
In the resulting time series of mean ROI distances y[x], the distance increased during the expiration phase of the ventilation; during inspiration, the ROI surface moved toward the camera, which decreased the mean distance between the ROI and the camera. The respiratory belt signal was inverted before analysis to improve the readability of the figures (see Fig. 2).
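The two pixel-selection criteria and the mean over inlier pixels described above map directly onto array operations. The following NumPy sketch assumes that the four marker positions are given as (column, row) pixel coordinates in consecutive order and that the enclosed quadrangle is convex; `quad_mask`, `mean_roi_distance`, and `roi_time_series` are hypothetical names, not the study's analysis code.

```python
import numpy as np


def quad_mask(shape, corners):
    """Boolean mask of pixels inside a convex quadrangle (boundary included).

    corners: four (x, y) = (column, row) marker positions in consecutive order.
    """
    rows, cols = np.indices(shape)
    crosses = []
    for k in range(len(corners)):
        x1, y1 = corners[k]
        x2, y2 = corners[(k + 1) % len(corners)]
        # z-component of the 2-D cross product: its sign tells on which side of the edge a pixel lies
        crosses.append((x2 - x1) * (rows - y1) - (y2 - y1) * (cols - x1))
    crosses = np.stack(crosses)
    # inside a convex polygon, a pixel lies on the same side of every edge
    return np.all(crosses >= 0, axis=0) | np.all(crosses <= 0, axis=0)


def mean_roi_distance(depth, roi):
    """Mean distance of valid (nonzero) depth pixels inside the ROI, plus the inlier count."""
    inlier = roi & (depth > 0)          # both selection criteria
    n = int(inlier.sum())
    if n == 0:
        return float("nan"), 0
    return float(depth[inlier].mean()), n


def roi_time_series(depth_frames, corners):
    """Mean ROI distance y[x] for a depth sequence of shape (frames, rows, cols)."""
    roi = quad_mask(depth_frames.shape[1:], corners)
    return np.array([mean_roi_distance(frame, roi)[0] for frame in depth_frames])
```

Because the marker positions were selected once per sequence, the ROI mask is computed once and reused for every frame; only the validity criterion changes from frame to frame.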

B. Synchronization
The reference system and the separately captured image data were synchronized for each measurement. In the camera images, the time at which the pixel values changed because of the synchronization IR illumination was aligned with the change in the digital BIOPAC synchronization channel. The camera and reference data were synchronized by manually aligning the second synchronization marker. The synchronized camera and BIOPAC signals were then plotted against the common time axis, and synchronicity over the entire measurement was confirmed.
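The manual alignment described above could in principle be automated. The NumPy sketch below is a hypothetical automation, not the procedure used in the study. It assumes that the camera-side marker is available as the mean IR intensity of the image region containing the LED array, and that the BIOPAC digital channel is pulled to ground while the LED is on, as described for the synchronization switch. `onset_times` and `camera_to_reference_offset` are illustrative names.

```python
import numpy as np


def onset_times(signal, fs, active_high=True):
    """Times (s) at which a two-level marker signal switches into its active state."""
    x = np.asarray(signal, dtype=float)
    thr = 0.5 * (x.min() + x.max())          # midpoint between the two signal levels
    active = x > thr if active_high else x < thr
    edges = np.flatnonzero(np.diff(active.astype(np.int8)) == 1) + 1
    if active[0]:
        edges = np.concatenate(([0], edges))
    return edges / fs


def camera_to_reference_offset(cam_ir_mean, fs_cam, ref_digital, fs_ref,
                               marker=1, ref_active_high=False):
    """Offset t_cam - t_ref of one marker onset (marker=1 selects the second marker)."""
    t_cam = onset_times(cam_ir_mean, fs_cam, active_high=True)[marker]
    t_ref = onset_times(ref_digital, fs_ref, active_high=ref_active_high)[marker]
    return t_cam - t_ref
```

Subtracting the returned offset from the camera timestamps places both signals on the reference time axis; plotting them afterward, as described in the text, remains a useful check.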

C. Signal Quality: Correlation of Respiratory Effort and Mean ROI Distance
The periodic camera signal can be correlated with the cyclic strain measurement of the respiratory belt as an indicator of similarity with the reference signal [16]. In this article, the Pearson correlation coefficient was adopted to measure the association between respiratory effort (in mV) and mean ROI distance (in mm) in the presence of different levels of depth measurement noise. A high correlation indicates a strong association between the camera and belt signals and low stochastic noise, whereas a low correlation indicates that independent noise is the main source of variance. First, linear resampling was applied to the camera signal to account for frame losses during acquisition, as shown by Procházka et al. [16] for the Kinect. Before computing the correlations, the different sampling rates of the cameras had to be accounted for, because a higher sampling rate increases the variance contributed by high-frequency random noise outside the signal bandwidth. Therefore, all signals (camera and reference) were low-pass filtered at half of the lowest camera sampling rate, using a zero-phase Butterworth filter with a cutoff frequency of 5 Hz and an order of 16. All signals were then downsampled to 10 Hz, except for the flexx and monstar signals, which were already sampled at 10 Hz. The BIOPAC signal was inverted to allow an easier qualitative comparison of the signals; after inversion, stronger strain corresponds to a lower amplitude and vice versa. The synchronized, filtered, and resampled camera and respiratory belt signals were correlated over a time window of 20 s.
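The processing chain above can be sketched with NumPy and SciPy. This is an illustration under stated assumptions, not the authors' code: frame timestamps are assumed to be available for the linear resampling, the 20 s correlation windows are assumed to be consecutive and non-overlapping, and signals whose Nyquist frequency does not exceed the 5 Hz cutoff (the 10 Hz cameras) are passed through unfiltered, because such a filter cannot be designed. Note that `sosfiltfilt` runs the filter forward and backward, which yields zero phase but squares the magnitude response; the text does not say whether "order 16" refers to the designed or the effective order. All function names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def uniform_grid(t_start, t_end, fs):
    """Uniform time grid from t_start up to (at most) t_end with spacing 1/fs."""
    n = int(np.floor((t_end - t_start) * fs)) + 1
    return t_start + np.arange(n) / fs


def lowpass_zero_phase(y, fs, cutoff_hz=5.0, order=16):
    """Forward-backward Butterworth low-pass, designed as second-order sections."""
    y = np.asarray(y, dtype=float)
    if cutoff_hz >= fs / 2.0:
        # Assumption: a cutoff at or above Nyquist cannot be realized; return the signal unchanged.
        return y
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, y)


def windowed_pearson(a, b, fs, window_s=20.0):
    """Pearson r over consecutive, non-overlapping windows of window_s seconds."""
    n = int(round(window_s * fs))
    return np.array([np.corrcoef(a[s:s + n], b[s:s + n])[0, 1]
                     for s in range(0, len(a) - n + 1, n)])


def correlate_with_belt(t_cam, dist_mm, fs_cam, t_ref, belt_mv, fs_ref,
                        fs_out=10.0, cutoff_hz=5.0, window_s=20.0):
    """Correlation of the mean ROI distance with the inverted belt signal per window."""
    t_cam, dist_mm = np.asarray(t_cam, float), np.asarray(dist_mm, float)
    t_ref, belt_mv = np.asarray(t_ref, float), np.asarray(belt_mv, float)
    # 1) Linear resampling onto the nominal camera grid to fill dropped frames.
    t_c = uniform_grid(t_cam[0], t_cam[-1], fs_cam)
    d_c = np.interp(t_c, t_cam, dist_mm)
    # 2) Zero-phase low-pass filtering of camera and reference signals.
    d_f = lowpass_zero_phase(d_c, fs_cam, cutoff_hz)
    b_f = lowpass_zero_phase(belt_mv, fs_ref, cutoff_hz)
    # 3) Common 10 Hz grid over the time span covered by both signals.
    t_out = uniform_grid(max(t_c[0], t_ref[0]), min(t_c[-1], t_ref[-1]), fs_out)
    d_out = np.interp(t_out, t_c, d_f)
    b_out = np.interp(t_out, t_ref, b_f)
    # 4) Invert the belt signal (inspiration lowers the ROI distance), then correlate.
    return windowed_pearson(d_out, -b_out, fs_out, window_s)
```

Because inspiration shortens the ROI distance while stretching the belt, correlating against the inverted belt signal makes good agreement show up as a coefficient close to +1.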

IV. EFFECTS ON RESPIRATION MOVEMENT MEASUREMENTS
A. Exemplary Distance Measurements of an ROI
Fig. 3 shows representative depth images from a vertical camera position. On each image, the corner points of the selected ROI are marked with black diamonds. In addition to the depth images, the mean ROI distances and the synchronously captured respiratory belt signals are displayed. The image areas of all cameras captured the entire mattress surface, allowing identification of the child simulator's position. The D435 had the highest image resolution. The resolution of the depth measurements differed depending on the measurement principle. The TOF cameras provide one depth value per pixel, which results in a higher spatial resolution than that of the structured light camera (Astra Pro) or the stereo vision camera (D435). All cameras captured the cyclic depth variation seen in the reference signal, but with different levels of noise. The Astra Pro produced the least noisy motion signal, followed by the monstar, the flexx, the D435, and the Kinect, in order of increasing noise level. For both the D435 and the Kinect, the signal-to-noise ratio of the periodic motion signal was too low for a qualitative analysis of the respiratory motion. Regarding the influence of different blankets on signal quality, the TOF cameras performed better with more reflective surfaces, which is consistent with the results of Fürsattel et al. [14]. When the active illumination is weak compared with the noise from other sources, the depth uncertainty increases [37]. A blanket with low reflectivity therefore increases the stochastic noise in the depth measurements and thereby decreases the correlation with the reference signal. In particular, the intensity of the illumination seems to be a decisive factor for low-reflectivity materials.
This assumption is supported by the observation that, with the dark blanket, the measurement uncertainty of the flexx [one vertical-cavity surface-emitting laser (VCSEL)] increased more than that of the monstar (four VCSELs). Likewise, the signal quality of the Kinect and the flexx decreased when the sleeping bag was used. The combination of dark colors with the more complex geometry of the sleeping bag could reduce the quality of the captured motion signals. The effect of complex geometry has already been shown for low angles, where it arises from the multipath effect [34]. For the D435, the textured blanket performed slightly better than the sleeping bag or the other blankets. This is in line with the recommendation to use a textured target surface to aid the triangulation of feature points [36]. The projected IR pattern aided the 3-D reconstruction if the reflectivity of an otherwise nontextured blanket was high enough. Consequently, this camera performed worst with dark, nontextured blankets.

B. Influence of Different Colored and Textured Blankets on the Correlation Between Camera Signal and Respiratory Belt Signal
None of the above-mentioned influences had a discernible effect on the Astra Pro. In each setting, its correlation with the reference signal was the highest of all cameras.

C. Influence of the Camera Angle on the Correlation Between Camera Signal and Respiratory Belt Signal
At the chosen angle of 25°, no distinct influence was present compared with the vertical camera view, as displayed in Fig. 5.

D. Influence of an LED 850 nm Floodlight Background Illumination on the Correlation Between Camera Signal and Respiratory Belt Signal
Fig. 6 shows the influence of a narrowband 850 nm infrared floodlight on the correlation between motion signal and reference. As shown in Fig. 4, the IR light had only a marginal effect on the signal quality of the camera systems. There was a slight tendency toward reduced correlation under low-intensity narrowband IR illumination.
For the TOF cameras, our findings are consistent with previously published results, since TOF cameras are known to use global illumination filtering to suppress external influences (Kinect: [38]; PMD: [37]). Furthermore, this experiment shows that the effect of narrowband IR light on the other camera systems is also negligible.

E. Influence of a Direct Current Halogen Infrared Lamp on the Correlation Between Camera Signal and Respiratory Belt Signal
Fig. 7 shows the influence of IR illumination by a dc halogen bedside lamp on the correlation between the motion signal and the reference signal for the setup displayed in Fig. 1.
The Astra Pro and the D435 both use a static projection pattern to enable and to support 3-D reconstruction, respectively. Suppression of static illumination was therefore not possible for these cameras without also suppressing their own active illumination. The depth images of the D435 showed strong line artifacts in the illuminated region for high-reflectivity surfaces. Under halogen illumination, the D435's IR projector only slightly increased the contrast on the different blankets, and the signal quality inside the selected ROI degraded. This behavior can be partly explained by the camera's automatic exposure and the loss of contrast in the ROI. The measured correlation was below 0.2. Depth measurements were not possible when the dark blanket was used. With the light blanket, the Astra Pro produced overexposed images, so that all depth measurements in the crucial region were invalid and no depth values were available for correlation. For the dark blanket, the Astra Pro correlation coefficients were scattered, ranging between 0.26 and 0.98.
All TOF cameras were able to suppress the infrared illumination and continued to measure with high reliability for the light blanket. When a dark blanket was used, however, the decrease in correlation was significantly larger than in the absence of broadband IR illumination (see Figs. 4 and 7). Therefore, the use of a low-reflectivity blanket in combination with IR-intensive light seems to have a significant influence on the measurement accuracy of the TOF cameras.
The monstar performed most stably for both high- and low-reflectivity surfaces (correlation between 0.62 and 0.92).

F. Qualitative Influence of an Alternating Current Halogen Illumination
Fig. 8 shows the influence of an ac halogen bedside lamp on the correlation between the motion signal and the reference signal. The monstar and the flexx showed strong periodic distortions of the measured depth signal with a frequency of about 0.1 Hz. With the flexx, the low-frequency deviation was about 16 mm, several times the signal amplitude to be measured. With the monstar, the deviation was of the same order of magnitude as the signal amplitude. The disturbances of both camera signals were strongly systematic rather than random. One possible reason for this behavior lies in the principle of the cameras' 3-D data acquisition: each 3-D image is calculated from several (four) sensor readouts. If the halogen lamp modulated its intensity at 100 Hz because of the 50 Hz supply voltage, the individual readouts, which are exposed at different times, could have been affected by different intensities, thus distorting the distance reading.
Fig. 8. Influence of an ac halogen light source on the depth camera measurements (vertical measurement, dark blanket).
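The suspected mechanism can be illustrated with the textbook four-phase demodulation of continuous-wave TOF cameras, in which the phase, and hence the distance, is computed from four correlation samples as φ = atan2(c_3 - c_1, c_0 - c_2) and d = cφ/(4π f_mod). A constant background adds the same offset to all four samples and cancels in the differences, whereas a background that differs between readouts, as a flickering lamp might cause, biases the phase. This sketch assumes an idealized pixel; the 60 MHz modulation frequency and the background values are hypothetical, and the actual PMD processing (more raw frames, calibration, filtering) is more involved.

```python
import numpy as np

C_LIGHT = 299_792_458.0  # speed of light in m/s


def four_phase_samples(distance_m, f_mod_hz, amplitude=1.0, background=(0.0, 0.0, 0.0, 0.0)):
    """Idealized correlation samples c_k = B_k + a*cos(phi + k*pi/2), k = 0..3."""
    phi = 4.0 * np.pi * f_mod_hz * distance_m / C_LIGHT   # round-trip phase shift
    k = np.arange(4)
    return np.asarray(background, dtype=float) + amplitude * np.cos(phi + k * np.pi / 2.0)


def four_phase_distance(c, f_mod_hz):
    """Distance from four correlation samples (within the unambiguous range c/(2 f_mod))."""
    phi = np.mod(np.arctan2(c[3] - c[1], c[0] - c[2]), 2.0 * np.pi)
    return C_LIGHT * phi / (4.0 * np.pi * f_mod_hz)


if __name__ == "__main__":
    f_mod = 60e6   # hypothetical modulation frequency
    d_true = 1.3   # distance similar to the vertical setup
    steady = four_phase_samples(d_true, f_mod, background=(0.5, 0.5, 0.5, 0.5))
    flicker = four_phase_samples(d_true, f_mod, background=(0.5, 0.6, 0.5, 0.4))
    print(f"steady background:     {four_phase_distance(steady, f_mod):.4f} m")
    print(f"flickering background: {four_phase_distance(flicker, f_mod):.4f} m")
```

In this idealized model, the steady background leaves the distance unchanged, while a readout-to-readout background difference of 10% of the signal amplitude shifts it by several centimeters, the same order as the systematic deviations reported for the flexx.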
This behavior of the PMD cameras was not present in the data of the other camera systems. The Kinect and the D435 showed low correlations but no periodic distortions. The Astra Pro appeared to cope best with the ac illumination.

G. Qualitative Influence of the 3-D Camera Illumination on Side-by-Side Sleep Laboratory Measurements
All cameras except the D435 used active illumination for the depth measurements. The Astra Pro and the Kinect captured data at 30 frames/s, the same rate at which the room camera sampled the illumination. The illumination cones of the Kinect and the Astra Pro can be seen in Fig. 9(b) and (e). The monstar and the flexx used modulated illumination with a light-burst frequency that depended on the configured frame rate. At 10 frames/s, the illumination appeared as short flashes in the room infrared camera, which captured images at 30 frames/s. The reason for this finding is that the IR emitters emitted light in high-intensity bursts to improve the signal quality [37]. The optional projected dot pattern of the D435, which supported the 3-D reconstruction, is visible in Fig. 9(a).
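Why a 10 frames/s burst illumination shows up as flashes in a 30 frames/s room camera can be illustrated with a simple timing model: a room-camera frame appears lit if its exposure window overlaps an emitter burst. In the sketch below, the burst duration, room-camera exposure time, and phase offset are hypothetical values, and effects of real sensors (rolling shutter, automatic exposure, clock drift) are not modeled.

```python
import math


def lit_frames(burst_hz, burst_s, room_fps, exposure_s, n_frames, phase_s=0.0):
    """Return one flag per room-camera frame: True if its exposure overlaps an emitter burst."""
    lit = []
    for i in range(n_frames):
        t0 = i / room_fps              # exposure start of room frame i
        t1 = t0 + exposure_s           # exposure end
        # bursts start at phase_s + k / burst_hz; only k near the exposure window can overlap
        k_lo = math.floor((t0 - burst_s - phase_s) * burst_hz)
        k_hi = math.ceil((t1 - phase_s) * burst_hz)
        hit = False
        for k in range(k_lo, k_hi + 1):
            b0 = phase_s + k / burst_hz
            if b0 < t1 and b0 + burst_s > t0:
                hit = True
                break
        lit.append(hit)
    return lit


if __name__ == "__main__":
    # Hypothetical timing: 2 ms bursts, 10 ms room-camera exposure, 1 ms phase offset.
    print("10 Hz bursts:", lit_frames(10.0, 0.002, 30.0, 0.010, 9, 0.001))
    print("30 Hz bursts:", lit_frames(30.0, 0.002, 30.0, 0.010, 9, 0.001))
```

With these assumptions, 10 Hz bursts light only every third room frame, which is perceived as flashing, whereas 30 Hz illumination lights every frame and therefore looks static, consistent with the observations for the PMD cameras and the Kinect.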

V. CAMERA SELECTION FOR HOME AND CLINICAL APPLICATIONS
Our findings show that ambient factors have a pivotal influence on the signal quality of 3-D camera systems. Therefore, a thorough specification of the application type should be performed.
In particular, a distinction should be made between clinical and home applications, as the ambient factors could differ considerably.

A. Reflectivity, Texture, and Surface Geometry
The TOF cameras performed best with high-reflectivity materials; less reflective materials led to a significant deterioration of the signal quality. The D435 performed best with textured surfaces or with bright surfaces that reflect the projected emitter pattern. The performance of the Astra Pro was not affected by the tested surfaces and was the best in all measured settings.
For the home setting, special attention should be paid to the reflection factor of the blankets and the sleeping bag, since it can be assumed that a wide variety of blanket colors will be used here. As shown in Section IV, this can have a strong impact on the signal quality, especially in combination with other ambient factors.
In the clinical setup, the reflection factor plays a rather minor role, since it can be assumed that light-colored bed covers are used. Only the reflectivity of the sleeping bag would have to be considered.

B. Interfering Light
The 850 nm LED infrared floodlight had no considerable effect on the correlation between the camera respiration signal and the reference respiratory belt signal, regardless of the 3-D camera used. This finding is mainly relevant for the clinical setting, as no additional IR monitoring is expected in the home setting.
In contrast, a broadband dc infrared illumination source can lead to significant degradation of the signal quality. The PMD camera sensors were able to suppress this type of illumination; their signal quality was mainly affected when low-reflectivity surfaces were used. The flexx was especially affected because of its relatively weak IR illuminator. The Astra Pro and the D435 both use a static projection pattern to enable and to support 3-D reconstruction, respectively. Suppression of static illumination was not possible for these cameras, which partially led to overexposure and loss of depth measurement capability.
With ac halogen room lighting, the depth measurements of the PMD cameras showed strong low-frequency distortions, whereas those of the Astra Pro did not. To our knowledge, this finding has not yet been considered in any publication, although it can strongly affect depth measurement systems.
In summary, the influence of interfering light must be considered when selecting a camera. In both settings, broadband IR light (e.g., halogen light or direct sunlight) can degrade the measurement quality of TOF cameras and can even cause a total loss of the camera signal for structured light or stereo vision-based cameras.
Therefore, for the best quality, strong broadband IR light should be avoided in both home and clinical applications.

C. Influence on Infrared Surveillance Cameras
For clinical use, possible interference with an additional video PSG needs to be considered. The Kinect and the Astra Pro both captured at 30 frames/s. The pulsed active illumination of the Kinect was not visible in the room camera, which also captured at 30 frames/s. The cone of illumination was superimposed on the ROI around the child, but it was static. The same was true for the D435 and the Astra Pro. Both the static illumination and the projected light textures were visible in the room's IR camera. This could impair the readability of the video recordings.
For the PMD cameras, a lower capture rate was set to increase memory efficiency and to use the available exposure time for better signal quality. The combination of this lower capture rate and burst illumination with modulated light, together with the higher capture rate of the room camera, led to perceptible light flashes in the room recording. These flashes could interfere with IR video room surveillance in clinical settings. One approach to solving this problem could be an emitter with a wavelength outside the sensitivity range of the IR room camera. PMD recently launched the follow-up model flexx2, which offers the same resolution and a similar field of view, but uses a 940 nm VCSEL instead of 850 nm illumination.
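The origin of the flashes can be illustrated with a simplified timing model in which a room-camera frame appears bright whenever its exposure window overlaps an illumination burst of the PMD camera. The rectangular-burst model, the function name, and the numbers in the example (5 frames/s, 5 ms bursts, 10 ms exposure) are hypothetical and do not reproduce the settings of our experiments.

```python
def flashing_frames(pmd_fps, burst_ms, room_fps, exposure_ms, duration_s=1.0):
    """Return indices of room-camera frames whose exposure window overlaps a PMD illumination burst.

    Simplified timing model (assumption): bursts start at k / pmd_fps and last burst_ms;
    room-camera frames start at n / room_fps and integrate light for exposure_ms.
    """
    bursts = [(k / pmd_fps, k / pmd_fps + burst_ms / 1000)
              for k in range(int(duration_s * pmd_fps))]
    lit = []
    for n in range(int(duration_s * room_fps)):
        start = n / room_fps
        end = start + exposure_ms / 1000
        # Two intervals overlap if each one starts before the other ends.
        if any(b_start < end and start < b_end for b_start, b_end in bursts):
            lit.append(n)
    return lit


# Hypothetical settings: PMD camera at 5 frames/s with 5 ms bursts,
# room camera at 30 frames/s with 10 ms exposure.
print(flashing_frames(5, 5, 30, 10))  # [0, 6, 12, 18, 24]
```

In this toy case, only every sixth room frame is illuminated, so the bright frames recur five times per second among otherwise dark frames, which a viewer perceives as flashing.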

D. Availability, Price, Device Size, Power Consumption, Software
To select a suitable camera system, device parameters such as device size, price, power consumption, and integration of the sensor technology into other systems should also be taken into consideration. The Kinect is the largest camera, requires an additional power supply, and is no longer commercially available. All other cameras are powered over USB.
The D435 is small but comparatively expensive, and its performance was less reproducible under the influence of the investigated ambient factors.
The Astra Pro is the least expensive of all the cameras. It is larger than the PMD cameras but additionally includes an RGB camera and two microphones. In our experience, its main disadvantage is the insufficient software documentation.
The PMD cameras are particularly interesting because they offer an extensive software development kit, which enables adaptation to the respective setting. The monstar is the most expensive camera tested; its high resolution, wide field of view, and strong IR illumination make it interesting for benchmark testing. The flexx and its follow-up, the flexx2, have the advantage of being small, compact, and inexpensive. The performance of the flexx is weaker than that of the monstar, but still sufficient for most tested settings.

E. Limitations
When estimating the expected signal quality of contactless respiratory motion measurement, it must be considered that only the pixels from a fixed region around the image center were used for analysis in our experiments. ROIs outside the image center are expected to have higher noise levels; in the image corners, the quality deteriorated for the Kinect [34] and the flexx [18].
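The fixed central ROI can be expressed in a few lines. This sketch assumes that the respiratory surrogate is the mean valid depth inside a centered window and that invalid pixels are encoded as 0; both the averaging step and the function names are illustrative assumptions, not a description of our processing pipeline.

```python
import math


def central_roi(frame, roi_h, roi_w):
    """Crop a roi_h x roi_w window centered in a 2-D depth frame given as a list of rows."""
    h, w = len(frame), len(frame[0])
    top, left = (h - roi_h) // 2, (w - roi_w) // 2
    return [row[left:left + roi_w] for row in frame[top:top + roi_h]]


def roi_mean_depth(frames, roi_h, roi_w, invalid_value=0):
    """Return one sample per frame: the mean valid depth inside the central ROI (NaN if none is valid)."""
    signal = []
    for frame in frames:
        valid = [d for row in central_roi(frame, roi_h, roi_w) for d in row if d != invalid_value]
        signal.append(sum(valid) / len(valid) if valid else math.nan)
    return signal


frame = [[1, 1, 1, 1],
         [1, 10, 20, 1],
         [1, 30, 40, 1],
         [1, 1, 1, 1]]
print(central_roi(frame, 2, 2))       # [[10, 20], [30, 40]]
print(roi_mean_depth([frame], 2, 2))  # [25.0]
```

Because the window is anchored at the image center, it avoids the corner regions where the cited studies report degraded quality.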
For the Kinect and the Astra Pro, the depth data were used as provided by the programming interface, but the data rate was halved by discarding every other frame. In further applications of these sensors, the discarded frames could instead be used for live filtering, either to increase signal quality or to decrease the data rate further.
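The difference between discarding frames and using them for filtering can be sketched on a per-frame ROI depth signal. Pairwise averaging is a hypothetical alternative shown for illustration; it was not applied in our experiments, and the toy values are synthetic.

```python
def decimate(samples):
    """Halve the data rate by discarding every other frame."""
    return samples[::2]


def pairwise_average(samples):
    """Hypothetical alternative: halve the data rate by averaging consecutive frame pairs,
    so that the otherwise discarded frames contribute to noise reduction.
    A trailing unpaired frame is dropped."""
    return [(a + b) / 2 for a, b in zip(samples[0::2], samples[1::2])]


roi_depth = [10, 12, 11, 13, 12, 14]  # toy ROI depth values, one per frame
print(decimate(roi_depth))          # [10, 11, 12]
print(pairwise_average(roi_depth))  # [11.0, 12.0, 13.0]
```

Both variants produce the same output rate. If the noise in consecutive frames is uncorrelated, averaging two frames reduces its standard deviation by a factor of √2, whereas decimation leaves it unchanged.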
Additionally, the quality of the computed respiratory motion depends on the ROI selection. Our marked and selected ROI contains both static and moving pixels. In the present analysis, the ratio of these pixels was constant across cameras because of the marker-guided ROI selection; therefore, the signal quality was comparable between cameras. An ROI refined to the region containing only nonstatic pixels could improve the signal quality of all cameras. The signal quality of the respiratory motion extracted in this article allowed the cameras to be compared, but it can be improved by postprocessing and filtering tailored to the application setting. Despite these differences, in vivo studies reported correlations with a respiratory belt similar to ours: 0.85-0.91 for thoracic and abdominal regions in adults at distances of 60-100 cm [39], and 0.86-0.9 [40]. However, these values cannot be compared directly with our measurements, since in adults the motion amplitude, and thus the signal-to-noise ratio, should be greater.
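The effect of ROI refinement on the correlation with the respiratory belt can be illustrated with a minimal sketch. It assumes that each ROI pixel is available as a depth time series and that nonstatic pixels can be identified by a temporal standard deviation above a hypothetical threshold; this selection rule is illustrative and is not the marker-guided procedure used in our experiments. The data are synthetic.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_moving_pixels(pixel_series, min_std):
    """Keep pixel time series whose temporal standard deviation exceeds min_std (hypothetical threshold)."""
    return [s for s in pixel_series if statistics.pstdev(s) > min_std]


def roi_signal(pixel_series):
    """Average the given pixel time series sample by sample."""
    return [statistics.fmean(values) for values in zip(*pixel_series)]


# Synthetic data: reference belt signal, one moving pixel, and one static pixel with small noise.
belt = [0, 1, 0, -1, 0, 1, 0, -1]
moving = [800 + 2 * b for b in belt]
static = [800, 801, 800, 801, 800, 801, 800, 801]

full = roi_signal([moving, static])
refined = roi_signal(select_moving_pixels([moving, static], min_std=1.0))
print(round(pearson(belt, full), 3))     # 0.943
print(round(pearson(belt, refined), 3))  # 1.0
```

In this toy case, the noise of the static pixel dilutes the respiratory component of the full ROI, while the refined ROI follows the belt exactly. With real data, a threshold on temporal variation would also admit noisy pixels, so the selection rule would need to be tuned per camera.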

VI. CONCLUSION
In this article, the influence of different ambient factors on 3-D camera systems regarding respiratory motion measurement in infants was investigated.
Our findings show that ambient factors, such as broadband IR light or reflectivity, can have a significant impact on the measurement quality of the respective camera systems, whereas other factors, such as the measuring angle and narrowband IR light, seem to be negligible. Therefore, the conditions under which the measurements take place should be considered, and the camera should be chosen accordingly to ensure adequate signal quality for further processing and analysis.
Special attention should be paid to the distinction between clinical and home ambient factors. In the clinical setting, the influence of interfering IR light should be considered in particular, as well as possible interference with other monitoring systems (e.g., video PSG). In the home environment, the reflectivity of differently colored blankets in combination with broadband interfering light (e.g., a bedside lamp or direct sunlight) should be considered to achieve reproducible quality. If these factors are accounted for, the resulting quality of the extracted respiratory motion depends mainly on further processing, such as ROI segmentation, artifact detection, motion extraction, and postprocessing. Based on our experiments, we suggest the Astra Pro for its high signal quality across different settings and the PMD cameras for their ability to suppress ambient lighting and to provide clean infrared images for computer vision tasks developed for grayscale images.
In further studies, we plan to assess the selected cameras in a clinical setting, recording 3-D data in parallel with a video PSG to analyze real-life motions and respiratory events with varying motion amplitudes (e.g., apneas and hypopneas), with the aim of achieving further improvements toward longitudinal 3-D respiratory monitoring of infants.