Semi-Automatic Calibration Method for a Bed-Monitoring System Using Infrared Image Depth Sensors

With the aging of society, the number of fall accidents has increased in hospitals and care facilities, and some accidents have happened around beds. To help prevent accidents, mats and clip sensors have been used in these facilities but they can be invasive, and their purpose may be misinterpreted. In recent years, research has been conducted using an infrared-image depth sensor as a bed-monitoring system for detecting a patient getting up, exiting the bed, and/or falling; however, some manual calibration was required initially to set up the sensor in each instance. We propose a bed-monitoring system that retains the infrared-image depth sensors but uses semi-automatic rather than manual calibration in each situation where it is applied. Our automated methods robustly calculate the bed region, surrounding floor, sensor location, and attitude, and can recognize the spatial position of the patient even when the sensor is attached but unconstrained. Also, we propose a means to reconfigure the spatial position considering occlusion by parts of the bed and also accounting for the gravity center of the patient’s body. Experimental results of multi-view calibration and motion simulation showed that our methods were effective for recognition of the spatial position of the patient.


Introduction
In recent years, older people have made up a progressively larger proportion of society in developed countries. With this trend, the number of accidents involving falls has increased in hospitals and other care facilities [1][2][3][4]. Because delayed discovery of such accidents can be life-threatening, early detection is important for these facilities. Some fall accidents happen during unmonitored walking, and fall-detection systems for use during walking have been studied over the last few years [5][6][7][8][9][10][11][12]. These studies used various devices, such as RGB cameras, depth sensors, infrared sensors, and accelerometers.
Some fall accidents have also happened around beds, and fall-risk assessment tools have been used to evaluate the risk of accidents involving falling from bed [13,14]. If a patient has been evaluated as having a high risk of fall accidents, risk-mitigating measures are taken; these include attaching a sensor to the patient to detect when they get up or when they get out of bed [15]. The main sensors in current use to monitor patients are clip sensors that attach to clothing, floor-mat sensors, and bed-mat sensors [16,17]. The clip sensor sounds an alarm when the clip is pulled. It is inexpensive but invasive. The floor- and bed-mat sensors are noninvasive but occasionally trigger false alarms, which are a burden for medical staff [15]. Various sensors have been tested in recent years in an attempt to reduce the false alarms. In some studies, multiple piezoelectric elements [18,19] or strain gauges [20] placed on the bed recognized the position of the patient in the bed, while other studies used ultrasonic or infrared depth sensors [21][22][23][24] to recognize the position of the patient both in bed and out of bed. In our study, we used an infrared depth sensor, which can obtain a wide range of depth information as a depth image and can monitor patients day or night.
Some conventional methods install the infrared depth sensor on the ceiling, facing vertically down toward the floor, and detect bed-exit and fall events by determining the position of the patient from the depth image [22,25]. Although this simplifies the calculation of the position, installation work is required. Other conventional methods do not require ceiling installation of the sensors [23,24,26], but they have other drawbacks. The method proposed by Asano et al. [23] is limited to use under the condition that only the bed and wall (not the floor) are imaged by the sensor. The methods by Ogura et al. [24] and by Ni et al. [26] both require manual intervention to identify the part of the captured depth image that includes the bed, and this laborious initial calibration is required every time the position of the sensor or bed is changed.
Beyond simply detecting the bed region, other researchers have studied methods that recognize spaces from a three-dimensional (3D) point cloud. Banerjee et al. [12] devised a method that automatically detects the floor surface and recognizes floor-level falls by applying the dense scale-invariant feature-transform (dense SIFT) method [27] and the random-sampling consensus (RANSAC) algorithm [28] to the depth image. Nurunnabi et al., Limberger et al., and Vera et al. studied methods for detecting planes from 3D point-cloud data [29][30][31]. In particular, Vera et al. [31] proposed a specialized plane-detection method that combines range images with principal component analysis (PCA) and a 3D Hough transform. This method, known as the depth kernel-based Hough transform (D-KHT) method, is fast and robust. However, because it was not designed for bed monitoring, it does not calculate either the bed region or the sensor position and attitude (the rotation of the sensor).
We propose a bed-monitoring system that incorporates a semi-automatic initial-calibration method using an infrared depth sensor attached in any position without installation on the ceiling. This method automatically calculates the bed region, the sensor position and attitude, and a spatial domain for recognizing various behaviors such as sitting up in bed or getting out of bed. Because the floor and bed surfaces are planar features in the space analyzed, we extract them using PCA and k-means++ clustering [32]. There is also a difference in level between the floor and bed surfaces in real space, and this boundary appears as an edge on the depth image. We therefore distinguish between the floor and bed regions using Canny's edge-detection method [33,34]. Then, geometric calculation based on the principal component axes of the bed region automatically yields the sensor position and attitude and the spatial position of the patient (see Section 2.2; we previously reported this approach in part at a forum in 2017 [35]). Alternatively, the floor and bed regions can be extracted by the D-KHT method (see Section 2.3). We compare the precision and calculation speeds of the D-KHT-based depth sensor calibration (DDC) and PCA-based depth sensor calibration (PDC) methods in Section 3.
We also took into consideration error factors such as occlusion and gravity-center misalignment. When the bed is not directly under the sensor, part of the bed region on the image is hidden by the head of the bed, resulting in occlusion. In addition, because the depth sensor measures the 3D position of the surface of an object such as the patient, the position acquired by the sensor may differ slightly from the actual gravity center of the patient. Because these occlusion and misalignment error factors affect actual monitoring, we propose correction methods for them. The proposed system therefore provides the following:
• Installation of the depth sensor in any position and attitude without installation on the ceiling, such as on a tripod or on a bed fence. However, both the bed and floor need to be captured, and their total area needs to be larger than the wall area.
• Automatic calculation of the bed and floor regions, sensor position and attitude, and spatial domain.
• Reconfiguration of the spatial domain considering occlusion by the head or foot of the bed.
• Reconfiguration of the spatial domain considering the gravity center of the patient.
Among these four, the automatic calculations of regions are incorporated into the PDC and DDC methods. Our system automatically performs the series of processes for the above initial calibration. However, calibration may fail depending on the installation conditions; it is therefore necessary to check the calibration results visually and, in some cases, to re-install the sensor.


System Configuration
The initial-calibration method of the bed-monitoring system uses an infrared depth sensor installed in any position and attitude. As shown in Figure 1a, X, Y, and Z denote the length, width, and height axes of the bed surface in the world coordinate system, respectively, and X', Y', and Z' denote the horizontal, vertical, and optical axes in the sensor coordinate system, respectively. The origins O and O' of the world and sensor coordinate systems, respectively, were both set at the focal position of the sensor.
The infrared depth sensor includes an infrared projector and an infrared camera. It measures the distance from the sensor to the object, i.e., the Z' value of the object, by time of flight (TOF) or triangulation. Figure 1b is a depth image captured with an XTION PRO infrared depth sensor (ASUSTeK Computer Inc.) and rendered by linear conversion so that it becomes brighter as the Z' value increases. Black regions (intensity 0) in Figure 1b are defective regions where depth data could not be acquired. The resolution of the XTION PRO is 320 × 240 pixels, so 3D point-cloud data of 76,800 points can be acquired simultaneously.
Figure 1c is a schematic of the spatial domain; the axis components Y and Z correspond to those in Figure 1a. The space is divided into eight regions (I to VIII) with reference to the bed boundary and the preset threshold values $d_1$ to $d_3$.
Threshold $d_1$ is a height from the bed chosen to discern whether or not the patient is in the bed, and threshold $d_2$ is a height from the bed chosen to discern whether the patient is in a sitting or a sleeping position. Threshold $d_3$ is a height from the floor chosen to distinguish whether or not the patient (or an object) is on the floor. If nobody is on the bed, the collected 3D points do not include region I or II. However, if the patient is asleep on the bed, the 3D points include region II, and if the patient is sitting up on the bed, the 3D points also include region I. In addition, if the patient leaves the bed, the 3D points include regions IV to VII, and if the patient falls on the floor, the 3D points also include regions VII and VIII. Previously, some conventional methods detected a patient's getting up by monitoring regions I and II and detected a patient's getting out of bed or falling by monitoring regions IV to VII [22,23].
To construct the proposed bed-monitoring system, it is necessary to recognize the bed region on the captured image. If the sensor is installed in an arbitrary position, it is also necessary to recognize the position and attitude of the sensor. In practice, a simple method with an easy initial setup is desirable. In the next two sections, we describe two initial-calibration methods, PDC and DDC, for automatically calculating the bed region and the sensor position and attitude. We then describe how to reconfigure the spatial domain while considering occlusion by the head of the bed and the gravity center of the patient in Section 2.3.5, and finally, we describe the spatial-domain monitoring method that uses these results in Section 2.3.6.

PCA-Based Depth Sensor-Calibration (PDC) Method
The relationship between the captured-image coordinates $(u, v)$ and the sensor coordinates $P_{u,v}$ is expressed by Equation (1), where $I_{u,v}$ is the intensity at coordinates $(u, v)$; $s$ is the depth interval per unit intensity on the Z' axis; $C_u$ and $C_v$ are the U- and V-axis coordinates of the optical center, in pixels; and $a$ is the ratio between the focal length and the physical pixel size of the sensor. According to the sensor specifications, we used $s$ = 1.6 cm/intensity, $(C_u, C_v)$ = (160, 120), and $a$ = 0.003452.
Normally, the bed and floor surfaces are flat in real space and have the same gradient. We therefore calculate the object gradient in the sensor coordinate system as $N_{u,v}$, the unit normal vector of the object surface at image coordinates $(u, v)$, by Equation (2), where $f_{PCA3}$ is a function that calculates the unit eigenvector of the third principal component (the one with the third-largest eigenvalue of the covariance matrix), and $b$ is the range of neighboring pixels used for the gradient calculation (we used the empirical value $b$ = 5 pixels). Because the eigenvector is obtained as two point-symmetric vectors, $N_{u,v}$ and $-N_{u,v}$, we use the one for which $(N_{u,v} \cdot P_{u,v}) > 0$. The condition $I_{u,v} > 0$ in Equation (2) means that defective pixels are excluded.
$N_{u,v}$ represents the gradient at coordinates $(u, v)$ in 3D space. We calculate $N_{u,v}$ for all pixels of the captured image using Equation (2) and make a gradient image by Equation (3), where $J^R_{u,v}$, $J^G_{u,v}$, and $J^B_{u,v}$ are the red, green, and blue intensities at coordinates $(u, v)$, respectively, each in the range of 0 to 254. Figure 2a shows the gradient image of the captured depth image (Figure 1b) visualized by Equation (3). Equation (3) linearly rescales $N_{u,v}$, in which each element is in the range of −1 to 1, to a color image in which each element is in the range of 0 to 254. It is used only for visualization in this paper and for debugging the system, not directly for calibration.
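The per-pixel processing described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the pinhole form assumed for Equation (1), including the sign of each axis, the mapping of the x, y, and z components to red, green, and blue, and the function names `backproject`, `local_normal`, and `normal_to_rgb` are all assumptions. Only the constants $s$, $C_u$, $C_v$, $a$, and $b$ come from the text.

```python
import numpy as np

S = 1.6            # depth interval per unit intensity [cm] (from the text)
CU, CV = 160, 120  # optical center [pixels] (from the text)
A = 0.003452       # constant a (from the text)

def backproject(intensity):
    """Map an intensity image to sensor coordinates (assumed pinhole form)."""
    v, u = np.indices(intensity.shape)
    z = S * intensity.astype(float)   # Z' from intensity
    x = A * (u - CU) * z              # assumed sign convention
    y = A * (v - CV) * z              # assumed sign convention
    return np.stack([x, y, z], axis=-1)

def local_normal(points, intensity, u, v, b=5):
    """Unit normal at pixel (u, v) from PCA over a (2b+1) x (2b+1) window."""
    rows = slice(max(v - b, 0), v + b + 1)
    cols = slice(max(u - b, 0), u + b + 1)
    window = points[rows, cols].reshape(-1, 3)
    valid = intensity[rows, cols].reshape(-1) > 0   # drop defective pixels
    window = window[valid]
    if len(window) < 3:
        return None
    # eigh returns eigenvalues in ascending order, so column 0 is the
    # third principal component (smallest variance).
    _, eigvecs = np.linalg.eigh(np.cov(window.T))
    n = eigvecs[:, 0]
    # Resolve the sign ambiguity so that N . P > 0.
    return n if np.dot(n, points[v, u]) > 0 else -n

def normal_to_rgb(n):
    """Rescale each component from [-1, 1] to [0, 254] for visualization."""
    return np.round(127.0 * (n + 1.0)).astype(int)
```

A surface facing the sensor squarely yields a normal along the optical axis, which this color mapping renders as a bluish pixel.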
As shown in Figure 2a, the bed and floor regions are the same color, indicating that these surfaces have the same gradient in real 3D space. In addition, these regions occupy a large area on the image. We therefore clustered $N_{u,v}$ using the k-means++ algorithm [32]; we set the number of clusters to k = 4 and extracted the class with the largest area. The horizontal-plane region Φ is defined by the pixels belonging to this largest class. The result of extracting Φ from the captured depth image (Figure 1b) is shown in Figure 2b.
The region Φ includes the bed, the floor, and other horizontal surfaces such as the upper surface of the shelf. The boundaries between these surfaces usually have steps, and in depth images the steps appear as edges with high intensity gradients. To distinguish these regions, we extract the edges by applying the widely used Canny algorithm [33,34] and take a mask (logical product) of Φ and the edges. Figure 2c,d shows the results of detecting the edges of the captured depth image (Figure 1b) and the mask image, respectively. The Canny algorithm requires edge-extraction thresholds $T_1$ and $T_2$ and a parameter σ. These are examined in the pre-experiment in Section 3. Note that $T_1$, $T_2$, and σ in Figure 2c were 50, 100, and 3, respectively. Although the bed and floor surfaces are partially connected in the lower part of Figure 2b, they are separated in Figure 2d.
Next, we extract only the bed region from the mask image. Because the bed surface is the monitoring target in this method, its area is normally the largest in the mask image. We therefore extract only the bed region by labeling the mask image and extracting the label with the maximum area. Figure 2e shows the result of extracting the bed region, Ψ, from the captured depth image (Figure 1b).
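The labeling step above amounts to finding the largest connected component of a binary mask. The sketch below shows one way to do this with a breadth-first search. The use of 4-connectivity and the function name `largest_component` are assumptions, since the text does not specify the labeling algorithm.

```python
from collections import deque

def largest_component(mask):
    """Return the (row, col) pixels of the largest 4-connected region in a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = set()
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood-fill one region starting from this unvisited pixel.
            region, queue = set(), deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    return best
```

In this sketch, a mask whose pixels are 1 where Φ holds and no edge was detected would give the bed-region candidate as the returned pixel set.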
Then, using the extracted horizontal-plane region Φ and the bed region Ψ, we automatically calculate the distance $l$ between the sensor and the bed surface and the distance $m$ between the sensor and the floor (Figure 1c). We also automatically calculate the sensor attitude (the rotation of the sensor) $R$, a 3 × 3 matrix that relates the world coordinate system XYZ to the sensor coordinate system X'Y'Z' (Equation (4)). When PCA is performed again on the sensor coordinate values $P_{u,v}$ of the pixels in the bed region Ψ, the eigenvectors of the first, second, and third principal components indicate the bed length (long axis), the bed width (short axis), and the gradient of the bed surface in the sensor coordinate system, respectively. Because these directions are defined as the X, Y, and Z axes of the world coordinate system, $R$ can be obtained by Equation (5), where $f_{PCA}$ is a function that calculates the unit eigenvectors of the three principal components. From the calculated sensor attitude $R$, the world coordinate value $P_{u,v}$ of the image coordinates $(u, v)$ can be derived by Equation (6), where $X_{u,v}$, $Y_{u,v}$, and $Z_{u,v}$ are defined as the X, Y, and Z components of $P_{u,v}$, respectively. Because each eigenvector in Equation (5) is obtained as two point-symmetric vectors, we used the vectors for which $Z_{u,v} > 0$.
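The attitude calculation can be illustrated with a short NumPy sketch. It is a sketch under stated assumptions rather than the authors' code: the rows of the returned matrix are taken to be the three principal axes, world coordinates are obtained by projecting each point onto those axes, and the final flip that enforces a right-handed system is an added assumption (the text only fixes the sign of Z). The names `attitude_from_bed` and `to_world` are hypothetical.

```python
import numpy as np

def attitude_from_bed(points):
    """points: (N, 3) sensor coordinates of the bed pixels.

    Returns a 3x3 matrix whose rows are the unit vectors of the bed length,
    bed width, and bed normal in the sensor coordinate system.
    """
    centered = points - points.mean(axis=0)
    _, eigvecs = np.linalg.eigh(np.cov(centered.T))   # ascending eigenvalues
    axes = eigvecs[:, ::-1].T.copy()                  # rows: 1st, 2nd, 3rd PC
    # Choose the normal direction that gives positive Z for the bed points.
    if np.mean(points @ axes[2]) < 0:
        axes[2] = -axes[2]
    # Assumption: keep the axes right-handed by flipping the length axis.
    if np.linalg.det(axes) < 0:
        axes[0] = -axes[0]
    return axes

def to_world(axes, points):
    """Project sensor coordinates onto the world axes (columns X, Y, Z)."""
    return points @ axes.T
```

For a sensor looking straight down at a 200 cm × 90 cm bed located 150 cm away, every world Z value comes out as 150 cm, which is the quantity that is averaged to obtain $l$.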
For the bed region Ψ, let X, Y, and Z denote the sets of $X_{u,v}$, $Y_{u,v}$, and $Z_{u,v}$ over the pixels in Ψ, i.e., the lengths, widths, and heights in the world coordinate system of the 3D points on the bed surface (Equation (7)). We calculate the height $l$ from the bed surface to the sensor (Figure 1c) by applying $f_{Ave}$, a function that returns the average value, to Z (Equation (8)).
In addition, many of the pixels in the horizontal-surface region Φ, excluding the bed-surface region Ψ, represent the floor surface. However, because these pixels do not represent only the floor surface, $m$ is calculated by applying $f_{Mode}$, a function that returns the mode value, to their Z values (Equation (9)). The mode is calculated by making a histogram with interval width τ and extracting the value with the maximum frequency. We set τ = 0.1 cm in consideration of the resolution of the sensor. Details of the spatial-domain division (Figure 1c) using these parameters are described in Section 2.3.6.
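The two statistics used for $l$ and $m$ can be sketched in a few lines. The function names are hypothetical, and returning the center of the most frequent bin is an assumption; the text only states that the value with the maximum frequency is extracted from a histogram of width τ.

```python
from collections import Counter

def mean_height(z_values):
    """Average Z of the bed pixels, used for the sensor-to-bed distance l."""
    return sum(z_values) / len(z_values)

def histogram_mode(values, tau=0.1):
    """Mode from a histogram with bin width tau (center of the fullest bin)."""
    counts = Counter(int(v // tau) for v in values)
    best_bin = max(counts, key=counts.get)
    return (best_bin + 0.5) * tau
```

Using the mode rather than the mean for $m$ keeps shelves or other horizontal surfaces in Φ from pulling the floor distance toward the sensor, as long as the floor accounts for the most pixels.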


D-KHT Based Depth-Sensor Calibration (DDC) Method
We based our depth sensor initial-calibration method (DDC) on the D-KHT method proposed by Vera et al. [31]. They also calculated the gradient of the depth image using Equation (1) and PCA. However, they computed it recursively in block units using a Quadtree instead of calculating it for every pixel as we proposed in Section 2.2, which makes their method faster than the all-pixel method. They used the square root of the third eigenvalue of PCA (i.e., the standard deviation of the distance between the plane and the points) as the Quadtree's recursive-division criterion $s_t$. Figure 3a shows the result of calculating the gradient of $Q_i$ as $N_i = (N_{i,x}, N_{i,y}, N_{i,z})$ for the image (Figure 1b), where $Q_i$ is the i-th Quadtree, $s_t$ = 2 cm, and the color coding is based on Equation (3). Vera et al. [31] then converted $N_i$ to 3D polar coordinates $\mu_i = (\mu_{i,\rho}, \mu_{i,\phi}, \mu_{i,\theta})$ using Equation (10) and voted in the 3D voxel space $W_{\rho,\phi,\theta}$ in consideration of the area of the Quadtree and the probability density function (PDF) of the Gaussian distribution.
In Equation (10), $M_i$ is the average value of $P_{u,v}$ in the Quadtree, and $S_\rho$, $S_\phi$, and $S_\theta$ are the voxel sizes of ρ, ϕ, and θ in $W_{\rho,\phi,\theta}$, respectively. Subsequently, they smoothed $W_{\rho,\phi,\theta}$ with a low-pass filter and used the polar coordinates (ρ, ϕ, θ) of the local maxima of $W_{\rho,\phi,\theta}$ as planes. Because Vera et al. [31] aimed at plane extraction, not bed-region extraction, we propose a method to extract bed- and floor-region candidates by the following procedure of Steps 1 to 5.

Step 1: Calculation of the Horizontal-Plane Polar Coordinates $\phi'$ and $\theta'$
The bed and floor surfaces have the same gradient. That means the polar coordinates (ϕ, θ) of the bed surface are equal to those of the floor surface. They also occupy a large area in the image. Therefore, we set the mode values of ϕ and θ as the horizontal-plane polar coordinates $\phi'$ and $\theta'$, which represent the bed or floor surfaces. Using the same voxel voting space $W_{\rho,\phi,\theta}$ as Vera et al. [31], these are calculated by Equation (11).

Step 2: Calculation of the Distance $\rho'$ between the Sensor and the Floor
We set the voxel space at coordinates $(\phi', \theta')$ as the horizontal-plane space $W_{\rho,\phi',\theta'}$ and denote the distance between the sensor and the floor calculated in this step as $\rho'$. We set $\rho'$ to the local maximum of $W_{\rho,\phi',\theta'}$ farthest from the sensor.

Step 3: Calculation of the Distance $\rho''$ between the Sensor and the Bed
We denote the distance between the sensor and the bed calculated in this step as $\rho''$. Assuming that the bed surface is a large area between the floor surface and the sensor, $\rho''$ is calculated by Equation (12), where δ, given as a constant in advance, is the lower limit of the height of the bed (to prevent erroneous extraction). The polar coordinates $(\rho'', \phi', \theta')$, with $\phi'$ and $\theta'$ being the horizontal-plane polar coordinates from Step 1, represent the bed surface.
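Steps 2 and 3 can be illustrated on a one-dimensional vote profile taken along ρ at the horizontal-plane direction. The sketch below is an interpretation, not Equation (12) itself: it assumes that the floor is the farthest local maximum and that the bed is the local maximum with the most votes among those at least δ closer to the sensor than the floor. The names `local_maxima` and `floor_and_bed`, and the list-based vote profile, are hypothetical.

```python
def local_maxima(votes):
    """Indices with a positive vote count not smaller than either neighbor."""
    peaks = []
    for i, v in enumerate(votes):
        left = votes[i - 1] if i > 0 else 0
        right = votes[i + 1] if i + 1 < len(votes) else 0
        if v > 0 and v >= left and v >= right:
            peaks.append(i)
    return peaks

def floor_and_bed(votes, s_rho, delta):
    """votes[i] holds the votes for distance i * s_rho from the sensor.

    Returns (floor distance, bed distance) in the same unit as s_rho.
    """
    peaks = local_maxima(votes)
    floor_idx = max(peaks)                      # farthest local maximum
    floor_rho = floor_idx * s_rho
    # Bed candidates must be at least delta above the floor.
    candidates = [i for i in peaks if i * s_rho <= floor_rho - delta]
    bed_idx = max(candidates, key=lambda i: votes[i])
    return floor_rho, bed_idx * s_rho
```

The δ constraint matters when a low object close to the floor collects many votes; without it, such an object could be mistaken for the bed surface.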

Step 4: Calculation of the Floor Surface Φ and Bed-Region Candidate Ω
In this step, we propose two alternatives for calculating the floor surface Φ and the bed-region candidate Ω. The first (DDC-D) calculates the bed region at high density, in pixel units, and the second (DDC-S) calculates the bed region at high speed, in block units of the Quadtree. For both, the extraction thresholds $\eta_\rho$, $\eta_\phi$, and $\eta_\theta$ are given as constants in advance.

Alternative 1: DDC Method at High Density (DDC-D)
For all pixels in the image, the gradients $N_{u,v} = (N_{u,v,x}, N_{u,v,y}, N_{u,v,z})$ are calculated using Equation (2), and the polar coordinates $\nu_{u,v}$ are calculated by Equation (13). If the pixel $(u, v)$ is on the floor surface, $\nu_{u,v}$ is approximately equal to the floor polar coordinates $(\rho', \phi', \theta')$ obtained in Steps 1 and 2. We therefore calculate Φ by Equation (14). Likewise, if the pixel $(u, v)$ is in the bed region, $\nu_{u,v}$ is approximately equal to the bed polar coordinates $(\rho'', \phi', \theta')$ of Section 2.3.3. We therefore calculate Ω by Equation (15). Red and blue in Figure 3b show Ω and Φ of the image (Figure 3a), calculated using Equation (15) and Step 2, respectively. The constants used included δ = 15 cm and $\eta_\rho$ = 10 cm.

Alternative 2: DDC Method at High Speed (DDC-S)
In this alternative, recalculation for each pixel is not performed; instead, the Quadtree is used directly. A Quadtree is regarded as part of the floor surface Φ if it satisfies Equation (16). Similarly, a Quadtree is regarded as part of the bed-region candidate Ω if it satisfies Equation (17).

Step 5: Calculation of the Bed Region Ψ
Although red in Figure 3b indicates the bed-region candidate, there are some small red areas outside the bed because Ω is extracted not only for the bed surface but also for horizontal surfaces at the same height as the bed. We therefore apply the labeling method to Ω as in Section 2.2 and use the label with the largest area as the final bed region Ψ. Figure 3c shows the result of extracting Ψ from Figure 3b.
Although it is possible to calculate the sensor attitude $R$ and the distances $l$ and $m$ directly using (ρ, ϕ, θ), the Hough transform generates quantization errors [36]. Instead, we calculate $R$, $l$, and $m$ by the PDC method (see Section 2.2), using Equations (5) through (9).
The bed regions shown in Figures 2e and 3c appear to have been extracted successfully, but part of the bed region is hidden by the head of the bed. This "occlusion," shown in Figure 4, does not occur at the foot of the bed in this case, because it arises where the obstruction is closer to the sensor end of the bed. Therefore, we measure the bed length $d_4$ beforehand, and if the automatically calculated bed length is shorter than $d_4$, the bed region is extended toward the sensor end. Also, to exclude the head and foot of the bed from the bed region, we shorten the bed region by 2ε, as illustrated in Figure 4.
The values of X and Y obtained by Equation (7) are used in Equation (18) to calculate the bed-edge coordinates $X_{\min}$, $X_{\max}$, $Y_{\min}$, and $Y_{\max}$, and the bed region is then expanded and contracted using Equations (19) and (20), where $\hat{X}_{\min}$ and $\hat{X}_{\max}$ denote the bed-edge coordinates after expansion and contraction (Figure 4), and $L_{X\min}$, $L_{X\max}$, $L_{Y\min}$, and $L_{Y\max}$ are threshold values for excluding outliers. The L values are calculated using the interquartile method [37] (Equation (21)).
In Equation (21), $f_{q1}$ and $f_{q3}$ are functions that calculate the first and third quartiles, respectively.
Furthermore, since the depth sensor acquires the surface shape, the patient position acquired by the sensor may be slightly different from the actual gravity center of the patient. For example, if the patient protrudes halfway from the end of the bed on the sensor side (e.g., "Human Model 1" in Figure 5), the patient's image is in the blue-arc position, mostly outside the bed. Conversely, if the patient protrudes halfway from the end of the bed opposite the sensor side (e.g., "Human Model 2" in Figure 5), the patient's image is in the red-arc position, mostly inside the bed. As a result, if the patient gets out of or falls off the bed on the side opposite the sensor, the alarm sensitivity will be poor.
To address this problem, we consider human models with a radius $r$ that protrude halfway from the end of the bed, as shown in Figure 5, where $r$ is set in advance. Next, we calculate the differences $e_1$ and $e_2$ between the surface-center positions and the gravity-center positions of the human models using Equations (22) and (23), which have the form $e = r \sin\phi$ (for example, $e_2 = r \sin\phi_2$). Finally, the bed positions $Y_{\min}$ and $Y_{\max}$ are moved to $\hat{Y}_{\min}$ and $\hat{Y}_{\max}$, respectively, using Equations (24) and (25). The resulting bed-edge coordinates are the initial-calibration variables, and $e_1$ and $e_2$ are the surface- and gravity-center differences calculated in Equations (22) and (23), respectively.
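The outlier exclusion applied when computing the bed-edge coordinates can be sketched with the standard library. The exact form of Equation (21) is not reproduced in this text, so the sketch assumes the common interquartile rule, in which values farther than 1.5 times the interquartile range from the first or third quartile are discarded. The factor 1.5, the inclusive quartile method, and the name `robust_range` are assumptions.

```python
import statistics

def robust_range(values, k=1.5):
    """Return (min, max) of the values after dropping interquartile outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr   # assumed outlier thresholds
    inliers = [v for v in values if low <= v <= high]
    return min(inliers), max(inliers)
```

Applied separately to the X and Y values of the bed pixels, this would give edge coordinates that are insensitive to a few stray pixels labeled as bed.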

Recognition of the Spatial Domain
In the actual monitoring process, using the previously calculated initial-calibration variables, we derive the world coordinate values P_{u,v} = (X_{u,v}, Y_{u,v}, Z_{u,v}) by applying Equations (1) and (6) to each pixel (u, v) of the monitoring image. The eight spatial domains D_{u,v} (Figure 1c) are then calculated by Equation (26), where E_{u,v} is a variable indicating whether the location is inside or outside the bed region (1 or 0, respectively), as given by Equation (27). Finally, the number of pixels (area) in each domain is counted, and actions such as sitting up in bed or exiting the bed are detected automatically from the variation in the area of each domain.
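The final counting step can be sketched as follows. This is a minimal illustration that assumes the domain image D_{u,v} has already been computed and is stored as rows of integer labels 1 to 8; the use of 0 for pixels without a valid domain, the function names, and the tiny sample frames are hypothetical.

```python
from collections import Counter

NUM_DOMAINS = 8  # spatial domains I to VIII


def domain_areas(domain_image):
    """Count the pixels assigned to each domain (1..8); other labels are ignored."""
    counts = Counter(label for row in domain_image for label in row)
    return [counts.get(d, 0) for d in range(1, NUM_DOMAINS + 1)]


def area_variation(previous, current):
    """Per-domain change in pixel count between two frames."""
    return [c - p for p, c in zip(previous, current)]


# Two tiny hypothetical label images; 0 marks pixels without a valid domain.
frame_a = [[2, 2, 2, 0],
           [2, 2, 5, 5]]
frame_b = [[2, 2, 5, 0],
           [2, 5, 5, 5]]

areas_a = domain_areas(frame_a)
areas_b = domain_areas(frame_b)
delta = area_variation(areas_a, areas_b)
```

In this toy example the area of domain II shrinks while that of domain V grows, which is the kind of area variation from which an action such as exiting the bed would be inferred.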

Experimental Methods
We devised experimental methods to verify the effectiveness of our proposed method for bed monitoring. First, we set the ASUS XTION PRO depth sensor (resolution: 320 × 240 pixels) at 18 viewpoints H_{1,j} to H_{6,j} (six X-Y locations, each at three heights from the floor: j = 1.5, 1.8, and 2.0 m) (Figure 6) and took depth images with the bed fence down. (The height difference between the bed surface and the bed fence is 4 cm.) Next, we conducted pre-experiments to examine the optimal parameters of the Canny algorithm of the PDC method in Section 2.2. The results are described in Section 3.2. Afterwards, we compared the calibration precision and time of the three methods (PDC, DDC-D, and DDC-S) described in Section 2 by running Experiments 1 through 4 below. We also verified the previously discussed reconfiguration of the bed region and recognition of the spatial domain in Experiments 5 and 6 below.

Since the PDC method recognizes the label with the maximum area as the bed area, the image must be taken so that the bed area is larger than the floor area. In addition, our system requires that both the bed and the floor are captured and that their total area is larger than the wall area. Therefore, in Experiments 1 through 6, the sensor was installed so as to satisfy these conditions. In Experiment 7, by contrast, we deliberately violated these conditions in order to verify them. The method is described in Section 3.1.7 and the results are shown in Section 3.5.
The experiments in this section verify the accuracy of the proposed method. Although a comparison with conventional methods would be desirable, the conventional methods cited in this paper [23,24,26] were not described in sufficient detail to be reproduced; therefore, we excluded such a comparison.

Experiment 1
The measured height of the bed (46 cm) is compared with the calculated value, which is the average of (m − l) over the 18 viewpoints. The standard deviation of (m − l) is also calculated and discussed.

Experiment 2
The measured width of the bed (95 cm) is compared with the calculated value, which is the average of the bed width |Y_max − Y_min| obtained during calibration. The standard deviation of |Y_max − Y_min| is also calculated and discussed.

Experiment 3
The set K of image coordinates of the four corners of the bed is calculated using Equations (28)-(30), where B and B' are the sets of coordinates of the four bed corners in the world and sensor coordinate systems, respectively. Next, we visually evaluate whether K properly indicates the bed corners according to scoring criteria; the resulting point scores are reported as Result 3 in Table 1.

Experiment 4
The calibration time of each method is measured (Result 4 in Table 1). The default parameters were determined empirically to give the highest precision and are the same as in the "Calibration Method" section. Specifically, we used the range of neighboring pixels b = 5 in Equation (2), the number of clusters k = 4 in the k-means++ algorithm, the Quadtree's recursive-division criterion s_t = 2 cm, the lower limit of the bed height δ = 15 cm in Equation (12), the horizontal-plane extraction thresholds η_ρ = 10 cm, η_ϕ = 20°, and η_θ = 20°, and the voxel sizes S_ρ = 2 cm, S_ϕ = 1°, and S_θ = 1°. The Canny parameters T_1, T_2, and σ are examined in the pre-experiment in Section 3.2.

Experiment 5
K is recalculated by replacing X_min, X_max, Y_min, and Y_max in Equations (28)-(30) with their reconfigured values; the parameters r and ε are set to 20 and 10 cm, respectively; and the bed length d_4 is set to 198 cm from actual measurement. Whether the method described in the section "Reconfiguration of the Bed Region" yields the expected result is verified visually.
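For reference, the parameter values stated in this paper can be gathered into a single configuration object. The sketch below is only an organizational aid: the class name and field names are hypothetical, and the values are copied from the text (lengths in cm, angles in degrees; the Canny values are those selected in the pre-experiment of Section 3.2).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CalibrationDefaults:
    """Default parameters quoted in the text; all field names are hypothetical."""
    neighbor_range_b: int = 5          # b in Equation (2), in pixels
    kmeans_clusters_k: int = 4         # k for the k-means++ algorithm
    quadtree_split_cm: float = 2.0     # s_t, Quadtree recursive-division criterion
    min_bed_height_cm: float = 15.0    # delta in Equation (12)
    eta_rho_cm: float = 10.0           # horizontal-plane extraction thresholds
    eta_phi_deg: float = 20.0
    eta_theta_deg: float = 20.0
    voxel_rho_cm: float = 2.0          # voxel sizes S_rho, S_phi, S_theta
    voxel_phi_deg: float = 1.0
    voxel_theta_deg: float = 1.0
    canny_sigma: float = 3.0           # selected in the pre-experiment
    canny_t1: int = 50
    canny_t2: int = 100
    body_radius_cm: float = 20.0       # r, radius of the human model
    epsilon_cm: float = 10.0           # epsilon (see Figure 4)
    bed_length_cm: float = 198.0       # d_4, from actual measurement


defaults = CalibrationDefaults()
# Overriding one value, e.g. for a bed of a different (hypothetical) length.
longer_bed = replace(defaults, bed_length_cm=210.0)
```

Keeping the object immutable and deriving variants with `replace` makes it explicit which values differ from the defaults used in the experiments.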

Experiment 6
The motion of a patient getting out of bed is simulated, and depth images of the motion are taken at 5 frames per second using the sensor installed at viewpoint H_{4,2} (Figure 6). Thresholds d_1, d_2, and d_3 are set to 15, 50, and 30 cm, respectively, and the spatial domain D_{u,v} is then calculated for all images. Whether the method described in the section "Recognition of the Spatial Domain" works correctly is verified visually.
In this paper, given that we focus on initial calibration, verification of r, d_1, d_2, and d_3 is excluded.

Experiment 7
In order to confirm the limitations of our method, we tested whether initial calibration is possible under the following two specific conditions.
• Install the sensor and take an image so that the floor area is larger than the bed area.
• Install the sensor and take an image so that the wall area is larger than the sum of the bed and floor areas.
In both cases, we installed the depth sensor at position H_{4,1.5} in Figure 6 and took depth images with the sensor tilted.

Results of Pre-Experiment
In order to investigate the optimal parameters of the Canny algorithm of the PDC method in Section 2.2, we tried edge extraction with σ = 3 and 5 and several values of the thresholds T_1 and T_2 on depth images (value range 0-255) taken from the 18 viewpoints in Figure 6. Figure 7 shows the processing results for images captured at viewpoint H_{2,1.5}. In all cases where σ = 5, too many lines were extracted (noisy). When σ = 3 and T_1 = 0, the results were slightly noisy. When σ = 3 and T_1 = 100, the bed heads were lacking (arrow parts). When σ = 3, T_1 = 50, and T_2 = {150, 200}, the bed heads were slightly lacking (arrow parts), although this did not affect the calibration. The edges were therefore extracted most clearly when σ = 3, T_1 = 50, and T_2 = 100. Since the same results were obtained from the other 17 viewpoints, we used these values in Experiments 1 to 7. Note that the optimum parameters for the Canny algorithm may vary slightly depending on the type of depth sensor used.

Figure 7. Pre-experimental results for analyzing optimal parameters of the Canny algorithm.

Results and Discussion of Calibration Experiments (Experiments 1-4)
The results of the calibration experiments (1 through 4) are shown in Table 1 and Figure 8. Across all three calibration methods (PDC, DDC-D, and DDC-S), the average values for Experiments 1 and 2 were close to the measured values of 46 and 95 cm, respectively, and there were no significant differences among the three methods. Also, the standard deviations for Results 1 and 2 were about 1 and 5 cm, respectively, again with no significant differences among the three methods. Since the resolution of the sensor used in the experiment is about 2 cm, all three methods were able to acquire the bed height and width with accuracy close to the sensor resolution.
In Result 3, only DDC-S yielded a low point score. To investigate this, we examined the data progression during the calibration process. Figure 8 shows an example of the calibration process for the image acquired at viewpoint H_{2,2}. Figure 8a-c show the results of the three methods, with the bed region drawn as a rectangle obtained by calculating K using Equation (30). Figure 8d shows the Ω and Φ regions drawn by the PDC method, and Figure 8e,f respectively show the Ψ and Φ regions drawn by the DDC-D and DDC-S methods. The red color in each image represents the bed surface Φ. Note, however, that the blue color in Figure 8e,f represents only the floor surface Ψ, whereas that in Figure 8d represents all the horizontal planes Ω, including the floor, shelf, and other surfaces. Figures 1b, 2, and 3 also show progressive data for images acquired at viewpoint H_{2,2}. In Figure 8d,e, the bed regions were extracted appropriately at the pixel level. In Figure 8f, however, because DDC-S extracts regions at the block level, the bed region was not extracted at the pixel level. We assume that this adversely affected the bed-position calculation, as shown in Figure 8c, and lowered the visual-observation score for Result 3 in Table 1.
Among the three methods, calibration time (Result 4 in Table 1) was shortest for DDC-S and longest for PDC. Initial calibration is performed only once after the sensor is installed, and in many cases robustness is more important than calculation speed. Therefore, DDC-D and PDC are usually more effective than DDC-S.
In this paper, we experimented with a depth sensor with a resolution of 320 × 240 pixels and obtained a calibration accuracy of several centimeters, which is sufficient for operation in bed monitoring. Depth sensors with higher resolution have been developed in recent years, and using them can be expected to improve accuracy, although processing time will increase.

Results and Discussion of Bed-Region and Spatial-Domain Experiments (Experiments 5-6)
The results of the bed-region-reconfiguration and spatial-domain-recognition experiments (5 and 6) are shown in Figures 9 and 10. Figure 9a shows the result of reconfiguring the bed region of Figure 8b using the method described in the section "Reconfiguration of the Bed Region". In this reconfiguration, the bed region spread to include more of the lower sides than in Figure 8b and included the region at the foot of the bed. Figure 9b,c show the bed region calculated before and after reconfiguration for images acquired at another viewpoint, H_{4,2}. As expected, the bed region in Figure 9c extends further to the lower right (toward the sensor) than that in Figure 9b. All the other viewpoints also produced the expected results. These outcomes suggest that the calculations in Equations (18) through (21) are effective for solving the problem of occlusion by the head of the bed.
Subsequently, we simulated a patient lying in the lateral decubitus position on the edge of the bed, acquired the depth image at viewpoint H_{4,2}, and carried out the verification experiment by applying the spatial-domain calculation of Equation (26). Figure 10a shows the captured depth image, and Figure 10b,c respectively show the segmentation images before and after reconfiguration of the bed by applying the calculations in Equations (19), (20), (24), and (25). For each pixel in Figure 10b,c, regions I to VIII are colored light blue, blue, purple, yellow, red, light pink, dark gray, and light gray, respectively. Note that black indicates defective pixels and white indicates saturated pixels, which lie beyond the photographable range and whose exact position cannot be calculated.
In Figure 10b, most of the body was determined to be in the bed (blue) even though the simulated state had part of the body protruding outside the bed. In contrast, in Figure 10c, about half of the body was determined to be outside the bed (red). These results suggest that the calculations in Equations (22) through (25) are effective for reconfiguring the spatial domain in consideration of the gravity center of the patient. In addition, whereas the head of the bed in Figure 10b is mostly region II (blue), it is shown as mostly region V (red) in Figure 10c. This is the effect of ε (see Figure 4), which makes it possible to limit region II to identifying only the presence or absence of a patient.

Results and Discussion of Two Specific Condition Experiments (Experiment 7)
This section describes the calibration results for depth images taken under the two specific conditions. Figure 11a shows a depth image taken with the sensor set up so that the floor area is larger than the bed area. Figure 11b-d respectively show the initial calibration results for Figure 11a using PDC, DDC-D, and DDC-S; the red and blue colors represent the regions extracted as the bed surface Φ and the floor surface Ψ, respectively. In Figure 11c,d, extraction succeeded because the regions were colored as expected, whereas in Figure 11b it failed because red and blue were reversed. Figure 11e-g respectively show the bed regions before correction, calculated from Figure 11b-d. These results also show that DDC-D and DDC-S succeeded but PDC failed. The bed heights calculated by DDC-D and DDC-S were 44.9 and 45.8 cm, respectively, which are close to the true value (46 cm). Similarly, the calculated bed widths were 99.3 and 98.3 cm, respectively, which are also close to the true value (95 cm). These results show that the bed area must be larger than the floor area for PDC, as described in Section 2.2, whereas DDC-D and DDC-S can be calibrated under these conditions.
Next, we installed the sensor and took an image so that the wall area was larger than the sum of the bed and floor areas. Figure 12a shows the captured depth image, and Figure 12b-d show the results of calibration with PDC, DDC-D, and DDC-S, respectively. As in Figure 11, the red and blue colors represent the regions extracted as the bed surface Φ and the floor surface Ψ, respectively. In Figure 12b, the back wall was recognized as a bed region. In Figure 12c,d, the back walls were recognized as floor regions. Therefore, as expected, none of our methods works under this condition.
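The two installation conditions examined in Experiment 7 can be written down as a simple check. The sketch below is illustrative only: it assumes rough pixel counts for the bed, floor, and wall are available (for example, from a manually annotated test image), and the function name and the sample counts are hypothetical.

```python
def calibration_preconditions(bed_area, floor_area, wall_area):
    """Report, per method, whether the stated area conditions are satisfied.

    All methods need bed + floor > wall; PDC additionally needs bed > floor.
    Areas are pixel counts; how they are obtained is outside this sketch.
    """
    horizontal_dominant = bed_area + floor_area > wall_area
    return {
        "PDC": horizontal_dominant and bed_area > floor_area,
        "DDC-D": horizontal_dominant,
        "DDC-S": horizontal_dominant,
    }


# Hypothetical pixel counts for the two conditions of Experiment 7.
floor_larger_than_bed = calibration_preconditions(20000, 30000, 10000)
wall_dominant = calibration_preconditions(15000, 10000, 40000)
```

Such a check mirrors the observed outcomes: with the floor larger than the bed only PDC is expected to fail, and with a dominant wall all three methods are expected to fail.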

Conclusions
We proposed a calibration method for a bed-monitoring system using an infrared-image depth sensor. This method can calculate the bed region, floor region, and sensor position and attitude automatically and robustly. A system using our method can recognize how elevated a patient is and whether the patient is in or out of the bed, even when the sensor is mounted at an arbitrary position and attitude rather than on the ceiling, for example on a tripod or a bed fence.
For the recognition of the bed and floor regions, we proposed three methods: the PCA-based depth-sensor calibration (PDC) method and two D-KHT-based depth-sensor calibration methods characterized by either high density (DDC-D) or high speed (DDC-S). The PDC method is based on PCA, k-means++ clustering, and the Canny edge-detection method. Both the DDC-D and DDC-S methods are based on the D-KHT plane-detection method. The DDC-S method estimates the bed and floor regions directly from the plane-detection results, whereas the DDC-D method recalculates both regions for each pixel after applying D-KHT. Experimental results show that DDC-S calibrates at high speed but with low robustness, and that PDC and DDC-D require more calibration time than DDC-S but are more robust. Since initial calibration does not require real-time processing in many cases, PDC or DDC-D is usually more useful than DDC-S.
However, the PDC method requires that the image be taken so that the bed area is larger than the floor area. In addition, our system requires that both the bed and the floor are captured and that their total area is larger than the wall area. We experimented with conditions that violate these requirements and confirmed that the calibration failed. If the calibration fails during actual use, it is necessary to change the orientation or the installation position of the sensor.
Furthermore, we proposed a method of reconfiguration of the spatial domain that considers both occlusion by the head (or foot) of the bed and the gravity center of the patient. Experimental results of multi-view calibration and motion simulation show that this method is effective for spatial-domain recognition.
Future work will include improving the timing of detection of a patient's rising or sitting up, getting out of bed, falling, and other potentially dangerous movements, as well as verifying effectiveness in actual hospitals and other care facilities.