Lightweight Neural Network for Sleep Posture Classification Using Pressure Sensing Mat at Various Sensor Densities

Recently, pressure-sensing mats have been widely used to capture static and dynamic pressure during sleep for posture recognition. Both full-size mats with low-density sensing arrays, which capture the structure of the whole body, and miniature-scale mats with high-density sensing arrays, which identify local characteristics around the chest, have been investigated. However, both types of mat system face a trade-off between computational complexity (which depends on the size, density, etc. of the mat) and sleep posture recognition performance, where high performance may require overly complex computation, introducing latency in real-time sleep posture monitoring. In this paper, a lightweight neural network named ConcatNet is proposed to recognize four sleep postures (supine, left, right, and prone) in real time while maintaining favorable performance. In ConcatNet, an inception module extracts image features under multiple receptive fields, while a multi-layer feature fusion module fuses deep and shallow features to enhance model performance. To further improve the efficiency of the model, depthwise convolution is adopted. ConcatNet models at 3 different scales (ConcatNet-S, ConcatNet-M, and ConcatNet-L) are built to explore the impact of sensor density on sleep posture recognition performance. Experimental results show that ConcatNet-M, corresponding to medium sensor density ( ${16}\times {16}$ ), achieved the best overall performance, with a short-term data cross-validation accuracy of 95.56% and an overnight data testing accuracy of 94.68%. The model size is 7.91 KB, the FLOPs are 56.47K, and the inference time is only 0.38 ms, demonstrating real-time sleep posture recognition with minimal resource consumption and indicating the potential for deployment on mobile devices.


I. INTRODUCTION
SLEEPING is an indispensable part of daily life, as it plays crucial roles in brain functions including neurobehavioral, cognitive, and safety-related performance [1]. In sleep monitoring, sleep posture is an important indicator. For instance, staying in the wrong sleep posture can strain the cervical and lumbar spine, causing pain and even related diseases [2]. In addition, improper sleep postures can disturb respiration and even cause sleep apnea, which can lead to death by suffocation in severe cases. Moreover, long-term bedridden patients have a higher probability of developing other conditions, such as decubitus ulcers caused by sustained pressure [3]. Around 700,000 patients are affected by decubitus ulcers each year [4], and about 60,000 people die from pressure ulcer complications worldwide [5]. Therefore, real-time monitoring of sleep posture is of great significance, not only to alert people to avoid improper sleeping positions but also to remind them not to maintain the same sleep posture for too long.
The challenges of sleep posture monitoring mainly come from its application scenario: long-term, real-time sleep posture monitoring in the home environment. Since it involves long-term, real-time monitoring during sleep in a private environment (e.g., the bedroom), it places high requirements on the system in terms of robustness, real-time performance, ease of use, comfort, and privacy protection. In recent studies, methods for sleep posture monitoring can be classified into 4 main categories: wearable device-based [6], [7], camera-based [8], [9], radio-frequency-based [10], [11], and pressure-sensitive mat-based. For wearable device-based methods, the devices attach directly to the human body, which may cause discomfort and interfere with normal sleep. In addition, motion artifacts can affect the accuracy of the system. In camera-based systems, image quality is greatly affected by ambient lighting, and the use of cameras in bedrooms may raise privacy concerns. As for radio-frequency-based approaches, some radar devices are expensive and require installation. In addition, clutter in indoor settings may produce multipath effects [12], and factors such as distance, location, environment, and identity difference may also affect the robustness of the system [13].
In contrast to the aforementioned approaches, pressure-sensitive mat-based systems offer the following benefits: 1) the pressure sensing mat can be placed under the bed sheet without direct contact with the human body, which ensures comfort; 2) the system is relatively stable and less affected by the environment (e.g., changes of light, the cover of a quilt, or the position and movement of surrounding objects); 3) the smart mat raises fewer privacy concerns; 4) the material cost is relatively low.
The initial idea of the pressure sensing mat starts with building a full-size bed to collect the pressure distribution of the whole human body, then converting the data into image form for visualization. Features of the image can be extracted and classified into different sleep postures by traditional machine learning or deep learning methods. In some studies, large-scale pressure sensing mats were deployed [14], [15], [16], [17], [18], [19]. For example, Xu et al. [15] proposed a mat with 8192 (128 × 64) pressure sensors over a 1.62 m² area. In [16], a pressure sensor array consisting of 1728 (64 × 27) sensors within a 185 × 76 cm region was implemented. The smart mat used in [17] has 64 × 32, for a total of 2048, sensors in a bed-size area. These works proved the feasibility of sleep posture classification based on pressure sensing mats. However, when a person lies on a large-scale smart mat, the body only covers some of the sensors, meaning part of the sensors are not utilized in practice, which may introduce redundant information. Besides, a large-scale mat has a higher production cost and, in most cases, more sensors, which increases the computational complexity. To address these issues, a miniature-scale smart mat implemented with 1024 (32 × 32) pressure sensors over a 55 × 55 cm area was proposed in our previous work [20], [21], [22]. The mat covers the major region of the chest, part of the shoulders, and part of the hips, which is sufficient to classify the sleep postures. Liu et al. also implemented a pressure sensing mat of 100 × 60 cm with 2048 sensors in [23] and [24].
In the existing research, from the perspective of hardware, most studies use large-size pressure sensing mats [14], [15], [16], [17], [18], [19], which suffer from shortcomings such as area redundancy and high cost, as mentioned earlier. From the perspective of algorithms, some studies have used traditional machine learning algorithms [14], [15], [16], [21], [23], which have the disadvantages of requiring manual feature extraction and limited recognition capability. Other research uses deep learning algorithms [17], [18], [19], [20], [22], [24]. In order to achieve higher accuracy in sleep posture recognition, most methods use neural networks of higher complexity, which makes it difficult to achieve real-time sleep posture monitoring. Meanwhile, complex models are difficult to deploy on devices with small capacity and low computing power, so there are certain limitations in the corresponding application scenarios. Although some methods use lightweight models, they still need to be used with data calibration [19] or other algorithm modules [22], which increases the complexity of the algorithm to a certain extent and has a negative impact on real-time performance. In terms of model architecture, [17], [19], [24] adopted the most common serial CNN structure. Inception-V3 [18] contains parallel convolutional modules and can extract multi-scale features. ResNet18 [20] introduced the residual structure, making it possible to build very deep networks. Tiny-MobileNetV2 [22] also deployed the residual structure, while the model is much shallower than ResNet18. In this paper, the proposed lightweight CNN, ConcatNet, contains both a parallel convolution structure and a multi-layer feature fusion structure, designed mainly for the characteristics of pressure distribution images; the motivation for the specific architecture is elaborated in Section II-C.4).
The information, or the features, contained in the pressure distribution image is mainly affected by 2 factors. The first, already mentioned above, is the size of the mat, in other words the body coverage of the mat, which affects the morphology of the image. The second is the sensor density of the mat, which influences the fineness and detail of the image: the higher the sensor density, the more detail the image contains. There is no doubt that adequate information is indispensable for classifying sleep postures; however, redundant information may negatively affect classification. For instance, if the sensor density is too high, on the one hand, the computational complexity increases sharply, making real-time sleep posture classification hard to implement; on the other hand, as body shape and weight distribution vary from person to person, some non-universal features captured at high density may be learned by the algorithm, which may reduce the robustness of the classification model. However, in the existing studies [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], whether on large-scale or small-scale mats, the sensor density has not been explored. Besides, due to the high density of the mats, most of the previous works implemented sleep posture classification offline instead of in real time, which does not meet the real-time requirement mentioned previously. In this paper, to improve the real-time performance of our smart mat system, we explore the influence of sensor density on sleep posture classification by proposing a lightweight neural network called ConcatNet in 3 sizes (ConcatNet-S, ConcatNet-M, ConcatNet-L), which correspond to 3 different sensor densities (low, medium, and high), respectively. The method implements real-time sleep posture classification on the miniature-scale, low-density pressure sensing mat proposed in our previous work [20]. The novelty of this paper is summarized as
follows: 1) A lightweight deep neural network is proposed for real-time sleep posture recognition, which greatly reduces the model size, computational complexity, and inference time compared with existing classical models. At the same time, it achieves comparable performance, paving the way for deployment on mobile devices. 2) The trade-off between the sensor density of the mat and the performance of the sleep posture recognition model is analyzed. ConcatNets in 3 sizes are implemented, and the performance of the networks corresponding to different sensor densities is compared. As a result, it demonstrates the possibility of using a miniature-scale, low-density mat to implement real-time sleep posture classification. The rest of the paper is organized as follows: Section II presents the algorithm for sleep posture classification, including pre-processing, channel selection, and the classification model. Section III shows the results of sleep posture classification along with model performance. The discussion and conclusion are presented in Sections IV and V, respectively.

II. ALGORITHM
The algorithm flow is shown in Fig. 1. The raw pressure distribution image, at a resolution of 32 × 32, is first pre-processed to remove noise; the image is then downsampled to 16 × 16 and 8 × 8 to implement the channel selection. The images at the 3 sizes are then sent to the corresponding models to perform sleep posture classification. The advantage of the proposed framework is that the computational complexity is relatively low, as the only steps before sleep posture classification are pre-processing and channel selection, both of which have very little computational cost.

A. Pre-Processing
The raw images of the dataset are first processed to reduce noise, including threshold filtering to remove background pixels, peak filtering to reduce the influence of adjacent areas, and Gaussian filtering to smooth the images. Detailed information on the pre-processing can be found in our previous work [20].
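The pipeline above can be sketched as follows. This is a minimal illustration of threshold and Gaussian filtering on a 32 × 32 frame; the noise floor and the 3-tap smoothing kernel are assumed values, not the parameters used in this work, and peak filtering is omitted.

```python
import numpy as np

def preprocess(frame, noise_floor=5.0):
    """Denoising sketch for a raw 32x32 pressure frame: threshold filtering
    removes background pixels, then a small separable blur smooths the image.
    noise_floor and the kernel are illustrative, not the paper's values."""
    # Threshold filtering: zero out background pixels below the noise floor.
    img = np.where(frame < noise_floor, 0.0, frame).astype(float)

    def blur1d(a, axis):
        # Gaussian-like smoothing with a normalized [0.25, 0.5, 0.25] kernel.
        pad = [(0, 0), (0, 0)]
        pad[axis] = (1, 1)
        a = np.pad(a, pad, mode="edge")
        return np.apply_along_axis(
            lambda v: np.convolve(v, [0.25, 0.5, 0.25], mode="valid"), axis, a)

    # Apply the 1-D kernel along rows, then columns (separable 2-D blur).
    return blur1d(blur1d(img, 0), 1)
```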

B. Channel Selection
In many previous works, due to the large size and high density of the pressure sensing mat, high-resolution pressure distribution images were collected and used directly to train deep learning models, resulting in relatively large model sizes and high computational complexity. A higher-resolution pressure distribution image contains more detail and more information, but for the sleep posture classification task, the extra information contained in high-resolution images is actually redundant. Therefore, if the size of the pressure distribution images can be reduced, the size and complexity of the network can also be reduced, saving training and inference time and making real-time sleep posture monitoring easier to realize.
In order to implement the channel selection and explore the influence of sensor density on the performance of the sleep posture classification network, the original images with a resolution of 32 × 32 are downsampled to resolutions of 16 × 16 and 8 × 8 by selecting channels at intervals, as shown in Fig. 2. Deep neural networks of different sizes are implemented for each corresponding resolution.
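Interval channel selection amounts to strided slicing of the sensor array. The sketch below simulates the two lower densities from a 32 × 32 frame; the exact channel offsets chosen are an assumption.

```python
import numpy as np

# Simulate lower sensor densities by keeping every 2nd or 4th row and column
# of the 32x32 array, i.e. "selecting the channels at intervals" (Fig. 2).
raw = np.arange(32 * 32, dtype=float).reshape(32, 32)  # stand-in pressure frame

medium = raw[::2, ::2]   # 16 x 16, medium density
low = raw[::4, ::4]      # 8 x 8, low density
```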

C. Sleep Posture Classification Model
To realize real-time sleep posture monitoring, we have designed a lightweight convolutional neural network called ConcatNet. The network gets its name from the extensive use of feature map concatenation operations in the network structure. The overall structure, basic modules, optimization measures, and the motivation for the architecture are introduced in turn. ConcatBlock-A mainly consists of two parts. The first part is the Inception module, which originated from the Inception-v3 network [25]. Assume that the size of the input feature map is H × W × C (H for height, W for width, and C for channels). First, a parallel convolution with 4 branches is performed. The first branch uses a convolution with a kernel size of 1 × 1, a stride of 1, and a kernel number of 2 × C. Based on the first branch, the second branch adds a depthwise convolution with a kernel size of 3 × 3 and a stride of 1. The depthwise convolution is introduced here to reduce the model size and computational complexity, as discussed in the next part. The third branch adds another depthwise convolution with a kernel size of 3 × 3 and a stride of 1 on top of the second branch. The fourth branch uses an average pooling layer with a kernel size of 3 × 3 and a stride of 1, followed by a convolution layer with a kernel size of 1 × 1, a stride of 1, and a kernel number of 2 × C. The size of the output feature map of each branch is H × W × (2 × C), and the output feature maps of the four branches are then concatenated along the channel dimension.
The structure of ConcatBlock-B is similar to ConcatBlock-A, as shown in Fig. 5. The only difference is that the kernel number of the convolutions in ConcatBlock-B is reduced to (1/2) × C, so the size of the output feature map is H × W × (2 × C). As can be seen in Fig. 3, in the 3 networks, ConcatBlock-A is set as the first block. Since the input pressure distribution image is a single-channel image, we use ConcatBlock-A to extract the features and expand the channel dimension to 8. The remaining blocks are set as ConcatBlock-B, whose channel expansion ratio is 2 instead of the 8 of ConcatBlock-A. In this way, the channel dimension expands more slowly and the computational cost is reduced.
The architectures of 3 ConcatNets are shown in Table I.
3) Lightweight Method - Depthwise Convolution: As mentioned above, in the ConcatBlock, depthwise convolution is used in the second and third branches of the parallel convolution and in the downsampling module. As the name implies, depthwise convolution applies a single-channel convolution kernel to each channel of the input feature map, i.e., it convolves along the depth (channel) dimension. The resulting feature maps are then concatenated along the channel dimension to obtain the output feature map. Thus, the numbers of channels of the input and output feature maps are the same. Depthwise convolution has been applied in classic network models such as MobileNet [26] and Xception [27]. Compared with standard convolution, depthwise convolution reduces the number of parameters and the computational complexity. Fig. 6 and Fig. 7 show the process of standard convolution and depthwise convolution, respectively.
Suppose the size of the input feature map is $H_i \times W_i \times C$; the standard convolution uses kernels of size $K_h \times K_w \times C$ with kernel number $C$, while each kernel in depthwise convolution has size $K_h \times K_w \times 1$, also with kernel number $C$. The size of the output feature map is $H_o \times W_o \times C$, which is the same for standard and depthwise convolution. Ignoring the bias of the convolution, the number of parameters for standard convolution is $P_{std} = K_h \times K_w \times C \times C$. The number of parameters for depthwise convolution is $P_{dw} = K_h \times K_w \times C$. The ratio of the number of parameters of depthwise convolution to standard convolution is $P_{dw}/P_{std} = 1/C$. The number of multiply-add operations of standard convolution is $M_{std} = K_h \times K_w \times C \times C \times H_o \times W_o$. The number of multiply-add operations of depthwise convolution is $M_{dw} = K_h \times K_w \times C \times H_o \times W_o$.

TABLE II INFORMATION OF SUBJECTS IN SHORT-TERM EXPERIMENT
The ratio of multiply-add operations between depthwise convolution and standard convolution is $M_{dw}/M_{std} = 1/C$. As Table I shows, $C = 2, 4, 8, 16$ in the different stages. Thus, compared with standard convolution, depthwise convolution significantly reduces the number of parameters and the computational complexity.
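The ratios derived above can be checked numerically. The sketch below counts parameters and multiply-adds for both convolution types; the 3 × 3 kernel and 16 × 16 output size are illustrative choices.

```python
# For a Kh x Kw depthwise convolution over C channels, both the parameter
# count and the multiply-add count are exactly 1/C of the standard
# convolution with the same kernel size and channel count.
def conv_params(kh, kw, c, depthwise=False):
    # standard: kh*kw*C weights per kernel, C kernels;
    # depthwise: kh*kw*1 weights per kernel, C kernels (biases ignored)
    return kh * kw * (1 if depthwise else c) * c

def conv_macs(kh, kw, c, ho, wo, depthwise=False):
    # multiply-add operations = parameters x output spatial positions
    return conv_params(kh, kw, c, depthwise) * ho * wo

for c in (2, 4, 8, 16):  # the channel counts used at different stages (Table I)
    assert conv_params(3, 3, c) == c * conv_params(3, 3, c, depthwise=True)
    assert conv_macs(3, 3, c, 16, 16) == c * conv_macs(3, 3, c, 16, 16, depthwise=True)
```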
4) Motivation for the Architecture of ConcatNet: In the previous paragraphs, the architecture of ConcatNet was introduced in detail. The motivation for the design of the architecture mainly comes from the characteristics of pressure distribution images. Very different from natural images, pressure distribution images have low resolution, relatively fixed image content, and small data quantity. In view of these characteristics, the design of the model needs to balance the efficiency of feature extraction and utilization against the complexity and real-time performance of the model.

TABLE III POSTURE CLASSIFICATION ACCURACY OF EACH SUBJECT
In terms of feature extraction, considering the image characteristics, an overly deep network structure is not suitable because it is prone to overfitting. However, a shallow network may extract insufficient features. Therefore, ConcatNet uses the Inception module to improve the model's ability to extract multi-scale features without increasing the depth of the network. Considering feature utilization, we draw on the idea of fusing deep and shallow features from the U-Net [28] model and add a deep and shallow feature fusion module to ConcatNet. The reason is that U-Net performs very well at extracting features from medical images, which are similar to pressure distribution images in the characteristics mentioned above. Finally, consider the complexity and real-time performance of the model. On the one hand, avoiding an overly deep network has already reduced the complexity of the model to a certain extent. On the other hand, we use depthwise convolution in ConcatNet to further reduce the complexity of the model, which also improves real-time performance.

III. RESULTS
A total of 16 subjects (9 males, 7 females) were included in the short-term experiment and 5 male subjects were included in the overnight experiment, as described in our previous work [20]. The first part (Dataset A) is the short-term experiment data, which consists of 1059 pressure distribution images and the corresponding sleep posture labels. During the short-term experiment, all subjects were instructed to lie on the mat in 4 different sleep postures: supine, right, left, and prone. Each posture was held for 20 seconds, and the interval between two postures was 15 seconds. The sampling rate is 0.1 Hz, so about 2 images are saved within 20 seconds. In order to simulate a real scenario, the subjects could move slightly during the experiment. Samples in which subjects moved significantly were discarded, so the number of samples differs between subjects. The information of each subject is given in Table II. The second part (Dataset B) comes from the overnight experiment conducted in the laboratory with whole nights of continuous sleep. The 5 male subjects participated in the overnight experiment for 7 nights. An infrared camera continuously recorded the ground-truth postures. During the experiments, all sleep stages were covered. After excluding the time spent turning over, 57 hours of sleep data were recorded and 20521 pressure distribution images were collected in total. As the size of the sensor array is 32 × 32, the resolution of the pressure distribution images is also 32 × 32.

A. Cross-Validation on Short-Term Dataset
As mentioned above, in the short-term experiment, subjects were instructed to lie on the pressure sensing mat in different sleep postures for data collection. The main purpose of collecting this part of the data is to train and validate the sleep posture classification model. Since the short-term dataset contains only 1059 pressure distribution images, it is relatively small in scale. In order to maximize the utilization of the data, Leave-One-Subject-Out cross-validation (LOSOCV) is adopted. The dataset contains 16 subjects, so the number of folds equals 16; in other words, the dataset is divided into 16 parts according to subject, and a total of 16 rounds of training are performed. During each training round, all the data of one subject is selected as the validation set, and the data of the remaining 15 subjects is taken as the training set. The data of each subject is used as part of the training set and as the validation set in different rounds, so the data is utilized to the maximum. After each round of training, the classification results of the model on the validation set are collected, and after all 16 rounds, the results are summarized. The performance of the network is analyzed, and the model is optimized for testing according to the validation results. Finally, the relevant hyperparameters are set as follows: the number of training epochs is 100, the batch size is 64, the optimizer is adaptive moment estimation (Adam), the learning rate is 0.001, and the loss function is categorical cross-entropy. The platform is a Windows 10 PC using PyTorch 1.13 with an Intel i5-8400 CPU and an NVIDIA GTX 1060 GPU.
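The LOSOCV scheme described above can be sketched as a per-subject split. The subject IDs and sample counts below are illustrative, not the paper's actual per-subject counts.

```python
# Minimal leave-one-subject-out split: one fold per subject, where that
# subject's samples form the validation set and everyone else's the training
# set. subject_ids[i] names the subject of sample i.
def loso_splits(subject_ids):
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        val_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, val_idx

subject_ids = ["s01", "s01", "s02", "s03", "s03", "s03"]  # toy example
folds = list(loso_splits(subject_ids))
```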
Table III shows the sleep posture classification accuracy of the 3 ConcatNet models of different sizes on each subject. For the data of the 16 subjects, the models corresponding to different sensor densities all perform well on classifying the sleep postures. The classification accuracy of the small-scale model (ConcatNet-S) ranges from 71.8% to 100%, with an average accuracy of 89.5%. The accuracy of the medium-scale model (ConcatNet-M) ranges from 83.8% to 100%, with an average accuracy of 95.6%. The classification accuracy of the large-scale model (ConcatNet-L) ranges from 84.7% to 100%, with an average accuracy of 97.6%. Evidently, as the sensor density increases, the cross-validation classification accuracy also increases.
Tables IV-VI show the posture classification results of ConcatNet-S, ConcatNet-M, and ConcatNet-L, respectively. As Table IV clearly shows, ConcatNet-S performs better at distinguishing left and right sleep postures than supine and prone postures, which is mainly caused by the subtle difference between the images of supine and prone at low resolution. As shown in Fig. 8, the difference between supine and prone images at 8 × 8 resolution is less noticeable than at the higher resolutions. With higher-resolution, more detailed images, ConcatNet-M and ConcatNet-L both achieve balanced performance on all 4 postures with satisfying results, as shown in Tables V and VI.

B. Testing on Overnight Dataset
In order to verify the performance of our model in a real scenario, Dataset A is used as the training set and Dataset B as the testing set. The hyperparameters selected in the cross-validation process are used to train the model. After testing, the metrics of the model are calculated.
The metrics consist of two parts: accuracy metrics and complexity metrics. The results are shown in Table VII. The size of the original images in the dataset is 32 × 32, and for some large models, we upsample the original images to a resolution of 160 × 160 to match the model structures, as proposed in [20].
Benefiting from the simple design of the network structure and the use of depthwise convolution, the model size, computational complexity, and inference time have been significantly reduced. The ConcatNets of 3 sizes (ConcatNet-S, ConcatNet-M, and ConcatNet-L) achieve similar or even higher accuracy in sleep posture classification than the other classical models in Table VII, both in short-term dataset cross-validation and in overnight dataset testing. With fewer parameters and lower computational complexity, faster inference speed and higher classification accuracy are obtained, which shows the great advantage of the proposed model. As for the reason, we believe that, for pressure distribution images, the shallow features and the deep features are both crucial for distinguishing different sleep postures. The shallow features mainly contain local information such as edges, outlines, and textures, which can be used for a rough classification, while the deep features contain global semantic information. In ConcatNet, we fuse the shallow and deep features, while the other classic models mainly use the deep features; that would explain why ConcatNet achieves higher accuracy at a lower computational cost. Among the 3 sizes of ConcatNet, the model size, computational complexity, and inference time all increase with the input size, or sensor density, as expected. In terms of accuracy, the cross-validation accuracy on the short-term data also improves with input size. Regarding the accuracy on the overnight dataset, however, ConcatNet-M > ConcatNet-L > ConcatNet-S, which indicates that the ConcatNet-L model may have overfit to some extent.
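The deep/shallow fusion described above boils down to upsampling a deep feature map to the shallow map's spatial size and concatenating along the channel axis, the operation ConcatNet is named for. The shapes below are illustrative, not the actual sizes from Table I.

```python
import numpy as np

# Channel-wise fusion of a shallow (high-resolution, local detail) map with
# a deep (low-resolution, global semantics) map. H x W x C layout is assumed.
shallow = np.random.rand(16, 16, 8)    # local detail: edges, outline, texture
deep = np.random.rand(4, 4, 32)        # global semantic information
deep_up = deep.repeat(4, axis=0).repeat(4, axis=1)   # nearest-neighbour upsample
fused = np.concatenate([shallow, deep_up], axis=2)   # stack along channels
```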
In other words, the information contained in the input image at 32 × 32 resolution (high density) is redundant, and the noise introduced by the higher resolution is also learned by the model. As a result, the cross-validation accuracy
of ConcatNet-L is higher than that of ConcatNet-M, while its test set accuracy is lower. Considering both the computational consumption and the sleep posture classification accuracy, the sensor array at medium density with ConcatNet-M has the best overall performance. The algorithm proposed in this article can be applied in the following ways. First, the channel selection method in Section II-B can be applied to high-density array sensors. By adjusting the instructions of the acquisition system, interval sampling of the sensors can be easily realized, which reduces the sampling time of pressure distribution data and the complexity of data processing while still achieving favorable results in tasks such as sleep posture classification. Second, as ConcatNet consumes very little storage (number of parameters, model size) and computing resources (computational complexity) and has strong real-time performance, the model can be deployed on mobile or embedded devices instead of being limited to expensive high-performance devices. On the one hand, this reduces the cost of the monitoring system and facilitates its adoption for home monitoring; on the other hand, it enables local data processing and real-time monitoring.

IV. DISCUSSION
In this paper, a fast, efficient, and lightweight neural network called ConcatNet is designed and applied to classify 4 sleep postures. The results prove that the model is capable of classifying sleep postures in real time at different sensor densities of the pressure sensing mat.

A. Comparison With the Existing Works
In comparison with the existing works, Table VIII presents studies of sleep posture classification based on pressure sensing mats. In terms of hardware, as mentioned earlier, a smaller-scale mat was deployed compared with prior works, and we further explored the impact of sensor density on sleep posture classification by downsampling the sensor channels to 16 × 16 and 8 × 8, which correspond to medium-density and low-density mats at a miniature scale, respectively.
In the design of the algorithm, some studies, such as [14], [15], [16], and [21], adopted traditional machine learning methods, while [18], [19], [20], and [22] proposed deep learning methods. Compared with conventional machine learning algorithms, Convolutional Neural Networks (CNNs) eliminate manual feature selection and extraction while achieving better performance in the computer vision domain. Among the studies using CNNs, the Inception-v3 in [18] and ResNet18 in [20] are relatively large networks with slow inference speed, which cannot implement sleep posture classification in real time. Hu et al. [19] deployed a simple CNN and achieved real-time classification. However, the accuracy is only 91.24% with transfer learning, meaning part of one subject's data was used for training while the other part was used for testing; this also means additional training is required before use in real-world scenarios. The Tiny-MobileNetV2 in [22] is a lightweight model; however, before model inference, the pressure distribution images need to be processed by the FCSNet, a complex frequency channel selection operation that negatively affects real-time performance. In this paper, we propose a lightweight CNN called ConcatNet. As can be seen from Table VII, compared with the other models, ConcatNet has a smaller model size, fewer computational operations, and faster inference speed while achieving high accuracy, which proves that the model can implement real-time sleep posture classification at a high accuracy rate.

B. Influence of Sensor Density on Sleep Posture Classification
Different from previous studies, in this paper, we explored the effect of sensor density on sleep posture classification performance by obtaining datasets of different sensor densities through downsampling the channels of the sensor array and designing a lightweight neural network.
Obviously, for a fixed mat size, the higher the sensor density, the higher the resolution of the collected pressure distribution image and the greater the amount of information in it. Therefore, more complex models are needed, and theoretically, the accuracy of sleep posture classification should be higher. However, the results in Table VII are not completely consistent with this expectation. Among the ConcatNets of 3 sizes, as sensor density increases, the number of parameters, model size, computational complexity, and inference time of the model all increase. However, with regard to classification accuracy, although the cross-validation accuracy on short-term data gradually increases, the test set accuracy increases first and then decreases. The most reasonable explanation is as follows. In machine learning, we generally assume that the training set and the test set come from the same distribution, that is, the features of the training set and test set data are similar; if the features learned from the training set are used to distinguish different categories in the test set, favorable results are usually obtained. However, in practical application scenarios, the distributions of the training set and test set are not exactly consistent. For the dataset used in this paper, factors such as differences in body shape and weight distribution between individuals in the training set and the test set may lead to differences in data distribution. As mentioned earlier, the higher the resolution of an image, the more information and detail it contains, and this additional information is often where the difference in features lies. The model learns too many features of the training set that do not apply to the test set, resulting in overfitting and a lower test set accuracy for ConcatNet-L than for ConcatNet-M. Of course, the above holds only when the image resolution is already high enough to extract sufficient
features for classification.For example, the accuracy of ConcatNet-M is obviously better than that of ConcatNet-S, precisely because the information contained in the image with 8 × 8 resolution is not enough to extract enough features to achieve a favorable classification accuracy.
Regarding the overfitting of ConcatNet-L discussed above, we tried several mitigation methods: data augmentation, dropout in the fully connected layer of the model, and L2 regularization in the loss function. The following configuration was adopted: 1) data augmentation on the training set, including random rotation from −20° to 20° and perspective transformation with a distortion scale of 0.3 and a probability of 0.5; 2) dropout with a probability of 0.3 in the fully connected layer of the model; 3) L2 regularization with a weight of 0.3 in the loss function. With these strategies, the short-term cross-validation accuracy of ConcatNet-L dropped from 97.56% to 96.88%, while the overnight test accuracy increased from 92.57% to 94.59%, showing that the overfitting problem has been mitigated.
Based on the above analysis, we can conclude that, on this dataset, the proposed ConcatNet model can achieve excellent sleep posture classification in real time with minimal resource consumption. In addition, for the pressure sensing mat system implemented in [20], the data collected by the 16 × 16 sensor array (medium density) is sufficient to classify sleep postures. A higher sensor density not only increases the computational complexity but may also have a negative impact on the classification accuracy.
In terms of applicable range, this study still has some limitations. For example, although a miniature-scale mat can achieve efficient and accurate sleep posture monitoring for a single person, it cannot be directly applied to two-person scenarios, which require a wider mat. In addition, pressure distribution images of 8 × 8 channels make it difficult to achieve high classification accuracy, especially in distinguishing supine from prone. Possible research directions include the following. First, only three sensor densities are tested in this paper; in the future, the resolutions could be further refined to explore their impact on model performance. Second, the algorithm currently considers only sleep posture recognition based on the pressure sensing mat; in the future, signals such as body movement and respiration could be added to study lightweight algorithms for sleep monitoring based on multi-modal signals. The influence of sensor density on multi-modal signals, and the optimal sensor selection for a given task type, could also be explored. Third, although the proposed ConcatNet achieves high generalizability in classifying the sleep postures of different subjects, for fixed users, combining the model with transfer learning could further improve the classification accuracy, which is another direction for future research.

V. CONCLUSION
In this paper, a new lightweight neural network model, named ConcatNet, is proposed. It adopts a modularized design, and a model series is constructed in different sizes corresponding to different sensor densities. Using the pressure distribution image collected by the pressure sensor array as input, four categories of sleep posture are classified. Comparing model size, number of parameters, computational complexity, and inference speed with classical models such as ResNet [29] and MobileNet, ConcatNet achieves the best performance. The short-term data cross-validation accuracy of ConcatNet-L with 32 × 32 input reached 97.64%, and ConcatNet-M with 16 × 16 input achieved 94.68% accuracy in overnight data testing. This paper also explores the influence of sensor density on model performance; the results show that ConcatNet-M for the medium-density mat has the best comprehensive performance, with short-term data accuracy of 95.56% and overnight data accuracy of 94.68%. The model size is 7.91 KB, the FLOPs are 56.47K, and the inference time is only 0.38 ms, realizing real-time sleep posture classification. We believe that a system consisting of a miniature-scale, low-density pressure sensing mat and ConcatNet has great potential to be implemented on mobile devices for real-time sleep posture classification. In real-life scenarios, for example, in monitoring long-term bedridden patients, real-time and long-term monitoring of sleep posture with the proposed algorithm can let caregivers know the patient's posture status and whether the patient needs help turning over, reducing the risk of diseases such as decubitus ulcers. In daily home health monitoring, deploying the proposed algorithm to mobile or embedded devices enables local real-time sleep posture monitoring, helping users understand their own sleep status. Moreover, the system could be improved by integrating other sensors or connecting with other sleep monitoring devices to obtain more diverse information; correspondingly, the algorithms could be enhanced to process the multi-modal information and provide more functions. For example, by combining sleep quality analysis, personalized recommendations on which sleep posture would improve sleep quality could be given in the future.

1) Model Structure: The structure of the proposed model is shown in Fig. 3. The model consists of several stacked basic modules, named ConcatBlocks. ConcatBlocks come in two types, ConcatBlock-A and ConcatBlock-B, and different numbers of them are stacked for inputs of different sizes. For ConcatNet-L, corresponding to 32 × 32 images, 4 ConcatBlocks (1 ConcatBlock-A, 3 ConcatBlock-B) are stacked; for ConcatNet-M, corresponding to 16 × 16 images, 3 ConcatBlocks (1 ConcatBlock-A, 2 ConcatBlock-B) are stacked; and for ConcatNet-S, corresponding to 8 × 8 images, 2 ConcatBlocks (1 ConcatBlock-A, 1 ConcatBlock-B) are stacked. The output feature maps of each ConcatBlock are globally average pooled to 1 × 1 in size, concatenated on the channel dimension, then flattened and fed into the fully connected layer for classification into the 4 categories of sleep posture. 2) Basic Module - ConcatBlock: As mentioned earlier, the basic module of the model is the ConcatBlock, in two types, ConcatBlock-A and ConcatBlock-B. The structure of ConcatBlock-A is shown in Fig. 4.
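The stack-then-fuse topology described above can be sketched as follows; this is a minimal sketch in which each ConcatBlock is simplified to a single strided convolution and the channel counts are illustrative assumptions, not the paper's exact ConcatBlock-A/B internals.

```python
import torch
import torch.nn as nn

class ConcatNetSketch(nn.Module):
    """Sketch of the ConcatNet-M topology (16x16 input, 3 blocks):
    every block's output is globally average pooled to 1x1, the pooled
    features are concatenated on the channel dimension, then a fully
    connected layer classifies into 4 sleep postures."""

    def __init__(self, channels=(8, 16, 32), num_classes=4):
        super().__init__()
        blocks, in_ch = [], 1  # the pressure image has a single channel
        for out_ch in channels:  # stand-ins for 1x ConcatBlock-A + 2x ConcatBlock-B
            blocks.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU()))
            in_ch = out_ch
        self.blocks = nn.ModuleList(blocks)
        self.gap = nn.AdaptiveAvgPool2d(1)      # pool every map to 1x1
        self.fc = nn.Linear(sum(channels), num_classes)

    def forward(self, x):
        pooled = []
        for block in self.blocks:
            x = block(x)
            pooled.append(self.gap(x))          # 1x1 feature from every block
        fused = torch.cat(pooled, dim=1)        # concatenate on channels
        return self.fc(torch.flatten(fused, 1))

# A forward pass on a batch of two 16x16 pressure images yields (2, 4) logits.
logits = ConcatNetSketch()(torch.randn(2, 1, 16, 16))
```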

Fig. 1. The algorithm flow of sleep posture classification using the pressure distribution image.

Fig. 2. The process of down-sampling. The highlighted pixels in (a), (b), and (c) represent the sampled sensors, and (d), (e), and (f) are the corresponding images of the supine posture.
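The density down-sampling of Fig. 2 can be emulated by keeping every n-th sensor of the full array; a minimal sketch, assuming a simple strided sampling pattern (the paper's exact pattern may differ):

```python
import numpy as np

# High-density 32x32 pressure image (random values as a stand-in).
full = np.random.rand(32, 32)

# Keep every 2nd sensor for medium density, every 4th for low density.
medium = full[::2, ::2]   # 16 x 16
low = full[::4, ::4]      # 8 x 8
```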

Fig. 8. The difference between pressure distribution images of supine and prone postures. (a), (b), and (c) show a supine image collected by the pressure sensor array at high, medium, and low density, respectively; (d), (e), and (f) show a prone image collected at high, medium, and low density, respectively.
The accuracy metrics include the cross-validation accuracy on the short-term dataset and the testing accuracy on the overnight dataset, which indicate whether the model recognizes sleep postures accurately. The complexity metrics consist of the number of parameters, model size, floating-point operations (FLOPs), and inference time. The number of parameters is the count of learnable parameters in the model structure. Model size is the storage space a device needs to hold the model parameters; it is proportional to the number of parameters, with a ratio determined by the data type of the parameters. Together, these two metrics reflect the model's consumption of storage resources: an excessive number of parameters or model size limits deployment in certain scenarios, such as embedded devices with small storage capacity. FLOPs describe the overall amount of computation of the model, including the computation of convolutions, poolings, activation functions, etc., in the model structure. Inference time is the time the model takes to process a single input. FLOPs and inference time reflect the model's consumption of computing resources: the longer the inference time, the worse the real-time performance of the model. On devices with limited computing power, excessive FLOPs seriously slow the model down and restrict its use in scenarios with real-time requirements.
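These complexity metrics can be computed as follows for a toy model; the layer sizes are placeholders, not ConcatNet's actual architecture, and FLOPs counting is left to a profiler library.

```python
import time
import torch
import torch.nn as nn

# Placeholder model standing in for ConcatNet.
model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(8 * 16 * 16, 4))

# Number of parameters: total count of learnable weights and biases.
n_params = sum(p.numel() for p in model.parameters())

# Model size: parameter count times bytes per parameter (4 for float32),
# which is why size is proportional to the parameter count.
size_kb = n_params * 4 / 1024

# Inference time: average wall-clock time to process one input.
# (FLOPs would typically be counted with a profiler such as fvcore or thop.)
x = torch.randn(1, 1, 16, 16)
with torch.no_grad():
    start = time.perf_counter()
    for _ in range(100):
        model(x)
inference_ms = (time.perf_counter() - start) / 100 * 1000
```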

TABLE I
THE ARCHITECTURES OF CONCATNET IN 3 SIZES (K = KERNEL SIZE, C = CHANNEL, S = STRIDE, DW = DEPTHWISE CONVOLUTION, CAT = CONCATENATE, GAP = GLOBAL AVERAGE POOLING, AP = AVERAGE POOLING, FC = FULLY CONNECTED)

concatenated in the channel dimension, so the size of the obtained feature map is H × W × (8 × C). The feature map is then sent into the second part of the ConcatBlock, the feature down-sampling module, composed of a depthwise convolution, Batch Normalization (BN), and a Rectified Linear Unit (ReLU) activation function. The convolution kernel size is 3 × 3 and the stride is 2, so the final output feature map size of the module is
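The down-sampling module just described (depthwise 3 × 3 convolution with stride 2, then BN and ReLU, halving the spatial size) can be sketched as follows; the channel multiplier C = 4 is an illustrative assumption.

```python
import torch
import torch.nn as nn

C = 4  # illustrative channel multiplier; the input has 8*C channels

# Feature down-sampling module of a ConcatBlock: depthwise conv + BN + ReLU.
downsample = nn.Sequential(
    nn.Conv2d(8 * C, 8 * C, kernel_size=3, stride=2, padding=1,
              groups=8 * C),  # groups == channels makes the conv depthwise
    nn.BatchNorm2d(8 * C),
    nn.ReLU(),
)

# A 16x16 feature map with 8*C channels comes out at half the spatial size.
x = torch.randn(1, 8 * C, 16, 16)
y = downsample(x)  # shape (1, 32, 8, 8)
```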

TABLE VII
METRICS OF DIFFERENT MODELS ON THE SHORT-TERM AND OVERNIGHT DATASETS

TABLE VIII
COMPARISON OF THE PROPOSED METHOD WITH EXISTING METHODS (M: NUMBER OF SUBJECTS; N: NUMBER OF SAMPLES; N/A: NOT MENTIONED)