1 Introduction

Human activity recognition and classification is a subject of recent interest on which many works have been presented, proposing different applications [1,2,3,4,5] intended to facilitate the daily life of human beings by promoting an automated interaction with their environment.

An important aspect to be considered in human activity recognition is the data source to be used. The use of different types of sensors has been proposed, as in the work presented by Arnon [6]. In recent years, on the topic of recognition and classification of children activities, different data sources have been proposed, such as video cameras, accelerometers, and radio-frequency devices, as in the work presented by Kurashima et al. [7]. Most works in this area collect information by embedding the sensor directly into a child’s garment to record activity data, as proposed by Nam et al. [8]. This form of data capture has the disadvantage that the devices placed in the garments can interfere directly with the natural actions of the children, preventing them from performing normally the activities to be analyzed.

One way to solve this problem is to change the data source to one that does not interfere with the activities performed by the study subjects. Under this idea, environmental sound has been used as a data source to recognize and classify human activities, as in the works presented by Leeuwen [9] and Galván-Tejada et al. [10], since data capture goes unnoticed by the study group and thus does not interfere with the activities to be analyzed.

Using environmental sound as the data source in child activity classification models is a major challenge, due both to the complexity of the audio signal analysis process and to the environmental factors that may interfere during data capture, which can leave the recorded samples without the features necessary for their analysis. Therefore, adequate processing of the audio samples and the choice of an appropriate model that optimizes the activity recognition process are of vital importance.

Correct processing of the audio signals requires extracting the features on which the classification model will be based. Given these features and a set of labeled training examples (samples whose class, i.e., the type of sound they belong to, is known), it is possible to construct and train a model that predicts the class of a new sample. Once the classification model is constructed, an activity can be recognized from an audio signal by passing the signal through the model, which predicts the kind of sound it belongs to based on the information with which the model was trained. In the present work, the accuracy of five classification algorithms, Support Vector Machines (SVM), k-Nearest Neighbors (kNN), Random Forests (RF), Extra Trees (ET), and Gradient Boosting (GB), is compared in the generation of a model for the recognition and classification of children activities using environmental sound as a data source.

The activity classification models are constructed by executing the classification algorithms on the data obtained from the audio samples in the feature extraction stage. In the first phase of the proposed methodology, these models are built using the 34 extracted features present in the dataset. Nevertheless, in order to develop a more efficient classification model that can be used in mobile applications, a feature selection step is performed to reduce the number of features. Therefore, in this proposal the Akaike information criterion is applied to re-generate the models with a reduced set of features, and the results obtained are finally compared in terms of accuracy.

This paper is organized as follows. The present section introduces children activity recognition. Materials and methods are described in Sect. 2. Section 3 reports the results obtained with the methodology. The discussion and conclusions of this proposal are presented in Sect. 4. Finally, future work is outlined in Sect. 5.

2 Materials and Methods

To compare the efficiency of the classification algorithms SVM, kNN, ET, RF, and GB in the generation of a model for the recognition and classification of children activities using environmental sound data, together with a feature selection process based on the Akaike criterion, five main stages were performed: data acquisition, feature extraction, classification analysis based on the complete set of features, feature selection, and classification analysis based on the set of selected features.

The feature extraction was performed using the Python programming language [11], while the feature selection and the classification analysis were performed using the free software environment R [12].

2.1 Data Description

In the majority of the works presented on children activity recognition, it is common to analyze activities detectable through movement, such as walking or running, because these works use motion sensors such as accelerometers as the data source, as in the works presented by Boughorbel et al. [13] and Nam et al. [8]. In order to analyze different kinds of activities, in the present work the dataset is composed of recordings of four activities commonly performed by children from 12 to 36 months of age in a residential environment: crying, running, walking, and playing (manipulating plastic objects), two of which (crying and playing) are not detectable through motion sensors. For the conformation of the dataset, 10% of the sounds were generated by the authors and 90% were acquired from the Internet [14,15] through a search of audio clips about children activities carried out on October 3, 2018.

Table 1 shows the description of the activities analyzed in this work.

Table 1. General description of activities.

Recording Devices. To make the recordings of the audio clips corresponding to the part of the generated data, the devices used were a Lanix Ilium s620 (MediaTek MT6582 quad-core, Android 4.2.2) and a Motorola Moto G4 (Snapdragon 617, Android 6.0.1).

Metadata. From the process of recording the audio clips using different devices and configurations, as well as from the recordings taken from the Internet, the dataset of this work includes audio clips with sample rates between 44100 Hz and 96000 Hz, in stereo and mono channels. Table 2 shows the metadata of the audio clips in the dataset for each activity. The characteristics presented in Table 2 ensure an acceptable quality for the recorded audio files, and they define the parameters required for future recordings intended to expand the dataset.

Table 2. Audio clips metadata per activity.

2.2 Feature Extraction

Feature extraction is the process by which information is obtained from the audio clips. This information is used to differentiate the type of activity to which a recording belongs, since the audio clips of each type of activity yield different values for the extracted features.

Because the dataset contains audio files of different lengths, these were divided into 10-s clips, so that all analyzed samples have the same length. Each 10-s clip is transformed into an array in which each position holds the magnitude of the corresponding feature for that clip. Table 3 shows the set of 34 features extracted for each 10-s clip. To prevent problems arising from the difference in channels among the recordings (mono and stereo), all samples were converted to mono.
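As an illustration, the segmentation and downmix steps described above can be sketched in Python with NumPy (a minimal sketch under the stated assumptions; the function names are illustrative and are not the authors' original scripts):

```python
import numpy as np

def to_mono(signal):
    """Downmix a stereo signal (shape [n_samples, 2]) to mono by
    averaging the channels; mono input is returned unchanged."""
    return signal.mean(axis=1) if signal.ndim == 2 else signal

def split_into_clips(signal, sr, clip_seconds=10):
    """Split a mono signal into fixed-length clips, discarding the
    trailing remainder so every clip has identical length."""
    n = clip_seconds * sr
    n_clips = len(signal) // n
    return [signal[i * n:(i + 1) * n] for i in range(n_clips)]
```

Discarding the trailing remainder is one simple way to guarantee equal-length samples; padding the final fragment would be an alternative.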

Table 3. Features extracted.

It is important to mention that this set of features was chosen because they have been commonly used in related audio processing works [16,17,18,19], especially the mel-frequency cepstral coefficients (MFCCs), which are among the most robust features in the area of recognition and classification of activities using sound [20,21,22,23,24].
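Two features that commonly appear in such feature sets, the zero-crossing rate and the spectral centroid, can be computed per clip as in the NumPy-only sketch below (in practice, libraries such as librosa implement these along with the MFCCs; the function names here are illustrative):

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose sign differs;
    roughly proportional to the dominant frequency of the frame."""
    return np.mean(np.abs(np.diff(np.sign(frame))) > 0)

def spectral_centroid(frame, sr):
    """Magnitude-weighted mean frequency of the frame's spectrum,
    i.e. the 'center of mass' of the signal's energy in Hz."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return np.sum(freqs * mag) / (np.sum(mag) + 1e-12)
```

For a pure 1000 Hz tone, for example, the centroid lands near 1000 Hz and the zero-crossing rate near 2 × 1000 / sr, which is why such features help separate tonal sounds (crying) from broadband ones (footsteps).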

2.3 Classification Analysis Based on the Features Extracted

For the classification analysis, the 34 extracted features were subjected to the five classification algorithms, SVM, kNN, RF, ET, and GB, generating five children activity classification models, one per algorithm.

The classification algorithms used in this work are supervised learning algorithms, so they must first be trained on known data, using a training dataset (70% of the total samples), before they can automatically classify new data in a blind test using a testing dataset (the remaining 30% of the samples).
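A minimal sketch of this 70/30 protocol, using scikit-learn defaults, is shown below (the paper's models were fit in R, so the hyperparameters and function names here are illustrative assumptions, not the authors' configuration):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def compare_classifiers(X, y, seed=0):
    """Train the five classifiers on a stratified 70/30 split and
    return each model's accuracy on the held-out 30% (blind test)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.30, random_state=seed, stratify=y)
    models = {
        "SVM": SVC(),
        "kNN": KNeighborsClassifier(),
        "RF": RandomForestClassifier(random_state=seed),
        "ET": ExtraTreesClassifier(random_state=seed),
        "GB": GradientBoostingClassifier(random_state=seed),
    }
    return {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
            for name, m in models.items()}
```

Stratifying the split keeps the four activity classes in the same proportion in both partitions, which matters when the classes are unbalanced.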

Finally, the accuracy of each classification model was computed so that the models could be compared with each other.

2.4 Feature Selection

In this stage, a feature selection process based on the Akaike information criterion (AIC) [25,26] is performed to reduce the number of features, selecting those that carry the most significant information for differentiating the classes to which the audio samples belong.

The principle of this technique is to generate models from combinations of the 34 extracted features through stepwise regression, combining forward selection and backward elimination, and subsequently to calculate the AIC for each of these models. The models are then ranked according to their AIC, the best being the one with the lowest AIC [27].
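The paper performs this selection in R; purely as an illustration of the principle, a greedy forward-selection pass driven by the AIC of an ordinary least-squares fit can be sketched in NumPy as follows (a full stepwise search would also include backward elimination, and all names here are illustrative):

```python
import numpy as np

def aic_ols(X, y):
    """AIC of an ordinary least-squares fit, up to an additive
    constant: n * ln(RSS / n) + 2 * k, where k counts parameters."""
    n = len(y)
    Xi = np.column_stack([np.ones(n), X])        # add an intercept
    beta, *_ = np.linalg.lstsq(Xi, y, rcond=None)
    rss = np.sum((y - Xi @ beta) ** 2)
    return n * np.log(rss / n) + 2 * Xi.shape[1]

def forward_select(X, y):
    """Greedily add the feature that lowers the AIC the most,
    stopping when no remaining feature improves it."""
    remaining = list(range(X.shape[1]))
    selected, best_aic = [], np.inf
    while remaining:
        aic, j = min((aic_ols(X[:, selected + [j]], y), j)
                     for j in remaining)
        if aic >= best_aic:
            break
        best_aic = aic
        selected.append(j)
        remaining.remove(j)
    return selected, best_aic
```

The 2k term penalizes every added feature, so a feature survives only if its improvement in fit outweighs the extra model complexity, which is exactly why the procedure can discard 7 of the 34 features here.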

2.5 Classification Analysis Based on the Features Selected

The classification analysis based on the features selected was carried out only with the set of features that belong to the combination of those with the lowest AIC, since they are the ones that best describe the difference between the analyzed classes.

Finally, as in the classification analysis based on the full set of extracted features, a validation is performed to compare the accuracy of each model, in order to evaluate which approach yields the most significant results: classification based on the total number of extracted features, or classification based on the selected features.

3 Results

From the data acquisition, a total of 146 recordings were obtained (considering both the own recordings and those taken from the Internet), which were divided into 2,716 10-s clips. Table 4 shows the number of recordings obtained for each activity as well as the number of 10-s clips generated.

Table 4. Audio clips per activity.

A total of 34 features were extracted for each 10-s clip, so the database for the comparison of the classification algorithms consisted of 2,716 records with 34 features each.

From the classification analysis based on the full set of extracted features, Table 5 shows the true positives obtained with each classification technique, Table 6 summarizes the accuracy by activity, and Table 7 shows the average accuracy of each technique over the whole set of analyzed activities.

All classifiers achieve an accuracy equal to or greater than 0.90.

Table 5. True positives for each classification technique based on the features extracted.
Table 6. Accuracy for each classification technique based on the features extracted.
Table 7. Average accuracy for the features extracted.
Table 8. Features selected.

In the feature selection stage, a set of 27 features was selected, shown in Table 8.

From the classification analysis based on the features selected, the true positives obtained are shown in Table 9, while Table 10 summarizes the accuracy by activity. Table 11 shows the average accuracies for each classification technique.

All classifiers achieve an accuracy between 0.89 and 0.97.

4 Discussion and Conclusions

The objective of this research is to compare the efficiency of five classification techniques in the generation of a recognition and classification model of children activities using environmental sound data, comparing the classification accuracy obtained with the full set of extracted features against that obtained with a reduced set of features selected through an AIC approach.

Table 9. True positives for each classification technique based on the features selected.
Table 10. Accuracy for each classification technique based on the features selected.
Table 11. Average accuracy for the features selected.

From the results presented in Sect. 3, it can be observed that, for the four analyzed activities, the best model on average is initially the one generated by the ET classification technique, followed by GB, RF, kNN, and SVM, respectively, all with an accuracy equal to or greater than 0.90. These five initial models were generated using the 34 features extracted from the audio samples, which represent 100% of the data.

In the next phase, the feature selection process was performed, selecting a set of 27 features according to the AIC, which represents a reduction of about 20% in the data used for the development of the classification models.

When the recognition and classification models were regenerated using the reduced dataset, the results show practically identical accuracy across the classification techniques, with the RF and ET techniques even improving their accuracy values.

According to these results, the set of 27 selected features classifies activities with a performance similar to that of the complete set of 34 features, reducing the amount of data needed by about 20% while practically maintaining, or even improving, the accuracy of the models.

The reduction in the number of features is important because, when classification techniques are subjected to large amounts of information, the response time tends to grow significantly, increasing the computational cost. Moreover, activity recognition and classification models are usually designed to be implemented in mobile applications, so it is important to optimize the amount of data to be handled and to reduce the processing cost.

5 Future Work

As part of the future work, it is proposed to add to the analysis more activities common among children in the established age range, as well as to perform a validation analysis of the dataset to establish whether the number of features and samples is optimal for the type of study being conducted.

Another important aspect is to improve the feature selection process by finding a mechanism that further reduces the set of features needed to describe the analyzed phenomena or activities, thus reducing the size of the database with which the algorithms work and the models are generated.