Abstract
This paper tackles the challenge of describing real 3D scenes using qualitative spatial descriptors. A key question is which qualitative descriptors to use and how they must be organized to produce a suitable cognitive explanation. To answer it, a survey was carried out in which human participants freely described a scene containing several pieces of furniture. The survey data were analysed and, based on this analysis, the QSn3D computational approach was developed, which uses an Xbox 360 Kinect to obtain 3D data from a real indoor scene. Features computed from these 3D data are used to identify the objects in the scene; the orientation of each object is then computed, and qualitative spatial relations between the objects are extracted. These relations are the input to a grammar that applies saliency rules derived from the survey and generates cognitive natural-language descriptions of the scene. Moreover, these qualitative descriptors can be expressed as first-order logical facts in Prolog for further reasoning. Finally, a validation study was carried out to test whether the descriptions produced by the QSn3D approach are human-readable; the results show an acceptability higher than 82%.
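As a rough illustration of the relation-extraction and Prolog steps of this pipeline, the following Python sketch assumes that objects have already been identified and reduced to hypothetical centroid coordinates in the camera frame. It computes only viewer-relative left/right and front/behind relations with an assumed tolerance `tol`; it does not reproduce the paper's orientation handling, reference frames, grammar or saliency rules, and the names `SceneObject`, `relations`, `prolog_facts` and `describe` are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import permutations


@dataclass
class SceneObject:
    name: str   # object label, e.g. as returned by an object classifier
    x: float    # metres; positive x points to the viewer's right (assumed frame)
    z: float    # metres; depth, i.e. distance from the camera


# Wording used to verbalise each relation (illustrative, not the paper's grammar).
PHRASES = {
    "left_of": "to the left of",
    "right_of": "to the right of",
    "in_front_of": "in front of",
    "behind": "behind",
}


def relations(a: SceneObject, b: SceneObject, tol: float = 0.1) -> list[str]:
    """Viewer-relative qualitative relations of `a` with respect to `b`.

    Differences smaller than `tol` metres are treated as 'no relation' on that axis.
    """
    rels = []
    if a.x < b.x - tol:
        rels.append("left_of")
    elif a.x > b.x + tol:
        rels.append("right_of")
    if a.z < b.z - tol:
        rels.append("in_front_of")
    elif a.z > b.z + tol:
        rels.append("behind")
    return rels


def prolog_facts(objects: list[SceneObject]) -> list[str]:
    """One Prolog fact per relation and ordered object pair, e.g. 'left_of(chair, table).'"""
    return [
        f"{r}({a.name}, {b.name})."
        for a, b in permutations(objects, 2)
        for r in relations(a, b)
    ]


def describe(a: SceneObject, b: SceneObject) -> str:
    """Verbalise the relations of `a` with respect to `b` as one English sentence."""
    rels = relations(a, b)
    if not rels:
        return f"The {a.name} is close to the {b.name}."
    return f"The {a.name} is " + " and ".join(PHRASES[r] for r in rels) + f" the {b.name}."


chair = SceneObject("chair", x=-0.8, z=1.5)
table = SceneObject("table", x=0.2, z=2.0)
print("\n".join(prolog_facts([chair, table])))
print(describe(chair, table))  # The chair is to the left of and in front of the table.
```

The tolerance band keeps small depth-sensor noise from flipping a relation; its value here is arbitrary. Object names must be lowercase atoms for the emitted lines to be valid Prolog facts.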
Notes
Trade and company names are included for benefit of the reader and imply no endorsement or preferential treatment of the product by the authors.
For a cross-disciplinary taxonomy of reference frames see the work by Pederson (2003).
JARCA workshop: http://madeirasic.us.es/jarca16/?lang=en.
In general, other classification algorithms could be applied as well. In particular, zero-shot learning (e.g. Ji et al. 2017; Socher et al. 2013) might prove a useful improvement to the current implementation, as these methods do not require training examples for every object class they recognize. This would make it easier to add new objects to the system.
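To make the idea concrete, here is a minimal Python sketch of one common zero-shot scheme: nearest-prototype classification in an attribute space. It is not the method of Ji et al. (2017) or Socher et al. (2013), and the attribute vectors, class names and the `AttributeZeroShot` class are hypothetical. The attribute predictor, which in practice must still be trained on other classes, is assumed and not shown; only the step of matching its output to class descriptions is illustrated.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


class AttributeZeroShot:
    """Nearest-prototype zero-shot classifier in a hypothetical attribute space."""

    def __init__(self):
        self.prototypes = {}  # class name -> attribute vector

    def add_class(self, name, attributes):
        # Adding a class only stores its description; nothing is retrained.
        self.prototypes[name] = attributes

    def classify(self, predicted):
        # `predicted` is assumed to come from an attribute predictor trained
        # beforehand on other classes (not shown here).
        return max(self.prototypes, key=lambda c: cosine(predicted, self.prototypes[c]))


# Hypothetical attributes: (has_backrest, has_flat_top, seats_people, is_wide, has_doors)
clf = AttributeZeroShot()
clf.add_class("chair", (1, 0, 1, 0, 0))
clf.add_class("table", (0, 1, 0, 1, 0))
sofa_like = (0.9, 0.1, 0.8, 0.9, 0.0)
print(clf.classify(sofa_like))  # chair (no "sofa" class known yet)
clf.add_class("sofa", (1, 0, 1, 1, 0))
print(clf.classify(sofa_like))  # sofa
```

The second call shows the point made in the note: the new class becomes recognizable as soon as its description is added, without collecting labelled examples for it.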
References
Barclay M, Galton A (2013) Selection of reference objects for locative expressions: the importance of knowledge and perception. In: Tenbrink T, Wiener J, Claramunt C (eds) Representing space in cognition: interrelations of behavior, language, and formal models, explorations in language and space. Oxford University Press, Oxford, pp 57–169. doi:10.1093/acprof:oso/9780199679911.003.0005
Bo L, Lai K, Ren X, Fox D (2011a) Object recognition with hierarchical kernel descriptors. In: Proceedings of computer vision and pattern recognition
Bo L, Ren X, Fox D (2011b) Depth kernel descriptors for object recognition. In: 2011 IEEE/RSJ international conference on intelligent robots and systems, IROS 2011, San Francisco, CA, September 25–30, IEEE, pp 821–826
Carlson LA, Regier T, Lopez W, Corrigan B (2006) Attention unites form and function in spatial language. Spat Cogn Comput 6(4):295–308
Carlson LA, Skubic M, Miller J, Huo Z, Alexenko T (2014) Strategies for human-driven robot comprehension of spatial descriptions by older adults in a robot fetch task. Topics Cogn Sci 6(3):513–533. doi:10.1111/tops.12101
Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:27:1–27:27, http://www.csie.ntu.edu.tw/~cjlin/libsvm
Clark HH (1996) Using language. Cambridge University Press, Cambridge
Du H, Henry P, Ren X, Cheng M, Goldman D, Seitz SM, Fox D (2011) Interactive 3D modeling of indoor environments with a consumer depth camera. In: Proceedings of 13th international conference on ubiquitous computing, ACM, New York, NY, UbiComp’11, pp 75–84
Falomir Z (2013) Towards cognitive image interpretation: qualitative descriptors, domain knowledge and narrative generation. In: Gibert K, Botti V, Reig-Bolaño R (eds) Artificial intelligence research and development, frontiers in artificial intelligence and applications. IOS Press, Amsterdam, pp 45–57
Falomir Z (2015) A qualitative model for reasoning about 3D objects using depth and different perspectives. In: Lechowski T, Walega P, Zawidzki M (eds) LQMR 2015 workshop, PTI, annals of computer science and information systems, vol 7, pp 3–11, doi:10.15439/2015F370
Falomir Z, Rahman S (2015) From qualitative descriptors of movement towards spatial logics for videos. In: Proceedings of 3rd workshop on recognition and action for scene understanding (REACTS), co-located at 16th international conference of computer analysis of images and patterns (CAIP), Valletta, Malta, pp 119–128
Falomir Z, Castelló V, Escrig MT, Peris JC (2011a) Fuzzy distance sensor data integration and interpretation. Int J Uncertainty Fuzziness Knowl Based Syst IJUFKS 19(3):499–528. doi:10.1142/S0218488511007106
Falomir Z, Jiménez-Ruiz E, Escrig MT, Museros L (2011b) Describing images using qualitative models and description logics. Spatial Cogn Comput 11(1):45–74. doi:10.1080/13875868.2010.545611
Fischler MA, Bolles RC (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 24(6):381–395
Genesereth MR, Nilsson NJ (1987) Logical foundations of artificial intelligence. Morgan Kaufmann Publishers, Burlington
Henry P, Krainin M, Herbst E, Ren X, Fox D (2010) RGBD mapping: using depth cameras for dense 3D modeling of indoor environments. In: RGB-D: advanced reasoning with depth cameras workshop in conjunction with RSS
Herbst E, Henry P, Ren X, Fox D (2011a) Toward object discovery and modeling via 3-D scene comparison. In: ICRA, IEEE, pp 2623–2629
Herbst E, Ren X, Fox D (2011b) RGB-D object discovery via multi-scene analysis. In: 2011 IEEE/RSJ international conference on intelligent robots and systems, IROS 2011, San Francisco, CA, September 25–30, IEEE, pp 4850–4856
Hernández D, Clementini E, Di Felice P (1995) Qualitative distances. In: Frank AU, Kuhn W (eds) Spatial information theory—a theoretical basis for GIS (COSIT’95). Springer, Berlin, pp 45–57
Huo Z, Skubic M (2016) Natural spatial description generation for human–robot interaction in indoor environments. In: 2016 IEEE international conference on smart computing (SMARTCOMP), pp 1–3, doi:10.1109/SMARTCOMP.2016.7501708
Ji Z, Yu Y, Pang Y, Chen L, Zhang Z (2017) Zero-shot learning with multi-battery factor analysis. Signal Process 138:265–272. doi:10.1016/j.sigpro.2017.03.023
Kluth T, Schultheis H (2014) Attentional distribution and spatial language. In: Freksa C, Nebel B, Hegarty M, Barkowsky T (eds) Spatial cognition IX, vol 8684. Lecture notes in computer science. Springer, Berlin, pp 76–91. doi:10.1007/978-3-319-11215-2_6
Kluth T, Burigo M, Knoeferle P (2017a) Modeling the directionality of attention during spatial language comprehension. In: van den Herik HJ, Filipe J (eds) Agents and artificial intelligence, vol 10162. Lecture notes in computer science. Springer, Berlin, pp 283–301
Kluth T, Burigo M, Schultheis H, Knoeferle P (2017b) Does direction matter? Linguistic asymmetries reflected in visual attention. Cognition (to appear)
Krainin M, Henry P, Ren X, Fox D (2011) Manipulator and object tracking for in-hand 3D object modeling. Int J Robot Res 30(11):1311–1327. doi:10.1177/0278364911403178
Lai K, Bo L, Ren X, Fox D (2011a) Sparse distance learning for object recognition combining RGB and depth information. In: IEEE international conference on robotics and automation
Lai K, Bo L, Ren X, Fox D (2011b) A scalable tree-based approach for joint object and pose recognition. In: Twenty-fifth conference on artificial intelligence (AAAI)
Landau B (2016) Update on what and where in spatial language: a new division of labor for spatial terms. Cogn Sci. doi:10.1111/cogs.12410
Levinson S (2003) Space in language and cognition: explorations in cognitive diversity. Cambridge University Press, Cambridge
Lison P (2010) Robust processing of spoken situated dialogue. Diplomica Verlag, Hamburg
Lloyd JW (1987) Foundations of logic programming. Symbolic computation: artificial intelligence, 2nd edn. Springer, Berlin
Marton ZC, Pangercic D, Rusu RB, Holzbach A, Beetz M (2010) Hierarchical object geometric categorization and appearance classification for mobile manipulation. In: 2010 10th IEEE-RAS international conference on humanoid robots (humanoids), IEEE, pp 365–370
Mast V, Falomir Z, Wolter D (2016) Probabilistic reference and grounding with PRAGR for dialogues with robots. J Exper Theor Artif Intell 28(5):889–911. doi:10.1080/0952813X.2016.1154611
Moratz R, Tenbrink T (2006) Spatial reference in linguistic human–robot interaction: iterative, empirically supported development of a model of projective relations. Spatial Cogn Comput 6(1):63–106
Moratz R, Tenbrink T (2008) Affordance-based human–robot interaction. In: Proceedings of the 2006 international conference on towards affordance-based robot control. Springer, Berlin, pp 63–76
Museros L, Falomir Z, Sanz I, Gonzalez-Abril L (2014) Sketch retrieval based on qualitative shape similarity matching: towards a tool for teaching geometry to children. AI Commun 28(1):73–86. doi:10.3233/AIC-140614
Olszewska JI (2015a) 3D spatial reasoning using the clock model. In: Bramer M, Petridis M (eds) Research and development in intelligent systems XXXII: incorporating applications and innovations in intelligent systems XXIII. Springer, Cham, pp 147–154. doi:10.1007/978-3-319-25032-8_10
Olszewska JI (2015b) Where is my cup?—fully automatic detection and recognition of textureless objects in real-world images. In: Azzopardi G, Petkov N (eds) Computer analysis of images and patterns: 16th International Conference, CAIP 2015, Valletta, Malta, September 2–4, 2015 Proceedings, Part I, Springer, pp 501–512. doi:10.1007/978-3-319-23192-1_42
Olszewska JI (2016) Interest-point-based landmark computation for agents’ spatial description coordination. In: van den Herik HJ, Filipe J (eds) Proceedings of the 8th international conference on agents and artificial intelligence (ICAART 2016), Vol 2, Rome, February 24–26, SciTePress, pp 566–569, doi:10.5220/0005847705660569
Oppenheimer DM, Meyvis T, Davidenko N (2009) Instructional manipulation checks: detecting satisficing to increase statistical power. J Exper Soc Psychol 45(4):867–872. doi:10.1016/j.jesp.2009.03.009
Pederson E (2003) How many reference frames? In: Freksa C, Brauer W, Habel C, Wender KF (eds) Spatial cognition III: routes and navigation, human memory and learning, spatial representation and spatial learning. Springer, Berlin, pp 287–304. doi:10.1007/3-540-45004-1_17
Regier T, Carlson LA (2001) Grounding spatial language in perception: an empirical and computational investigation. J Exper Psychol Gen 130(2):273–298. doi:10.1037//0096-3445.130.2.273
Ruiz-Sarmiento JR (2016) Probabilistic techniques in semantic mapping for mobile robotics. Ph.D. thesis, Department of Systems Engineering and Automatics, University of Malaga, Malaga
Ruiz-Sarmiento JR, Galindo C, González-Jiménez J (2015) OLT: a toolkit for object labeling applied to robotic RGB-D datasets. In: European conference on mobile robots
Rusu RB, Bradski G, Thibaux R, Hsu J (2010) Fast 3D recognition and pose using the viewpoint feature histogram. In: Proceedings of the 23rd IEEE/RSJ international conference on intelligent robots and systems (IROS), Taipei, Taiwan
Shotton J, Fitzgibbon A, Cook M, Sharp T, Finocchio M, Moore R, Kipman A, Blake A (2011) Real-time human pose recognition in parts from single depth images. In: Proceedings of 2011 IEEE conference on computer vision and pattern recognition. IEEE computer society, Washington, DC, CVPR ’11, pp 1297–1304
Skubic M, Blisard S, Bailey C, Adams J, Matsakis P (2004) Qualitative analysis of sketched route maps: translating a sketch into linguistic descriptions. IEEE Trans Syst Man Cybern B Cybern 34(2):1275–1282
Socher R, Ganjoo M, Manning CD, Ng A (2013) Zero-shot learning through cross-modal transfer. In: Advances in neural information processing systems, pp 935–943
Steels L (2015) The talking heads experiment: origins of words and meanings. Computational models of language evolution. Language Science Press. doi:10.17169/langsci.b49.75 http://langsci-press.org/catalog/book/49
Steinhauer HJ (2005) A qualitative model for natural language communication about vehicle traffic. In: AAAI spring symposium: reasoning with mental and external diagrams: computational modeling and spatial assistance, AAAI, pp 52–57
Tenbrink T, Fischer K, Moratz R (2002) Spatial strategies in linguistic human–robot communication. In: Freksa C (ed) KI-Themenheft 4/02 spatial cognition. arenDTaP Verlag, Bremen, pp 19–23
Tenbrink T, Maiseyenka V, Moratz R (2007) Spatial reference in simulated human–robot interaction involving intrinsically oriented objects. In: Symposium spatial reasoning and communication at AISB’07 artificial and ambient intelligence, vol 7
Tenbrink T, Coventry KR, Andonova E (2011) Spatial strategies in the description of complex configurations. Discourse Process 48(4):237–266
Tenorth M, Beetz M (2013) KnowRob: a knowledge processing infrastructure for cognition-enabled robots. Int J Robot Res 32(5):566–590. doi:10.1177/0278364913481635
Waibel M, Beetz M, Civera J, D’Andrea R, Elfring J, Galvez-Lopez D, Haussermann K, Janssen R, Montiel J, Perzylo A, Schiessle B, Tenorth M, Zweigle O, van de Molengraft R (2011) RoboEarth. IEEE Robot Autom Mag 18(2):69–82. doi:10.1109/MRA.2011.941632
Zhang X, Li QQ, Fang ZX, Lu SW, Shaw SL (2014) An assessment method for landmark recognition time in real scenes. J Environ Psychol 40:206–217. doi:10.1016/j.jenvp.2014.06.008
Acknowledgements
This work was conducted within the scope of the project Cognitive Qualitative Descriptions and Applications (CogQDA: https://sites.google.com/site/cogqda/), funded by the Central Research Development Fund (CRDF) at Universität Bremen through the 04-Independent Projects for Postdocs action. The authors also thank Niels Eicke, Susanne Knoop, Bengt Kohrt, Nico Lehmann and Mareike Picklum for helping with the implementation.
Additional information
Handling editor: Antonio Bandera (University of Malaga); Reviewers: Andrea Torsello (Ca’ Foscari University Venice), Ricardo Vázquez Martín (University of Malaga), Rebeca Marfil (University of Malaga).
This article is part of the Special Section on ‘Cognitive Robotics’ guest-edited by Antonio Bandera, Jorge Dias, and Luis Manso.
About this article
Cite this article
Falomir, Z., Kluth, T. Qualitative spatial logic descriptors from 3D indoor scenes to generate explanations in natural language. Cogn Process 19, 265–284 (2018). https://doi.org/10.1007/s10339-017-0824-7
DOI: https://doi.org/10.1007/s10339-017-0824-7