Qualitative spatial logic descriptors from 3D indoor scenes to generate explanations in natural language

  • Research Report

Cognitive Processing

Abstract

This paper tackles the challenge of describing real 3D scenes using qualitative spatial descriptors. A key question is which qualitative descriptors to use and how they must be organized to produce a suitable cognitive explanation. To answer it, a survey was carried out in which human participants gave open-ended descriptions of a scene containing some pieces of furniture. The survey data were analysed and, taking them into account, the QSn3D computational approach was developed, which uses an Xbox 360 Kinect to obtain 3D data from a real indoor scene. Object features are computed on these 3D data to identify the objects in indoor scenes. The orientation of each object is computed, and qualitative spatial relations between the objects are extracted. These qualitative spatial relations are the input to a grammar, which applies saliency rules obtained from the survey study and generates cognitive natural-language descriptions of scenes. Moreover, these qualitative descriptors can be expressed as first-order logical facts in Prolog for further reasoning. Finally, a validation study was carried out to test whether the descriptions produced by the QSn3D approach are human-readable. The results show that their acceptability is higher than 82%.
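The pipeline summarized above ends by turning qualitative spatial relations into Prolog facts and a sentence. The following Python sketch illustrates only that last step, under assumptions that do not come from the paper: objects are already identified, each is reduced to a centroid in a viewer-centred frame (x to the viewer's right, z away from the camera, in metres), the relation names and the 0.5 m "near" threshold are hypothetical, and intrinsic object orientation is ignored. It is not the QSn3D descriptor set, grammar or saliency rules.

```python
from dataclasses import dataclass
from itertools import permutations

# Hypothetical threshold (metres) separating "near" from "far"; not taken from QSn3D.
NEAR_THRESHOLD = 0.5

# English phrases for each hypothetical direction relation.
PHRASES = {
    "left_of": "to the left of",
    "right_of": "to the right of",
    "in_front_of": "in front of",
    "behind": "behind",
}


@dataclass
class SceneObject:
    name: str  # object label, also used as a Prolog atom, e.g. "chair"
    x: float   # lateral position in metres, positive to the viewer's right (assumed)
    z: float   # depth in metres, positive away from the viewer (assumed)


def relations(a: SceneObject, b: SceneObject) -> list[tuple[str, str, str]]:
    """Viewer-relative direction and distance relations of `a` with respect to `b`."""
    dx, dz = a.x - b.x, a.z - b.z
    # Use the dominant axis of the displacement to pick a single direction relation.
    if abs(dx) >= abs(dz):
        direction = "right_of" if dx > 0 else "left_of"
    else:
        # Smaller depth means closer to the viewer, i.e. "in front of".
        direction = "in_front_of" if dz < 0 else "behind"
    distance = "near" if (dx * dx + dz * dz) ** 0.5 < NEAR_THRESHOLD else "far"
    return [(direction, a.name, b.name), (distance, a.name, b.name)]


def scene_facts(objects: list[SceneObject]) -> list[str]:
    """Prolog-style facts for every ordered pair of distinct objects."""
    return [f"{rel}({x}, {y})."
            for a, b in permutations(objects, 2)
            for rel, x, y in relations(a, b)]


def describe(a: SceneObject, b: SceneObject) -> str:
    """One template sentence; no grammar or saliency ranking is modelled here."""
    (direction, _, _), (distance, _, _) = relations(a, b)
    dist_phrase = "near" if distance == "near" else "far from"
    return f"The {a.name} is {PHRASES[direction]} the {b.name} and {dist_phrase} it."


if __name__ == "__main__":
    chair = SceneObject("chair", -0.4, 2.0)
    table = SceneObject("table", 0.0, 2.1)
    for fact in scene_facts([chair, table]):
        print(fact)
    # left_of(chair, table).
    # near(chair, table).
    # right_of(table, chair).
    # near(table, chair).
    print(describe(chair, table))
    # The chair is to the left of the table and near it.
```

Choosing one direction from the dominant axis keeps the output to a single direction fact per ordered pair; a fuller model would also use intrinsic reference frames, as the paper's distinction between oriented and non-oriented furniture suggests.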

Figs. 1–13

Notes

  1. Trade and company names are included for the benefit of the reader and imply no endorsement or preferential treatment of the product by the authors.

  2. For a cross-disciplinary taxonomy of reference frames see the work by Pederson (2003).

  3. JARCA workshop: http://madeirasic.us.es/jarca16/?lang=en.

  4. In general, other classification algorithms could be applied as well. In particular, zero-shot learning (e.g. Ji et al. 2017; Socher et al. 2013) might prove a useful improvement to the current implementation, as these methods do not require training examples for every object class to be recognized. This would make it easier to add new objects to the system.
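Note 4 suggests zero-shot learning as a way to add new objects without collecting training examples for them. One common variant maps an object's features to semantic attribute scores and returns the class whose attribute signature is closest, so a new class needs only a signature. The pure-Python sketch below illustrates that idea; the attributes, class signatures and the identity projection are hypothetical placeholders (a real projection would be learned on the classes seen during training), and it is not the method of Ji et al. (2017) or Socher et al. (2013).

```python
import math

# Hypothetical attribute order: (has_backrest, has_flat_top, is_tall).
CLASS_SIGNATURES = {
    "chair": (1.0, 0.0, 0.0),
    "table": (0.0, 1.0, 0.0),
}

# Hypothetical feature-to-attribute projection (identity here); in a real
# system this mapping would be learned on the classes seen during training.
PROJECTION = ((1.0, 0.0, 0.0),
              (0.0, 1.0, 0.0),
              (0.0, 0.0, 1.0))


def project(features, matrix=PROJECTION):
    """Map a feature vector to predicted attribute scores (matrix-vector product)."""
    return tuple(sum(w * f for w, f in zip(row, features)) for row in matrix)


def predict(features, signatures=CLASS_SIGNATURES):
    """Return the class whose signature is nearest (Euclidean) to the projected features."""
    attrs = project(features)
    return min(signatures, key=lambda name: math.dist(attrs, signatures[name]))


if __name__ == "__main__":
    print(predict((0.9, 0.1, 0.2)))  # chair
    # A new class is added through its signature alone; the projection is unchanged.
    extended = {**CLASS_SIGNATURES, "wardrobe": (0.0, 1.0, 1.0)}
    print(predict((0.1, 0.9, 0.95)))            # table (wardrobe not yet known)
    print(predict((0.1, 0.9, 0.95), extended))  # wardrobe
```

The point of the design is that extending the label set touches only the signature table, whereas a conventional classifier would need new labelled examples and retraining.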

  5. http://www.ros.org.

  6. http://www.openni.org.

  7. http://www.pointclouds.org.

  8. https://www.prolific.ac/.

References

  • Barclay M, Galton A (2013) Selection of reference objects for locative expressions: the importance of knowledge and perception. In: Tenbrink T, Wiener J, Claramunt C (eds) Representing space in cognition: interrelations of behavior, language, and formal models, explorations in language and space. Oxford University Press, Oxford, pp 57–169. doi:10.1093/acprof:oso/9780199679911.003.0005

  • Bo L, Lai K, Ren X, Fox D (2011a) Object recognition with hierarchical kernel descriptors. In: Proceedings of computer vision and pattern recognition

  • Bo L, Ren X, Fox D (2011b) Depth kernel descriptors for object recognition. In: 2011 IEEE/RSJ international conference on intelligent robots and systems, IROS 2011, San Francisco, CA, September 25–30, IEEE, pp 821–826

  • Carlson LA, Regier T, Lopez W, Corrigan B (2006) Attention unites form and function in spatial language. Spat Cogn Comput 6(4):295–308

  • Carlson LA, Skubic M, Miller J, Huo Z, Alexenko T (2014) Strategies for human-driven robot comprehension of spatial descriptions by older adults in a robot fetch task. Topics Cogn Sci 6(3):513–533. doi:10.1111/tops.12101

  • Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:27:1–27:27, http://www.csie.ntu.edu.tw/~cjlin/libsvm

  • Clark HH (1996) Using language. Cambridge University Press, Cambridge

  • Du H, Henry P, Ren X, Cheng M, Goldman D, Seitz SM, Fox D (2011) Interactive 3D modeling of indoor environments with a consumer depth camera. In: Proceedings of 13th international conference on ubiquitous computing, ACM, New York, NY, UbiComp’11, pp 75–84

  • Falomir Z (2013) Towards cognitive image interpretation: qualitative descriptors, domain knowledge and narrative generation. In: Gibert K, Reig-Balao VBR (eds) Artificial intelligence research and development, frontiers in artificial intelligence and applications. IOS Press, Amsterdam, pp 45–57

  • Falomir Z (2015) A qualitative model for reasoning about 3D objects using depth and different perspectives. In: Lechowski T, Walega P, Zawidzki M (eds) LQMR 2015 workshop, PTI, annals of computer science and information systems, vol 7, pp 3–11, doi:10.15439/2015F370

  • Falomir Z, Rahman S (2015) From qualitative descriptors of movement towards spatial logics for videos. In: Proceedings of 3rd workshop on recognition and action for scene understanding (REACTS), co-located at 16th international conference of computer analysis of images and patterns (CAIP), Valletta, Malta, pp 119–128

  • Falomir Z, Castelló V, Escrig MT, Peris JC (2011a) Fuzzy distance sensor data integration and interpretation. Int J Uncertainty Fuzziness Knowl Based Syst IJUFKS 19(3):499–528. doi:10.1142/S0218488511007106

  • Falomir Z, Jiménez-Ruiz E, Escrig MT, Museros L (2011b) Describing images using qualitative models and description logics. Spat Cogn Comput 11(1):45–74. doi:10.1080/13875868.2010.545611

  • Fischler MA, Bolles RC (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 24(6):381–395

  • Genesereth MR, Nilsson NJ (1987) Logical foundations of artificial intelligence. Morgan Kaufmann Publishers, Burlington

  • Henry P, Krainin M, Herbst E, Ren X, Fox D (2010) RGBD mapping: using depth cameras for dense 3D modeling of indoor environments. In: RGB-D: advanced reasoning with depth cameras workshop in conjunction with RSS

  • Herbst E, Henry P, Ren X, Fox D (2011a) Toward object discovery and modeling via 3-D scene comparison. In: ICRA, IEEE, pp 2623–2629

  • Herbst E, Ren X, Fox D (2011b) RGB-D object discovery via multi-scene analysis. In: 2011 IEEE/RSJ international conference on intelligent robots and systems, IROS 2011, San Francisco, CA, September 25–30, IEEE, pp 4850–4856

  • Hernández D, Clementini E, Di Felice P (1995) Qualitative distances. In: Frank AU, Kuhn W (eds) Spatial information theory—a theoretical basis for GIS (COSIT’95). Springer, Berlin, pp 45–57

  • Huo Z, Skubic M (2016) Natural spatial description generation for human–robot interaction in indoor environments. In: 2016 IEEE international conference on smart computing (SMARTCOMP), pp 1–3, doi:10.1109/SMARTCOMP.2016.7501708

  • Ji Z, Yu Y, Pang Y, Chen L, Zhang Z (2017) Zero-shot learning with multi-battery factor analysis. Signal Process 138:265–272. doi:10.1016/j.sigpro.2017.03.023

  • Kluth T, Schultheis H (2014) Attentional distribution and spatial language. In: Freksa C, Nebel B, Hegarty M, Barkowsky T (eds) Spatial cognition IX, vol 8684. Lecture notes in computer science. Springer, Berlin, pp 76–91. doi:10.1007/978-3-319-11215-2_6

  • Kluth T, Burigo M, Knoeferle P (2017a) Modeling the directionality of attention during spatial language comprehension. In: van den Herik HJ, Filipe J (eds) Agents and artificial intelligence, vol 10162. Lecture notes in computer science. Springer, Berlin, pp 283–301

  • Kluth T, Burigo M, Schultheis H, Knoeferle P (2017b) Does direction matter? Linguistic asymmetries reflected in visual attention. Cognition (to appear)

  • Krainin M, Henry P, Ren X, Fox D (2011) Manipulator and object tracking for in-hand 3D object modeling. Int J Robot Res 30(11):1311–1327. doi:10.1177/0278364911403178

  • Lai K, Bo L, Ren X, Fox D (2011a) Sparse distance learning for object recognition combining RGB and depth information. In: IEEE international conference on robotics and automation

  • Lai K, Bo L, Ren X, Fox D (2011b) A scalable tree-based approach for joint object and pose recognition. In: Twenty-fifth conference on artificial intelligence (AAAI)

  • Landau B (2016) Update on what and where in spatial language: a new division of labor for spatial terms. Cogn Sci. doi:10.1111/cogs.12410

  • Levinson S (2003) Space in language and cognition: explorations in cognitive diversity. Cambridge University Press, Cambridge

  • Lison P (2010) Robust processing of spoken situated dialogue. Diplomica Verlag, Hamburg

  • Lloyd JW (1987) Foundations of logic programming. Symbolic computation: artificial intelligence, 2nd edn. Springer, Berlin

  • Marton ZC, Pangercic D, Rusu RB, Holzbach A, Beetz M (2010) Hierarchical object geometric categorization and appearance classification for mobile manipulation. In: 2010 10th IEEE-RAS international conference on humanoid robots (humanoids), IEEE, pp 365–370

  • Mast V, Falomir Z, Wolter D (2016) Probabilistic reference and grounding with PRAGR for dialogues with robots. J Exper Theor Artif Intell 28(5):889–911. doi:10.1080/0952813X.2016.1154611

  • Moratz R, Tenbrink T (2006) Spatial reference in linguistic human–robot interaction: iterative, empirically supported development of a model of projective relations. Spat Cogn Comput 6(1):63–106

  • Moratz R, Tenbrink T (2008) Affordance-based human–robot interaction. In: Proceedings of the 2006 international conference on towards affordance-based robot control. Springer, Berlin, pp 63–76

  • Museros L, Falomir Z, Sanz I, Gonzalez-Abril L (2014) Sketch retrieval based on qualitative shape similarity matching: towards a tool for teaching geometry to children. AI Commun 28(1):73–86. doi:10.3233/AIC-140614

  • Olszewska JI (2015a) 3D spatial reasoning using the clock model. In: Bramer M, Petridis M (eds) Research and development in intelligent systems XXXII: incorporating applications and innovations in intelligent systems XXIII. Springer, Cham, pp 147–154. doi:10.1007/978-3-319-25032-8_10

  • Olszewska JI (2015b) Where is my cup?—fully automatic detection and recognition of textureless objects in real-world images. In: Azzopardi G, Petkov N (eds) Computer analysis of images and patterns: 16th International Conference, CAIP 2015, Valletta, Malta, September 2–4, 2015 Proceedings, Part I, Springer, pp 501–512. doi:10.1007/978-3-319-23192-1_42

  • Olszewska JI (2016) Interest-point-based landmark computation for agents’ spatial description coordination. In: van den Herik HJ, Filipe J (eds) Proceedings of the 8th international conference on agents and artificial intelligence (ICAART 2016), Vol 2, Rome, February 24–26, SciTePress, pp 566–569, doi:10.5220/0005847705660569

  • Oppenheimer DM, Meyvis T, Davidenko N (2009) Instructional manipulation checks: detecting satisficing to increase statistical power. J Exper Soc Psychol 45(4):867–872. doi:10.1016/j.jesp.2009.03.009

  • Pederson E (2003) How many reference frames? In: Freksa C, Brauer W, Habel C, Wender KF (eds) Spatial cognition III: routes and navigation, human memory and learning, spatial representation and spatial learning. Springer, Berlin, pp 287–304. doi:10.1007/3-540-45004-1_17

  • Regier T, Carlson LA (2001) Grounding spatial language in perception: an empirical and computational investigation. J Exper Psychol Gen 130(2):273–298. doi:10.1037//0096-3445.130.2.273

  • Ruiz-Sarmiento JR (2016) Probabilistic techniques in semantic mapping for mobile robotics. Ph.D. thesis, Department of Systems Engineering and Automatics, University of Malaga, Malaga

  • Ruiz-Sarmiento JR, Galindo C, González-Jiménez J (2015) OLT: a toolkit for object labeling applied to robotic RGB-D datasets. In: European conference on mobile robots

  • Rusu RB, Bradski G, Thibaux R, Hsu J (2010) Fast 3D recognition and pose using the viewpoint feature histogram. In: Proceedings of the 23rd IEEE/RSJ international conference on intelligent robots and systems (IROS), Taipei, Taiwan

  • Shotton J, Fitzgibbon A, Cook M, Sharp T, Finocchio M, Moore R, Kipman A, Blake A (2011) Real-time human pose recognition in parts from single depth images. In: Proceedings of 2011 IEEE conference on computer vision and pattern recognition. IEEE computer society, Washington, DC, CVPR ’11, pp 1297–1304

  • Skubic M, Blisard S, Bailey C, Adams J, Matsakis P (2004) Qualitative analysis of sketched route maps: translating a sketch into linguistic descriptions. IEEE Trans Syst Man Cyber B Cybern 34(2):1275–1282

  • Socher R, Ganjoo M, Manning CD, Ng A (2013) Zero-shot learning through cross-modal transfer. In: Advances in neural information processing systems, pp 935–943

  • Steels L (2015) The talking heads experiment: origins of words and meanings. Computational models of language evolution. Language Science Press. doi:10.17169/langsci.b49.75 http://langsci-press.org/catalog/book/49

  • Steinhauer HJ (2005) A qualitative model for natural language communication about vehicle traffic. In: AAAI spring symposium: reasoning with mental and external diagrams: computational modeling and spatial assistance, AAAI, pp 52–57

  • Tenbrink T, Fischer K, Moratz R (2002) Spatial strategies in linguistic human–robot communication. In: Freksa C (ed) KI-Themenheft 4/02 spatial cognition. arenDTaP Verlag, Bremen, pp 19–23

  • Tenbrink T, Maiseyenka V, Moratz R (2007) Spatial reference in simulated human–robot interaction involving intrinsically oriented objects. In: Symposium spatial reasoning and communication at AISB’07 artificial and ambient intelligence, vol 7

  • Tenbrink T, Coventry KR, Andonova E (2011) Spatial strategies in the description of complex configurations. Discourse Process 48(4):237–266

  • Tenorth M, Beetz M (2013) KnowRob: a knowledge processing infrastructure for cognition-enabled robots. Int J Robot Res 32(5):566–590. doi:10.1177/0278364913481635

  • Waibel M, Beetz M, Civera J, D’Andrea R, Elfring J, Galvez-Lopez D, Haussermann K, Janssen R, Montiel J, Perzylo A, Schiessle B, Tenorth M, Zweigle O, van de Molengraft R (2011) RoboEarth. IEEE Robot Autom Mag 18(2):69–82. doi:10.1109/MRA.2011.941632

  • Zhang X, Li Q, Fang Z, Lu S, Shaw SL (2014) An assessment method for landmark recognition time in real scenes. J Environ Psychol 40:206–217. doi:10.1016/j.jenvp.2014.06.008

Acknowledgements

This work was conducted within the scope of the project Cognitive Qualitative Descriptions and Applications (CogQDA: https://sites.google.com/site/cogqda/), funded by the Central Research Development Fund (CRDF) at Universität Bremen through the 04-Independent Projects for Postdocs action. The authors also thank Niels Eicke, Susanne Knoop, Bengt Kohrt, Nico Lehmann and Mareike Picklum for helping with the implementation.

Author information

Correspondence to Zoe Falomir.

Additional information

Handling editor: Antonio Bandera (University of Malaga); Reviewers: Andrea Torsello (Ca’ Foscari University Venice), Ricardo Vázquez Martín (University of Malaga), Rebeca Marfil (University of Malaga).

This article is part of the Special Section on ‘Cognitive Robotics’ guest-edited by Antonio Bandera, Jorge Dias, and Luis Manso.

Appendix

More results obtained by QSn3D (narratives and logics) are shown in Tables 6, 7, 8 and 9.

Table 6 QSn3D narratives and logics obtained in the home scenario using 2 pieces of furniture: 1 oriented and 1 non-oriented
Table 7 QSn3D narratives and logics obtained in the home scenario using 3 pieces of furniture: 1 oriented and 2 non-oriented
Table 8 QSn3D narratives and logics obtained in the office scenario using 2 pieces of furniture: 1 oriented and 1 non-oriented
Table 9 QSn3D narratives and logics obtained in the office scenario using 3 pieces of furniture: 2 oriented and 1 non-oriented

About this article

Cite this article

Falomir, Z., Kluth, T. Qualitative spatial logic descriptors from 3D indoor scenes to generate explanations in natural language. Cogn Process 19, 265–284 (2018). https://doi.org/10.1007/s10339-017-0824-7

  • DOI: https://doi.org/10.1007/s10339-017-0824-7
