Multi-module Natural Language Search Engine for Travel Offers

  • Conference paper
Advances in Computational Collective Intelligence (ICCCI 2022)

Abstract

In this work, we present an advanced semantic search engine dedicated to travel offers that allows users to formulate queries in natural language. We started with a focus on the Polish language. Search for e-commerce requires a different set of methods and algorithms than search for travel, corporate documents, legal documents, medicine, etc. In travel, the data is more complex than in other domains, and the search process requires more parameters. In e-commerce, one product has one price, while in travel, one product (a holiday package) has tens of thousands of prices depending on time, board type, room type, number of people, children’s ages, etc. To provide search for a single mid-sized tour operator, we need to search within hundreds of millions of documents.

We present a set of methods based on natural language processing to improve search for travel. We also present our new application for annotating travel offers, built in a human-in-the-loop paradigm that enables iterative system improvement. Finally, we present a large dataset containing more than 3,000 manually constructed queries and more than 23,000 manually annotated answers, a large fraction of them annotated by at least two independent experts, and a semi-automatically constructed ontology of tourism terms in OWL format containing nearly 2,000 concept classes.

This work was supported by the European Regional Development Fund as a part of the 2014-2020 Smart Growth Operational Programme: (1) Intelligent travel search system based on natural language understanding algorithms, project no. POIR.01.01.01-00-0798/19; (2) CLARIN - Common Language Resources and Technology Infrastructure, project no. POIR.04.02.00-00C002/19.

Notes

  1. https://github.com/UKPLab/sentence-transformers/

Author information

Corresponding author

Correspondence to Karol Gawron.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gawron, K. et al. (2022). Multi-module Natural Language Search Engine for Travel Offers. In: Bădică, C., Treur, J., Benslimane, D., Hnatkowska, B., Krótkiewicz, M. (eds) Advances in Computational Collective Intelligence. ICCCI 2022. Communications in Computer and Information Science, vol 1653. Springer, Cham. https://doi.org/10.1007/978-3-031-16210-7_6

  • DOI: https://doi.org/10.1007/978-3-031-16210-7_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16209-1

  • Online ISBN: 978-3-031-16210-7

  • eBook Packages: Computer Science, Computer Science (R0)
