Abstract
Schema-flexible NoSQL databases are increasingly popular backends in agile application development, as they allow developers to write code that assumes a new database schema different from the current one. If the application is already in production, non-functional requirements for application performance and cost efficiency are routinely part of service-level agreements (SLAs). Co-evolving the schema with the application code then requires subtle management decisions regarding the migration of variational legacy data persisted in the production database. Ultimately, project managers have to deal with the repercussions of schema evolution in order to comply with SLAs, especially if the stipulated metrics compete with each other in tradeoffs. To this end, we present a NoSQL Schema Migration Advisor that supports schema migration management in NoSQL databases in two distinct ways. If the migration situation can be elicited, a heuristic is offered that estimates the impact of schema evolution, so that a migration strategy can be chosen and code releases can be paced accordingly. If this information is insufficient or not readily available, self-adaptive schema migration strategies are presented that can automatically curate variational data such that competing metrics are balanced out to comply with SLAs where possible, making management interventions superfluous.
This work has been funded by the German Research Foundation (project grant #385808805). We thank Jan-Christopher Mair, Kai Pehns, Tobias Kreiter, Shamil Nabiyev, and Maksym Levchenko from Darmstadt University of Applied Sciences for their contributions to MigCast and Darwin.
Notes
- 1.
The MigCast pricing model is specified at USD 0.20 per 1 million I/O requests and is based on Amazon DocumentDB (AWS) for US-East. The pricing is listed at https://aws.amazon.com/en/documentdb/pricing/ (last visited February 2, 2022).
- 2.
- 3.
MigCast randomizes the distribution of the served workload of entity accesses as well as the distribution and kinds of SMOs within the specified bounds; in this case, a Pareto-distributed workload of medium intensity and a high ratio of multi-type SMOs. The cost model is chosen as described on page 3. For further details of the implementation setup, we refer to [17].
- 4.
Even with the relatively small figures in our example (an original database instance of 10M entities and just 12 schema changes affecting parts of the database), costs can easily amount to many thousands of USD and grow exponentially due to many influencing factors [17].
- 5.
In the depicted migration scenarios, the limit values for the complexity-adaptive strategy are equivalent to those of the predictive strategy, because its advantage only comes into play with a higher share of multi-type SMOs and a lighter, Pareto-distributed query workload.
- 6.
The increase is slightly exponential for a data growth rate of \(10\%\), which increases the number of entities by a constant amount per release; see the bottom left column of Table 3.
- 7.
This amount can be considered an upper limit, because the migration costs can be assumed to grow exponentially, such that in the Monte Carlo experiments the lazy strategy needs to spend not the assumed \(50\%\) but only \(40\%\) at release 6, due to the Pareto distribution of the entity accesses.
- 8.
The migration can be done in offline batch processing, in a blue-green deployment [22], or during a phase of low query workload, in which case it intermittently causes higher latency.
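The cost model of note 1 and the skewed workload of note 3 can be made concrete with a small Python sketch. It is not MigCast's implementation: it assumes a flat price per I/O request and, as a hypothetical stand-in for the Pareto-distributed entity accesses, draws accesses from a Zipf (discrete power-law) distribution. The names `io_cost_usd` and `touched_fraction` and the `skew` parameter are illustrative assumptions only.

```python
import random

# Price from note 1: USD 0.20 per 1M I/O requests (Amazon DocumentDB,
# US-East, as visited February 2, 2022). Current prices may differ.
PRICE_PER_MILLION_IO_USD = 0.20


def io_cost_usd(io_requests: int,
                price_per_million: float = PRICE_PER_MILLION_IO_USD) -> float:
    """Cost in USD of a given number of I/O requests at a flat rate."""
    return io_requests / 1_000_000 * price_per_million


def touched_fraction(num_entities: int, num_accesses: int,
                     skew: float = 1.0, seed: int = 0) -> float:
    """Share of distinct entities hit by a skewed (power-law) workload.

    Hypothetical stand-in for a Pareto-distributed workload: the entity
    with 1-based rank k is accessed with probability proportional to
    1 / k**skew (a Zipf law), so a few "hot" entities receive most
    accesses. skew=0.0 yields a uniform workload for comparison.
    """
    rng = random.Random(seed)  # seeded for reproducible runs
    entities = range(num_entities)
    weights = [1.0 / (k + 1) ** skew for k in entities]
    accessed = rng.choices(entities, weights=weights, k=num_accesses)
    return len(set(accessed)) / num_entities


if __name__ == "__main__":
    print(f"1M I/O requests cost USD {io_cost_usd(1_000_000):.2f}")
    skewed = touched_fraction(10_000, 10_000)
    uniform = touched_fraction(10_000, 10_000, skew=0.0)
    print(f"skewed workload touches {skewed:.0%} of the entities")
    print(f"uniform workload touches {uniform:.0%} of the entities")
```

Under such a skew, far fewer distinct entities are accessed than under a uniform workload of the same size. This suggests why a lazy strategy, which migrates entities only when they are accessed, may spend less than a uniform-access estimate would predict (cf. note 7).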
References
3T Software Labs Ltd.: MongoDB Trends Report. Cambridge, U.K. (2020)
Aulbach, S., Jacobs, D., Kemper, A., Seibold, M.: A comparison of flexible schemas for software as a service. In: Proceedings of SIGMOD 2009. ACM (2009)
Barker, S., Chi, Y., Moon, H.J., Hacigümüş, H., Shenoy, P.: “Cut me some slack” latency-aware live migration for databases. In: Proceedings of EDBT’12 (2012)
Bertino, E., Guerrini, G., Mesiti, M., Tosetto, L.: Evolving a set of DTDs according to a dynamic set of XML documents. In: Proceedings of EDBT’02 Workshops (2002)
Cleve, A., Gobert, M., Meurice, L., Maes, J., Weber, J.: Understanding database schema evolution. Sci. Comput. Program. 97(P1) (2015)
Conrad, A., Gärtner, S., Störl, U.: Towards automated schema optimization. In: ER Demos and Posters. Proceedings of CEUR Workshop, vol. 2958 (2021)
Curino, C., et al.: Relational cloud: a DbaaS for the cloud. In: Proceedings of CIDR (2011)
Curino, C., Moon, H.J., Deutsch, A., Zaniolo, C.: Automating the database schema evolution process. VLDB J. 22(1), 73–98 (2013)
Curino, C., Moon, H.J., Tanca, L., Zaniolo, C.: Schema evolution in Wikipedia - toward a web information system benchmark. In: Proceedings of ICEIS 2008 (2008)
Difallah, D.E., Pavlo, A., Curino, C., Cudré-Mauroux, P.: OLTP-bench: an extensible testbed for benchmarking relational databases. Proc. VLDB Endow. 7(4), 277–288 (2013)
Ellison, M., Calinescu, R., Paige, R.F.: Evaluating cloud database migration options using workload models. J. Cloud Comput. 7(1), 1–18 (2018). https://doi.org/10.1186/s13677-018-0108-5
Fahmideh, M., Daneshgar, F., Beydoun, G., Rabhi, F.A.: Challenges in migrating legacy software systems to the cloud. CoRR abs/2004.10724 (2020)
Filho, E.R.L., de Almeida, E.C., Scherzinger, S., Herodotou, H.: Investigating automatic parameter tuning for SQL-on-hadoop systems. Big Data Res. 25 (2021)
Guerrini, G., Mesiti, M., Rossi, D.: Impact of XML schema evolution on valid documents. In: Proceedings of WIDM’05 Workshop. ACM (2005)
Herrmann, K., Voigt, H., Behrend, A., Rausch, J., Lehner, W.: Living in parallel realities: co-existing schema versions. In: Proceedings of SIGMOD (2017)
Hillenbrand, A., Levchenko, M., Störl, U., Scherzinger, S., Klettke, M.: MigCast: putting a price tag on data model evolution in NoSQL data stores. In: Proceedings of SIGMOD (2019)
Hillenbrand, A., Scherzinger, S., Störl, U.: Remaining in control of the impact of schema evolution in NoSQL databases. In: Proceedings of ER 2021 (2021)
Hillenbrand, A., Störl, U.: Automated curation of variational data in NoSQL databases through metric-driven self-adaptive migration strategies. In: Proceedings of MODELSWARD 2022. SCITEPRESS (2022)
Hillenbrand, A., Störl, U., Levchenko, M., Nabiyev, S., Klettke, M.: Towards self-adapting data migration in the context of schema evolution in NoSQL databases. In: Proceedings of ICDE 2020 Workshops. IEEE (2020)
Hillenbrand, A., Störl, U., Nabiyev, S., Klettke, M.: Self-adapting data migration in the context of schema evolution in NoSQL databases. Distrib. Parallel Databases 40(1), 5–25 (2021). https://doi.org/10.1007/s10619-021-07334-1
Hillenbrand, A., Störl, U., Nabiyev, S., Scherzinger, S.: MigCast in Monte Carlo: the impact of data model evolution in NoSQL databases. CoRR (2021)
Kim, G., Debois, P., Willis, J., Humble, J.: The DevOps Handbook. IT Revolution Press (2016)
Klettke, M., Störl, U., Shenavai, M., Scherzinger, S.: NoSQL schema evolution and big data migration at scale. In: Proceedings of SCDM 2016. IEEE (2016)
Klímek, J., Malý, J., Nečaský, M., Holubová, I.: eXolutio: methodology for design and evolution of XML schemas using conceptual modeling. Informatica 26(3), 271 (2015)
Levandoski, J.J., Larson, P., Stoica, R.: Identifying hot and cold data in main-memory databases. In: Proceedings of ICDE 2013. IEEE (2013)
Meurice, L., Cleve, A.: Supporting schema evolution in schema-less NoSQL data stores. In: Proceedings of SANER 2017 (2017)
Mior, M.J., Salem, K., Aboulnaga, A., Liu, R.: NoSE: schema design for NoSQL applications. IEEE Trans. Knowl. Data Eng. 29, 2275–2289 (2017)
Preuveneers, D., Joosen, W.: Automated configuration of NoSQL performance and scalability tactics for data-intensive applications. Informatics 7, 29 (2020)
Qiu, D., Li, B., Su, Z.: An empirical analysis of the co-evolution of schema and code in database applications. In: Proceedings of SIGSOFT 2013. ACM (2013)
van Rijsbergen, C.J.: Information Retrieval. Butterworth-Heinemann, USA (1979)
Saur, K., Dumitras, T., Hicks, M.W.: Evolving NoSQL databases without downtime. In: Proceedings of ICSME 2016. IEEE (2016)
Scherzinger, S., Klettke, M., Störl, U.: Managing schema evolution in NoSQL data stores. In: Proceedings of DBPL 2013 (2013)
Scherzinger, S., Sidortschuck, S.: An empirical study on the design and evolution of NoSQL database schemas. In: Dobbie, G., Frank, U., Kappel, G., Liddle, S.W., Mayr, H.C. (eds.) ER 2020. LNCS, vol. 12400, pp. 441–455. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-62522-1_33
Skoulis, I., Vassiliadis, P., Zarras, A.: Growing up with stability: how open-source relational databases evolve. Inf. Syst. 53 (2015)
Störl, U., et al.: Curating variational data in application development. In: Proceedings of ICDE 2018 (2018)
Suárez-Otero, P., Mior, M.J., Suárez-Cabal, M.J., Tuya, J.: Maintaining NoSQL database quality during conceptual model evolution. In: IEEE International Conference on Big Data (Big Data) (2020)
Tsoumakos, D., Konstantinou, I., Boumpouka, C., Sioutas, S., Koziris, N.: Automated, elastic resource provisioning for NoSQL clusters using TIRAMOLA. In: Proceedings of CCGrid 2013. IEEE (2013)
Upton, G., Cook, I.: The Oxford Dictionary of Statistics. Oxford University Press, United Kingdom (2002)
Vassiliadis, P.: Profiles of schema evolution in free open source software projects. In: Proceedings of ICDE 2021. IEEE (2021)
Vassiliadis, P., Zarras, A., Skoulis, I.: Gravitating to rigidity: patterns of schema evolution-and its absence-in the lives of tables. Inf. Syst. 63 (2016)
Zilio, D.C., et al.: DB2 design advisor. In: Proceedings of VLDB (2004)
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Hillenbrand, A., Störl, U. (2023). Managing Schema Migration in NoSQL Databases: Advisor Heuristics vs. Self-adaptive Schema Migration Strategies. In: Pires, L.F., Hammoudi, S., Seidewitz, E. (eds) Model-Driven Engineering and Software Development. MODELSWARD 2021, MODELSWARD 2022. Communications in Computer and Information Science, vol 1708. Springer, Cham. https://doi.org/10.1007/978-3-031-38821-7_11
DOI: https://doi.org/10.1007/978-3-031-38821-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38820-0
Online ISBN: 978-3-031-38821-7
eBook Packages: Computer Science, Computer Science (R0)