Abstract
Determining the index of Simon’s congruence is a long-standing open problem. Two words u and v are called Simon k-congruent if they have the same set of scattered factors of length at most k. Scattered factors (also known as subwords or subsequences) are parts of a word taken in the correct order but not necessarily consecutively; e.g., \(\mathtt {oath}\) is a scattered factor of \(\mathtt {logarithm}\) but \(\mathtt {tail}\) is not. Following the idea of scattered factor k-universality (also known as k-richness), where every word of length k over the alphabet occurs as a scattered factor, we investigate nearly k-universality, i.e., words in which exactly one scattered factor of length k is absent. We present a full characterisation of these words, the index of the congruence in this special case, and the shortlex normal form of each such class. Moreover, we extend the definition to m-nearly k-universality (exactly m scattered factors of length k are absent), show some results for \(m>1\), and give a full combinatorial characterisation of m-nearly k-universal words that are additionally \((k-1)\)-universal.
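To make these definitions concrete, the following is a small brute-force sketch in Python. It is not the characterisation or algorithm from the paper, only a direct transcription of the definitions. The function names (`is_scattered_factor`, `scattered_factors_up_to`, `simon_congruent`, `absent_scattered_factors`, `is_m_nearly_k_universal`) are hypothetical. The alphabet is passed in explicitly, which is an assumption (one could instead use the set of letters occurring in the word). The enumeration is exponential in k, so it only suits small examples.

```python
from itertools import product


def is_scattered_factor(v: str, w: str) -> bool:
    """True if v is a scattered factor (subsequence) of w.

    `ch in it` advances the shared iterator just past the first match, so the
    letters of v must appear in w in order, but not necessarily adjacently.
    """
    it = iter(w)
    return all(ch in it for ch in v)


def scattered_factors_up_to(w: str, k: int) -> set[str]:
    """All scattered factors of w of length at most k, including the empty word."""
    factors = {""}
    for ch in w:
        # Extend every factor of the current prefix by the next letter of w.
        factors |= {s + ch for s in factors if len(s) < k}
    return factors


def simon_congruent(u: str, v: str, k: int) -> bool:
    """u ~_k v: u and v have the same scattered factors of length at most k."""
    return scattered_factors_up_to(u, k) == scattered_factors_up_to(v, k)


def absent_scattered_factors(w: str, k: int, alphabet: str) -> list[str]:
    """Words of length k over `alphabet` that are not scattered factors of w.

    Assumption: the alphabet is supplied by the caller. Brute force over all
    len(alphabet) ** k candidates, so only practical for small k.
    """
    candidates = ("".join(p) for p in product(alphabet, repeat=k))
    return [c for c in candidates if not is_scattered_factor(c, w)]


def is_m_nearly_k_universal(w: str, k: int, m: int, alphabet: str) -> bool:
    """Exactly m words of length k over `alphabet` are absent from w.

    m = 0 means k-universal; m = 1 means nearly k-universal.
    """
    return len(absent_scattered_factors(w, k, alphabet)) == m


if __name__ == "__main__":
    print(is_scattered_factor("oath", "logarithm"))    # True
    print(is_scattered_factor("tail", "logarithm"))    # False
    print(absent_scattered_factors("aba", 2, "ab"))    # ['bb']
    print(is_m_nearly_k_universal("aba", 2, 1, "ab"))  # True
    print(simon_congruent("aab", "abb", 1))            # True
    print(simon_congruent("aab", "abb", 2))            # False
```

For instance, over the alphabet {a, b}, the word `aba` lacks only `bb` among the words of length 2, so it is nearly 2-universal (and 1-universal). The words `aab` and `abb` agree on all scattered factors of length 1 but not of length 2 (`aa` versus `bb`), so they are Simon 1-congruent but not 2-congruent.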
© 2022 IFIP International Federation for Information Processing
About this paper
Cite this paper
Fleischmann, P., Haschke, L., Huch, A., Mayrock, A., Nowotka, D. (2022). Nearly k-Universal Words - Investigating a Part of Simon’s Congruence. In: Han, YS., Vaszil, G. (eds) Descriptional Complexity of Formal Systems. DCFS 2022. Lecture Notes in Computer Science, vol 13439. Springer, Cham. https://doi.org/10.1007/978-3-031-13257-5_5
DOI: https://doi.org/10.1007/978-3-031-13257-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-13256-8
Online ISBN: 978-3-031-13257-5
eBook Packages: Computer Science, Computer Science (R0)