Longest bordered and periodic subsequences

We present an algorithm computing the longest periodic subsequence of a string of length n in O(n^7) time with O(n^3) space. We obtain improvements when restricting the exponents or extending the search by allowing the reported subsequence to be subperiodic, down to O(n^2) time and O(n) space. By allowing subperiodic subsequences in the output, the task becomes finding the longest bordered subsequence, for which we devise a conditional lower bound. © 2023 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
A natural extension of the analysis of regularities such as squares or palindromes perceived as substrings of a given text is the study of the same type of regularities when considering subsequences. In this line of research, given a text of length n, Kosowski [25] proposed an algorithm running in O(n^2) time using O(n) words of space to find a longest subsequence that is a square. Inoue et al. [21] generalized this setting to consider the longest such subsequence common to two texts T and S of length n, and gave an algorithm computing this sequence in O(n^6) time using O(n^4) space, also providing improvements in the case where the number of matching character pairs T[i] = S[j] is rather small. Recently, Inoue et al. [22] provided similar improvements for the longest square subsequence of a single string. Here, we consider the problem for a single text, but allow the subsequence to have exponents other than a multiple of two. In detail, we want to find a longest subsequence that is periodic (exponent ≥ 2) or sub-periodic (exponent strictly between 1 and 2).

Preliminaries
Let N denote the set of all natural numbers 1, 2, ..., and Q+ the set of all rational numbers greater than or equal to 1. We distinguish integer intervals {1, 2, ..., n} = [1..n] ⊂ N from intervals of rational numbers. Given a string S, we can write S in the form S = U^α U′ with α ∈ N and U′ being either empty or a proper prefix of U. Then u := |U| is called a period of S, and δ := α + |U′|/u ∈ Q+ is called the exponent of S with respect to the period u.
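To make these definitions concrete: the smallest period of a non-empty string equals its length minus the length of its longest border, which the classic KMP failure function computes, and the largest exponent is the length divided by the smallest period. A small Python sketch (the function names are ours, not the paper's):

```python
from fractions import Fraction

def smallest_period(s):
    # b[i] is the length of the longest border of s[:i+1]
    # (the KMP failure function); the smallest period of s is
    # len(s) minus the length of the longest border of all of s.
    n = len(s)
    b = [0] * n
    for i in range(1, n):
        k = b[i - 1]
        while k > 0 and s[i] != s[k]:
            k = b[k - 1]
        b[i] = k + 1 if s[i] == s[k] else 0
    return n - b[-1]

def exponent(s):
    # Largest exponent of s, taken with respect to its smallest period.
    return Fraction(len(s), smallest_period(s))
```

For instance, abcab has smallest period 3 and exponent 5/3 ∈ (1, 2), so it is sub-periodic, while abab has exponent 2 and is periodic.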
For the largest possible such exponent δ, S is called periodic if δ ≥ 2, or sub-periodic if δ ∈ (1, 2). Instead of just two strings, we also make use of the generalization of the LCS to multiple strings: for a sequence X_1, X_2, ..., X_k of strings, we write LCS(X_1, X_2, ..., X_k) to denote the longest common subsequence of X_1, X_2, ..., and X_k. Assuming that all strings X_j have length O(n) for j ∈ [1..k], it is known that we can look up the length of the longest common subsequence in a k-dimensional DP table of size O(n^k) that can be filled in O(kn^k) time; we can compute just the length of the LCS in O(n^{k−1}) space by keeping only two rows in one dimension of the k-dimensional DP matrix. To actually retrieve an LCS within o(n^k) space, Schrödl [29] noticed that Hirschberg's algorithm [17] can reduce the space for the computation of the LCS of k strings of length n from O(n^k) to O(kn^{k−1}).

Structure of the paper
We first show an algorithm (Sect. 3) that computes the longest bordered subsequence. Bordered subsequences can be either periodic (δ ≥ 2) or sub-periodic (δ ∈ (1, 2) ⊂ Q+). By reducing the computation of the (classic) LCS to the computation of the longest bordered subsequence, we obtain a conditional lower bound showing that it seems hard to improve our algorithm for the longest bordered subsequence significantly. Subsequently, we modify this algorithm to omit the sub-periodic subsequences at the cost of more time and space in Sect. 4. Table 1 gives an overview of our obtained results. Our algorithm that finds a longest bordered subsequence has better time and space complexities than those that find longest subsequences with more restricted exponents.
A key observation is that a longest (sub-)periodic subsequence S is maximal, meaning that no occurrence of S in T can be prolonged by characters of T to form a longer subsequence without breaking the periodicity.

Table 1. Space and time complexities for finding subperiodic and/or periodic subsequences of specific exponents. ε is a rational number with 0 < ε < 1. The exponent column means that a subsequence is considered only if it has at least one exponent within the domain given in the column.

Longest bordered subsequence
We start with the search for a longest subsequence with a rational exponent δ > 1. Thus, this subsequence is periodic or subperiodic, i.e., it has a border. The idea is that we iterate over all possible factorizations T = Y · Z, and for each such factorization, we compute a longest subsequence U common to both Y and Z, and then find the longest extension U′ of U such that U U′ is still a subsequence of Y, which gives us U U′ U as a possible candidate for the answer to our query. We then return the length of the longest such candidate with respect to all partitions.
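As a cross-check of this scheme, note that it is equivalent to maximizing 2 · LCS(T[1..i], T[j..n]) + (j − i − 1) over all pairs i < j whose prefix–suffix LCS is non-empty: the border is taken from the prefix and the suffix, and the whole middle part is kept. A brute-force Python sketch, with helper names of our choosing:

```python
def lcs_len(a, b):
    # Textbook LCS length in O(|a||b|) time, keeping two DP rows.
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def longest_bordered_len(t):
    # Longest bordered subsequence: a border taken as a common
    # subsequence of a prefix t[:i] and a suffix t[j-1:], plus the
    # entire middle part t[i:j-1] kept wholesale.
    n = len(t)
    best = 0
    for i in range(1, n):
        for j in range(i + 1, n + 1):
            l = lcs_len(t[:i], t[j - 1:])
            if l > 0:  # the border must be non-empty
                best = max(best, 2 * l + (j - 1 - i))
    return best
```

On T = abcaaaca from Fig. 1 the maximum is 8, attained by the whole string (T itself starts and ends with a, so it is bordered); restricted to the factorization Y = abcaa, Z = aca of Fig. 1, the best candidate has length G1[4] = 7.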
Formally, we fix a factorization T = Y · Z, and define an array G1 with G1[y] := 2 · LCS_{Y,Z}(y, |Z|) + (|Y| − y) for y ∈ [1..|Y|]; see Fig. 1 for an example. The following lemma states that G1 holds the answer to our problem if we have chosen the correct partition.

Lemma 3.1. The maximum value of G1 is the length of the longest subsequence U U′ U of Y Z with U U′ and U being subsequences of Y and Z, respectively.
Proof. Let y_s be the index at which the maximum value of G1 is stored. Let S be the longest common subsequence of Y[1..y_s] and Z; then S · Y[y_s+1..] · S is a subsequence of the required form of length G1[y_s]. Conversely, for any subsequence U U′ U as in the claim, let y_u be the smallest index such that U is a common subsequence of Y[1..y_u] and Z; then |U U′ U| ≤ 2 · LCS_{Y,Z}(y_u, |Z|) + (|Y| − y_u) = G1[y_u] ≤ G1[y_s]. □

It is left to iterate over all partitions, and take the maximum. This leads us to the following theorem.
Since the y′-th entry stores the maximum length, the sought subsequence can be retrieved as described in the proof of Theorem 3.2. We subsequently show that we can improve the running time by some machinery due to Tiskin [31, Section 4.3], who showed how to compute a longest repeated subsequence, i.e., a subsequence that is a square, via prefix–suffix LCS queries. The underlying counting queries (counting points in a quarter-plane, cf. [30, Thm. 4.10]) can be answered by the data structure of Chan and Patrascu [8], which uses O(n) space, can be built in O(n lg n) time, and answers such a query in O(lg n / lg lg n) time. In total, we obtain an algorithm computing the longest bordered subsequence in O(n^2 lg n / lg lg n) time while using O(n) working space. For our setting, it is possible to shave the O(lg n / lg lg n) time factor by resorting to simple scans of the matrix instead of building and using the data structure of Chan and Patrascu [8]. The idea is to process all possible corners row by row in a left-to-right manner, computing the entries of the matrix H_{S,T} needed in [30, Thm. 4.10] and [9, Lemma 4] in O(n^2) total time.

To give evidence that a significant improvement over this algorithm is unlikely, we adapt the hardness result of Inoue et al. [20, Lemma 10] for longest square subsequences to longest bordered subsequences (Theorem 3.4). In what follows, we show that L always needs to contain all occurrences of # and $ appearing in T, i.e., L is of the form #^m Y′ $^m #^m Z′ $^m, with Y′ and Z′ being subsequences of Y and Z, respectively.
• If U contains a #, then #^m must be a prefix of U and of U′. If this were not the case, then we would have selected characters only from the second character run of #'s in T, i.e., for U we could select characters only from the (2n + 3m)-length substring Y $^m #^m Z $^m of T. However, 2n + 3m < 4m, a contradiction to |L| ≥ 4m. Hence, U and U′ contain characters from the first and the second character run of # in T, respectively, and we maximize the length of L by taking all m #'s of each such run.
• By symmetry, if U contains a $, then $^m must be a suffix of U and of U′.
• Otherwise (U contains no # and U′ contains no $), we have selected a subsequence of the substring Y $^m #^m Z of T, which has a length of 2n + 2m < 4m, again a contradiction.

We conclude that U and U′ each have #^m as a prefix and $^m as a suffix, i.e., L = U′ U with U′ = #^m Y′ $^m and U = #^m Z′ $^m. Since # and $ are unique characters, the longest border of L is U = #^m Z′ $^m, and therefore we know that Z′ is a longest common subsequence of Y and Z. In particular, Y′ cannot be longer than Z′, since U and U′ both start and end with the same character runs, which appear nowhere else. We conclude that U = U′, in particular Y′ = Z′. Finally, the LCS of Y and Z is Z′ with 2|Z′| = |L| − 4m. □
Abboud et al. [1] showed that the strong exponential time hypothesis (SETH) is false if there exists a constant ε > 0 and an algorithm computing the LCS of k ≥ 2 strings in O(n^{k−ε}) time for an alphabet of size O(k). Hence, solving the longest bordered subsequence in O(n^{2−ε}) time for any ε > 0 would imply that SETH is false.
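The reduction of Theorem 3.4 can be exercised end to end with any longest-bordered-subsequence solver; below we use a brute-force one (all helper names are ours). With m = 2n + 1, the longest bordered subsequence of #^m Y $^m #^m Z $^m has length 4m + 2 · |LCS(Y, Z)|:

```python
def lcs_len(a, b):
    # Textbook LCS length, two DP rows.
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def longest_bordered_len(t):
    # Brute force: non-empty border from prefix/suffix, middle kept whole.
    n = len(t)
    best = 0
    for i in range(1, n):
        for j in range(i + 1, n + 1):
            l = lcs_len(t[:i], t[j - 1:])
            if l > 0:
                best = max(best, 2 * l + (j - 1 - i))
    return best

def lcs_via_bordered(y, z):
    # Theorem 3.4 reduction; the theorem assumes |Y| = |Z| = n and
    # that '#' and '$' appear in neither input string.
    m = 2 * len(y) + 1
    t = "#" * m + y + "$" * m + "#" * m + z + "$" * m
    return (longest_bordered_len(t) - 4 * m) // 2
```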
In the following, we want to omit subsequences having exponents only in (1, 2), i.e., we are interested in the longest subsequence having an exponent of at least two.

Longest periodic subsequence
Based on the approach of the previous section, our main idea is to generalize the factorization of T from 2 to k factors. By computing the LCS of these k factors, we can obtain all longest periodic subsequences having an exponent of the form kα with α ∈ N. For k = 2, we can find all square subsequences, i.e., the longest subsequence with an exponent of 2α for α ∈ N, similar to Kosowski [25]. We generalize to rational exponent values in Q+ by stopping the matching of characters in the last factor when computing a solution, like in Sect. 3. Instead of considering all possible factorization sizes, the following lemma shows that it is sufficient to consider only the values k = 3 and k = 4.

Lemma 4.1. Given a subsequence of T with a rational exponent larger than two, it has the form V^2 V′ or V^3 V′ with V′ being a (not necessarily proper) prefix of V.
Proof. Let S = U^δ be a subsequence of T with a rational exponent δ larger than two. First, let us consider δ ≥ 4. Let V := U^{⌈δ/3⌉} and V′ := U^β with β := δ − 2⌈δ/3⌉. Notice that 0 ≤ β ≤ ⌈δ/3⌉ for δ ≥ 4, so we can write S in the form V^2 V′. For δ ∈ (2, 3], set V := U and V′ := U^β with β := δ − 2 ≤ 1. We can again write S in the form V^2 V′. Finally, we consider δ ∈ (3, 4), i.e., S has the form S = (U U′)^3 U for a non-empty U′. Then it is in general not possible to write S in the form V^2 V′, since both occurrences of V as well as V′ would have to start with a prefix of U U′ that then has four occurrences. However, it is possible to write S as V^3 V′ by setting V := U U′ and V′ := U. □
A conclusion is that we can find all subsequences with an exponent in (2, 3] ∪ [4, ∞) by a factorization of size three, while we need a factorization of size four for finding those with exponents within (3, 4).

Three factors
We start with an algorithm factorizing T into three factors, and first consider a special case.

Lemma 4.2. We can find a longest periodic subsequence of T with an exponent divisible by three in O(n^5) time using O(n^2) space.
Proof. Let T = X · Y · Z be a factorization of T into three factors. We can compute the longest periodic subsequence X′ Y′ Z′ of T with an exponent divisible by three, such that X′, Y′, and Z′ are subsequences of X, Y, and Z, respectively, by a call of LCS_{X,Y,Z}(|X|, |Y|, |Z|), taking O(n^3) time and O(n^2) space. However, there are O(n^2) such partitions of T into three factors, and therefore we perform O(n^2) LCS computations, giving O(n^5) total time to determine the factorization that yields the maximal length. We then apply Hirschberg's algorithm on that factorization to obtain a subsequence of T with the desired properties. □
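A minimal sketch of this proof's computation, assuming a plain three-dimensional LCS DP (keeping two slices of the first dimension, which bounds the extra space by one slice) wrapped in the O(n^2) loop over factorizations; the function names are illustrative:

```python
def lcs3_len(x, y, z):
    # 3-dimensional LCS DP in O(|x||y||z|) time; only two slices of
    # the first dimension are kept in memory at any time.
    ny, nz = len(y), len(z)
    prev = [[0] * (nz + 1) for _ in range(ny + 1)]
    for cx in x:
        cur = [[0] * (nz + 1) for _ in range(ny + 1)]
        for j, cy in enumerate(y, 1):
            for k, cz in enumerate(z, 1):
                if cx == cy == cz:
                    cur[j][k] = prev[j - 1][k - 1] + 1
                else:
                    cur[j][k] = max(prev[j][k], cur[j - 1][k], cur[j][k - 1])
        prev = cur
    return prev[ny][nz]

def longest_exp3_subseq_len(t):
    # Try all factorizations T = X.Y.Z; a common subsequence V of the
    # three factors yields the cube V V V of length 3 * |V|.
    n = len(t)
    best = 0
    for i in range(n + 1):
        for j in range(i, n + 1):
            best = max(best, 3 * lcs3_len(t[:i], t[i:j], t[j:]))
    return best
```

Hirschberg-style retrieval of an actual subsequence is omitted here; the sketch only returns the maximal length 3 · LCS(X, Y, Z) over all factorizations.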
For the general case, we can make an observation similar to Sect. 3 for G1: it suffices to determine the pair of positions (x, y) up to which we consider the longest common subsequence U of X[1..x], Y[1..y], and Z, and then prolong the two occurrences of U contained in X[1..x] and Y[1..y] with the longest common subsequence U′ of X[x+1..] and Y[y+1..] to form (U U′)^2 U. More formally, let G2 be a 2-dimensional matrix with G2[x, y] := 3 · LCS_{X,Y,Z}(x, y, |Z|) + 2 · |LCS(X[x+1..], Y[y+1..])|; see Fig. 2 for an example. We again iterate over all possible factorizations and obtain the following theorem.

Theorem 4.4. We can find a longest periodic subsequence with an exponent in (2, 3) ∪ [4, ∞) in O(n^5) time using O(n^2) space.
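This reading of G2 can be validated against Fig. 2: with T = aabbabbaab, X = aabb, Y = abb, Z = aab, the maximum of 3 · LCS(X[1..x], Y[1..y], Z) + 2 · |LCS(X[x+1..], Y[y+1..])| is 8, matching G2[3, 2] = 8. A brute-force sketch for one fixed factorization (it recomputes every entry from scratch instead of filling G2 alongside the DP as described in the proof of Theorem 4.4; the helper names are ours):

```python
def lcs_len(a, b):
    # Textbook two-string LCS length, two DP rows.
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def lcs3_len(x, y, z):
    # Three-string LCS length, two slices of the first dimension.
    ny, nz = len(y), len(z)
    prev = [[0] * (nz + 1) for _ in range(ny + 1)]
    for cx in x:
        cur = [[0] * (nz + 1) for _ in range(ny + 1)]
        for j, cy in enumerate(y, 1):
            for k, cz in enumerate(z, 1):
                if cx == cy == cz:
                    cur[j][k] = prev[j - 1][k - 1] + 1
                else:
                    cur[j][k] = max(prev[j][k], cur[j - 1][k], cur[j][k - 1])
        prev = cur
    return prev[ny][nz]

def g2_max(t, i, j):
    # Fixed factorization X = t[:i], Y = t[i:j], Z = t[j:].
    # G2[x, y] counts the three copies of U plus the two copies of
    # the extension U' in (U U')^2 U.
    x, y, z = t[:i], t[i:j], t[j:]
    best = 0
    for xi in range(len(x) + 1):
        for yj in range(len(y) + 1):
            best = max(best,
                       3 * lcs3_len(x[:xi], y[:yj], z)
                       + 2 * lcs_len(x[xi:], y[yj:]))
    return best
```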

Four factors
Finally, we consider a factorization of size four to capture exponents in (3, 4). We have O(n^3) possibilities to factorize T into four factors W, X, Y, Z. Let us fix a factorization T = W · X · Y · Z.
We can compute, similarly to G2, a 3-dimensional matrix G3. Compared to Sect. 4.1, we have just added another dimension for the additional input string W, which is treated the same way as X and Y. Therefore, the consequences are straightforward.

Proof of Theorem 4.6. We have O(n^3) different factorizations, and for each factorization we compute G3 in O(n^4) time within O(n^3) additional space. □

Open problems
We are unaware of polynomial-time algorithms computing several other types of regularities when considering subsequences. For instance, we are not aware of an algorithm computing the longest sub-periodic subsequence, or of an efficient algorithm computing the longest (common) subsequence without a border. Other problems are finding the longest (common) subsequence that is primitive (no exponent in N \ {1}), or the longest (common) subsequence that is non-primitive (an exponent in N \ {1}). Since our time bounds are rather large compared to those for the longest square subsequence, we expect that it should be possible to reduce the time complexities by linear factors. However, in the case that our bounds are tight, as mentioned by Tiskin [32], it could be interesting to study accelerating techniques such as those introduced in [13], dividing the time by a poly-logarithmic term.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Theorem 3.2. We can find a longest bordered subsequence in O(n^3) time using O(n) space.

Proof. Fix one of the n different factorizations T = Y · Z. While we compute the lengths in G1, we simultaneously compute the necessary LCS values, keeping only the y-th row of the DP matrix in memory such that we can compute LCS_{Y,Z}(y + 1, |Z|) from this row in O(|Z|) time and space. Neglecting the LCS computation, we can compute each entry of G1 in constant time. Given the maximum value G1[y′] in G1, we can compute an instance of a bordered subsequence that has the length G1[y′] as follows.

Fig. 1. G1 defined in Sect. 3 for the text T = abcaaaca = Y · Z with the factors Y = abcaa and Z = aca. For visualization, we show the subsequences found with G1, although G1 actually stores only their lengths. Each • separates the subsequences extracted from Y and Z. According to Lemma 3.1, the length of the longest bordered subsequence U U′ U of T with U U′ and U being subsequences of Y and Z, respectively, is the maximum value in G1, which is G1[4] = 7 in this example.
A prefix–suffix LCS query (i, j) asks to compute the length of an LCS of T[1..i] and T[j..]. Tiskin [31, Theorem 4] proposed a data structure answering a prefix–suffix LCS query in O(lg n / lg lg n) time; it can be built in O(n^2 lg lg n / lg n) time. For finding a longest repeated subsequence, it is sufficient to consider only the n − 1 queries of the form j = i + 1 ∈ [2..n]. Similarly, we can find a longest bordered subsequence by computing the two indices i, j ∈ [1..n] with i < j that maximize 2 · LCS(T[1..i], T[j..n]) + (j − i − 1) under the constraint that LCS(T[1..i], T[j..n]) > 0. These two indices can be found by considering all O(n^2) prefix–suffix LCS queries. We can then backtrack to find a longest common subsequence S of T[1..i] and T[j..n] (for instance, by computing LCS(T[1..i], T[j..n]) in the textbook way using O(n^2) time), and report S · T[i+1..j−1] · S as the solution. In more detail, Tiskin [30, Algorithm 5.1] computes a so-called semi-local seaweed matrix in O(n^2) time and O(n) space. The semi-local seaweed matrix is a binary (permutation) matrix of size 2n × 2n, which stores at most one '1' per row, and thus can be represented space-economically in O(n) space [9, Definition 3]. By interpreting the O(n) '1's of this matrix as points in the plane, Tiskin [30, Thm. 4.10 and the equations after Ex. 4.9] reduces each prefix–suffix LCS query to counting the number of points in a quarter-plane. Here, we can make use of the observation of Charalampopoulos et al. [9, Lemma 4] that these counting queries can be answered by the data structure of Chan and Patrascu [8].

Theorem 3.4. The computation of LCS(Y, Z) for two strings Y and Z, each of length n, can be reduced to the computation of a longest bordered subsequence of a string of length O(n) in linear time.

Proof. Let m := 2n + 1 and define the string T := #^m Y $^m #^m Z $^m with two special characters # and $ that appear neither in Y nor in Z. In what follows, we claim that we can deduce an LCS of Y and Z from a longest bordered subsequence L of T. Since |Y Z| = 2n < |#^m $^m #^m $^m| = 4m, L has a length of at least 4m, obtained by consuming all occurrences of # and $ appearing in T. Since L is a bordered subsequence, we can write it as L = U′ U with the non-empty border U being a prefix of U′.

Lemma 4.3. The maximum value stored in G2 is the length of the longest subsequence (U U′)^2 U of T such that (a) U U′ is a common subsequence of X and Y, and (b) U is a common subsequence of X, Y, and Z.

Proof. We basically follow the proof of Lemma 3.1. Let x_s and y_s be the indices at which the maximum value G2[x_s, y_s] is stored. Let S be the longest common subsequence of X[1..x_s], Y[1..y_s], and Z, and let S′ := LCS(X[x_s+1..], Y[y_s+1..]). By definition, |(S S′)^2 S| = G2[x_s, y_s]. Assume that |(S S′)^2 S| < |(U U′)^2 U|. Let x_u := pos_X(U) and y_u := pos_Y(U) be the smallest indices such that U is a common subsequence of X[1..x_u], Y[1..y_u], and Z. On the one hand, we have |U| ≤ LCS_{X,Y,Z}(x_u, y_u, |Z|), and equality follows from the fact that we otherwise would find a longest common subsequence V of X[1..x_u], Y[1..y_u], and Z with |V| = LCS_{X,Y,Z}(x_u, y_u, |Z|) such that (V U′)^2 V is longer than (U U′)^2 U. On the other hand, we have |U′| ≤ |LCS(X[x_u+1..], Y[y_u+1..])|, and equality follows from a similar argument. Consequently, |(U U′)^2 U| = G2[x_u, y_u] ≤ G2[x_s, y_s], a contradiction. □

Fig. 2. G2 defined in Sect. 4.1 for the text T = aabbabbaab = X · Y · Z with the factors X = aabb, Y = abb, and Z = aab. Given a longest subsequence (U U′)^2 U with U U′ being a common subsequence of X and Y, and U being a common subsequence of X, Y, and Z, then, according to Lemma 4.3, the length of this subsequence is the maximum value stored in G2, which is G2[3, 2] = 8 in this example.

Lemma 4.5. The maximum value stored in G3 is the length of the longest subsequence (U U′)^3 U of T such that (a) U U′ is a common subsequence of W, X, and Y, and (b) U is a common subsequence of W, X, Y, and Z.

Proof. Analogous to Lemma 4.3. □

By adding an additional dimension to the proof of Theorem 4.4 to compute G3 instead of G2, we obtain our final theorem.

Theorem 4.6. We can compute the longest periodic subsequence with an exponent in ⋃_{α∈N} (3α, 4α) in O(n^7) time using O(n^3) space.

Finally, we would like to study better lower bounds on the time complexities of our proposed problems. For m = Θ(n) large enough, we can generalize Theorem 3.4 to make use of Theorem 4.4, computing the longest periodic subsequence with an exponent in (2, 3) ∪ [4, ∞) of the string #^m X $^m #^m Y $^m #^m Z $^m to find LCS(X, Y, Z), and of Theorem 4.6, computing the longest periodic subsequence with an exponent in ⋃_{α∈N} (3α, 4α) of the string #^m W $^m #^m X $^m #^m Y $^m #^m Z $^m to find LCS(W, X, Y, Z), where W, X, Y, Z are four strings of equal length n. Hence, there is a multiplicative gap of O(n^2) and O(n^3) between the lower and upper bounds on the time complexities of Theorem 4.4 and Theorem 4.6, respectively.
For instance, the unary string T = a···a has the minimum period 1 with exponent |T|, or more generally, period p ∈ [1..|T|] with exponent |T|/p. A string S has a border P if P is both a non-empty proper prefix and a proper suffix of S. If S has no border, then it has only |S| as a period (with exponent 1). Further, for two strings Y and Z, let LCS_{Y,Z}(y, z) denote the longest common subsequence (LCS) of Y[1..y] and Z[1..z]. We assume the reader to be familiar with the computation of the LCS via dynamic programming (DP), involving a DP matrix with |Y||Z| entries (cf. [12, Chapter IV, Section 15.4] for a textbook reference). For k strings, we have a table of size O(n^k) whose entries we can fill in O(kn^k) time via dynamic programming; instead of a full table, we can determine the length of the LCS in O(n^{k−1}) space.

An instance of U can be retrieved by Hirschberg's algorithm applied to the computation of LCS(Y[1..y′], Z). In total, we need O(|Y||Z|) time for the LCS computation and Hirschberg's algorithm, but only O(n) space. So we need O(n^2) time per factorization T = Y · Z, and therefore O(n^3) time for the entire computation. □
These entries can be computed, following [30, Thm. 4.10] and [9, Lemma 4], in O(n^2) total time. H_{S,T} is a matrix whose entries correspond to precomputed LCS values of prefixes/suffixes of S and T; however, H_{S,T} is not computed explicitly. The space can be bounded by O(n) by maintaining only the counter indices for processing the matrix and the length of the longest bordered subsequence computed so far. Overall, we obtain the following result.
Let us fix a factorization T = X · Y · Z. We allocate space for G2 having O(n^2) entries, and fill G2 as follows: While computing the length of the LCS of X, Y, and Z with O(|Y||Z|) space, keeping only the DP matrix values for X[1..x−1] and X[1..x] in memory, we can compute G2[x, y] in constant time. Hence, we can fill G2 in O(n^3) time (the time for the LCS computation of X, Y, and Z) while needing O(|Y||Z|) = O(n^2) additional working space. If G2[x′, y′] stores the maximum value of G2, we can proceed similarly to the proof of Theorem 3.2.
Namely, we can make use of Hirschberg's algorithm to retrieve a common subsequence U of X, Y, and Z of length LCS_{X,Y,Z}(x′, y′, |Z|). After computing the common subsequence S′ of X[x′+1..] and Y[y′+1..] in O(n^2) time and space, we obtain the subsequence (U S′)^2 U. In total, we can compute this subsequence in O(n^3) time and O(n^2) space per factorization. Finally, we proceed as in the proof of Lemma 4.2, taking the maximum of the computed lengths over all factorizations and finding an instance of a subsequence with the desired properties. □