On the Word Fragment Length for Unambiguous Reconstruction of a Periodic Word from a Complete Multiset of Fragments of Fixed Length

We consider the problem of reconstructing a word from a multiset of its fragments of fixed length. Words consist of symbols from a finite alphabet. The word to be reconstructed is assumed to be periodic or to contain a periodic word as a subword. It is shown that a periodic word with period p can be reconstructed from a multiset of its fragments of length k, where $k \geqslant \left\lfloor \frac{16}{7}\sqrt{p} \right\rfloor + 5$. For a word consisting of a q-periodic prefix repeated m times and a p-periodic suffix repeated l times, if $l \geqslant m q^{\left\lfloor \frac{16}{7}\sqrt{P} \right\rfloor + 5}$, then the estimate becomes $k \geqslant \left\lfloor \frac{16}{7}\sqrt{P} \right\rfloor + 5$, where $P = \max(p, q)$.

In the general case, the problem of reconstructing an object of some nature from incomplete information on its "parts" can be treated as a pattern recognition problem [1]. Specifically, when objects are words in the form of sequences of symbols over an alphabet, the question arises as to whether a word can be reconstructed from a set or multiset of its subwords. This problem belongs to an area of discrete mathematics known as combinatorics on words [2–6], which studies the relationship between sequences of symbols and sets of their subsequences.
The problem of reconstructing words from subwords has a number of applications. For example, in character encoding, arbitrary information is represented in the form of sequences of symbols and words [7]. This method of information representation is used in data transmission through communication channels [8], when it is important to ensure reliable transmission of encoded information without loss or to recover the original word from available fragments in the case of loss [9]. Character encoding is also used in time series analysis [10] and in biocomputer science [11].
Estimates for the fragment length sufficient for an arbitrary word to be reconstructed from the multiset of all its subwords were obtained in [12,13].
In this paper, we impose on the words to be reconstructed the constraint that they contain a periodic subword. This makes it possible to reduce the fragment length sufficient for word reconstruction. For example, a result concerning the reconstruction of a word with a periodic suffix and an aperiodic prefix was presented in [17]. Below, this result of [17] is strengthened in Theorem 3, which deals with the reconstruction of a word with a periodic suffix and a periodic prefix. Theorems 1 and 2 concern the reconstruction of periodic words and repeat their counterparts from [17]. Additionally, we consider further cases of words that are not periodic but contain one or more periodic subwords. Specifically, we state and prove theorems on the reconstruction of a word with an aperiodic subword contained between a periodic prefix and a periodic suffix (Theorem 4) and of a periodic word with a constraint on a repeated subword (Theorem 5). In turn, each of these theorems is included in Theorem 3 as a special case.

FORMULATION OF THE PROBLEM
We introduce the following notation and definitions. In what follows, Greek letters denote words, and lowercase Latin letters denote alphabetic symbols. Let denote the set of words of length $n$ over the alphabet {0, 1}. For a word , the symbol denotes the sum of its elements: $a_1 + \dots + a_n$. Let denote the empty word, i.e., the word of length zero. The word raised to power denotes the word consisting of repeats of the word , i.e., , where .
Given a word and a reference vector , where , the fragmentation operation constructs a word of length according to the following rule:
A fragment, or subword, of a word α = is a word of the form
Let $f^*(\alpha)$ be the smallest fragment length k for which the word α of length $n$ is uniquely reconstructed from a multiset of its subwords of length k.
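To make the notion of a complete multiset of fragments concrete, the following minimal Python sketch enumerates all fragments of a fixed length. It assumes that a fragment is obtained by keeping the symbols of the word at an increasing set of positions (our reading of the fragmentation operation); the names `fragment` and `fragment_multiset` are illustrative and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def fragment(word: str, positions: tuple) -> str:
    """Return the fragment of `word` at the given increasing 0-based positions."""
    return "".join(word[i] for i in positions)


def fragment_multiset(word: str, k: int) -> Counter:
    """Complete multiset of length-k fragments: one fragment per choice of k positions."""
    return Counter(
        fragment(word, positions)
        for positions in combinations(range(len(word)), k)
    )


if __name__ == "__main__":
    # Each key is a fragment, each value its multiplicity in the multiset.
    print(sorted(fragment_multiset("0110", 2).items()))
```

For the word 0110 and k = 2, the multiset contains six fragments, with 01 and 10 each occurring twice and 00 and 11 each occurring once.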
Problem. In the general form, the problem of reconstructing a word from a multiset of its subwords is formulated as follows. Given
• a set of reference vectors V = , , and
• a set of words X = $\{\chi_1, \chi_2, \dots, \chi_N\}$, ,
the task is to check whether X is a set of fixed-length fragments of some unknown word constructed with the help of fragmentation operations with vectors from V and to find all possible solutions.
Note that all estimates obtained in the case of the binary alphabet remain valid for an arbitrary alphabet [3], since, in the case of the alphabet , the problem is reduced to a set of problems in the binary alphabet with the help of the mappings
Here, the subword is called the generating subword for a word α of form (1).

STATE OF THE ART IN THE PROBLEM
It is known that the problem of word reconstruction from subwords can be reduced to verifying the uniqueness of solutions to Diophantine equations of a certain type [2]. It has been shown that the problem of existence and uniqueness of a solution is NP-complete.
In the case of a complete multiset of fragments ( ), it was established [14] that a word α of length can be uniquely reconstructed if the fragment length satisfies . For the same case of a complete multiset of fragments, the following estimates for the sufficient fragment length are known [3,13,12]: , , .
Several special cases were studied. For example, it was shown in [15] that, for words consisting of series, it is sufficient to use fragments of length .
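For very short binary words, the smallest fragment length that distinguishes all words of a given length can be found by exhaustive search. The sketch below is only an illustration of the uniqueness question and is not a method from the cited works; it assumes that fragments are subsequences, uses hypothetical function names, and has exponential cost, so it is practical only for small word lengths.

```python
from collections import Counter
from itertools import combinations, product


def deck(word: str, k: int) -> Counter:
    """Multiset of all length-k fragments (subsequences) of `word`."""
    return Counter("".join(symbols) for symbols in combinations(word, k))


def all_words_distinguished(n: int, k: int) -> bool:
    """True if no two binary words of length n share the same length-k multiset."""
    seen = set()
    for symbols in product("01", repeat=n):
        key = frozenset(deck("".join(symbols), k).items())
        if key in seen:
            return False
        seen.add(key)
    return True


def smallest_sufficient_length(n: int) -> int:
    """Smallest k for which every binary word of length n >= 1 is uniquely determined."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for k in range(1, n + 1):
        if all_words_distinguished(n, k):
            return k
    return n  # not reached: for k = n the only fragment is the word itself
```

For example, the words 0110 and 1001 have the same multiset of fragments of length 2 but different multisets of fragments of length 3, so fragments of length 2 are not sufficient for words of length 4.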

RESULTS
The first theorem is as follows.

Theorem 1. A periodic word of length and period can be uniquely reconstructed from the multiset of all its fragments of length k if
Before proving the theorem, we introduce the following quantity.
Definition 1. For a word , let denote the number of fragments equal to β in .
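The quantity in Definition 1 can be computed without enumerating all fragments. The following sketch uses a standard dynamic-programming count of the occurrences of β as a subsequence of a word; it assumes that fragments are subsequences, and the function name `count_fragments` is hypothetical.

```python
def count_fragments(alpha: str, beta: str) -> int:
    """Number of ways to choose positions in `alpha` whose symbols spell `beta`."""
    # ways[j] = number of ways to obtain the prefix beta[:j] from the symbols read so far
    ways = [1] + [0] * len(beta)
    for symbol in alpha:
        # Go through j in decreasing order so that each symbol of alpha is used at most once.
        for j in range(len(beta), 0, -1):
            if beta[j - 1] == symbol:
                ways[j] += ways[j - 1]
    return ways[len(beta)]
```

The running time is proportional to the product of the two word lengths, whereas direct enumeration of fragments grows combinatorially with the word length.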
It was shown in [2] that, given for all binary words β of length k, the numbers of fragments of all lengths shorter than k can be uniquely reconstructed. Next, we formulate and prove a lemma that directly follows from the system of equations in [2].
Lemma 1. For any word , the set of its moments of the form is uniquely determined by the set of its fragments of the form .
Proof. Let us derive formulas for for various j: The functions are computed using combinatorial relations. For example, we find an expression for , , in terms of obtained at the preceding steps (here, it is possible to set ): Since, by assumption, all are known, is known as well.
Now we can prove Theorem 1.
Proof. The proof is based on the results of [13] concerning the lengths of fragments sufficient for unique word reconstruction in the case of arbitrary words.
The estimate of f*(α) in [13] was obtained by analyzing the system of equations (2) for which the solution is unique if .
Suppose that the word consists of periods: . Then the zero equation can be rewritten as . For an arbitrary index ranging from 1 to k -1 inclusive, we have where is a linear function. The rest of the proof is the same as in [13].
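The fact from [2] used above, that the fragment counts for length k determine the counts for all shorter lengths, can be checked numerically. The sketch below does not reproduce the system of equations from [2]; it relies on a simple double-counting argument (each choice of k − 1 positions is contained in exactly n − k + 1 choices of k positions), assumes that fragments are subsequences, and uses the hypothetical names `deck` and `shorter_deck`.

```python
from collections import Counter
from itertools import combinations


def deck(word: str, k: int) -> Counter:
    """Multiset of all length-k fragments (subsequences) of `word`."""
    return Counter("".join(symbols) for symbols in combinations(word, k))


def shorter_deck(deck_k: Counter, n: int, k: int) -> Counter:
    """Derive the length-(k-1) multiset of a word of length n from its length-k multiset."""
    totals = Counter()
    for beta, multiplicity in deck_k.items():
        # Deleting each of the k symbols of beta yields a fragment of length k - 1.
        for i in range(k):
            totals[beta[:i] + beta[i + 1:]] += multiplicity
    # Every fragment of length k - 1 has been counted exactly n - k + 1 times.
    return Counter({gamma: total // (n - k + 1) for gamma, total in totals.items()})


if __name__ == "__main__":
    word = "011" * 3  # a periodic word with generating subword 011
    print(shorter_deck(deck(word, 4), len(word), 4) == deck(word, 3))
```

Applying `shorter_deck` repeatedly recovers the counts for every length below k, so only the multiset for the largest fragment length has to be stored.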
Let us formulate a stronger result.

Theorem 2. A periodic word of length n with period p is uniquely reconstructed from the multiset of all its fragments of length k if $k \geqslant \left\lfloor \frac{16}{7}\sqrt{p} \right\rfloor + 5$.
Proof. It is based on [12], where the possibility of unique reconstruction of a word from its subwords is proved not by analyzing system (2) used in [13], but rather by analyzing a similar system of the form . Thus, the proof is reduced to a search for a condition under which the system of Diophantine equations (3) has only the trivial solution. Next, the authors of [12] refer to [16], where the following result is obtained: For the case of periodic words, in that work, it is shown in the proof of Theorem 1 how system (3) can be rewritten in terms of only , where p is the word period. Thus, the original word can be uniquely reconstructed from subwords if their length satisfies $k \geqslant \left\lfloor \frac{16}{7}\sqrt{p} \right\rfloor + 5$.
Word reconstruction is also possible when the word itself is aperiodic, but consists of several subwords, at least one of which is periodic. Below, we consider several cases of such words. For each case, we state a separate theorem, which is proved relying on what was proved previously for a periodic word (Theorem 2).

Theorem 3. Suppose that a word consists of two subwords: a prefix and a suffix, which are both periodic:
The first sum in the expression for (5) can be represented as follows:
Similarly, for the second sum in (5), we obtain Finally,

If , then the equations for the prefix and the suffix split, as in the case with .
Now, we consider the case of an arbitrary such that : The first sum is expanded as In the last transition, the term with raised to power (for ) was separated from the others. The terms with raised to lower powers were represented in the form of a linear function .
Similarly, for the second sum, we have Here, in the last transition, the term with raised to power (for and s = 0) was again separated from the others. The terms with raised to lower powers were represented in the form of a linear function .
Thus, combining the results obtained for both sums, we can return to the computation of : where denotes a linear function of that is the sum of and .
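Since , it follows that, for m · < l, all the equations split, and we obtain two systems. Therefore, according to what was proved earlier for the case of a periodic word (Theorem 2), for $l \geqslant m q^{\left\lfloor \frac{16}{7}\sqrt{P} \right\rfloor + 5}$, where $P = \max(p, q)$, the subword length has to satisfy $k \geqslant \left\lfloor \frac{16}{7}\sqrt{P} \right\rfloor + 5$.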
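The numerical conditions stated in the abstract are easy to evaluate for given periods. The following sketch computes the bound ⌊(16/7)√p⌋ + 5 on the fragment length and the threshold on the number l of suffix repeats for a word with a q-periodic prefix repeated m times and a p-periodic suffix; the function names are hypothetical, and integer arithmetic is used to avoid floating-point rounding in the floor.

```python
from math import isqrt


def min_fragment_length(period: int) -> int:
    """Evaluate floor(16/7 * sqrt(period)) + 5 exactly."""
    # floor(16 * sqrt(p) / 7) = floor(sqrt(256 * p / 49)) = isqrt(floor(256 * p / 49))
    return isqrt(256 * period // 49) + 5


def suffix_repeat_threshold(m: int, q: int, p: int) -> int:
    """Evaluate m * q**K, where K = floor(16/7 * sqrt(max(p, q))) + 5."""
    return m * q ** min_fragment_length(max(p, q))


if __name__ == "__main__":
    print(min_fragment_length(100))          # bound on k for period p = 100
    print(suffix_repeat_threshold(1, 2, 3))  # threshold on l for m = 1, q = 2, p = 3
```

For a period p = 100, the bound gives k ≥ 27; for m = 1, q = 2, and p = 3, the condition on the number of suffix repeats reads l ≥ 256, which shows how quickly the threshold on l grows with the prefix period.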

Corollary 1. For m = 1, has the form of a word with an aperiodic prefix and a periodic suffix:
If the suffix is longer than the prefix, namely, , then the word can be uniquely recon-