Abstract
In the Minimum Common String Partition problem (MCSP), we are given two strings on input, and we wish to partition them into the same collection of substrings, minimizing the number of the substrings in the partition. This problem is NP-hard, even for a special case, denoted 2-MCSP, where each letter occurs at most twice in each input string. We study a greedy algorithm for MCSP that at each step extracts a longest common substring from the given strings. We show that the approximation ratio of this algorithm is between Ω(n^{0.43}) and O(n^{0.69}). In the case of 2-MCSP, we show that the approximation ratio is equal to 3. For 4-MCSP, we give a lower bound of Ω(log n).
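The greedy procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `greedy_mcsp`, the boolean marking arrays, the cubic-time dynamic program, and the rule of keeping the first longest match found are all assumptions made for illustration.

```python
from collections import Counter


def greedy_mcsp(a: str, b: str) -> list[str]:
    """Return the blocks of a common partition of a and b, chosen greedily.

    Hypothetical helper: at each step it extracts a longest common
    substring of the still-unmarked parts of a and b.
    """
    # A common partition exists only if both strings have the same letters
    # with the same multiplicities.
    if Counter(a) != Counter(b):
        raise ValueError("strings must have identical letter multiplicities")

    used_a = [False] * len(a)  # positions of a already covered by a block
    used_b = [False] * len(b)  # positions of b already covered by a block
    blocks = []
    remaining = len(a)

    while remaining > 0:
        best_len, best_i, best_j = 0, 0, 0
        # prev[j + 1] = length of the longest common run of unmarked
        # letters ending at a[i - 1] and b[j].
        prev = [0] * (len(b) + 1)
        for i in range(len(a)):
            cur = [0] * (len(b) + 1)
            if not used_a[i]:
                for j in range(len(b)):
                    if not used_b[j] and a[i] == b[j]:
                        cur[j + 1] = prev[j] + 1
                        if cur[j + 1] > best_len:  # ties: keep the first found
                            best_len = cur[j + 1]
                            best_i = i - best_len + 1
                            best_j = j - best_len + 1
            prev = cur

        # Mark the extracted substring in both strings so that later blocks
        # cannot overlap it or extend across it.
        for k in range(best_len):
            used_a[best_i + k] = True
            used_b[best_j + k] = True
        blocks.append(a[best_i:best_i + best_len])
        remaining -= best_len

    return blocks
```

As an illustrative input, take A = cdabcdabceab and B = abceabcdabcd. The sketch first extracts abcdabc, then ab, and then the single letters c, d, and e, for 5 blocks in total. A partition with only 3 blocks exists (A = cdabcd · abce · ab and B = abce · ab · cdabcd), which shows on a small scale why the greedy choice is not optimal.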