Run Generation Revisited: What Goes Up May or May Not Come Down

  • Conference paper
Algorithms and Computation (ISAAC 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9472)

Abstract

We revisit the classic problem of run generation. Run generation is the first phase of external-memory sorting, where the objective is to scan through the data, reorder elements using a small buffer of size M, and output runs (contiguously sorted chunks of elements) that are as long as possible.
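As a point of reference, the classic ascending-only form of replacement selection can be sketched as follows. This is a minimal illustration rather than code from the paper: the function name `replacement_selection`, the parameter `m` (standing in for the buffer size M), and the use of a binary heap are all assumptions made for the example.

```python
import heapq
from typing import Iterable, Iterator, List

_END = object()  # sentinel marking the end of the input stream


def replacement_selection(items: Iterable[int], m: int) -> Iterator[List[int]]:
    """Yield ascending runs using a buffer of at most m elements (illustrative sketch)."""
    if m < 1:
        raise ValueError("buffer size m must be at least 1")
    it = iter(items)

    # Fill the buffer with the first m elements of the input.
    current: List[int] = []
    for x in it:
        current.append(x)
        if len(current) == m:
            break
    heapq.heapify(current)

    pending: List[int] = []  # elements too small for the current run
    run: List[int] = []
    while current:
        smallest = heapq.heappop(current)
        run.append(smallest)
        nxt = next(it, _END)
        if nxt is not _END:
            if nxt >= smallest:
                # Still fits after the last output, so it can join this run.
                heapq.heappush(current, nxt)
            else:
                # Would break sortedness; hold it for the next run.
                pending.append(nxt)
        if not current:
            # No buffered element can extend the run: emit it and start over.
            yield run
            run = []
            current, pending = pending, []
            heapq.heapify(current)


runs = list(replacement_selection([5, 1, 9, 2, 8, 3, 7], 3))
print(runs)  # [[1, 2, 5, 8, 9], [3, 7]]
```

With a buffer of three elements, the seven-element input above is split into two runs, the first of which is longer than the buffer itself.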

We develop algorithms for minimizing the total number of runs (or equivalently, maximizing the average run length) when the runs are allowed to be sorted or reverse sorted. We study the problem in the online setting, both with and without resource augmentation, and in the offline setting.

First, we analyze alternating-up-down replacement selection (runs alternate between sorted and reverse sorted), which was studied by Knuth as far back as 1963. We show that this simple policy is asymptotically optimal.
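One plausible reading of the alternating policy is sketched below: an up run repeatedly writes the smallest buffered element that is at least the last output, a down run writes the largest buffered element that is at most the last output, and the direction flips whenever no buffered element can extend the current run. The function name `alternating_up_down` and the parameter `m` are hypothetical, and the buffer is scanned linearly for clarity rather than speed; the paper's exact formulation may differ.

```python
from typing import Iterable, Iterator, List, Tuple

_END = object()  # sentinel marking the end of the input stream


def alternating_up_down(items: Iterable[int], m: int) -> Iterator[Tuple[str, List[int]]]:
    """Yield (direction, run) pairs, alternating up and down runs (illustrative sketch)."""
    if m < 1:
        raise ValueError("buffer size m must be at least 1")
    it = iter(items)

    # Fill the buffer with the first m elements of the input.
    buffer: List[int] = []
    for x in it:
        buffer.append(x)
        if len(buffer) == m:
            break

    going_up = True
    run: List[int] = []
    while buffer:
        if not run:
            eligible = buffer  # a fresh run may start with any element
        elif going_up:
            eligible = [x for x in buffer if x >= run[-1]]
        else:
            eligible = [x for x in buffer if x <= run[-1]]

        if not eligible:
            # Nothing can extend the run: emit it and reverse direction.
            yield ("up" if going_up else "down", run)
            run = []
            going_up = not going_up
            continue

        chosen = min(eligible) if going_up else max(eligible)
        buffer.remove(chosen)
        run.append(chosen)
        nxt = next(it, _END)
        if nxt is not _END:
            buffer.append(nxt)

    if run:
        yield ("up" if going_up else "down", run)


runs = list(alternating_up_down([1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1], 2))
print(runs)  # [('up', [1, 2, 3, 4, 5, 5, 6]), ('down', [4, 3, 2, 1])]
```

On this rise-then-fall input the sketch produces two runs with a buffer of two elements, because the falling half is emitted as a single reverse-sorted run.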

Next, we give online algorithms having smaller competitive ratios with resource augmentation. We demonstrate that performance can also be improved with a small amount of foresight. Lastly, we present algorithms tailored for “nearly sorted” inputs which are guaranteed to have sufficiently long optimal runs.

Notes

  1. The external-memory (or I/O) model applies to any two levels of the memory hierarchy.

  2. Data structures such as heaps can identify the smallest elements in memory. But from the perspective of minimizing I/Os, this does not matter, because computation is free in the disk-access machine (DAM) model.

  3. Note that for a given input, minimizing the number of runs is equivalent to maximizing the average length of runs.

  4. Due to space constraints, we defer some proofs to the full version [2].

References

  1. Aggarwal, A., Vitter, J.S.: The input/output complexity of sorting and related problems. Commun. ACM 31(9), 1116–1127 (1988)

  2. Bender, M.A., McCauley, S., McGregor, A., Singh, S., Vu, H.T.: Run generation revisited: What goes up may or may not come down. arXiv preprint arXiv:1504.06501 (2015)

  3. Chandramouli, B., Goldstein, J.: Patience is a virtue: revisiting merge and sort on modern processors. In: Proceedings International Conference on Management of Data, pp. 731–742 (2014)

  4. Estivill-Castro, V., Wood, D.: A survey of adaptive sorting algorithms. ACM Comput. Surv. 24(4), 441–476 (1992)

  5. Frazer, W., Wong, C.: Sorting by natural selection. Commun. ACM 15(10), 910–913 (1972)

  6. Friend, E.H.: Sorting on electronic computer systems. J. ACM 3(3), 134–168 (1956)

  7. Gassner, B.J.: Sorting by replacement selecting. Commun. ACM 10(2), 89–93 (1967)

  8. Goetz, M.A.: Internal and tape sorting using the replacement-selection technique. Commun. ACM 6(5), 201–206 (1963)

  9. Graefe, G.: Implementing sorting in database systems. ACM Comput. Surv. 38(3), 10 (2006)

  10. Knuth, D.E.: Length of strings for a merge sort. Commun. ACM 6(11), 685–688 (1963)

  11. Knuth, D.E.: The Art of Computer Programming: Sorting and Searching. Addison-Wesley, Reading (1998)

  12. Lin, Y.C.: Perfectly overlapped generation of long runs for sorting large files. J. Parallel Distrib. Comput. 19(2), 136–142 (1993)

  13. Lin, Y.C., Lai, H.Y.: Perfectly overlapped generation of long runs on a transputer array for sorting. Microprocess. Microsyst. 20(9), 529–539 (1997)

  14. Mallows, C.L.: Patience sorting. Bulletin Inst. Math. Appl. 5(4), 375–376 (1963)

  15. Martinez-Palau, X., Dominguez-Sal, D., Larriba-Pey, J.L.: Two-way replacement selection. Proc. VLDB Endow. 3, 871–881 (2010)

  16. Wikipedia: Timsort (2004). http://en.wikipedia.org/wiki/Timsort

Acknowledgments

We gratefully acknowledge Goetz Graefe and Harumi Kuno for introducing us to this problem and for their advice. This research was supported by NSF grants CCF 1114809, CCF 1217708, IIS 1247726, IIS 1251137, CNS 1408695, CCF 1439084, CCF 0953754, IIS 1251110, CCF 1320719, and by Google Research and Sandia National Laboratories.

Author information

Corresponding author

Correspondence to Hoa T. Vu.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bender, M.A., McCauley, S., McGregor, A., Singh, S., Vu, H.T. (2015). Run Generation Revisited: What Goes Up May or May Not Come Down. In: Elbassioni, K., Makino, K. (eds) Algorithms and Computation. ISAAC 2015. Lecture Notes in Computer Science, vol 9472. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48971-0_59

  • DOI: https://doi.org/10.1007/978-3-662-48971-0_59

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-48970-3

  • Online ISBN: 978-3-662-48971-0

  • eBook Packages: Computer Science, Computer Science (R0)
