Analytic modeling of network processors for parallel workload mapping

Authors:
Ning Weng, Southern Illinois University Carbondale, Carbondale, IL
Tilman Wolf, University of Massachusetts Amherst, Amherst, MA

ACM Transactions on Embedded Computing Systems, Volume 8, Issue 3, Article No. 18, pp. 1–29, https://doi.org/10.1145/1509288.1509290

Published: 22 April 2009

Abstract

Network processors are heterogeneous system-on-chip multiprocessors that are optimized to perform packet forwarding and processing tasks at gigabit data rates. To meet the performance demands of increasing link speeds and complex network applications, network processors are implemented with several dozen embedded processor cores and hardware accelerators that run multiple packet processing applications in parallel. The parallel nature of the processing system makes it increasingly difficult for application developers to understand and manage resources and to map processing tasks to the hardware. To address this problem, we present a methodology for profiling and analyzing network processor applications, mapping processing tasks to a generalized network processor architecture, and analytically determining the expected throughput performance. The key novelty of this work is not only the adaptation of application analysis and mapping algorithms to heterogeneous network processors, but also the fact that the entire process can be automated and hidden from the application developer. Starting with the analysis of a uniprocessor implementation of the application, the process yields a mapping of the partitioned application that achieves the best performance for a given network processor system. The simplicity of the proposed randomized mapping algorithm allows the use of this methodology in network processor runtime systems where dynamic reallocation of tasks is necessary but processing power is limited. We present results that show the effectiveness of the analysis and mapping methodology as well as its application to design space exploration.
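The general idea of a randomized mapping search can be illustrated with a minimal sketch. This is not the authors' algorithm or analytic model, which the abstract does not specify. It assumes, purely for illustration, that each task has a fixed per-packet processing cost, that all processing elements are identical, and that throughput is limited by the most heavily loaded element. The names `estimate_throughput`, `random_mapping_search`, and the example task costs are hypothetical.

```python
import random


def estimate_throughput(mapping, task_cost, num_pes):
    """Toy throughput estimate (assumption): packets per time unit,
    limited by the most heavily loaded processing element."""
    load = [0.0] * num_pes
    for task, pe in mapping.items():
        load[pe] += task_cost[task]
    return 1.0 / max(load)


def random_mapping_search(task_cost, num_pes, trials=1000, seed=0):
    """Draw random task-to-processor mappings and keep the best one found."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    tasks = sorted(task_cost)
    best_mapping, best_tp = None, 0.0
    for _ in range(trials):
        # Assign every task to a uniformly random processing element.
        candidate = {t: rng.randrange(num_pes) for t in tasks}
        tp = estimate_throughput(candidate, task_cost, num_pes)
        if tp > best_tp:
            best_mapping, best_tp = candidate, tp
    return best_mapping, best_tp


if __name__ == "__main__":
    # Hypothetical per-packet costs for four tasks, mapped onto two elements.
    costs = {"classify": 4.0, "lookup": 3.0, "encrypt": 2.0, "forward": 1.0}
    mapping, tp = random_mapping_search(costs, num_pes=2)
    print(mapping, tp)
```

Each trial costs only one pass over the task list, which is the kind of simplicity that makes a randomized search plausible in a runtime system with limited processing power. A realistic model would also have to account for inter-task communication, memory contention, and heterogeneous processing elements.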

Published in

ACM Transactions on Embedded Computing Systems Volume 8, Issue 3
April 2009
239 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/1509288

Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States

Journal Family
ACM Journals for the Design of Smart and Connected Systems
Publication History
- Published: 22 April 2009
- Accepted: 1 July 2006
- Revised: 1 May 2006
- Received: 1 August 2005
Author Tags
Application profiling
embedded systems
multiprocessor scheduling
network processors
Qualifiers
- research-article
- Research
- Refereed