Characterizing the Sharing Behavior of Applications Using Software Transactional Memory

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12614)

Abstract

Software Transactional Memory (STM) is an alternative abstraction for process synchronization in parallel programming. It is often easier to use than locks and avoids issues such as deadlocks. Many studies have sought to improve STM performance through transactional schedulers. However, in current architectures with complex memory hierarchies, it is also important to map threads so that threads sharing data execute close to each other in the memory hierarchy and can therefore access data protected by STM faster. A successful thread mapping of an STM application requires an in-depth analysis of its sharing behavior to determine its suitability for different mapping policies and the expected performance gains. This paper characterizes the sharing behavior of the STAMP benchmark suite using information extracted from the STM runtime, providing information to guide thread mapping. Our main findings are that most of the STAMP applications are suitable for a static thread mapping approach to improve performance, since (1) the applications do not present dynamic behavior and (2) the sharing pattern does not change between executions. Furthermore, we show that sharing information gathered from the STM runtime can be used to analyze and reduce false sharing in TM applications.
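As an illustration of how sharing information might guide a static thread mapping, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical trace of (thread id, address) pairs, such as an STM runtime could record on transactional reads and writes, counts how often two different threads touch the same cache line in succession, and then greedily pairs the threads that share the most. The function names, the trace format, and the 64-byte line size are all assumptions made for this example.

```python
def build_sharing_matrix(accesses, num_threads, line_size=64):
    """Count sharing events between threads.

    accesses: iterable of (thread_id, address) pairs in observed order
              (hypothetical trace format, assumed for illustration).
    A sharing event is recorded when a thread touches a cache line whose
    previous access came from a different thread.
    """
    matrix = [[0] * num_threads for _ in range(num_threads)]
    last_thread = {}  # cache-line index -> thread that touched it last
    for tid, addr in accesses:
        line = addr // line_size
        prev = last_thread.get(line)
        if prev is not None and prev != tid:
            # Keep the matrix symmetric: sharing has no direction here.
            matrix[tid][prev] += 1
            matrix[prev][tid] += 1
        last_thread[line] = tid
    return matrix


def greedy_pairs(matrix):
    """Pair threads in decreasing order of sharing (simple greedy heuristic).

    Each returned pair is a candidate for placement on cores that share a
    cache level. Ties are broken by the lower thread ids.
    """
    n = len(matrix)
    candidates = sorted(
        ((matrix[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda c: (-c[0], c[1], c[2]),
    )
    placed, pairs = set(), []
    for _, i, j in candidates:
        if i not in placed and j not in placed:
            pairs.append((i, j))
            placed.update((i, j))
    return pairs
```

Because addresses are grouped by cache line, this matrix does not distinguish true sharing from false sharing; keeping the accessed offsets within each line would be one way to tell them apart. The pairs could then be applied with an operating-system affinity call, which is left out here.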

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001 and PROCAD/LEAPaD.


Corresponding author

Correspondence to Douglas Pereira Pasqualin.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Pasqualin, D.P., Diener, M., Du Bois, A.R., Pilla, M.L. (2021). Characterizing the Sharing Behavior of Applications Using Software Transactional Memory. In: Wolf, F., Gao, W. (eds) Benchmarking, Measuring, and Optimizing. Bench 2020. Lecture Notes in Computer Science, vol 12614. Springer, Cham. https://doi.org/10.1007/978-3-030-71058-3_1

  • DOI: https://doi.org/10.1007/978-3-030-71058-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-71057-6

  • Online ISBN: 978-3-030-71058-3

  • eBook Packages: Computer Science, Computer Science (R0)
