Characterizing the Sharing Behavior of Applications Using Software Transactional Memory

Pasqualin, Douglas Pereira; Diener, Matthias; Du Bois, André Rauber; Pilla, Maurício Lima

doi:10.1007/978-3-030-71058-3_1

Conference paper
First Online: 02 March 2021

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12614)

Abstract

Software Transactional Memory (STM) is an alternative abstraction for process synchronization in parallel programming. It is often easier to use than locks and avoids issues such as deadlocks. Many studies have sought to improve STM performance through transactional schedulers. However, on current architectures with complex memory hierarchies, it is also important to map threads so that those sharing data execute close to each other in the memory hierarchy, giving them faster access to the data protected by STM. Successful thread mapping of an STM application requires an in-depth analysis of its sharing behavior to determine its suitability for different mapping policies and the expected performance gains. This paper characterizes the sharing behavior of the STAMP benchmark suite using information extracted from the STM runtime, providing guidance for thread mapping based on each application's sharing behavior. Our main findings are that most of the STAMP applications are suitable for a static thread mapping approach to improve performance, since (1) the applications do not present dynamic behavior and (2) the sharing pattern does not change between executions. Furthermore, we show that sharing information gathered from the STM runtime can be used to analyze and reduce false sharing in TM applications.
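
To make the notion of "sharing behavior" concrete, the following is a minimal sketch of how a thread-to-thread sharing matrix could be built from a trace of memory accesses. It is not the paper's implementation: the function name `sharing_matrix`, the `(thread_id, address)` trace format, and the rule of counting one sharing event whenever an address is touched by a different thread than its previous accessor are all assumptions made for illustration.

```python
def sharing_matrix(accesses, num_threads):
    """Build a symmetric thread-to-thread sharing matrix from an access trace.

    accesses: iterable of (thread_id, address) pairs in observed order
              (hypothetical trace format, assumed for this sketch).
    num_threads: number of threads; thread ids are 0 .. num_threads - 1.
    """
    matrix = [[0] * num_threads for _ in range(num_threads)]
    last_accessor = {}  # address -> thread that touched it most recently

    for tid, addr in accesses:
        prev = last_accessor.get(addr)
        # Count a sharing event only when a *different* thread
        # touched this address immediately before.
        if prev is not None and prev != tid:
            matrix[tid][prev] += 1
            matrix[prev][tid] += 1
        last_accessor[addr] = tid

    return matrix


if __name__ == "__main__":
    # Illustrative trace: threads 0 and 1 alternate on one address,
    # threads 2 and 3 share another, and thread 1 has a private address.
    trace = [(0, 0x10), (1, 0x10), (0, 0x10), (2, 0x20), (3, 0x20), (1, 0x30)]
    for row in sharing_matrix(trace, 4):
        print(row)
```

In such a matrix, larger entries mark thread pairs that touch the same data more often, which makes them candidates for placement on cores that share a cache. Comparing matrices from different runs or execution phases is one way to judge whether a static mapping is likely to suffice.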

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001 and PROCAD/LEAPaD.

Author information

Authors and Affiliations

Computer Science Graduate Program (PPGC), Universidade Federal de Pelotas, Pelotas, RS, Brazil
Douglas Pereira Pasqualin, André Rauber Du Bois & Maurício Lima Pilla
University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
Matthias Diener
Google LLC, Sunnyvale, CA, 94089, USA
Maurício Lima Pilla

Corresponding author

Correspondence to Douglas Pereira Pasqualin.

Editor information

Editors and Affiliations

Department of Computer Science, Technical University of Darmstadt, Darmstadt, Germany
Felix Wolf
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Wanling Gao

About this paper

Cite this paper

Pasqualin, D.P., Diener, M., Du Bois, A.R., Pilla, M.L. (2021). Characterizing the Sharing Behavior of Applications Using Software Transactional Memory. In: Wolf, F., Gao, W. (eds) Benchmarking, Measuring, and Optimizing. Bench 2020. Lecture Notes in Computer Science, vol 12614. Springer, Cham. https://doi.org/10.1007/978-3-030-71058-3_1

DOI: https://doi.org/10.1007/978-3-030-71058-3_1
Published: 02 March 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71057-6
Online ISBN: 978-3-030-71058-3
eBook Packages: Computer Science, Computer Science (R0)
