Abstract
Software Transactional Memory (STM) is an alternative abstraction for process synchronization in parallel programming. It is often easier to use than locks and avoids issues such as deadlocks. To improve STM performance, many studies have investigated transactional schedulers. However, on current architectures with complex memory hierarchies, it is also important to map threads so that those sharing data execute close to each other in the memory hierarchy and can therefore access data protected by STM faster. Successful thread mapping of an STM application requires an in-depth analysis of its sharing behavior to determine its suitability for different mapping policies and the expected performance gains. This paper characterizes the sharing behavior of the STAMP benchmark suite using information extracted from the STM runtime, providing guidance for thread mapping based on each application's sharing behavior. Our main findings are that most STAMP applications are suitable for a static thread mapping approach to improve performance, since (1) the applications do not present dynamic behavior and (2) the sharing pattern does not change between executions. Furthermore, we show that sharing information gathered from the STM runtime can be used to analyze and reduce false sharing in TM applications.
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001 and PROCAD/LEAPaD.
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Pasqualin, D.P., Diener, M., Du Bois, A.R., Pilla, M.L. (2021). Characterizing the Sharing Behavior of Applications Using Software Transactional Memory. In: Wolf, F., Gao, W. (eds) Benchmarking, Measuring, and Optimizing. Bench 2020. Lecture Notes in Computer Science, vol 12614. Springer, Cham. https://doi.org/10.1007/978-3-030-71058-3_1
DOI: https://doi.org/10.1007/978-3-030-71058-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71057-6
Online ISBN: 978-3-030-71058-3
eBook Packages: Computer Science (R0)