Concurrency Bug Avoiding Based on Optimized Software Transactional Memory


Software transactional memory (STM) [11][12][13][14][15][16][17][18][19][20][21] is a software-level concurrency control mechanism with which programmers can partition code into transactions and ensure that they execute atomically and in isolation with respect to each other. A transaction either commits at its commit point or, if it conflicts with other transactions, aborts by revoking any effects it has made. STM can simplify concurrent programming by easing data protection, permitting sequential reasoning among transactions, and ruling out some classes of concurrency bugs. Although STM is a promising technique for achieving easier and safer concurrency and avoiding concurrency bugs [22][23][24][25], it has drawbacks that hinder its adoption in practice:
(i) High human cost for equipping programs with transaction functionality. Most STM systems are implemented as programming libraries with rich APIs. To move legacy code from lock-based to transaction-based synchronization, programmers have to check the code carefully and insert low-level STM API calls at the proper points. The calls to STM APIs are used to demarcate transactions and identify potential shared accesses in them. Moreover, employing transactions also requires changes to the data/control structures [22,25]. Although methods have been proposed [15] to alleviate this burden, they do not remove it entirely.
(ii) Low compatibility with I/O calls. While user-space memory within a transaction is under the STM system's control, memory in the kernel may not be. As a result, the atomicity and isolation properties are not automatically enforced for changes to kernel data structures. In addition, some I/O operations, such as printing a message to the screen, cannot be reversed on abort. So most STMs, such as Grace [11], Intel STM [25], and Haskell STM [26], simply prohibit I/O calls within transactions. Analyses of multithreaded programs written with locks show that I/O calls are a regular occurrence in critical sections [20,27]. Also, a study of real-world concurrency bugs shows that recovering from about 15% of concurrency bugs involves revoking the effects of I/O calls [22]. Hence, forbidding I/O calls in transactions reduces the usability of STM and threatens its validity as a solution to concurrency bugs.
(iii) Low compatibility with condition variables. A condition variable (or "condvar") does not follow an atomic and isolated specification: an invocation of the wait method cannot return without pairing with an intervening signal by another thread. When a transaction that executes wait/signal operations aborts, there is no safe way to revoke the effects generated by them. For example, revoking and re-executing a not-yet-finished wait may miss a signal sent by another thread and block the calling thread forever. So many STMs [11-19, 23, 28, 29] do not support using condvars in transactions. This makes STMs fail to avoid quite a few concurrency bugs [22] and has also been recognized as an obstacle to transactionalizing legacy code [22,30,31].
To overcome these deficiencies, this paper presents a new and optimized STM system, Convoider, which aims to transparently transactionalize applications without any manual effort, avoid various types of concurrency bugs, and support revocable I/O as well as proper condvar handling simultaneously. Figure 1 gives an overview of Convoider. It controls applications in both the linkage and runtime phases.
To apply Convoider to an application, users first relink the related object files, such as relocatable object files and shared object files, with the linker ld against a customized linker script file, which controls the memory layout of the output executable file. Through this control, Convoider specifies the start address where global data are stored in the address space. This information is then used at runtime to make global memory revocable.
At runtime, Convoider instruments five kinds of operations: interthread synchronization operations, memory access operations, input/output operations, condvar signal/wait operations, and lock/unlock operations. By intercepting interthread operations and designating the code between them as transactions in each thread, Convoider automatically transactionalizes target programs without any source code modification or recompilation. By saving/restoring stack frames and CPU registers on beginning/aborting a transaction, Convoider makes execution flow revocable. By turning threads into processes, leveraging virtual memory protection, and customizing memory allocation/deallocation, Convoider makes memory manipulations revocable. By maintaining virtual file systems and redirecting I/O operations onto them, Convoider makes I/O effects revocable. By converting lock/unlock operations to no-ops, customizing signal/wait operations on condvars, and committing memory changes transactionally, Convoider makes deadlocks, data races, and atomicity violations impossible.
We have implemented Convoider in Ubuntu as a dynamic shared library for C/C++ applications. It runs in the same address space as the target application. We evaluated it against two benchmark suites: an application suite of 12 real-world applications selected from PARSEC [32], SPLASH2 [33], and Phoenix [34], and a concurrency bug suite of 31 concurrency bugs collected from Mocklinter [35], data-race-test [36], Maple [37], PSet [38], and Grace [11]. Evaluation results show that Convoider succeeds in automatically transactionalizing many more programs and fully avoiding many more concurrency bugs than Grace. Meanwhile, Convoider is efficient: on average, it adds only 28% runtime overhead to the target applications.
This paper makes the following contributions: (1) An optimized STM system, Convoider, is proposed to transparently transactionalize multithreaded programs and avoid concurrency bugs.
The rest of this paper is organized as follows. Section 2 reviews background work and describes our improvements in making execution flow and global/heap memory revocable. Section 3 describes how to make I/O revocable, and Section 4 describes how to properly handle condvars. Section 5 details the execution of transactions. Section 6 evaluates Convoider and compares it with Grace [11], Dthreads [39], Dimmunix [40], and Slider [41]. Section 7 concludes this study.

Background
Grace [11] is the first attempt to automatically transactionalize traditional lock-based concurrent programs. It turns threads into processes, automatically partitions code into transactions, constructs a transactional memory by leveraging virtual memory protection, and avoids simple concurrency bugs. Dthreads [39] enhances Grace by adding support for proper condvar handling. Conversion [42] advances Dthreads by recreating an efficient virtual memory manager at the kernel level. We compare Convoider with Grace and Dthreads in Section 6. However, we do not compare Convoider with Conversion because they are not comparable in either implementation level (user space vs. kernel) or usability (a shared library vs. an extended kernel).
We build Convoider on top of Grace [11] by reusing and enhancing its transaction partition, revocable execution, and transactional memory modules. This section gives background on how Grace demarcates transactions and makes execution flow and global/heap memory revocable, its deficiencies, and our improvements.

Transaction Partition and Revocable Execution.
Grace intercepts thread creation and join operations and designates the code between them as transactions. Grace targets only fork-join concurrent programs. Because Convoider aims to handle more general concurrent applications, it also intercepts other interthread synchronization operations such as condvar wait, signal, and broadcast. To illustrate how Convoider automatically demarcates transactions, take thread $t_1$ in Figure 1 as an example, which contains two consecutive interthread operations $b_{1,i}$ and $b_{1,i+1}$. When $b_{1,i}$ is called, Convoider tries to commit $t_1$'s current transaction and starts a new transaction, which lasts until $b_{1,i+1}$ is called.
Grace saves/restores stack frames and CPU registers on starting/aborting a transaction. Thus, when a transaction conflicts with other transactions and aborts, it can roll back to its begin point and try again. Grace uses the pair of functions getcontext and setcontext to get/set the execution context, which contains CPU registers such as the program counter and stack pointer. However, these two functions do not save/restore stack frames. To do that, Grace itself copies the stack frames of the current thread to a private memory region when a transaction starts and writes the frames back if the transaction aborts. The written-back frames, together with the restored registers, make the transaction re-execute in the same execution context in which it began its last execution. However, Grace copies only part of the stack frames of the current thread, which leads to stack smashing when the partly stored stack frames are written back. (This is why Grace fails to apply to string_match in Section 6.3.) Convoider instead saves the whole stack by leveraging the portable routines __builtin_return_address and __builtin_frame_address. By saving/restoring complete stack frames on beginning/aborting a transaction, Convoider correctly makes execution flow revocable.
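The register half of this mechanism can be illustrated with a minimal C sketch. It is not Grace's or Convoider's code: the function name run_until_commit and its parameter are hypothetical, and the "conflict" is simulated by a counter. The sketch only shows that setcontext resumes execution at the point recorded by getcontext; because stack contents are not restored, the attempt counter (kept in memory via volatile) survives the rollback, which is exactly why a real system must additionally copy and restore stack frames.

```c
#include <ucontext.h>

/* Hypothetical helper: runs a "transaction" that conflicts a given number
 * of times before it can commit, and returns how many attempts were made. */
int run_until_commit(int conflicts_before_success)
{
    ucontext_t begin_point;
    /* volatile keeps the counter in stack memory, so its value survives
     * the register restore performed by setcontext(). */
    volatile int attempts = 0;

    /* Begin point: execution resumes right after this call on each abort. */
    getcontext(&begin_point);
    attempts++;

    if (attempts <= conflicts_before_success) {
        /* Simulated abort: restore the saved registers (program counter,
         * stack pointer, ...). Stack contents are NOT restored here. */
        setcontext(&begin_point);
    }
    return attempts; /* simulated commit */
}
```

Calling run_until_commit(3) re-enters the begin point three times and returns on the fourth attempt.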

Revocable Memory.
Grace implements an efficient software transactional memory by treating threads as processes: instead of spawning new threads, Grace forks off new processes. Because each "thread" is in fact a separate process, it is possible to use standard memory protection functions and signal handlers to track reads and writes to memory. Grace presumes that an application uses only global memory and heap memory to share data among threads, and it tracks accesses to memory at page granularity.

Revocable Global Memory.
Grace assumes that the size of the global data in the target application does not exceed 100 MB. It uses a customized linker script (Figure 1) to locate the start address of the global data with a linker variable "gracestart". Meanwhile, the symbol "_end" in an ELF executable indicates the address of the first byte after the uninitialized global data section (namely, the .bss section). Therefore, at runtime, the memory area between these two addresses stores the global data. The linker script is also used to instruct the linker to page align and separate read-only memory from global read/write memory. When an executable is loaded, Grace creates a 100 MB region to hold the global memory and establishes two memory mappings, one shared and one local, for the region. It maintains a version region for the global region and uses a word in the former to track the changes of a page in the latter. For the version region, Grace also creates two mappings: one shared and one local. Each pair of mappings created for a global or version region is backed by the same on-disk temporary file. This correlation makes it possible to bring the content of the local mapping in line with that of the shared mapping by simply remapping the local mapping from the file.
The mechanism that makes global memory revocable is illustrated in Figure 2. Suppose that threads $t_1$ and $t_2$ are concurrently executing transactions $tx_a$ and $tx_b$, and that $tx_b$ starts before and ends after $tx_a$. When $tx_a$ starts, it performs several memory operations. Each read/write operation accesses memory through the private mapping (step ①), which permits reads to read data directly from the shared memory but redirects writes to a copy of the corresponding "wanted" page in the shared memory (steps ② and ③). On starting, a transaction gets a private mapping for the global memory. Each page's permission in the private mapping is set to PROT_NONE. So, the first access issued by $tx_a$ to a PROT_NONE page causes Grace to add this page to $tx_a$'s read set (at page granularity) and then set the page's permission to PROT_READ. If a subsequent access to this page is a write, Grace adds this page to $tx_a$'s write set and sets its permission to PROT_READ | PROT_WRITE. Later accesses to the page then trigger no page faults and run at full speed.
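The page-protection tracking described above can be sketched in C for Linux as follows. This is an illustrative reduction, not Grace's implementation: it tracks a small anonymous region instead of the global data mapping, the names tracker_begin, on_fault, and page_state are hypothetical, and it relies on mprotect being callable from a SIGSEGV handler, which works on Linux but is not guaranteed by POSIX. A write to an untouched page faults twice: the first fault puts the page in the read set, and the retried write upgrades it to the write set.

```c
#define _GNU_SOURCE /* MAP_ANONYMOUS and siginfo_t under strict -std modes */
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

enum { NPAGES = 4 };
enum { UNTOUCHED = 0, IN_READ_SET = 1, IN_WRITE_SET = 2 };

static char *region;                 /* tracked memory, initially PROT_NONE */
static size_t page_size;
static volatile sig_atomic_t page_state[NPAGES];

/* Page-fault handler: classify the access and relax the protection. */
static void on_fault(int sig, siginfo_t *info, void *uctx)
{
    (void)sig;
    (void)uctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)region;
    if (addr < base || addr >= base + NPAGES * page_size)
        _exit(1);                    /* a genuine crash, not a tracked access */

    size_t idx = (addr - base) / page_size;
    char *page = region + idx * page_size;
    if (page_state[idx] == UNTOUCHED) {
        page_state[idx] = IN_READ_SET;        /* first access: read set */
        mprotect(page, page_size, PROT_READ);
    } else {
        page_state[idx] = IN_WRITE_SET;       /* fault on a readable page: a write */
        mprotect(page, page_size, PROT_READ | PROT_WRITE);
    }
}

/* Maps the tracked region and installs the handler; returns 0 on success. */
int tracker_begin(void)
{
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    region = mmap(NULL, NPAGES * page_size, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

After tracker_begin(), reading a byte of the first page leaves it in the read set, while storing to the second page leaves that page in the write set; untouched pages remain PROT_NONE.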
Each time a transaction starts, Grace also creates for it a local copy of the global version numbers (steps ④ and ⑤). When the transaction tries to commit, it first checks whether the data it has read are still valid by comparing the version numbers of the pages in its read set against the corresponding global version numbers (step ⑥). If every local version number is equal to its global counterpart, the check passes and the transaction commits by doing the following for each page in its write set: (1) incrementing the corresponding global version number by one, and (2) copying the page's content to the counterpart page in the shared memory (steps ⑥, ⑦, ⑧, and ⑨). Otherwise, the transaction aborts by abandoning the memory changes buffered in its write set, rolling back to its begin point, and trying to execute again (step ⑩). In Figure 2, $tx_a$ and $tx_b$ both access a page p: $tx_b$ reads it, while $tx_a$ writes it. When $tx_b$ starts, it finds that p's global version number is 4 and gets a local copy of it. Then $tx_a$ starts, sees the same version number 4 of p, and also gets a local copy of it. However, $tx_a$ commits before $tx_b$ and changes p's global version number from 4 to 5. When $tx_b$ tries to commit, it finds that p's local version number is less than p's global one. In this case, $tx_b$ knows that some other transaction has already updated p's content, so it aborts and re-executes to read the up-to-date content.
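The version-based validation can be sketched in C as below. This is a simplified model under stated assumptions, not Grace's code: pages are tiny in-process arrays rather than memory mappings, read and write sets are boolean arrays, and all type and function names (shared_state, txn, txn_begin, txn_read, txn_write, txn_commit) are hypothetical. It reproduces the Figure 2 scenario, in which the writer commits first and the later reader fails validation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { NPAGES = 8, PAGE_BYTES = 16 }; /* tiny sizes, for illustration only */

struct shared_state {
    unsigned version[NPAGES];         /* one version word per page */
    char data[NPAGES][PAGE_BYTES];
};

struct txn {
    unsigned local_version[NPAGES];   /* snapshot taken at begin */
    bool in_read_set[NPAGES];
    bool in_write_set[NPAGES];
    char private_copy[NPAGES][PAGE_BYTES];
};

void txn_begin(struct txn *tx, const struct shared_state *g)
{
    memset(tx, 0, sizeof *tx);
    memcpy(tx->local_version, g->version, sizeof g->version);
}

void txn_read(struct txn *tx, size_t page)
{
    tx->in_read_set[page] = true;
}

void txn_write(struct txn *tx, const struct shared_state *g,
               size_t page, size_t off, char value)
{
    if (!tx->in_write_set[page]) {
        /* First write: buffer a private copy of the shared page. */
        memcpy(tx->private_copy[page], g->data[page], PAGE_BYTES);
        tx->in_read_set[page] = true; /* the first access joins the read set */
        tx->in_write_set[page] = true;
    }
    tx->private_copy[page][off] = value;
}

/* Returns true on commit, false if the transaction must abort and retry. */
bool txn_commit(struct txn *tx, struct shared_state *g)
{
    for (size_t p = 0; p < NPAGES; p++)
        if (tx->in_read_set[p] && tx->local_version[p] != g->version[p])
            return false;             /* a page that was read has changed */

    for (size_t p = 0; p < NPAGES; p++) {
        if (tx->in_write_set[p]) {
            g->version[p]++;
            memcpy(g->data[p], tx->private_copy[p], PAGE_BYTES);
        }
    }
    return true;
}
```

A real implementation must also make the validate-and-publish step atomic across processes; the sketch leaves that out.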

Revocable Heap Memory.
When a target application starts, Grace claims a fixed-size 512 MB region from the OS to hold the application's heap data. Grace maintains a version region for the heap region and uses a word in the former to track the changes of a page in the latter. Grace also creates shared and local mappings for both the heap region and the version region. Each pair of mappings is backed by the same on-disk temporary file.
Grace instruments all basic C/C++ functions of the malloc/free family in order to steer the target application's memory claim/reclaim requests to the 512 MB heap region mentioned above. Figure 3 depicts the heap memory management mechanism. Grace embeds the memory management metadata structures into the heap memory. This organization elegantly solves the problem of revoking the effects of memory allocations/deallocations: Grace rolls back memory allocations/deallocations just as it rolls back ordinary updates to heap data or global data (Section 2.2.1).
Grace manages the heap region based on Hoard [43] and Heap Layers [44]. It divides the heap region into 16 subheaps. Each thread uses a hash of its thread identifier to claim a subheap for satisfying the malloc/free requests issued by that thread. Isolating each thread's memory manipulations from the others' allows threads to run independently most of the time. Each subheap is initially set up as a 1024 KB zone. Each zone has an associated 16 B arena, which stores three kinds of information: the size of the remaining memory in the zone, the start address of the remaining memory, and the pointer to the next zone. Memory in zones is allocated linearly: the first allocated object is followed by the second one and so on. Each memory object occupies a power-of-2 number of bytes (at least 8 B) and has an 8 B object header that records the object's size and alignment information. Each subheap can have multiple zones, which are linked with each other through the next pointer in each zone's arena. As long as a thread does not exhaust the memory in its current zone, it runs independently of any other thread. If it runs out of memory, it obtains another zone from the global allocator, whose size is 1024 KB, or larger if the memory request asks for more than 1024 KB (zone2 in Figure 3). Zones are allocated linearly from the heap region.
This zone allocation strategy makes the current thread conflict with another thread only if that thread also runs out of memory during the same period. In this situation, both threads will view the newly allocated zone as their own.
Because each allocated object's size is a power of 2, when an object of size sz is freed, it is reclaimed and inserted at the head of bin $\log_2 sz$. Each subheap has 29 bins, and bin n ($0 \le n < 29$) is a doubly linked list of free objects, each of size $2^3 \cdot 2^n$. Each chunk in a bin is a user memory object, not including management information such as the object header (Figure 3). When the application requests a memory object of length len, Grace acts as follows:
(1) If len is not larger than 8, Grace searches bin 0 for a free chunk. If one exists, Grace allocates it. Otherwise, it sets len to 8 and goes to (4).
(2) If len is greater than or equal to $2^{31}$, Grace reports an error.
(3) Otherwise, Grace searches bin m, where $2^3 \cdot 2^m$ is the smallest bin size not less than len, for a free chunk. If one exists, Grace allocates it to the application. Otherwise, it sets len to $2^3 \cdot 2^m$ and goes to (4).
(4) Grace allocates the requested memory from the current thread's zone.
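The size-class selection in steps (1) to (3) can be sketched in C as follows. The function names bin_index and bin_chunk_size are hypothetical and not taken from Grace; the sketch follows the bin-size definition $2^3 \cdot 2^n$ given above and assumes that bin m is the smallest bin whose chunk size is at least len. It covers only the index computation, not the free-list search or the zone allocation of step (4).

```c
#include <stddef.h>

enum { NUM_BINS = 29 };

/* Chunk size held by bin m: 2^3 * 2^m bytes. */
size_t bin_chunk_size(int m)
{
    return (size_t)8 << m;
}

/* Returns the index of the smallest bin whose chunks can hold len bytes,
 * or -1 if len >= 2^31 (the error case of step (2)). */
int bin_index(size_t len)
{
    if (len >= ((size_t)1 << 31))
        return -1;

    int m = 0;
    size_t chunk = 8;            /* bin 0 serves every request of <= 8 bytes */
    while (chunk < len) {
        chunk <<= 1;
        m++;
    }
    return m;                    /* always < NUM_BINS for len < 2^31 */
}
```

For example, a 100-byte request maps to bin 4, whose chunks are 128 bytes, and the largest admissible request maps to bin 28, the last of the 29 bins.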
When a transaction begins, Grace creates for it a local copy of the shared heap memory by way of a private mapping. All malloc/free operations are redirected to this private copy and change only pages in the current thread's own zone. For example, in Figure 3, a transaction executed by thread r only mallocs/frees memory in zone r. When the transaction tries to commit, if Grace finds that any page in its subheap is invalid, it rolls back the memory allocations/deallocations of that transaction by simply discarding its written pages, recreating a local copy of the heap memory, and trying again. Currently, there are only 16 subheaps; if two threads are mapped to the same subheap, they are likely to conflict.
Convoider corrects a severe bug in Grace: when responding to memory allocation requests, Grace unnecessarily and wrongly aligns the allocated memory to the page size. The bug may make a memory chunk newly allocated by Grace overlap with a memory object that is currently in use, leading to memory corruption. (This is why Grace fails to apply to kmeans and reverse_index in Section 6.3.) Convoider fixes the bug by allocating memory contiguously, namely, without aligning memory to the page size.

Revocable I/O
Convoider instruments 84 commonly used I/O operations, as listed in Table 1, to provide transactional input/output support for regular files and character files at three levels: system level, C level, and C++ level. Convoider currently does not support revocable I/O for directory files, so it does not instrument directory I/O routines such as opendir, readdir, chdir, and closedir.
To enable I/O operations in transactions, researchers have developed three strategies: deferral [20,27], namely, deferring I/O operations until commit; compensation [20,27], namely, performing I/O operations as usual during the execution of transactions and reversing their side effects when transactions abort; and irrevocability [18,45], namely, ensuring that transactions with I/O operations never abort. However, each strategy on its own is imperfect [20]. When two operations are deferred, the OS may not be able to guarantee that both of them will eventually succeed, leading to an inconsistent state. When system calls must be reversed on transaction abort, revoking the side effects may fail. To guarantee a successful commit, irrevocable transactions (those with I/O operations) cannot execute concurrently, which degrades performance. Convoider leverages a combined strategy to make I/O operations reversible: it integrates deferral and compensation with a new strategy, exclusiveness. Under Convoider, multiple transactions with I/O operations run concurrently most of the time because they privatize I/O changes within transactions. However, if an I/O operation that may change the current process's table of open file descriptors is about to be performed, the current transaction enters an exclusive mode by prohibiting other transactions from executing such operations until it commits or aborts. The prohibited transactions abort themselves when they are allowed to progress. Meanwhile, transactions that do not change the open file table can run concurrently with the exclusive transaction.
The exclusive transaction could still abort because it may read out-of-date memory or access files that have been changed by other already-committed transactions. In this case, Convoider revokes the effects of the operations already performed by the exclusive transaction.
I/O operations in C/C++ programs can be categorized into three levels: system level, C level, and C++ level. At the system level, I/O operations manipulate files through file names or file descriptors. To confine the effects of I/O operations within a transaction before it commits, Convoider creates for the transaction a private copy of the global virtual file system when the transaction starts and redirects all system-level I/O operations onto it while the transaction executes. The global virtual file system is created after the launch of the target application and is shared among the processes forked afterwards. When a transaction tries to commit, it must check that the global counterparts of the files in its private file system have not been changed by other transactions.
At the C level, I/O operations access files through file streams. Convoider maintains a global stream-to-descriptor map that takes the file stream pointer as the key and the file descriptor as the value. When a transaction starts, Convoider creates for it a private copy of the global map. When C I/O operations are about to be performed, Convoider leverages this private map to make them access files directly through file descriptors instead of file streams. According to the C standards, a file stream can be associated with multiple file descriptors (only one at a time). A transaction may switch a stream from an old file to a new file, while another transaction could still use that stream to refer to the old file. So on committing, the transaction must also check whether the entries in its private map are equal to their counterparts in the global map.
C++ level I/O operations based on the I/O operations listed in Table 1 are automatically revocable because their underlying operations are revocable. So, by making system I/O and C I/O operations revocable, Convoider makes C++ I/O operations revocable.

Revocable System I/O.
Convoider makes system I/O operations revocable by way of the deferral, compensation, or exclusiveness strategy, depending on their semantics. Besides these strategies, Convoider also allows immediate execution of read-only operations, such as read and access, and sets some operations, such as sync, fsync, and fdatasync, as no-ops. Table 2 categorizes system I/O operations into five groups according to the strategy used to make them revocable. Note that fcntl falls into different categories because its semantics differ with different commands. For example, with F_DUPFD as the command, fcntl may [...]
[...] information. The v-node also contains an i-node for the file, which contains the owner of the file, the size of the file, and so on. A process forked from the main process inherits its parent's file descriptor table. However, if the forked process opens a file and reads or writes it, there is no way to know whether the other process has accessed that file or not. To capture such information, Convoider maintains a virtual file system. Because Convoider only cares about regular files and character files and does not support revocable I/O for directory files, the virtual file system is in fact a vector of virtual files. As shown in Figure 4, each virtual file contains a version number, indicating the number of times this file has been committed, and some file attributes, including file name, file descriptor, file mode (indicating the file's type and access permission), file status flag, file size, current offset, file owner, and file group owner. Besides these two fields, a private virtual file also contains two additional parts: a write list, used to buffer written bytes, and special virtual file records, including flags that indicate whether this virtual file is opened, closed, created, linked, unlinked, or symbol-linked, and strings used to buffer the new link or symbol-link names. A virtual file's attributes are obtained from the real file system when the real file is created or opened.
When an application is launched and about to execute, Convoider establishes a global virtual file system for it by creating virtual files for the standard input/output/error files and inserting them into the file system. Later, whenever a transaction starts, Convoider creates a private virtual file system for it by duplicating the global one and uses this private file system to buffer changes for that transaction.

Deferral and Read-Only Strategies.
Figure 4 shows the mechanism that makes system I/O revocable. As shown in Figure 4, a process gets a private virtual file system when it begins to execute a transaction (step ①). For an I/O operation that writes bytes to a file or changes file attributes, Convoider buffers the written bytes or changes in the private file system until the current transaction commits or aborts (step ②). For an I/O operation that reads bytes from a file, Convoider first reads the specified bytes from the real file (step ③) and then checks whether the read bytes overlap with the buffered write bytes of the virtual file corresponding to the real file. If so, Convoider replaces the read bytes in the overlapping range with the corresponding write bytes. Finally, the adjusted bytes are returned as the result. For an I/O operation that retrieves file attributes, Convoider directly returns the corresponding virtual file's attributes as the result instead of accessing the real file (step ②). On committing, a transaction checks whether its private files are consistent with the global ones by comparing the files' versions. If they all match (step ④), the transaction does three things for each virtual file in its virtual file system: replacing the corresponding global file's attributes with this virtual file's attributes (step ⑤), writing the buffered bytes or attribute changes into the real file (step ⑥), and incrementing the corresponding global file's version by one (step ⑤).
[...] 1), cannot be delayed during the execution of a transaction. However, if different processes concurrently execute these operations, some process may get a private virtual file system that is inconsistent with its real file system.
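The read path of the deferral strategy, in which bytes read from the real file are patched with the transaction's buffered writes, can be sketched in C as below. The struct buffered_write and the function overlay_writes are hypothetical names, and the sketch assumes that the write list is an array applied in order, so that later writes win where they overlap; the paper does not specify the write list's layout.

```c
#include <stddef.h>
#include <string.h>

/* One entry of a (hypothetical) per-file write list. */
struct buffered_write {
    size_t offset;       /* file offset at which the write starts */
    size_t length;       /* number of buffered bytes */
    const char *bytes;   /* the buffered bytes themselves */
};

/* buf holds len bytes that were read from the real file at file offset
 * `offset`. Every buffered write that overlaps [offset, offset + len)
 * replaces the corresponding bytes in buf. */
void overlay_writes(char *buf, size_t offset, size_t len,
                    const struct buffered_write *writes, size_t nwrites)
{
    size_t read_end = offset + len;
    for (size_t i = 0; i < nwrites; i++) {
        const struct buffered_write *w = &writes[i];
        size_t write_end = w->offset + w->length;
        size_t start = w->offset > offset ? w->offset : offset;
        size_t end = write_end < read_end ? write_end : read_end;
        if (start < end) {
            /* Copy only the overlapping range. */
            memcpy(buf + (start - offset),
                   w->bytes + (start - w->offset),
                   end - start);
        }
    }
}
```

For instance, if a transaction has buffered "WXYZ" at offset 8 and then reads 8 bytes at offset 10, the first two bytes of the result come from the buffer ("YZ") and the rest from the real file, unless other buffered writes cover them.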

Exclusiveness and Compensation.
To keep the file systems of different processes consistent with each other, Convoider makes these I/O operations execute exclusively: in a transaction, once any such operation is about to be executed, the transaction acquires an interprocess lock, if it has not already done so, and executes the operation under its protection. The lock is held by the current process until the transaction commits or aborts. Thus, other transactions that execute such operations are all blocked from making progress. Between the time the lock is acquired and the time it is released, Convoider uses a list of interprocess file information objects to record the operation type and parameters of each such operation. When the transaction eventually commits, for each object in the list, it sends a signal to all other active processes, with the object as the signal's accompanying data, to tell them what operation has been performed in this transaction. However, these signals are not immediately received by the target processes. They are received and handled only when the target processes start or re-execute transactions. A process blocked on the interprocess lock, once allowed to continue, aborts its current transaction and re-executes it to receive the signals. An unblocked process executes its current transaction as usual. If the transaction finally aborts, the unblocked process re-executes it and receives the signals at that point. Once a signal is received by a target process, the process carries out the operation with the parameters indicated in the accompanying file information object. After handling the signals, the process gets a private copy of the global file system. This time, the private virtual file system is consistent with the underlying real file system. When a transaction executes such operations exclusively, it opens or creates files in the real file system and also creates virtual files in its virtual file system. When the transaction commits, it compares all files in its private file system, except newly opened or created ones, with their global counterparts. If they do not all match (step ⑦ in Figure 4), the transaction cancels the effects of such operations by closing or deleting the corresponding files in the real file system (step ⑧) and rolls back to its starting point (step ⑨).

Revocable C I/O.
Convoider instruments the C I/O operations shown in Table 1 and reimplements them using the revocable system I/O operations described in Section 3.1. For example, Convoider uses open and close to implement mkstemp and tmpfile, and uses link, unlink, and close to implement remove and rename. C I/O operations that take either file names or nothing as input can simply be reimplemented on top of system I/O operations.
However, most C I/O operations manipulate files through file streams, which are generated by fopen, freopen, or fdopen. Convoider cannot directly reimplement these operations using system I/O operations. To solve this problem, Convoider maintains a global map in which each entry maps a file stream pointer to a file descriptor. When a transaction starts, Convoider creates for it a private copy of this global map. The transaction may use fopen or fdopen to create a new stream, use freopen to change the file associated with a stream, use fclose to close a stream, or use other stream-based operations to access files. Convoider instruments fopen as follows: (a) calling open to open the specified file and obtain the file's descriptor, (b) converting the descriptor to a stream pointer and creating an entry that maps this pointer to the descriptor, (c) inserting the entry into the private map of the current transaction, and (d) returning the pointer as the result. Convoider instruments fdopen with the latter three of these steps. Convoider reimplements freopen in a similar way to fopen, except that instead of creating a new mapping entry, it modifies the corresponding existing entry in the private map. Convoider reimplements fclose by removing from the private map the entry whose key is the stream pointer passed to fclose. For other operations, Convoider leverages the private map to make them access files directly through descriptors instead of streams. This disables stream buffering, so Convoider sets the operations setvbuf, setbuf, fwide, ferror, and clearerr as no-ops. When the transaction commits, it needs to check whether each entry in its private map matches the corresponding global counterpart. Only when they all match can the transaction commit successfully. The commit itself is simply done by copying the contents of the private map back to the global map.
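A minimal C sketch of a stream-to-descriptor map with a private per-transaction copy is given below. It is an illustration under stated assumptions rather than Convoider's data structure: the map is a small fixed-size array, stream keys are opaque pointers, and all names (stream_map, stream_txn, map_set, map_remove, map_lookup, stream_txn_begin, stream_txn_commit) are hypothetical. The commit check is also an interpretation: the sketch keeps the snapshot taken at transaction start and treats the transaction as valid only if the global map still equals that snapshot.

```c
#include <stdbool.h>
#include <stddef.h>

enum { MAP_CAPACITY = 16 };

struct stream_entry { const void *stream; int fd; bool used; };
struct stream_map { struct stream_entry e[MAP_CAPACITY]; };

/* Snapshot at begin (for validation) plus the copy the transaction edits. */
struct stream_txn { struct stream_map snapshot; struct stream_map working; };

/* fopen/fdopen add an entry; freopen updates an existing one. */
bool map_set(struct stream_map *m, const void *stream, int fd)
{
    struct stream_entry *free_slot = NULL;
    for (size_t i = 0; i < MAP_CAPACITY; i++) {
        if (m->e[i].used && m->e[i].stream == stream) {
            m->e[i].fd = fd;
            return true;
        }
        if (!m->e[i].used && free_slot == NULL)
            free_slot = &m->e[i];
    }
    if (free_slot == NULL)
        return false;                 /* map is full */
    free_slot->stream = stream;
    free_slot->fd = fd;
    free_slot->used = true;
    return true;
}

/* fclose removes the entry keyed by the stream pointer. */
void map_remove(struct stream_map *m, const void *stream)
{
    for (size_t i = 0; i < MAP_CAPACITY; i++)
        if (m->e[i].used && m->e[i].stream == stream)
            m->e[i].used = false;
}

/* Other stream operations look up the descriptor; -1 means "no entry". */
int map_lookup(const struct stream_map *m, const void *stream)
{
    for (size_t i = 0; i < MAP_CAPACITY; i++)
        if (m->e[i].used && m->e[i].stream == stream)
            return m->e[i].fd;
    return -1;
}

static bool maps_equal(const struct stream_map *a, const struct stream_map *b)
{
    for (size_t i = 0; i < MAP_CAPACITY; i++) {
        if (a->e[i].used != b->e[i].used)
            return false;
        if (a->e[i].used && (a->e[i].stream != b->e[i].stream ||
                             a->e[i].fd != b->e[i].fd))
            return false;
    }
    return true;
}

void stream_txn_begin(struct stream_txn *tx, const struct stream_map *global)
{
    tx->snapshot = *global;
    tx->working = *global;
}

/* Commit succeeds only if no other transaction changed the global map. */
bool stream_txn_commit(struct stream_txn *tx, struct stream_map *global)
{
    if (!maps_equal(&tx->snapshot, global))
        return false;
    *global = tx->working;            /* copy the private map back */
    return true;
}
```

The slot-by-slot comparison is conservative: it may abort a transaction even when the global map is logically unchanged but laid out differently, which is safe but can cause unnecessary retries.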

Condition Variable Handling
Convoider instruments all condvar operations and reimplements condvars in user space in a similar way to the method of Wang et al. [46]. However, Convoider's solution is more portable because it needs no changes to the condvar programming interface. When a condvar's initialization operation is performed, Convoider allocates a page of memory from the shared memory pool reserved for condvars and stores the start address of this page in the first word of the condvar. All subsequent operations on the condvar are redirected to this page. As shown in Figure 5, this page p is organized into 512 slots, one double-word per slot. These slots are further separated into two groups: the control group and the waiting thread group. The former group contains two slots: slot 0 and slot 1. The first word p[0] of slot 0 is used to implement a customized lock: 1 indicates unlocked and 0 indicates locked. Any access to this page should be performed under the protection of this lock. The second word p[1] indicates the position of the next available slot. The value of p[1] monotonically increases from 0 and wraps around when it exceeds 511. For slot 1, the first word records the number of threads that wait on this condvar, and the second word is unused. The remaining slots fall into the second group. Each slot in this group is a tuple <tid, bsignaled>, where tid represents the waiting thread and bsignaled indicates whether this thread has been signaled by this "paged" condvar.
When a condvar's wait is performed, Convoider instruments it as follows: (a) committing the current transaction, (b) acquiring the lock p[0] of the paged condvar, (c) incrementing the number of waiting threads p[2] by one, (d) setting the slot indicated by p[1] to <self, false>, where self is the current thread's identifier, (e) spin-waiting until the current thread is signaled on this condvar by another thread, (f) decrementing p[2] by one, (g) releasing the lock p[0], and (h) starting a new transaction.
When a condvar's signal is performed, Convoider instruments it as follows: (a) committing the current transaction, (b) acquiring the page lock, (c) checking whether there is any waiting thread, (d) if there is one, randomly selecting a waiting thread and signaling it by setting its bsignaled to true, (e) releasing the lock, and (f) starting a new transaction.
The broadcast operation is instrumented in the same way, except that all waiting threads are signaled instead of only one.
When a condvar's destroy operation is performed, Convoider reclaims the page allocated for the condvar which is specified in the destroy's parameter.
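The paged condvar can be illustrated with the following C sketch. It is a simplified model under stated assumptions, not Convoider's code: the type and function names are hypothetical, a tid of 0 is assumed to mark an empty slot, p[1] is treated as an index into the waiting group, and signal picks the first unsignaled waiter instead of a random one so that the behavior is deterministic. The sketch also releases the page lock between polls, because a signaler has to take the same lock; the text does not spell out this detail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define NSLOTS 512            /* slots per page, as in the text        */
#define NWAIT  (NSLOTS - 2)   /* slots left after the two control slots */

typedef struct { uintptr_t tid; uintptr_t bsignaled; } wait_slot_t;

typedef struct {
    atomic_uintptr_t lock;      /* p[0]: 1 = unlocked, 0 = locked       */
    uintptr_t next;             /* p[1]: next waiting slot to use       */
    uintptr_t nwaiters;         /* p[2]: number of waiting threads      */
    uintptr_t unused;           /* p[3]: unused                         */
    wait_slot_t waiters[NWAIT]; /* <tid, bsignaled> tuples              */
} cv_page_t;

void cv_init(cv_page_t *p) {
    memset(p, 0, sizeof *p);
    atomic_init(&p->lock, 1);   /* start unlocked */
}

/* Spin until the lock word flips from 1 (unlocked) to 0 (locked). */
void cv_lock(cv_page_t *p) {
    uintptr_t expected = 1;
    while (!atomic_compare_exchange_weak(&p->lock, &expected, 0))
        expected = 1;
}

void cv_unlock(cv_page_t *p) { atomic_store(&p->lock, 1); }

/* Wait, steps (b)-(d): register the caller and return its slot index.
 * A full implementation would also guard against overwriting a slot
 * that is still occupied after wraparound. */
int cv_enqueue(cv_page_t *p, uintptr_t tid) {
    cv_lock(p);
    int idx = (int)p->next;
    p->waiters[idx].tid = tid;
    p->waiters[idx].bsignaled = 0;
    p->next = (p->next + 1) % NWAIT;
    p->nwaiters++;
    cv_unlock(p);
    return idx;
}

/* Wait, steps (e)-(g): one poll of the spin loop. Returns 1 and frees
 * the slot once the waiter has been signaled, 0 otherwise. */
int cv_poll(cv_page_t *p, int idx) {
    cv_lock(p);
    int signaled = p->waiters[idx].bsignaled != 0;
    if (signaled) {
        p->waiters[idx].tid = 0;
        p->nwaiters--;
    }
    cv_unlock(p);
    return signaled;
}

/* Mark one waiter (all == 0) or every waiter (all != 0) as signaled. */
static int cv_wake(cv_page_t *p, int all) {
    int woken = 0;
    cv_lock(p);
    for (int i = 0; i < NWAIT && p->nwaiters > 0; i++) {
        wait_slot_t *s = &p->waiters[i];
        if (s->tid != 0 && !s->bsignaled) {
            s->bsignaled = 1;
            woken++;
            if (!all) break;
        }
    }
    cv_unlock(p);
    return woken;
}

int cv_signal(cv_page_t *p)    { return cv_wake(p, 0); }
int cv_broadcast(cv_page_t *p) { return cv_wake(p, 1); }
```

A waiting thread would call cv_enqueue once and then loop on cv_poll until it returns 1; the transaction commit before the wait and the transaction start after it are omitted here.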

Transaction Execution
A transaction can start, commit, or abort. When an application is about to execute, Convoider kicks off the first transaction for it after the necessary initializations are finished. These initializations include saving the execution context, creating the memory mapping, creating the version mapping, and establishing the virtual file system and the stream-to-descriptor map.
On starting, a transaction first checks whether there are any blocked signals sent by other transactions. If so, the transaction receives and handles them as described in Section 3.1.3. Then, the transaction saves the current stack frames and CPU registers, creates private memory/version mappings, and establishes the private virtual file system and stream-to-descriptor map from their global counterparts. The transaction also sets the protection of each page in its private mappings to PROT_NONE and clears its read/write page sets. Additionally, for each file in the private virtual file system, the transaction initializes its special records (Section 3.1.1) with proper values and applies its attributes to the corresponding real file, thus keeping the real file consistent with the virtual file.
On committing, a transaction first performs consistency checks on the memory and files it accessed during its execution. If all checks pass, the transaction commits memory and files as described in Sections 2.2 and 3. Each transaction commits independently, without waiting for descendant transactions to commit first. Because transactions commit in no particular order, Convoider cannot guarantee that every order violation is prevented.
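As an illustration of a commit-time consistency check over pages, the following C sketch uses per-page version numbers. It is an assumption-laden model: the actual checks are defined in Section 2.2, and every name here (global_version, page_tx_t, page_tx_commit) as well as the exact validation rule is hypothetical. A transaction records the versions it started from, notes which pages it reads and writes, and commits only if none of those pages was committed by another transaction in the meantime.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPAGES 16   /* hypothetical number of tracked pages */

/* Committed version of every page; bumped on each committed write. */
static unsigned global_version[NPAGES];

typedef struct {
    unsigned seen[NPAGES];     /* versions observed at transaction start */
    bool read_set[NPAGES];     /* pages read by this transaction         */
    bool write_set[NPAGES];    /* pages written by this transaction      */
} page_tx_t;

/* Start: snapshot the versions and clear the read/write page sets. */
void page_tx_begin(page_tx_t *tx) {
    memcpy(tx->seen, global_version, sizeof tx->seen);
    memset(tx->read_set, 0, sizeof tx->read_set);
    memset(tx->write_set, 0, sizeof tx->write_set);
}

void page_tx_read(page_tx_t *tx, int page)  { tx->read_set[page] = true; }
void page_tx_write(page_tx_t *tx, int page) { tx->write_set[page] = true; }

/* Commit: validate every accessed page, then publish the writes.
 * Returns 1 on success and 0 if the transaction must roll back. */
int page_tx_commit(page_tx_t *tx) {
    for (int p = 0; p < NPAGES; p++) {
        bool touched = tx->read_set[p] || tx->write_set[p];
        if (touched && global_version[p] != tx->seen[p])
            return 0;                      /* conflict: stale page */
    }
    for (int p = 0; p < NPAGES; p++)
        if (tx->write_set[p])
            global_version[p]++;           /* make the write visible */
    return 1;
}
```

With two transactions that both read and write the same page, the first to reach its commit point succeeds and the second fails validation, re-executes from a fresh snapshot, and then commits.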
On aborting, a transaction discards any memory updates by calling the madvise function with advice MADV_DONTNEED on all of its private mappings, and it abandons any buffered writes by clearing the write list of each file in the private file system. The transaction also empties the list of interprocess file objects and closes/unlinks any files opened/created during its execution. Then, it releases the interprocess lock mentioned in Section 3.1.3 if the lock is held. Finally, the transaction restores the stack frames saved on starting for the current process and rolls back to the start point.
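The memory part of the abort can be demonstrated with a small C sketch for Linux. It assumes a private (copy-on-write) mapping of a backing file, which is how private mappings of this kind are commonly set up; the function name and the use of tmpfile as backing store are illustrative choices, not details taken from Convoider.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Write a committed byte to a backing file, modify a private mapping
 * of it, discard the modification with MADV_DONTNEED, and return the
 * byte seen afterwards (or -1 on error). */
int rollback_private_mapping(void) {
    long page = sysconf(_SC_PAGESIZE);
    FILE *backing = tmpfile();                /* stand-in for shared memory */
    if (backing == NULL) return -1;
    int fd = fileno(backing);
    int result = -1;

    if (ftruncate(fd, page) == 0 && pwrite(fd, "A", 1, 0) == 1) {
        char *priv = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE, fd, 0);
        if (priv != MAP_FAILED) {
            priv[0] = 'B';                    /* speculative, private update */
            /* Abort: drop the private copy of the page. The next access
             * faults the committed contents back in from the file. */
            if (madvise(priv, (size_t)page, MADV_DONTNEED) == 0)
                result = priv[0];
            munmap(priv, (size_t)page);
        }
    }
    fclose(backing);
    return result;
}
```

On Linux, the value read after the madvise call is the committed byte 'A', not the speculative 'B', which is the effect the abort path relies on.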

Evaluation
We have implemented Convoider on Ubuntu 12.04 as a dynamic shared library. Its goal is to transparently transactionalize multithreaded programs at no human cost while providing revocable I/O support for regular/character files, handling condvars properly, and avoiding concurrency bugs such as deadlocks, data races, and atomicity violations. We evaluate Convoider against a real-world application suite and a concurrency bug suite to answer the following research questions:  [47], PARSEC [32], SPLASH2 [33], and Phoenix [34] to evaluate its applicability. STAMP is a benchmark suite designed for transactional memory research. It consists of eight macro applications and several micro applications, such as rbtree, list, and hashtable. All these applications are originally instrumented with low-level transactional memory API calls. However, Convoider is designed to transactionalize applications without needing any instrumentation. So, Convoider is not applicable to any STAMP application unless we manually remove the transactional memory API instrumentation and convert these STM-based applications into lock-based applications.
PARSEC is a benchmark suite for shared-memory on-chip multiprocessor architectures and contains thirteen multithreaded, memory-intensive applications. Five of them (fluidanimate, raytrace, bodytrack, canneal, and streamcluster) use pthread barriers for synchronization; thus, Convoider is not applicable to them. Among the others, one application (freqmine) is written with OpenMP instead of the pthread library, and the linking procedures of four applications (facesim, vips, x264, and ferret) are too complicated for Convoider to cope with. Of the remaining three applications (blackscholes, dedup, and swaptions), Convoider is applicable only to swaptions. Convoider cannot transactionalize dedup because it requires an amount of memory that exceeds Convoider's limits: when running with Convoider, dedup reports a memory allocation failure warning and terminates. Convoider cannot transactionalize blackscholes because it smashes the stack when running with Convoider.
For SPLASH2, we test Convoider on its four kernel applications: lu, fft, radix, and cholesky. These applications are also memory-intensive and CPU-intensive, and they synchronize threads by using pthread mutexes and condvars. Results show that Convoider works well with lu, fft, and radix but does not work with cholesky. Convoider is applicable to the serial version of cholesky. However, Convoider causes multithreaded cholesky to block after it partitions the input into blocks and launches multiple threads to process them. After a careful check, we find that, besides standard pthread synchronization facilities, cholesky also uses ad hoc synchronization constructs (for example, reading/writing shared flags) to synchronize threads. Convoider keeps memory changes private to a transaction until it commits. If two threads that run two transactions use shared flags to communicate, the reading thread may wait forever because it cannot see the new value written by the writing thread without aborting and re-executing.
We also test Convoider on Phoenix, which contains eight applications in total: histogram, kmeans, linear_regression, matrix_multiply, string_match, word_count, reverse_index, and pca. All applications are memory-intensive and CPU-intensive. Two of them synchronize with pthread mutexes. Testing results show that Convoider is applicable to all of them. Note that although the reverse_index application performs directory manipulation operations such as opendir and readdir, Convoider can still be applied to it because those operations are called in the first transaction, which will definitely succeed in committing.
Table 3 lists all twelve applications with which Convoider can work. Among these applications, swaptions is written in C++ and the others are written in C. Although all applications are multithreaded and memory-intensive, seven of them (swaptions, histogram, kmeans, linear_regression, matrix_multiply, string_match, and word_count) are embarrassingly parallel, meaning that no memory is shared among threads and each thread runs independently without synchronizing with others. And, although all applications read/write files, only five of them (lu, fft, radix, string_match, and reverse_index) perform concurrent I/O operations, meaning that file accesses are carried out in threads that may run simultaneously.

Efficiency on Real-World Applications.
In this section, we evaluate Convoider's performance on the twelve real-world applications listed in Table 3. For each application, we compare Convoider with pthread, Grace [11], and Dthreads [39] on the speedup achieved over the sequential version of the application. Dthreads is a deterministic concurrency system and an improvement on Grace. As Figure 6 shows, Convoider performs worse than pthread and better than sequential execution. Averaged over the twelve applications, the speedup ratio between pthread and Convoider is 2.58 : 1.86; that is, Convoider incurs 28% runtime overhead on the original multithreaded applications. Although Convoider slows down the target applications considerably, it succeeds in automatically transactionalizing all of them with almost no human effort and runs them correctly, without wrong outputs or crashes. Among the twelve applications, Convoider can be directly applied to swaptions, lu, fft, radix, kmeans, and pca. For the remainder, we need to make minor (one or two) modifications to replace the file mapping operation mmap with two operations: the memory allocation operation malloc and the file access operation read/write. Each such modification needs only three lines of code. Convoider does not intercept mmap and is unaware of memory allocated by it, so without such modifications Convoider would not provide revocability for that memory.
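A short worked calculation, offered as a reading aid and not as the authors' derivation, shows how the 28% figure relates to the reported average speedups. The symbols S (speedup over sequential execution) and T (running time) are introduced here for illustration only. The figure matches the relative loss of speedup; expressed as an increase in running time, the same two speedups would give a larger number.

```latex
\[
1 - \frac{S_{\text{Convoider}}}{S_{\text{pthread}}}
  = 1 - \frac{1.86}{2.58} \approx 0.279 \approx 28\%
\]
\[
\frac{T_{\text{Convoider}}}{T_{\text{pthread}}} - 1
  = \frac{S_{\text{pthread}}}{S_{\text{Convoider}}} - 1
  = \frac{2.58}{1.86} - 1 \approx 0.387
\]
```

The same relative-speedup formula reproduces the 7.3% figure quoted for Dthreads against Convoider: 1 - 1.78/1.92 is approximately 0.073.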
In contrast, Grace is perfectly applicable to only two applications: swaptions and pca. For these two applications, Grace achieves about the same speedup as Convoider: the ratio is 3.19 : 2.85 on average. Grace performs a little better than Convoider because it does not support revocable I/O, while Convoider provides revocable I/O support for three system files (STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO) and three file streams (stdin, stdout, and stderr). For histogram, linear_regression, matrix_multiply, and word_count, although Grace runs them without runtime errors or crashes, they terminate with wrong results. So, the speedup achieved by Grace for these applications may be overestimated. As for the other applications, Grace makes lu, fft, and radix hang forever because they synchronize with condition variables, which Grace does not handle. Grace causes kmeans and reverse_index to crash because of failed assertions or illegal memory accesses; Grace's problematic memory management is to blame for these crashes. Lastly, Grace is also not applicable to string_match because, when running with Grace, this application terminates instantly without any meaningful output.
Dthreads is perfectly applicable to nine applications: swaptions, histogram, kmeans, linear_regression, matrix_multiply, pca, string_match, word_count, and reverse_index. For these nine applications, Dthreads runs 7.3% slower than Convoider: the speedup ratio is 1.78 : 1.92. Dthreads performs a little worse than Convoider because it uses a single global token to guarantee determinism. In Dthreads, before performing a lock/unlock or condvar wait/signal operation, a thread must first exclusively acquire the global token, which degrades performance. Dthreads makes lu, fft, and radix hang forever, although it is claimed to handle locks and condition variables properly.

Effectiveness on Avoiding Concurrency Bugs.
To evaluate Convoider's concurrency bug avoidance capability, we create a bug suite of thirty-one concurrency bugs classified into four categories: deadlock, data race, atomicity violation, and order violation. All bugs are listed in Table 5, together with their corresponding buggy programs, sources, descriptions, and categories. Among these bugs, fifteen (bug#5, bug#6, bug#18, and bug#20-bug#31) are drawn from actual bugs described in previous work on concurrency bug detection [35,37] and avoidance [11,38], while two (bug#3 and bug#4) are real bugs without any changes. Our suite also includes four toy bugs [11,35,37,38] (bug#1, bug#2, bug#16, and bug#17) created for research purposes. The remaining ten bugs (bug#7-bug#15 and bug#19) are all collected from Google's data-race-test suite. Because concurrency errors are by nature nondeterministic and occur only for particular thread interleavings, we insert delays (via usleep) at key points in the code. The delays make these bugs occur with almost 100% likelihood. Thus, all bugs can be easily triggered by directly running the corresponding buggy programs, except bug#3 and bug#4. For these two real bugs, we write and run triggering code to trigger them. In our experiments, we apply Convoider as well as other avoidance tools to these bugs. Given a bug, we apply a tool to it 10 times. If the tool fails to avoid the bug in even one of these 10 runs, we say that the tool cannot avoid that bug.

Effectiveness on Avoiding Deadlocks.
We use Convoider to avoid deadlock bugs bug#1-bug#9 and compare Convoider's avoidance capability with three state-of-the-art deadlock avoidance tools: Grace [11], Dimmunix [40], and Slider [41]. We do not compare Convoider with Sammati [48] because it can only be compiled and run in a 64-bit environment, while our platform is a 32-bit OS. However, as far as we know from [48], Sammati can only avoid mutex deadlocks and cannot provide revocability for I/O operations. The experimental results are shown in Table 6, where the second column lists the types of the corresponding deadlock bugs.
As seen from Table 6, Convoider ties with Slider for the strongest deadlock avoidance capability among the four tools: all deadlock bugs except bug#9 are perfectly avoided, and the buggy programs terminate with the expected behaviors. By comparison, Dimmunix and Grace perfectly avoid only five and three bugs, respectively. Dimmunix is an offline deadlock avoider. It can only avoid deadlocks caused by mutexes, such as bug#1, bug#3, bug#4, bug#7, and bug#8. However, although bug#2 is caused by mutexes, Dimmunix fails to avoid it. We carefully checked Dimmunix's source code and found that its implementation of the lock-free queue contains data race bugs, which make Dimmunix fail to avoid bug#2. Grace avoids deadlocks by nullifying lock/unlock operations on mutexes/rwlocks, the same as Convoider does. However, unlike Convoider, Grace does not support revocable I/O or proper condvar handling.
As a result, Grace cannot perfectly avoid bugs such as bug#1 and bug#4-bug#6. For these bugs, the corresponding buggy programs terminate normally when running with Grace but with wrong outputs. The reason is that the threads involved in any such bug all print messages to the screen during the construction of the bug.
This leads to conflicts when the threads commit their transactions. In such a case, Grace rolls back the victim threads but cannot revoke the effects of the print operations executed by those threads. So, under Grace, these deadlocks are avoided but wrong output is generated. We also find, to our surprise, that Grace fails to avoid bug#2, a simple mutex deadlock: the buggy program cannot terminate when running with Grace. It seems likely that Grace has trouble revoking memory effects when rolling back the victim threads involved in this bug.
We note that none of the tools can avoid bug#9. As shown in Figure 7, this bug is actually an acyclic deadlock [49]. Each thread is waiting on a condvar, but no thread sends signals to these condvars; thus, each thread waits forever. Because Dimmunix and Grace do not handle condvars, they can do nothing to avoid this bug. Although Convoider and Slider rewrite condvars in user space, they keep the semantics of the rewritten condvars the same as the original ones, so they cannot avoid this bug either. We conclude that, at least for the deadlock bugs listed in Table 6, Convoider can successfully avoid deadlocks except the acyclic ones and is one of the most powerful deadlock avoidance tools.

Effectiveness on Avoiding Data Races.
We apply Convoider to data races bug#10-bug#17 and compare Convoider with Grace in terms of their avoidance capabilities. In every race except bug#17, each thread prints a prompt by calling fprintf when it finishes. The experimental results are shown in Table 7.
From Table 7, we see that Convoider correctly avoids all races, while Grace produces wrong output for bug#16 although it also avoids all races. However, we expected Grace to cause wrong outputs for bug#13 and bug#14 and to fail to avoid bug#15. We carefully checked the buggy programs and found that the way the involved threads are created is responsible for Grace's success in perfectly avoiding these races. In the data-race-test suite, after a child thread is created, the main thread accesses the same memory page as the new child. This triggers Grace's sequential commit protocol, which makes the main thread commit only after the child thread finishes and commits. Thus, all threads complete in their creation order and produce correct output.
We are surprised that Grace could successfully avoid bug#15. This bug, as shown in Figure 8, involves condvars, which Grace cannot handle. In Figure 8, the main thread creates two child threads t1 and t2: t1 writes global variables GLOB and COND and sends a signal to a condvar cv, while t2 waits on cv until COND becomes 1 and then sets GLOB to 2. Because of the sequential commit protocol, Grace makes t2 execute after t1 finishes. When t2 reaches line L14, it finds that COND is 1. Thus, it continues to execute the remaining instructions without waiting on cv. Therefore, data races on GLOB are avoided and correct prompts are output. However, if we change the code so that t1 is created after t2, Grace will make the program hang on cv forever, while Convoider can still avoid the data races because it permits concurrent execution of these two threads.
Bugs such as bug#10-bug#12 are simple races: each race involves two threads, each of which performs a read or a write on a shared variable. Convoider and Grace avoid these races in different ways. Grace avoids them by executing threads sequentially, as stated above. Thus, races are impossible. Convoider, however, executes threads concurrently. Because each thread performs only one access to the page where the shared variable is located, Convoider records the page only in the read sets and finds no conflicts between threads (or transactions) when committing (Section 2.2.1). So Convoider commits these transactions one by one, thus avoiding races.
Bugs such as bug#16 and bug#17 are not only races but also atomicity violations. Once they happen, the corresponding buggy programs will end with a segmentation fault.
As shown in Table 8, Convoider perfectly avoids all atomicity violations, while Grace perfectly avoids only four bugs. Grace causes wrong outputs for bug#19-bug#21, bug#23, bug#27, and bug#28 and fails to avoid bug#24. For the former case, we take bug#27, shown in Figure 9, as an example to illustrate how Grace causes wrong output because it lacks support for revocable I/O. Threads t1 and t2 in Figure 9 print prompts at their entry and exit points. Bug#27 occurs when thread t1 is performing cache resizing while another thread t2 is storing SQL queries into the same cache.
The cache resizing (in function resize) is not atomic with respect to both query_cache and query_cache_size, making it possible for other threads to read an intermediate state of query_cache_size. For example, after t1 calls free_cache to free query_cache, it executes L04 to temporarily set query_cache_size to arg before calling init_cache to set it to a meaningful value. However, during this gap (L05), thread t2 may interleave and find that query_cache_size is not 0, so it calls write_block_data to write to the memory pointed to by query_cache, triggering a segmentation fault.
When running under Convoider and Grace, threads t1 and t2 are treated as two concurrent transactions. Suppose t2 runs immediately after t1. At the beginning, both transactions observe a non-NULL query_cache and a nonzero query_cache_size (L32-L35). Then t1 and t2 concurrently read/write these two global variables. Suppose t1 reaches its commit point before t2. Because no already-committed transaction conflicts with t1, it successfully commits and terminates. However, when t2 tries to commit, it finds that it conflicts with t1 because it has read shared variables updated by t1. So it rolls back and re-executes. At this point, Convoider removes all messages buffered in t2's output buffer, while Grace can do nothing to revoke the already-printed messages. Finally, t2 outputs its prompts again, which leads to wrong outputs under Grace.
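The difference between buffered and immediate output can be sketched in C as follows. This is a hypothetical model of a per-transaction output buffer, not Convoider's actual I/O layer: the names tx_out_t, tx_printf, tx_out_abort, and tx_out_commit are invented for illustration, and an in-memory array named screen stands in for the terminal.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define BUF_CAP 4096   /* hypothetical buffer capacity */

/* Private output buffer of one transaction. */
typedef struct { char data[BUF_CAP]; size_t len; } tx_out_t;

static char screen[BUF_CAP];   /* stand-in for the real terminal */
static size_t screen_len;

/* Buffer formatted output privately; nothing becomes visible yet.
 * Returns the number of bytes buffered, or -1 if it does not fit. */
int tx_printf(tx_out_t *tx, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(tx->data + tx->len, BUF_CAP - tx->len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= BUF_CAP - tx->len)
        return -1;             /* message would be truncated: drop it */
    tx->len += (size_t)n;
    return n;
}

/* Abort: discard everything the transaction tried to print. */
void tx_out_abort(tx_out_t *tx) { tx->len = 0; }

/* Commit: make the buffered output visible, then empty the buffer.
 * Returns 0 on success, -1 if the screen stand-in is full. */
int tx_out_commit(tx_out_t *tx) {
    if (tx->len >= BUF_CAP - screen_len) return -1;
    memcpy(screen + screen_len, tx->data, tx->len);
    screen_len += tx->len;
    screen[screen_len] = '\0';
    tx->len = 0;
    return 0;
}
```

If t2 prints its entry prompt, aborts, and then re-executes and commits, the prompt reaches the screen once. Without the buffer, the first attempt's prompt would already be on the screen when the second attempt prints it again.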
Another noteworthy point is that Grace cannot avoid bug#24 (shown in Figure 10) while Convoider can. This bug lies in the Apache MPM worker subsystem. In this subsystem, there is a listener thread, which accepts socket connections and dispatches them, and some worker threads, which get connections from the listener thread and do the actual job.
The listener thread and the worker threads communicate through a queue. When a connection is accepted by the listener thread, an element is pushed into the queue, and a worker thread pops the element from the queue and does the job.
The queue keeps track of the number of idle worker threads in the system. If the number of idlers reaches 0, the listener stops pushing elements into the queue and waits on a condvar queue_info->wait_for_idler (L09-L15). Whenever a worker thread finishes its job, it raises a signal if it finds that the number of current idlers is 0 (L25-L30). This bug happens when the following steps are taken: (1) Initially, the listener thread has just accepted a connection and set idlers to 0. (2) A worker thread finishes its job and calls ap_queue_info_set_idle to increment idlers from 0 to 1. (3) The listener thread sees that idlers is 1, so it decreases it to 0, gets another connection, and then waits for idle worker threads in function ap_queue_info_wait_for_idler. (4) The worker thread resumes its execution and issues a conditional signal. (5) The listener thread is woken up by the signal just issued by the worker thread and sets idlers to -1, causing an assertion failure.
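The five steps above can be replayed deterministically with a single-threaded C sketch. The model is deliberately simplified and hypothetical: it is not the Apache source, the function names are invented, and blocking on the condvar is represented by a flag instead of a real wait.

```c
#include <assert.h>

/* Simplified, hypothetical model of the queue state in bug#24. */
typedef struct { int idlers; int signal_pending; } queue_info_t;

/* Worker, first half: remember whether idlers was 0, then increment. */
void worker_increment(queue_info_t *q, int *saw_zero) {
    *saw_zero = (q->idlers == 0);
    q->idlers++;
}

/* Worker, second half: signal the listener if idlers had been 0. */
void worker_signal(queue_info_t *q, int saw_zero) {
    if (saw_zero) q->signal_pending = 1;
}

/* Listener: claim one idle worker by decrementing the counter. */
void listener_take(queue_info_t *q) { q->idlers--; }

/* Replay the buggy interleaving and return the final idlers value. */
int replay_bug24(void) {
    queue_info_t q = { 0, 0 };            /* (1) listener left idlers at 0 */
    int saw_zero;

    worker_increment(&q, &saw_zero);      /* (2) 0 -> 1, worker is preempted
                                                 before it can signal       */
    listener_take(&q);                    /* (3) listener sees 1, 1 -> 0    */
    int blocked = (q.idlers == 0);        /*     next request: must wait    */

    worker_signal(&q, saw_zero);          /* (4) worker resumes and signals */

    if (blocked && q.signal_pending) {    /* (5) listener wakes up and      */
        q.signal_pending = 0;             /*     decrements without         */
        listener_take(&q);                /*     rechecking: 0 -> -1        */
    }
    return q.idlers;
}
```

The replay ends with idlers at -1, the value that trips the assertion in the original program. The increment and the signal of the worker are not atomic with respect to the listener, which is what makes this an atomicity violation.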
Grace fails to avoid bug#24 because it does not properly handle condvar operations in the transaction environment. Because it does not treat conditional signals/waits as demarcation points of transactions, Grace makes a transaction keep its updates to shared variables private until it commits and prevents other transactions from seeing up-to-date values of shared variables. Looking at bug#24's code in Figure 10, suppose that queue_info->idlers is initially 2 before t1 and t2 run. When t2 runs, it consecutively accepts two connections, pushes two elements into the worker queue, and decreases queue_info->idlers to 0. When another connection comes, t2 waits on queue_info->wait_for_idler for idle workers because it finds that queue_info->idlers is 0. Suppose t1 runs at this time. It will see that queue_info->idlers is 2 because the updates made by t2 remain private to its uncommitted transaction.
In contrast to Grace, Convoider can perfectly avoid bug#24 thanks to its appropriate condvar handling described in Section 4. Convoider views condvar signal/wait operations as demarcations of transactions. So, in Figure 10, when t2 decreases queue_info->idlers to 0 and waits on queue_info->wait_for_idler, it has already committed its updates to shared variables to the global memory. Therefore, when t1 runs, it finds that queue_info->idlers is 0 and then issues a signal to queue_info->wait_for_idler, letting t2 proceed.
We conclude that, at least for the atomicity violation bugs listed in Table 8, Convoider can perfectly avoid atomicity violations. And owing to its revocable I/O support and proper condvar handling, Convoider can correctly avoid bugs involving condvar and I/O operations.

Effectiveness on Avoiding Order Violations.
We apply Convoider to order violations bug#29-bug#31 and compare Convoider with Grace in terms of their avoidance capabilities. If no avoidance measures are taken, bug#29 and bug#30 will trigger segmentation faults while bug#31 will lead to unexpected outputs. After applying Convoider and Grace to them, we list the experiment results in Table 9.

As shown in Table 9, both Convoider and Grace successfully avoid bug#30 and bug#31 but fail to avoid bug#29. According to Lu et al. [4], an order violation occurs when two operations (or two groups of operations) from two different threads are executed in an undesirable order. Convoider does not impose order control on transactions, so it can avoid order violations only with probability 0.5. In most cases, Grace does not guarantee that transactions finish in a specific order either. The only exception is that if a parent thread accesses any global or heap memory pages after creating a child thread, Grace's sequential commit protocol is triggered and the parent thread pauses until the child thread successfully commits. In that case, the child thread definitely finishes before the parent thread.
We take bug#29 and bug#30 as examples to illustrate how Convoider and Grace fail and succeed in avoiding them, respectively. For bug#29, shown in Figure 11, the global variable global_opt may be referenced by thread t2 (L07-L09) before it is initialized with valid values in t1 (L04), leading to a segmentation fault. This bug cannot be avoided by Convoider or Grace because t2 almost always runs concurrently with t1 and reads the uninitialized value NULL of global_opt. For bug#30, shown in Figure 12, the global variable h->bandwidth may also be referenced by t2 (L08) before it is initialized with meaningful values in t1 (L04). However, both Convoider and Grace are able to avoid this bug because t1 always finishes before t2. If t1 and t2 ran concurrently, bug#30 would be triggered and could not be avoided by Convoider or Grace.
We conclude that, according to the experiment results, Convoider cannot guarantee that order violations are always avoided.

Conclusion
This paper presents Convoider, a runtime STM system with proper condvar handling and revocable I/O support, for transparently transactionalizing multithreaded C/C++ applications and dynamically avoiding concurrency bugs such as deadlocks, data races, and atomicity violations.
We perform three experiments to evaluate Convoider and compare it with Grace, Dthreads, Dimmunix, and Slider in terms of applicability, efficiency, and effectiveness. Evaluation results show that Convoider succeeds in transparently transactionalizing twelve real-world applications while incurring only 28% runtime overhead on average, and that it perfectly avoids 94% of the thirty-one concurrency bugs used in our experiments. Comparison results show that Convoider can correctly transactionalize many more applications and avoid many more concurrency bugs than the other tools. This study can help efficiently transactionalize legacy multithreaded applications and effectively improve their runtime reliability.

(2) Revocable I/O support in Convoider enables making I/O calls in transactions.
(3) Proper condvar handling in Convoider allows using them in transactions.
(4) Experiments are carried out for evaluating Convoider's efficiency and effectiveness in transactionalizing real-world applications and avoiding concurrency bugs.

Figure 4: The mechanism to make system I/O revocable.
subheaps in heap memory. If two threads that execute two concurrent transactions are
Figure 3: The heap memory management mechanism.
3.1.1. Virtual File System. Convoider turns threads into processes, so each thread is actually a process. In Linux, each process has its own file descriptor table, maintained by the kernel. Associated with each file descriptor are a file descriptor flag and a pointer to an open file table entry. The open file table is maintained for all open files, and each entry in it has three fields: a file status flag (O_RDONLY, O_APPEND, and so on), a current file offset, and a pointer to an entry in the v-node table. Each open file has a v-node structure that indicates the type of file and other in-
Strategies. I/O operations that open, create, or duplicate files, such as open, creat, and dup (Table

Table 2: Categories of system I/O operations.
receives signals.Otherwise, the process receives signals when it begins a new transaction.
Dthreads improves on Grace in handling condition variables, just as Convoider does, but in a different way. Unlike Convoider, Dthreads neither supports revocable I/O nor avoids concurrency bugs. The input parameter for each application has two versions: sequential

Table 3: Applications to which Convoider is applicable.

Table 4: Input parameters for 12 real-world applications.

Table 5: 31 concurrency bugs and their related information.

Table 7: Data races avoidance results by Convoider and Grace ( : success; : wrong output).
Both Convoider and Grace successfully avoid such crashes. However, Grace produces wrong output for bug#16 because it does not support revocable I/O. We conclude that, at least for the race bugs listed in Table 7, Convoider can perfectly avoid races. And owing to its revocable I/O support, Convoider does not cause wrong outputs when avoiding races in racy programs.
6.4.3. Effectiveness on Avoiding Atomicity Violations.
We apply Convoider to atomicity violations bug#18-bug#28 and compare Convoider with Grace in terms of their avoidance capabilities. If no avoidance measures are taken, bug#18 and bug#27 will cause segmentation faults in the buggy programs containing them, bug#20-bug#22 and bug#24 will cause assertion failures, bug#19, bug#25, and bug#28 will cause unexpected outputs, bug#23 will cause a buffer overflow error, and lastly bug#26 will cause a double free error. After using Convoider and Grace to avoid these bugs, we list the experiment results in Table 8.