Writing Reentrant and Thread-Safe Code
In single-threaded processes, only one flow of
control exists. The code executed by these processes thus need not be
reentrant or thread-safe. In multi-threaded programs, the same functions
and the same resources may be accessed concurrently by several flows of
control. To protect resource integrity, code written for multi-threaded
programs must be reentrant and thread-safe.
Reentrance and thread safety are both related to
the way that functions handle resources. Reentrance and thread safety
are separate concepts: a function can be reentrant, thread-safe,
both, or neither.
This section provides information about writing
reentrant and thread-safe programs. It does not cover the topic of
writing thread-efficient programs. Thread-efficient programs are
efficiently parallelized programs. You must consider thread efficiency
during the design of the program. Existing single-threaded programs can
be made thread-efficient, but this requires that they be completely
redesigned and rewritten.
Reentrance
A reentrant function does not hold static data
over successive calls, nor does it return a pointer to static data. All
data is provided by the caller of the function. A reentrant function
must not call non-reentrant functions.
A non-reentrant function can often, but not always, be identified by its external interface and its usage. For example, the strtok subroutine is not reentrant, because it holds the string to be broken into tokens. The ctime subroutine is also not reentrant; it returns a pointer to static data that is overwritten by each call.
Thread Safety
A thread-safe function protects shared resources
from concurrent access by locks. Thread safety concerns only the
implementation of a function and does not affect its external interface.
In the C language, local variables are allocated automatically
on the stack, so each call gets its own copy. Therefore, any function
that does not use static data or other shared resources is trivially
thread-safe, as in the following example:
/* thread-safe function */
int diff(int x, int y)
{
int delta;
delta = y - x;
if (delta < 0)
delta = -delta;
return delta;
}
The use of global data is thread-unsafe. Global
data should be maintained per thread or encapsulated, so that its access
can be serialized. For example, if an error code were kept in a single
global variable, a thread could read an error code corresponding to an
error caused by another thread. For this reason, in AIX, each thread has
its own errno value.
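As a minimal sketch of data "maintained per thread", the following fragment uses the C11 _Thread_local storage class. This is an assumption: the text does not prescribe a mechanism, and pthread_key_create with pthread_setspecific would serve the same purpose. The names last_error, worker, and per_thread_demo are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread error slot: every thread gets its own copy. */
static _Thread_local int last_error = 0;

/* Worker: records the value it sees on entry, then overwrites its own copy. */
static void *worker(void *arg)
{
    int *seen_on_entry = arg;

    *seen_on_entry = last_error;   /* a fresh copy, so this is 0 */
    last_error = 42;               /* affects the worker thread only */
    return NULL;
}

/* Returns the calling thread's last_error after a worker has written
 * its own copy, or -1 if the worker could not be created or joined.
 * The value the worker saw on entry is stored in *worker_initial. */
int per_thread_demo(int *worker_initial)
{
    pthread_t tid;

    last_error = 7;                /* the caller's private value */
    if (pthread_create(&tid, NULL, worker, worker_initial) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return last_error;             /* unchanged by the worker */
}
```

Because each thread owns its copy, no lock is needed to read or write last_error. The trade-off is that one thread cannot inspect another thread's value.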
Making a Function Reentrant
In most cases, non-reentrant functions must be
replaced by functions with a modified interface to be reentrant.
Non-reentrant functions cannot be used by multiple threads. Furthermore,
it may be impossible to make a non-reentrant function thread-safe.
Returning Data
Many non-reentrant functions return a pointer to static data. This can be avoided in the following ways:
- Returning dynamically allocated data. In this case, it will be the caller's responsibility to free the storage. The benefit is that the interface does not need to be modified. However, backward compatibility is not ensured; existing single-threaded programs using the modified functions without changes would not free the storage, leading to memory leaks.
- Using caller-provided storage. This method is recommended, although the interface must be modified.
For example, a strtoupper function, converting a string to uppercase, could be implemented as in the following code fragment:
/* non-reentrant function */
char *strtoupper(char *string)
{
static char buffer[MAX_STRING_SIZE];
int index;
for (index = 0; string[index]; index++)
buffer[index] = toupper(string[index]);
buffer[index] = 0;
return buffer;
}
This function is not reentrant (nor thread-safe).
To make the function reentrant by returning dynamically allocated data,
the function would be similar to the following code fragment:
/* reentrant function (a poor solution) */
char *strtoupper(char *string)
{
char *buffer;
int index;
/* error-checking should be performed! */
buffer = malloc(MAX_STRING_SIZE);
for (index = 0; string[index]; index++)
buffer[index] = toupper(string[index]);
buffer[index] = 0;
return buffer;
}
A better solution consists of modifying the
interface. The caller must provide the storage for both input and output
strings, as in the following code fragment:
/* reentrant function (a better solution) */
char *strtoupper_r(char *in_str, char *out_str)
{
int index;
for (index = 0; in_str[index]; index++)
out_str[index] = toupper(in_str[index]);
out_str[index] = 0;
return out_str;
}
The non-reentrant standard C library subroutines were made reentrant using caller-provided storage. This is discussed in Reentrant and Thread-Safe Libraries.
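With caller-provided storage, the function cannot know how large the output buffer is unless the caller says so. The following sketch adds a size argument to the interface. The name strtoupper_rn and the choice to return NULL when the buffer is too small are assumptions made for illustration, not part of any library.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical bounded variant of strtoupper_r.
 * Returns out_str on success, or NULL if out_str (out_size bytes)
 * cannot hold the converted string plus its terminating null byte. */
char *strtoupper_rn(const char *in_str, char *out_str, size_t out_size)
{
    size_t index;

    if (in_str == NULL || out_str == NULL || out_size == 0)
        return NULL;
    for (index = 0; in_str[index] != '\0'; index++) {
        if (index + 1 >= out_size)
            return NULL;   /* no room for this character and the terminator */
        /* the cast avoids undefined behavior for negative char values */
        out_str[index] = (char)toupper((unsigned char)in_str[index]);
    }
    out_str[index] = '\0';
    return out_str;
}
```

A caller would typically pass sizeof on a local array, for example strtoupper_rn(name, buffer, sizeof buffer), and check the result for NULL before using the buffer. On failure the buffer contents are unspecified in this sketch.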
Keeping Data over Successive Calls
No data should be kept over successive calls,
because different threads may successively call the function. If a
function must maintain some data over successive calls, such as a
working buffer or a pointer, the caller should provide this data.
Consider the following example. A function
returns the successive lowercase characters of a string. The string is
provided only on the first call, as with the strtok
subroutine. The function returns 0 when it reaches the end of the
string. The function could be implemented as in the following code
fragment:
/* non-reentrant function */
char lowercase_c(char *string)
{
static char *buffer;
static int index;
char c = 0;
/* stores the string on first call */
if (string != NULL) {
buffer = string;
index = 0;
}
/* searches a lowercase character */
for (; c = buffer[index]; index++) {
if (islower(c)) {
index++;
break;
}
}
return c;
}
This function is not reentrant. To make it reentrant, the static data
(the buffer pointer and the index variable) must be maintained by the
caller. The reentrant version of the function could be implemented as in
the following code fragment:
/* reentrant function */
char reentrant_lowercase_c(char *string, int *p_index)
{
char c = 0;
/* no initialization - the caller should have done it */
/* searches a lowercase character */
for (; c = string[*p_index]; (*p_index)++) {
if (islower(c)) {
(*p_index)++;
break;
}
}
return c;
}
The interface of the function changed and so did
its usage. The caller must provide the string on each call and must
initialize the index to 0 before the first call, as in the following
code fragment:
char *my_string;
char my_char;
int my_index;
...
my_index = 0;
while (my_char = reentrant_lowercase_c(my_string, &my_index)) {
...
}
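Because all state now lives with the caller, two scans can be interleaved without disturbing each other, which the static version cannot do. The sketch below repeats the reentrant function (with const and an unsigned char cast added) and adds a hypothetical helper, interleave_lowercase, that alternates between two strings.

```c
#include <ctype.h>
#include <stddef.h>

/* Same logic as the reentrant function in the text. */
char reentrant_lowercase_c(const char *string, int *p_index)
{
    char c = 0;

    for (; (c = string[*p_index]) != '\0'; (*p_index)++) {
        if (islower((unsigned char)c)) {
            (*p_index)++;
            break;
        }
    }
    return c;
}

/* Hypothetical helper: takes lowercase characters alternately from a
 * and b and stores them in out (out_size bytes, null-terminated).
 * Returns the number of characters stored. */
size_t interleave_lowercase(const char *a, const char *b,
                            char *out, size_t out_size)
{
    int index_a = 0, index_b = 0;   /* one independent cursor per string */
    size_t n = 0;
    char ca, cb;

    if (out_size == 0)
        return 0;
    do {
        ca = reentrant_lowercase_c(a, &index_a);
        if (ca != '\0' && n + 1 < out_size)
            out[n++] = ca;
        cb = reentrant_lowercase_c(b, &index_b);
        if (cb != '\0' && n + 1 < out_size)
            out[n++] = cb;
    } while (ca != '\0' || cb != '\0');
    out[n] = '\0';
    return n;
}
```

For the inputs "aXbYc" and "12x3y", the helper stores "axbyc". Calling the function again after it has returned 0 is harmless here, because the index is left pointing at the terminating null byte.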
Making a Function Thread-Safe
In multi-threaded programs, all functions called
by multiple threads must be thread-safe. However, a workaround exists
for using thread-unsafe subroutines in multi-threaded programs.
Non-reentrant functions usually are thread-unsafe, but making them
reentrant often makes them thread-safe, too.
Locking Shared Resources
Functions that use static data or any other
shared resources, such as files or terminals, must serialize the access
to these resources by locks in order to be thread-safe. For example, the
following function is thread-unsafe:
/* thread-unsafe function */
int increment_counter()
{
static int counter = 0;
counter++;
return counter;
}
To be thread-safe, the static variable counter must be protected by a static lock, as in the following example:
/* thread-safe function (requires <pthread.h>) */
int increment_counter()
{
static int counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
int value;
pthread_mutex_lock(&counter_lock);
value = ++counter;
pthread_mutex_unlock(&counter_lock);
return value; /* copy taken while the lock was held */
}
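To see the lock at work, the following sketch drives a mutex-protected counter from several threads and checks that no increment is lost. It is an illustration under stated assumptions: the counter is moved to file scope so that the hypothetical driver run_counter_demo can reset and read it, and the names hammer, run_counter_demo, and DEMO_MAX_THREADS are invented for the example.

```c
#include <pthread.h>
#include <stddef.h>

#define DEMO_MAX_THREADS 16

static int counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread-safe increment: the returned value is copied under the lock. */
int increment_counter(void)
{
    int value;

    pthread_mutex_lock(&counter_lock);
    value = ++counter;
    pthread_mutex_unlock(&counter_lock);
    return value;
}

/* Thread body: calls increment_counter() *arg times. */
static void *hammer(void *arg)
{
    int iterations = *(int *)arg;

    for (int i = 0; i < iterations; i++)
        increment_counter();
    return NULL;
}

/* Hypothetical driver: returns the final count, or -1 on a bad
 * argument or a thread creation/join failure. */
int run_counter_demo(int nthreads, int iterations)
{
    pthread_t tids[DEMO_MAX_THREADS];
    int started, final, ok = 1;

    if (nthreads < 1 || nthreads > DEMO_MAX_THREADS || iterations < 0)
        return -1;

    pthread_mutex_lock(&counter_lock);
    counter = 0;
    pthread_mutex_unlock(&counter_lock);

    for (started = 0; started < nthreads; started++) {
        if (pthread_create(&tids[started], NULL, hammer, &iterations) != 0) {
            ok = 0;
            break;
        }
    }
    for (int i = 0; i < started; i++) {
        if (pthread_join(tids[i], NULL) != 0)
            ok = 0;
    }
    if (!ok)
        return -1;

    pthread_mutex_lock(&counter_lock);
    final = counter;
    pthread_mutex_unlock(&counter_lock);
    return final;
}
```

With the lock in place, run_counter_demo(4, 10000) yields 40000. Without it, the unprotected counter++ could lose updates and the total would vary from run to run.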
In a multi-threaded application program using the
threads library, mutexes should be used for serializing shared
resources. Independent libraries may need to work outside the context of
threads and, thus, use other kinds of locks.
Workarounds for Thread-Unsafe Functions
A workaround allows thread-unsafe functions to be
called by multiple threads. This can be useful,
especially when using a thread-unsafe library in a multi-threaded
program, for testing or while waiting for a thread-safe version of the
library to be available. The workaround leads to some overhead, because
it consists of serializing the entire function or even a group of
functions. The following are possible workarounds:
- Use a global lock for the library, and lock it each time you use the
library (calling a library routine or using a library global variable).
This solution can create performance bottlenecks because only one
thread can access any part of the library at any given time. The
solution in the following pseudocode is acceptable only if the library
is seldom accessed, or as an initial, quickly implemented workaround.
/* this is pseudo-code! */
lock(library_lock);
library_call();
unlock(library_lock);
lock(library_lock);
x = library_var;
unlock(library_lock);
- Use a lock for each library component (routine or global variable)
or group of components. This solution is somewhat more complicated to
implement than the previous example, but it can improve performance.
Because this workaround should only be used in application programs and
not in libraries, mutexes can be used for locking the library.
/* this is pseudo-code! */
lock(library_moduleA_lock);
library_moduleA_call();
unlock(library_moduleA_lock);
lock(library_moduleB_lock);
x = library_moduleB_var;
unlock(library_moduleB_lock);
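The global-lock workaround can be sketched with a real mutex as follows. The library itself is a stand-in: library_call and library_var are hypothetical placeholders for a thread-unsafe routine and an exported variable, and the safe_ wrappers are names invented for this example.

```c
#include <pthread.h>

/* Stand-in for a thread-unsafe library (hypothetical). */
int library_var = 0;

void library_call(void)
{
    library_var++;                 /* unprotected read-modify-write */
}

/* Application-side workaround: one lock for the whole library. */
static pthread_mutex_t library_lock = PTHREAD_MUTEX_INITIALIZER;

/* Serialized call into the library. */
void safe_library_call(void)
{
    pthread_mutex_lock(&library_lock);
    library_call();
    pthread_mutex_unlock(&library_lock);
}

/* Serialized read of the library's global variable. */
int safe_library_var(void)
{
    int x;

    pthread_mutex_lock(&library_lock);
    x = library_var;
    pthread_mutex_unlock(&library_lock);
    return x;
}
```

The workaround only holds if every thread goes through the wrappers. A single direct call to library_call() elsewhere in the program bypasses the lock and reintroduces the race.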
Reentrant and Thread-Safe Libraries
Reentrant and thread-safe libraries are useful in
a wide range of parallel (and asynchronous) programming environments,
not just within threads. It is a good programming practice to always use
and write reentrant and thread-safe functions.
Using Libraries
Several libraries shipped with the AIX Base
Operating System are thread-safe. In the current version of AIX, the
following libraries are thread-safe:
- Standard C library (libc.a)
- Berkeley compatibility library (libbsd.a)
When writing multi-threaded programs, use the
reentrant versions of subroutines instead of the original version. For
example, the following code fragment:
token[0] = strtok(string, separators);
i = 0;
do {
i++;
token[i] = strtok(NULL, separators);
} while (token[i] != NULL);
should be replaced in a multi-threaded program by the following code fragment:
char *pointer;
...
token[0] = strtok_r(string, separators, &pointer);
i = 0;
do {
i++;
token[i] = strtok_r(NULL, separators, &pointer);
} while (token[i] != NULL);
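The strtok_r loop can also be wrapped in a small helper that bounds the number of tokens stored, which the fragments above leave to the caller. This is a sketch: split_tokens is a hypothetical name, and the feature-test macro is included on the assumption that the compiler runs in a strict standard mode where strtok_r is otherwise not declared.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits string in place (separators are replaced
 * by null bytes) and stores at most max_tokens token pointers.
 * Returns the number of tokens stored. */
size_t split_tokens(char *string, const char *separators,
                    char **token, size_t max_tokens)
{
    char *pointer;    /* parsing state that strtok would keep in static storage */
    char *t;
    size_t count = 0;

    for (t = strtok_r(string, separators, &pointer);
         t != NULL && count < max_tokens;
         t = strtok_r(NULL, separators, &pointer))
        token[count++] = t;
    return count;
}
```

Because the parsing state is a local variable, two threads (or two nested loops in one thread) can each tokenize a different string at the same time. The input must be a writable array, not a string literal, since strtok_r modifies it.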
A thread-unsafe library may be used by only one
thread in a program. Ensure that a single thread makes all calls to the
library; otherwise, the program will behave unpredictably or may
even stop.
Converting Libraries
Consider the following when converting an
existing library to a reentrant and thread-safe library. This
information applies only to C language libraries.
- Identify exported global variables. Those variables are usually declared in a header file with the extern keyword. Exported global variables should be encapsulated: make the variable private (defined with the static keyword in the library source code) and create access (read and write) subroutines.
- Identify static variables and other shared resources. Static variables are usually defined with the static keyword. Locks should be associated with any shared resource. The granularity of the locking, and therefore the number of locks, affects the performance of the library. To initialize the locks, the one-time initialization facility may be used. For more information, see One-Time Initializations.
- Identify non-reentrant functions and make them reentrant. For more information, see Making a Function Reentrant.
- Identify thread-unsafe functions and make them thread-safe. For more information, see Making a Function Thread-Safe.
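The first two steps of the list can be sketched together: a formerly exported global becomes a private static variable with a lock and a pair of access subroutines. All names here (lib_verbosity, lib_get_verbosity, lib_set_verbosity) are hypothetical, and a statically initialized mutex is assumed to be acceptable for the library.

```c
#include <pthread.h>

/* Formerly declared in the header as an exported global.
 * Now private to the library source file. */
static int lib_verbosity = 0;
static pthread_mutex_t lib_verbosity_lock = PTHREAD_MUTEX_INITIALIZER;

/* Read access subroutine. */
int lib_get_verbosity(void)
{
    int level;

    pthread_mutex_lock(&lib_verbosity_lock);
    level = lib_verbosity;
    pthread_mutex_unlock(&lib_verbosity_lock);
    return level;
}

/* Write access subroutine. */
void lib_set_verbosity(int level)
{
    pthread_mutex_lock(&lib_verbosity_lock);
    lib_verbosity = level;
    pthread_mutex_unlock(&lib_verbosity_lock);
}
```

Existing callers that assigned the variable directly must be changed to call the access subroutines, so this step changes the library's interface even though its behavior stays the same.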
One-Time Initializations
Some C libraries are designed for dynamic
initialization, in which the global initialization for the library is
performed when the first procedure in the library is called. In a
single-threaded program, this is usually implemented using a static
variable whose value is checked on entry to each routine, as in the
following code fragment:
static int isInitialized = 0;
extern void Initialize();
int function()
{
if (isInitialized == 0) {
Initialize();
isInitialized = 1;
}
...
}
For dynamic library initialization in a
multi-threaded program, a simple initialization flag is not sufficient.
This flag must be protected against modification by multiple threads
simultaneously calling a library function. Protecting the flag requires
the use of a mutex; however, mutexes must be initialized before they are
used. Ensuring that the mutex is initialized only once raises the
same problem again, this time for the mutex itself.
To keep the same structure in a multi-threaded program, use the pthread_once
subroutine. Otherwise, library initialization must be accomplished by
an explicit call to a library exported initialization function prior to
any use of the library. The pthread_once subroutine also provides an alternative for initializing mutexes and condition variables.
One-Time Initialization Object
The uniqueness of the initialization is ensured by the one-time initialization object. It is a variable having the pthread_once_t data type. In AIX and most other implementations of the threads library, the pthread_once_t data type is a structure.
A one-time initialization object is typically a global variable. It must be initialized with the PTHREAD_ONCE_INIT macro, as in the following example:
static pthread_once_t once_block = PTHREAD_ONCE_INIT;
The initialization can also be done in the
initial thread or in any other thread. Several one-time initialization
objects can be used in the same program. The only requirement is that
the one-time initialization object be initialized with the macro.
One-Time Initialization Routine
The pthread_once subroutine
calls the specified initialization routine associated with the specified
one-time initialization object if it is the first time it is called;
otherwise, it does nothing. The same initialization routine must always
be used with the same one-time initialization object. The initialization
routine must have the following prototype:
void init_routine(void);
The pthread_once subroutine does
not provide a cancelation point. However, the initialization routine
may provide cancelation points, and, if cancelability is enabled, the
first thread calling the pthread_once subroutine may be
canceled during the execution of the initialization routine. In this
case, the routine is not considered as executed, and the next call to
the pthread_once subroutine would result in recalling the initialization routine.
It is recommended to use cleanup handlers in
one-time initialization routines, especially when performing
non-idempotent operations, such as opening a file, locking a mutex, or
allocating memory. For more information, see Using Cleanup Handlers.
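A one-time initialization routine that allocates memory might use a cleanup handler along the following lines. This is a sketch only: the lookup table, its size, and the names init_table, release_table, and table_lookup are hypothetical, and the explicit pthread_testcancel call merely stands in for whatever cancelation point a real routine might contain.

```c
#include <pthread.h>
#include <stdlib.h>

#define TABLE_SIZE 16

static pthread_once_t table_once = PTHREAD_ONCE_INIT;
static int *table = NULL;          /* hypothetical shared resource */

/* Cleanup handler: frees the partially built table on cancelation. */
static void release_table(void *arg)
{
    free(arg);
}

/* One-time initialization routine. */
static void init_table(void)
{
    int *t = malloc(TABLE_SIZE * sizeof *t);

    if (t == NULL)
        return;                    /* table stays NULL; lookups report -1 */

    /* If the thread is canceled between push and pop, release_table(t)
     * runs, so the next pthread_once call starts from a clean state. */
    pthread_cleanup_push(release_table, t);
    for (int i = 0; i < TABLE_SIZE; i++)
        t[i] = i * i;
    pthread_testcancel();          /* explicit cancelation point */
    pthread_cleanup_pop(0);        /* 0: keep the allocation */

    table = t;                     /* publish only when complete */
}

/* Returns the square of i from the table, or -1 if i is out of range
 * or the table could not be allocated. */
int table_lookup(int i)
{
    pthread_once(&table_once, init_table);
    if (table == NULL || i < 0 || i >= TABLE_SIZE)
        return -1;
    return table[i];
}
```

Publishing the pointer only after pthread_cleanup_pop means a canceled initialization never leaves a half-built table visible. Note that in this sketch an allocation failure is not retried, because the routine returns normally and is then considered executed.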
One-time initialization routines can be used for
initializing mutexes or condition variables or to perform dynamic
initialization. In a multi-threaded library, the single-threaded code
fragment shown earlier, which tests the isInitialized flag, would be
written as follows:
static pthread_once_t once_block = PTHREAD_ONCE_INIT;
extern void Initialize();
int function()
{
pthread_once(&once_block, Initialize);
...
}
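The guarantee can be checked with a small experiment in which several threads race to call pthread_once. In this sketch, Initialize only counts its own invocations, and the names init_calls, caller, once_demo, and ONCE_DEMO_THREADS are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

#define ONCE_DEMO_THREADS 8

static pthread_once_t once_block = PTHREAD_ONCE_INIT;
static int init_calls = 0;         /* written only inside Initialize */

/* Initialization routine: counts how many times it runs. */
static void Initialize(void)
{
    init_calls++;
}

/* Thread body: every thread attempts the initialization. */
static void *caller(void *arg)
{
    (void)arg;
    pthread_once(&once_block, Initialize);
    return NULL;
}

/* Hypothetical driver: returns how many times Initialize has run after
 * the threads finish, or -1 on a thread creation/join failure. */
int once_demo(void)
{
    pthread_t tids[ONCE_DEMO_THREADS];
    int started, ok = 1;

    for (started = 0; started < ONCE_DEMO_THREADS; started++) {
        if (pthread_create(&tids[started], NULL, caller, NULL) != 0) {
            ok = 0;
            break;
        }
    }
    for (int i = 0; i < started; i++) {
        if (pthread_join(tids[i], NULL) != 0)
            ok = 0;
    }
    return ok ? init_calls : -1;   /* joins order this read after the write */
}
```

However many threads call pthread_once with the same once_block, Initialize runs a single time, so once_demo returns 1 on every call.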