Tuesday, September 17, 2013

Writing Reentrant and Thread-Safe Code

    In single-threaded processes, only one flow of control exists. The code executed by these processes thus need not be reentrant or thread-safe. In multi-threaded programs, the same functions and the same resources may be accessed concurrently by several flows of control. To protect resource integrity, code written for multi-threaded programs must be reentrant and thread-safe.

    Reentrance and thread safety are both related to the way that functions handle resources. Reentrance and thread safety are separate concepts: a function can be either reentrant, thread-safe, both, or neither.
This section provides information about writing reentrant and thread-safe programs. It does not cover the topic of writing thread-efficient programs. Thread-efficient programs are efficiently parallelized programs. You must consider thread efficiency during the design of the program. Existing single-threaded programs can be made thread-efficient, but this requires that they be completely redesigned and rewritten.

Reentrance

    A reentrant function does not hold static data over successive calls, nor does it return a pointer to static data. All data is provided by the caller of the function. A reentrant function must not call non-reentrant functions.

    A non-reentrant function can often, but not always, be identified by its external interface and its usage. For example, the strtok subroutine is not reentrant, because it holds the string to be broken into tokens. The ctime subroutine is also not reentrant; it returns a pointer to static data that is overwritten by each call.
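The ctime behavior described above can be reproduced with a small sketch. The function format_id below is hypothetical (it is not a library routine); it is only meant to show what "a pointer to static data that is overwritten by each call" looks like in practice.

```c
#include <stdio.h>

/* Hypothetical non-reentrant function: the result lives in a static
   buffer that every call overwrites. */
const char *format_id(int id)
{
    static char buffer[32];

    snprintf(buffer, sizeof buffer, "id-%d", id);
    return buffer;
}
```

If a caller keeps the pointer returned by format_id(1) and then calls format_id(2), both pointers refer to the same buffer, and the first result is lost.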

Thread Safety

    A thread-safe function protects shared resources from concurrent access by locks. Thread safety concerns only the implementation of a function and does not affect its external interface.
In the C language, local variables are allocated on the stack, so each call (and therefore each thread) gets its own copy. Therefore, any function that does not use static data or other shared resources is trivially thread-safe, as in the following example:
/* thread-safe function */
int diff(int x, int y)
{
    int delta;
    delta = y - x;
    if (delta < 0)
        delta = -delta;
    return delta;
}

The use of global data is thread-unsafe. Global data should be maintained per thread or encapsulated, so that its access can be serialized. For example, if an error code is kept in a single global variable, a thread may read an error code corresponding to an error caused by another thread. For this reason, in AIX, each thread has its own errno value.
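As a rough illustration of data maintained per thread, the following sketch keeps an error code in thread-local storage. It assumes a C11 compiler that supports the _Thread_local storage class (POSIX thread-specific data created with pthread_key_create would be an alternative); the names last_error, set_last_error, get_last_error, and error_seen_after_worker are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread error code: each thread gets its own copy. */
static _Thread_local int last_error = 0;

void set_last_error(int code) { last_error = code; }
int  get_last_error(void)     { return last_error; }

/* Thread body: records an error in the worker thread's copy only. */
static void *failing_worker(void *arg)
{
    (void)arg;
    set_last_error(42);
    return NULL;
}

/* Runs failing_worker in a second thread, then returns the error code
   as seen by the calling thread (or -1 if the thread could not be
   created or joined). */
int error_seen_after_worker(void)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, failing_worker, NULL) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return get_last_error();
}
```

The value set by the worker is not visible to the caller, which is the property that a per-thread errno provides.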

Making a Function Reentrant   

    In most cases, non-reentrant functions must be replaced by functions with a modified interface to be reentrant. Non-reentrant functions cannot be used by multiple threads. Furthermore, it may be impossible to make a non-reentrant function thread-safe.

Returning Data

Many non-reentrant functions return a pointer to static data. This can be avoided in the following ways:
  • Returning dynamically allocated data. In this case, it will be the caller's responsibility to free the storage. The benefit is that the interface does not need to be modified. However, backward compatibility is not ensured; existing single-threaded programs using the modified functions without changes would not free the storage, leading to memory leaks.
  • Using caller-provided storage. This method is recommended, although the interface must be modified.
For example, a strtoupper function, converting a string to uppercase, could be implemented as in the following code fragment:
/* non-reentrant function */
char *strtoupper(char *string)
{
    static char buffer[MAX_STRING_SIZE];
    int index;
    for (index = 0; string[index]; index++)
        buffer[index] = toupper((unsigned char)string[index]);
    buffer[index] = 0;
    return buffer;
}
This function is not reentrant (nor thread-safe). To make the function reentrant by returning dynamically allocated data, the function would be similar to the following code fragment:
/* reentrant function (a poor solution) */
char *strtoupper(char *string)
{
    char *buffer;
    int index;
    buffer = malloc(MAX_STRING_SIZE);
    if (buffer == NULL)
        return NULL;    /* allocation failed; the caller must check for NULL */
    for (index = 0; string[index]; index++)
        buffer[index] = toupper((unsigned char)string[index]);
    buffer[index] = 0;
    return buffer;
}
A better solution consists of modifying the interface. The caller must provide the storage for both input and output strings, as in the following code fragment:
/* reentrant function (a better solution) */
char *strtoupper_r(char *in_str, char *out_str)
{
    int index;
    for (index = 0; in_str[index]; index++)
        out_str[index] = toupper((unsigned char)in_str[index]);
    out_str[index] = 0;
    return out_str;
}
The non-reentrant standard C library subroutines were made reentrant using caller-provided storage. This is discussed in Reentrant and Thread-Safe Libraries.
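With caller-provided storage, the function cannot know how large the output buffer is unless the caller says so. The following sketch is one possible variant that also takes the buffer size; the name strtoupper_rn and its error convention (returning NULL when the result does not fit) are assumptions made for illustration, not part of any library.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical bounded variant of strtoupper_r: the caller supplies the
   output buffer and its size. Returns out_str, or NULL if the result
   (including the terminating null character) does not fit. */
char *strtoupper_rn(const char *in_str, char *out_str, size_t out_size)
{
    size_t index;

    for (index = 0; in_str[index] != '\0'; index++) {
        if (index + 1 >= out_size)
            return NULL;    /* no room for this character plus '\0' */
        out_str[index] = (char)toupper((unsigned char)in_str[index]);
    }
    if (out_size == 0)
        return NULL;        /* not even room for the terminator */
    out_str[index] = '\0';
    return out_str;
}
```

When NULL is returned, the output buffer may hold a partial, unterminated result and should not be used.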

Keeping Data over Successive Calls

No data should be kept over successive calls, because different threads may successively call the function. If a function must maintain some data over successive calls, such as a working buffer or a pointer, the caller should provide this data.
Consider the following example. A function returns the successive lowercase characters of a string. The string is provided only on the first call, as with the strtok subroutine. The function returns 0 when it reaches the end of the string. The function could be implemented as in the following code fragment:

/* non-reentrant function */
char lowercase_c(char *string)
{
    static char *buffer;
    static int index;
    char c = 0;
    /* stores the string on first call */
    if (string != NULL) {
        buffer = string;
        index = 0;
    }
    /* searches a lowercase character */
    for (; c = buffer[index]; index++) {
        if (islower((unsigned char)c)) {
            index++;
            break;
        }
    }
    return c;
}
 
This function is not reentrant. To make it reentrant, the static data, the index variable, must be maintained by the caller. The reentrant version of the function could be implemented as in the following code fragment:
/* reentrant function */
char reentrant_lowercase_c(char *string, int *p_index)
{
    char c = 0;
    /* no initialization - the caller should have done it */
    /* searches a lowercase character */
    for (; c = string[*p_index]; (*p_index)++) {
        if (islower((unsigned char)c)) {
            (*p_index)++;
            break;
        }
    }
    return c;
}
 
The interface of the function changed and so did its usage. The caller must provide the string on each call and must initialize the index to 0 before the first call, as in the following code fragment:
char *my_string;
char my_char;
int my_index;
...
my_index = 0;
while (my_char = reentrant_lowercase_c(my_string, &my_index)) {
...
}
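Because the state now belongs to the caller, several scans can be in progress at once. The sketch below repeats reentrant_lowercase_c so that it compiles on its own, and adds a hypothetical helper, interleave_lowercase, that alternates between two strings. With the static version, passing the second string would reset the saved state of the first scan.

```c
#include <ctype.h>
#include <stddef.h>

/* Same logic as reentrant_lowercase_c in the text, repeated here so that
   the sketch is self-contained. */
char reentrant_lowercase_c(const char *string, int *p_index)
{
    char c = 0;

    for (; (c = string[*p_index]) != '\0'; (*p_index)++) {
        if (islower((unsigned char)c)) {
            (*p_index)++;
            break;
        }
    }
    return c;
}

/* Hypothetical helper: takes lowercase characters alternately from a and
   b and stores them in out. Each scan has its own index, so the scans do
   not disturb each other. Characters that do not fit are dropped.
   Returns the number of characters stored, not counting the terminator. */
size_t interleave_lowercase(const char *a, const char *b,
                            char *out, size_t out_size)
{
    int index_a = 0, index_b = 0;
    size_t n = 0;
    char ca, cb;

    if (out_size == 0)
        return 0;
    do {
        ca = reentrant_lowercase_c(a, &index_a);
        if (ca != '\0' && n + 1 < out_size)
            out[n++] = ca;
        cb = reentrant_lowercase_c(b, &index_b);
        if (cb != '\0' && n + 1 < out_size)
            out[n++] = cb;
    } while (ca != '\0' || cb != '\0');
    out[n] = '\0';
    return n;
}
```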

Making a Function Thread-Safe

In multi-threaded programs, all functions called by multiple threads must be thread-safe. However, a workaround exists for using thread-unsafe subroutines in multi-threaded programs. Non-reentrant functions usually are thread-unsafe, but making them reentrant often makes them thread-safe, too.

Locking Shared Resources

Functions that use static data or any other shared resources, such as files or terminals, must serialize the access to these resources by locks in order to be thread-safe. For example, the following function is thread-unsafe:
/* thread-unsafe function */
int increment_counter()
{
    static int counter = 0;
    counter++;
    return counter;
}
To be thread-safe, the static variable counter must be protected by a static lock, as in the following example:
/* thread-safe function, using a pthread mutex (requires <pthread.h>) */
int increment_counter(void)
{
    static int counter = 0;
    static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
    int value;
    pthread_mutex_lock(&counter_lock);
    value = ++counter;      /* copy the result while the lock is held */
    pthread_mutex_unlock(&counter_lock);
    return value;
}
In a multi-threaded application program using the threads library, mutexes should be used for serializing shared resources. Independent libraries may need to work outside the context of threads and, thus, use other kinds of locks.
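To see the mutex doing its job, a counter like the one above can be exercised from several threads. The following is only a sketch: the thread count, the number of increments, and the helper names worker and run_counter_test are arbitrary choices made for this example.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4
#define INCREMENTS_PER_THREAD 10000

static int counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Increments the shared counter under the mutex and returns the new value. */
int increment_counter(void)
{
    int value;

    pthread_mutex_lock(&counter_lock);
    value = ++counter;
    pthread_mutex_unlock(&counter_lock);
    return value;
}

/* Thread body: performs a fixed number of increments. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++)
        increment_counter();
    return NULL;
}

/* Starts the workers, waits for them, and returns the final counter
   value, or -1 if a thread could not be created or joined. */
int run_counter_test(void)
{
    pthread_t threads[NUM_THREADS];
    int started = 0;
    int failed = 0;
    int total;

    for (int i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&threads[i], NULL, worker, NULL) != 0) {
            failed = 1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++) {
        if (pthread_join(threads[i], NULL) != 0)
            failed = 1;
    }
    if (failed)
        return -1;
    pthread_mutex_lock(&counter_lock);
    total = counter;
    pthread_mutex_unlock(&counter_lock);
    return total;
}
```

With the lock in place, no increment is lost, so the first call to run_counter_test should return NUM_THREADS * INCREMENTS_PER_THREAD. Without the lock, the result would typically be lower and would vary from run to run.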

Workarounds for Thread-Unsafe Functions

A workaround makes it possible for multiple threads to call thread-unsafe functions. This can be useful, especially when using a thread-unsafe library in a multi-threaded program, for testing or while waiting for a thread-safe version of the library to be available. The workaround leads to some overhead, because it consists of serializing the entire function or even a group of functions. The following are possible workarounds:
  • Use a global lock for the library, and lock it each time you use the library (calling a library routine or using a library global variable). This solution can create performance bottlenecks because only one thread can access any part of the library at any given time. The solution in the following pseudocode is acceptable only if the library is seldom accessed, or as an initial, quickly implemented workaround.
    /* this is pseudo code! */
    lock(library_lock);
    library_call();
    unlock(library_lock);
    lock(library_lock);
    x = library_var;
    unlock(library_lock);
    
  • Use a lock for each library component (routine or global variable) or group of components. This solution is somewhat more complicated to implement than the previous example, but it can improve performance. Because this workaround should only be used in application programs and not in libraries, mutexes can be used for locking the library.
    /* this is pseudo-code! */
    lock(library_moduleA_lock);
    library_moduleA_call();
    unlock(library_moduleA_lock);
    lock(library_moduleB_lock);
    x = library_moduleB_var;
    unlock(library_moduleB_lock);
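
The global-lock workaround might look like the following sketch when written with pthread mutexes. It wraps the ctime subroutine, which returns a pointer to static data; the wrapper name locked_ctime and its interface are assumptions made for this example. Note that the result is copied into caller-provided storage before the lock is released, because the static buffer may be overwritten as soon as another thread gets the lock.

```c
#include <pthread.h>
#include <string.h>
#include <time.h>

/* One lock for every use of the (assumed thread-unsafe) library routine. */
static pthread_mutex_t library_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical wrapper: calls ctime() under the lock and copies the
   result into the caller's buffer before unlocking. The buffer should
   hold at least 26 bytes. Returns out, or NULL if ctime() fails or the
   text does not fit. */
char *locked_ctime(const time_t *clock, char *out, size_t out_size)
{
    char *result = NULL;
    const char *text;

    pthread_mutex_lock(&library_lock);
    text = ctime(clock);
    if (text != NULL && strlen(text) < out_size) {
        strcpy(out, text);
        result = out;
    }
    pthread_mutex_unlock(&library_lock);
    return result;
}
```

The protection holds only if every thread goes through the wrapper, including for other routines that may share the same static data.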
    

Reentrant and Thread-Safe Libraries

Reentrant and thread-safe libraries are useful in a wide range of parallel (and asynchronous) programming environments, not just within threads. It is a good programming practice to always use and write reentrant and thread-safe functions.

Using Libraries

Several libraries shipped with the AIX Base Operating System are thread-safe. In the current version of AIX, the following libraries are thread-safe:
  • Standard C library (libc.a)
  • Berkeley compatibility library (libbsd.a)
Some of the standard C subroutines are non-reentrant, such as the ctime and strtok subroutines. The reentrant versions of these subroutines have the name of the original subroutine with the suffix _r (underscore followed by the letter r).
When writing multi-threaded programs, use the reentrant versions of subroutines instead of the original version. For example, the following code fragment:
token[0] = strtok(string, separators);
i = 0;
do {
    i++;
    token[i] = strtok(NULL, separators);
} while (token[i] != NULL);
should be replaced in a multi-threaded program by the following code fragment:
char *pointer;
...
token[0] = strtok_r(string, separators, &pointer);
i = 0;
do {
    i++;
    token[i] = strtok_r(NULL, separators, &pointer);
} while (token[i] != NULL);
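The benefit of the extra pointer argument can be seen even in a single thread. In the following sketch, two strtok_r scans are active at the same time, each with its own saved position; with strtok, the inner scan would destroy the state of the outer one. The function name count_words and the choice of separators are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

/* Counts the words in a text made of lines separated by '\n' and words
   separated by spaces. The text buffer is modified in place, as with
   any strtok-style scan. */
int count_words(char *text)
{
    char *line_state;   /* saved position of the outer (line) scan */
    char *word_state;   /* saved position of the inner (word) scan */
    int count = 0;

    for (char *line = strtok_r(text, "\n", &line_state);
         line != NULL;
         line = strtok_r(NULL, "\n", &line_state)) {
        for (char *word = strtok_r(line, " ", &word_state);
             word != NULL;
             word = strtok_r(NULL, " ", &word_state))
            count++;
    }
    return count;
}
```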
A thread-unsafe library may be used by only one thread in a program. Ensure that a single thread, and always the same one, uses the library; otherwise, the program will behave unpredictably, or may even stop.

Converting Libraries

Consider the following when converting an existing library to a reentrant and thread-safe library. This information applies only to C language libraries.
  • Identify exported global variables. Those variables are usually declared in a header file with the extern keyword. Exported global variables should be encapsulated. The variable should be made private (defined with the static keyword in the library source code), and access (read and write) subroutines should be created.
  • Identify static variables and other shared resources. Static variables are usually defined with the static keyword. Locks should be associated with any shared resource. The granularity of the locking, thus choosing the number of locks, impacts the performance of the library. To initialize the locks, the one-time initialization facility may be used. For more information, see One-Time Initializations.
  • Identify non-reentrant functions and make them reentrant. For more information, see Making a Function Reentrant.
  • Identify thread-unsafe functions and make them thread-safe. For more information, see Making a Function Thread-Safe.
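
As a sketch of the first two steps, the following fragment encapsulates a formerly exported global variable and associates a lock with it. The variable lib_verbosity and the access subroutines lib_get_verbosity and lib_set_verbosity are hypothetical names chosen for illustration.

```c
#include <pthread.h>

/* Formerly an exported global (for example, "int lib_verbosity;" with an
   extern declaration in the header). Now private to the library source. */
static int lib_verbosity = 0;
static pthread_mutex_t lib_verbosity_lock = PTHREAD_MUTEX_INITIALIZER;

/* Read access subroutine. */
int lib_get_verbosity(void)
{
    int value;

    pthread_mutex_lock(&lib_verbosity_lock);
    value = lib_verbosity;
    pthread_mutex_unlock(&lib_verbosity_lock);
    return value;
}

/* Write access subroutine; returns the previous value. */
int lib_set_verbosity(int level)
{
    int previous;

    pthread_mutex_lock(&lib_verbosity_lock);
    previous = lib_verbosity;
    lib_verbosity = level;
    pthread_mutex_unlock(&lib_verbosity_lock);
    return previous;
}
```

Here one lock protects one variable. A library with many shared resources has to weigh this fine granularity against the cost of managing more locks.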


One-Time Initializations

Some C libraries are designed for dynamic initialization, in which the global initialization for the library is performed when the first procedure in the library is called. In a single-threaded program, this is usually implemented using a static variable whose value is checked on entry to each routine, as in the following code fragment:

static int isInitialized = 0;
extern void Initialize();

int function()
{
        if (isInitialized == 0) {
                Initialize();
                isInitialized = 1;
        }
        ...
}
For dynamic library initialization in a multi-threaded program, a simple initialization flag is not sufficient. This flag must be protected against modification by multiple threads simultaneously calling a library function. Protecting the flag requires the use of a mutex; however, mutexes must be initialized before they are used. Ensuring that the mutex is initialized only once raises the same problem again, so this approach is circular.
To keep the same structure in a multi-threaded program, use the pthread_once subroutine. Otherwise, library initialization must be accomplished by an explicit call to a library exported initialization function prior to any use of the library. The pthread_once subroutine also provides an alternative for initializing mutexes and condition variables.
The following sections describe the one-time initialization object and the one-time initialization routine.

One-Time Initialization Object

The uniqueness of the initialization is ensured by the one-time initialization object. It is a variable having the pthread_once_t data type. In AIX and most other implementations of the threads library, the pthread_once_t data type is a structure.
A one-time initialization object is typically a global variable. It must be initialized with the PTHREAD_ONCE_INIT macro, as in the following example:

static pthread_once_t once_block = PTHREAD_ONCE_INIT;
The initialization can also be done in the initial thread or in any other thread. Several one-time initialization objects can be used in the same program. The only requirement is that the one-time initialization object be initialized with the macro.

One-Time Initialization Routine

The pthread_once subroutine calls the specified initialization routine associated with the specified one-time initialization object if it is the first time it is called; otherwise, it does nothing. The same initialization routine must always be used with the same one-time initialization object. The initialization routine must have the following prototype:
void init_routine(void);
The pthread_once subroutine does not provide a cancelation point. However, the initialization routine may provide cancelation points, and, if cancelability is enabled, the first thread calling the pthread_once subroutine may be canceled during the execution of the initialization routine. In this case, the routine is not considered as executed, and the next call to the pthread_once subroutine would result in recalling the initialization routine.
It is recommended to use cleanup handlers in one-time initialization routines, especially when performing non-idempotent operations, such as opening a file, locking a mutex, or allocating memory. For more information, see Using Cleanup Handlers.
One-time initialization routines can be used for initializing mutexes or condition variables or to perform dynamic initialization. In a multi-threaded library, the isInitialized code fragment shown in One-Time Initializations would be written as follows:
static pthread_once_t once_block = PTHREAD_ONCE_INIT;
extern void Initialize();

int function()
{
        pthread_once(&once_block, Initialize);
        ...
}
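
A fuller sketch of the same pattern follows. Here the one-time initialization routine also initializes a mutex, and a counter records how many times the routine ran. The names init_library, library_function, table_lock, and count_initializations are hypothetical, and the number of calling threads is arbitrary.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_CALLERS 8

static pthread_once_t once_block = PTHREAD_ONCE_INIT;
static pthread_mutex_t table_lock;  /* initialized by init_library */
static int init_calls = 0;          /* how many times init_library ran */

/* One-time initialization routine (hypothetical library setup). */
static void init_library(void)
{
    pthread_mutex_init(&table_lock, NULL);
    init_calls++;
}

/* Library entry point: initialization happens on the first call only. */
void library_function(void)
{
    pthread_once(&once_block, init_library);
    pthread_mutex_lock(&table_lock);
    /* ... work on the library's shared data would go here ... */
    pthread_mutex_unlock(&table_lock);
}

/* Thread body: a single call into the library. */
static void *caller(void *arg)
{
    (void)arg;
    library_function();
    return NULL;
}

/* Calls library_function from several threads and returns the number of
   times the initialization routine ran, or -1 if a thread could not be
   created or joined. */
int count_initializations(void)
{
    pthread_t threads[NUM_CALLERS];
    int started = 0;
    int failed = 0;

    for (int i = 0; i < NUM_CALLERS; i++) {
        if (pthread_create(&threads[i], NULL, caller, NULL) != 0) {
            failed = 1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++) {
        if (pthread_join(threads[i], NULL) != 0)
            failed = 1;
    }
    return failed ? -1 : init_calls;
}
```

However the threads are scheduled, the initialization routine should run exactly once, so count_initializations is expected to return 1.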
