Synchronization Primitives

2020-04-05

system design

Processors are often utilized effectively by running multiple tasks simultaneously. To guarantee the correct results of concurrent tasks, synchronization is desired across processes that execute those tasks. The fundamental of synchronization relies on the support of hardware, upon which a wide variety of high-level synchronization operations are built on software.

Hardware implementation

For uniprocessor systems, synchronization can be achieved by disabling interrupts. A process disables interrupts before accessing a critical section, and enables interrupts once it exits the critical section. Since there is no preemption with interrupts disabled, no other processes will be scheduled and thus there won’t be concurrent access to the critical section.

Disabling interrupts is impractical on multiprocess systems. Every attempt of a critical section access requires interrupts of all processors to be disabled, which is very inefficient. Instead, we rely on a set of hardware primitives that allow to read and modify memory locations atomically. They are also the foundation upon which higher-level synchronizations are constructed.

Test-and-set

Test-and-set sets a memory location to 1 and also returns its original value. It can be used to implement the waiting operation of acquiring a spin lock.

Fetch-and-add

Fetch-and-add increments a specified value to a memory location, which allows sequential increments of a value over a concurrent increment operations. Fetch-and-add is the foundation of implementing mutex locks and semaphores.

Compare-and-swap

Compare-and-swap compares the value of a memory location with a given value. If they are identical, it then updates that memory location with another specified value. Different from test-and-set, which always sets the memory location to 1, compare-and-swap delays the update until the condition is met (and can specify a new value other than 1). Therefore, compare-and-swap can be used to synchronize multiple updates of a memory location while only letting one to go through which updates from a specified old value to a specified new value. One typical application of compare-and-swap is to implement atomic adders.

void add(int *p, int a)
{
    int value = *p;
    while (compare-and-swap(p, value, value + a))
    {
        value = *p;
    }
}

Load-linked and store-conditional (LL/SC)

A load-linked operation fetches a value from a memory location, while a store-conditional operation overwrites that memory location with a specified new value if there is no intervening store operations to that memory location since the load-linked operation.

Software implementation

Spin locks

A thread acquires a spin lock by simply waiting in a loop (without doing other useful tasks) until the lock becomes available. Spin locks are efficient if a thread acquiring the lock is likely to be blocked for a short period. Though it’s busy waiting, it saves the cost of context switches.

Spin locks are undesired if they are known to be held for long durations. For example, if one thread holds a lock for a period longer than its scheduling quantum, an interruption at the end of the quantum will pause its progress towards the lock release until the scheduling switches back on it. With that, other threads waiting for the lock simply waste their quanta spinning.

Spin locks can be implemented based on test-and-set.

void Lock(boolean *lock)
{ 
    while (test_and_set(lock) == 1); 
}

Alternatively, Peterson’s algorithm gives a software-based implementation for two threads.

bool flag[2] = {false, false};
int turn;

// Thread 0:
flag[0] = true;
turn = 1;
while (flag[1] == true && turn == 1)
{
    // busy wait
}
... // critical section
flag[0] = false;

// Thread 1:
flag[1] = true;
turn = 0;
while (flag[0] == true && turn == 0)
{
    // busy wait
}
... // critical section
flag[1] = false;

Semaphores

A semaphore is simply a variable, which represents the available resource of accessing a critical section. Waiting processes usually do not need to repeatedly checking the availability of a semaphore resource. Instead, they are blocked not consuming CPU resources. Many operating systems provide the support of unblocking a process once the semaphore resource becomes available. Typically two operations are provided with semaphores:

Operation P (wait) decrements a semaphore’s value by one. If the value becomes negative after the decrement, the process who applied operation P will be blocked and added to a waiting queue. Otherwise, the process is allowed to access the critical section.
Operation V (signal/post) increments a semaphore’s value by one. Then one process from the waiting queue, if any, will be unblocked and allowed to access the critical section.

Semaphores that allow an arbitrary resource count are called counting semaphoares, whereas a binary semaphore restricts the resource count to be only either 0 or 1. Binary semaphores can be used to implement locks and condition variables.

Condition variables and monitors

Condition variables block processes until certain predefined conditions are met. Compared to semaphores, they allow broader conditions to be defined. Condition variables are usually used under a synchronization framework managed by monitors. A monitor consists of a mutex (lock) object and one or more condition variables.

Acquire(mutex);
while (!predicate)
{
    Wait(mutex, condition1);
}
... // critical section
Signal(condition2); // or NotifyAll(condition2);
Release(mutex);

There are three operations on condition variables:

Wait(mutex, condition) lets a thread wait until the predicate becomes true. Specifically, the thread first releases the mutex and then sleeps in the waiting queue associated with the specified condition. Note that these two steps need to be done atomically. Once the thread is waken up from the waiting queue, it automatically reacquires the mutex.
Signal(condition) indicates that the specified condition is true. It then wakes up one thread from the waiting queue associated with the specified condition and let it resume. When multiple threads are waiting on the same condition and predicate, Signal is more efficient over NotifyAll.
NotifyAll(condition) indicates that the specified condition is true. Different from Signal, it wakes up all threads from the waiting queue associated with the specified condition. When multiple threads are waiting on the same condition but different predicates, NotifyAll will be required. Otherwise, if only one thread is waken up with the wrong predicate, it then immediately goes back to sleep without waking up other threads.