A request to a storage system often involves multiple operations, and without a way to treat those operations as one unit, managing the system becomes complicated. To address that, transactions serve as a container that encapsulates multiple individual operations and makes them easier to manage.
Transactions are commonly used in database systems. A typical everyday example is a money transfer at an ATM. For example, transferring $100 from Alice to Blob involves two operations:
- The deduction of $100 from Alice’s bank account.
- The deposit of $100 to Blob’s bank account.
With multiple operations, stricter requirements apply to ensure that the service succeeds. For example, to complete the money transfer above, both operations need to complete. If either operation fails, both Alice’s and Blob’s accounts must be rolled back to their original balances. Transactions help provide this guarantee, and more.
Technically, the properties of transactions are summarized as ACID, namely atomicity, consistency, isolation, and durability. Let’s review each of them.
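The transfer above can be sketched with Python’s built-in sqlite3 module. This is a minimal illustration, not how a real bank does it: the `accounts` table, the `transfer` and `balances` helpers, and the starting balances are all assumptions made up for the example.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst`; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Fail after the deduction so the rollback has work to do.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical starting state for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("Alice", 150), ("Blob", 0)])
conn.commit()

first = transfer(conn, "Alice", "Blob", 100)   # succeeds
after_first = balances(conn)
second = transfer(conn, "Alice", "Blob", 100)  # fails midway and is rolled back
after_second = balances(conn)
```

The second call deducts from Alice’s account before it detects the overdraft, yet `after_second` equals `after_first`, because the rollback undoes the deduction.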
All or none
Atomicity requires that the operations of a transaction either all succeed or none take effect. A transaction can never be partially completed. Atomicity is fundamental to transactions: one of the main reasons we use transactions is to manage multiple operations as a single unit.
Atomicity requires the ability to roll back any changes made by a transaction (to support the case where none take effect). This is the foundation for other properties such as consistency and isolation. In databases, transaction rollback is done with the help of logs. Recall that a database records every operation in its logs. With that, an applied operation can be reproduced by redoing the corresponding log records. Similarly, the state changed by an operation can be rolled back by undoing the corresponding log records.
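To make the undo side concrete, here is a small in-memory sketch. The `UndoLogStore` class and its methods are hypothetical names invented for illustration. Real databases keep these records on disk and in far richer formats, but the idea of remembering the old value before each change is the same.

```python
_MISSING = object()  # marks "key did not exist before the write"

class UndoLogStore:
    """Toy key-value store that can roll back the active transaction."""

    def __init__(self, data=None):
        self.data = dict(data or {})
        self.undo_log = []  # (key, old_value) records of the active transaction

    def begin(self):
        self.undo_log = []

    def write(self, key, value):
        # Record the old value before overwriting it, so the write can be undone.
        self.undo_log.append((key, self.data.get(key, _MISSING)))
        self.data[key] = value

    def commit(self):
        # The changes stay; the undo records are no longer needed.
        self.undo_log = []

    def rollback(self):
        # Undo in reverse order so the earliest old value wins.
        while self.undo_log:
            key, old = self.undo_log.pop()
            if old is _MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = old

store = UndoLogStore({"Alice": 150, "Blob": 0})
store.begin()
store.write("Alice", 50)    # deduction applied
store.write("Alice", 40)    # a second change to the same key
store.write("Carol", 10)    # a key that did not exist before
store.rollback()            # something failed: undo everything
state_after_rollback = dict(store.data)
```

After the rollback the store is back to `{"Alice": 150, "Blob": 0}`, including the removal of the key that the transaction created.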
Crash recovery
It’s almost impossible to build a storage system that never crashes. Failure is the norm. Fault tolerance, such as crash recovery after a failure, is therefore something we always need to keep in mind when building storage systems. The properties of consistency and durability matter most in the context of crashes. In fact, I think they apply to storage systems in general, not only to transactions.
Consistency requires a storage system to remain consistent despite failure and recovery. A system in a consistent state must have its data always logically correct. For example, no valid data should be destroyed, and no data should exist that shouldn’t. Let’s review one scenario. Given a set of data and metadata where the metadata records the location of the data, we have one operation that updates both of them (e.g., inserting new data and updating its location metadata). Since the order in which the data and metadata reach the disk is not guaranteed (e.g., because I/O scheduling reorders writes for performance), a crash may leave the metadata updated while the data write did not survive. The location recorded in the metadata then points to invalid content, and the system ends up in an inconsistent state.
Durability says that once a transaction commits, its changes must survive permanently. Crashes may break this too, for example, because of asynchronous writes. Because synchronous writes are expensive, a storage system often delays writes even though it has already confirmed the completion of a request. If a crash happens before the delayed write reaches the disk, the data is lost.
One strategy for supporting consistency and durability under crashes relies on logging (more specifically, write-ahead logging) with two rules:
- We will not write the data until the corresponding logs have been saved to the disk.
- A request will not be completed until the corresponding logs persist.
After a crash, replaying the logs helps guarantee that all the updates of an operation are reapplied (so no inconsistency exists) and that any data that should be durable can be reproduced even if it was still volatile before the crash.
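The two rules and the replay step can be sketched as follows. This is a simplified illustration under stated assumptions: the `WalStore` class, the JSON-lines log format, and the key names are hypothetical, and the in-memory dict stands in for data pages that have not reached the disk yet. Only `os.fsync` and the standard file APIs are real.

```python
import json
import os
import tempfile

class WalStore:
    """Toy store: the log is forced to disk, the data lives in volatile memory."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}   # volatile state, lost on a crash
        self.recover()

    def put(self, updates):
        # Rules 1 and 2: persist the log record before applying the change
        # and before acknowledging the request.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(updates) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.data.update(updates)  # a real system may write this back lazily
        return "ok"

    def recover(self):
        # Rebuild the state by replaying every complete log record in order.
        self.data = {}
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn record at the tail: it was never acknowledged
                self.data.update(record)

path = os.path.join(tempfile.mkdtemp(), "wal.log")
store = WalStore(path)
# One record carries both the data and its location metadata.
store.put({"data:42": "hello", "meta:42": "block 7"})

# Simulate a crash: the volatile state is gone, only the log file remains.
restarted = WalStore(path)
recovered = dict(restarted.data)
```

Because the data and its metadata travel in one log record, replay applies both or neither, which is what keeps the recovered state consistent. A half-written record at the end of the log is skipped, which is safe in this sketch because `put` never returns before the full record is synced.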
Isolation
When transactions are executed concurrently, isolation ensures that the execution of one transaction does not affect the others. More specifically, we expect the result of executing transactions concurrently to be identical to the result of executing them sequentially in some order, a property known as serializability. Isolation may limit how concurrently multiple transactions can run. Conversely, we may gain better performance through higher concurrency by relaxing isolation to some extent, which means giving up serializability. In practice, different trade-offs between concurrency and consistency are applied depending on the scenario. We will look into more details in a later article. Note that consistency here refers to the degree to which we relax serializable isolation, while the consistency property discussed above refers to the usability of a storage system after crash recovery.
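A small simulation shows what can go wrong without isolation. The two deposit transactions, the `run` helper, and the schedules below are assumptions invented for illustration. Each transaction reads the balance and then writes back the balance plus its deposit.

```python
# Hypothetical deposits made by two concurrent transactions.
DEPOSIT = {"T1": 10, "T2": 20}

def run(schedule, balance=100):
    """Execute read/write steps of the transactions in the given order."""
    db = {"balance": balance}
    local = {}  # each transaction's private copy of the value it read
    for txn, step in schedule:
        if step == "read":
            local[txn] = db["balance"]
        elif step == "write":
            db["balance"] = local[txn] + DEPOSIT[txn]
    return db["balance"]

# T1 runs to completion before T2 starts.
serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]
# Both transactions read the old balance before either one writes.
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]

serial_result = run(serial)            # 130
interleaved_result = run(interleaved)  # 120: T1's deposit is overwritten
```

The interleaved schedule ends with 120, which matches neither sequential order (both give 130), so it is not serializable. Isolation is meant to prevent exactly this kind of outcome.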