The Transaction Concept: Virtues and Limitations Jim Gray One line summary: This papers presents an overview of the general properties of a transaction. It also talks about some problems which need to be solved. (in 1981) What is a transaction? * A transaction is "a collection of actions which comprise a consistent transformation of the state. It satisfies the system consistency constraints to transform the system from one consistent state to another. * System consistency constraints: assertions about the values of records and the transformations allowed. * Properties: (ACID) o Atomicity: the transaction either happens or it does not. No "leaving in the middle: is allowed. o Consistency: obey legal protocols. o Isolated: (not in the paper but I think was mentioned by Jim Gray's 262 lecture) transactions can execute concurrently, but it must appears to each successful transaction that it has executed in a serial schedule with the others. o Durability: once a transaction is committed, it cannot be back out. Committed: point of no going back. Aborted: all of the effects of a transaction need to be back out. Relaxed these constraints so that some actions can be ignored: o unprotected: the action need not be undone or redone if the transaction must be aborted. e.g. actions on scratch spaces. o protected: the action can and must be undone or redone if the transaction must be aborted. e.g. conventional database operations, insert, delete. o real: this action cannot be undone once done. e.g. committed actions. NonStop: How to get atomicity and durability? Build a perfect system! But only need an "almost perfect" system: * A system that fails once every many years is acceptable. * Human errors can cause failures, and this cannot be prevented by a perfect system. How to build a very reliable system? * Use many many unreliable parts to provide redundancy on a grand scale. For example, each process in NonStop has a backup process which can continue to work in case the first one died. Update in place: According to accounting principles, update-in-place is bad because there is no history on the changes made. Solutions: Time-domain Addressing: (also called version oriented system) Each objects is not updated but "evolved" to have some additional info. Object addresses become , so that it is possible to retrieve the state of an object at any time in the past. * Locking issues: Writing the header of the object implicitly lock the object from others. * Problems: Every read becomes a write because needs to update the time stamp. Waits are aborts. Timestamps forces a single granularity. Cannot have shared read access. Locking and logging: Allows undo and redo of actions by keeping an entry for each action in a log. The log should be kept in stable storage to avoid losing log entries in case of a system crash. Some terminology: * Restartability: handles failures during undo and redo phases. * Repeatability: to ensure a transaction reads the same value repeatedly , i.e., it cannot read some uncommitted values of another transaction. To accomplish this, use exclusive write locks and shared read locks. * Two phase commit protocol: A coordinator tells the nodes to commit, if a node said "yes, I will commit", it cannot say no later. The transaction commits if all nodes agreed to commit. Otherwise the transaction aborts. This is used for distributed transactions. Implementations of locking: * Predicate locking: a predicate can covers the exact type of records that a transaction wants to lock. This is expensive. * Hierarchical locking; pick a fixed set of predicates and group them into an acyclic graph and lock from root to leaf. Loses generality. Limitations: * Nested transactions: E.g. travel agent booking you a vacation. The whole process is a transaction to the customer. Each action (book tickets, car rental) is also a transaction to the agent, which can be aborted if for example, no mid-size cars are available. * Long-lived transactions: Frequency of deadlock goes up with the square of the multiprogramming level, and fourth power of the transaction size. Partial soln: allows only "active" transaction to hold locks. * Integration with programming languages I think these limitations have been investigated by various research groups.