What types of problems can arise when two transactions run concurrently?

Note that serial execution doesn't mean that every ordering of the transactions gives the same result; different serial orders may produce different (but individually consistent) outcomes.

Consider the following two transactions:

T1: select sum(salary)
    from Employee where dept='Sales'

T2: insert into Employee
    values (....,'Sales',...)

If we execute T1 then T2, we get a smaller salary total than if we execute T2 then T1.

In both cases, however, the salary total is consistent with the state of the database at the time the query is executed.
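A toy sketch of the two serial orders, using a hypothetical in-memory list in place of the Employee table (the names and salaries are invented for illustration):

```python
# Hypothetical Employee rows: (name, dept, salary)
employees = [("alice", "Sales", 50), ("bob", "Sales", 60)]

def t1_sum(rows):
    # T1: select sum(salary) from Employee where dept='Sales'
    return sum(sal for _, dept, sal in rows if dept == "Sales")

def t2_insert(rows):
    # T2: insert into Employee values (...,'Sales',...)
    rows.append(("carol", "Sales", 70))

# Serial order T1 then T2: the sum doesn't see the new row
rows = list(employees)
total_t1_first = t1_sum(rows)
t2_insert(rows)

# Serial order T2 then T1: the sum includes the new row
rows = list(employees)
t2_insert(rows)
total_t2_first = t1_sum(rows)

print(total_t1_first, total_t2_first)   # 110 180
```

Each total is consistent with the database state at the time the query ran; they differ only because the serial orders differ.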


A serial execution of consistent transactions is always consistent.

If transactions execute under a concurrent (nonserial) schedule, the potential exists for conflict among their effects.

In the worst case, the effect of executing the transactions ...

  • is to leave the database in an inconsistent state
  • even though each transaction, by itself, is consistent
So why don't we observe such problems in real DBMSs? ...
  • concurrency control mechanisms handle them   (see later).


Valid Concurrent Transactions

Not all concurrent executions cause problems.

For example, the schedules

T1: R(X) W(X)           R(Y) W(Y)
T2:           R(X) W(X)

or

T1: R(X) W(X)      R(Y)      W(Y)
T2:           R(X)      W(X)

or ...

leave the database in a consistent state.


Consider the following schedule where the transactions execute in parallel:

[schedule diagram]

In this scenario:

  • T2 reads data (X) that T1 is currently operating on
  • then makes changes to X and overwrites T1's result
This is called a Write-Read (WR) Conflict or dirty read.

The result: T1's update to X is lost.
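A deterministic sketch of the lost-update interleaving, using the values from the tables in this section (X=100, N=5, M=8); the single-threaded step ordering stands in for true concurrency:

```python
db = {"X": 100}
N, M = 5, 8

t1_x = db["X"]        # T1: read(X)   -> 100
t2_x = db["X"]        # T2: read(X)   -> 100, before T1 writes back
db["X"] = t1_x + N    # T1: write(X)  -> 105
db["X"] = t2_x + M    # T2: write(X)  -> 108, overwriting T1's 105

print(db["X"])        # 108: T1's update is lost (either serial order gives 113)
```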


Lost Update Problem (cont)

Consider the states in the WR Conflict schedule:

[schedule diagram]


Consider the following schedule where one transaction fails:

[schedule diagram]

Transaction T1 aborts after writing X.

The abort will undo the changes to X, but where the undo occurs can affect the results.

Consider three places where undo might occur:

[schedule diagram]


Temporary Update - Case 1

This scenario is ok.  T1's effects have been eliminated.

[schedule diagram]


Temporary Update - Case 2

In this scenario, some of T1's effects have been retained.

[schedule diagram]


Temporary Update - Case 3

In this scenario, T2's effects have been lost, even after commit.

[schedule diagram]


For ACID, we must ensure that schedules are:

  • serializable

    The effect of executing n concurrent transactions is the same as the effect of executing them serially in some order.

    For assessing the correctness of concurrency control methods, we need a test to check whether a given method produces serializable schedules.

  • recoverable

    A failed transaction should not affect the ability to recover the system to a consistent state.

    This can be ensured if transactions commit only after all transactions whose changes they read commit.
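That commit condition can be sketched as a small check; the read_from map is assumed bookkeeping that records which transactions' uncommitted writes each transaction has read:

```python
def can_commit(txn, read_from, committed):
    # A transaction may commit only once every transaction
    # whose writes it read has already committed.
    return read_from.get(txn, set()) <= committed

# T2 read a value written by T1
reads = {"T2": {"T1"}}

print(can_commit("T2", reads, committed={"T1"}))  # True: T1 has committed
print(can_commit("T2", reads, committed=set()))   # False: T1 still active
```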


If a concurrent schedule for transactions T1 ..Tn acts like a serial schedule for T1 ..Tn, then consistency is guaranteed.

To determine this requires a notion of schedule equivalence.

Note: we are not attempting to determine equivalence of entire computations, simply of the interleaved sequences of read/write operations.

A serializable schedule is a concurrent schedule that produces a final state that is the same as that produced by some serial schedule.

There are two primary formulations of serializability:

  • conflict serializability   (read/write operations occur in the "right" order)
  • view serializability   (read operations see the correct version of data)


Consider two transactions T1 and T2 acting on data item X.

Considering only read/write operations, the possibilities are:

T1 first      T2 first      Equiv?
-----------   -----------   ------
R1(X) R2(X)   R2(X) R1(X)   yes
R1(X) W2(X)   W2(X) R1(X)   no
W1(X) R2(X)   R2(X) W1(X)   no
W1(X) W2(X)   W2(X) W1(X)   no

If T1 and T2 act on different data items, the result is equivalent regardless of order.


Conflict Serializability (cont)

Two transactions have a potential conflict if

  • they perform operations on the same data item
  • at least one of the operations is a write operation
In such cases, the order of operations affects the result.

Conversely, if two operations in a schedule don't conflict,
we can swap their order without affecting the overall result.

This gives a basis for determining equivalence of schedules.
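The conflict condition can be written as a small predicate; the (txn, action, item) encoding of operations is an assumption for illustration:

```python
def conflicts(op1, op2):
    # Two operations conflict iff they come from different transactions,
    # touch the same data item, and at least one is a write.
    t1, a1, x1 = op1
    t2, a2, x2 = op2
    return t1 != t2 and x1 == x2 and "W" in (a1, a2)

print(conflicts((1, "R", "X"), (2, "W", "X")))  # True:  R1(X) vs W2(X)
print(conflicts((1, "R", "X"), (2, "R", "X")))  # False: two reads
print(conflicts((1, "W", "X"), (2, "W", "Y")))  # False: different items
```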


Conflict Serializability (cont)

If we can transform a schedule

  • by swapping the orders of non-conflicting operations
  • such that the result is a serial schedule
then we say that the schedule is conflict serializable.

If a concurrent schedule is equivalent to some (any) serial schedule, then we have a consistency guarantee.


Conflict Serializability (cont)

Example: transform a concurrent schedule to serial schedule

[schedule diagram]


View Serializability is

  • an alternative formulation of serializability
  • that is less conservative than conflict serializability (CS)
    (some safe schedules that are view serializable are not conflict serializable)
As with CS, it is based on a notion of schedule equivalence
  • a schedule is "safe" if view equivalent to a serial schedule
The idea: if all the read operations in two schedules ...
  • always read the result of the same write operations
  • then the schedules must produce the same result


View Serializability (cont)

Two schedules S and S' on T1 .. Tn are view equivalent iff

  • for each shared data item X
    • if Tj reads the initial value of X in S, then it also reads the initial value of X in S'
    • if Tj reads X in S and X was produced by Tk, then Tj must also read the value of X produced by Tk in S'
    • if Tj performs the final write of X in S, then it must also perform the final write of X in S'
To check serializability of S, find a serial schedule that is view equivalent to S


In practice, we don't test specific schedules for serializability.

However, in designing concurrency control schemes, we need a way of checking whether they produce "safe" schedules.

This is typically achieved by a demonstration that the scheme generates only serializable schedules, and we need a serializability test for this.

There is a simple and efficient test for conflict serializability;
there is a more complex test for view serializability.

Both tests are based on notions of

  • building a graph to represent transaction interactions
  • testing properties of this graph (checking for cycles)


Testing Serializability (cont)

A precedence graph G = (V,E) for a schedule S consists of

  • a vertex in V for each transaction from T1 .. Tn
  • an edge in E for each pair Tj and Tk, such that
    • there is a pair of conflicting operations between Tj & Tk
    • the Tj operation occurs before the Tk operation
Note: the edge is directed from Tj   →   Tk


Testing Serializability (cont)

If an edge Tj → Tk exists in the precedence graph

  • then Tj must appear before Tk in any serial schedule
Implication: if the precedence graph has cycles, then S can't be serialized.

Thus, the serializability test is reduced to cycle-detection.

(and there are cycle-detection algorithms available in many algorithms textbooks)
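Both steps can be sketched in a few lines; a schedule is assumed to be a list of (txn, action, item) tuples in execution order:

```python
def precedence_edges(schedule):
    # Add an edge Ti -> Tj for each pair of conflicting operations
    # where Ti's operation occurs before Tj's.
    edges = set()
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "W" in (ai, aj):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    # Depth-first search; reaching a vertex that is still on the
    # recursion stack means the directed graph has a cycle.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    visited, on_stack = set(), set()
    def dfs(v):
        visited.add(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w in on_stack or (w not in visited and dfs(w)):
                return True
        on_stack.discard(v)
        return False
    return any(v not in visited and dfs(v) for v in list(graph))

# Both transactions write X, interleaved both ways: edges T1->T2 and T2->T1
s = [(1, "R", "X"), (2, "R", "X"), (1, "W", "X"), (2, "W", "X")]
print(has_cycle(precedence_edges(s)))   # True -> not serializable
```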


Serializability Test Examples

Serializable schedule   (with conflicting operations shown in red):

[schedule diagram]

Precedence graph for this schedule:

[precedence graph diagram]

No cycles ⇒ serializable  (as we already knew)


Serializability Test Examples (cont)

Consider this schedule:

[schedule diagram]

Precedence graph for this schedule:

[precedence graph diagram]

Has a cycle ⇒ not serializable


Serializability Test Examples (cont)

Consider this 3-transaction schedule:

[schedule diagram]

Precedence graph for this schedule:

[precedence graph diagram]

No cycles ⇒ serializable


Serializability Test Examples (cont)

Consider this 3-transaction schedule:

[schedule diagram]

Precedence graph for this schedule:

[precedence graph diagram]

Has a cycle ⇒ not serializable


Having serializability tests is useful theoretically, but they do not provide a practical tool for organising schedules.

Why not practical?

  • the # possible schedules for n transactions is O(n!)
  • the cost of testing for serializability via graphs is O(n²)
What is required are methods
  • that can be applied to each transaction individually
  • which guarantee that any combination of transactions is serializable


Concurrency Control (cont)

Approaches to ensuring ACID transactions:

  • lock-based

    Synchronise transaction execution via locks on some portion of the database.

  • version-based

    Allow multiple consistent versions of the data to exist, and allow each transaction exclusive access to one version.

  • timestamp-based

    Organise transaction execution in advance by assigning timestamps to operations.

  • validation-based   (optimistic concurrency control)

    Exploit typical execution-sequence properties of transactions to determine safety dynamically.


Lock-based Concurrency Control

Synchronise access to shared data items via following rules:

  • before reading X, get shared (read) lock on X
  • before writing X, get exclusive (write) lock on X
  • an attempt to get a shared lock on X is blocked if another transaction already has exclusive lock on X
  • an attempt to get an exclusive lock on X is blocked if another transaction has any kind of lock on X
These rules alone do not guarantee serializability.
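The blocking rules above amount to a shared/exclusive compatibility check, sketched here as a minimal model:

```python
# COMPATIBLE[(held, requested)]: may the request be granted?
COMPATIBLE = {
    ("S", "S"): True,   # two shared locks coexist
    ("S", "X"): False,  # exclusive blocked by a shared lock
    ("X", "S"): False,  # shared blocked by an exclusive lock
    ("X", "X"): False,  # exclusive blocked by an exclusive lock
}

def can_grant(requested, held):
    # Granted only if compatible with every lock currently held on the item.
    return all(COMPATIBLE[(h, requested)] for h in held)

print(can_grant("S", ["S", "S"]))  # True
print(can_grant("X", ["S"]))       # False
print(can_grant("S", ["X"]))       # False
```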


To guarantee serializability, we require an additional constraint on how locks are applied:

  • no transaction can request a lock after it has released one of its locks
Each transaction is then structured as:
  • growing phase where locks are acquired
  • action phase where "real work" is done
  • shrinking phase where locks are released
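The two-phase rule can be sketched as a small guard; this is an illustrative model, not a real lock manager:

```python
class TwoPhaseTxn:
    def __init__(self):
        self.held = set()
        self.shrinking = False   # becomes True at the first release

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after release")
        self.held.add(item)

    def release(self, item):
        self.shrinking = True
        self.held.discard(item)

t = TwoPhaseTxn()
t.acquire("X")      # growing phase
t.acquire("Y")
t.release("X")      # shrinking phase begins
try:
    t.acquire("Z")  # illegal under two-phase locking
except RuntimeError as e:
    print(e)
```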


Appropriate locking can guarantee correctness.

However, it also introduces potential undesirable effects:

  • deadlock

    No transactions can proceed; each is waiting on a lock held by another.

  • starvation

    One transaction is permanently "frozen out" of access to data.

  • reduced performance

    Locking introduces delays while waiting for locks to be released.


Deadlock occurs when two transactions are waiting for a lock on an item held by the other.

Example:

[schedule diagram]
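With two transactions, the deadlock is a cycle of length two in the wait-for graph, which a sketch can check directly (general deadlock detection needs full cycle detection over this graph):

```python
def deadlocked(waits_for):
    # waits_for holds edges (a, b): transaction a waits for a lock held by b.
    # A mutual wait (a <-> b) is a two-cycle, i.e. a deadlock.
    return any((b, a) in waits_for for (a, b) in waits_for)

# T1 holds a lock on X and wants Y; T2 holds Y and wants X
print(deadlocked({("T1", "T2"), ("T2", "T1")}))  # True
print(deadlocked({("T1", "T2")}))                # False
```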


Handling deadlock involves forcing a transaction to "back off".

  • select a process to "back off"
    • choose on basis of how far transaction has progressed, # locks held, ...
  • roll back the selected process
    • how far does it need to be rolled back? (less roll-back is better)
  • prevent starvation
    • need methods to ensure that same transaction isn't always chosen


Starvation occurs when one transaction

  • waits on a lock indefinitely
  • while other transactions continue normally
Whether it occurs depends on the lock wait/release strategy.

Multiple locks ⇒ need to decide which to release first.

Solutions:

  • implement a fair wait/release strategy (e.g. first-come-first-served)
  • use deadlock prevention schemes, such as "wait-die"
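The "wait-die" rule can be sketched as a one-line decision, assuming each transaction carries a timestamp (smaller = older):

```python
def wait_die(requester_ts, holder_ts):
    # An older requester waits for the lock; a younger requester "dies"
    # (is rolled back and restarted, keeping its original timestamp).
    return "wait" if requester_ts < holder_ts else "die"

print(wait_die(1, 2))  # wait: older transaction may wait on a younger one
print(wait_die(2, 1))  # die:  younger transaction is rolled back
```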


Locking typically reduces concurrency ⇒ reduces throughput.

Granularity of locking can impact performance:

+ lock a small item ⇒ more of database accessible

+ lock a small item ⇒ quick update ⇒ quick lock release

- lock small items ⇒ more locks ⇒ more lock management

Granularity levels: field, row (tuple), table, whole database

Multiple lock-granularities give best scope for optimising performance.


Multi-version Concurrency Control

One approach to reducing the requirement for locks is to

  • provide multiple (consistent) versions of the database
  • give each transaction access to an "appropriate" version
    (i.e. a version that will maintain the serializability of the transaction)
This approach is called multi-version concurrency control (MVCC).

The primary difference between MVCC and standard locking models:

  • read locks do not conflict with write locks, so that
  • reading never blocks writing, and writing never blocks reading


Database systems using MVCC ensure

  • statement-level read consistency
    (i.e. once an SQL SELECT statement starts, its view of the data is "frozen")
  • readers do not wait for writers or other readers of the same data
  • writers do not wait for readers of the same data
  • writers only wait for other writers if they attempt to update identical rows in concurrent transactions
With this behaviour:
  • a SELECT statement sees a consistent view of the database
  • but it may not see the "current" view of the database

    E.g. T1 does a select and then concurrent T2 deletes some of T1's selected tuples


PostgreSQL uses MVCC to reduce locking requirements.

Consequences:

  • several versions of each tuple may exist ⇒ uses more storage
  • each transaction needs to check each tuple's visibility
  • periodically, clean up "old" tuples   (vacuum)
An "old" tuple is one that is no longer visible to any transaction.
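The visibility check can be sketched as follows; this is a simplification of PostgreSQL's actual rules, borrowing only its xmin/xmax naming for the creating and deleting transaction ids on each tuple version:

```python
def visible(tuple_ver, snapshot_xid, committed):
    # A version is visible if its creator committed at or before the
    # snapshot, and it wasn't deleted by such a transaction.
    xmin, xmax = tuple_ver
    created = xmin in committed and xmin <= snapshot_xid
    deleted = xmax is not None and xmax in committed and xmax <= snapshot_xid
    return created and not deleted

committed = {10, 12}
print(visible((10, None), 15, committed))  # True:  live committed version
print(visible((10, 12), 15, committed))    # False: deleted by txn 12
print(visible((20, None), 15, committed))  # False: created after snapshot
```

A version that no snapshot can ever see again is "old" and may be reclaimed.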

Concurrency control is still needed (via implicit locking):

  • amount of locking is determined by user-chosen isolation level
  • the system then applies appropriate locking automatically


PostgreSQL and MVCC (cont)

A transaction sees a consistent view of the database, but it may not see the "current" view of the database.

E.g. T1 does a select and then concurrent T2 deletes some of T1's selected tuples

This is not a problem unless the transactions communicate outside the database system.

For applications that require that every transaction accesses the current consistent version of the data, explicit locking must be used.
