What types of problems can arise when two transactions run concurrently?

Note that serial execution doesn't mean that every ordering of the transactions gives the same result; different serial orders may produce different (but individually consistent) outcomes.

Consider the following two transactions:

T1: select sum(salary)
    from Employee where dept='Sales'

T2: insert into Employee
    values (....,'Sales',...)

If we execute T1 then T2, we get a smaller salary total than if we execute T2 then T1.

In both cases, however, the salary total is consistent with the state of the database at the time the query is executed.
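A toy sketch of the two serial orders, using a hypothetical in-memory list in place of the Employee table (the names and salaries are invented for illustration):

```python
# Hypothetical Employee rows: (name, dept, salary)
employees = [("alice", "Sales", 50), ("bob", "Sales", 60)]

def t1_sum(rows):
    # T1: select sum(salary) from Employee where dept='Sales'
    return sum(sal for _, dept, sal in rows if dept == "Sales")

def t2_insert(rows):
    # T2: insert into Employee values (...,'Sales',...)
    rows.append(("carol", "Sales", 70))

# Serial order T1 then T2: the sum doesn't see the new row
rows = list(employees)
total_t1_first = t1_sum(rows)
t2_insert(rows)

# Serial order T2 then T1: the sum includes the new row
rows = list(employees)
t2_insert(rows)
total_t2_first = t1_sum(rows)

print(total_t1_first, total_t2_first)   # 110 180
```

Each total is consistent with the database state at the time the query ran; they differ only because the serial orders differ.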


A serial execution of consistent transactions is always consistent.

If transactions execute under a concurrent (nonserial) schedule, the potential exists for conflict among their effects.

In the worst case, the effect of executing the transactions ...

  • is to leave the database in an inconsistent state
  • even though each transaction, by itself, is consistent
So why don't we observe such problems in real DBMSs? ...
  • concurrency control mechanisms handle them   (see later).


Valid Concurrent Transactions

Not all concurrent executions cause problems.

For example, the schedules

T1: R(X) W(X)           R(Y) W(Y)
T2:           R(X) W(X)

or

T1: R(X) W(X)      R(Y)      W(Y)
T2:           R(X)      W(X)

or ...

leave the database in a consistent state.


Consider the following schedule where the transactions execute in parallel:

[schedule diagram]

In this scenario:

  • T2 reads data (X) that T1 is currently operating on
  • then makes changes to X and overwrites T1's result
This is called a Write-Read (WR) Conflict or dirty read.

The result: T1's update to X is lost.
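A deterministic sketch of the lost-update interleaving, using the values from the tables in this section (X=100, N=5, M=8); the single-threaded step ordering stands in for true concurrency:

```python
db = {"X": 100}
N, M = 5, 8

t1_x = db["X"]        # T1: read(X)   -> 100
t2_x = db["X"]        # T2: read(X)   -> 100, before T1 writes back
db["X"] = t1_x + N    # T1: write(X)  -> 105
db["X"] = t2_x + M    # T2: write(X)  -> 108, overwriting T1's 105

print(db["X"])        # 108: T1's update is lost (either serial order gives 113)
```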


Lost Update Problem (cont)

Consider the states in the WR Conflict schedule:

[schedule diagram]


Consider the following schedule where one transaction fails:

[schedule diagram]

Transaction T1 aborts after writing X.

The abort will undo the changes to X, but where the undo occurs can affect the results.

Consider three places where undo might occur:

[schedule diagram]


Temporary Update - Case 1

This scenario is ok.  T1's effects have been eliminated.

[schedule diagram]


Temporary Update - Case 2

In this scenario, some of T1's effects have been retained.

[schedule diagram]


Temporary Update - Case 3

In this scenario, T2's effects have been lost, even after commit.

[schedule diagram]


For ACID, we must ensure that schedules are:

  • serializable

    The effect of executing n concurrent transactions is the same as the effect of executing them serially in some order.

    For assessing the correctness of concurrency control methods, we need a test to check whether a given method produces serializable schedules.

  • recoverable

    A failed transaction should not affect the ability to recover the system to a consistent state.

    This can be ensured if transactions commit only after all transactions whose changes they read commit.
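That commit condition can be sketched as a small check; the read_from map is assumed bookkeeping that records which transactions' uncommitted writes each transaction has read:

```python
def can_commit(txn, read_from, committed):
    # A transaction may commit only once every transaction
    # whose writes it read has already committed.
    return read_from.get(txn, set()) <= committed

# T2 read a value written by T1
reads = {"T2": {"T1"}}

print(can_commit("T2", reads, committed={"T1"}))  # True: T1 has committed
print(can_commit("T2", reads, committed=set()))   # False: T1 still active
```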


If a concurrent schedule for transactions T1 ..Tn acts like a serial schedule for T1 ..Tn, then consistency is guaranteed.

To determine this requires a notion of schedule equivalence.

Note: we are not attempting to determine equivalence of entire computations, simply of the interleaved sequences of read/write operations.

A serializable schedule is a concurrent schedule that produces a final state that is the same as that produced by some serial schedule.

There are two primary formulations of serializability:

  • conflict serializability   (read/write operations occur in the "right" order)
  • view serializability   (read operations see the correct version of data)


Consider two transactions T1 and T2 acting on data item X.

Considering only read/write operations, the possibilities are:

T1 first      T2 first      Equiv?
-----------   -----------   ------
R1(X) R2(X)   R2(X) R1(X)   yes
R1(X) W2(X)   W2(X) R1(X)   no
W1(X) R2(X)   R2(X) W1(X)   no
W1(X) W2(X)   W2(X) W1(X)   no

If T1 and T2 act on different data items, the result is equivalent regardless of order.


Conflict Serializability (cont)

Two transactions have a potential conflict if

  • they perform operations on the same data item
  • at least one of the operations is a write operation
In such cases, the order of operations affects the result.

Conversely, if two operations in a schedule don't conflict,
we can swap their order without affecting the overall result.

This gives a basis for determining equivalence of schedules.
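The conflict condition can be written as a small predicate; the (txn, action, item) encoding of operations is an assumption for illustration:

```python
def conflicts(op1, op2):
    # Two operations conflict iff they come from different transactions,
    # touch the same data item, and at least one is a write.
    t1, a1, x1 = op1
    t2, a2, x2 = op2
    return t1 != t2 and x1 == x2 and "W" in (a1, a2)

print(conflicts((1, "R", "X"), (2, "W", "X")))  # True:  R1(X) vs W2(X)
print(conflicts((1, "R", "X"), (2, "R", "X")))  # False: two reads
print(conflicts((1, "W", "X"), (2, "W", "Y")))  # False: different items
```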


Conflict Serializability (cont)

If we can transform a schedule

  • by swapping the orders of non-conflicting operations
  • such that the result is a serial schedule
then we say that the schedule is conflict serializable.

If a concurrent schedule is equivalent to some (any) serial schedule, then we have a consistency guarantee.


Conflict Serializability (cont)

Example: transform a concurrent schedule to serial schedule

[schedule diagram]


View Serializability is

  • an alternative formulation of serializability
  • that is less conservative than conflict serializability (CS)
    (some safe schedules that are view serializable are not conflict serializable)
As with CS, it is based on a notion of schedule equivalence
  • a schedule is "safe" if view equivalent to a serial schedule
The idea: if all the read operations in two schedules ...
  • always read the result of the same write operations
  • then the schedules must produce the same result


View Serializability (cont)

Two schedules S and S' on T1 .. Tn are view equivalent iff

  • for each shared data item X
    • if Tj reads the initial value of X in S, then it also reads the initial value of X in S'
    • if Tj reads X in S and X was produced by Tk, then Tj must also read the value of X produced by Tk in S'
    • if Tj performs the final write of X in S, then it must also perform the final write of X in S'
To check serializability of S, find a serial schedule that is view equivalent to S


In practice, we don't test specific schedules for serializability.

However, in designing concurrency control schemes, we need a way of checking whether they produce "safe" schedules.

This is typically achieved by a demonstration that the scheme generates only serializable schedules, and we need a serializability test for this.

There is a simple and efficient test for conflict serializability;
there is a more complex test for view serializability.

Both tests are based on notions of

  • building a graph to represent transaction interactions
  • testing properties of this graph (checking for cycles)


Testing Serializability (cont)

A precedence graph G = (V,E) for a schedule S consists of

  • a vertex in V for each transaction from T1 .. Tn
  • an edge in E for each pair Tj and Tk, such that
    • there is a pair of conflicting operations between Tj & Tk
    • the Tj operation occurs before the Tk operation
Note: the edge is directed from Tj   →   Tk


Testing Serializability (cont)

If an edge Tj → Tk exists in the precedence graph

  • then Tj must appear before Tk in any serial schedule
Implication: if the precedence graph has cycles, then S can't be serialized.

Thus, the serializability test is reduced to cycle-detection.

(and there are cycle-detection algorithms available in many algorithms textbooks)
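Both steps can be sketched in a few lines; a schedule is assumed to be a list of (txn, action, item) tuples in execution order:

```python
def precedence_edges(schedule):
    # Add an edge Ti -> Tj for each pair of conflicting operations
    # where Ti's operation occurs before Tj's.
    edges = set()
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "W" in (ai, aj):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    # Depth-first search; reaching a vertex that is still on the
    # recursion stack means the directed graph has a cycle.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    visited, on_stack = set(), set()
    def dfs(v):
        visited.add(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w in on_stack or (w not in visited and dfs(w)):
                return True
        on_stack.discard(v)
        return False
    return any(v not in visited and dfs(v) for v in list(graph))

# Both transactions write X, interleaved both ways: edges T1->T2 and T2->T1
s = [(1, "R", "X"), (2, "R", "X"), (1, "W", "X"), (2, "W", "X")]
print(has_cycle(precedence_edges(s)))   # True -> not serializable
```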


Serializability Test Examples

Serializable schedule   (with conflicting operations shown in red):

[schedule diagram]

Precedence graph for this schedule:

[precedence graph diagram]

No cycles ⇒ serializable  (as we already knew)


Serializability Test Examples (cont)

Consider this schedule:

[schedule diagram]

Precedence graph for this schedule:

[precedence graph diagram]

Has a cycle ⇒ not serializable


Serializability Test Examples (cont)

Consider this 3-transaction schedule:

[schedule diagram]

Precedence graph for this schedule:

[precedence graph diagram]

No cycles ⇒ serializable


Serializability Test Examples (cont)

Consider this 3-transaction schedule:

[schedule diagram]

Precedence graph for this schedule:

[precedence graph diagram]

Has a cycle ⇒ not serializable


Having serializability tests is useful theoretically, but they do not provide a practical tool for organising schedules.

Why not practical?

  • the # possible schedules for n transactions is O(n!)
  • the cost of testing for serializability via graphs is O(n²)
What is required are methods
  • that can be applied to each transaction individually
  • which guarantee that any combination of transactions is serializable


Concurrency Control (cont)

Approaches to ensuring ACID transactions:

  • lock-based

    Synchronise transaction execution via locks on some portion of the database.

  • version-based

    Allow multiple consistent versions of the data to exist, and allow each transaction exclusive access to one version.

  • timestamp-based

    Organise transaction execution in advance by assigning timestamps to operations.

  • validation-based   (optimistic concurrency control)

    Exploit typical execution-sequence properties of transactions to determine safety dynamically.


Lock-based Concurrency Control

Synchronise access to shared data items via following rules:

  • before reading X, get shared (read) lock on X
  • before writing X, get exclusive (write) lock on X
  • an attempt to get a shared lock on X is blocked if another transaction already has exclusive lock on X
  • an attempt to get an exclusive lock on X is blocked if another transaction has any kind of lock on X
These rules alone do not guarantee serializability.
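The blocking rules above amount to a shared/exclusive compatibility check, sketched here as a minimal model:

```python
# COMPATIBLE[(held, requested)]: may the request be granted?
COMPATIBLE = {
    ("S", "S"): True,   # two shared locks coexist
    ("S", "X"): False,  # exclusive blocked by a shared lock
    ("X", "S"): False,  # shared blocked by an exclusive lock
    ("X", "X"): False,  # exclusive blocked by an exclusive lock
}

def can_grant(requested, held):
    # Granted only if compatible with every lock currently held on the item.
    return all(COMPATIBLE[(h, requested)] for h in held)

print(can_grant("S", ["S", "S"]))  # True
print(can_grant("X", ["S"]))       # False
print(can_grant("S", ["X"]))       # False
```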


To guarantee serializability, we require an additional constraint on how locks are applied:

  • no transaction can request a lock after it has released one of its locks
Each transaction is then structured as:
  • growing phase where locks are acquired
  • action phase where "real work" is done
  • shrinking phase where locks are released
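The two-phase rule can be sketched as a small guard; this is an illustrative model, not a real lock manager:

```python
class TwoPhaseTxn:
    def __init__(self):
        self.held = set()
        self.shrinking = False   # becomes True at the first release

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after release")
        self.held.add(item)

    def release(self, item):
        self.shrinking = True
        self.held.discard(item)

t = TwoPhaseTxn()
t.acquire("X")      # growing phase
t.acquire("Y")
t.release("X")      # shrinking phase begins
try:
    t.acquire("Z")  # illegal under two-phase locking
except RuntimeError as e:
    print(e)
```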


Appropriate locking can guarantee correctness.

However, it also introduces potential undesirable effects:

  • deadlock

    No transactions can proceed; each is waiting on a lock held by another.

  • starvation

    One transaction is permanently "frozen out" of access to data.

  • reduced performance

    Locking introduces delays while waiting for locks to be released.


Deadlock occurs when two transactions are waiting for a lock on an item held by the other.

Example:

[schedule diagram]
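With two transactions, the deadlock is a cycle of length two in the wait-for graph, which a sketch can check directly (general deadlock detection needs full cycle detection over this graph):

```python
def deadlocked(waits_for):
    # waits_for holds edges (a, b): transaction a waits for a lock held by b.
    # A mutual wait (a <-> b) is a two-cycle, i.e. a deadlock.
    return any((b, a) in waits_for for (a, b) in waits_for)

# T1 holds a lock on X and wants Y; T2 holds Y and wants X
print(deadlocked({("T1", "T2"), ("T2", "T1")}))  # True
print(deadlocked({("T1", "T2")}))                # False
```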


Handling deadlock involves forcing a transaction to "back off".

  • select a process to "back off"
    • choose on basis of how far transaction has progressed, # locks held, ...
  • roll back the selected process
    • how far does it need to be rolled back? (less roll-back is better)
  • prevent starvation
    • need methods to ensure that same transaction isn't always chosen


Starvation occurs when one transaction

  • waits on a lock indefinitely
  • while other transactions continue normally
Whether it occurs depends on the lock wait/release strategy.

Multiple locks ⇒ need to decide which to release first.

Solutions:

  • implement a fair wait/release strategy (e.g. first-come-first-served)
  • use deadlock prevention schemes, such as "wait-die"
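The "wait-die" rule can be sketched as a one-line decision, assuming each transaction carries a timestamp (smaller = older):

```python
def wait_die(requester_ts, holder_ts):
    # An older requester waits for the lock; a younger requester "dies"
    # (is rolled back and restarted, keeping its original timestamp).
    return "wait" if requester_ts < holder_ts else "die"

print(wait_die(1, 2))  # wait: older transaction may wait on a younger one
print(wait_die(2, 1))  # die:  younger transaction is rolled back
```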


Locking typically reduces concurrency ⇒ reduces throughput.

Granularity of locking can impact performance:

+ lock a small item ⇒ more of database accessible

+ lock a small item ⇒ quick update ⇒ quick lock release

- lock small items ⇒ more locks ⇒ more lock management

Granularity levels: field, row (tuple), table, whole database

Multiple lock-granularities give best scope for optimising performance.


Multi-version Concurrency Control

One approach to reducing the requirement for locks is to

  • provide multiple (consistent) versions of the database
  • give each transaction access to an "appropriate" version
    (i.e. a version that will maintain the serializability of the transaction)
This approach is called multi-version concurrency control (MVCC).

The primary difference between MVCC and standard locking models:

  • read locks do not conflict with write locks, so that
  • reading never blocks writing, and writing never blocks reading


Database systems using MVCC ensure

  • statement-level read consistency
    (i.e. once an SQL SELECT statement starts, its view of the data is "frozen")
  • readers do not wait for writers or other readers of the same data
  • writers do not wait for readers of the same data
  • writers only wait for other writers if they attempt to update identical rows in concurrent transactions
With this behaviour:
  • a SELECT statement sees a consistent view of the database
  • but it may not see the "current" view of the database

    E.g. T1 does a select and then concurrent T2 deletes some of T1's selected tuples


PostgreSQL uses MVCC to reduce locking requirements.

Consequences:

  • several versions of each tuple may exist ⇒ uses more storage
  • each transaction needs to check each tuple's visibility
  • periodically, clean up "old" tuples   (vacuum)
An "old" tuple is one that is no longer visible to any transaction.
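The visibility check can be sketched as follows; this is a simplification of PostgreSQL's actual rules, borrowing only its xmin/xmax naming for the creating and deleting transaction ids on each tuple version:

```python
def visible(tuple_ver, snapshot_xid, committed):
    # A version is visible if its creator committed at or before the
    # snapshot, and it wasn't deleted by such a transaction.
    xmin, xmax = tuple_ver
    created = xmin in committed and xmin <= snapshot_xid
    deleted = xmax is not None and xmax in committed and xmax <= snapshot_xid
    return created and not deleted

committed = {10, 12}
print(visible((10, None), 15, committed))  # True:  live committed version
print(visible((10, 12), 15, committed))    # False: deleted by txn 12
print(visible((20, None), 15, committed))  # False: created after snapshot
```

A version that no snapshot can ever see again is "old" and may be reclaimed.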

Concurrency control is still needed (via implicit locking):

  • amount of locking is determined by user-chosen isolation level
  • the system then applies appropriate locking automatically


PostgreSQL and MVCC (cont)

A transaction sees a consistent view of the database, but it may not see the "current" view of the database.

E.g. T1 does a select and then concurrent T2 deletes some of T1's selected tuples

This is not a problem unless the transactions communicate outside the database system.

For applications that require that every transaction accesses the current consistent version of the data, explicit locking must be used.
