Transaction Processing Concepts – Advanced Database Systems

Understand Transaction Processing Concepts in Advanced Database Systems. Learn about ACID properties, concurrency control problems, serializability, and recovery in this comprehensive lesson.


1. Introduction to Transactions

1.1 What is a Transaction?

Important Points:
  • A transaction is a logical unit of database processing.
  • It includes one or more database access operations (insertion, deletion, modification, retrieval).
  • It is an atomic unit of work: either completed entirely or not done at all.
  • Boundaries are defined by BEGIN_TRANSACTION and END_TRANSACTION.

Detailed Explanation:

Imagine you are transferring money from your bank account to your friend’s account. This process involves two steps:

  1. Money is deducted from your account.
  2. Money is added to your friend’s account.

In database terms, these two steps must happen together as a single unit. If the first step happens but the second doesn’t (maybe due to a power cut), the database would be in an incorrect state (money disappeared). This single unit of work is called a Transaction.

A transaction ensures that a group of operations is treated as one indivisible block. The database system ensures that either all operations in the block happen, or none of them happen.

Types of Transactions:

  • Read-only Transaction: The transaction only retrieves data. It does not change anything.
  • Read-write Transaction: The transaction reads data and also updates (inserts, deletes, modifies) data.

Example:

A student registering for a course. The transaction involves:

  • Checking if the course has space (Read).
  • Adding the student ID to the course list (Write/Insert).
  • Decreasing the available seat count (Write/Update).
Exam Question:
Which of the following best describes a transaction in a database system?

A) A single SQL query that only reads data
B) A logical unit of work that must be completed entirely or not at all
C) A physical storage unit on the hard disk
D) A user login session

1.2 Transaction Operations

Important Points:
  • BEGIN_TRANSACTION: Marks the start of the transaction.
  • READ/WRITE: The actual operations performed on data items.
  • END_TRANSACTION: Marks the end of the transaction logic.
  • COMMIT_TRANSACTION: Signals successful completion; changes are saved permanently.
  • ROLLBACK (ABORT): Signals unsuccessful completion; changes are undone.

Detailed Explanation:

The database management system (DBMS) needs to track the lifecycle of a transaction. This involves specific commands that tell the system what is happening.

1. BEGIN_TRANSACTION: This tells the DBMS “I am starting a new task. Please track everything I do from this point.”

2. READ or WRITE: These are the actual work being done. Reading retrieves data; Writing changes data.

3. END_TRANSACTION: This tells the DBMS “I have finished my work.” However, the decision to keep the changes hasn’t been made yet.

4. COMMIT_TRANSACTION: This is the “Save” button. It tells the DBMS “Everything went well. Make these changes permanent.”

5. ROLLBACK: This is the “Undo” button. It tells the DBMS “Something went wrong. Forget everything I did and go back to the previous state.”

Transaction Lifecycle Example:

BEGIN_TRANSACTION
   |
   v
READ(Account_Balance)  ----> Get current balance
   |
   v
WRITE(New_Balance)     ----> Update the balance
   |
   v
[Check for Errors]
   |
   +----[Success]----> COMMIT_TRANSACTION
   |
   +----[Failure]----> ROLLBACK

1.3 Transaction States

Important Points:
  • Active: Transaction is currently executing.
  • Partially Committed: The last statement has been executed, but changes aren’t final yet.
  • Committed: Changes are permanently saved.
  • Failed: Transaction cannot proceed normally.
  • Aborted: Transaction has been rolled back to the previous state.

Detailed Explanation:

A transaction moves through different states during its life, similar to how a student moves from “Enrolled” to “Graduated” or “Dropped Out.”

                +--------+
                | Active | <--- Transaction starts here
                +--------+
                     |
                     | (End transaction statement)
                     v
        +----------------------+
        | Partially Committed  | <--- Execution done, checking...
        +----------------------+
               /         \
              /           \ (Success)
             / (Fail)      \
            v               v
      +---------+      +-----------+
      | Failed  |      | Committed | <--- Changes are permanent
      +---------+      +-----------+
           |
           | (Rollback)
           v
      +---------+
      | Aborted | <--- Database restored to previous state
      +---------+
  • Active State: The transaction stays here while it is reading and writing data.
  • Partially Committed: After the final operation, the system checks if everything is okay (e.g., "Did the disk write successfully?").
  • Committed State: Once the check passes, the transaction is "Committed." The changes are now permanent.
  • Failed/Aborted: If the check fails or an error occurs, the transaction goes to "Failed" and then "Aborted." The DBMS undoes any changes made.
Exam Question:
What happens when a transaction enters the "Aborted" state?

A) The changes made by the transaction are saved to the database.
B) The transaction continues to execute in the background.
C) The database is restored to the state it was in before the transaction started.
D) The transaction moves to the "Committed" state.


2. Properties of Transactions (ACID)

2.1 The ACID Properties

Important Points:
  • Atomicity: "All or nothing" property.
  • Consistency: Must transform database from one consistent state to another.
  • Isolation: Transactions execute independently; partial results are invisible to others.
  • Durability: Permanent changes survive failures.

Detailed Explanation:

To guarantee reliable transaction processing, databases follow four fundamental properties, known collectively as ACID.

See also  Query Processing and Optimization - Advanced Database Systems

1. Atomicity (The "All or Nothing" Property)

A transaction is like an atom—it cannot be split. Either all its operations happen, or none of them do. There is no middle ground.

Responsibility: DBMS (specifically the Recovery Manager).

Example: In a funds transfer, if the "deduct money" step works but "add money" fails, Atomicity ensures the "deduct" step is undone so money isn't lost.

2. Consistency Preservation

A transaction must follow the rules of the database. If the database is correct before the transaction, it must be correct after. It cannot violate integrity constraints.

Responsibility: Application developers and DBMS.

Example: If a rule says "Account balance cannot be negative," a transaction trying to withdraw more money than exists will be rejected to preserve consistency.

3. Isolation

Transactions often run at the same time (concurrently). Isolation ensures that one transaction cannot see the intermediate (unfinished) work of another transaction.

Responsibility: DBMS (specifically the Concurrency Control Manager).

Example: If Transaction A is transferring money but hasn't finished, Transaction B should not be able to read the balance until Transaction A is done.

4. Durability

Once a transaction says "Commit" (it is successful), the changes must be permanent. Even if the power goes out immediately after, the data must be saved.

Responsibility: DBMS (Recovery Manager).

Example: You withdraw cash from an ATM and the receipt prints. Even if the bank's server crashes one second later, your account balance must reflect the withdrawal.

Exam Question:
Which ACID property ensures that once a transaction is committed, its changes are permanent and will survive system failures?

A) Atomicity
B) Consistency
C) Isolation
D) Durability


3. Database Architecture for Transactions

3.1 Key Modules in DBMS

Important Points:
  • Transaction Manager: Coordinates transactions for application programs.
  • Scheduler: Implements concurrency control strategies.
  • Recovery Manager: Restores the database to a consistent state after a failure.
  • Buffer Manager: Transfers data between disk storage and main memory.

Detailed Explanation:

The DBMS has specialized modules to handle transactions. Think of them as different departments in a company working together.

ModuleRole (Job Description)
Transaction ManagerHandles the transaction lifecycle (begin, commit, abort). It acts as the "Coordinator."
SchedulerControls the order in which transactions execute to prevent interference (Concurrency Control).
Recovery ManagerEnsures the database can recover from crashes. It maintains logs (records) of all changes.
Buffer ManagerManages the movement of data. It brings data from the slow hard disk into fast memory (RAM) for processing.
   [ Application Program ]
            |
            v
+-----------------------+
| Transaction Manager   | <--- Coordinates execution
+-----------------------+
            |
            v
+-----------------------+
|     Scheduler         | <--- Controls order (Concurrency)
+-----------------------+
            |
            v
+-----------------------+
|   Buffer Manager      | <--- Moves data Disk <-> RAM
+-----------------------+
            |
            v
      [ Disk Storage ]
Exam Question:
Which DBMS module is responsible for ensuring that concurrent transactions do not interfere with each other?

A) Transaction Manager
B) Recovery Manager
C) Scheduler
D) Buffer Manager


4. Concurrency Control

4.1 Why Concurrency Control is Needed

Important Points:
  • Concurrency: Managing simultaneous operations on the database.
  • Objective: Enable many users to access shared data at the same time.
  • Risk: Reading is safe; Writing (updating) can cause interference and inconsistencies.

Detailed Explanation:

In a multi-user database system (like a university registration system), hundreds of users might try to access data at the same time.

If everyone is just reading data, there is no problem. Everyone sees the same thing.

However, if users are writing (updating) data at the same time, they can interfere with each other, leading to incorrect data.

4.2 Problems Caused by Concurrency

Important Points:
  • Lost Update Problem: One update overwrites another.
  • Uncommitted Dependency (Dirty Read): Reading data from a transaction that hasn't committed yet.
  • Inconsistent Analysis Problem: Reading data that is being modified, leading to incorrect calculations.

Problem 1: The Lost Update Problem

Explanation: This happens when two transactions read the same value and update it. One transaction's update is "lost" because it is overwritten by the other.

Example:

Two transactions T1 and T2 are updating a bank balance (Balx) which starts at 100.

TimeT1T2Balx (in DB)
t1Begin Transaction-100
t2read(balx) -> reads 100Begin Transaction100
t3balx = balx - 10read(balx) -> reads 100100
t4write(balx) -> writes 90balx = balx + 10090
t5Commitwrite(balx) -> writes 200!200
t6-Commit200

Result: T1 updated the balance to 90. But T2, which had read the *old* value (100), wrote 200. T1's deduction of 10 birr is lost. The balance should be 190, but it is 200.

Problem 2: Uncommitted Dependency (Dirty Read)

Explanation: This occurs when a transaction reads data written by another transaction that has not yet committed. If the second transaction aborts (rolls back), the first transaction has read "dirty" (invalid) data.

Example:

TimeT3T4Balx
t1Begin Transaction-100
t2read(balx)-100
t3balx = balx + 100-100
t4write(balx) -> 200Begin Transaction200
t5-read(balx) -> reads 200200
t6Rollback (Fails!)-100 (Restored)
t7-Uses value 200...100

Result: T4 read the value 200. However, T3 rolled back (undid its changes). The actual database value is back to 100, but T4 is working with the value 200, which never officially existed.

Problem 3: Inconsistent Analysis Problem

Explanation: This happens when a transaction reads several values, but a second transaction updates some of them while the first is still running. The first transaction gets a "snapshot" that mixes old and new data.

Example: T5 is calculating a sum. T6 is updating accounts.

TimeT5 (Sum)T6 (Update)BalxBalyBalzSum
t1BeginBegin1005025-
t2sum=0read(balx)10050250
t3read(balx)balx=balx-1010050250
t4sum=sum+balx (100)write(balx)905025100
t5read(baly)read(balz)905025100
t6sum=sum+baly (50)balz=balz+10905025150
t7-write(balz)905035150
t8read(balz)Commit905035150
t9sum=sum+balz (35)-905035185

Result: T5 read old Balx (100) but new Balz (35). The total sum is 185. The correct sum should be either 175 (all old values) or 185 (all new values), or 195 (100+50+35 or 90+50+35 depending on logic). Mixing states gives an inconsistent analysis. (Note: PDF data shows sum=185 but states it is incorrect based on timing).

Exam Question:
What is the "Lost Update" problem?

A) A transaction reads data that is later rolled back.
B) Two transactions update the same data item, and one update overwrites the other.
C) A transaction calculates a sum based on mixed old and new values.
D) The database loses power during an update.


5. Recovery and Failures

5.1 Why Recovery is Needed

Important Points:
  • The DBMS must not allow partial execution of a transaction.
  • If a transaction fails, the database must be restored to a previous consistent state.

5.2 Types of Failures

Important Points:
  • Transaction Failure: Logical errors or bad input.
  • System Crash: Hardware or software failure (power loss).
  • Media Failure: Disk drive damage.

Detailed Explanation:

Transactions can fail for many reasons. The DBMS must handle these gracefully.

  1. Computer Failure (System Crash): A power outage or operating system error causes the system to stop. Memory (RAM) contents are lost, but the hard disk is usually safe.
  2. Transaction or System Error: An integer overflow, division by zero, or logical error in the code.
  3. Local Errors: The transaction itself detects a condition (e.g., "Insufficient funds") and decides to abort.
  4. Concurrency Control Enforcement: The system kills a transaction to resolve a deadlock (two transactions waiting on each other).
  5. Disk Failure: The hard disk itself is damaged. This is serious because data stored on disk might be lost (Head crash).
  6. Physical Problems: Fire, flood, or theft of the storage media.
Exam Question:
Which type of failure involves a "Deadlock" situation where the DBMS forces a transaction to abort?

A) Disk Failure
B) System Crash
C) Concurrency Control Enforcement
D) Physical Catastrophe


6. Serializability and Recoverability

6.1 Schedules

Important Points:
  • Schedule: A sequence of operations from a set of transactions preserving the order of each individual transaction.
  • Serial Schedule: Transactions execute one after another (no interleaving).
  • Non-Serial Schedule: Operations from different transactions are interleaved (mixed).

Detailed Explanation:

When multiple transactions run at once, the DBMS creates a Schedule—a list of the order in which operations happen.

Serial Schedule:

This is the safest but slowest way. Transaction 1 starts, runs, and finishes. Only then does Transaction 2 start.

  • Pros: No interference, always correct.
  • Cons: Poor resource utilization. If T1 waits for a user input, the CPU sits idle instead of doing T2.

Non-Serial Schedule (Interleaved):

Operations are mixed. T1 writes X, then T2 reads X, then T1 reads Y, etc.

  • Pros: Faster, better CPU usage.
  • Cons: Can cause the concurrency problems (Lost Update, Dirty Read) discussed earlier.

6.2 Conflict Serializability

Important Points:
  • A non-serial schedule is "Correct" if it is equivalent to a serial schedule. This is called Serializable.
  • Conflict: Occurs if two operations from different transactions access the same data item and at least one is a WRITE.
  • Conflict Serializable: A schedule that can be transformed into a serial schedule by swapping non-conflicting operations.

Detailed Explanation:

We want the speed of non-serial schedules but the safety of serial schedules. We achieve this with Conflict Serializability.

When do operations conflict?

  1. They belong to different transactions.
  2. They access the same data item (X).
  3. At least one of them is a WRITE operation.

Examples of Conflicts:

  • Read(X) and Write(X) -> Conflict (Read-Write conflict).
  • Write(X) and Write(X) -> Conflict (Write-Write conflict).
  • Read(X) and Read(X) -> NO Conflict.
  • Read(X) and Write(Y) -> NO Conflict (different items).

6.3 Testing for Serializability (Precedence Graph)

Important Points:
  • Use a Precedence Graph (Directed Graph) to test if a schedule is serializable.
  • Nodes: Represent transactions.
  • Edges: Represent conflicts. Ti -> Tj means Ti must come before Tj.
  • Rule: If the graph has a CYCLE, the schedule is NOT conflict serializable.

How to build the graph:

  1. Create a node for each transaction.
  2. Draw an edge Ti -> Tj if Tj reads a value written by Ti.
  3. Draw an edge Ti -> Tj if Tj writes a value after Ti has read it.
  4. Draw an edge Ti -> Tj if Tj writes a value after Ti has written it.

Example:

Consider this schedule:

TimeT7T8
t1read(balx)-
t2write(balx)-
t3-read(balx)
t4-write(balx)
t5read(baly)-
t6write(baly)-
t7Commit-
t8-read(baly)
t9-write(baly)
t10-Commit

Analysis:

  • T7 writes balx, then T8 reads balx. Edge: T7 -> T8.
  • T7 writes baly, then T8 reads/writes baly. Edge: T7 -> T8.
Precedence Graph:

    (T7) ---------> (T8)

Result: The graph has no cycle. Therefore, this schedule IS conflict serializable. It is equivalent to the serial schedule T7 followed by T8.

Exam Question:
If a precedence graph for a schedule contains a cycle, what does it mean?

A) The schedule is serializable.
B) The schedule is serial.
C) The schedule is NOT conflict serializable.
D) The schedule has no conflicts.


Exam Tips

Key Points to Remember for Exams:

  1. ACID: Remember Atomicity (all or nothing), Consistency (rules preserved), Isolation (no interference), Durability (permanent changes).
  2. Transaction Operations: Know the difference between COMMIT (success, save) and ROLLBACK (fail, undo).
  3. Concurrency Problems:
    • Lost Update: Overwriting someone's work.
    • Dirty Read: Reading uncommitted data.
    • Inconsistent Analysis: Reading mixed old/new data.
  4. Serializability: A schedule is correct (Serializable) if it gives the same result as a serial schedule.
  5. Precedence Graph:
    • Draw nodes for transactions.
    • Draw edges for conflicts (Read-Write, Write-Write on same item).
    • Cycle = Not Serializable.
    • No Cycle = Serializable.
  6. Conflict Definition: Two operations conflict if they are on the same data item, from different transactions, and at least one is a WRITE.

Practice Challenge Questions for Exam Preparation

Question 1:
Explain why a "Read-Read" conflict does not exist in concurrency control.

Question 2:
Consider two transactions T1 and T2. T1: Read(A), Write(A) T2: Read(A), Write(A) If a schedule interleaves them as: T1 Read(A), T2 Read(A), T1 Write(A), T2 Write(A). What problem does this cause?

Question 3:
Define the role of the Buffer Manager in the context of transaction processing.

Question 4:
Construct a precedence graph for the following schedule and determine if it is serializable:
Schedule:
T1: Read(X)
T2: Write(X)
T1: Read(Y)
T2: Write(Y)
T1: Write(Y)
T2: Commit
T1: Commit

Question 5:
Describe the difference between a Serial Schedule and a Serializable Schedule.


This completes Chapter Three: Transaction Processing Concepts. Review the ACID properties and the concurrency problems carefully. Good luck with your studies!

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top