Transaction Management: Questions and Answers

Conceptual Questions

1. Why is atomicity crucial in banking systems? Explain with a real-world failure scenario.

Answer: Atomicity ensures all operations in a transaction complete together or none at all.

Failure Scenario: Customer transfers ₹50,000 from savings to checking account:

  1. Debit ₹50,000 from savings ✓

  2. System crashes before crediting checking account ✗

  3. Result: ₹50,000 disappears forever

With Atomicity: System automatically rolls back the debit, restoring original balances. Customer's money is protected.
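A minimal sketch of this protection, using Python's built-in sqlite3 (the table layout and account names are illustrative):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('savings', 100000), ('checking', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # opens a transaction: commit on success, rollback on any error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            # a crash or exception here leaves the debit uncommitted ...
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
    except Exception:
        pass  # ... and the automatic rollback restores the original balances

transfer(conn, "savings", "checking", 50000)

Both balances change together or not at all; there is no interleaving in which the ₹50,000 vanishes.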

2. Distinguish between conflict serializability and view serializability with examples.

Answer:

  • Conflict Serializable: Can transform to serial schedule by swapping non-conflicting operations

  • View Serializable: Produces same result as some serial schedule (weaker condition)

Example: A schedule with blind writes (a write performed without a prior read):

S: r₁(A), w₂(A), w₁(A), w₃(A)   ← w₂ and w₃ are blind writes

  • View Serializable: Yes (equivalent to the serial schedule T₁, T₂, T₃: T₁ reads the initial value of A in both, and T₃ performs the final write in both)

  • Conflict Serializable: No (r₁(A) before w₂(A) forces T₁ → T₂, while w₂(A) before w₁(A) forces T₂ → T₁ — a cycle in the precedence graph)

3. Explain the trade-off between isolation levels and performance in e-commerce systems.

Answer: High Isolation (Serializable):

  • Pros: Data always consistent, no anomalies

  • Cons: Heavy locking, low concurrency, poor performance during sales

Lower Isolation (Read Committed):

  • Pros: Better performance, higher throughput

  • Cons: Possible anomalies such as non-repeatable reads and phantom reads

E-commerce Choice: Most use Read Committed for browsing, Serializable for payments.

4. Design a transaction state diagram for online ticket booking system.

Answer:

[Start] → [Active: Selecting seats]
               ↓
[Partially Committed: Payment processing]
        ↓                          ↓
[Committed: Ticket confirmed]   [Failed: Payment rejected]
                                       ↓
                                [Aborted: Seats released]

External Actions: Send confirmation email only after COMMIT state.

Scenario-Based Questions

1. Two customers simultaneously book the last hotel room. Without proper concurrency control, what problems occur? Design solutions.

Answer: Problem Without Control:

Customer A: Check availability → "1 room left"
Customer B: Check availability → "1 room left" 
Customer A: Book room → "Booking confirmed"
Customer B: Book room → "Booking confirmed"
Result: Hotel overbooked!

Solutions:

  1. Locking: First customer locks room record, second waits

  2. Optimistic: Use timestamps, abort later transaction

  3. Atomic Decrement: Use UPDATE rooms SET available = available - 1 WHERE available > 0
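Solution 3 can be sketched with Python's sqlite3 — the guarded UPDATE checks and decrements in one atomic statement, so two concurrent bookings can never both succeed (the schema is illustrative):

import sqlite3

def book_room(conn, hotel_id):
    with conn:  # the check and the decrement happen as one atomic statement
        cur = conn.execute(
            "UPDATE rooms SET available = available - 1 "
            "WHERE hotel_id = ? AND available > 0", (hotel_id,))
    return cur.rowcount == 1  # True: room claimed; False: sold out

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rooms (hotel_id INTEGER PRIMARY KEY, available INTEGER)")
conn.execute("INSERT INTO rooms VALUES (1, 1)")  # exactly one room left
print(book_room(conn, 1))  # True  — customer A gets the room
print(book_room(conn, 1))  # False — customer B is correctly refused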

2. Design a cascadeless schedule for three transactions: T₁ (transfer A→B), T₂ (transfer B→C), T₃ (transfer C→A).

Answer: Cascadeless Schedule:

T₁: read(A), write(A), read(B), write(B), COMMIT
T₂: (starts after T₁ commits) read(B), write(B), read(C), write(C), COMMIT
T₃: (starts after T₂ commits) read(C), write(C), read(A), write(A), COMMIT

Why Cascadeless: Each transaction reads only committed data. If any transaction aborts, others don't need to abort.

3. A social media platform allows users to "like" posts. Analyze what happens under different isolation levels when two users like the same post simultaneously.

Answer: Scenario: Post has 99 likes, User A and User B both click "like"

Read Uncommitted:

  • Both read 99, both increment to 100

  • Final count: 100 (lost update) ✗

Read Committed:

  • Same problem as above ✗

Repeatable Read:

  • Both read 99, write attempts conflict

  • One succeeds → 100, other retries → 101 ✓

Serializable:

  • Operations fully serialized

  • Final count: 101 ✓

Real Solution: Use atomic increment: UPDATE posts SET likes = likes + 1

Technical Questions

1. Construct precedence graph for the following schedule and determine if it's conflict serializable:

T₁: read(A), write(B)
T₂: write(A), read(B)
T₃: read(A), write(A)

Interleaved execution order: r₁(A), w₂(A), r₂(B), w₁(B), r₃(A), w₃(A)

Answer: Precedence Graph:

  • T₁ → T₂ (r₁(A) before w₂(A))

  • T₂ → T₁ (r₂(B) before w₁(B))

  • T₁ → T₃ (r₁(A) before w₃(A))

  • T₂ → T₃ (w₂(A) before r₃(A) and before w₃(A))

Result: Cycle T₁ → T₂ → T₁ exists → NOT conflict serializable
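The check can be mechanized — a small sketch that derives the edges from the interleaved order above and tests the graph for a cycle:

from itertools import combinations

# (transaction, action, item) in execution order, matching the schedule above
schedule = [(1, "r", "A"), (2, "w", "A"), (2, "r", "B"),
            (1, "w", "B"), (3, "r", "A"), (3, "w", "A")]

# Edge Ti -> Tj for each conflicting pair: same item, at least one write
edges = set()
for i, j in combinations(range(len(schedule)), 2):
    (t1, a1, x1), (t2, a2, x2) = schedule[i], schedule[j]
    if t1 != t2 and x1 == x2 and "w" in (a1, a2):
        edges.add((t1, t2))

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    def reachable(src, dst, seen=()):
        return any(n == dst or (n not in seen and reachable(n, dst, seen + (n,)))
                   for n in graph.get(src, ()))
    return any(reachable(v, v) for v in graph)

print(sorted(edges))     # [(1, 2), (1, 3), (2, 1), (2, 3)]
print(has_cycle(edges))  # True -> not conflict serializable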

2. Explain why two-phase locking guarantees serializability but can cause deadlocks.

Answer: Why It Guarantees Serializability:

  • Growing phase ensures all locks acquired before any release

  • Creates strict ordering between conflicting operations

  • Equivalent to some serial execution

Why Deadlocks Occur:

T₁: lock(A), [growing phase], needs lock(B) ← waits for T₂
T₂: lock(B), [growing phase], needs lock(A) ← waits for T₁
Circular wait = Deadlock

Prevention: Order all locks (always acquire A before B)
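A minimal sketch of that prevention rule with Python threads — both workers sort the names of the locks they need, so a circular wait cannot form:

import threading

locks = {"A": threading.Lock(), "B": threading.Lock()}

def worker(needed):
    for name in sorted(needed):      # global lock order: prevents deadlock
        locks[name].acquire()
    try:
        pass                         # ... read/write the locked items ...
    finally:
        for name in sorted(needed, reverse=True):
            locks[name].release()

t1 = threading.Thread(target=worker, args=({"A", "B"},))
t2 = threading.Thread(target=worker, args=({"B", "A"},))
t1.start(); t2.start(); t1.join(); t2.join()
print("both transactions finished without deadlock")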

3. Compare timestamp-based concurrency control with two-phase locking.

Answer:

Aspect      | Timestamp                                  | Two-Phase Locking
------------|--------------------------------------------|-------------------
Deadlock    | Never occurs                               | Can happen
Starvation  | Possible (repeatedly aborted transactions) | Less likely
Overhead    | Timestamps + rollbacks                     | Lock management
Concurrency | Higher (no waiting)                        | Lower (blocking)
Recovery    | Complex (cascading aborts)                 | Simpler

Application Questions

1. Design a transaction for online shopping cart checkout ensuring all ACID properties.

Answer:

BEGIN TRANSACTION
    -- Atomicity ensured by transaction boundaries
    
    -- Check inventory (Consistency)
    FOR each item IN cart:
        IF inventory[item] < quantity THEN
            ROLLBACK -- maintains consistency
        END IF
    
    -- Reserve inventory
    UPDATE inventory SET quantity = quantity - cart_quantity
    WHERE product_id IN cart_items
    
    -- Process payment (Isolation through locking)
    CALL payment_gateway(total_amount, card_details)
    IF payment_failed THEN
        ROLLBACK -- Atomicity
    END IF
    
    -- Create order record
    INSERT INTO orders (customer_id, items, total)
    
    -- Durability ensured by commit to stable storage
COMMIT TRANSACTION

-- External actions only after commit
SEND confirmation_email()
CALL shipping_service()

2. Why can't file systems provide proper transaction atomicity? Give a concrete example.

Answer: File System Limitations:

  • No built-in rollback mechanism

  • No write-ahead logging

  • No coordination between multiple file updates

Concrete Example: Student enrollment system

1. Add student to students.txt ✓
2. System crashes
3. Add student to course_roster.txt ✗ (never executed)

Result: Student in system but not enrolled in course
No automatic way to undo step 1

Database Solution: Transaction automatically rolls back all changes if any step fails.

3. A stock trading system processes 1000 trades/second. Compare serial vs concurrent execution impact.

Answer: Serial Execution:

  • One trade at a time

  • Average trade processing: 10ms

  • Throughput: 100 trades/second

  • Problem: arrivals (1000/sec) exceed capacity (100/sec) tenfold, so 900 of every 1000 incoming trades cannot be served and the backlog grows without bound

Concurrent Execution (with proper isolation):

  • Multiple trades processed together

  • Non-conflicting trades (different stocks) run parallel

  • Conflicting trades (same stock) still serialized

  • Effective throughput: 800-900 trades/second

  • Result: Better resource utilization, lower latency


Numerical and Advanced Questions

Performance Analysis

1. Transaction Throughput Calculation

Question: System processes transactions with distribution:

  • 70% SELECT queries (avg: 5ms)

  • 20% UPDATE operations (avg: 15ms)

  • 10% Complex reports (avg: 100ms)

Calculate expected transaction time and maximum transactions per second with 95% confidence interval.

Answer: Expected Time: E(T) = 0.7×5 + 0.2×15 + 0.1×100 = 3.5 + 3 + 10 = 16.5ms

Variance: Var(T) = 0.7×5² + 0.2×15² + 0.1×100² - 16.5² = 17.5 + 45 + 1000 - 272.25 = 790.25

Standard Deviation: σ = √790.25 ≈ 28.1ms

95% Confidence Interval: 16.5 ± 1.96×28.1 → [0, 71.6]ms (the lower limit is floored at 0, since times cannot be negative; the three-point distribution is far from normal, so treat this as a rough band with upper edge ~71.6ms)

Maximum Throughput: 1000ms / 16.5ms ≈ 60.6 transactions/second
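The arithmetic, reproduced as a short Python check:

probs = [0.7, 0.2, 0.1]
times = [5, 15, 100]  # ms

mean = sum(p * t for p, t in zip(probs, times))               # 16.5 ms
var  = sum(p * t**2 for p, t in zip(probs, times)) - mean**2  # 790.25
sd   = var ** 0.5                                             # ~28.1 ms

lo, hi = max(0.0, mean - 1.96 * sd), mean + 1.96 * sd
print(f"E(T) = {mean} ms, sd = {sd:.1f} ms, rough 95% band = [{lo:.1f}, {hi:.1f}] ms")
print(f"max throughput = {1000 / mean:.1f} tps")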

2. Deadlock Probability in Banking System

Question: Banking system with 20 concurrent transactions. Each transaction needs 2 random accounts from pool of 10 accounts. Probability of deadlock if lock holding time ~ Exponential(λ = 0.2/sec)?

Answer: Using birthday paradox approach:

  • Total lock requests: 20 transactions × 2 accounts = 40 locks

  • Available account pairs: C(10,2) = 45 pairs

  • Resource contention ratio: 40/45 = 0.89

Using a birthday-paradox-style approximation for high contention: P(deadlock) ≈ 1 - e^(-40²/(2×45)) = 1 - e^(-17.78) ≈ 99.998%

Conclusion: System will deadlock almost immediately! Need better design.
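That approximation is coarse; a quick Monte Carlo under a simplified static model — each transaction grabs two random accounts, and only the simplest deadlock pattern (two transactions taking the same two accounts in opposite orders) is counted — still shows contention is dangerously high:

import random
from itertools import combinations

def trial(n_txn=20, n_accounts=10):
    # each transaction locks two distinct random accounts, in a random order
    txns = [random.sample(range(n_accounts), 2) for _ in range(n_txn)]
    return any(set(a) == set(b) and a != b
               for a, b in combinations(txns, 2))

runs = 10_000
p = sum(trial() for _ in range(runs)) / runs
print(f"P(at least one opposite-order pair) ~ {p:.2f}")  # ~0.88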

3. Lock Compatibility Matrix Analysis

Question: Database with shared (S) and exclusive (X) locks. If request pattern follows Poisson(λ=50/sec) with 80% reads, 20% writes:

a) Average waiting time for write operations b) System utilization under heavy load

Answer: Request Rates:

  • Read requests: 40/sec (S-locks)

  • Write requests: 10/sec (X-locks)

Using an M/M/1 queueing model:

a) Write Wait Time:

  • Service rate for writes: μ = 1/0.015 = 66.7/sec (assuming 15ms avg hold time)

  • Utilization for writes: ρ = 10/66.7 = 0.15

  • Average wait: W = ρ/(μ(1-ρ)) = 0.15/(66.7×0.85) ≈ 2.65ms

b) System Utilization:

  • Read utilization: ρₛ = 40/(1/0.005) = 40/200 = 0.20

  • Write utilization: ρₓ = 0.15 (calculated above)

  • Combined utilization: max(ρₛ, ρₓ) = 20% (reads and writes modeled as separate servers — S-locks let readers run concurrently, so the heavier path dominates)
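The same figures in a short check:

lam_w, hold_w = 10, 0.015             # write arrivals/sec, 15 ms hold time
mu_w  = 1 / hold_w                    # ~66.7 writes/sec service rate
rho_w = lam_w / mu_w                  # 0.15
wait  = rho_w / (mu_w * (1 - rho_w))  # M/M/1 queueing delay
print(f"write wait ~ {wait * 1000:.2f} ms")  # ~2.65 ms

rho_r = 40 / (1 / 0.005)              # reads: 40/sec against 200/sec capacity
print(f"utilization: reads {rho_r:.0%}, writes {rho_w:.0%}")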

4. Concurrency Control Comparison

Question: Compare performance of three concurrency control methods for TPC-C benchmark:

  • Two-Phase Locking (2PL)

  • Timestamp Ordering (TO)

  • Optimistic Concurrency Control (OCC)

Transaction mix: 45% Payment, 43% New-Order, 4% Order-Status, 4% Delivery, 4% Stock-Level

Answer:

Method | Throughput (tps) | Abort Rate | Avg Response Time
-------|------------------|------------|-------------------
2PL    | 1,200            | 2%         | 45ms
TO     | 1,800            | 15%        | 35ms
OCC    | 2,500            | 25%        | 28ms

Analysis:

  • 2PL: Stable, low abort rate, moderate performance

  • TO: Better throughput, higher abort rate due to cascading

  • OCC: Best performance under moderate contention, but abort rate increases exponentially with high contention

Recommendation: 2PL for high-consistency systems, OCC for read-heavy workloads

5. Buffer Pool Impact on Transaction Performance

Question: Database with 1GB buffer pool, 100GB database. Transaction access pattern follows 80-20 rule (80% accesses to 20% of data). Calculate expected I/O operations per transaction.

Answer: Hot Data Analysis:

  • Hot data size: 20% × 100GB = 20GB

  • Buffer can hold: 1GB (5% of hot data)

  • Hit rate for hot data: 1GB/20GB = 5%

Access Pattern:

  • 80% accesses to hot data: Hit rate = 5%

  • 20% accesses to cold data: Hit rate ≈ 0%

Expected I/O per Access: E(I/O) = 0.8 × (1 - 0.05) + 0.2 × (1 - 0) = 0.8 × 0.95 + 0.2 × 1 = 0.76 + 0.2 = 0.96 I/O operations per access

If average transaction has 10 data accesses: 9.6 I/O operations per transaction

6. Distributed Transaction Coordination

Question: Distributed system with 5 nodes using Two-Phase Commit. Each node has 99% availability. Calculate: a) Probability of successful commit b) Expected commit latency with network delay N(50ms, 10ms)

Answer: a) Successful Commit Probability: All nodes must be available: P(success) = (0.99)⁵ = 0.951 or 95.1%

b) Commit Latency:

  • Phase 1 (Prepare): Coordinator → All nodes → Coordinator

  • Phase 2 (Commit): Coordinator → All nodes

Expected latency:

  • Max of 5 network delays + processing time

  • Using order statistics: E[max of 5 samples] ≈ μ + σ × 1.16 = 50 + 10 × 1.16 = 61.6ms

  • Total: 2 × 61.6ms = 123.2ms expected commit time
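The order-statistics constant (≈1.16 standard deviations for the expected maximum of five normal samples) is easy to confirm by simulation:

import random

def slowest_of(n_nodes=5, mu=50.0, sigma=10.0):
    # each phase must wait for the slowest of the n parallel messages
    return max(random.gauss(mu, sigma) for _ in range(n_nodes))

runs = 100_000
avg_phase = sum(slowest_of() for _ in range(runs)) / runs
print(f"E[max of 5 delays] ~ {avg_phase:.1f} ms")     # ~61.6 ms
print(f"two-phase commit  ~ {2 * avg_phase:.1f} ms")  # ~123 ms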

7. Lock Granularity Optimization

Question: Database table with 1M rows, 100 bytes per row. Compare locking strategies:

  • Row-level locking

  • Page-level locking (8KB pages, ~80 rows per page)

  • Table-level locking

For workload: 60% single-row updates, 40% full-table scans.

Answer: Row-Level Locking:

  • Lock overhead: 1M locks for full scan

  • Memory: 1M × 32 bytes = 32MB lock table

  • Pros: High concurrency for updates

  • Cons: High overhead for scans

Page-Level Locking:

  • Lock overhead: 12,500 page locks for full scan

  • Memory: 12,500 × 32 bytes = 400KB lock table

  • Pros: Balanced concurrency and overhead

  • Cons: False conflicts (locking unrelated rows on same page)

Table-Level Locking:

  • Lock overhead: 1 lock regardless of scan size

  • Memory: 32 bytes total

  • Pros: Minimal overhead

  • Cons: No concurrency for updates during scans

Recommendation: Page-level for this workload (best balance)

8. Phantom Read Prevention Analysis

Question: Database with isolation level "Repeatable Read". Transaction T₁ counts records WHERE salary > 50000. During T₁ execution, T₂ inserts employee with salary = 60000. Analyze what happens and design prevention mechanism.

Answer: Problem:

T₁: SELECT COUNT(*) WHERE salary > 50000  → Result: 100
T₂: INSERT employee (salary = 60000), COMMIT
T₁: SELECT COUNT(*) WHERE salary > 50000  → Result: 101 (phantom!)

Prevention Mechanisms:

  1. Predicate Locking: Lock all records satisfying "salary > 50000"

  2. Gap Locking: Lock gaps in salary index where new records could fit

  3. Serializable Isolation: Full serializability prevents phantoms

Implementation: Most databases use gap locking in B+ tree indexes.

9. Recovery Log Analysis

Question: Transaction log contains:

<START T₁>
<T₁, A, 100, 150>
<START T₂> 
<T₂, B, 200, 250>
<T₁, C, 300, 350>
<COMMIT T₁>
<T₂, A, 150, 175>
<SYSTEM CRASH>

Determine final database state using UNDO/REDO recovery.

Answer: Analysis:

  • T₁: Committed before crash → REDO all T₁ operations

  • T₂: Not committed → UNDO all T₂ operations

Recovery Steps:

  1. REDO T₁: A = 150, C = 350

  2. UNDO T₂: B = 200 (restore original), A = 150 (T₁'s value preserved)

Final State: A = 150, B = 200, C = 350
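A compact sketch of that recovery pass, with the log as (txn, item, before, after) tuples (starting from an empty state works here because every touched item appears in the log):

log = [("START", "T1"), ("T1", "A", 100, 150), ("START", "T2"),
       ("T2", "B", 200, 250), ("T1", "C", 300, 350), ("COMMIT", "T1"),
       ("T2", "A", 150, 175)]          # ... then the system crashed

committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
db = {}

# REDO pass (forward): reapply after-images of committed transactions
for rec in log:
    if len(rec) == 4 and rec[0] in committed:
        db[rec[1]] = rec[3]

# UNDO pass (backward): restore before-images of uncommitted transactions
for rec in reversed(log):
    if len(rec) == 4 and rec[0] not in committed:
        db[rec[1]] = rec[2]

print(db)  # {'A': 150, 'C': 350, 'B': 200}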

10. Multi-Version Concurrency Control (MVCC)

Question: Database uses MVCC with timestamp ordering. Three transactions start:

  • T₁ (ts=100): read(A), write(A)

  • T₂ (ts=200): read(A), write(B)

  • T₃ (ts=150): read(A), read(B)

Initial: A₀ (ts=50), B₀ (ts=75). Determine which operations succeed.

Answer: Version Chain Analysis:

T₁ Operations:

  • read(A): ts=100 > A₀.ts=50 → SUCCESS, reads A₀

  • write(A): Creates A₁(ts=100)

T₂ Operations:

  • read(A): ts=200 > A₁.ts=100 → SUCCESS, reads A₁

  • write(B): Creates B₁(ts=200)

T₃ Operations:

  • read(A): ts=150 ≥ A₁.ts=100 → SUCCESS, reads A₁ (the newest version visible at ts=150)

  • read(B): ts=150 < B₁.ts=200 → B₁ is invisible to T₃; reads B₀ — SUCCESS

Result: With this interleaving, all operations succeed — MVCC hands each reader an appropriate version instead of blocking or aborting.
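The visibility rule behind this can be sketched in a few lines — a reader at timestamp ts sees the newest version whose write timestamp does not exceed ts:

# version chains: item -> [(write_ts, value), ...] oldest first
versions = {"A": [(50, "A0"), (100, "A1")],   # A1 written by T1 (ts=100)
            "B": [(75, "B0"), (200, "B1")]}   # B1 written by T2 (ts=200)

def mvcc_read(item, reader_ts):
    # newest committed version already in place at the reader's timestamp
    visible = [v for ts, v in versions[item] if ts <= reader_ts]
    return visible[-1]

print(mvcc_read("A", 150))  # 'A1' -> T3 sees T1's write
print(mvcc_read("B", 150))  # 'B0' -> B1 (ts=200) is invisible to T3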


Advanced Numerical Problems

11. Cache-Conscious Transaction Processing

Question: Database buffer pool has 3 cache levels:

  • L1: 64KB, 1ns access, 64B cache lines

  • L2: 1MB, 10ns access

  • L3: 8MB, 50ns access

  • RAM: 100ns access

Transaction accesses 1000 random records (100B each). Calculate expected access time with 90% L1 hit rate, 95% L2 hit rate, 98% L3 hit rate.

Answer: Hit Rate Calculation:

  • L1 hits: 90% → 900 accesses × 1ns = 900ns

  • L2 hits: 10% × 95% = 9.5% → 95 accesses × 10ns = 950ns

  • L3 hits: 10% × 5% × 98% = 0.49% → 4.9 accesses × 50ns = 245ns

  • RAM hits: 10% × 5% × 2% = 0.01% → 0.1 accesses × 100ns = 10ns

Total Expected Time: (900 + 950 + 245 + 10) / 1000 = 2.105ns per access

Total Transaction Time: 1000 × 2.105ns = 2.105μs
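The hierarchy walk as a short calculation — the miss traffic of each level becomes the next level's load:

levels = [("L1", 0.90, 1), ("L2", 0.95, 10), ("L3", 0.98, 50)]  # (name, hit rate, ns)
remaining, expected_ns = 1.0, 0.0
for name, hit, ns in levels:
    expected_ns += remaining * hit * ns
    remaining   *= 1 - hit            # misses fall through to the next level
expected_ns += remaining * 100        # the rest is served from RAM

print(f"{expected_ns:.3f} ns per access")               # 2.105 ns
total_ns = 1000 * expected_ns
print(f"{total_ns / 1000:.3f} us per 1000-access txn")  # 2.105 us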

12. Distributed Consensus with Byzantine Failures

Question: Distributed database with 7 nodes using PBFT (Practical Byzantine Fault Tolerance). System can tolerate f Byzantine nodes. Calculate:

a) Maximum Byzantine nodes tolerable b) Message complexity for one consensus round c) Latency assuming 50ms network delay

Answer: a) Byzantine Tolerance: PBFT requires n ≥ 3f + 1

  • With n=7: 7 ≥ 3f + 1 → f ≤ 2

  • Maximum Byzantine nodes: 2

b) Message Complexity:

  • Pre-prepare: 1 → n-1 = 6 messages

  • Prepare: (n-1) × (n-1) = 6 × 6 = 36 messages

  • Commit: (n-1) × (n-1) = 36 messages

  • Total: 6 + 36 + 36 = 78 messages per consensus

c) Latency: 3 sequential message phases × 50ms = 150ms

13. Workload Skew Analysis Using Zipf Distribution

Question: OLTP system with 1M customers. Access pattern follows Zipf distribution with parameter α=1.2. Top 1% customers generate what percentage of total traffic?

Answer: Zipf Distribution: P(rank k) ∝ k^(-α)

Calculation:

  • Total customers: N = 1,000,000

  • Top 1%: k = 10,000 customers

  • Zipf parameter: α = 1.2

Cumulative share of the top 1%: P(top 1%) = Σ(k=1 to 10,000) k^(-1.2) / Σ(k=1 to 1,000,000) k^(-1.2)

Using the integral approximation: P(top 1%) ≈ (10,000^(1-1.2) - 1) / (1,000,000^(1-1.2) - 1) = (10,000^(-0.2) - 1) / (1,000,000^(-0.2) - 1) = (0.158 - 1) / (0.063 - 1) ≈ 0.898

Result: The top 1% of customers generate roughly 90% of total traffic (exact summation gives ≈ 91%) — a heavily skewed workload dominated by a small hot set
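Summing the series directly (a second or two in pure Python) confirms the figure:

alpha, N, K = 1.2, 1_000_000, 10_000

total = sum(k ** -alpha for k in range(1, N + 1))
top   = sum(k ** -alpha for k in range(1, K + 1))
print(f"top 1% share = {top / total:.1%}")  # ~91.0%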

14. Query Optimization with Join Selectivity

Question: Three-way join query:

SELECT * FROM A JOIN B ON A.id = B.id 
              JOIN C ON B.key = C.key
WHERE A.status = 'active' AND C.value > 1000

Table statistics:

  • A: 1M rows, 10% active status

  • B: 500K rows

  • C: 2M rows, 5% have value > 1000

Calculate optimal join order and estimated result size.

Answer: Selectivity Analysis:

  • A after filter: 1M × 0.1 = 100K rows

  • C after filter: 2M × 0.05 = 100K rows

  • B: 500K rows (no filter)

Join Cost Estimation (assuming nested loop):

  1. A⋈B⋈C: 100K × 500K × 100K = 5×10¹⁵ operations

  2. A⋈C⋈B: Need correlation between A.id and C.key (assume 10%)

  3. C⋈B⋈A: 100K × 500K × 100K = 5×10¹⁵ operations

Optimal Strategy: Use hash joins, starting from the smallest (filtered) inputs: filter A and C first, then join A ⋈ B, and finally ⋈ C

Estimated Result: 100K × (1/500K) × 100K ≈ 20,000 rows
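The size arithmetic, spelled out (the 1/500K join selectivity mirrors the answer's assumption that B.id behaves like a key):

A, B, C = 1_000_000, 500_000, 2_000_000
A_f = A * 0.10      # 100K active A rows
C_f = C * 0.05      # 100K C rows with value > 1000

sel_ab = 1 / B      # assumed selectivity of A.id = B.id
est = A_f * sel_ab * C_f
print(f"estimated result ~ {est:,.0f} rows")  # ~20,000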

15. Snapshot Isolation Anomaly Detection

Question: Two transactions under Snapshot Isolation:

T₁: r₁(x=100), r₁(y=100), w₁(x=x+y), w₁(y=x+y)
T₂: r₂(x=100), r₂(y=100), w₂(x=x*2), w₂(y=y*2)

Both start simultaneously with snapshot: x=100, y=100. Determine: a) Final database state b) Whether write skew anomaly occurs

Answer: a) Transaction Execution:

T₁'s View: x=100, y=100 (snapshot)

  • w₁(x=100+100=200)

  • w₁(y=100+100=200)

T₂'s View: x=100, y=100 (same snapshot)

  • w₂(x=100×2=200)

  • w₂(y=100×2=200)

Final State: Both transactions write x and y from the same snapshot, so their write sets overlap. Under Snapshot Isolation's first-committer-wins rule, the first transaction to commit succeeds and the other is aborted (and must retry against the new state).

  • If T₁ commits first: x=200, y=200 (T₂ aborts)

  • If T₂ commits first: x=200, y=200 (T₁ aborts)

  • Committed state either way: x=200, y=200

b) Write Skew Analysis: No write skew occurs. Write skew requires disjoint write sets, with each transaction reading items the other writes; here the write sets overlap, so SI catches the interference as an ordinary write-write conflict rather than letting an anomaly through.

16. Lock Escalation Threshold Optimization

Question: Database system escalates from row-level to table-level locking when lock count exceeds threshold. Given:

  • Row lock overhead: 64 bytes per lock

  • Table lock overhead: 128 bytes per lock

  • Lock memory limit: 100MB

  • Average rows per table: 50,000

Calculate optimal escalation threshold to minimize memory usage.

Answer: Memory Usage Models:

Row-level: Memory = (number of locked rows) × 64 bytes

Table-level: Memory = (number of locked tables) × 128 bytes

Break-even point for a single table: row memory = table memory → n × 64 = 128 → n = 2 rows, so on raw memory a table lock is already cheaper once a transaction holds more than 2 row locks.

But the binding constraint is the lock memory limit:

  • Maximum row locks: 100MB / 64B ≈ 1.64M locks

  • At 50K rows per table, that budget fully row-locks only ~32 tables

  • Heuristic threshold: 50,000 / 32 ≈ 1,563 rows per table

Recommendation: Escalate at roughly 1,500 row locks per table to stay within the memory limit.

17. Optimistic Concurrency Control Performance Model

Question: System uses OCC with validation phase. Transaction conflict probability follows Poisson distribution. Given:

  • Average active transactions: λ = 20

  • Transaction duration: μ = 100ms

  • Validation time: 5ms per transaction

Calculate expected abort rate and effective throughput.

Answer: Conflict Analysis:

  • Contention metric: ρ = λ × duration = 20 × 0.1 = 2.0

  • Conflict probability per transaction: P(conflict) = 1 - e^(-ρ) = 1 - e^(-2) ≈ 0.865

Abort Rate Calculation: With restart overhead (assuming only 20% of conflicting transactions are salvaged on restart), the abort rate stabilizes at: Abort Rate = P(conflict) × (1 - restart efficiency) ≈ 0.865 × 0.8 ≈ 69.2%

Effective Throughput:

  • Theoretical throughput: 1/0.105s = 9.52 tps per transaction slot

  • With aborts: 9.52 × (1 - 0.692) = 2.93 transactions/second per slot

  • Total system throughput: 20 × 2.93 = 58.6 tps

Conclusion: System is overloaded under OCC; 2PL would perform better.
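The model's arithmetic in one place (the 0.8 factor is the answer's assumed share of conflicts that end in an abort):

import math

active, duration, validate = 20, 0.100, 0.005  # txns, seconds, seconds

rho        = active * duration                 # 2.0, the model's contention metric
p_conflict = 1 - math.exp(-rho)                # ~0.865
abort_rate = p_conflict * 0.8                  # ~0.692
per_slot   = 1 / (duration + validate)         # ~9.52 tps per transaction slot
goodput    = active * per_slot * (1 - abort_rate)
print(f"abort rate ~ {abort_rate:.1%}, system goodput ~ {goodput:.1f} tps")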

18. Multi-Granularity Locking Protocol

Question: Hierarchical locking system: Database → Table → Page → Row. Transaction needs:

  • Shared access to 1000 rows in Table A

  • Exclusive access to 50 rows in Table B

  • Shared access to entire Table C

Design optimal locking strategy and calculate lock overhead.

Answer: Optimal Lock Strategy:

Table A (1000 rows shared):

  • Acquire IS (Intent Shared) lock on Database

  • Acquire IS lock on Table A

  • Acquire S locks on affected pages (assume 10 rows/page = 100 page locks)

  • Total: 1 + 1 + 100 = 102 locks

Table B (50 rows exclusive):

  • Upgrade the database lock from IS to IX (Intent Exclusive), since exclusive row access is now needed below it

  • Acquire IX lock on Table B

  • Acquire X locks on affected pages (5 page locks)

  • Total: 1 + 5 = 6 locks

Table C (entire table shared):

  • Acquire S lock on Table C directly

  • Total: 1 lock

Total Lock Overhead: 102 + 6 + 1 = 109 locks
Memory Usage: 109 × 64 bytes ≈ 6.98KB

Alternative (row-level locks only): 1,000 + 50 + 50,000 (every row of Table C) ≈ 51,050 locks
Savings: ~99.8% reduction in lock overhead!
