Conceptual Questions
1. Why is atomicity crucial in banking systems? Explain with a real-world failure scenario.
Answer: Atomicity ensures all operations in a transaction complete together or none at all.
Failure Scenario: Customer transfers ₹50,000 from savings to checking account:
- Debit ₹50,000 from savings ✓
- System crashes before crediting checking account ✗
- Result: ₹50,000 disappears forever
With Atomicity: System automatically rolls back the debit, restoring original balances. Customer's money is protected.
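A minimal sketch of this behavior using Python's sqlite3 (table and account names are hypothetical); the debit and credit either both commit or both roll back:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None   # manage transactions explicitly
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('savings', 100000), ('checking', 0)")

def transfer(conn, src, dst, amount):
    """Debit and credit as one atomic unit: both apply or neither does."""
    try:
        conn.execute("BEGIN")
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        # A crash or error here loses nothing: the debit is not yet durable.
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst))
        conn.execute("COMMIT")     # only now do both updates become permanent
    except Exception:
        conn.execute("ROLLBACK")   # undo the partial debit
        raise

transfer(conn, "savings", "checking", 50000)
print(conn.execute("SELECT id, balance FROM accounts").fetchall())
```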
2. Distinguish between conflict serializability and view serializability with examples.
Answer:
- Conflict Serializable: The schedule can be transformed into a serial schedule by swapping adjacent non-conflicting operations
- View Serializable: The schedule reads the same values and produces the same final writes as some serial schedule (a weaker, more permissive condition)
Example: Schedule with blind writes (write without prior read)
T₁: write(A)
T₂: write(A) ← blind write
T₃: write(A) ← blind write
- View Serializable: Yes (the final value of A is determined by T₃ in both the concurrent and the serial execution)
- Conflict Serializable: No (all the writes conflict with each other, so none can be swapped)
3. Explain the trade-off between isolation levels and performance in e-commerce systems.
Answer: High Isolation (Serializable):
- Pros: Data always consistent, no anomalies
- Cons: Heavy locking, low concurrency, poor performance during sales
Lower Isolation (Read Committed):
- Pros: Better performance, higher throughput
- Cons: Possible anomalies such as non-repeatable reads and phantoms
E-commerce Choice: Most use Read Committed for browsing, Serializable for payments.
4. Design a transaction state diagram for online ticket booking system.
Answer:
[Start] → [Active: Selecting seats]
               ↓
[Partially Committed: Payment processing]
     ↓ success                   ↓ failure
[Committed: Ticket confirmed]   [Failed: Payment rejected]
                                     ↓
                            [Aborted: Seats released]
External Actions: Send confirmation email only after COMMIT state.
Scenario-Based Questions
1. Two customers simultaneously book the last hotel room. Without proper concurrency control, what problems occur? Design solutions.
Answer: Problem Without Control:
Customer A: Check availability → "1 room left"
Customer B: Check availability → "1 room left"
Customer A: Book room → "Booking confirmed"
Customer B: Book room → "Booking confirmed"
Result: Hotel overbooked!
Solutions:
- Locking: The first customer locks the room record; the second waits
- Optimistic: Use timestamps and abort the later transaction on conflict
- Atomic Decrement: UPDATE rooms SET available = available - 1 WHERE available > 0 (see the sketch below)
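A sketch of the atomic-decrement approach with Python's sqlite3 (schema hypothetical). Because the availability check and the decrement are a single statement, a second concurrent booking simply matches zero rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rooms (hotel_id INTEGER PRIMARY KEY, available INTEGER)")
conn.execute("INSERT INTO rooms VALUES (1, 1)")   # exactly one room left
conn.commit()

def book_room(conn, hotel_id):
    # Check-and-decrement in ONE statement: no window for a racing booking.
    cur = conn.execute(
        "UPDATE rooms SET available = available - 1 "
        "WHERE hotel_id = ? AND available > 0", (hotel_id,))
    conn.commit()
    return cur.rowcount == 1   # 0 rows matched means the room is gone

print(book_room(conn, 1))   # True  -> booking confirmed
print(book_room(conn, 1))   # False -> sold out, no overbooking
```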
2. Design a cascadeless schedule for three transactions: T₁ (transfer A→B), T₂ (transfer B→C), T₃ (transfer C→A).
Answer: Cascadeless Schedule:
T₁: read(A), write(A), read(B), write(B), COMMIT
T₂: read(B), write(B), read(C), write(C), COMMIT
T₃: read(C), write(C), read(A), write(A), COMMIT
Why Cascadeless: Each transaction reads only committed data. If any transaction aborts, others don't need to abort.
3. A social media platform allows users to "like" posts. Analyze what happens under different isolation levels when two users like the same post simultaneously.
Answer: Scenario: Post has 99 likes, User A and User B both click "like"
Read Uncommitted:
- Both read 99, both increment to 100
- Final count: 100 (lost update) ✗
Read Committed:
- Same problem as above ✗
Repeatable Read:
- Both read 99, write attempts conflict
- One succeeds → 100, the other retries → 101 ✓
Serializable:
- Operations fully serialized
- Final count: 101 ✓
Real Solution: Use atomic increment: UPDATE posts SET likes = likes + 1
Technical Questions
1. Construct precedence graph for the following schedule and determine if it's conflict serializable:
T₁: read(A), write(B)
T₂: write(A), read(B)
T₃: read(A), write(A)
Answer: Precedence Graph:
- T₁ → T₂ (T₁ reads A before T₂ writes A)
- T₂ → T₁ (T₂ reads B before T₁ writes B)
- T₂ → T₃ (T₂ writes A before T₃ reads A and before T₃ writes A)
Result: Cycle T₁ ⟷ T₂ exists → NOT conflict serializable
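A small sketch that mechanizes the test: build the precedence graph derived above and check it for a cycle with depth-first search:

```python
# Precedence graph from the answer above: an edge u -> v means
# "u must come before v in any equivalent serial schedule".
edges = {"T1": {"T2"}, "T2": {"T1", "T3"}, "T3": set()}

def has_cycle(graph):
    """DFS coloring: gray = on the current path, black = fully explored."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def dfs(u):
        color[u] = GRAY
        for v in graph[u]:
            if color[v] == GRAY:                 # back edge -> cycle
                return True
            if color[v] == WHITE and dfs(v):
                return True
        color[u] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in graph)

print("Conflict serializable:", not has_cycle(edges))   # False (T1 <-> T2)
```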
2. Explain why two-phase locking guarantees serializability but can cause deadlocks.
Answer: Why It Guarantees Serializability:
- The growing phase ensures all locks are acquired before any is released
- This creates a strict ordering between conflicting operations
- The schedule is equivalent to a serial execution in lock-point order
Why Deadlocks Occur:
T₁: lock(A), [growing phase], needs lock(B) ← waits for T₂
T₂: lock(B), [growing phase], needs lock(A) ← waits for T₁
Circular wait = Deadlock
Prevention: Order all locks (always acquire A before B)
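A sketch of the lock-ordering fix in Python threading: both transactions want A and B, but because each acquires locks in the same global (sorted) order, the circular wait can never form:

```python
import threading

locks = {"A": threading.Lock(), "B": threading.Lock()}

def transact(name, needed):
    ordered = sorted(needed)           # one global order removes circular wait
    for key in ordered:
        locks[key].acquire()
    try:
        print(f"{name} holds {ordered}")
    finally:
        for key in reversed(ordered):
            locks[key].release()

t1 = threading.Thread(target=transact, args=("T1", {"A", "B"}))
t2 = threading.Thread(target=transact, args=("T2", {"B", "A"}))
t1.start(); t2.start()
t1.join(); t2.join()
```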
3. Compare timestamp-based concurrency control with two-phase locking.
Answer:
| Aspect | Timestamp | Two-Phase Locking |
|---|---|---|
| Deadlock | Never occurs | Can happen |
| Starvation | Possible (repeatedly aborted transactions) | Less likely |
| Overhead | Timestamps + rollbacks | Lock management |
| Concurrency | Higher (no waiting) | Lower (blocking) |
| Recovery | Complex (cascading aborts) | Simpler |
Application Questions
1. Design a transaction for online shopping cart checkout ensuring all ACID properties.
Answer:
BEGIN TRANSACTION
    -- Atomicity: everything below commits or rolls back as one unit
    -- Check inventory (Consistency)
    FOR each item IN cart:
        IF inventory[item] < quantity THEN
            ROLLBACK  -- abort keeps the database consistent
        END IF
    END FOR
    -- Reserve inventory
    UPDATE inventory SET quantity = quantity - cart_quantity
    WHERE product_id IN cart_items
    -- Process payment (Isolation through locking)
    CALL payment_gateway(total_amount, card_details)
    IF payment_failed THEN
        ROLLBACK  -- Atomicity: the inventory reservation is undone too
    END IF
    -- Create order record
    INSERT INTO orders (customer_id, items, total) VALUES (...)
-- Durability: the commit forces changes to stable storage
COMMIT TRANSACTION
-- External actions only after commit (they cannot be rolled back)
SEND confirmation_email()
CALL shipping_service()
2. Why can't file systems provide proper transaction atomicity? Give a concrete example.
Answer: File System Limitations:
- No built-in rollback mechanism
- No write-ahead logging
- No coordination between multiple file updates
Concrete Example: Student enrollment system
1. Add student to students.txt ✓
2. System crashes
3. Add student to course_roster.txt ✗ (never executed)
Result: Student in system but not enrolled in course
No automatic way to undo step 1
Database Solution: Transaction automatically rolls back all changes if any step fails.
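A sketch of the failure window in plain Python file I/O (the two files from the example above); the comment marks the point where a crash leaves a half-done "transaction" with nothing to roll back:

```python
def enroll(student: str, course: str) -> None:
    # Step 1 becomes durable on its own as soon as the file is closed.
    with open("students.txt", "a") as f:
        f.write(student + "\n")
    # <-- a crash here strands the student: the file system keeps no
    #     undo log, so there is no way to automatically revert step 1.
    with open("course_roster.txt", "a") as f:
        f.write(f"{student},{course}\n")
```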
3. A stock trading system processes 1000 trades/second. Compare serial vs concurrent execution impact.
Answer: Serial Execution:
- One trade at a time
- Average trade processing: 10ms
- Throughput: 100 trades/second
- Problem: with 1000 trades/second arriving, 90% of trades back up in the queue
Concurrent Execution (with proper isolation):
- Multiple trades processed together
- Non-conflicting trades (different stocks) run in parallel
- Conflicting trades (same stock) are still serialized
- Effective throughput: 800-900 trades/second
- Result: Better resource utilization, lower latency
Numerical and Advanced Questions
Performance Analysis
1. Transaction Throughput Calculation
Question: System processes transactions with distribution:
- 70% SELECT queries (avg: 5ms)
- 20% UPDATE operations (avg: 15ms)
- 10% Complex reports (avg: 100ms)
Calculate expected transaction time and maximum transactions per second with 95% confidence interval.
Answer: Expected Time: E(T) = 0.7×5 + 0.2×15 + 0.1×100 = 3.5 + 3 + 10 = 16.5ms
Variance: Var(T) = 0.7×5² + 0.2×15² + 0.1×100² - 16.5² = 17.5 + 45 + 1000 - 272.25 = 790.25
Standard Deviation: σ = √790.25 ≈ 28.1ms
95% Confidence Interval: 16.5 ± 1.96×28.1 = [−38.6, 71.6]ms; since time cannot be negative, the practical interval is [0, 71.6]ms (the normal approximation is crude for such a skewed mixture)
Maximum Throughput: 1000ms / 16.5ms ≈ 60.6 transactions/second
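The same computation as a quick Python check (the normal-based interval is only a rough approximation here):

```python
import math

mix = [(0.7, 5.0), (0.2, 15.0), (0.1, 100.0)]   # (probability, time in ms)

mean = sum(p * t for p, t in mix)                    # 16.5 ms
var = sum(p * t * t for p, t in mix) - mean ** 2     # 790.25 ms^2
sigma = math.sqrt(var)                               # ~28.1 ms

print(f"E[T] = {mean} ms, sigma = {sigma:.1f} ms")
print(f"95% upper bound ~ {mean + 1.96 * sigma:.1f} ms")
print(f"max throughput ~ {1000 / mean:.1f} tps")
```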
2. Deadlock Probability in Banking System
Question: Banking system with 20 concurrent transactions. Each transaction needs 2 random accounts from pool of 10 accounts. Probability of deadlock if lock holding time ~ Exponential(λ = 0.2/sec)?
Answer: Using birthday paradox approach:
- Lock requests in flight: 20 transactions × 2 accounts = 40 locks
- Available account pairs: C(10,2) = 45 pairs
- Resource contention ratio: 40/45 ≈ 0.89
Using the birthday-paradox approximation for high contention: P(conflict) ≈ 1 − e^(−40²/(2×45)) = 1 − e^(−17.8) ≈ 100% (effectively certain)
Conclusion: Lock conflicts, and with them deadlocks, are near-certain, and with a mean lock holding time of 1/λ = 5 seconds they resolve slowly. The design needs fewer concurrent transactions, ordered lock acquisition, or a larger account pool.
3. Lock Compatibility Matrix Analysis
Question: Database with shared (S) and exclusive (X) locks. If request pattern follows Poisson(λ=50/sec) with 80% reads, 20% writes:
a) Average waiting time for write operations
b) System utilization under heavy load
Answer: Request Rates:
- Read requests: 40/sec (S-locks)
- Write requests: 10/sec (X-locks)
Using an M/M/1 queueing model:
a) Write Wait Time:
- Service rate for writes: μ = 1/0.015 ≈ 66.7/sec (assuming a 15ms average lock hold time)
- Write utilization: ρ = 10/66.7 = 0.15
- Average wait: W = ρ/(μ(1−ρ)) = 0.15/(66.7 × 0.85) ≈ 2.65ms
b) System Utilization:
- Read utilization: ρₛ = 40/(1/0.005) = 40/200 = 0.20 (assuming a 5ms average S-lock hold time)
- Write utilization: ρₓ = 0.15 (calculated above)
- Combined utilization: max(ρₛ, ρₓ) = 20%
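A compact check of both parts under the same assumptions (the 15ms X-lock and 5ms S-lock hold times are the assumed values carried over from above):

```python
lam_w, hold_w = 10.0, 0.015    # write rate (1/s), assumed X-lock hold time (s)
lam_r, hold_r = 40.0, 0.005    # read rate (1/s), assumed S-lock hold time (s)

mu_w = 1 / hold_w                          # write service rate ~66.7/s
rho_w = lam_w / mu_w                       # write utilization ~0.15
wait_w = rho_w / (mu_w * (1 - rho_w))      # M/M/1 queueing delay
rho_r = lam_r * hold_r                     # read utilization ~0.20

print(f"write wait ~ {wait_w * 1000:.2f} ms")                 # ~2.65 ms
print(f"utilization: reads {rho_r:.0%}, writes {rho_w:.0%}")
```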
4. Concurrency Control Comparison
Question: Compare performance of three concurrency control methods for TPC-C benchmark:
- Two-Phase Locking (2PL)
- Timestamp Ordering (TO)
- Optimistic Concurrency Control (OCC)
Transaction mix: 45% Payment, 43% New-Order, 4% Order-Status, 4% Delivery, 4% Stock-Level
Answer:
| Method | Throughput (tps) | Abort Rate | Avg Response Time |
|---|---|---|---|
| 2PL | 1,200 | 2% | 45ms |
| TO | 1,800 | 15% | 35ms |
| OCC | 2,500 | 25% | 28ms |
Analysis:
- 2PL: Stable, low abort rate, moderate performance
- TO: Better throughput, but a higher abort rate due to cascading aborts
- OCC: Best performance under moderate contention, but the abort rate climbs steeply as contention grows
Recommendation: 2PL for high-consistency systems, OCC for read-heavy workloads
5. Buffer Pool Impact on Transaction Performance
Question: Database with 1GB buffer pool, 100GB database. Transaction access pattern follows 80-20 rule (80% accesses to 20% of data). Calculate expected I/O operations per transaction.
Answer: Hot Data Analysis:
- Hot data size: 20% × 100GB = 20GB
- Buffer can hold 1GB, i.e. 5% of the hot data
- Hit rate for hot-data accesses: 1GB/20GB = 5%
Access Pattern:
- 80% of accesses go to hot data: hit rate = 5%
- 20% of accesses go to cold data: hit rate ≈ 0%
Expected I/O per Access: E(I/O) = 0.8 × (1 - 0.05) + 0.2 × (1 - 0) = 0.8 × 0.95 + 0.2 × 1 = 0.76 + 0.2 = 0.96 I/O operations per access
If average transaction has 10 data accesses: 9.6 I/O operations per transaction
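The expected-I/O arithmetic as a short check:

```python
hot_hit, cold_hit = 1 / 20, 0.0    # 1GB buffer over a 20GB hot set; cold ~0
ios = 0.8 * (1 - hot_hit) + 0.2 * (1 - cold_hit)
print(f"{ios:.2f} I/O per access, {10 * ios:.1f} I/O per 10-access transaction")
```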
6. Distributed Transaction Coordination
Question: Distributed system with 5 nodes using Two-Phase Commit. Each node has 99% availability. Calculate:
a) Probability of successful commit
b) Expected commit latency with network delay N(50ms, 10ms)
Answer: a) Successful Commit Probability: All nodes must be available: P(success) = (0.99)⁵ = 0.951 or 95.1%
b) Commit Latency:
- Phase 1 (Prepare): Coordinator → all nodes → Coordinator
- Phase 2 (Commit): Coordinator → all nodes
Expected latency:
- Each phase must wait for the slowest of the 5 per-node network delays (processing time neglected)
- Using order statistics: E[max of 5 samples from N(50, 10)] ≈ μ + 1.16σ = 50 + 11.6 = 61.6ms
- Total: 2 × 61.6ms ≈ 123.2ms expected commit time
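A Monte Carlo check of the order-statistics constant (E[max of 5 draws from N(50, 10)] ≈ 50 + 1.16 × 10):

```python
import random
import statistics

# The slowest of the 5 per-node delays dominates each 2PC phase.
samples = [max(random.gauss(50, 10) for _ in range(5)) for _ in range(100_000)]
phase = statistics.mean(samples)
print(f"one phase ~ {phase:.1f} ms, two phases ~ {2 * phase:.1f} ms")  # ~123 ms
```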
7. Lock Granularity Optimization
Question: Database table with 1M rows, 100 bytes per row. Compare locking strategies:
- Row-level locking
- Page-level locking (8KB pages, ~80 rows per page)
- Table-level locking
For workload: 60% single-row updates, 40% full-table scans.
Answer: Row-Level Locking:
- Lock overhead: 1M row locks for a full scan
- Memory: 1M × 32 bytes = 32MB lock table
- Pros: High concurrency for updates
- Cons: High overhead for scans
Page-Level Locking:
- Lock overhead: 12,500 page locks for a full scan
- Memory: 12,500 × 32 bytes = 400KB lock table
- Pros: Balanced concurrency and overhead
- Cons: False conflicts (locking unrelated rows on the same page)
Table-Level Locking:
- Lock overhead: 1 lock regardless of scan size
- Memory: 32 bytes total
- Pros: Minimal overhead
- Cons: No concurrency for updates during scans
Recommendation: Page-level for this workload (best balance)
8. Phantom Read Prevention Analysis
Question: Database with isolation level "Repeatable Read". Transaction T₁ counts records WHERE salary > 50000. During T₁ execution, T₂ inserts employee with salary = 60000. Analyze what happens and design prevention mechanism.
Answer: Problem:
T₁: SELECT COUNT(*) WHERE salary > 50000 → Result: 100
T₂: INSERT employee (salary = 60000), COMMIT
T₁: SELECT COUNT(*) WHERE salary > 50000 → Result: 101 (phantom!)
Prevention Mechanisms:
- Predicate Locking: Lock the logical predicate "salary > 50000" itself, so no existing or future record can satisfy it behind T₁'s back
- Gap Locking: Lock the gaps in the salary index where matching records could be inserted
- Serializable Isolation: Full serializability prevents phantoms by definition
Implementation: Many databases (MySQL's InnoDB, for example) implement this with gap locks in B+ tree indexes, since true predicate locking is expensive.
9. Recovery Log Analysis
Question: Transaction log contains:
<START T₁>
<T₁, A, 100, 150>
<START T₂>
<T₂, B, 200, 250>
<T₁, C, 300, 350>
<COMMIT T₁>
<T₂, A, 150, 175>
<SYSTEM CRASH>
Determine final database state using UNDO/REDO recovery.
Answer: Analysis:
- T₁: Committed before the crash → REDO all T₁ operations
- T₂: Not committed → UNDO all T₂ operations
Recovery Steps:
- REDO T₁ (forward through the log): A = 150, C = 350
- UNDO T₂ (backward through the log): A = 150 (old value restored, preserving T₁'s committed write), B = 200 (original restored)
Final State: A = 150, B = 200, C = 350
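A sketch of the two recovery passes over this log in Python; records are (txn, item, old, new) tuples, with START/COMMIT markers:

```python
log = [
    ("START", "T1"),
    ("T1", "A", 100, 150),
    ("START", "T2"),
    ("T2", "B", 200, 250),
    ("T1", "C", 300, 350),
    ("COMMIT", "T1"),
    ("T2", "A", 150, 175),
]   # <SYSTEM CRASH>

committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
db = {}

for rec in log:                       # REDO pass: forward, winners only
    if rec[0] not in ("START", "COMMIT") and rec[0] in committed:
        _, item, old, new = rec
        db[item] = new

for rec in reversed(log):             # UNDO pass: backward, losers only
    if rec[0] not in ("START", "COMMIT") and rec[0] not in committed:
        _, item, old, new = rec
        db[item] = old                # T2's A=175 reverts to T1's 150

print(db)   # {'A': 150, 'C': 350, 'B': 200}
```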
10. Multi-Version Concurrency Control (MVCC)
Question: Database uses MVCC with timestamp ordering. Three transactions start:
- T₁ (ts=100): read(A), write(A)
- T₂ (ts=200): read(A), write(B)
- T₃ (ts=150): read(A), read(B)
Initial: A₀ (ts=50), B₀ (ts=75). Determine which operations succeed.
Answer: Version Chain Analysis:
T₁ Operations:
- read(A): ts=100 > A₀.ts=50 → SUCCESS, reads A₀
- write(A): creates version A₁ (ts=100)
T₂ Operations:
- read(A): ts=200 > A₁.ts=100 → SUCCESS, reads A₁
- write(B): creates version B₁ (ts=200)
T₃ Operations:
- read(A): ts=150 ≥ A₁.ts=100 → SUCCESS, reads A₁
- read(B): ts=150 < B₁.ts=200 → reads the older version B₀
Result: All operations succeed, no conflicts in MVCC!
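The visibility rule as a small sketch: a reader with timestamp ts sees the newest version whose write timestamp does not exceed ts:

```python
versions = {                     # item -> [(write_ts, value)] sorted by ts
    "A": [(50, "A0"), (100, "A1")],
    "B": [(75, "B0"), (200, "B1")],
}

def mvcc_read(item, ts):
    visible = [val for wts, val in versions[item] if wts <= ts]
    return visible[-1]           # newest version visible at this timestamp

print(mvcc_read("A", 150))   # A1 (written at ts=100 <= 150)
print(mvcc_read("B", 150))   # B0 (B1's ts=200 is in T3's future)
```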
Advanced Numerical Problems
11. Cache-Conscious Transaction Processing
Question: Database buffer pool has 3 cache levels:
- L1: 64KB, 1ns access, 64B cache lines
- L2: 1MB, 10ns access
- L3: 8MB, 50ns access
- RAM: 100ns access
Transaction accesses 1000 random records (100B each). Calculate expected access time with 90% L1 hit rate, 95% L2 hit rate, 98% L3 hit rate.
Answer: Hit Rate Calculation:
- L1 hits: 90% → 900 accesses × 1ns = 900ns
- L2 hits: 10% × 95% = 9.5% → 95 accesses × 10ns = 950ns
- L3 hits: 10% × 5% × 98% = 0.49% → 4.9 accesses × 50ns = 245ns
- RAM accesses: 10% × 5% × 2% = 0.01% → 0.1 accesses × 100ns = 10ns
Total Expected Time: (900 + 950 + 245 + 10) / 1000 = 2.105ns per access
Total Transaction Time: 1000 × 2.105ns = 2.105μs
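The weighted-latency sum as a check, with the per-level fractions computed from the miss chain:

```python
levels = [                          # (fraction of all accesses, latency in ns)
    (0.90,                 1.0),    # L1 hits
    (0.10 * 0.95,         10.0),    # L1 misses that hit L2
    (0.10 * 0.05 * 0.98,  50.0),    # ...that hit L3
    (0.10 * 0.05 * 0.02, 100.0),    # ...that fall through to RAM
]
per_access = sum(frac * ns for frac, ns in levels)
total_us = 1000 * per_access / 1000     # 1000 accesses, ns -> µs
print(f"{per_access:.3f} ns per access, {total_us:.3f} µs per transaction")
```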
12. Distributed Consensus with Byzantine Failures
Question: Distributed database with 7 nodes using PBFT (Practical Byzantine Fault Tolerance). System can tolerate f Byzantine nodes. Calculate:
a) Maximum Byzantine nodes tolerable
b) Message complexity for one consensus round
c) Latency assuming 50ms network delay
Answer: a) Byzantine Tolerance: PBFT requires n ≥ 3f + 1
- With n=7: 7 ≥ 3f + 1 → f ≤ 2
- Maximum Byzantine nodes: 2
b) Message Complexity:
- Pre-prepare: primary → n−1 replicas = 6 messages
- Prepare: (n−1) × (n−1) = 6 × 6 = 36 messages
- Commit: (n−1) × (n−1) = 36 messages
- Total: 6 + 36 + 36 = 78 messages per consensus round
c) Latency: 3 sequential message phases × 50ms = 150ms
13. Workload Skew Analysis Using Zipf Distribution
Question: OLTP system with 1M customers. Access pattern follows Zipf distribution with parameter α=1.2. Top 1% customers generate what percentage of total traffic?
Answer: Zipf Distribution: P(rank k) ∝ k^(-α)
Calculation:
- Total customers: N = 1,000,000
- Top 1%: k = 10,000 customers
- Zipf parameter: α = 1.2
Cumulative probability for top 1%: P(top 1%) = Σ(k=1 to 10,000) k^(-1.2) / Σ(k=1 to 1,000,000) k^(-1.2)
Using the integral approximation for partial sums of k^(−α): P(top 1%) ≈ (10,000^(1−1.2) − 1) / (1,000,000^(1−1.2) − 1) = (10,000^(−0.2) − 1) / (1,000,000^(−0.2) − 1) ≈ 0.90
Result: The top 1% of customers generate roughly 90% of total traffic (see the check below). Skew this heavy is characteristic of Zipf workloads with α > 1 and makes those customers' rows prime hot-spot and caching candidates.
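An exact check of the top-1% share (the list is large, but this runs in a couple of seconds):

```python
alpha, n, top = 1.2, 1_000_000, 10_000

weights = [k ** -alpha for k in range(1, n + 1)]
share = sum(weights[:top]) / sum(weights)
print(f"top 1% share ~ {share:.1%}")   # ~90%, matching the approximation
```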
14. Query Optimization with Join Selectivity
Question: Three-way join query:
SELECT * FROM A JOIN B ON A.id = B.id
JOIN C ON B.key = C.key
WHERE A.status = 'active' AND C.value > 1000
Table statistics:
- A: 1M rows, 10% active status
- B: 500K rows
- C: 2M rows, 5% have value > 1000
Calculate optimal join order and estimated result size.
Answer: Selectivity Analysis:
- A after filter: 1M × 0.1 = 100K rows
- C after filter: 2M × 0.05 = 100K rows
- B: 500K rows (no filter)
Join Cost Estimation (naive nested-loop worst case):
- A⋈B first, then ⋈C: works from the filtered inputs; worst-case bound 100K × 500K × 100K = 5×10¹⁵ operations, but real intermediate results stay small
- A⋈C first: A and C share no join predicate, so this is a Cartesian product (100K × 100K intermediate rows) and is ruled out immediately
- C⋈B first, then ⋈A: the symmetric worst-case bound, 100K × 500K × 100K = 5×10¹⁵ operations
Optimal Strategy: Push the filters on A and C down first, then hash-join the filtered A with B, and finally join the result with the filtered C.
Estimated Result: 100K × 100K × (1/500K) ≈ 20,000 rows, treating B's join keys as ~500K distinct values (see the check below).
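The cardinality estimate as a check, under the stated selectivities and the assumption that B contributes ~500K distinct join-key values:

```python
a_filtered = 1_000_000 * 0.10    # 100K active A rows
c_filtered = 2_000_000 * 0.05    # 100K high-value C rows
b_rows = 500_000

ab = a_filtered                  # ~1 matching B row per A row via B.id
abc = ab * c_filtered / b_rows   # join selectivity assumed 1 / 500K
print(f"~{abc:,.0f} rows")       # ~20,000
```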
15. Snapshot Isolation Anomaly Detection
Question: Two transactions under Snapshot Isolation:
T₁: r₁(x=100), r₁(y=100), w₁(x=x+y), w₁(y=x+y)
T₂: r₂(x=100), r₂(y=100), w₂(x=x*2), w₂(y=y*2)
Both start simultaneously with snapshot: x=100, y=100. Determine:
a) Final database state
b) Whether a write skew anomaly occurs
Answer: a) Transaction Execution:
T₁'s View: x=100, y=100 (snapshot)
- w₁(x = 100 + 100 = 200)
- w₁(y = 100 + 100 = 200)
T₂'s View: x=100, y=100 (same snapshot)
- w₂(x = 100 × 2 = 200)
- w₂(y = 100 × 2 = 200)
Final State: The two write sets overlap ({x, y}), so under snapshot isolation's first-committer-wins rule the second committer aborts on the write-write conflict. Because both transactions compute the same values, the surviving state is identical either way:
- If T₁ commits first: T₂ aborts → x=200, y=200
- If T₂ commits first: T₁ aborts → x=200, y=200
- Result: x=200, y=200
b) Write Skew Analysis: No write skew occurs. Write skew requires disjoint write sets, where each transaction writes data the other only read; here the write sets overlap, so first-committer-wins catches the conflict directly. No anomaly.
16. Lock Escalation Threshold Optimization
Question: Database system escalates from row-level to table-level locking when lock count exceeds threshold. Given:
- Row lock overhead: 64 bytes per lock
- Table lock overhead: 128 bytes per lock
- Lock memory limit: 100MB
- Average rows per table: 50,000
Calculate optimal escalation threshold to minimize memory usage.
Answer: Memory Usage Models:
Row-level: Memory = (number of locked rows) × 64 bytes
Table-level: Memory = (number of locked tables) × 128 bytes
Break-even point for a single table: row memory = table memory → n × 64 = 128 → n = 2 rows
But considering lock memory limit:
- Maximum row locks within the budget: 100MB / 64B ≈ 1.64M locks
- At 50K rows per table, that budget covers about 32 fully-locked tables
- Optimal threshold: 50,000 / 32 ≈ 1,563 rows per table
Recommendation: Set escalation threshold at 1,500 rows to stay within memory limits.
17. Optimistic Concurrency Control Performance Model
Question: System uses OCC with validation phase. Transaction conflict probability follows Poisson distribution. Given:
- Average active transactions: λ = 20
- Transaction duration: μ = 100ms
- Validation time: 5ms per transaction
Calculate expected abort rate and effective throughput.
Answer: Conflict Analysis:
- System load: ρ = λ × μ = 20 × 0.1 = 2.0
- Conflict probability per transaction: P(conflict) = 1 − e^(−ρ) = 1 − e^(−2) ≈ 0.865
Abort Rate Calculation: With restart overhead, the abort rate stabilizes at Abort Rate = P(conflict) × (1 − restart efficiency) ≈ 0.865 × 0.8 ≈ 69.2% (assuming a restart efficiency of 0.2)
Effective Throughput:
- Theoretical throughput: 1/0.105s ≈ 9.52 tps per transaction slot (100ms execution + 5ms validation)
- With aborts: 9.52 × (1 − 0.692) ≈ 2.93 transactions/second per slot
- Total system throughput: 20 × 2.93 ≈ 58.6 tps
Conclusion: System is overloaded under OCC; 2PL would perform better.
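The model chained end to end as a check (the 0.8 factor encodes the assumed restart efficiency of 0.2 from above):

```python
import math

active, dur_s, val_s = 20, 0.100, 0.005     # from the question
rho = active * dur_s                        # load metric: 2.0
p_conflict = 1 - math.exp(-rho)             # ~0.865
abort_rate = p_conflict * 0.8               # assumed restart efficiency 0.2
slot_tps = 1 / (dur_s + val_s)              # ~9.52 per transaction slot
throughput = active * slot_tps * (1 - abort_rate)
print(f"abort rate ~ {abort_rate:.1%}, throughput ~ {throughput:.1f} tps")
```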
18. Multi-Granularity Locking Protocol
Question: Hierarchical locking system: Database → Table → Page → Row. Transaction needs:
- Shared access to 1000 rows in Table A
- Exclusive access to 50 rows in Table B
- Shared access to entire Table C
Design optimal locking strategy and calculate lock overhead.
Answer: Optimal Lock Strategy:
Table A (1000 rows shared):
- Acquire IS (Intent Shared) lock on the Database
- Acquire IS lock on Table A
- Acquire S locks on the affected pages (assume 10 rows/page = 100 page locks)
- Total: 1 + 1 + 100 = 102 locks
Table B (50 rows exclusive):
- Upgrade the Database lock from IS to IX (Intent Exclusive), since exclusive locks will be taken below it
- Acquire IX lock on Table B
- Acquire X locks on the affected pages (5 page locks)
- Total: 1 + 5 = 6 locks
Table C (entire table shared):
- Acquire S lock on Table C directly (no page or row locks needed)
- Total: 1 lock
Total Lock Overhead: 102 + 6 + 1 = 109 locks
Memory Usage: 109 × 64 bytes ≈ 7KB
Alternative (without multi-granularity): 1000 + 50 + 50,000 (every row of Table C) ≈ 51,050 row locks
Savings: ~99.8% reduction in lock overhead!