Conceptual Questions
1. Why is atomicity crucial in banking systems? Explain with a real-world failure scenario.
Answer: Atomicity ensures all operations in a transaction complete together or none at all.
Failure Scenario: Customer transfers ₹50,000 from savings to checking account:
- Debit ₹50,000 from savings ✓
- System crashes before crediting checking account ✗
- Result: ₹50,000 disappears forever
With Atomicity: System automatically rolls back the debit, restoring original balances. Customer's money is protected.
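A minimal sketch of this behavior using Python's sqlite3 (table and account names are hypothetical); the debit and credit either both commit or both roll back:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None   # manage transactions explicitly
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('savings', 100000), ('checking', 0)")

def transfer(conn, src, dst, amount):
    """Debit and credit as one atomic unit: both apply or neither does."""
    try:
        conn.execute("BEGIN")
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        # A crash or error here loses nothing: the debit is not yet durable.
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst))
        conn.execute("COMMIT")     # only now do both updates become permanent
    except Exception:
        conn.execute("ROLLBACK")   # undo the partial debit
        raise

transfer(conn, "savings", "checking", 50000)
print(conn.execute("SELECT id, balance FROM accounts").fetchall())
```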
2. Distinguish between conflict serializability and view serializability with examples.
Answer:
- Conflict Serializable: The schedule can be transformed into a serial schedule by swapping adjacent non-conflicting operations
- View Serializable: The schedule reads the same values and produces the same final writes as some serial schedule (a weaker, more permissive condition)
Example: Schedule with blind writes (write without prior read)
T₁: write(A)
T₂: write(A) ← blind write
T₃: write(A) ← blind write
- View Serializable: Yes (the final value of A is determined by T₃ in both the concurrent and the serial execution)
- Conflict Serializable: No (all the writes conflict with each other, so none can be swapped)
3. Explain the trade-off between isolation levels and performance in e-commerce systems.
Answer: High Isolation (Serializable):
- Pros: Data always consistent, no anomalies
- Cons: Heavy locking, low concurrency, poor performance during sales
Lower Isolation (Read Committed):
- Pros: Better performance, higher throughput
- Cons: Possible anomalies such as non-repeatable reads and phantoms
E-commerce Choice: Most use Read Committed for browsing, Serializable for payments.
4. Design a transaction state diagram for online ticket booking system.
Answer:
[Start] → [Active: Selecting seats]
               ↓
[Partially Committed: Payment processing]
     ↓ success                   ↓ failure
[Committed: Ticket confirmed]   [Failed: Payment rejected]
                                     ↓
                            [Aborted: Seats released]
External Actions: Send confirmation email only after COMMIT state.
Scenario-Based Questions
1. Two customers simultaneously book the last hotel room. Without proper concurrency control, what problems occur? Design solutions.
Answer: Problem Without Control:
Customer A: Check availability → "1 room left"
Customer B: Check availability → "1 room left"
Customer A: Book room → "Booking confirmed"
Customer B: Book room → "Booking confirmed"
Result: Hotel overbooked!
Solutions:
- Locking: The first customer locks the room record; the second waits
- Optimistic: Use timestamps and abort the later transaction on conflict
- Atomic Decrement: UPDATE rooms SET available = available - 1 WHERE available > 0 (see the sketch below)
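A sketch of the atomic-decrement approach with Python's sqlite3 (schema hypothetical). Because the availability check and the decrement are a single statement, a second concurrent booking simply matches zero rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rooms (hotel_id INTEGER PRIMARY KEY, available INTEGER)")
conn.execute("INSERT INTO rooms VALUES (1, 1)")   # exactly one room left
conn.commit()

def book_room(conn, hotel_id):
    # Check-and-decrement in ONE statement: no window for a racing booking.
    cur = conn.execute(
        "UPDATE rooms SET available = available - 1 "
        "WHERE hotel_id = ? AND available > 0", (hotel_id,))
    conn.commit()
    return cur.rowcount == 1   # 0 rows matched means the room is gone

print(book_room(conn, 1))   # True  -> booking confirmed
print(book_room(conn, 1))   # False -> sold out, no overbooking
```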
2. Design a cascadeless schedule for three transactions: T₁ (transfer A→B), T₂ (transfer B→C), T₃ (transfer C→A).
Answer: Cascadeless Schedule:
T₁: read(A), write(A), read(B), write(B), COMMIT
T₂: read(B), write(B), read(C), write(C), COMMIT
T₃: read(C), write(C), read(A), write(A), COMMIT
Why Cascadeless: Each transaction reads only committed data. If any transaction aborts, others don't need to abort.
3. A social media platform allows users to "like" posts. Analyze what happens under different isolation levels when two users like the same post simultaneously.
Answer: Scenario: Post has 99 likes, User A and User B both click "like"
Read Uncommitted:
- Both read 99, both increment to 100
- Final count: 100 (lost update) ✗
Read Committed:
- Same problem as above ✗
Repeatable Read:
- Both read 99, write attempts conflict
- One succeeds → 100, the other retries → 101 ✓
Serializable:
- Operations fully serialized
- Final count: 101 ✓
Real Solution: Use atomic increment: UPDATE posts SET likes = likes + 1
Technical Questions
1. Construct precedence graph for the following schedule and determine if it's conflict serializable:
T₁: read(A), write(B)
T₂: write(A), read(B)
T₃: read(A), write(A)
Answer: Precedence Graph:
- T₁ → T₂ (T₁ reads A before T₂ writes A)
- T₂ → T₁ (T₂ reads B before T₁ writes B)
- T₂ → T₃ (T₂ writes A before T₃ reads A and before T₃ writes A)
Result: Cycle T₁ ⟷ T₂ exists → NOT conflict serializable
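A small sketch that mechanizes the test: build the precedence graph derived above and check it for a cycle with depth-first search:

```python
# Precedence graph from the answer above: an edge u -> v means
# "u must come before v in any equivalent serial schedule".
edges = {"T1": {"T2"}, "T2": {"T1", "T3"}, "T3": set()}

def has_cycle(graph):
    """DFS coloring: gray = on the current path, black = fully explored."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def dfs(u):
        color[u] = GRAY
        for v in graph[u]:
            if color[v] == GRAY:                 # back edge -> cycle
                return True
            if color[v] == WHITE and dfs(v):
                return True
        color[u] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in graph)

print("Conflict serializable:", not has_cycle(edges))   # False (T1 <-> T2)
```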
2. Explain why two-phase locking guarantees serializability but can cause deadlocks.
Answer: Why It Guarantees Serializability:
- The growing phase ensures all locks are acquired before any is released
- This creates a strict ordering between conflicting operations
- The schedule is equivalent to a serial execution in lock-point order
Why Deadlocks Occur:
T₁: lock(A), [growing phase], needs lock(B) ← waits for T₂
T₂: lock(B), [growing phase], needs lock(A) ← waits for T₁
Circular wait = Deadlock
Prevention: Order all locks (always acquire A before B)
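A sketch of the lock-ordering fix in Python threading: both transactions want A and B, but because each acquires locks in the same global (sorted) order, the circular wait can never form:

```python
import threading

locks = {"A": threading.Lock(), "B": threading.Lock()}

def transact(name, needed):
    ordered = sorted(needed)           # one global order removes circular wait
    for key in ordered:
        locks[key].acquire()
    try:
        print(f"{name} holds {ordered}")
    finally:
        for key in reversed(ordered):
            locks[key].release()

t1 = threading.Thread(target=transact, args=("T1", {"A", "B"}))
t2 = threading.Thread(target=transact, args=("T2", {"B", "A"}))
t1.start(); t2.start()
t1.join(); t2.join()
```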
3. Compare timestamp-based concurrency control with two-phase locking.
Answer:
| Aspect | Timestamp | Two-Phase Locking |
|---|---|---|
| Deadlock | Never occurs | Can happen |
| Starvation | Possible (repeatedly aborted transactions) | Less likely |
| Overhead | Timestamps + rollbacks | Lock management |
| Concurrency | Higher (no waiting) | Lower (blocking) |
| Recovery | Complex (cascading aborts) | Simpler |
Application Questions
1. Design a transaction for online shopping cart checkout ensuring all ACID properties.
Answer:
BEGIN TRANSACTION
    -- Atomicity: everything below commits or rolls back as one unit
    -- Check inventory (Consistency)
    FOR each item IN cart:
        IF inventory[item] < quantity THEN
            ROLLBACK  -- abort keeps the database consistent
        END IF
    END FOR
    -- Reserve inventory
    UPDATE inventory SET quantity = quantity - cart_quantity
    WHERE product_id IN cart_items
    -- Process payment (Isolation through locking)
    CALL payment_gateway(total_amount, card_details)
    IF payment_failed THEN
        ROLLBACK  -- Atomicity: the inventory reservation is undone too
    END IF
    -- Create order record
    INSERT INTO orders (customer_id, items, total) VALUES (...)
-- Durability: the commit forces changes to stable storage
COMMIT TRANSACTION
-- External actions only after commit (they cannot be rolled back)
SEND confirmation_email()
CALL shipping_service()
2. Why can't file systems provide proper transaction atomicity? Give a concrete example.
Answer: File System Limitations:
- No built-in rollback mechanism
- No write-ahead logging
- No coordination between multiple file updates
Concrete Example: Student enrollment system
1. Add student to students.txt ✓
2. System crashes
3. Add student to course_roster.txt ✗ (never executed)
Result: Student in system but not enrolled in course
No automatic way to undo step 1
Database Solution: Transaction automatically rolls back all changes if any step fails.
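A sketch of the failure window in plain Python file I/O (the two files from the example above); the comment marks the point where a crash leaves a half-done "transaction" with nothing to roll back:

```python
def enroll(student: str, course: str) -> None:
    # Step 1 becomes durable on its own as soon as the file is closed.
    with open("students.txt", "a") as f:
        f.write(student + "\n")
    # <-- a crash here strands the student: the file system keeps no
    #     undo log, so there is no way to automatically revert step 1.
    with open("course_roster.txt", "a") as f:
        f.write(f"{student},{course}\n")
```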
3. A stock trading system processes 1000 trades/second. Compare serial vs concurrent execution impact.
Answer: Serial Execution:
- One trade at a time
- Average trade processing: 10ms
- Throughput: 100 trades/second
- Problem: with 1000 trades/second arriving, 90% of trades back up in the queue
Concurrent Execution (with proper isolation):
- Multiple trades processed together
- Non-conflicting trades (different stocks) run in parallel
- Conflicting trades (same stock) are still serialized
- Effective throughput: 800-900 trades/second
- Result: Better resource utilization, lower latency
Numerical and Advanced Questions
Performance Analysis
1. Transaction Throughput Calculation
Question: System processes transactions with distribution:
- 70% SELECT queries (avg: 5ms)
- 20% UPDATE operations (avg: 15ms)
- 10% Complex reports (avg: 100ms)
Calculate expected transaction time and maximum transactions per second with 95% confidence interval.
Answer: Expected Time: E(T) = 0.7×5 + 0.2×15 + 0.1×100 = 3.5 + 3 + 10 = 16.5ms
Variance: Var(T) = 0.7×5² + 0.2×15² + 0.1×100² - 16.5² = 17.5 + 45 + 1000 - 272.25 = 790.25
Standard Deviation: σ = √790.25 ≈ 28.1ms
95% Confidence Interval: 16.5 ± 1.96×28.1 = [−38.6, 71.6]ms; since time cannot be negative, the practical interval is [0, 71.6]ms (the normal approximation is crude for such a skewed mixture)
Maximum Throughput: 1000ms / 16.5ms ≈ 60.6 transactions/second
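The same computation as a quick Python check (the normal-based interval is only a rough approximation here):

```python
import math

mix = [(0.7, 5.0), (0.2, 15.0), (0.1, 100.0)]   # (probability, time in ms)

mean = sum(p * t for p, t in mix)                    # 16.5 ms
var = sum(p * t * t for p, t in mix) - mean ** 2     # 790.25 ms^2
sigma = math.sqrt(var)                               # ~28.1 ms

print(f"E[T] = {mean} ms, sigma = {sigma:.1f} ms")
print(f"95% upper bound ~ {mean + 1.96 * sigma:.1f} ms")
print(f"max throughput ~ {1000 / mean:.1f} tps")
```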
2. Deadlock Probability in Banking System
Question: Banking system with 20 concurrent transactions. Each transaction needs 2 random accounts from pool of 10 accounts. Probability of deadlock if lock holding time ~ Exponential(λ = 0.2/sec)?
Answer: Using birthday paradox approach:
- Lock requests in flight: 20 transactions × 2 accounts = 40 locks
- Available account pairs: C(10,2) = 45 pairs
- Resource contention ratio: 40/45 ≈ 0.89
Using the birthday-paradox approximation for high contention: P(conflict) ≈ 1 − e^(−40²/(2×45)) = 1 − e^(−17.8) ≈ 100% (effectively certain)
Conclusion: Lock conflicts, and with them deadlocks, are near-certain, and with a mean lock holding time of 1/λ = 5 seconds they resolve slowly. The design needs fewer concurrent transactions, ordered lock acquisition, or a larger account pool.
3. Lock Compatibility Matrix Analysis
Question: Database with shared (S) and exclusive (X) locks. If request pattern follows Poisson(λ=50/sec) with 80% reads, 20% writes:
a) Average waiting time for write operations
b) System utilization under heavy load
Answer: Request Rates:
- Read requests: 40/sec (S-locks)
- Write requests: 10/sec (X-locks)
Using an M/M/1 queueing model:
a) Write Wait Time:
- Service rate for writes: μ = 1/0.015 ≈ 66.7/sec (assuming a 15ms average lock hold time)
- Write utilization: ρ = 10/66.7 = 0.15
- Average wait: W = ρ/(μ(1−ρ)) = 0.15/(66.7 × 0.85) ≈ 2.65ms
b) System Utilization:
- Read utilization: ρₛ = 40/(1/0.005) = 40/200 = 0.20 (assuming a 5ms average S-lock hold time)
- Write utilization: ρₓ = 0.15 (calculated above)
- Combined utilization: max(ρₛ, ρₓ) = 20%
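A compact check of both parts under the same assumptions (the 15ms X-lock and 5ms S-lock hold times are the assumed values carried over from above):

```python
lam_w, hold_w = 10.0, 0.015    # write rate (1/s), assumed X-lock hold time (s)
lam_r, hold_r = 40.0, 0.005    # read rate (1/s), assumed S-lock hold time (s)

mu_w = 1 / hold_w                          # write service rate ~66.7/s
rho_w = lam_w / mu_w                       # write utilization ~0.15
wait_w = rho_w / (mu_w * (1 - rho_w))      # M/M/1 queueing delay
rho_r = lam_r * hold_r                     # read utilization ~0.20

print(f"write wait ~ {wait_w * 1000:.2f} ms")                 # ~2.65 ms
print(f"utilization: reads {rho_r:.0%}, writes {rho_w:.0%}")
```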
4. Concurrency Control Comparison
Question: Compare performance of three concurrency control methods for TPC-C benchmark:
- Two-Phase Locking (2PL)
- Timestamp Ordering (TO)
- Optimistic Concurrency Control (OCC)
Transaction mix: 45% Payment, 43% New-Order, 4% Order-Status, 4% Delivery, 4% Stock-Level
Answer:
| Method | Throughput (tps) | Abort Rate | Avg Response Time |
|---|---|---|---|
| 2PL | 1,200 | 2% | 45ms |
| TO | 1,800 | 15% | 35ms |
| OCC | 2,500 | 25% | 28ms |
Analysis:
- 2PL: Stable, low abort rate, moderate performance
- TO: Better throughput, but a higher abort rate due to cascading aborts
- OCC: Best performance under moderate contention, but the abort rate climbs steeply as contention grows
Recommendation: 2PL for high-consistency systems, OCC for read-heavy workloads
5. Buffer Pool Impact on Transaction Performance
Question: Database with 1GB buffer pool, 100GB database. Transaction access pattern follows 80-20 rule (80% accesses to 20% of data). Calculate expected I/O operations per transaction.
Answer: Hot Data Analysis:
- Hot data size: 20% × 100GB = 20GB
- Buffer can hold 1GB, i.e. 5% of the hot data
- Hit rate for hot-data accesses: 1GB/20GB = 5%
Access Pattern:
- 80% of accesses go to hot data: hit rate = 5%
- 20% of accesses go to cold data: hit rate ≈ 0%
Expected I/O per Access: E(I/O) = 0.8 × (1 - 0.05) + 0.2 × (1 - 0) = 0.8 × 0.95 + 0.2 × 1 = 0.76 + 0.2 = 0.96 I/O operations per access
If average transaction has 10 data accesses: 9.6 I/O operations per transaction
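The expected-I/O arithmetic as a short check:

```python
hot_hit, cold_hit = 1 / 20, 0.0    # 1GB buffer over a 20GB hot set; cold ~0
ios = 0.8 * (1 - hot_hit) + 0.2 * (1 - cold_hit)
print(f"{ios:.2f} I/O per access, {10 * ios:.1f} I/O per 10-access transaction")
```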
6. Distributed Transaction Coordination
Question: Distributed system with 5 nodes using Two-Phase Commit. Each node has 99% availability. Calculate:
a) Probability of successful commit
b) Expected commit latency with network delay N(50ms, 10ms)
Answer: a) Successful Commit Probability: All nodes must be available: P(success) = (0.99)⁵ = 0.951 or 95.1%
b) Commit Latency:
- Phase 1 (Prepare): Coordinator → all nodes → Coordinator
- Phase 2 (Commit): Coordinator → all nodes
Expected latency:
- Each phase must wait for the slowest of the 5 per-node network delays (processing time neglected)
- Using order statistics: E[max of 5 samples from N(50, 10)] ≈ μ + 1.16σ = 50 + 11.6 = 61.6ms
- Total: 2 × 61.6ms ≈ 123.2ms expected commit time
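A Monte Carlo check of the order-statistics constant (E[max of 5 draws from N(50, 10)] ≈ 50 + 1.16 × 10):

```python
import random
import statistics

# The slowest of the 5 per-node delays dominates each 2PC phase.
samples = [max(random.gauss(50, 10) for _ in range(5)) for _ in range(100_000)]
phase = statistics.mean(samples)
print(f"one phase ~ {phase:.1f} ms, two phases ~ {2 * phase:.1f} ms")  # ~123 ms
```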
7. Lock Granularity Optimization
Question: Database table with 1M rows, 100 bytes per row. Compare locking strategies:
- Row-level locking
- Page-level locking (8KB pages, ~80 rows per page)
- Table-level locking
For workload: 60% single-row updates, 40% full-table scans.
Answer: Row-Level Locking:
- Lock overhead: 1M row locks for a full scan
- Memory: 1M × 32 bytes = 32MB lock table
- Pros: High concurrency for updates
- Cons: High overhead for scans
Page-Level Locking:
- Lock overhead: 12,500 page locks for a full scan
- Memory: 12,500 × 32 bytes = 400KB lock table
- Pros: Balanced concurrency and overhead
- Cons: False conflicts (locking unrelated rows on the same page)
Table-Level Locking:
- Lock overhead: 1 lock regardless of scan size
- Memory: 32 bytes total
- Pros: Minimal overhead
- Cons: No concurrency for updates during scans
Recommendation: Page-level for this workload (best balance)
8. Phantom Read Prevention Analysis
Question: Database with isolation level "Repeatable Read". Transaction T₁ counts records WHERE salary > 50000. During T₁ execution, T₂ inserts employee with salary = 60000. Analyze what happens and design prevention mechanism.
Answer: Problem:
T₁: SELECT COUNT(*) WHERE salary > 50000 → Result: 100
T₂: INSERT employee (salary = 60000), COMMIT
T₁: SELECT COUNT(*) WHERE salary > 50000 → Result: 101 (phantom!)
Prevention Mechanisms:
- Predicate Locking: Lock the logical predicate "salary > 50000" itself, so no existing or future record can satisfy it behind T₁'s back
- Gap Locking: Lock the gaps in the salary index where matching records could be inserted
- Serializable Isolation: Full serializability prevents phantoms by definition
Implementation: Many databases (MySQL's InnoDB, for example) implement this with gap locks in B+ tree indexes, since true predicate locking is expensive.
9. Recovery Log Analysis
Question: Transaction log contains:
<START T₁>
<T₁, A, 100, 150>
<START T₂>
<T₂, B, 200, 250>
<T₁, C, 300, 350>
<COMMIT T₁>
<T₂, A, 150, 175>
<SYSTEM CRASH>
Determine final database state using UNDO/REDO recovery.
Answer: Analysis:
- T₁: Committed before the crash → REDO all T₁ operations
- T₂: Not committed → UNDO all T₂ operations
Recovery Steps:
- REDO T₁ (forward through the log): A = 150, C = 350
- UNDO T₂ (backward through the log): A = 150 (old value restored, preserving T₁'s committed write), B = 200 (original restored)
Final State: A = 150, B = 200, C = 350
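A sketch of the two recovery passes over this log in Python; records are (txn, item, old, new) tuples, with START/COMMIT markers:

```python
log = [
    ("START", "T1"),
    ("T1", "A", 100, 150),
    ("START", "T2"),
    ("T2", "B", 200, 250),
    ("T1", "C", 300, 350),
    ("COMMIT", "T1"),
    ("T2", "A", 150, 175),
]   # <SYSTEM CRASH>

committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
db = {}

for rec in log:                       # REDO pass: forward, winners only
    if rec[0] not in ("START", "COMMIT") and rec[0] in committed:
        _, item, old, new = rec
        db[item] = new

for rec in reversed(log):             # UNDO pass: backward, losers only
    if rec[0] not in ("START", "COMMIT") and rec[0] not in committed:
        _, item, old, new = rec
        db[item] = old                # T2's A=175 reverts to T1's 150

print(db)   # {'A': 150, 'C': 350, 'B': 200}
```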
10. Multi-Version Concurrency Control (MVCC)
Question: Database uses MVCC with timestamp ordering. Three transactions start:
- T₁ (ts=100): read(A), write(A)
- T₂ (ts=200): read(A), write(B)
- T₃ (ts=150): read(A), read(B)
Initial: A₀ (ts=50), B₀ (ts=75). Determine which operations succeed.
Answer: Version Chain Analysis:
T₁ Operations:
- read(A): ts=100 > A₀.ts=50 → SUCCESS, reads A₀
- write(A): creates version A₁ (ts=100)
T₂ Operations:
- read(A): ts=200 > A₁.ts=100 → SUCCESS, reads A₁
- write(B): creates version B₁ (ts=200)
T₃ Operations:
- read(A): ts=150 ≥ A₁.ts=100 → SUCCESS, reads A₁
- read(B): ts=150 < B₁.ts=200 → reads the older version B₀
Result: All operations succeed, no conflicts in MVCC!
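The visibility rule as a small sketch: a reader with timestamp ts sees the newest version whose write timestamp does not exceed ts:

```python
versions = {                     # item -> [(write_ts, value)] sorted by ts
    "A": [(50, "A0"), (100, "A1")],
    "B": [(75, "B0"), (200, "B1")],
}

def mvcc_read(item, ts):
    visible = [val for wts, val in versions[item] if wts <= ts]
    return visible[-1]           # newest version visible at this timestamp

print(mvcc_read("A", 150))   # A1 (written at ts=100 <= 150)
print(mvcc_read("B", 150))   # B0 (B1's ts=200 is in T3's future)
```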
Advanced Numerical Problems
11. Cache-Conscious Transaction Processing
Question: Database buffer pool has 3 cache levels:
- L1: 64KB, 1ns access, 64B cache lines
- L2: 1MB, 10ns access
- L3: 8MB, 50ns access
- RAM: 100ns access
Transaction accesses 1000 random records (100B each). Calculate expected access time with 90% L1 hit rate, 95% L2 hit rate, 98% L3 hit rate.
Answer: Hit Rate Calculation:
- L1 hits: 90% → 900 accesses × 1ns = 900ns
- L2 hits: 10% × 95% = 9.5% → 95 accesses × 10ns = 950ns
- L3 hits: 10% × 5% × 98% = 0.49% → 4.9 accesses × 50ns = 245ns
- RAM accesses: 10% × 5% × 2% = 0.01% → 0.1 accesses × 100ns = 10ns
Total Expected Time: (900 + 950 + 245 + 10) / 1000 = 2.105ns per access
Total Transaction Time: 1000 × 2.105ns = 2.105μs
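The weighted-latency sum as a check, with the per-level fractions computed from the miss chain:

```python
levels = [                          # (fraction of all accesses, latency in ns)
    (0.90,                 1.0),    # L1 hits
    (0.10 * 0.95,         10.0),    # L1 misses that hit L2
    (0.10 * 0.05 * 0.98,  50.0),    # ...that hit L3
    (0.10 * 0.05 * 0.02, 100.0),    # ...that fall through to RAM
]
per_access = sum(frac * ns for frac, ns in levels)
total_us = 1000 * per_access / 1000     # 1000 accesses, ns -> µs
print(f"{per_access:.3f} ns per access, {total_us:.3f} µs per transaction")
```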
12. Distributed Consensus with Byzantine Failures
Question: Distributed database with 7 nodes using PBFT (Practical Byzantine Fault Tolerance). System can tolerate f Byzantine nodes. Calculate:
a) Maximum Byzantine nodes tolerable
b) Message complexity for one consensus round
c) Latency assuming 50ms network delay
Answer: a) Byzantine Tolerance: PBFT requires n ≥ 3f + 1
- With n=7: 7 ≥ 3f + 1 → f ≤ 2
- Maximum Byzantine nodes: 2
b) Message Complexity:
- Pre-prepare: primary → n−1 replicas = 6 messages
- Prepare: (n−1) × (n−1) = 6 × 6 = 36 messages
- Commit: (n−1) × (n−1) = 36 messages
- Total: 6 + 36 + 36 = 78 messages per consensus round
c) Latency: 3 sequential message phases × 50ms = 150ms
13. Workload Skew Analysis Using Zipf Distribution
Question: OLTP system with 1M customers. Access pattern follows Zipf distribution with parameter α=1.2. Top 1% customers generate what percentage of total traffic?
Answer: Zipf Distribution: P(rank k) ∝ k^(-α)
Calculation:
- Total customers: N = 1,000,000
- Top 1%: k = 10,000 customers
- Zipf parameter: α = 1.2
Cumulative probability for top 1%: P(top 1%) = Σ(k=1 to 10,000) k^(-1.2) / Σ(k=1 to 1,000,000) k^(-1.2)
Using the integral approximation for partial sums of k^(−α): P(top 1%) ≈ (10,000^(1−1.2) − 1) / (1,000,000^(1−1.2) − 1) = (10,000^(−0.2) − 1) / (1,000,000^(−0.2) − 1) ≈ 0.90
Result: The top 1% of customers generate roughly 90% of total traffic (see the check below). Skew this heavy is characteristic of Zipf workloads with α > 1 and makes those customers' rows prime hot-spot and caching candidates.
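An exact check of the top-1% share (the list is large, but this runs in a couple of seconds):

```python
alpha, n, top = 1.2, 1_000_000, 10_000

weights = [k ** -alpha for k in range(1, n + 1)]
share = sum(weights[:top]) / sum(weights)
print(f"top 1% share ~ {share:.1%}")   # ~90%, matching the approximation
```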
14. Query Optimization with Join Selectivity
Question: Three-way join query:
SELECT * FROM A JOIN B ON A.id = B.id
JOIN C ON B.key = C.key
WHERE A.status = 'active' AND C.value > 1000
Table statistics:
- A: 1M rows, 10% active status
- B: 500K rows
- C: 2M rows, 5% have value > 1000
Calculate optimal join order and estimated result size.
Answer: Selectivity Analysis:
- A after filter: 1M × 0.1 = 100K rows
- C after filter: 2M × 0.05 = 100K rows
- B: 500K rows (no filter)
Join Cost Estimation (naive nested-loop worst case):
- A⋈B first, then ⋈C: works from the filtered inputs; worst-case bound 100K × 500K × 100K = 5×10¹⁵ operations, but real intermediate results stay small
- A⋈C first: A and C share no join predicate, so this is a Cartesian product (100K × 100K intermediate rows) and is ruled out immediately
- C⋈B first, then ⋈A: the symmetric worst-case bound, 100K × 500K × 100K = 5×10¹⁵ operations
Optimal Strategy: Push the filters on A and C down first, then hash-join the filtered A with B, and finally join the result with the filtered C.
Estimated Result: 100K × 100K × (1/500K) ≈ 20,000 rows, treating B's join keys as ~500K distinct values (see the check below).
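The cardinality estimate as a check, under the stated selectivities and the assumption that B contributes ~500K distinct join-key values:

```python
a_filtered = 1_000_000 * 0.10    # 100K active A rows
c_filtered = 2_000_000 * 0.05    # 100K high-value C rows
b_rows = 500_000

ab = a_filtered                  # ~1 matching B row per A row via B.id
abc = ab * c_filtered / b_rows   # join selectivity assumed 1 / 500K
print(f"~{abc:,.0f} rows")       # ~20,000
```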
15. Snapshot Isolation Anomaly Detection
Question: Two transactions under Snapshot Isolation:
T₁: r₁(x=100), r₁(y=100), w₁(x=x+y), w₁(y=x+y)
T₂: r₂(x=100), r₂(y=100), w₂(x=x*2), w₂(y=y*2)
Both start simultaneously with snapshot: x=100, y=100. Determine:
a) Final database state
b) Whether a write skew anomaly occurs
Answer: a) Transaction Execution:
T₁'s View: x=100, y=100 (snapshot)
- w₁(x = 100 + 100 = 200)
- w₁(y = 100 + 100 = 200)
T₂'s View: x=100, y=100 (same snapshot)
- w₂(x = 100 × 2 = 200)
- w₂(y = 100 × 2 = 200)
Final State: The two write sets overlap ({x, y}), so under snapshot isolation's first-committer-wins rule the second committer aborts on the write-write conflict. Because both transactions compute the same values, the surviving state is identical either way:
- If T₁ commits first: T₂ aborts → x=200, y=200
- If T₂ commits first: T₁ aborts → x=200, y=200
- Result: x=200, y=200
b) Write Skew Analysis: No write skew occurs. Write skew requires disjoint write sets, where each transaction writes data the other only read; here the write sets overlap, so first-committer-wins catches the conflict directly. No anomaly.
16. Lock Escalation Threshold Optimization
Question: Database system escalates from row-level to table-level locking when lock count exceeds threshold. Given:
- Row lock overhead: 64 bytes per lock
- Table lock overhead: 128 bytes per lock
- Lock memory limit: 100MB
- Average rows per table: 50,000
Calculate optimal escalation threshold to minimize memory usage.
Answer: Memory Usage Models:
Row-level: Memory = (number of locked rows) × 64 bytes
Table-level: Memory = (number of locked tables) × 128 bytes
Break-even point for a single table: row memory = table memory → n × 64 = 128 → n = 2 rows
But considering lock memory limit:
- Maximum row locks within the budget: 100MB / 64B ≈ 1.64M locks
- At 50K rows per table, that budget covers about 32 fully-locked tables
- Optimal threshold: 50,000 / 32 ≈ 1,563 rows per table
Recommendation: Set escalation threshold at 1,500 rows to stay within memory limits.
17. Optimistic Concurrency Control Performance Model
Question: System uses OCC with validation phase. Transaction conflict probability follows Poisson distribution. Given:
- Average active transactions: λ = 20
- Transaction duration: μ = 100ms
- Validation time: 5ms per transaction
Calculate expected abort rate and effective throughput.
Answer: Conflict Analysis:
- System load: ρ = λ × μ = 20 × 0.1 = 2.0
- Conflict probability per transaction: P(conflict) = 1 − e^(−ρ) = 1 − e^(−2) ≈ 0.865
Abort Rate Calculation: With restart overhead, the abort rate stabilizes at Abort Rate = P(conflict) × (1 − restart efficiency) ≈ 0.865 × 0.8 ≈ 69.2% (assuming a restart efficiency of 0.2)
Effective Throughput:
- Theoretical throughput: 1/0.105s ≈ 9.52 tps per transaction slot (100ms execution + 5ms validation)
- With aborts: 9.52 × (1 − 0.692) ≈ 2.93 transactions/second per slot
- Total system throughput: 20 × 2.93 ≈ 58.6 tps
Conclusion: System is overloaded under OCC; 2PL would perform better.
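The model chained end to end as a check (the 0.8 factor encodes the assumed restart efficiency of 0.2 from above):

```python
import math

active, dur_s, val_s = 20, 0.100, 0.005     # from the question
rho = active * dur_s                        # load metric: 2.0
p_conflict = 1 - math.exp(-rho)             # ~0.865
abort_rate = p_conflict * 0.8               # assumed restart efficiency 0.2
slot_tps = 1 / (dur_s + val_s)              # ~9.52 per transaction slot
throughput = active * slot_tps * (1 - abort_rate)
print(f"abort rate ~ {abort_rate:.1%}, throughput ~ {throughput:.1f} tps")
```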
18. Multi-Granularity Locking Protocol
Question: Hierarchical locking system: Database → Table → Page → Row. Transaction needs:
- Shared access to 1000 rows in Table A
- Exclusive access to 50 rows in Table B
- Shared access to entire Table C
Design optimal locking strategy and calculate lock overhead.
Answer: Optimal Lock Strategy:
Table A (1000 rows shared):
- Acquire IS (Intent Shared) lock on the Database
- Acquire IS lock on Table A
- Acquire S locks on the affected pages (assume 10 rows/page = 100 page locks)
- Total: 1 + 1 + 100 = 102 locks
Table B (50 rows exclusive):
- Upgrade the Database lock from IS to IX (Intent Exclusive), since exclusive locks will be taken below it
- Acquire IX lock on Table B
- Acquire X locks on the affected pages (5 page locks)
- Total: 1 + 5 = 6 locks
Table C (entire table shared):
- Acquire S lock on Table C directly (no page or row locks needed)
- Total: 1 lock
Total Lock Overhead: 102 + 6 + 1 = 109 locks
Memory Usage: 109 × 64 bytes ≈ 7KB
Alternative (without multi-granularity): 1000 + 50 + 50,000 (every row of Table C) ≈ 51,050 row locks
Savings: ~99.8% reduction in lock overhead!