1. Physical Data Independence - Why Crucial for Large Organizations?
Answer: Physical data independence allows changing storage methods without affecting applications.
Real Scenario: Amazon migrates from traditional hard drives to SSDs for faster access. With physical data independence, all applications (website, mobile app, admin tools) continue working unchanged. Without it, they'd need to rewrite thousands of programs - costing millions and months of downtime.
2. Bank Database Crash During Money Transfer - ACID Properties
Answer:
-
Without ACID: ₹1000 debited from Account A, system crashes before crediting Account B. Money disappears forever.
-
With ACID:
-
Atomicity: Both debit and credit happen together or transaction rolls back
-
Consistency: Total money in system remains same
-
Isolation: Other transactions don't see partial state
-
Durability: Once completed, transfer survives system crashes
-
3. Procedural vs Declarative DML
Answer:
-
Procedural: "Go to student table, check each row, if age > 20 AND city = 'Delhi', add to result"
-
Declarative: "SELECT * FROM students WHERE age > 20 AND city = 'Delhi'"
Why Declarative Preferred: System optimizes the "how" part automatically. Developer focuses on "what" they want, not implementation details.
4. Hospital Management System - Three-Level Architecture
Answer:
-
Physical Level: Disk blocks, indexes, storage optimization
-
Logical Level: Tables (Patient, Doctor, Treatment), relationships, constraints
-
View Level:
-
PatientView: Own medical history only
-
DoctorView: Patients under their care
-
AdminView: Hospital statistics, all data
-
Scenario-Based Questions
1. E-commerce File System Problems vs DBMS Solutions
Answer:
Problem | File System Issue | DBMS Solution |
|---|---|---|
Redundancy | Customer name in orders.txt AND customers.txt | Single customer table referenced by orders |
Inconsistency | Customer updates name, only customers.txt updated | Automatic referential integrity |
Access Difficulty | Need custom program for "orders from Delhi customers" | Simple SQL query |
Security | Anyone with file access sees all data | Role-based access control |
Concurrent Access | Two users editing same file = data corruption | Transaction management |
2. Concurrent Purchase of Last Item
Answer: Problems Without Control:
-
Both customers see "1 item available"
-
Both click "buy" simultaneously
-
Both get confirmation, but only 1 item exists
-
Result: Overselling and angry customers
Solution: Use locking mechanism - first customer locks item record, second customer waits or gets "out of stock" message.
3. University Enrollment Constraint
Answer: File System Difficulty: Constraint must be coded in EVERY program (registration program, add/drop program, transfer program). Missing it in one place breaks the rule.
DBMS Easy: Add constraint once in database definition: CHECK (enrolled_courses <= 6). System automatically enforces it everywhere.
Technical Questions
1. Schema vs Instance - Social Media Example
Answer:
-
Schema: User table structure (ID, Name, Email, JoinDate)
-
Instance: Actual data ("user123, John Doe, john@email.com, 2024-01-15")
Adding Stories Feature:
-
Schema changes: Add Stories table, modify User table structure
-
Instance remains: Existing user data unchanged
-
Applications need updates to handle new schema
2. DDL vs DML Processing
Answer:
-
DDL Processing: Creates/modifies database structure, updates data dictionary (metadata)
-
DML Processing: Manipulates actual data, uses data dictionary to understand structure
-
Data Dictionary Role: Stores all metadata (table definitions, constraints, user permissions). DML queries check dictionary first to understand what data exists and how to access it.
3. Database Views - Convenience and Security
Answer: Same Logical Schema, Different Views:
-
StudentView:
SELECT student_id, name, grades FROM students WHERE student_id = current_user -
TeacherView:
SELECT student_id, name, grades FROM students WHERE course_id IN (teacher_courses) -
AdminView:
SELECT * FROM students, courses, grades(everything)
Benefits: Each user sees only relevant data, reducing complexity and ensuring security.
Application Questions
1. Movie Ticket Booking Transaction
Answer:
BEGIN TRANSACTION
1. Check seat availability
2. Reserve seat (lock it)
3. Process payment
4. Generate ticket
5. Update seat status to "sold"
COMMIT TRANSACTION
If Payment Fails: Entire transaction rolls back - seat becomes available again, no partial booking exists.
2. Atomicity in File-Based Systems
Answer: Why Impossible: File systems don't have built-in rollback mechanisms.
Example: Library book checkout:
-
Update books.txt (mark book as borrowed) ✓
-
Update members.txt (add book to member's list) ✗ (system crash)
Result: Book shows as borrowed but member has no record. No automatic way to undo step 1.
3. File System "Simplicity" Debate
Answer: File System Seems Simple Initially:
-
Direct file operations
-
No complex setup
Reality Check:
-
Maintenance Costs: Every new requirement needs custom programming
-
Data Consistency: Manual effort to maintain, error-prone
-
Scalability: Adding new data types requires redesigning everything
DBMS: Higher initial complexity, but exponentially lower long-term costs.
Advanced Thinking Questions
1. Space Mission DBMS - Modified ACID
Answer: Challenges: Communication delays mean immediate consistency impossible.
Modified ACID:
-
Atomicity: Local transactions must be atomic, but global synchronization delayed
-
Consistency: Eventual consistency instead of immediate
-
Isolation: Local isolation guaranteed, global conflicts resolved later
-
Durability: Multiple redundant copies across different locations
2. Three-Level Abstraction for Mobile + Web Apps
Answer:
-
Physical Level: Same database storage for both
-
Logical Level: Same tables and relationships
-
View Level:
-
Mobile app gets simplified views (less data, optimized for small screens)
-
Web app gets comprehensive views (detailed data, complex operations)
-
Both apps work independently without knowing about each other's existence.
Numerical and Probability-Based Questions
Database Performance and Statistics
1. Buffer Hit Ratio Calculation
Question: A database buffer has 1000 pages. In one hour, there are 50,000 page requests. 45,000 pages were found in buffer (hits), 5,000 required disk reads (misses). Calculate the buffer hit ratio and expected response time if buffer access = 0.1ms, disk access = 10ms.
Answer:
-
Hit Ratio = 45,000/50,000 = 90%
-
Expected Response Time = 0.9 × 0.1ms + 0.1 × 10ms = 0.09 + 1.0 = 1.09ms
2. Transaction Throughput Analysis
Question: A system processes transactions with the following distribution:
-
60% READ operations (avg time: 2ms)
-
30% UPDATE operations (avg time: 8ms)
-
10% COMPLEX queries (avg time: 50ms)
Calculate expected transaction time and maximum throughput per second.
Answer:
-
Expected Time = 0.6 × 2 + 0.3 × 8 + 0.1 × 50 = 1.2 + 2.4 + 5.0 = 8.6ms
-
Maximum Throughput = 1000ms / 8.6ms ≈ 116 transactions/second
3. Deadlock Probability
Question: In a system with 10 concurrent transactions, each needing 2 random resources from a pool of 5 resources. If resource holding time follows exponential distribution (λ = 0.1/sec), estimate the probability of deadlock.
Answer: Using birthday paradox approach:
-
Total resource pairs needed = 10 × 2 = 20
-
Available resource pairs = C(5,2) = 10
-
Probability of conflict ≈ 1 - e^(-20²/(2×10)) ≈ 1 - e^(-20) ≈ 100%
4. Index Efficiency Calculation
Question: A table has 1 million records. Without index, average search examines 500,000 records. With B+ tree index (fan-out = 100), calculate: a) Tree height b) Average disk reads for search c) Performance improvement ratio
Answer: a) Height = log₁₀₀(1,000,000) = 3 levels b) Disk reads = 3 (tree traversal) + 1 (data page) = 4 reads c) Improvement = 500,000 / 4 = 125,000× faster
5. Replication Consistency Model
Question: A distributed database has 5 replicas. Write operation succeeds if acknowledged by majority (≥3 replicas). If each replica has 95% availability, calculate: a) Probability of successful write b) Expected consistency delay if sync time ~ Normal(μ=10ms, σ=2ms)
Answer: a) P(≥3 available) = C(5,3)(0.95)³(0.05)² + C(5,4)(0.95)⁴(0.05)¹ + C(5,5)(0.95)⁵ ≈ 0.9988 b) Expected delay for 3rd fastest = μ + σ × Φ⁻¹(0.6) ≈ 10 + 2 × 0.25 = 10.5ms
6. Query Optimization Cost Model
Question: For joining two tables A (10,000 rows) and B (5,000 rows):
-
Nested loop: Cost = |A| × |B| = 50M operations
-
Hash join: Cost = 3(|A| + |B|) = 45K operations
-
Sort-merge: Cost = |A|log|A| + |B|log|B| + |A| + |B| ≈ 200K operations
If system can do 1M operations/second, calculate time for each method and savings.
Answer:
-
Nested Loop: 50M / 1M = 50 seconds
-
Hash Join: 45K / 1M = 0.045 seconds
-
Sort-Merge: 200K / 1M = 0.2 seconds
-
Best Savings: (50 - 0.045) / 50 = 99.91% improvement
7. Storage Utilization Probability
Question: Database pages are 8KB each. Record sizes follow normal distribution N(200 bytes, 50 bytes). Page has 100-byte header. Calculate: a) Expected records per page b) Probability of page overflow (>40 records)
Answer: a) Available space = 8192 - 100 = 8092 bytes Expected records = 8092 / 200 ≈ 40.46 records b) For 40 records: Total size ~ N(40×200, √40×50) = N(8000, 316) P(size > 8092) = P(Z > (8092-8000)/316) = P(Z > 0.29) ≈ 0.386
8. Backup Strategy Optimization
Question: Database generates 500MB/hour of transaction logs. Full backup takes 2 hours, incremental backup takes 20 minutes. System tolerance: max 1 hour data loss. Design optimal backup strategy minimizing total backup time per day.
Answer:
-
Data loss constraint: Need backup every hour
-
Strategy: 1 full backup/day + 23 incremental backups
-
Total time = 2 hours + 23 × (20/60) hours = 2 + 7.67 = 9.67 hours/day
-
Alternative strategies would require more time to meet the 1-hour loss constraint
9. Concurrency Control Analysis
Question: Two-phase locking system with 20 transactions, each holding locks for exponentially distributed time (mean = 100ms). Calculate expected number of transactions waiting and average wait time.
Answer: Using M/M/1 queueing model:
-
Arrival rate λ = 20 transactions
-
Service rate μ = 10 transactions/second
-
Utilization ρ = λ/μ = 2.0 (overloaded system)
-
Expected waiting = ρ²/(1-ρ) = 4/(-1) = undefined (system unstable)
-
Need to reduce load or increase service rate
10. Database Partitioning Efficiency
Question: Table with 10M records partitioned into 4 equal partitions. Query selectivity follows Zipf distribution (80% queries hit partition 1, 15% hit partition 2, 4% hit partition 3, 1% hit partition 4). Calculate load imbalance factor and suggest optimization.
Answer:
-
Load distribution: [80%, 15%, 4%, 1%]
-
Perfect balance: 25% each
-
Load imbalance = max(load)/avg(load) = 80%/25% = 3.2
-
Optimization: Redistribute data so hot partition 1 becomes multiple partitions, reducing maximum load closer to 25%