Problem with Static Hashing
Static hashing uses a fixed number of buckets, causing problems as databases grow:
Problems:
- Too few buckets initially → many overflow buckets → poor performance
- Too many buckets initially → wasted space
- Fixed structure → cannot adapt to growth
Solution: Dynamic Hashing
Adjust the hash structure as the database grows or shrinks.
Extendable Hashing Concept
Key Idea: Use a variable number of bits from the hash value
Example Hash Values (4-bit for simplicity):
"Physics" → hash = 0011
"Math" → hash = 1001
"Chemistry" → hash = 0101
"Biology" → hash = 1100
Dynamic Bit Usage:
- Initially use 1 bit (0 or 1)
- As needed, increase to 2 bits (00, 01, 10, 11)
- Then 3 bits, 4 bits, etc.
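Selecting the prefix is just a bit shift. Below is a minimal Python sketch (the function name prefix_bits and the 4-bit width are illustrative choices, not from any library) showing the 1-bit and 2-bit prefixes of the example hash values above:

```python
def prefix_bits(hash_value: int, i: int, b: int = 4) -> int:
    """Return the first i bits of a b-bit hash value."""
    return hash_value >> (b - i)

# The 4-bit example hashes from above:
hashes = {"Physics": 0b0011, "Math": 0b1001, "Chemistry": 0b0101, "Biology": 0b1100}

for key, h in hashes.items():
    print(f"{key:9s} {h:04b}  1-bit prefix: {prefix_bits(h, 1):01b}  "
          f"2-bit prefix: {prefix_bits(h, 2):02b}")
```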
Data Structure Components
1. Bucket Address Table
- Array indexed by the first i bits of the hash value
- i = global depth (starts small, grows as needed)
2. Buckets
- Actual storage containers for records
- Each has a local depth i_j (the number of hash bits needed to distinguish the records currently in that bucket)
3. Hash Function
- Generates uniform b-bit values (usually b = 32)
- The first i bits are used for the bucket address table lookup
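One possible way to model these three components in Python, assuming the 4-bit example hash width (real systems typically use b = 32); the class and field names here are illustrative, not a standard API:

```python
from dataclasses import dataclass, field

@dataclass
class Bucket:
    local_depth: int                 # i_j: bits needed to separate this bucket's records
    records: list = field(default_factory=list)

class ExtendableHashDirectory:
    def __init__(self, b: int = 4):
        self.b = b                   # width of hash values (real systems: b = 32)
        self.global_depth = 1        # i: bits currently used to index the table
        # Bucket address table: 2**i entries, each pointing to a bucket.
        self.table = [Bucket(local_depth=1), Bucket(local_depth=1)]

    def bucket_for(self, hash_value: int) -> Bucket:
        # The first i bits of the hash value index the bucket address table.
        return self.table[hash_value >> (self.b - self.global_depth)]
```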
Example: Building Extendable Hash
Initial State (i=1, using 1 bit):
Global depth i = 1
Bucket Address Table:
0 → Bucket A (local depth = 1)
1 → Bucket B (local depth = 1)
Buckets:
Bucket A: [empty] (for hash values starting with 0)
Bucket B: [empty] (for hash values starting with 1)
Step 1: Insert "Physics" (hash = 0011, starts with 0)
Bucket A: [Physics]
Bucket B: [empty]
Step 2: Insert "Math" (hash = 1001, starts with 1)
Bucket A: [Physics]
Bucket B: [Math]
Step 3: Insert "Chemistry" (hash = 0101, starts with 0)
Bucket A: [Physics][Chemistry]
Bucket B: [Math]
Step 4: Insert "Biology" (hash = 1100, starts with 1)
Bucket A: [Physics][Chemistry]
Bucket B: [Math][Biology]
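With i = 1, routing amounts to checking a single leading bit. A short Python trace of steps 1–4 (using the 4-bit hash values listed above) reproduces this final state:

```python
hashes = {"Physics": 0b0011, "Math": 0b1001, "Chemistry": 0b0101, "Biology": 0b1100}

buckets = {0: [], 1: []}           # 0 -> Bucket A, 1 -> Bucket B
for key, h in hashes.items():
    buckets[h >> 3].append(key)    # first of the 4 bits decides the bucket

print("Bucket A:", buckets[0])     # ['Physics', 'Chemistry']
print("Bucket B:", buckets[1])     # ['Math', 'Biology']
```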
Bucket Split Example
Assume bucket size = 2, and we insert "History" (hash = 0110)
Bucket A is full! Need to split.
Split Process:
- Increase global depth: i = 1 → i = 2
- Double the bucket address table size
- Split Bucket A using 2 bits
Before Split (i=1):
0 → Bucket A: [Physics(0011)][Chemistry(0101)]
1 → Bucket B: [Math(1001)][Biology(1100)]
After Split (i=2):
Global depth i = 2
00 → Bucket A: [Physics(0011)] (local depth = 2)
01 → Bucket C: [Chemistry(0101)][History(0110)] (local depth = 2)
10 → Bucket B: [Math(1001)][Biology(1100)] (local depth = 1)
11 → Bucket B (same bucket as entry 10)
Notice: Table entries 10 and 11 both point to the same Bucket B because its local depth is still 1: one leading bit is enough to place Math and Biology, so that bucket did not need to split.
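The same split can be traced in plain Python. In this sketch records are simplified to (key, hash) pairs and all variable names are illustrative; it doubles the directory, redistributes Bucket A's contents on 2 bits, and leaves entries 10 and 11 sharing Bucket B:

```python
B = 4                                       # hash width in bits

def top_bits(h, i):
    return h >> (B - i)                     # first i bits of a B-bit value

# State before the split (i = 1): table entry 0 -> Bucket A, entry 1 -> Bucket B.
bucket_a = [("Physics", 0b0011), ("Chemistry", 0b0101)]
bucket_b = [("Math", 0b1001), ("Biology", 0b1100)]
table = [bucket_a, bucket_b]
global_depth = 1

# Inserting History (0110) targets bucket_a, which is full (size 2) -> split.
global_depth += 1                           # i: 1 -> 2
table = [table[j >> 1] for j in range(2 ** global_depth)]   # double the table

# Redistribute Bucket A's records (plus the new one) using 2 bits.
pending = bucket_a + [("History", 0b0110)]
new_00, new_01 = [], []
for key, h in pending:
    (new_00 if top_bits(h, 2) == 0b00 else new_01).append((key, h))
table[0b00], table[0b01] = new_00, new_01   # entries 10 and 11 still share Bucket B

print(table[0b00])                 # [('Physics', 3)]          (3 == 0b0011)
print(table[0b01])                 # [('Chemistry', 5), ('History', 6)]
print(table[0b10] is table[0b11])  # True: both entries point to the same Bucket B
```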
Search Operation
To find "Chemistry" (hash = 0101):
- Take the first i = 2 bits: 01
- Look up bucket address table[01] → Bucket C
- Search Bucket C linearly
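Translated into code, the lookup is a prefix shift followed by a linear scan. The sketch below hard-codes the post-split directory from the example (indices 00–11); the names search, table, and the bucket variables are illustrative:

```python
B, I = 4, 2                                    # hash width, global depth after the split

bucket_a = [("Physics", 0b0011)]
bucket_b = [("Math", 0b1001), ("Biology", 0b1100)]
bucket_c = [("Chemistry", 0b0101), ("History", 0b0110)]
table = [bucket_a, bucket_c, bucket_b, bucket_b]   # indices 00, 01, 10, 11

def search(key, hash_value):
    bucket = table[hash_value >> (B - I)]      # steps 1-2: first i bits -> bucket
    for k, h in bucket:                        # step 3: linear scan of the bucket
        if k == key:
            return (k, h)
    return None

print(search("Chemistry", 0b0101))             # ('Chemistry', 5)
```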
Insertion Algorithm
To insert a record with key K:
- Compute hash h(K)
- Take the first i bits → bucket address j
- If bucket j has space → insert the record
- If bucket j is full:
  - If i = i_j (global depth = local depth):
    - Double the bucket address table (i = i + 1)
    - Split the bucket on the extra bit; both halves get local depth i_j + 1
  - If i > i_j:
    - Split the bucket without doubling the table
    - Increase the local depth: i_j = i_j + 1
  - In either case, redistribute the old bucket's records between the two halves and retry the insertion (a further split may be needed if every record lands in the same half)
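The whole procedure fits in a short Python class. This is a sketch of the algorithm above under simplifying assumptions, not a production structure: records are bare (key, hash) pairs, b and the bucket capacity are kept tiny so splits happen quickly, and all names (ExtendableHash, insert, _index) are illustrative. Running it on the worked example reproduces the directory shown earlier:

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth    # i_j
        self.records = []                 # list of (key, hash_value)

class ExtendableHash:
    def __init__(self, b=4, capacity=2):
        self.b = b                        # bits produced by the hash function
        self.capacity = capacity          # max records per bucket
        self.global_depth = 1             # i
        self.table = [Bucket(1), Bucket(1)]   # bucket address table (2**i entries)

    def _index(self, h, depth):
        return h >> (self.b - depth)      # first `depth` bits of h

    def insert(self, key, h):
        bucket = self.table[self._index(h, self.global_depth)]
        if len(bucket.records) < self.capacity:
            bucket.records.append((key, h))       # bucket has space
            return
        # Bucket is full -> split it.
        if bucket.local_depth == self.global_depth:
            # Case i = i_j: double the table; each new entry points to the
            # bucket its old (half-sized) index pointed to, then i grows by 1.
            self.table = [self.table[j >> 1] for j in range(2 * len(self.table))]
            self.global_depth += 1
        # Case i > i_j (always true after doubling): split on one more bit.
        new_depth = bucket.local_depth + 1
        zero, one = Bucket(new_depth), Bucket(new_depth)
        for k, hv in bucket.records:
            target = one if self._index(hv, new_depth) & 1 else zero
            target.records.append((k, hv))
        # Re-point every table entry that referenced the old bucket.
        for j in range(len(self.table)):
            if self.table[j] is bucket:
                bit = (j >> (self.global_depth - new_depth)) & 1
                self.table[j] = one if bit else zero
        self.insert(key, h)               # retry; may split again if still full

# Re-running the worked example; the 4-bit hash values are supplied directly.
eh = ExtendableHash()
for key, h in [("Physics", 0b0011), ("Math", 0b1001), ("Chemistry", 0b0101),
               ("Biology", 0b1100), ("History", 0b0110)]:
    eh.insert(key, h)

print(eh.global_depth)                            # 2
print([k for k, _ in eh.table[0b01].records])     # ['Chemistry', 'History']
print(eh.table[0b10] is eh.table[0b11])           # True: entries share Bucket B
```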
Advantages of Extendable Hashing
Compared to Static Hashing:
- ✅ Adapts to database growth
- ✅ No performance degradation as the file grows
- ✅ Minimal space overhead
- ✅ No periodic reorganization needed
Minor Disadvantages:
- Slightly more complex implementation
- Extra indirection through the bucket address table
- But the indirection cost is negligible compared to disk I/O
When to Use Each Method
Static Hashing: Good when database size is predictable and stable
Extendable Hashing: Best for growing databases with unpredictable sizes
B+ Trees: Best overall choice for most applications due to:
- Excellent range query support
- Predictable performance
- Good for both growth and mixed workloads
Summary and Key Takeaways
File Organization Methods Comparison
| Method | Best For | Advantages | Disadvantages |
|---|---|---|---|
| Heap | Insertions | Simple, fast inserts | Slow searches |
| Sequential | Range queries | Fast range scans, ordered | Slow insertions |
| Hash | Exact matches | Very fast lookups | Poor range queries |
| B+ Tree | Mixed workloads | Balanced performance | More complex |
Index Types Summary
| Index Type | Use Case | Performance |
|---|---|---|
| Dense | Small tables, frequent access | Fast but more storage |
| Sparse | Large tables, clustering index | Less storage, still fast |
| Multilevel | Very large indices | Manageable memory usage |
Hash Methods Comparison
| Method | Adaptability | Complexity | Performance |
|---|---|---|---|
| Static | Fixed size | Simple | Degrades with growth |
| Extendable | Dynamic | Moderate | Consistent |
Choosing the Right Approach
Consider these factors:
- Query patterns: range queries → B+ trees; exact matches → hashing
- Data growth: predictable → static hashing; unpredictable → dynamic hashing
- Update frequency: many updates → consider overflow handling
- Storage constraints: limited space → sparse indices; plenty of space → dense indices
Most Common Choice: B+ trees offer the best balance for most database applications, which is why they're the default in most commercial database systems.