Page 9: Dynamic Hashing - Extendable Hashing

Problem with Static Hashing

Static hashing uses a fixed number of buckets, causing problems as databases grow:

Problems:

Too few buckets initially → Many overflows → Poor performance
Too many buckets initially → Wasted space
Fixed structure → Can't adapt to growth

Solution: Dynamic Hashing

Adjust hash structure as database grows or shrinks.

Extendable Hashing Concept

Key Idea: Use variable number of bits from hash value

Example Hash Values (4-bit for simplicity):

"Physics" → hash = 0011
"Math" → hash = 1001  
"Chemistry" → hash = 0101
"Biology" → hash = 1100

Dynamic Bit Usage:

Initially use 1 bit (0 or 1)
As needed, increase to 2 bits (00, 01, 10, 11)
Then 3 bits, 4 bits, etc.

Data Structure Components

1. Bucket Address Table

Array indexed by first i bits of hash value
i = global depth (starts small, grows as needed)

2. Buckets

Actual storage containers for records
Each has local depth ij (bits needed for records in this bucket)

3. Hash Function

Generates uniform b-bit values (usually b=32)
Use first i bits for bucket address table lookup

Example: Building Extendable Hash

Initial State (i=1, using 1 bit):

Global depth i = 1
Bucket Address Table:
0 → Bucket A (local depth = 1)
1 → Bucket B (local depth = 1)

Buckets:
Bucket A: [empty] (for hash values starting with 0)
Bucket B: [empty] (for hash values starting with 1)

Step 1: Insert "Physics" (hash = 0011, starts with 0)

Bucket A: [Physics]
Bucket B: [empty]

Step 2: Insert "Math" (hash = 1001, starts with 1)

Bucket A: [Physics]
Bucket B: [Math]

Step 3: Insert "Chemistry" (hash = 0101, starts with 0)

Bucket A: [Physics][Chemistry] 
Bucket B: [Math]

Step 4: Insert "Biology" (hash = 1100, starts with 1)

Bucket A: [Physics][Chemistry]
Bucket B: [Math][Biology]

Bucket Split Example

Assume bucket size = 2, and we insert "History" (hash = 0110)

Bucket A is full! Need to split.

Split Process:

Increase global depth: i = 1 → i = 2
Double bucket address table size
Split Bucket A using 2 bits

Before Split (i=1):

0 → Bucket A: [Physics(0011)][Chemistry(0101)]
1 → Bucket B: [Math(1001)][Biology(1100)]

After Split (i=2):

Global depth i = 2
00 → Bucket A: [Physics(0011)] (local depth = 2)  
01 → Bucket C: [Chemistry(0101)][History(0110)] (local depth = 2)
10 → Bucket B: [Math(1001)] (local depth = 1)
11 → Bucket B: [Biology(1100)] (local depth = 1)

Notice: Buckets 10 and 11 both point to Bucket B because Biology and Math only need 1 bit to distinguish.

Search Operation

To find "Chemistry" (hash = 0101):

Take first i=2 bits: 01
Look up bucket address table[01] → Bucket C
Search Bucket C linearly

Insertion Algorithm

To insert record with key K:

Compute hash h(K)
Take first i bits → bucket address j
If bucket j has space → insert record
If bucket j is full:
- If i = ij (global depth = local depth):
  - Double bucket address table (i = i + 1)
  - Split bucket using i bits
- If i > ij:
  - Split bucket without doubling table
  - Increase local depth: ij = ij + 1

Advantages of Extendable Hashing

Compared to Static Hashing:

✅ Adapts to database growth
✅ No performance degradation
✅ Minimal space overhead
✅ No periodic reorganization needed

Minor Disadvantages:

Slightly more complex implementation
Extra indirection through bucket address table
But indirection cost is negligible compared to disk I/O

When to Use Each Method

Static Hashing: Good when database size is predictable and stable

Extendable Hashing: Best for growing databases with unpredictable sizes

B+ Trees: Best overall choice for most applications due to:

Excellent range query support
Predictable performance
Good for both growth and mixed workloads

Summary and Key Takeaways

File Organization Methods Comparison

Method	Best For	Advantages	Disadvantages
Heap	Insertions	Simple, fast inserts	Slow searches
Sequential	Range queries	Fast ranges, ordered	Slow insertions
Hash	Exact matches	Very fast lookups	Poor ranges
B+ Tree	Mixed workload	Balanced performance	More complex

Index Types Summary

Index Type	Use Case	Performance
Dense	Small tables, frequent access	Fast but more storage
Sparse	Large tables, clustering index	Less storage, still fast
Multilevel	Very large indices	Manageable memory usage

Hash Methods Comparison

Method	Adaptability	Complexity	Performance
Static	Fixed size	Simple	Degrades with growth
Extendable	Dynamic	Moderate	Consistent

Choosing the Right Approach

Consider these factors:

Query patterns: Range queries → B+ trees, Exact matches → Hashing
Data growth: Predictable → Static, Unpredictable → Dynamic
Update frequency: Many updates → Consider overflow handling
Storage constraints: Limited space → Sparse indices, Plenty of space → Dense indices

Most Common Choice: B+ trees offer the best balance for most database applications, which is why they're the default in most commercial database systems.