Page 8: Static Hashing

What is Hashing?

Hashing directly computes where to store/find records using a mathematical function, eliminating the need for index structures.

Basic Concept

Hash Function: h(key) = bucket address

Example:

h(dept_name) = sum of character codes % 8

"Physics" → P(16) + h(8) + y(25) + s(19) + i(9) + c(3) + s(19) = 99
99 % 8 = 3 → Store in bucket 3

"Math" → M(13) + a(1) + t(20) + h(8) = 42  
42 % 8 = 2 → Store in bucket 2

Hash Operations

Insertion:

Compute h(search_key)
Store record in that bucket

Search:

Compute h(search_key)
Search only that bucket

Deletion:

Compute h(search_key)
Find and remove record from bucket

Hash Function Requirements

Good Hash Function Properties:

Uniform Distribution: Spreads keys evenly across buckets
Random Distribution: No correlation with external patterns

Bad Hash Function Example:

Map department names by first letter:
A-departments → Bucket 1
B-departments → Bucket 2
...

Problem: More departments start with 'C' than 'X'

Better Hash Function for Strings:

hash = 0
for each character c in string:
    hash = hash * 31 + ascii_value(c)
return hash % number_of_buckets

Example: Instructor File Hashing

Data:

Record 1: Physics → h("Physics") = 3 → Bucket 3
Record 2: Math → h("Math") = 2 → Bucket 2  
Record 3: Chemistry → h("Chemistry") = 1 → Bucket 1
Record 4: Physics → h("Physics") = 3 → Bucket 3 (collision!)

Result:

Bucket 0: [empty]
Bucket 1: [Chemistry record]
Bucket 2: [Math record]  
Bucket 3: [Physics record 1][Physics record 4]

Bucket Overflow Problem

Causes of Overflow:

1. Insufficient Buckets

Too few buckets for number of records
Example: 100 records, 5 buckets → Average 20 records per bucket

2. Skew

Uneven distribution due to:
- Multiple records with same key
- Poor hash function

Example of Skew:

University with many Computer Science instructors:
Bucket 0: [empty]
Bucket 1: [Comp. Sci.][Comp. Sci.][Comp. Sci.][Comp. Sci.]...overflow!
Bucket 2: [Physics]
Bucket 3: [Math]

Handling Overflow: Overflow Chaining

Solution: Chain overflow buckets together

Example:

Primary Bucket 1: [Record A][Record B] → Overflow Bucket 1a
Overflow Bucket 1a: [Record C][Record D] → Overflow Bucket 1b  
Overflow Bucket 1b: [Record E] → NULL

Search with Overflow:

Hash to primary bucket
Search primary bucket
Follow chain through all overflow buckets

Preventing Overflow

Formula: Number of buckets = (nr/fr) × (1 + d)

nr = number of records
fr = records per bucket
d = fudge factor (usually 0.2)

Example:

1000 records, 10 records per bucket
Buckets needed = (1000/10) × (1 + 0.2) = 100 × 1.2 = 120 buckets
20% waste but much fewer overflows

Hash Indices

Hash functions can also organize indices (not just files).

Example: Hash index on instructor ID

ID 12345 → h(12345) = 2 → Bucket 2: [12345 → pointer to record]
ID 67890 → h(67890) = 5 → Bucket 5: [67890 → pointer to record]

Benefits:

Very fast exact-match queries: O(1) average time
No need to traverse tree structures

Limitations:

Poor for range queries
Performance degrades with poor hash function