1. What is a Distributed Database?
A distributed database is a database stored across multiple sites (computers) that are connected through a network.
Each site:
- Has its own DBMS
- Can work independently
- Can store part of the data or full copies
Example:
A bank has branches in Mumbai, Delhi, and Chennai.
Each branch stores account information of its local customers, but can also access data from other cities when needed.
Difference from parallel databases:
- Parallel DB = tightly coupled processors, one database
- Distributed DB = loosely coupled sites, multiple DBMSs
2. Types of Distributed Databases
a. Homogeneous Distributed DB
- All sites run the same DBMS
- All sites agree to cooperate
- Same schema everywhere
Example:
All branches of a bank use MySQL with the same schema.
b. Heterogeneous Distributed DB
- Sites use different DBMSs and possibly different schemas
- Harder to integrate
Example:
One site uses PostgreSQL, another uses Oracle.
3. Distributed Data Storage
A relation can be stored at multiple sites using:
- Replication
- Fragmentation
- A combination of both
3.1 Replication
The relation is stored as copies at multiple sites.
Advantages
- High availability: if one site fails, another still has the data.
- Faster read performance: read queries can be processed at multiple sites.
- Less data transfer: data is usually available locally.
Disadvantages
- Update overhead: all copies must be updated.
- Complex concurrency control: all replicas must be kept consistent.
Example:
Account details are copied at 3 branches.
If one branch goes offline, the data is still available.
3.2 Fragmentation
A relation is split into parts, and each part is stored at a different site.
Two types:
a. Horizontal Fragmentation
Split rows based on a condition.
Example:
account table split by branch:
- account1 = accounts of the Mumbai branch
- account2 = accounts of the Delhi branch
To reconstruct the original relation:
account = account1 ∪ account2
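A minimal Python sketch of the idea, treating each fragment as a list of dict rows (the branch names follow the example above; the account numbers and balances are invented for illustration):

    # Horizontal fragmentation: select rows by a predicate, one fragment per site.
    account = [
        {"account_number": 10, "branch": "Mumbai", "balance": 5000},
        {"account_number": 20, "branch": "Delhi",  "balance": 7500},
        {"account_number": 30, "branch": "Mumbai", "balance": 1200},
    ]

    account1 = [r for r in account if r["branch"] == "Mumbai"]  # stored at Mumbai
    account2 = [r for r in account if r["branch"] == "Delhi"]   # stored at Delhi

    # Reconstruction: account = account1 ∪ account2
    reconstructed = account1 + account2
    assert sorted(r["account_number"] for r in reconstructed) == \
           sorted(r["account_number"] for r in account)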
b. Vertical Fragmentation
Split columns, but keep a key in each fragment for joining later.
Example:
employee_info split into:
- employee_public_info (id, name, designation)
- employee_private_info (id, salary)
Reconstruction:
employee_info = employee_public_info ⋈ employee_private_info
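The same style of sketch for vertical fragmentation: project the columns, keep the key (id) in both fragments, and rebuild with a join on that key (the sample rows are invented):

    # Vertical fragmentation: each fragment keeps the key column 'id'.
    employee_info = [
        {"id": 1, "name": "Asha", "designation": "Clerk",   "salary": 30000},
        {"id": 2, "name": "Ravi", "designation": "Manager", "salary": 80000},
    ]

    employee_public_info = [
        {"id": r["id"], "name": r["name"], "designation": r["designation"]}
        for r in employee_info
    ]
    employee_private_info = [{"id": r["id"], "salary": r["salary"]}
                             for r in employee_info]

    # Reconstruction: natural join on the shared key 'id'.
    salary_by_id = {r["id"]: r["salary"] for r in employee_private_info}
    reconstructed = [dict(r, salary=salary_by_id[r["id"]])
                     for r in employee_public_info]
    assert reconstructed == employee_info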
3.3 Transparency
The user should not need to know:
- Where data is stored
- How many fragments or replicas exist
- How data is accessed
Types:
- Fragmentation transparency
- Replication transparency
- Location transparency
Example:
User just writes:
SELECT * FROM account WHERE account_number = 20;
The system finds the right site automatically.
4. Distributed Transactions
A transaction that accesses data at multiple sites.
Types:
- Local transactions: run at a single site
- Global transactions: run at multiple sites
To maintain ACID across sites, all sites must coordinate.
4.1 Transaction Manager and Coordinator
Each site has:
- Transaction Manager (TM): handles local execution, logging, and concurrency control.
- Transaction Coordinator (TC): starts transactions, sends subtransactions to the sites involved, and coordinates commit or abort.
5. Two-Phase Commit Protocol (2PC)
Guarantees:
A transaction either commits at all sites, or aborts at all sites.
Phase 1: Prepare Phase
The coordinator asks every site: "Are you ready to commit?"
Each site replies:
- YES (ready T) → writes the log record and prepares to commit
- NO (abort T) → the transaction will be aborted
Phase 2: Commit Phase
- If all sites say YES → commit everywhere
- If any site says NO → abort everywhere
Drawback:
If the coordinator fails after sites have voted YES, those sites cannot decide on their own and may wait indefinitely → the blocking problem.
Example:
Booking a flight + hotel:
If one site fails before commit, both must rollback.
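A single-process Python sketch of the 2PC decision rule (real systems exchange these messages over a network and force the ready/commit log records to disk; here each participant is just an object, and the site names come from the bank example):

    class Participant:
        def __init__(self, name, can_commit=True):
            self.name = name
            self.can_commit = can_commit
            self.state = "active"

        def prepare(self):              # Phase 1: vote on the commit request
            if self.can_commit:
                self.state = "ready"    # would write (ready T) to the log
                return "YES"
            self.state = "aborted"      # would write (abort T) to the log
            return "NO"

        def commit(self):
            self.state = "committed"

        def abort(self):
            self.state = "aborted"

    def two_phase_commit(participants):
        votes = [p.prepare() for p in participants]   # Phase 1: prepare
        if all(v == "YES" for v in votes):            # Phase 2: decide
            for p in participants:
                p.commit()
            return "committed"
        for p in participants:
            p.abort()
        return "aborted"

    print(two_phase_commit([Participant("Mumbai"), Participant("Delhi"),
                            Participant("Chennai")]))                    # committed
    print(two_phase_commit([Participant("Mumbai"), Participant("Delhi"),
                            Participant("Chennai", can_commit=False)]))  # aborted

Note how a single NO vote in the second run forces every site to abort: exactly the all-or-nothing guarantee stated above.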
6. Three-Phase Commit (3PC)
Adds an extra (pre-commit) phase so sites are not left blocked if the coordinator fails.
More complex and slower, so it is rarely used in practice.
7. Concurrency Control in Distributed Systems
Goal: ensure correctness when multiple transactions run at different sites.
Main approaches:
7.1 Locking Methods
a. Single Lock Manager
One site controls all locks.
Simple, but it is a bottleneck and a single point of failure.
b. Distributed Lock Manager
Each site manages locks for its own data.
Requires global deadlock detection.
c. Primary Copy
Only one designated site (the primary copy) handles locking for a replicated item.
Example:
For account A, Mumbai branch is primary.
All locks for A go to Mumbai.
d. Majority Protocol
To read or write an item, a transaction must lock a majority of its replicas.
Example:
If an object has 3 replicas → locks on 2 of them are needed.
e. Biased Protocol
Shared (read) locks need a lock on only one replica; exclusive (write) locks require locking all replicas.
f. Quorum Consensus
Assigns a weight to each site and picks read/write quorums that always overlap (read quorum + write quorum > total weight, and twice the write quorum > total weight), giving flexible read/write balancing.
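A small Python sketch of the quorum arithmetic (the site weights and quorum values are invented; the two asserts are the standard overlap conditions, and equal weights of 1 with quorums of 2 out of 3 reduce to the majority protocol):

    weights = {"Mumbai": 2, "Delhi": 1, "Chennai": 1}
    S = sum(weights.values())            # total weight: 4
    Qr, Qw = 2, 3                        # chosen read and write quorums

    assert Qr + Qw > S                   # every read overlaps the latest write
    assert 2 * Qw > S                    # two writes cannot run concurrently

    def has_quorum(sites_reached, quorum):
        """True if the reachable sites carry enough total weight."""
        return sum(weights[s] for s in sites_reached) >= quorum

    print(has_quorum({"Mumbai"}, Qr))            # True: weight 2 >= 2
    print(has_quorum({"Delhi", "Chennai"}, Qw))  # False: weight 2 < 3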
7.2 Timestamp Ordering
Each transaction gets a unique timestamp.
Operations are ordered by timestamps.
Timestamps can be generated with logical clocks (e.g., Lamport clocks) or with synchronized physical clocks.
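A minimal sketch of a Lamport logical clock, with the (counter, site_id) pair serving as the unique timestamp:

    class LamportClock:
        def __init__(self, site_id):
            self.counter = 0
            self.site_id = site_id

        def tick(self):
            """Local event (e.g., a transaction starts): advance the clock."""
            self.counter += 1
            return (self.counter, self.site_id)   # (counter, site) is unique

        def observe(self, received_counter):
            """On receiving a message, move past the sender's clock."""
            self.counter = max(self.counter, received_counter) + 1

    mumbai, delhi = LamportClock("Mumbai"), LamportClock("Delhi")
    t1 = mumbai.tick()        # (1, 'Mumbai')
    delhi.observe(t1[0])      # Delhi's counter jumps to 2
    t2 = delhi.tick()         # (3, 'Delhi'), ordered after t1
    print(sorted([t2, t1]))   # tuple comparison gives the global order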
7.3 Replica Consistency (Weak forms)
Master–slave replication
- Updates at the primary
- Reads from replicas
- Replicas updated periodically
Multimaster (update-anywhere)
- Writes can happen at any site
- Conflict resolution needed
- Harder to maintain consistency
7.4 Deadlock Handling
Each site keeps a local wait-for graph.
A global deadlock may not show in local graphs.
Coordinator collects all graphs and checks for cycles.
Problems:
- False deadlocks: delayed messages may make the coordinator see a cycle that no longer exists at any site
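A Python sketch of the coordinator's job: union the local wait-for graphs and run a cycle check (the transaction names are invented; an edge T1 -> T2 means "T1 waits for a lock held by T2"). Neither local graph below has a cycle, but the global graph does:

    local_graph_site1 = {"T1": {"T2"}}   # at site 1, T1 waits for T2
    local_graph_site2 = {"T2": {"T1"}}   # at site 2, T2 waits for T1

    def union_graphs(*graphs):
        merged = {}
        for g in graphs:
            for node, successors in g.items():
                merged.setdefault(node, set()).update(successors)
        return merged

    def has_cycle(graph):
        """Depth-first search; a back edge to the recursion stack = a cycle."""
        visited, on_stack = set(), set()
        def dfs(node):
            visited.add(node)
            on_stack.add(node)
            for nxt in graph.get(node, ()):
                if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                    return True
            on_stack.discard(node)
            return False
        return any(dfs(n) for n in graph if n not in visited)

    print(has_cycle(local_graph_site1))                                   # False
    print(has_cycle(union_graphs(local_graph_site1, local_graph_site2)))  # True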
8. Availability in Distributed Databases
Goal: database should keep working despite failures.
Failures may include:
- Site failure
- Link failure
- Network partition
- Lost messages
System must:
- Detect the failure
- Reconfigure
- Recover
8.1 Majority-Based Approach for Availability
Reads lock a majority of replicas and use the copy with the highest version number.
Writes update a majority of replicas and increment the version number stored with each copy.
After a failure, recovering sites catch up automatically: a stale copy is corrected the next time a majority read or write touches it.
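A toy Python sketch of majority reads and writes with version numbers (the replica values are invented; with n replicas, a majority is n // 2 + 1 sites):

    replicas = {
        "Mumbai":  {"version": 3, "value": 500},
        "Delhi":   {"version": 3, "value": 500},
        "Chennai": {"version": 2, "value": 450},   # missed the last write
    }

    def majority_read(replicas):
        n = len(replicas)
        # Any majority works; here we take the first n // 2 + 1 sites reached.
        reached = dict(list(replicas.items())[: n // 2 + 1])
        newest = max(reached.values(), key=lambda r: r["version"])
        return newest["value"], newest["version"]

    def majority_write(replicas, value):
        n = len(replicas)
        _, version = majority_read(replicas)        # find the current version
        for site in list(replicas)[: n // 2 + 1]:   # update a majority
            replicas[site] = {"version": version + 1, "value": value}

    print(majority_read(replicas))   # (500, 3) even though Chennai is stale
    majority_write(replicas, 650)
    print(majority_read(replicas))   # (650, 4)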
8.2 Read-One Write-All Available
Reads use any single replica.
Writes update all replicas that are currently available.
High availability, but a risk of inconsistency during network partitions.
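For contrast, a toy sketch of read-one/write-all-available: writes simply skip unreachable replicas, which is what lets a stale copy survive a partition (the 'up' flags and values are invented):

    replicas = {
        "Mumbai":  {"up": True,  "value": 500},
        "Delhi":   {"up": True,  "value": 500},
        "Chennai": {"up": False, "value": 500},   # currently unreachable
    }

    def read_one(replicas):
        # Any live replica will do; here we take the first one found.
        return next(r["value"] for r in replicas.values() if r["up"])

    def write_all_available(replicas, value):
        for r in replicas.values():
            if r["up"]:
                r["value"] = value    # Chennai is skipped and becomes stale

    print(read_one(replicas))         # 500, served by the first live replica
    write_all_available(replicas, 650)
    replicas["Chennai"]["up"] = True  # the partition heals
    print({s: r["value"] for s, r in replicas.items()})
    # {'Mumbai': 650, 'Delhi': 650, 'Chennai': 500}: a stale replica survives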
8.3 Site Reintegration
When a site recovers:
- Must update the catalog
- Must get the latest replica values
- May need to replay missed updates
9. Coordinator Selection
If a coordinator fails, a new one must be chosen.
Bully Algorithm
- The highest-numbered active site becomes coordinator
- A recovering site with a higher ID can "bully" its way to leadership
Problem:
Each partition may elect its own coordinator unless majority checking is enforced.
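A sketch of the core rule only: among the sites that respond, the highest ID wins. (A real bully election runs through election/answer/coordinator message rounds rather than this global view; the IDs and alive sets are invented.)

    def elect_coordinator(site_ids, alive):
        """Return the highest-numbered site that is currently reachable."""
        candidates = [s for s in site_ids if s in alive]
        return max(candidates) if candidates else None

    sites = [1, 2, 3, 4, 5]
    print(elect_coordinator(sites, alive={1, 2, 3}))     # 3: sites 4 and 5 are down
    print(elect_coordinator(sites, alive={1, 2, 3, 5}))  # 5 recovers and takes over

    # During a partition, each side elects its own coordinator (the problem above):
    print(elect_coordinator(sites, alive={1, 2}),
          elect_coordinator(sites, alive={4, 5}))        # 2 and 5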
10. CAP Theorem (Consistency, Availability, Partition Tolerance)
A distributed system can guarantee only two of the following three properties:
- Consistency
- Availability
- Partition tolerance
In practice, partition tolerance is unavoidable → the real choice is between:
- Consistency (banks, finance)
- Availability (social networks, caches)
Example:
Social network prefers availability → shows stale data instead of failing.