Consistent Hashing: The Math Behind Load Balancing
Hey there!
Imagine: you have 1000 servers and millions of requests. How do you distribute load evenly so that adding a new server doesn’t require rehashing everything?
A plain hash(key) % len(servers) becomes a disaster when scaling: add one server and roughly 90% of your data has to move!
But there’s an elegant solution: Consistent Hashing. The mathematics that saves you from chaos in distributed systems.
Let's dive into the algorithm used by Amazon DynamoDB and Cassandra, and the closely related slot-based scheme in Redis Cluster.
1. The Problem with Regular Hashing
Classic Approach (and why it fails)
// Naive load balancer: hash the key and take it modulo the server count.
// (Imports needed: "fmt", "hash/fnv".)
func getServer(key string, servers []string) string {
    h := fnv.New32a()
    h.Write([]byte(key))
    return servers[h.Sum32()%uint32(len(servers))]
}

// Problem: add a server and almost every key maps somewhere else!
func main() {
    servers := []string{"server1", "server2", "server3"}
    key := "user123"

    // Before: user123 -> server2
    fmt.Println(getServer(key, servers)) // server2

    // After adding server4 the same key lands on a different server:
    servers = append(servers, "server4")
    fmt.Println(getServer(key, servers)) // server1 (!)
}
Result: when the number of servers changes, most keys end up on different servers!
Mathematics of the Problem
When the number of servers changes from n to n+1, the minimum possible data movement is:
- Fraction of keys that can stay on the same server: n/(n+1)
- Fraction of keys that must move: 1/(n+1)
For 10 servers: 90% can stay, only 10% need to move
For 100 servers: 99% can stay, only 1% needs to move
But modulo hashing is nowhere near this minimum: a key stays put only when hash % n == hash % (n+1), which holds for roughly 1/(n+1) of keys, so about n/(n+1) of all keys move. The sketch below demonstrates this.
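Here's a minimal sketch to check this yourself (the key count and server counts are arbitrary): hash a batch of keys with FNV-1a and count how many change buckets when going from 10 to 11 servers.

package main

import (
    "fmt"
    "hash/fnv"
)

func bucket(key string, n int) uint32 {
    h := fnv.New32a()
    h.Write([]byte(key))
    return h.Sum32() % uint32(n)
}

func main() {
    const keys = 100000
    moved := 0
    for i := 0; i < keys; i++ {
        k := fmt.Sprintf("key%d", i)
        if bucket(k, 10) != bucket(k, 11) {
            moved++
        }
    }
    // Expect roughly 10/11 ≈ 91% of keys to move, versus the 1/11 ≈ 9% minimum.
    fmt.Printf("moved: %.1f%%\n", float64(moved)/keys*100)
}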
2. Consistent Hashing: The Concept
Core Idea
Instead of direct mapping key -> server, we create a hash ring:
- Servers are placed on the ring by their hash values
- Keys are also hashed and placed on the ring
- Each key is served by the first server clockwise
Ring positions in clockwise order (toy hash values):

  key1(50)     -----> first server clockwise: server1(100)
  server1(100)
  key2(150)    -----> first server clockwise: server3(200)
  server3(200)
  server2(300)
  (past 300 the ring wraps back around to 0)
Mathematical Foundation
The hash ring is a circle whose positions range over [0, 2^32 - 1].

For a key k and a server set S:

    server(k) = the s ∈ S with the smallest hash(s) such that hash(s) ≥ hash(k)

If no server satisfies hash(s) ≥ hash(k), the key wraps around to the start of the ring:

    server(k) = the s ∈ S with the smallest hash(s)
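Worked example with the toy values from the diagram above: hash(key2) = 150 and the server hashes are {100, 200, 300}; the smallest server hash that is ≥ 150 is 200, so key2 is served by server3. A key hashed to 350 would find no server hash ≥ 350 and wrap around to server1 at position 100.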
3. Go Implementation
Basic Structure
type ConsistentHash struct {
    ring         map[uint32]string // hash -> server
    sortedHashes []uint32          // sorted virtual-node hashes
    replicas     int               // virtual nodes per server
}

func New(replicas int) *ConsistentHash {
    return &ConsistentHash{
        ring:     make(map[uint32]string),
        replicas: replicas,
    }
}
Adding Servers
// Add places each server on the ring `replicas` times (virtual nodes)
// and keeps the hash list sorted for binary search.
// (Imports needed: "fmt", "hash/fnv", "sort".)
func (ch *ConsistentHash) Add(servers ...string) {
    for _, server := range servers {
        for i := 0; i < ch.replicas; i++ {
            hash := ch.hashKey(fmt.Sprintf("%s:%d", server, i))
            ch.ring[hash] = server
            ch.sortedHashes = append(ch.sortedHashes, hash)
        }
    }
    sort.Slice(ch.sortedHashes, func(i, j int) bool {
        return ch.sortedHashes[i] < ch.sortedHashes[j]
    })
}

func (ch *ConsistentHash) hashKey(key string) uint32 {
    h := fnv.New32a()
    h.Write([]byte(key))
    return h.Sum32()
}
Finding Server
func (ch *ConsistentHash) Get(key string) string {
    if len(ch.ring) == 0 {
        return ""
    }
    hash := ch.hashKey(key)
    // Binary search for the first virtual node with hash >= the key's hash
    idx := sort.Search(len(ch.sortedHashes), func(i int) bool {
        return ch.sortedHashes[i] >= hash
    })
    // Wrap around if we ran past the end of the ring
    if idx == len(ch.sortedHashes) {
        idx = 0
    }
    return ch.ring[ch.sortedHashes[idx]]
}
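To see the payoff, here's a small usage sketch (the key count is arbitrary): route a batch of keys across three servers, add a fourth, and count how many keys changed owners. Unlike the modulo example from section 1, only a minority should move.

func main() {
    ch := New(150)
    ch.Add("server1", "server2", "server3")

    const keys = 10000
    before := make([]string, keys)
    for i := 0; i < keys; i++ {
        before[i] = ch.Get(fmt.Sprintf("key%d", i))
    }

    ch.Add("server4")

    moved := 0
    for i := 0; i < keys; i++ {
        if ch.Get(fmt.Sprintf("key%d", i)) != before[i] {
            moved++
        }
    }
    // Roughly 1/4 of the keys should move to server4; the rest stay put.
    fmt.Printf("moved: %.1f%%\n", float64(moved)/keys*100)
}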
4. Virtual Nodes
Uneven Distribution Problem
With only one hash per server, the arcs between adjacent servers (and therefore the load shares) can be very uneven, for example:
server1(50) -> 25% load
server2(100) -> 25% load
server3(300) -> 50% load (!)
Solution: Multiple Hashes
Each server is placed on the ring multiple times with different hashes:
// Instead of a single hash(server1),
// create hash(server1:0), hash(server1:1), ..., hash(server1:N-1)
for i := 0; i < replicas; i++ {
    hash := hashKey(fmt.Sprintf("%s:%d", server, i))
    ring[hash] = server
}
Result: much more even load distribution; each server's share is now the sum of many small arcs, and the imbalance shrinks roughly in proportion to 1/√replicas.
Optimal Virtual Node Count
Rule of thumb: replicas = 150-200 per physical server
- Fewer than ~100: noticeably uneven distribution
- More than ~500: diminishing returns; extra memory and slower Add/Remove for little gain in balance
The simulation sketched below shows how the imbalance shrinks as the replica count grows.
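Here's a rough way to check the rule of thumb yourself (a sketch built on the ConsistentHash type above; the key count and replica values are arbitrary): measure how far the busiest of three servers deviates from its fair share as the replica count grows.

func measureImbalance(replicas int) float64 {
    ch := New(replicas)
    ch.Add("server1", "server2", "server3")

    counts := make(map[string]int)
    const keys = 30000
    for i := 0; i < keys; i++ {
        counts[ch.Get(fmt.Sprintf("key%d", i))]++
    }

    max := 0
    for _, c := range counts {
        if c > max {
            max = c
        }
    }
    // 1.0 would be a perfect split; larger means the busiest server is overloaded.
    return float64(max) / (float64(keys) / 3.0)
}

func main() {
    for _, r := range []int{1, 10, 50, 150, 500} {
        fmt.Printf("replicas=%-4d busiest server at %.2fx its fair share\n", r, measureImbalance(r))
    }
}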
5. Performance and Optimizations
Basic Implementation Benchmark
func BenchmarkConsistentHash(b *testing.B) {
    ch := New(150)
    ch.Add("server1", "server2", "server3")
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        // Note: the fmt.Sprintf allocation is included in the measurement.
        ch.Get(fmt.Sprintf("key%d", i))
    }
}
// Result: ~200ns per operation
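If you want the measurement to isolate Get itself, a variant like this (a sketch; the pre-generated key count is arbitrary) moves the fmt.Sprintf calls out of the timed loop:

func BenchmarkConsistentHashGetOnly(b *testing.B) {
    ch := New(150)
    ch.Add("server1", "server2", "server3")

    keys := make([]string, 1024)
    for i := range keys {
        keys[i] = fmt.Sprintf("key%d", i)
    }

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        ch.Get(keys[i%len(keys)])
    }
}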
Optimization: Precomputed Hashes and a Lookup Cache
// OptimizedHash keeps the virtual-node hashes and their owners in two
// parallel, sorted slices and memoizes key -> server lookups.
type OptimizedHash struct {
    servers []string          // servers[i] owns hashes[i]; both sorted together by hash
    hashes  []uint32          // sorted virtual-node hashes
    cache   map[string]string // key -> server cache
}

func (oh *OptimizedHash) Get(key string) string {
    if cached, ok := oh.cache[key]; ok {
        return cached
    }
    hash := hashKey(key) // assumes a package-level FNV-1a helper like the one above
    idx := sort.Search(len(oh.hashes), func(i int) bool {
        return oh.hashes[i] >= hash
    })
    if idx == len(oh.hashes) {
        idx = 0
    }
    server := oh.servers[idx]
    oh.cache[key] = server // Cache the result
    return server
}

// Caveats: the cache grows without bound, must be invalidated whenever
// servers are added or removed, and needs a mutex for concurrent use.
6. Real Example: Redis Cluster
How Redis Cluster Partitions Keys
Strictly speaking, Redis Cluster doesn't use a classic hash ring: it uses a fixed set of 16384 hash slots, a closely related partitioning scheme with the same goal. When nodes join or leave, only the slots that change owner need to move.
// Redis Cluster uses 16384 slots (2^14)
const RedisSlots = 16384

// crc16 stands for the CRC16 function specified by Redis Cluster;
// it is not in the Go standard library.
func getRedisSlot(key string) int {
    return int(crc16(key) % RedisSlots)
}

// Each node handles a range of slots
type RedisNode struct {
    addr  string
    slots []int // e.g. [0-5460], [5461-10922], [10923-16383]
}
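For illustration only, here's a hypothetical routing helper, not real Redis client code: it assumes each node's range is stored as an explicit start/end pair and reuses the getRedisSlot placeholder above.

type slotRange struct{ start, end int } // inclusive slot range owned by a node

// nodeForKey finds the node whose slot range contains the key's slot.
func nodeForKey(key string, nodes []RedisNode, ranges []slotRange) *RedisNode {
    slot := getRedisSlot(key)
    for i, r := range ranges {
        if slot >= r.start && slot <= r.end {
            return &nodes[i]
        }
    }
    return nil // no known owner for this slot
}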
Slot Migration When Adding a Node
// Sketch: when a new node joins, each existing node hands over part of its
// slots so that every node ends up with roughly RedisSlots/(n+1) of them.
func redistributeSlots(nodes []RedisNode, newNode *RedisNode) {
    target := RedisSlots / (len(nodes) + 1)
    for i := range nodes {
        excess := len(nodes[i].slots) - target
        if excess <= 0 {
            continue
        }
        // In a real cluster each slot is migrated atomically
        // (CLUSTER SETSLOT ... IMPORTING / MIGRATING).
        newNode.slots = append(newNode.slots, nodes[i].slots[:excess]...)
        nodes[i].slots = nodes[i].slots[excess:]
    }
}
7. Practical Applications
Load Balancer with Consistent Hashing
type LoadBalancer struct {
    ch      *ConsistentHash
    servers map[string]*Server
}

func (lb *LoadBalancer) Route(request *http.Request) *Server {
    // Use a session ID or the client IP for stickiness
    key := request.Header.Get("Session-ID")
    if key == "" {
        key = request.RemoteAddr
    }
    serverName := lb.ch.Get(key)
    return lb.servers[serverName]
}
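As a usage sketch (the Server shape with a parsed *url.URL is an assumption of mine, as are the imports net/http/httputil and net/url), the balancer plugs straight into an http.Handler that proxies to the chosen backend:

// Assumed shape of Server for this sketch.
type Server struct {
    URL *url.URL // e.g. parsed from "http://server1:8080"
}

func (lb *LoadBalancer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    backend := lb.Route(r)
    if backend == nil {
        http.Error(w, "no backend available", http.StatusServiceUnavailable)
        return
    }
    // Forward the request to the chosen backend.
    httputil.NewSingleHostReverseProxy(backend.URL).ServeHTTP(w, r)
}

In practice you would build one ReverseProxy per backend up front instead of allocating a new one per request.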
Distributed Cache
// Assumes a go-redis client with the pre-v8 API (no context argument).
type DistributedCache struct {
    ch      *ConsistentHash
    clients map[string]*redis.Client
}

func (dc *DistributedCache) Set(key, value string) error {
    serverName := dc.ch.Get(key)
    client := dc.clients[serverName]
    return client.Set(key, value, 0).Err()
}

func (dc *DistributedCache) Get(key string) (string, error) {
    serverName := dc.ch.Get(key)
    client := dc.clients[serverName]
    return client.Get(key).Result()
}
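A constructor along these lines ties the two pieces together (a sketch; NewDistributedCache is my name for it, and the Redis addresses double as ring identifiers):

func NewDistributedCache(addrs ...string) *DistributedCache {
    dc := &DistributedCache{
        ch:      New(150),
        clients: make(map[string]*redis.Client),
    }
    for _, addr := range addrs {
        dc.ch.Add(addr) // the address itself identifies the server on the ring
        dc.clients[addr] = redis.NewClient(&redis.Options{Addr: addr})
    }
    return dc
}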
8. Monitoring and Metrics
Key Metrics for Consistent Hashing
type HashMetrics struct {
    KeyDistribution  map[string]int // server -> key count
    LoadBalance      float64        // imbalance coefficient
    RehashOperations int            // rehash count
}

func (ch *ConsistentHash) GetMetrics() HashMetrics {
    distribution := make(map[string]int)

    // Simulate 10000 keys
    for i := 0; i < 10000; i++ {
        key := fmt.Sprintf("key%d", i)
        server := ch.Get(key)
        distribution[server]++
    }

    // Calculate the load balance coefficient (one possible definition is sketched below)
    loadBalance := calculateLoadBalance(distribution)

    return HashMetrics{
        KeyDistribution: distribution,
        LoadBalance:     loadBalance,
    }
}
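calculateLoadBalance can be defined in several ways; one simple option (a sketch, my own definition) is the busiest server's key count divided by the mean, where 1.0 means a perfectly even split:

func calculateLoadBalance(distribution map[string]int) float64 {
    if len(distribution) == 0 {
        return 0
    }
    total, max := 0, 0
    for _, count := range distribution {
        total += count
        if count > max {
            max = count
        }
    }
    mean := float64(total) / float64(len(distribution))
    return float64(max) / mean // 1.0 = perfectly balanced
}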
Conclusion: When to Use Consistent Hashing
✅ Use when:
- Frequent server addition/removal
- Need session affinity
- Distributed caching
- Database sharding
❌ Don't use when:
- Static server count
- Need perfectly even distribution
- Simple round-robin is sufficient
Main advantage: when adding or removing a server, only about 1/n of the keys move (the theoretical minimum) instead of almost all of them.
P.S. Using consistent hashing in your projects? Share your experience!
Additional resources:
- "Consistent Hashing and Random Trees" (Karger et al., 1997)
- Amazon DynamoDB paper
- Redis Cluster Specification