Gossip/Epidemic Protocols
The problem these protocols solve is the Multicast problemMulticast
1. Get information out to other members in the group(only few in entire network), in contrast to broadcast which is to send information to everyone(all nodes) on the network.2. Fault tolerance and Scalability
a. Nodes may crash
b. Packets may be dropped
c. Scale to 1000's of nodes, with overhead per node which doesn't grow very quickly as number of nodes grows into 1000s or even tens of thousands
Protocol that is executed at each of the nodes, including the sender is called Multicast protocol
Multicast protocol typically sits at the application layer (not the network layer).
Network layer has IP multicast and is implemented by routers and switches, but typically applications don't rely on it since not all switches/routers might implement it.
Multicast protocols:
1. Centralized : There is a sender and it sends to UDP/TCP packets to all the receiver nodes in the group.Simplest implementation, but issues with fault tolerance (sender is single point of failure) and sender overhead affecting scalability and hence latency
2. Tree-based : IPmultipcast, SRM, RMTP, TRAM, TMTP.
Sender sends to a few nodes and those nodes in turn send to other nodes.
Sender is not single point of failure and doesn't have the overhead of sending the packet to all receivers helping in scalability and latencies, especially when the tree is well-balanced
Issues : Setup and maintain the tree, initially on bringup and on node failures (especially the non-leaf node failures which are responsible for transmitting to all nodes in its subtree)
a. Build a spanning tree among the processes of the multicast group
b. Use spanning tree to disseminate multicasts (might use IP multicasts for dissemination)
c. Use either ACKs or NAKs to repair multicasts not received
d. SRM (scalable Reliable Multicast)
- Uses NAKs
- But adds random relays, and uses exponential backoff to avoid NAK storms
e. RMTP (Reliable Multicast Transport Protocol)
- Uses ACKs
- But ACKs only sent to designated receivers, which then re-transmit missing multicasts
f. These protocols still cause an O(n) ACK/NAK overhead. So as the number of nodes increase, ACK/NAK overhead increases linearly and hence these protocols are not as scalable as they are thought to be. Hence, the need for gossip-based or epidemic-based protocols.
The Gossip Protocol
1. The sender periodically (every 5 secs or so) transmits gossip message(using UDP) to "b" (gossip fan-out, typically b=2) random targets.2. Once the receiver receives a gossip message, it is said to be infected and now it periodically sends gossip messages to "b" targets. It's possible a node receives the same multicast message from different senders. In that sense, gossip protocol uses slightly higher number of messages, but the overhead is negligible when considering the advantages
3. Gossip protocol is not synchronized on all nodes and runs on its local clock
Types of Gossip protocols:
1. Push gossip : Once you have a multicast message, you start gossiping about it. If there are multiple messages, gossip a random subset of them, or recently-received ones, or higher priority ones2. Pull gossip : Periodically poll a few randomly selected processes for new multicast messages that you haven't received and get those messages. In this type, all nodes (infected or uninfected) keep sending gossip messages
3. Hybrid variant or push-pull : In the initial query when you have the pull query going out, you also receive other recently-received push gossip messages making it "push-pull" variant.
Gossip Analysis:
Properties of simple push protocol1. Lightweight in large groups
2. Spreads a multicast quickly (latency) : O(log n)
3. Highly fault-tolerant (reliable) : Gossip protocol is similar to how rumors spread in the society, diseases spread in human population and how virus/worm spread in computer systems. Once the entity is out, it's hard to contain or kill it.
In all forms of gossip(push or pull), it takes O(log n) rounds before about n/2 gets the gossip because that's the fastest you can spread a message - a spanning tree with fanout of constant degress has O(log n) total nodes
Thereafter, pull gossip is faster than push gossip because gossip in subsequent round proceeds super-exponentially (not just exponentially). Second half of pull gossip finishes in time O(log(log n))
Topology-aware Gossip for when crossing subnets or racks within datacenter. The load on the router will be high of O(n).
The fix is to gossip within subnet/rack with a higher probability and with lower probability for inter-subnet/inter-rack nodes. Once the gossip message crosses the subnet, within that subnet it gossips with higher probability.
Thus router load becomes O(1) and dissemination time of o(log n)
Gossip Implementations
1. Clearinghouse and Bayou projects by Xerox : email and database transaction systems [PODC 87]2. refDBMS system [Usenix 94]
3. Bimodal Multicast
4. Sensor and wireless networks
5. AWS EC2 and S3 cloud (rumored, no design details published)
6. Cassandra key-value store (and others) use gossip for maintaining membership lists
7. Usenet NNTP (Network News Transport Protocol)
In summary, Gossip is a solution to multicast problem. There are centralized and tree-based solutions too, but gossip scales much better and is more fault-tolerant than other solutions.
No comments:
Post a Comment