Monday, December 8, 2008

Publish/Subscribe Message Delivery

In a previous post, I argued that publish/subscribe was the only tricky thing needed for a totally flexible interest management based online game system.

Publish/subscribe (producer/consumer) based message systems give semantics similar to multicast. A producer sends a message to a channel. All current consumers on that channel receive a copy of that message. To avoid becoming broadcast (where every consumer receives every message sent), the messages are decomposed into channels using a Category, one per channel (so you can think of it as a channel id). A Category is an integer so they are trivial to deal with at the lower level (as opposed to strings or something). For simplicity, each message is only sent to one Category.

This system is very loosely coupled giving it a lot of flexibility and extensibility. A producer does not need to know the existence of any of the consumers. The set of consumers and their implementation can change without touching the producer. For example, a logging system could be attached to a channel without affecting the system, and would give good data for debugging.

To implement the publish/subscribe system efficiently, we must manage the producer and consumer subscription requests efficiently. Broadcasting that a consumer is interested in some Category to each producer is too inefficient. So we introduce the notion of a channel manager that keeps track of the interests of all producers and consumers.

The channel manager is responsible for redistributing each data message. A producer sends a message to the channel manager. The channel manager maintains the list of interested consumers, and forwards a copy of the producer's message to each consumer. We have exchanged the non-scalable broadcast of subscription messages for an extra hop of latency for each data message.

The channel manager can easily be made scalable. The simplest approach is to use the integer Category value and a simple a modulus operation to load balance across any number of channel manager processes. Both producers and consumers use the same computation. And all subscriber messages and all data messages on one Category travel through a single channel manager.

This architecture is the obvious one. There are more sophisticated approaches that can reduce the two hop latency by using a direct connections between producers and consumers. The subscription messages still need to route through a channel manager, but the producers need to maintain the list of interested consumers. This adds the requirement that producers subscribe to produce, and adds more subscription messages and more latency on a subscription. There are also subtle data message ordering problems.

If you want to go nuts, you could use real multicast. The challenge there is that there are limited numbers of multicast groups. So you have to solve the problem of multiple channels sharing one multicast group.

So you get to choose. Easy implementation or optimized but tricky implementation. Like most code. In this case I argue that the simple approach has good enough performance for the needs of online games. The producers and channel manager live in a data center on hosts attached to a high speed switch, so network latency is minuscule.

The design philosophy of this system is to minimize unnecessary computation due to unwanted messages arriving on a host that are just thrown away. Hosts cost money. Bandwidth inside the data center is free. So good interest management is key.

So. We have sliced off the publish/subscribe problem. All we have left is how to approach interest management policies which are application specific.

No comments:

Post a Comment