Sunday, January 16, 2011

Topics are not Message Types

I periodically have an unproductive conversation about how to use Topics/Categories vs how to use Message Types. Hopefully this time will be better.

Both things appear to be used to "subscribe", and both wind up filtering what a message handler has to process and gets to process. If they can be used for exactly the same purposes, it is "just" policy as to what you use each one for. That has to be wrong, otherwise there would not be *two* concepts. Thus, there has to be a useful distinction. So let's define what they are and what their responsibilities are.

First a definition or two:
  • Hierarchical: a name is defined hierarchically if the parent context is needed to ensure the child is distinct from children of other parents when the children have the same name. The parents provide the namespace in which the child is defined.
  • Orthogonal: names are independent of one another, like dimensions or axes in mathematics.

Categories are names (or numbers) that are used to decompose a stream of messages into groups. In JMS they are called Topics, but I'm going to avoid that term in case the implementation of Topics in JMS implies something I don't mean. A message is sent on, or "to" a single Category. A consumer subscribes to one or more Categories. Sophisticated message publish/subscribe or producer/consumer implementations can support wildcards or bitmasking to optimize subscription to large sets of Categories. (While not very germane to this discussion, I believe JMS can only have wildcards at the end of a Topic, and only at a dot that separates portions of the Topic. My view of wildcards and Category masking does not have that limitation. But that shouldn't affect my arguments.)
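
To make that concrete, here is a tiny sketch of what a Category-oriented API might look like. The names (MessageBus, publish, subscribe) are invented for this illustration, not any particular product's API:

    // Hypothetical pub/sub API, Java-flavored; all names are invented.
    import java.util.function.Consumer;

    public interface MessageBus {
        // A message is sent on exactly one Category.
        void publish(String category, byte[] message);

        // A consumer subscribes to one or more Categories. A sophisticated
        // implementation could accept wildcards anywhere in the pattern
        // (e.g. "combat.*.damage"), not just at the end.
        void subscribe(String categoryPattern, Consumer<byte[]> onMessage);
    }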

It is critical to have a mechanism that efficiently filters network messages so that a consuming process is not "bothered" by messages arriving that are immediately discarded. Running the TCP stack, for example, can wind up consuming large fractions of the CPU, and if the message is discarded, even after a simple inspection by your message framework, that is totally wasted processing. Further, if the messages are traveling over a low bandwidth link to a player, for example, it can badly affect their experience as it steals network resources from more important traffic. So we want the sender, or some intermediary, to filter the messages earlier.

Early distributed simulation implementations (DIS) used multicast groups and relied on the Network Interface hardware to filter out any messages in groups that the consumer had not subscribed to. Ethernet Multicast tends to broadcast all messages and rely on the NIC of each host to inspect and filter unwanted messages. That is better than having the kernel do it. Switches get into the picture, but are very simplistic when it comes to multicast. When there are more than a few groups, switches and NICs become promiscuous, all messages get broadcast anyway, and they wind up in each destination's kernel. They are filtered there, but much of the network stack has already executed. To get around that, networks were physically segmented, with intelligent bridges built to copy messages from one segment to another. The bridge, rebroadcaster, or smart router would crack open each message and send it into another segment based on configuration, or on a control protocol (subscription request messages).

Ancient history. However, it formed the origin of the concept of numeric Categories. A message is sent to a single Category. A consumer subscribes. The Channel/Category/Subscription manager maintains the declared connectivity and routes the messages.

So. Categories are used to optimize routing. They minimize the arrival of a message to a process. So far, this has nothing to do with what code is run when it arrives.

Message types are also names, but are used to identify the meaning of a message; what the message is telling or requesting of the destination; what code should run when the message arrives (or what code should not run). Without a message type, there would be only one generic handler. In the old days, that master-handler would be a switch statement, branching on some field(s) of the message (let's call that field the message type, and be done with it).

There is some coded, static binding of a message type to a piece of code; the message handler. Handler X is for handling messages of type Y. A piece of code cannot process fields of a message different from what it was coded for. There is little reason to make that binding dynamic or data-driven. Static binding is "good". It leads to fewer errors, and those errors can be caught much earlier in the development cycle. Distributed systems are hard. You don't really want to catch a message-to-code mismatch after you've launched. One way to think about this static binding is as a Remote Procedure Call. You are telling a remote process to run the code bound to message Y. In fact, you can simplify your life by making the handler have the same name as the message type, and not even register the binding.
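
Here is a minimal sketch of that static binding, with invented names throughout. The convention is that the handler shares the name of the message type, so the binding is visible in the code and checkable long before launch:

    // Message of type "DealDamage"; its fields are the contract.
    final class DealDamageMsg {
        final long targetId;
        final int amount;
        DealDamageMsg(long targetId, int amount) {
            this.targetId = targetId;
            this.amount = amount;
        }
    }

    final class CombatHandlers {
        // Handles exactly one message type. This code cannot sensibly
        // process the fields of any other type, so the binding is static.
        void DealDamage(DealDamageMsg msg) {
            System.out.printf("apply %d damage to %d%n", msg.amount, msg.targetId);
        }
    }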

A message can be sent to any Category regardless of the message's type. There is no checking in code that a choice is "legal". The Category can be computed, and the message is bound to that value dynamically. Instances of the same message type can be sent to one of any number of Categories. Consumers can subscribe to any Category whether they know how to process all the message types it contains or not.

So. Back to the distinction. When code is declared to be able to handle messages of type Y, that does not imply that all message instances of type Y should arrive at the process with that handler. You may want to do something like load balancing where half the messages of type Y go to one process, and the other half go to a tandem process. So message types are independent of Categories. The two concepts are orthogonal.

When a process is subscribed to a Category, there is no guarantee to the subscriber about the message types that a producer sends to that Category. It is easy to imagine a process receiving messages it does not know how to handle. The sender can't force the receiver to write code, but the sender can put any Category on a message it wants. So Categories are independent of message types. The two concepts are orthogonal.

Now. With respect to hierarchy. Message type names can be declared within a hierarchical namespace. That can be pretty useful. At the end of the day, however, they are simply some strings, or bit strings. In a sophisticated system that maps message types to message classes (code), the class hierarchy may mirror the type name hierarchy, and have interesting semantics (like a handler for a base message class being able to handle a derived message class). But mostly, message type name hierarchy is useful to avoid collisions.

In systems like JMS, Categories (Topics) are also hierarchical. This is also done to avoid collisions in the topic namespace, and for organization. But it is also useful for wildcard subscription.

Now "the" question: are Categories within the Message Type Hierarchy, or are Message Types within the Category hierarchy? Or are they orthogonal to one another? I submit that a message of a given type means the same thing no matter which Category it arrived on. Further, the same message type can be sent to any Category and a Category can transport any number of different message types.

Since there is only one message exchange system, Categories cannot be reused for two purposes without merging the message streams. That leads to inefficiency. If you reuse a message type name for two different purposes, you run the risk of breaking handler code with what appears to be a malformed message. That leads to crashes. You could permit that kind of reuse, and institute policy and testing to keep those things from mingling (e.g. reuse message types, but only on different topics), but it is a looming disaster. I would put in some coordination mechanism or namespacing to keep the mingling from happening at all.

So what are the consequences:
  • There is no need to include Category when registering a message handler. 
  • Category subscription occurs separately from handler-to-message-type mapping, and affects the entire process.
  • There is no need to build a message dispatcher that looks at Categories.
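
A minimal sketch of those consequences, with invented names: registration is keyed by message type alone, and Category subscription happens elsewhere, once, for the whole process:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.Consumer;

    final class Dispatcher {
        private final Map<String, Consumer<byte[]>> handlers = new HashMap<>();

        // Binding is by message type only; no Category appears here.
        void register(String messageType, Consumer<byte[]> handler) {
            handlers.put(messageType, handler);
        }

        // Called for every message the process receives, regardless of
        // which Category routed it here.
        void dispatch(String messageType, byte[] payload) {
            Consumer<byte[]> handler = handlers.get(messageType);
            if (handler != null) {
                handler.accept(payload);
            } // else: log the unexpected type; see the discussion above
        }
    }

    // Elsewhere, once per process, purely for routing (hypothetical API):
    // bus.subscribe("zone.42.*", m -> dispatcher.dispatch(m.type(), m.payload()));
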
Well. That was pretty long-winded. For those of you still here, I have an analogy. I haven't thought it through a lot, but it looks like it fits (although it is about a pull system, not a push system). URLs. The hostname and domain name represent a hierarchical Category or Topic. The path portion is the message type and identifies the handler (web service), and is also hierarchical. You can host your web site on any host on any domain, and the functionality would be the same. You can host any web site on your host. You can host any number of web sites on your host, provided the paths don't collide. If they do collide, you are going to get strange behavior as links refer to the wrong services, or pass the wrong parameters. One would need more hierarchy. Or you don't host the colliding web sites together. You put them on different addresses. But the service code doesn't care what address you choose.

Unless you talk about virtual hosts, or virtual processes, multiple independent connections to the message system, thread-local subscriptions. You can do *anything* in software. But should you?

Wednesday, December 22, 2010

The Real Priorities of Online Game Engineering

I was trying to communicate to management that server developers have different priorities than game developers. As a means to show the importance of laying in administrative infrastructure, and other software engineering "overhead", I put this list together. I hope it helps you think about making the right investment in making the system sustainable, and to make those points to the powers that be.


This is a list of priorities in absolute order of importance. While it is good to address all of them, if we don’t have one of the higher priority requirements solved to a reasonable degree, there is not much point in having the lower ones.

I made this to help us focus on what is important, what order to do things in, and what we might cut initially. I’d love to debate this over lunch with anyone. I’m hoping others think of more of these kinds of driving requirements.
  1. Don’t get sued. In particular, always deal with child safety. We also need to abide by our IP contract obligations (sometimes including the shipping date). Better to not ship than get sued into oblivion or go to jail.
  2. Protect the game’s reputation. Even if the game is awesome, if the public thinks it isn’t or the service is poor, then we lose. This is especially important early in the lifecycle. This implies not shipping too early.
  3. Be able to collect money. Even if there is no game.
  4. Be able to roll out a new version periodically. Even if the game is broken or not finished, this means we can fix it. This implies:
    1. You can make a build
    2. You can QA it
    3. You can deploy it without destroying what is already there, or at least roll back
  5. Support effort is sustainable. If the game and system are perfect, but it needs so much handholding that our staff burns out, or we don’t have the resources to extend it, we still fail. This implies lots of stuff:
    1. It is stable enough that staff is not working night and day to hold its hand.
    2. There is enough automated maintenance to limit busy work
    3. There is enough automated logging, metrics and alarms to limit time spent hovering
  6. The cost of operating is not excessive. I.e. it sits lightly enough on the hardware that we don’t need massive amounts, or exotic types. (Special warning to engineers: it is all the way down here before we start to care about performance. And the only reason to care about performance is operating cost.)
  7. Enough players can connect. This implies lots of stuff:
    1. The cluster hardware exists at all, the network is set up, etc
    2. There is a web site
    3. Key platform and login features exist
    4. There are enough related server and game features
  8. The server is sufficiently stable that players can remain connected long enough. This implies lots of stuff:
    1. It stays up.
    2. There are no experience-ruining bugs or tuning problems.
    3. Not too much lost progress when things crash.
    4. The load is not allowed to get too high (population caps)
      • This is probably about where we need to get before Closed Beta.
  9. Revenues exceed cost of operation. And eventually, cost of development. This implies not shipping too late. Note that you don't *have* to get to this point immediately. And that this is more important than having a fun game.
  10. The game is fun. This implies so much stuff, I won’t write it all down. Note that the requirement to not ruin the game's reputation can move some of this stuff earlier. But don't fool yourself. If you are making money on a game that is not fun, is that bad? I'm sure you can think of some examples of this. Here are some server-specific implications:
    1. You aren’t put in a login queue for too long. You don’t have trouble finding a good time to play.
    2. You aren’t dropped out of the game too often.
    3. The feeling of lag is not that bad.
    4. You can find people to play with. It is an online game, after all.
  11. Players cannot ruin one another’s fun. Note that making the game cheat proof is not the requirement here. The only reason you care about cheating is if other players perceive it badly enough (reputation), or if the players are keeping you from making money.
    1. They cannot grief one another, especially newbs.
    2. They cannot bring down the server
    3. They cannot ruin the gameplay or economy, making swathes of gameplay pointless or boring.
  12. The server can scale to very large numbers of players. This is the profit multiplier.
Be honest with yourself. Are you over engineering? Solving fun technical problems that don't actually address any of these Real Priorities? Doing things in the right order? Remember, as the online engineer, you represent these priorities to management. They may not (yet) understand why this order is important.

Tuesday, December 14, 2010

"Restraint in what you ask for is the key to success" - Arnold's Aphorism

A good friend of mine sent me an email about our current sprint (we use a relatively formal Scrum process), encouraging us to define our stories such that the team could experience a success. He is also a great student of military history. I assumed the quote below was from some past general or philosopher. He claims not. So I've named it after him and am spreading the word.

"Restraint in what you ask for is the key to success" - Arnold's Aphorism

I can picture Napoleon fighting with himself over whether to ask his men to achieve a little too much and fail, or to ask less and wind up with better morale. Applying this to software engineering teams makes a lot of sense. If you always ask for more than can be successfully completed, the team may feel unsuccessful/underachieving, and you may feel disappointed. When in fact, they are doing almost the same work whether you expected more or not.

Here's the thing, though. It's been said before. In all things, moderation. Time-management and interpersonal-relationship books talk about expectation management. "Under promise and over deliver". The pessimist is never disappointed and often surprised. Live beneath your means. "Let go". Thou shalt not covet.

This is probably more true of things we ask of ourselves: get this, do that, convince them...

What struck me in the phrasing of this old-new aphorism is how clearly it shows that you have the *choice* in setting the expectations. And yet, that goal is the very thing that defines success. And for many goal-oriented folks like us, that success defines our happiness. Ergo, we choose to be happy. Or not.

But it requires self discipline. Not so much in the exercise of effort, but in the restraining of wanting.

You may have heard analyses of the great American marketing machine. It generates unrequited desires, while offering to fulfill them (for a price). The promise of happiness. Oddly, it is more commonly the restraint of those desires that leads to happiness, not the fulfillment.

I think you are going to see many more instances of this wisdom over the next while. See if you can't identify the aspect that you control.

Wednesday, November 10, 2010

Big load testing

I am a real fan of using the real client to do load testing. Your QA engineers will spend a lot of time building regression tests that verify the behavior of the game is still the same and no new bugs have been introduced. That entails adding scripting or player behavior "simulation" to the game code, but also includes creating the scripts that test the functionality of the game. Those test cases are really important and ideally cover almost all of the game functionality. And they have to be kept up to date as the code in the game changes.

Why not reuse all that work to help load test the server? The scaffolding, client hooks, and test cases?

One of my favorite ways of doing this is to have a test driver that picks random test cases and throws them at the server as fast as possible. Even if the test case involves sleeping or waiting for something like the character walking across some area of the game, if you run enough of them at the same time, you can generate significant load. And it is going to be more realistic than any other kind of test prior to having zillions of real players. It also saves you from having to reproduce the protocol and behavior of the real client and maintain it as the game team evolves everything.
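
Here is a sketch of that kind of driver, assuming each QA test case has been packaged as a Runnable that plays out a scripted client session (TestCaseLibrary is a hypothetical stand-in for wherever your regression cases live):

    import java.util.List;
    import java.util.Random;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Semaphore;

    final class LoadDriver {
        public static void main(String[] args) throws InterruptedException {
            List<Runnable> testCases = TestCaseLibrary.loadAll(); // hypothetical
            int concurrentClients = 2000;
            ExecutorService pool = Executors.newFixedThreadPool(concurrentClients);
            Semaphore slots = new Semaphore(concurrentClients);
            Random rng = new Random();
            while (true) {
                slots.acquire();
                // Pick a random regression case and throw it at the server.
                // Sleeps and walks inside a case are fine; with thousands in
                // flight at once, the aggregate load is still significant.
                Runnable testCase = testCases.get(rng.nextInt(testCases.size()));
                pool.execute(() -> {
                    try {
                        testCase.run();
                    } finally {
                        slots.release();
                    }
                });
            }
        }
    }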

Why not? Even if you take the time to make a headless version of the client, it is probably going to be so resource heavy that you will have trouble finding enough machinery to really ramp up. Most games are designed to tick as fast as possible to give the best framerate, but a headless client doesn't draw, so that is a waste of CPU. Some games rely on the timing intrinsic in animations to control walk speed or action/reaction times for interactions. But you want to strip out as much content as possible to save memory. Clearly there is a bunch of work needed to reduce the footprint of even a headless client. But they really are useful.

One thing you can do to make them more useful is construct a mini server cluster and see how it stands up to as many clients as you can scavenge.

You can get hold of more hardware than you might think by "borrowing" it at night from the corporate pool of workstations. You will need permission, and you will want foolproof packaging so your clients can be installed (and auto-updated) without manual intervention or a sophisticated user. There is nothing like a robot army to bring your server to its knees. IT doesn't like this idea very much because they like to use night-time network bandwidth for doing backups and stuff.

Another important trick is to observe the *slope* of performance changes relative to the change in load you throw at the server. If the marginal effect (incremental server load divided by incremental client load) is > 1 you have a problem. Some people call this non-linear or non-scalable performance. Although, to be technical, it is non-unitary. Non-linear means it is even worse than y = a*x + b. E.g. polynomial (x^2), or exponential (y = a^x). Generally you can find the low hanging fruit pretty easily. If the first 500 connected clients caused a memory increase of 100 MB, but the second 500 consumed 200 MB, you have a problem. Obviously this applies to CPU, bandwidth and latency. And don't forget to observe DB latency as you crank up both the number of clients and the amount of data already in the DB. You may have forgotten an index.
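
The arithmetic is worth writing down, because it is cheap to automate between load steps (the numbers below are the ones from the example above):

    final class Margins {
        // Marginal cost per client between two load measurements.
        static double marginalCost(double resource1, double clients1,
                                   double resource2, double clients2) {
            return (resource2 - resource1) / (clients2 - clients1);
        }
    }

    // First 500 clients: 100 MB total -> 0.2 MB per client.
    // Next 500 clients:  200 MB more  -> 0.4 MB per client.
    // The marginal cost doubled between steps: find out why before scaling up.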

But you may still not have enough insight, even given all this. The next step could be what I call a light-weight replay client or a wedge-client. The idea is to instrument the headless client, or graphical client and record the parameters being passed into key functions like message send, or web service calls. You are inserting a wedge between the game code and the message passing code. The real client can then be used to create a log of all the interesting data that is needed to stress the server. You would then create a replay client that uses only the lower level libraries. It would read the logs, passing the recorded parameters into a generic function that reproduces the message traffic and web requests. It doesn't have to understand what it is doing. The next step is to replace the values of key parameters to simulate a variety of players. You could use random player ids, or spend some more time having the replay client understand the sequences of logs and server responses. E.g. it could copy a session ID from a server response into all further requests.
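
A sketch of the wedge itself, with invented names. The idea is just a thin wrapper that records each call's parameters before passing them through:

    import java.io.FileWriter;
    import java.io.IOException;

    interface Transport {
        void send(String msgType, String sessionId, String payload);
    }

    final class WedgedTransport implements Transport {
        private final Transport real;   // the real message-send layer
        private final FileWriter log;

        WedgedTransport(Transport real, FileWriter log) {
            this.real = real;
            this.log = log;
        }

        public void send(String msgType, String sessionId, String payload) {
            try {
                // Record everything a replay client needs to re-send this
                // call later, substituting fresh player/session ids.
                log.write(msgType + "\t" + sessionId + "\t" + payload + "\n");
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
            real.send(msgType, sessionId, payload);
        }
    }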

Since you are wedging into existing source code, this approach is way easier than doing a network level recording and playback. That would require writing packet parsing code, and creating a state machine to try to simulate what the real client was doing. Very messy.

You might still not be able to replay enough load. Perhaps you don't have enough front end boxes purchased yet, but you want to stress your core server: the DB or event processing system. We use a JMS bus (it is great for publish/subscribe semantics that allow for loose coupling between components) to tie most things together on the back end. We built a record/replay system that pulls apart the JMS messages and does parameter replacement much like the wedge client described above. It is pretty simple to simulate thousands of players banging away. Not every client event results in a back end event that affects the DB.
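
A hedged sketch of such a replayer using the standard javax.jms API; the log format and the ${playerId} substitution token are invented for illustration:

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.jms.Topic;

    final class JmsReplayer {
        static void replay(ConnectionFactory factory, String topicName,
                           String logFile, String simulatedPlayerId) throws Exception {
            Connection conn = factory.createConnection();
            Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Topic topic = session.createTopic(topicName);
            MessageProducer producer = session.createProducer(topic);
            for (String line : Files.readAllLines(Paths.get(logFile))) {
                // Swap in a simulated player id so thousands of replayers
                // look like thousands of distinct players to the back end.
                TextMessage msg = session.createTextMessage(
                        line.replace("${playerId}", simulatedPlayerId));
                producer.send(msg);
            }
            conn.close();
        }
    }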

So what we are planning on doing is:
a) build a mini-cluster with just a few front end boxes
b) use QA's regression test cases to drive them to their knees looking for bad marginal resource usage
c) use wedge recordings and replay if needed for even more load on the front end boxes
d) use the JMS message replay system to drive the event system and DB to its knees, also looking for bad marginal usage.
e) do some shady arithmetic to convince ourselves that, if our simulated client count produced X% utilization on the test cluster, the remaining (100-X)% of utilization plus the new hardware we plan to have in production will get us to our target client count.

Thursday, October 28, 2010

Use of public/private keys to avoid a race condition

I inherited an interesting technique when I took over the server of SHSO: the use of signatures to avoid a race condition that can occur when a player is handed off between hosts. The use case is: the client requests that the Matchmaker connect them to a game "room" server that has a new game session starting. That room needs correct information about the player, what level they are, what they own, etc. How do you get it to the room before the client connects to that room and the data is needed? A couple of approaches:

  1. You don't. You just leave the client connected but in limbo until their data arrives through the back end asynchronously. This is a pretty normal approach. Sometimes the data arrives before the client makes the connection. So you have to cover both of those cases.
  2. You don't. You leave the client in limbo while the room fetches the data through the back end synchronously. This is also pretty normal, but blocks the thread the room is running on, which can suck, especially if other rooms on the same host and process also get blocked. Yes, you could multithread everything, but that is not easy (see my manifesto of multithreading!). Or you could create a little state machine that tries to remember what kind of limbo the client is in: just-connected, connected-with-data-fetch-in-progress, etc. Personally, I don't allow the processes that run the game logic to open a connection directly to the DB and do blocking queries. DB performance is pretty erratic in practice, and that makes for uneven user experience.
  3. Or have the data arrive *with* the connection. From the client. Interesting idea. But can you trust the client? That is where signed data comes in.

A quick review of cryptography. Yes, I am totally oversimplifying it, and skipping a lot of the interesting bits, and optimization and stuff. However, it is fun to talk about...

A public/private key works by taking advantage of mathematics that is easy to compute in one direction, but really hard to compute in the other direction. The most common is factoring very large integers that are the product of two very large prime numbers. There is only one way to factor the product, but you have to compute and try pretty much every prime number up to the square root of the product, and that can take a looong time.

A public key can be used to encrypt plain text, and the private key is the only thing that can be used to unencrypt it. That means only the owner of the private key can read the plain text; nobody else can, even those who have access to the public key.

On the other hand, a signature is created by *unencrypting* the plain text using the private key. The public key can then be used to *encrypt* the signature and test if the result equals the plain text again, thereby verifying the signature, and proving that the signature and plain text came from the owner of the private key exactly as they sent it.

Back to the story...the player data is signed using the private key by the Matchmaker and delivered to the client when the Matchmaker directs the player to the correct room. The client cannot tamper with their data without getting caught. The client then sends the data to the game room with the signature when it connects. The room server checks the signature using the public key, and can tell that the data is unmodified and came indirectly from the Matchmaker.
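
Here is a minimal sketch of that flow using the standard java.security API. Note that real libraries sign a hash of the data rather than the raw plain text; key distribution is elided, and the data format is invented:

    import java.nio.charset.StandardCharsets;
    import java.security.KeyPair;
    import java.security.KeyPairGenerator;
    import java.security.Signature;

    final class SignedPlayerData {
        public static void main(String[] args) throws Exception {
            KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
            kpg.initialize(2048);
            KeyPair keys = kpg.generateKeyPair(); // both keys stay on the servers

            byte[] playerData = "level=12;owns=cape,boots"
                    .getBytes(StandardCharsets.UTF_8);

            // Matchmaker side: sign with the private key.
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(keys.getPrivate());
            signer.update(playerData);
            byte[] sig = signer.sign(); // travels with the data, via the client

            // Room server side: verify with the public key.
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(keys.getPublic());
            verifier.update(playerData); // the data as received from the client
            System.out.println("untampered: " + verifier.verify(sig));
        }
    }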

Why not just encrypt the data? The room server could have been set up to be able to unencrypt. Answer:  The client wants to read and use its player's data. It wouldn't be able to do that if it were encrypted. And sending it twice (plain, and encrypted) is a waste.

One interesting thing to note is that the client never saw either the public key or the private key.

OK. I know it seems pretty wasteful to send all this data down to the client, just to have it sent back up to a different host in the server. After all, we are talking about bandwidth in and out of the data center, and more importantly, bandwidth in and out of the client's home ISP connection. Only half bad. The client needed the data anyway. It is not the greatest approach, but it is what we have. As Patton is quoted: A good solution applied with vigor now is better than a perfect solution applied ten minutes later.

BTW, another reason this was built this way was that originally the system couldn't predict which room the player would wind up in, so the client needed to bring their data with them.

And that is a segue to the topic of scaling across multiple data centers. It might start to make sense of that extra bandwidth question.

Monday, October 25, 2010

Architect it for Horizontal DB Scalability

The performance of the database in an MMO tends to be the most common limiting factor in determining how many simultaneous players an individual Cluster can support. Beyond that number, the common approach is to start creating exact copies of the Cluster, and call them Shards or Realms. The big complaint about sharding is that two friends may not be able to play together if their stuff happens to be on different shards. Behind the scenes, what is going on is that their stuff is in separate database instances. Each shard only accesses one DB instance because the DB engine can only handle so much load.

There are a couple of approaches that can almost completely get rid of these limits, both of which depend on creating many DB instances, and routing requests to the right instance.


The first is an "automated" version of a character transfer. When a player logs in, they are assigned to an arbitrary cluster of machines, and all their stuff is transferred to the DB for that cluster. The player has no idea which cluster they were connected to; the clusters don't even have names. There are a couple of problems with this:
1) you can't do this if you want the state of your world around the player to be dynamic and persist; the stuff you throw on the ground; whether that bridge or city has been burned. A player would be confused if each time they logged in, the dynamic state of the world was different. This isn't all that common these days, however. Interesting problem, though.
2) the player might not be able to find their friends. "Hey I'm beside the fountain that looks like a banana-slug. Yeah, me too. Well, I can't see you!"
You might be able to deal with this by automatically reconnecting and transferring the friends to the same cluster and DB, but that gets tricky. In the worst case, you might wind up transferring *everyone* to the same cluster if they are all friendly.

Another approach provides horizontal scalability and is one that doesn't assume anything about how you shard your world, do dungeon instancing, what DB engine you use, or many other complications. That is a nice property, and makes the DB system loosely coupled, and useful across a broad spectrum of large scale persistent online games.

What dynamic data are you persisting anyway? The stuff you want to come back after a power-failure. Most games these days only care about the state of the player, and their stuff. They may have multiple characters, lots of data properties, inventory of items, skills learned, quests in progress, friend relationships, ... If you sort through all your data, you'll find that you have "tuning" or design data that is pretty much static between releases. And you have data that is "owned" by a player.

To a DB programmer, that fact means that the bulk of the data can be keyed with player_id as at least part of its primary key. So here is the obvious trick:
Put all data that belongs to a given player into a single DB. It doesn't matter which DB or how many there are. You have thousands or millions of players. Spreading them horizontally across a large number of DB instances is trivial. You could use modulus (even player_ids to the left, odd to the right). Better would be to put the first 100,000 players in the first DB, then the next 100,000 into a second. As your game gets more successful, you add new DB instances.
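
A sketch of that routing, with the constants invented: given a player_id, compute which DB instance owns that player's rows, and append a new instance as the population grows:

    import java.util.List;
    import javax.sql.DataSource;

    final class ShardRouter {
        private static final long PLAYERS_PER_DB = 100_000;
        private final List<DataSource> shards; // one connection pool per DB

        ShardRouter(List<DataSource> shards) {
            this.shards = shards;
        }

        DataSource forPlayer(long playerId) {
            // Range-based: players 0..99,999 in shards[0], and so on.
            int index = (int) (playerId / PLAYERS_PER_DB);
            return shards.get(index); // all of this player's rows live here
        }
    }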

A couple of simple problems to solve:
1) you have to find the right DB. Given that any interaction a player has is with a game server (not the DB directly), your server code can compute which DB to consult based on the player_id it is currently processing. Once it decides, it will use an existing connection to the correct DB. (It is hard to imagine a situation where you would need to maintain connections to 100 DB instances, but that isn't really a very large number of file descriptors in any case.)
2) If players interact and exchange items, or perform some sort of "transaction", you have to co-persist both sides of the transaction, and the two players might be on different DB instances. It is easy to solve the transactional exchange of items using an escrow system (see the sketch after this list). A third party "manager" takes ownership of the articles in the trade from both parties. Only when that step is complete will the escrow object give the articles to the opposite parties. The escrow object is persisted as necessary, and can pick up the transaction after a failure. The performance of this system is not great. But this kind of interaction should be rare. You could do a lot of this sort of trade through an auction house, or in-game email, where ownership of an item is removed from a player and their DB and transferred to a whole different system.
3) High-speed exchange of stuff like hit points, or buffs, doesn't seem like the kind of thing players would care about if there was a catastrophic server failure. They care about whether they still have the sword-of-uberness, but not whether they are at full health after a server restart.
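
Here is the escrow sketch promised in point 2, with hypothetical persistence calls. The key property is that the escrow record itself is persisted at each step, so a crashed trade can be resumed rather than duplicated or lost:

    enum EscrowState { TAKEN_FROM_BOTH, RELEASED_TO_BOTH }

    final class EscrowTrade {
        void execute(long playerA, long itemA, long playerB, long itemB) {
            // Step 1: take ownership from both players. Each write goes to
            // that player's own DB instance; the escrow record is persisted.
            take(playerA, itemA);
            take(playerB, itemB);
            persistState(EscrowState.TAKEN_FROM_BOTH);

            // Step 2: grant each item to the opposite party, then complete.
            give(playerB, itemA);
            give(playerA, itemB);
            persistState(EscrowState.RELEASED_TO_BOTH);
        }

        // Hypothetical persistence operations, elided in this sketch.
        private void take(long playerId, long itemId) { /* ... */ }
        private void give(long playerId, long itemId) { /* ... */ }
        private void persistState(EscrowState s) { /* ... */ }
    }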

Some people might consider functional decomposition to get better DB performance. E.g. split the DBs by their function: eCommerce, inventory, player state, quest state, ... But that only gets you maybe 10 or 12 instances. And the inventory DB will have half the load, making 90% of the rest of the hardware a waste of money. On the other hand, splitting the DB with data decomposition (across player_id), you get great parallelism, and the cost of your hardware scales linearly with the number of players that are playing. And paying.

Another cool thing about this approach is that you don't have to use expensive DB tech, nor expensive DB hardware. You don't have to do fancy master-master replication. Make use of the application knowledge that player to player interaction is relatively rare, so you don't need transactionality on every request. Avoid that hard problem. It costs money and time to solve.

There is a phrase I heard from a great mentor thirty years ago: "embarrassingly parallel". You have an incredible number of players attached, and most actions that need to be persisted are entirely independent. Embarrassing, isn't it?

Now your only problem is how big to make the rest of the cluster around this monster DB. Where is the next bottleneck? I'll venture to say it is the game design. How many players do you really want all stuffed into one back alley or ale house? And how much content can your team produce? If you admit that you have a fixed volume of content, and a maximum playable density of players, what then?

Thursday, October 21, 2010

A lightweight MMO using web tech

I've been really heads-down on Super Hero Squad Online. It has been getting really great reviews, and is very fun to play. Beta is coming up, and we are expecting to go live early next year.

So I thought I'd try to cut some time free to record my thoughts about its architecture. But not today. We are getting ready for an internal release tomorrow.

But here are some topics I'd like to pursue:
  • scaling across multiple data centers
  • horizontal DB scalability
  • an approach that avoids back end race conditions when handing off a client between front end boxes
  • big load testing