Thursday, June 2, 2011

Techniques for Handling Cheating (Part 1)

Cheating is fun for some people. It is a game on top of your game. "Can I find a path through the maze of security mechanisms you have laid in my path?"

First, why does a developer care about cheating in online games?
  • They spent a lot of effort making content so they want to make sure players experience it instead of skipping over it and "stealing" the reward. The idea being that the players will have more fun facing the challenges and beating them. They'll appreciate it more if they have to work for it. Maybe. Some people are weird, and get a sense of appreciation out of working through the cheats.
  • Cheating can directly interfere with other players' enjoyment of the content, e.g. griefing, stealing their stuff, ...
  • The perception of unfairness (everyone else has all the goodies, and you don't; you can't win PvP without also cheating; ...). Players can get frustrated by this and leave, and the developer loses money.
  • It can interfere with the operation of the servers, and that interferes with other players' enjoyment of the game.
  • Cheaters can actually steal something of value. If they sell it (e.g. gold farming), that can affect the in-game economy or, more directly, the profitability of the company.
If players cheat and no one else notices but them, you probably don't care; let them have their fun. But if they cheat and stop paying you money, it matters even if they don't bother anyone else. That might happen if they get bored because they've maxed out their account easily, or they get everything they need without having a subscription (e.g. with a free account).

The interaction between cheaters and developers has been called an arms race. And there are a lot more players than developers. Developers can't really hope to keep up and close every possible issue. So at some point it becomes a cost-benefit decision. There will always be some cheating. You'll want to hit the big ones, and pick your battles.

There are a number of aspects to consider:
  • Detection: what is a cheat? Maybe it is gaining XP or loot too quickly. Test for this on the fly by adding logic to the game server? Run metrics queries against the DB or event logs periodically?
  • Reporting: put something in the server logs; send an alert email; weekly report out of the metrics system?
  • Mitigation: take away what they gained? ban them (and lose their subscription money)? Reimburse other players that have been harmed?
  • Prevention: do your best to secure the attack points of your system; check all client requests for sanity; do summary level real time rate limiting (detects your own bugs cheaters might exploit, speed hacks, bots/farming, aim-bots...); don't trust the client
Because this is an arms race, the enemy will find the edges of your detection and prevention system. E.g. they will fake a head shot just often enough not to get caught; they will farm gold just below the detection rate; ... So what you as a developer need to do is decide what rate of cheating is acceptable, and meets the goals of not letting cheaters ruin the fun of your game, or make you broke. Some titles have capped progress per day.
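As a rough illustration of the summary-level rate limiting mentioned above, here is a minimal sketch of a sliding-window cap on how much a player can gain (XP, gold, loot value) in a given period. The class name, the cap, and the window length are all made up for the example; a real server would pick values per game and decide separately what to do when the cap is hit (log, alert, or refuse the gain).

```python
from collections import deque


class GainRateLimiter:
    """Flags a player whose total gain inside a sliding time window exceeds a cap."""

    def __init__(self, max_gain, window_seconds):
        self.max_gain = max_gain              # hypothetical cap per window
        self.window_seconds = window_seconds  # hypothetical window length
        self.events = {}  # player_id -> deque of (timestamp, amount)
        self.totals = {}  # player_id -> sum of the amounts still in the window

    def record(self, player_id, amount, now):
        """Record a gain and return True if the player is still within the cap."""
        window = self.events.setdefault(player_id, deque())
        window.append((now, amount))
        total = self.totals.get(player_id, 0) + amount
        # Drop gains that have aged out of the window.
        while window and window[0][0] <= now - self.window_seconds:
            _, old_amount = window.popleft()
            total -= old_amount
        self.totals[player_id] = total
        return total <= self.max_gain
```

Because the check works on totals rather than on individual requests, it does not care whether the excess came from a bot, a speed hack, or one of your own bugs. It also shows the arms race problem directly: a farmer who stays just under max_gain is never flagged, so the cap is really a statement of how much cheating you are willing to tolerate.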

I think one of the best mitigation strategies is public shaming. It leaves cheaters thinking that "everyone" is watching them, and it lets non-cheaters see that you as a developer are paying attention. You can let players report on other players. Ban the egregious cheaters, especially if they are griefing other players. Of course, they will be back with a different email address if their goal in life is to cause trouble. But this is a slippery slope susceptible to gaming as well. If you provide a means for the community to use social pressure against perceived cheaters, it can also be exploited by cheaters for griefing. E.g. if you show the community the number of reports against a player, you might think it would highlight those that should be avoided. But some might consider it a badge of honor (among thieves), or worse will use it for extortion against unempowered innocents.

You will want some form of "ignore", however, that each player can apply to those they consider a cheater. It could be used to make sure a player never gets matched into a dungeon instance or PvP match with someone, or has to listen to their obnoxious chat. Ideally, it would stop them from interacting with your character at all, and make them invisible. Just imagine being in kindergarten, and all the other kids ignored you. You aren't kicking them out of the game, but almost. Again, this might be exploited. What if someone ignored every player that was better than them at PvP? It would artificially inflate their win rating, and your leaderboards would be unfair.

But let's talk about the technical aspects of cheat prevention. (Let's ignore server intrusion problems.) Ultimately, the way a player manipulates the system is through the messages their client sends to the server. If your client is bug free, and has not been tampered with, all is well. The messages are a result of a human operating the UI as the designers intended. The difference between two players is their skill and knowledge of the game. But how can the server be sure all is well? It can only look at the messages and try to differentiate between an untampered client and one that is tampered with or replaced with a script.
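To make "look at the messages" a little more concrete, here is a small sketch of one kind of sanity check: the server compares a client's reported position against the last position it accepted and rejects moves that would need an impossible speed. The speed limit, the tolerance factor, and the function name are assumptions for illustration only, not values from any particular game.

```python
import math

MAX_SPEED = 7.0   # hypothetical top speed, in world units per second
TOLERANCE = 1.1   # hypothetical slack for latency and timer jitter


def is_plausible_move(last_pos, last_time, new_pos, new_time,
                      max_speed=MAX_SPEED, tolerance=TOLERANCE):
    """Return True if the client could legitimately have moved this far."""
    elapsed = new_time - last_time
    if elapsed <= 0:
        # A message that claims no time (or negative time) has passed is suspect.
        return False
    distance = math.dist(last_pos, new_pos)
    return distance <= max_speed * elapsed * tolerance
```

The important design choice is that last_pos and last_time come from the server's own records, never from the client. A check like this cannot tell a skilled human from a careful script, but it does bound what a tampered client can claim.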

I'll post this and come back later with a discussion of different kinds of attacks and ways to deal with them.

Sunday, May 15, 2011

Super Hero Squad (our latest title) is now live

Things have been quiet here because all my attention was focused on Super Hero Squad (www.heroup.com). It is a Marvel title developed at The Amazing Society in Seattle, a studio of Gazillion. It is a lightweight MMO, uses the Unity graphics engine, Smartfox, Apache, some Java apps on the back end, and MySQL. It is shardless, and the architecture scales horizontally with the number of concurrent players, including the database. The back end components are loosely coupled based on JMS publish/subscribe.

It has definitely been a fun project, and I'm working with a team with lots of deep experience. Load is ramping up, but not yet near the load tests we ran ahead of time. So I'm paying attention, but not anxious about it.

Along the way, we found ways to ship early and still have a fun and stable game. But as with all MMOs that actually launch, there is a lot of work left to do when you are "done". The context switch is challenging right now to go from "we have to ship; we are not going to do that" to "remember those things we cut to simplify things; it's time to put them back on the table". Now we have the fun of changing things without breaking a running service. And monitoring and fixing the service cuts into development. So things slow down at the same time they get more reactive.

Sunday, February 27, 2011

Running branches for continuous publishing

I am a very strong proponent of what are called running branches for development of software, and for the stabilization and publication of online games. One of the more important features of large scale online games is that they live a long time, and have new content, bug fixes and new features added over time. It is very difficult to manage that much change with a relatively large amount of code and content. And since you continue to develop more after any release, you will want your developers to be able to continue working on the next release while the current one is still baking in QA, and rolling toward production.

I will skip the obvious first step of making the argument that version control systems (aka source code change control, revision control) are a good idea. I like Perforce. It has some nice performance advantages over Subversion for large projects, and has recently incorporated ease of use features like shelving and sandbox development. I like to call the main line of development mainline. I also like to talk about the process of cutting a release and deploying it into production as a "train". It makes you think about a long slow moving object that is really hard to stop, and really difficult to add things to and practically impossible to pull out and pass. And if you get in the way, it will run you down, and someone will lose a leg. Plus it helps with my analogy of mainline and branch lines.

So imagine you are preparing your first release. You make a build called Release Candidate 1 (RC1), and hand it off to QA. You don't want your developers to go idle, so you have two choices, they can pitch in on testing, or they can start working on release 2. You will probably do a bit of each, especially early in the release cycle, since you often dig up some obvious bugs, and can keep all your developers busy fixing those. But at some point they will start peeling off and need something to do. So you sic them on Release 2 features, and they start checking in code.

Then you find a bug. A real showstopper. It takes a day to find and fix. Then you do another build and you have RC1.1. But you don't want any of the Release 2 code that has been checked in over the last several days. It has new features you don't want to release, and has probably introduced bugs of its own. So you want to use your change control system to make a branch. And this is where the philosophy starts. You either make a new branch for every release, or you make a single Release Candidate branch and for each release, branch on top of it.

Being prepared ahead of time for branching can really save you time and confusion, especially during the high stress periods of pushing a release, or making a hotfix to production. So I'm really allergic to retroactive branching, where you only make a branch if you find a bug and have to go back and patch something.

Here's why: the build system has to understand where this code is coming from, or you will be doing a lot of manual changes right when things are the most stressed. If you have already decided to make branches, you will also have your build system prepared and tested to know how to build off the branch. You will also have solved little problems like how to name versions, prepare unambiguous version strings so you can track back from a build to the source it came from, and many more little surprises.
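As one possible way to get an unambiguous version string, the build can stamp each binary with the branch name, the release number, and the changelist it was synced to. The format below (for example RC-1.1-cl48213) and both function names are invented for this sketch; any scheme works as long as it maps back to exactly one point in source control.

```python
def make_version_string(branch, release, changelist):
    """Build a version string such as 'RC-1.1-cl48213' (hypothetical format)."""
    return f"{branch}-{release}-cl{changelist}"


def parse_version_string(version):
    """Recover (branch, release, changelist) from a string made above."""
    branch, release, changelist_part = version.rsplit("-", 2)
    if not changelist_part.startswith("cl"):
        raise ValueError(f"not a recognized version string: {version!r}")
    return branch, release, int(changelist_part[2:])
```

With a running branch the first field is always the same branch name, so the build script never needs editing between releases; only the release number and changelist move.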

The build system is another reason why I prefer running branches as opposed to a new branch per release. You don't have to change any build configuration when a new release comes along. The code for RC2 is going to be in exactly the same place as RC1. You just hit the build button. That kind of automation and repeatability is key to avoiding "little" mistakes. Like accidentally shipping the DB schema from last release, or wasting time testing the old level up mechanism, or missing the new mission descriptions.

And then there is the aesthetic reason. If you cut a branch for every release, your source control depot is going to start looking pretty ugly. You are planning on continuous release, right? Every month. After 5 years that would be 60 complete copies of the source tree. Why not just 2: ML and RC (and maybe LIVE, but let's save that for another time).

Finally, as a developer, if you are lucky enough to be the one making the hotfix, you will want to get a copy of the branch onto your machine. Do you really want another full copy for each release that comes along? Or do you just want to do an update to the one RC branch you've prepared ahead of time? It sure makes it easier to switch back and forth.

An aside about labels: You might argue you could label the code that went into a particular build, and that is a good thing. But one problem with labels that has always made me very nervous is that labels themselves are not change controlled. Someone might move a label to a different version of a file, or accidentally delete it or reuse it, and then you would lose all record of what actually went into a build. You can't do that with a branch. And if you tried, you would at least have the change control records to undo it.

One more minor thought: if you want to compare all the stuff that changed between RC1 and RC2, it is much easier to do in a running branch. You simply look at the file history on the RC branch and see what new stuff came in. To do that when using a branch per release requires a custom diff each time you want to know: e.g. drag a file from one branch onto the same file on the other. Pretty clumsy.

Also note that these arguments don't apply as well for a product that has multiple versions shipped and in the wild simultaneously. An online game pretty universally replaces the previous version with the new one at some point in time. The concurrency of their existence is only during the release process.

Summary:
  • You want to branch so you can stabilize without stopping ongoing work for the next release
  • You want a branch so you are ready to make hot fixes
  • You want a running branch so your build system doesn't have to get all fancy, and so your repo looks simpler.


I may revisit the topic of branching in the form of sandbox development which is useful for research projects and sharing between developers without polluting the mainline.

Sunday, January 16, 2011

Topics are not Message Types

I periodically have an unproductive conversation about how to use Topics/Categories vs how to use Message Types. Hopefully this time will be better.

Both things appear to be used to "subscribe", and both wind up filtering what a message handler has to process and gets to process. If they can be used for exactly the same purposes, it is "just" policy as to what you use each one for. That has to be wrong, otherwise there would not be *two* concepts. Thus, there has to be a useful distinction. So let's define what they are and what their responsibilities are.

First a definition or two:
  • Hierarchical: a name is defined hierarchically if the parent context is needed to ensure the child is distinct from children of other parents when the children have the same name. The parents provide the namespace in which the child is defined.
  • Orthogonal: names are independent of one another, like dimensions or axes in mathematics.

Categories are names (or numbers) that are used to decompose a stream of messages into groups. In JMS they are called Topics, but I'm going to avoid that term in case the implementation of Topics in JMS implies something I don't mean. A message is sent on, or "to" a single Category. A consumer subscribes to one or more Categories. Sophisticated message publish/subscribe or producer/consumer implementations can support wildcards or bitmasking to optimize subscription to large sets of Categories. (While not very germane to this discussion, I believe JMS can only have wildcards at the end of a Topic, and only at a dot that separates portions of the Topic. My view of wildcards and Category masking does not have that limitation. But that shouldn't affect my arguments.)
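Here is a minimal sketch of Category subscription with unrestricted wildcards, using shell-style patterns from Python's standard fnmatch module. The CategoryRouter class, the consumer names, and the dotted category names are all assumptions made up for the example; this is not how JMS or any particular broker implements it.

```python
from fnmatch import fnmatchcase


class CategoryRouter:
    """Decides which consumers should receive a message sent to a Category."""

    def __init__(self):
        self.subscriptions = {}  # consumer name -> set of category patterns

    def subscribe(self, consumer, pattern):
        # A pattern may contain '*' anywhere, not just at the end.
        self.subscriptions.setdefault(consumer, set()).add(pattern)

    def route(self, category):
        """Return the consumers subscribed to this category, sorted by name."""
        return sorted(
            consumer
            for consumer, patterns in self.subscriptions.items()
            if any(fnmatchcase(category, pattern) for pattern in patterns)
        )
```

Note that the router never looks inside the message. It only needs the Category the message was sent to, which is what lets the filtering happen at the sender or an intermediary.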

It is critical to have a mechanism that efficiently filters network messages so that a consuming process is not "bothered" by messages arriving that are immediately discarded. Running the TCP stack, for example, can wind up consuming large fractions of the CPU, and if the message is discarded, even after a simple inspection by your message framework, that is totally wasted processing. Further, if the messages are traveling over a low bandwidth link to a player, for example, it can badly affect their experience as it steals network resources from more important traffic. So we want the sender, or some intermediary to filter the messages earlier.

Early distributed simulation implementations (DIS) used multicast groups, and relied on the Network Interface hardware to filter out any messages in groups that the consumer had not subscribed to. Ethernet Multicast tends to broadcast all messages, and relies on the NIC of each host to inspect and filter unwanted messages. That is better than having the kernel do it. Switches get into the picture, but are very simplistic when it comes to multicast. When there are more than a few groups, switches and NICs will become promiscuous, and all messages get broadcast anyway, and wind up in each destination's kernel. They are filtered there, but much of the network stack has already executed. To get around that, physically segmented networks with intelligent bridges were built to copy a message from one segment to another. The bridge or rebroadcaster or smart-router would crack open each message and send it into another segment based on configuration, or a control protocol (subscription request messages).

Ancient history. However, it formed the origin of the concept of numeric Categories. A message is sent to a single Category. A consumer subscribes. The Channel/Category/Subscription manager maintains the declared connectivity and routes the messages.

So. Categories are used to optimize routing. They minimize the arrival of a message to a process. So far, this has nothing to do with what code is run when it arrives.

Message types are also names but are used to identify the meaning of a message; what the message is telling or requesting of the destination; what code should run when the message arrives (or what code should not run). Without a message type, there would be only one generic handler. In the old days, that master-handler would be a switch statement, branching on some field(s) of the message (let's call that field the message type, and be done with it).

There is some coded, static binding of a message type to a piece of code; the message handler. Handler X is for handling messages of type Y. A piece of code cannot process fields of a message different than what it was coded for. There is little reason to make that binding dynamic or data-driven. Static binding is "good". It leads to fewer errors, and those errors can be caught much earlier in the development cycle. Distributed systems are hard. You don't really want to catch a message-to-code mismatch after you've launched. One way to think about this static binding is as a Remote Procedure Call. You are telling a remote process to run the code bound to message Y. In fact, you can simplify your life by making the handler have the same name as the message type, and not even register the binding.
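The "same name, no registration" idea can be sketched in a few lines: the dispatcher looks up a method whose name is derived from the message type. The handle_ prefix, the InventoryHandlers class, and the dictionary message layout are assumptions for this example only.

```python
class InventoryHandlers:
    """Handlers are found by name: message type 'AddItem' -> handle_AddItem."""

    def __init__(self):
        self.items = []

    def handle_AddItem(self, message):
        self.items.append(message["item"])
        return len(self.items)


def dispatch(handlers, message):
    """Run the handler bound (by naming convention) to the message's type."""
    # The fixed prefix keeps a hostile type name from reaching arbitrary attributes.
    handler = getattr(handlers, "handle_" + message["type"], None)
    if handler is None:
        raise LookupError(f"no handler for message type {message['type']!r}")
    return handler(message)
```

Python only resolves the name at run time, so this sketch shows the naming convention rather than the compile-time checking a statically typed language would give you. Note also that nothing here mentions a Category.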

A message can be sent to any Category regardless of the message's type. There is no checking in code that a choice is "legal". The Category can be computed, and the message is bound to that value dynamically. Instances of the same message type can be sent to one of any number of Categories. Consumers can subscribe to any Category whether they know how to process all the message types it contains or not.

So. Back to the distinction. When code is declared to be able to handle messages of type Y, that does not imply that all message instances of type Y should arrive at the process with that handler. You may want to do something like load balancing where half the messages of type Y go to one process, and the other half go to a tandem process. So message types are independent of Categories. The two concepts are orthogonal.

When a process is subscribed to a Category, there is no guarantee to the subscriber about the message types that a producer sends to that Category. It is easy to imagine a process receiving messages it does not know how to handle. The sender can't force the receiver to write code, but the sender can put any Category on a message it wants. So Categories are independent of message types. The two concepts are orthogonal.

Now. With respect to hierarchy. Message type names can be declared within a hierarchical namespace. That can be pretty useful. At the end of the day, however, they are simply some strings, or bit strings. In a sophisticated system that maps message types to message classes (code), the class hierarchy may mirror the type name hierarchy, and have interesting semantics (like a handler for a base message class being able to handle a derived message class). But mostly, message type name hierarchy is useful to avoid collisions.

In systems like JMS, Categories (Topics) are also hierarchical. This is also done to avoid collisions in the topic namespace, and for organization. But it is also useful for wildcard subscription.

Now "the" question: are Categories within the Message Type Hierarchy, or are Message Types within the Category hierarchy? Or are they orthogonal to one another? I submit that a message of a given type means the same thing no matter which Category it arrived on. Further, the same message type can be sent to any Category and a Category can transport any number of different message types.

Since there is only one message exchange system, Categories cannot be reused for two purposes without merging the message streams. That leads to inefficiency. If you reuse a message type name for two different purposes, you run the risk of breaking handler code with what appears to be a malformed message. That leads to crashes. You could permit that kind of reuse, and institute policy and testing to keep those things from mingling (e.g. reuse message types, but only on different topics), but it is a looming disaster. I would put in some coordination mechanism or namespacing to keep the mingling from happening at all.

So what are the consequences:
  • There is no need to include Category when registering a message handler. 
  • Category subscription occurs separately from handler-to-message-type mapping, and affects the entire process.
  • There is no need to build a message dispatcher that looks at Categories.
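A small sketch of those consequences, with hypothetical names throughout: subscription is a per-process set of Categories, the handler table is keyed by message type alone, and delivery consults them in that order without ever combining them.

```python
class Process:
    """A consumer whose subscriptions and handlers are kept separate."""

    def __init__(self):
        self.categories = set()  # what this process has subscribed to
        self.handlers = {}       # message type -> handler; no Category involved

    def subscribe(self, category):
        self.categories.add(category)

    def register(self, message_type, handler):
        self.handlers[message_type] = handler

    def deliver(self, category, message):
        """Return the handler's result, or None if the message is not processed."""
        if category not in self.categories:
            return None  # a real system would filter this upstream, before arrival
        handler = self.handlers.get(message["type"])
        if handler is None:
            return None  # subscribed to the Category, but no code for this type
        return handler(message)
```

The same Damage message produces the same result whichever subscribed Category it arrives on, and a process can be subscribed to a Category that carries types it has no handler for.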
Well. That was pretty long-winded. For those of you still here, I have an analogy. I haven't thought it through a lot, but it looks like it fits (although it is about a pull system, not a push system). URLs. The hostname and domain name represent a hierarchical Category or Topic. The path portion is the message type and identifies the handler (web service), and is also hierarchical. You can host your web site on any host on any domain, and the functionality would be the same. You can host any web site on your host. You can host any number of web sites on your host, provided the paths don't collide. If they do collide, you are going to get strange behavior as links refer to the wrong services, or pass the wrong parameters. One would need more hierarchy. Or you don't host the colliding web sites together. You put them on different addresses. But the service code doesn't care what address you choose.

Unless you talk about virtual hosts, or virtual processes, multiple independent connections to the message system, thread-local subscriptions. You can do *anything* in software. But should you?