Wednesday, August 15, 2012

Time Management and Synchronization

A key principle of online games is synchronization. Keeping the various players in synch means giving them practically identical and consistent experiences of the virtual world. The reason this is hard is that they are separated by network delays, and worse, those delays are variable.

The ideal situation would be that all players saw exactly the same thing, and that there were no delays. How do we approach that ideal? Consider a use case: there is an observer that is watching two other players (A and B) racing toward a finish line. The observer should see the correct player cross first, and see that happen at the correct time delay from the beginning of the race. To make that sentence make sense, we need a consistent notion of time. I call this Virtual Time, and my views are drawn from computer science theory of the same name, and from the world of Distributed Interactive Simulation.

Let's say the start of the race is time zero. Both racers' clients make the start signal go green at time zero, and they begin accelerating. They send position updates to each other and the observer throughout the race, and try to render the other's position "accurately". What would that look like? Without any adjustment, on A's screen, he would start moving forward. After some network latency, B would start moving forward, and remain behind A all the way to the finish line. But on B's screen, B would start moving forward. After some network latency, B would start receiving updates from A, and would render him back at the start line, then moving forward, always behind. Who won? The Observer would see A's and B's updates arrive at approximately the same time, and pass the finish line at approximately the same time (assuming they accelerate nearly equally). Three different experiences. Unacceptable.

Instead, we add time stamps to every update message. When A starts moving the first update message has a virtual time stamp of zero. Instead of just a position, the update has velocity, and maybe acceleration information as well as the time stamp. When it arrives at B, some time has passed. Let's say it is Virtual Time 5 when it arrives. B will dead reckon A using the initial position, time stamp, and velocity of that time zero update to predict A's position for time 5. B will see himself and A approximately neck and neck. Much better. The Observer would see the same thing. As would A (by dead reckoning B's position).

This approach still has latency artifacts, but they are much reduced. For example, if A veered off from a straightline acceleration to the finish line, B would not know until after a little network delay. In the mean time, B would have rendered A "incorrectly". We expect this. When B gets the post-veering update, it will use it, dead reckon A's position to B's current Virtual Time, and compute the best possible approximation of A's current position. But the previous frame, B would have rendered A as being still on a straight line course. To adjust for this unexpected turn, B will have to start correcting A's rendered position smoothly. The dead reckoned position becomes a Goal or Target position that the consumer incrementally corrects for. If the delta is very large, you may want to "pop" the position to the correct estimate and get it over with. You can use the vehicle physics limits (or a little above) to decide how rapidly to correct its position. In any case, there are lots of ways to believably correct the rendered position to the estimated position.

Note that during the start of the race, at first B would still see A sitting still for a period of time, since it won't get the first update until after a network delay. But after the first delay, it will see that the estimated position should be up near where B has already moved itself. The correction mechanism will quickly match A's rendered position to the dead reckoned one. This may be perceived as impossibly large acceleration for that vehicle. But incorrect acceleration is less easily detected by a person than incorrect position, or teleporting.

Another use case is if A and B are firing at each other. Imagine that B is moving across in front of A, and A fires a missile right when B appears directly in A's path. Let's assume that the client that  owns the target always determines if a missile hits. If we were not using dead reckoning, by the time the missile update arrived back at B, B would have been 2 network delays off to the side, and the missile would surely miss. If we are using dead reckoning, A would be firing when B's estimated position was on target. There might be some discrepancy if B was veering from the predictable path. But there is still one more network latency to be concerned about. By the time the firing message arrives at B, B would be one network latency off to the side. B could apply some dead reckoning for the missile, flying it out a few Virtual Time ticks, but the angle from the firing point might be unacceptable, and would definitely not be the same as what A experienced.

A better, more synchronized, more fair experience would be to delay the firing of the missile on A, while sending the firing message immediately. In other words, when A presses the fire button, schedule the execution of the fire event 5 Virtual Time units in the future. Send that future event to B, and to itself. On both A and B, the fire event will occur simultaneously. I call this a "warm up" animation. The mage winds up, the missile smokes a bit before it fires, or whatever. The player has to learn to account for that little delay if they want the missile to hit.

If you have a game where firing is instantaneous (a rifle, or laser), this is a pretty much impossible problem. You have to put in custom code for this. The shooter decides ahead of time where the bullet will hit, and tells the target. By the time the target is told, both shooter and target have moved, so the rendering of the bullet fly out is not in the same location in space, and will possibly go through a wall. The point of impact is also prone to cheating, of course. I don't like games that have aiming for this reason. They are very difficult to make fair and cheat proof. I prefer target based games. You select a target, then use your weapons on it. The distributed system can then determine whether it worked. Aiming, and determining hits, shooting around corners, etc. is a tough problem. FPS engines are really tough. You should seriously consider whether your game needs to bite off that problem, or if you can make it fun enough using a target based system, and a little randomness. Then add decorative effects that express what the dice said.

The key take away with Time Management, Virtual Time, and synchronization is that whichever client actions are occurring on (A, B, or Observer), they occur in the same time axis. There are nice time synchronization algorithms (look up NTP), that rely on periodically exchanging time stamps using direct messages. And you can solve a lot of problems using events that are scheduled to occur in the future based on Virtual Time timestamps. One challenge though is how to keep the simulation that is being run using Virtual Time in synch with animation and physics that you might be tempted to run using real time. Or what you think is real time. You may have a game that you can pause. How does that work? Do you freeze the animation or not? You are already dealing with different concepts of time. Consider adding Virtual Time. And the notion of Goal positions.

Tuesday, May 29, 2012

Let's Call it MLT.

I need a name for the Ultimate Online Game Framework we've been discussing so people know what I'm refering to (crazy idea, or something for that project). I want it to be the greatest thing in the world. And as everyone knows, that is a nice MLT (Mutton, Lettuce and Tomato sandwich). I was toying with calling it Quantum, because I like the thought of Quantum Entanglement to represent entity state replication. So maybe the Quantum Distributed Entity System (QDES), and later the Quantum this and that.

I'm also really stuck on what language to write it in. C++ or Java. I think it is critical that a game development team use the same language on client and server. Otherwise the team gets split in half, and winds up not talking enough. Being able to swap people onto what ever needs work is really important for keeping to a schedule. And hiring from two piles of expertise is just that much harder. Java has so many great libraries. But C++ is so much more popular in the game industry. On mobile, you've got Objective-C. If you're using Unity, you've got C#.

To be honest, most of your game logic should probably be written (at least initially) in your favorite scripting language. Java and C++ both support Python, Lua, Javascript for example. IKVM appears to allow Mono to be embedded in Java, so you could maybe even do C# on both sides. This argument (http://www.mono-project.com/Scripting_With_Mono) is certainly compelling. Ever heard of Gecko? It embeds a C++ compiler so you can compile and load/run C++ "scripts" at run time. The reason I bring up all these options is: you could implement client and server in different languages, but embed the same scripting engine in both sides, so the game developers get the benefit of reusing their staff on both sides.

If the architecture and design of QDES is awesome enough, someone might reimplement it in their favorite language anyway. Or port it to a very different platform that doesn't support whatever language choice we make. So maybe the choice I'm stuck on isn't that serious right now. Maybe I should just start with the architecture and design.

I'm drafting some high level requirements and am thinking I'll post them here on a "page" instead of a blog article. Not sure how comments and such would work with that however. But I wanted something stickier than a post and something I could update and have a single reference to. Same with the designs as they start to come.

Anyway. Thanks for being a sounding board. Maybe some of you and your friends will have time to contribute to (or try out) all this. (And I haven't completely forgotten about doing that tech survey to see what other similar projects are out there in the open source world. I found a few. Some are a little dusty. Some are a little off track from what I'm thinking.)

Friday, May 25, 2012

Essential Components of the New Framework

What are the components that are absolutely essential to get right in the Ultimate MMO Framework (still need a catchy name)? If had those components today, you'd be using them in your current game, even if you had to integrate them to a bunch of other stuff you have to get a full solution. If we could build those essential pieces, and they were free, I think they would slowly take over the industry, shouldering less effective solutions out of the way. If each piece can be made independent of the others, there is a better chance the system will be adopted even if someone thinks it isn't all a perfect fit.

It is tempting to start with message passing. After all, it isn't an online game without that. But there are a lot of great message passing systems. There might even be some decent event handling systems. And there are definitely some good socket servers that combine the two. While I see problems with these that could be improved, and I see missing features, I think message passing is a second priority. You could make do with an existing system, and swap it out later. Although, I don't know how you can get away from the assumption that you have a publish/subscribe api that the higher levels can use.

More essential is the Entity System. It is where game play programmers interact with your system, and when done well, much of the rest of the system gets abstracted away. The decisions made here are what enable a server to scale, a client to stay in synch, content to get reused in interesting ways, and game play to be developed efficiently. BTW, what I mean by an Entity is a game object. Something that represents a thing or concept in the game world. The term comes from discrete event simulation. Jumping ahead a bit, the Entity System needs to be Component oriented such that an Entity is aggregated from a collection of Components. The Entity State system is the basis of replication to support distributed computation and load balancing, and of persistence. Done well, the Entity System can be used for more than just visible game objects, but could support administrative and configuration objects as well.

Related, but probably separatable is the Behavior system. How do get Entities to do something, and how do they interact with one another? I don't mean AI, I mean behavior in the OO encapsulated-state-and-behavior sense. It will be interesting to distinguish between and tie together AI "plans", complex sequences of actions, and individual behavior "methods". And, of course, the question of languages, scripting and debugging land right here. (A second priority that relates to Behaviors is time management. How do you synchronize client and server execution, can you pause, can you persist an in-flight behavior?)

Content Tools are the next big ticket item. If the Framework is stable enough, a new game could be made almost exclusively using these tools. Close to 100% of your budget would be spent pushing content through them in some imaginary perfect world. These tools allow for rapid iteration and debugging across multiple machines, and multiple platforms. It is not clear how much of the tools can be made independent of Entity state and behavior choices.

What other systems are absolutely essential? They drive everything else? They don't already exist, and you feel like you always have to rebuild them?

What do you think? Wouldn't it be cool to have a standalone Entity System that you could use in your next game? What if it came with a collection of Components that solved a lot of major problems like movement, collision, visibility, containment?

Saturday, May 19, 2012

Why I don't like threads

People say I'm crazy because I don't like threads in this day and age of multicore processors. But I have good reason based on years of experience, so hear me out.

Doing threading well is harder than most people think. Making a decision to multithread your application imposes on all developers, including those less experienced with such things than the ones normally making that decision. I like to think that framework and server developers are much more experienced with such things than game developers who might be plugging stuff into the server. And framework developers are building their framework to be used by those game play programmers. Play to your audience. Wouldn't it be better if they never had to know about threading issues? I believe it is possible to build a framework that insulates regular developers from such concerns (the communicating sequential process model).

Now I'm a big fan of using threading for what it is good for: dealing with interrupts that should be serviced asap without polluting your main application with lots of polling and checking. Examples of this would be a background thread for servicing the network (you want to drain the network buffers quickly so they don't overflow and back things up and cause retransmissions); background file or keyboard I/O which needs to rarely wake up and service an incoming or outgoing IO buffer; remote requests that block for a long time and would otherwise stall the app (like a DB request, or http request). Note in particular that none of these are high performance computations. They are really dominated by the blocking/waiting time. The use of a thread in this case is really all about an easier programming model. The background thread can be written as a loop that reads or writes using blocking, and since it is not done on the main thread, the main thread doesn't have checks and polling sprinkled in.

Most of the time, when you have some heavy computation to do, it will eventually scale up to require more than a single machine anyway. So you are going to have to write your system to be distributed and communicate between the machines using messages anyway. If you have already done that work, you can easily use it within a single machine that has many cores. If you try to build a system that makes use of the many cores within a machine by using threads, and you also solve the distributed case, you've doubled your work and maintenance effort. One of the best ways to decompose a problem to be solved by worker threads is to deliver the work in a task queue and have them contend for it. As each is pulled off, it is processed by a handler function. This is exactly the same approach you would use for the distributed case. So the only difference is that in one case you have messages passing between processes on the same machine, or task-messages passing between threads. Yes, I understand the performance difference. But if your app is that performance sensitive, the inter-process message passing can be implemented using shared memory and avoid the kernel switches when delivering a message to the current machine. The intent here is to save you from the double implementation, save your framework users from having to deal with thread programming, and the performance difference is pretty small.

There is also a big problem with heavily threaded apps in production. There are pretty lousy tools for helping you figure out thread related performance problems. When you are only dealing with background interrupt-handling threads, there are not going to be serious performance problems unless one of the background threads starts polling wildly and consuming 100% cpu. But if a highly multithreaded app starts using too much CPU, or starts being unresponsive, how do you tell what is actually happening among the various threads? They don't have names, and the kernel isn't very good about helping you keep track of what work happens on each thread. You wind up having to build your own instrumentation into the application every time there is such a problem. And doing a new build and getting it into production is a lot of effort. On the other hand, if you follow the distributed model, you can easily see which process is spiking CPU. You can easily instrument the message flows between processes to see if there is too much or too little inter-process communication. Often you wind up logging all such traffic for post-mortem analysis anyway. Remember, you are not likely to have the luxury of attaching a debugger to a production process to grab stack traces, or what not. So you are going to be staring at a monolithic multithreaded app and trying to guess what is going on inside.

Problems in threaded apps tend to be subtle, they wind up being hard to debug, and often only show up after the app has been running at production loads for quite a while. Writing good threaded software is the responsibility of every programmer that touches an app that adopts it, and the least experienced programmer in that app is the one you have to worry about. Operating and debugging the app in production is not easy. These are the reasons I don't like threads. I'm not afraid of them. I understand them all too well. I think there are better ways (CSP) that are actually easier and faster to develop for in the first place. And you are likely to have to adopt those ways in any case as you scale beyond a single machine.

More thoughts on this subject here: http://onlinegametechniques.blogspot.com/2009/02/manifesto-of-multithreading-for-high.html

(Any position statement like this is going to sound a little nuts if you try to apply to every conceivable situation. What I was thinking about when I wrote this was a server application, in particular, something large scale, and event oriented. If, for example, you are writing a graphical client on a 360, using multiple processes would be looney. Multiple processes listening on the same socket can be a problem (with some workarounds). You might not be able to abide even a shared memory message passing delay between components, like in a rendering pipeline. Interestingly, these examples are all amenable to the same analysis: what is the response time required, what resources are being shared, how much will the computation have to scale up, is the physical hardware intrinsically distributed anyway? My point is the default answer should be to encapsulate any threading you *have to* do so that the bulk of your development doesn't have to pay the daily overhead of always asking: is that line of code accessing anything shared; is this data structure thread safe? It slows down important conversations, and it leaves shadows of a doubt everywhere.)

Tuesday, May 15, 2012

Ultimate Open Source Game Development System

I'm between jobs again. It happens a lot in this industry. One of the frustrating things about that is you leave behind investments you've made in building infrastructure that was supposed to save you effort on future projects. If the company you're leaving fails, that infrastructure investment won't help anyone. So what do you do? Just build another one for the next company? Rinse, repeat. Buy something? Can't influence that much. Build a company that sells infrastructure? Pretty hard to make a profit selling to us picky developers. Open source? Let's think about that...

If I was to build an open source game development system, what would I focus on? I'm not a graphics whiz, but I know about simulation, OO, distributed systems, MMOs, and picky developers. Let's see here...
  • Pick a primary development language. But realize that not all developers will love it. Is there a way to support multiple languages?
  • Rapid iteration is key, so game logic must be able to be written in a scripting language. But it must also be possible to hard code parts of it for performance.
  • Tools are key. In a good game development project the majority of the effort of the team should feed through the content tools, not the compiler. Wouldn't it be ideal if you could build a great game without any programmers?
  • It must be debuggable. In a distributed system, this requires some thought.
  • The world size must be able to scale. A lot of projects are bending their game design to avoid this problem, and that is definitely the least expensive approach. But if your infrastructure supports large scale on day one, what could you do?
  • You want reuse of game logic, realizing different developers/designers have different skills. This means you want a hierachy of game elements that are developed by appropriate experts, and snapped together by others. This game object and level design effort should be efficient and fun.
  • The team size must be able to scale; both up and down. This is a matter of content management. You don't want central locked files everyone contends for, and you don't want burdensome processes if you are a small team.
  • It should be runtime efficient in space and time. This enables use on games that have huge numbers of game object instances.
  • It is easy to ask for dynamic object definition, but that can work against performance. How often is that used? And what other ways are there realize that effect?
  • The framework should be intuitive to as many people as possible. This means being very careful about terminology.
  • Now we enter an area I don't know much about. How do you structure the infrastructure development project itself? You want buy in from lots of people, but you also need a single vision so the result is consistent. You need a means to adjust the vision without blowing up the whole effort, and without encouraging people to fork the effort.
  • What will be the relationship to other open source projects? We can choose to use various utility libraries. But what about using other game projects for graphics? Issues will arise at the boundaries.
  • The system should be useful and get some adoption even without being finished. Because it will never be finished.
  • It should be modular enough. That allows a developer to use the parts they want, and replace the parts they must. It allows broken parts to be replaced.
  • We will need a demo game. How ambitious should it be?
  • We need a name. And a vision statement.
Is this kind of thing really needed by our industry? Why is so much infrastruture rewritten all the time? Is it just the projects I've hit? Do we always push the envelope such that if this existed today we'd spend our entire engineering budget extending it? Or is it a fundamental truth that there can be no Ultimate infrastructure because every game is too different? Lost cause, or an idea whose time has come.

I think the first thing to do is a survey the current state of open source game systems.

Thursday, June 2, 2011

Techniques for Handling Cheating (Part 1)

Cheating is fun for some people. It is a game on top of your game. "Can I find a path through the maze of security mechanisms you have laid in my path?"

First, why does a developer, care about cheating in online games?
  • They spent a lot of effort making content so they want to make sure players experience it instead of skipping over it and "stealing" the reward. The idea being that the players will have more fun facing the challenges and beating them. They'll appreciate it more if they have to work for it. Maybe. Some people are weird, and get a sense of appreciation out of working through the cheats.
  • Cheating can directly interfere with other player's enjoyment of the content. E.g. griefing, stealing their stuff,...
  • The perception of unfairness (everyone else has all the goodies, and you don't; you can't win PvP without also cheating; ...). Players can get frustrated by this and leave, and the developer loses money.
  • It can interfere with the operation of the servers, and that interferes with other players' enjoyment of the game.
  • Cheaters can actually steal something of value. If they sell it (e.g. gold farming), that can affect in game economy, or more directly, affect the profitability of the company.
If players cheat and no one else notices but them, you probably don't care, let them have their fun. But if they cheat and stop paying you money it matters even if they don't bother anyone else. That might happen if they get bored because they've maxed out their account easily, or they get everything they need without having a subscription (e.g. with free account).

The interaction between cheaters and developers has been called an arms race. And there are a lot more players than developers. Developers can't really hope to keep up and close every possible issue. So at some point it becomes a cost benefit thing. There will always be some cheating. You'll want to hit the big ones, and pick your battles.

There are a number of aspects to consider:
  • Detection: what is a cheat? Maybe it is gaining XP or loot too quickly. Test for this on the fly by adding logic to the game server? Run metrics queries against the DB or event logs periodically?
  • Reporting: put something in the server logs; send an alert email; weekly report out of the metrics system?
  • Mitigation: take away what they gained? ban them (and lose their subscription money)? Reimburse other players that have been harmed?
  • Prevention: do your best to secure the attack points of your system; check all client requests for sanity; do summary level real time rate limiting (detects your own bugs cheaters might exploit, speed hacks, bots/farming, aim-bots...); don't trust the client
Because this is an arms race, the enemy will find the edges of your detection and prevention system. E.g. they will fake a head shot just often enough not to get caught; they will farm gold just below the detection rate; ... So what you as a developer need to do is decide what rate of cheating is acceptable, and meets the goals of not letting cheaters ruin the fun of your game, or make you broke. Some titles have capped progress per day.

I think one of best mitigation strategies is public shaming. It leaves cheaters thinking that "everyone" is watching them, and it lets non-cheaters see that you as a developer are paying attention. You can let players report on other players. Ban the egregious cheaters, especially if they are greifing other players. Of course, they will be back with a different email address if their goal in life is to cause trouble. But this is a slippery slope susceptible to gaming as well. If you provide a means for the community to use social pressure against perceived cheaters, it can also be exploited by cheaters for greifing. E.g. if you show the community the number of reports against a player, you might think it would highlight those that should be avoided. But some might consider it a badge of honor (among thieves), or worse will use it for extortion against unempowered innocents.

You will want some form of "ignore", however, that each player can apply to those they consider a cheater. It could be used to make sure a player never gets matched into a dungeon instance or PvP match with someone, or have to listen to their obnoxious chat. Ideally, it would stop them from interacting with your character at all, and make them invisible. Just imagine being in kindergarten, and all the other kids ignored you. You aren't kicking them out of the game, but almost. Again, this might be exploited. What if someone ignored every player that was better than them at PvP. It would artificially inflate their win rating, and your leaderboards would be unfair.

But let's talk about the technical aspects of cheat prevention. (Let's ignore server intrusion problems.) Ultimately, the way a player manipulates the system is through the messages their client sends to the server. If your client is bug free, and has not been tampered with, all is well. The messages are a result of a human operating the UI as the designers intended. The difference between two players is their skill and knowledge of the game. But how can the server be sure all is well. It can only look at the messages and try to differentiate between an untampered client and one that is tampered with or replaced with a script.

I'll post this and come back later with a discussion of different kinds of attacks and ways to deal with them.

Sunday, May 15, 2011

Super hero Squad (our latest title) is now live

Things have been quiet here because all my attention was focused on Super Hero Squad (www.heroup.com). It is a Marvel title developed at The Amazing Society in Seattle, a studio of Gazillion. It is a light weight MMO, uses the Unity graphics engine, Smartfox, Apache, some Java apps on the back end, and MySQL. It is shardless, and the architecture scales horizontally with the number of concurrent players, including the database. The back end components are loosely coupled based on JMS publish/subscribe.

It has definitely been a fun project, and I'm working with a team with lots of deep experience. Load is ramping up, but not yet near the load tests we ran ahead of time. So I'm paying attention, but not anxious about it.

Along the way, we found ways to ship early and still have a fun and stable game. But as with all MMO's that actually launch, there is a lot of work left to do when you are "done". The context switch is challenging right now to go from: "we have to ship; we are not going to do that", to "remember those things we cut to simplify things; its time to put them back on the table". Now we have the fun of changing things without breaking a running service. And monitoring and fixing the service cuts into development. So things slow down at the same time they get more reactionary.