<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-1757856588756891210</id><updated>2011-10-15T06:24:49.453-07:00</updated><category term='Content Development'/><category term='Interest Management'/><category term='Architecture'/><category term='Software Engineering'/><category term='Security'/><category term='Load balancing'/><category term='Industry'/><category term='Fault Tolerance'/><category term='Off topic'/><category term='Networking'/><category term='Processing models'/><category term='Scalability'/><title type='text'>Online Game Techniques</title><subtitle type='html'>Small and large scale online game techniques. Parallel and distributed systems. Technical philosophy. 
Some software engineering.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>51</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-6154038161662410405</id><published>2011-06-02T23:31:00.000-07:00</published><updated>2011-06-02T23:31:38.906-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Security'/><category scheme='http://www.blogger.com/atom/ns#' term='Fault Tolerance'/><title type='text'>Techniques for Handling Cheating (Part 1)</title><content type='html'>Cheating is fun for some people. It is a game on top of your game. "Can I find a path through the maze of security mechanisms you have laid in my path?"&lt;br /&gt;&lt;br /&gt;First, why does a developer care about cheating in online games?&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Developers spend a lot of effort making content, so they want to make sure players experience it instead of skipping over it and "stealing" the reward. The idea is that the players will have more fun facing the challenges and beating them. They'll appreciate it more if they have to work for it. Maybe. 
Some people are weird, and get a sense of appreciation out of working through the cheats.&lt;/li&gt;&lt;li&gt;Cheating can directly interfere with other players' enjoyment of the content. E.g. griefing, stealing their stuff,... &lt;/li&gt;&lt;li&gt;The perception of unfairness (everyone else has all the goodies, and you don't; you can't win PvP without also cheating; ...). Players can get frustrated by this and leave, and the developer loses money.&lt;/li&gt;&lt;li&gt;It can interfere with the operation of the servers, and that interferes with other players' enjoyment of the game.&lt;/li&gt;&lt;li&gt;Cheaters can actually steal something of value. If they sell it (e.g. gold farming), that can affect the in-game economy, or more directly, affect the profitability of the company. &lt;/li&gt;&lt;/ul&gt;If players cheat and no one else notices but them, you probably don't care; let them have their fun. But if they cheat and stop paying you money, it matters even if they don't bother anyone else. That might happen if they get bored because they've maxed out their account easily, or they get everything they need without having a subscription (e.g. with a free account).&lt;br /&gt;&lt;br /&gt;The interaction between cheaters and developers has been called an arms race. And there are a lot more players than developers. Developers can't really hope to keep up and close every possible issue. So at some point it becomes a cost-benefit decision. There will always be some cheating. You'll want to hit the big ones, and pick your battles.&lt;br /&gt;&lt;br /&gt;There are a number of aspects to consider:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Detection: what is a cheat? Maybe it is gaining XP or loot too quickly. Test for this on the fly by adding logic to the game server? 
Run metrics queries against the DB or event logs periodically?&lt;/li&gt;&lt;li&gt;Reporting: put something in the server logs; send an alert email; weekly report out of the metrics system?&lt;/li&gt;&lt;li&gt;Mitigation: take away what they gained? Ban them (and lose their subscription money)? Reimburse other players that have been harmed?&lt;/li&gt;&lt;li&gt;Prevention: do your best to secure the attack points of your system; check all client requests for sanity; do summary-level, real-time rate limiting (detects your own bugs cheaters might exploit, speed hacks, bots/farming, aim-bots...); don't trust the client&lt;/li&gt;&lt;/ul&gt;Because this is an arms race, the enemy will find the edges of your detection and prevention system. E.g. they will fake a head shot just often enough not to get caught; they will farm gold just below the detection rate; ... So what you as a developer need to do is decide what rate of cheating is acceptable, and meets the goals of not letting cheaters ruin the fun of your game, or make you broke. Some titles have capped progress per day.&lt;br /&gt;&lt;br /&gt;I think one of the best mitigation strategies is public shaming. It leaves cheaters thinking that "everyone" is watching them, and it lets non-cheaters see that you as a developer are paying attention. You can let players report on other players. Ban the egregious cheaters, especially if they are griefing other players. Of course, they will be back with a different email address if their goal in life is to cause trouble. But this is a slippery slope susceptible to gaming as well. If you provide a means for the community to use social pressure against perceived cheaters, it can also be exploited by cheaters for griefing. E.g. if you show the community the number of reports against a player, you might think it would highlight those that should be avoided. 
But some might consider it a badge of honor (among thieves), or worse will use it for extortion against unempowered innocents.&lt;br /&gt;&lt;br /&gt;You will want some form of "ignore", however, that each player can apply to those they consider a cheater. It could be used to make sure a player never gets matched into a dungeon instance or PvP match with someone, or has to listen to their obnoxious chat. Ideally, it would stop them from interacting with your character at all, and make them invisible. Just imagine being in kindergarten, and all the other kids ignored you. You aren't kicking them out of the game, but almost. Again, this might be exploited. What if someone ignored every player that was better than them at PvP? It would artificially inflate their win rating, and your leaderboards would be unfair.&lt;br /&gt;&lt;br /&gt;But let's talk about the technical aspects of cheat prevention. (Let's ignore server intrusion problems.) Ultimately, the way a player manipulates the system is through the messages their client sends to the server. If your client is bug free, and has not been tampered with, all is well. The messages are a result of a human operating the UI as the designers intended. The difference between two players is their skill and knowledge of the game. But how can the server be sure all is well? 
It can only look at the messages and try to differentiate between an untampered client and one that is tampered with or replaced with a script.&lt;br /&gt;&lt;br /&gt;I'll post this and come back later with a discussion of different kinds of attacks and ways to deal with them.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-6154038161662410405?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/6154038161662410405/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2011/06/techniques-for-handling-cheating-part-1.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/6154038161662410405'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/6154038161662410405'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2011/06/techniques-for-handling-cheating-part-1.html' title='Techniques for Handling Cheating (Part 1)'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-2633146060069956738</id><published>2011-05-15T01:14:00.000-07:00</published><updated>2011-05-15T01:14:12.990-07:00</updated><title type='text'>Super hero Squad (our latest title) is now live</title><content type='html'>Things have been quiet here because all my attention was focused on Super Hero Squad (&lt;a 
href="http://www.heroup.com/"&gt;www.heroup.com&lt;/a&gt;). It is a Marvel title developed at The Amazing Society in Seattle, a studio of Gazillion. It is a lightweight MMO, uses the Unity graphics engine, Smartfox, Apache, some Java apps on the back end, and MySQL. It is shardless, and the architecture scales horizontally with the number of concurrent players, including the database. The back end components are loosely coupled based on JMS publish/subscribe.&lt;br /&gt;&lt;br /&gt;It has definitely been a fun project, and I'm working with a team with lots of deep experience. Load is ramping up, but not yet near the load tests we ran ahead of time. So I'm paying attention, but not anxious about it.&lt;br /&gt;&lt;br /&gt;Along the way, we found ways to ship early and still have a fun and stable game. But as with all MMOs that actually launch, there is a lot of work left to do when you are "done". The context switch is challenging right now to go from: "we have to ship; we are not going to do that", to "remember those things we cut to simplify things; it's time to put them back on the table". Now we have the fun of changing things without breaking a running service. And monitoring and fixing the service cuts into development. 
So things slow down at the same time they get more reactive.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-2633146060069956738?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/2633146060069956738/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2011/05/super-hero-squad-our-latest-title-is.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/2633146060069956738'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/2633146060069956738'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2011/05/super-hero-squad-our-latest-title-is.html' title='Super hero Squad (our latest title) is now live'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-7806504724622413361</id><published>2011-02-27T20:18:00.000-08:00</published><updated>2011-02-27T20:18:06.388-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Software Engineering'/><title type='text'>Running branches for continuous publishing</title><content type='html'>I am a very strong proponent of what are called running branches for development of software, and for the stabilization and publication of online games. 
One of the more important features of large scale online games is that they live a long time, and have new content, bug fixes and new features added over time. It is very difficult to manage that much change with a relatively large amount of code and content. And since you continue to develop more after any release, you will want your developers to be able to continue working on the next release while the current one is still baking in QA, and rolling toward production.&lt;br /&gt;&lt;br /&gt;I will skip the obvious first step of making the argument that version control systems (aka source code change control, revision control) are a good idea. I like Perforce. It has some nice performance advantages over Subversion for large projects, and has recently incorporated ease of use features like shelving and sandbox development. I like to call the main line of development mainline. I also like to talk about the process of cutting a release and deploying it into production as a "train". It makes you think about a long slow moving object that is really hard to stop, and really difficult to add things to and practically impossible to pull out and pass. And if you get in the way, it will run you down, and someone will lose a leg. Plus it helps with my analogy of mainline and branch lines.&lt;br /&gt;&lt;br /&gt;So imagine you are preparing your first release. You make a build called Release Candidate 1 (RC1), and hand it off to QA. You don't want your developers to go idle, so you have two choices, they can pitch in on testing, or they can start working on release 2. You will probably do a bit of each, especially early in the release cycle, since you often dig up some obvious bugs, and can keep all your developers busy fixing those. But at some point they will start peeling off and need something to do. So you sic them on Release 2 features, and they start checking in code.&lt;br /&gt;&lt;br /&gt;Then you find a bug. A real showstopper. It takes a day to find and fix. 
Then you do another build and you have RC1.1. But you don't want any of the Release 2 code that has been checked in over the last several days. It has new features you don't want to release, and has probably introduced bugs of its own. So you want to use your change control system to make a branch. And this is where the philosophy starts. You either make a new branch for every release, or you make a single Release Candidate branch and for each release, branch on top of it.&lt;br /&gt;&lt;br /&gt;Being prepared ahead of time for branching can really save you time, and confusion, especially during the high stress periods of pushing a release, or making a hotfix to production. So I'm really allergic to retroactive branching, where you only make a branch if you find a bug and have to go back and patch something.&lt;br /&gt;&lt;br /&gt;Here's why: the build system has to understand where this code is coming from, or you will be doing a lot of manual changes right when things are the most stressed. If you have already decided to make branches, you will also have your build system prepared and tested to know how to build off the branch. You will also have solved little problems like how to name versions, prepare unambiguous version strings so you can track back from a build to the source it came from, and many more little surprises.&lt;br /&gt;&lt;br /&gt;The build system is another reason why I prefer running branches as opposed to a new branch per release. You don't have to change any build configuration when a new release comes along. The code for RC2 is going to be in exactly the same place as RC1. You just hit the build button. That kind of automation and repeatability is key to avoiding "little" mistakes. Like accidentally shipping the DB schema from last release, or wasting time testing the old level up mechanism, or missing the new mission descriptions.&lt;br /&gt;&lt;br /&gt;And then there is the aesthetic reason. 
If you cut a branch for every release, your source control depot is going to start looking pretty ugly. You are planning on continuous release, right? Every month. After 5 years that would be 60 complete copies of the source tree. Why not just 2: ML and RC (and maybe LIVE, but let's save that for another time).&lt;br /&gt;&lt;br /&gt;Finally, as a developer, if you are lucky enough to be the one making the hotfix, you will want to get a copy of the branch onto your machine. Do you really want another full copy for each release that comes along? Or do you just want to do an update to the one RC branch you've prepared ahead of time? It sure makes it easier to switch back and forth.&lt;br /&gt;&lt;br /&gt;An aside about labels: You might argue you could label the code that went into a particular build, and that is a good thing. But one problem with labels that has always made me very nervous is that labels themselves are not change controlled. Someone might move a label to a different version of a file, or accidentally delete it or reuse it, and then you would lose all record of what actually went into a build. You can't do that with a branch. And if you tried, you would at least have the change control records to undo it.&lt;br /&gt;&lt;br /&gt;One more minor thought: if you want to compare all the stuff that changed between RC1 and RC2, it is much easier to do in a running branch. You simply look at the file history on the RC branch and see what new stuff came in. To do that when using a branch per release requires a custom diff each time you want to know: e.g. drag a file from one branch onto the same file on the other. Pretty clumsy.&lt;br /&gt;&lt;br /&gt;Also note that these arguments don't apply as well for a product that has multiple versions shipped and in the wild simultaneously. An online game pretty universally replaces the previous version with the new one at some point in time. 
The concurrency of their existence is only during the release process.&lt;br /&gt;&lt;br /&gt;Summary:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;You want to branch so you can stabilize without stopping ongoing work for the next release&lt;/li&gt;&lt;li&gt;You want a branch so you are ready to make hot fixes&lt;/li&gt;&lt;li&gt;You want a running branch so your build system doesn't have to get all fancy, and so your repo looks simpler.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;I may revisit the topic of branching in the form of sandbox development which is useful for research projects and sharing between developers without polluting the mainline.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-7806504724622413361?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/7806504724622413361/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2011/02/running-branches-for-continuous.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/7806504724622413361'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/7806504724622413361'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2011/02/running-branches-for-continuous.html' title='Running branches for continuous publishing'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-2906368106187448206</id><published>2011-01-16T00:50:00.000-08:00</published><updated>2011-01-16T00:50:00.790-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Architecture'/><category scheme='http://www.blogger.com/atom/ns#' term='Software Engineering'/><category scheme='http://www.blogger.com/atom/ns#' term='Networking'/><category scheme='http://www.blogger.com/atom/ns#' term='Interest Management'/><title type='text'>Topics are not Message Types</title><content type='html'>I periodically have an unproductive conversation about how to use Topics/Categories vs how to use Message Types. Hopefully this time will be better.&lt;br /&gt;&lt;br /&gt;Both things appear to be used to "subscribe", and both wind up filtering what a message handler has to process and gets to process. If they can be used for exactly the same purposes, it is "just" policy as to what you use each one for. That has to be wrong, otherwise there would not be *two* concepts. Thus, there has to be a useful distinction. So let's define what they are and what their responsibilities are.&lt;br /&gt;&lt;br /&gt;First a definition or two:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Hierarchical: a name is defined hierarchically if the parent context is needed to ensure the child is distinct from children of other parents when the children have the same name. The parents provide the namespace in which the child is defined.&lt;/li&gt;&lt;li&gt;Orthogonal: names are independent of one another, like dimensions or axes in mathematics.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Categories are names (or numbers) that are used to decompose a stream of messages into groups. In JMS they are called Topics, but I'm going to avoid that term in case the implementation of Topics in JMS implies something I don't mean. A message is sent on, or "to" a single Category. 
A consumer subscribes to one or more Categories. Sophisticated message publish/subscribe or producer/consumer implementations can support wildcards or bitmasking to optimize subscription to large sets of Categories. (While not very germane to this discussion, I believe JMS can only have wildcards at the end of a Topic, and only at a dot that separates portions of the Topic. My view of wildcards and Category masking does not have that limitation. But that shouldn't affect my arguments.)&lt;br /&gt;&lt;br /&gt;It is critical to have a mechanism that efficiently filters network messages so that a consuming process is not "bothered" by messages arriving that are immediately discarded. Running the TCP stack, for example, can wind up consuming large fractions of the CPU, and if the message is discarded, even after a simple inspection by your message framework, that is totally wasted processing. Further, if the messages are traveling over a low bandwidth link to a player, for example, it can badly affect their experience as it steals network resources from more important traffic. So we want the sender, or some intermediary to filter the messages earlier.&lt;br /&gt;&lt;br /&gt;Early distributed simulation implementations (DIS) used multicast groups, and relied on the Network Interface hardware to filter out any messages in groups that the consumer had not subscribed to. Ethernet Multicast tends to broadcast all messages, and rely on the NIC of each host to inspect and filter unwanted messages. That is better than having the kernel do it. Switches get into the picture, but are very simplistic when it comes to multicast. When there are more than a few groups, switches and NICs will become promiscuous, and all messages get broadcast anyway, and wind up in each destination's kernel. They are filtered there, but much of the network stack has already executed. 
To get around that, physical network segmentation with intelligent bridges was built to copy a message from one segment to another. The bridge or rebroadcaster or smart-router would crack open each message and send it into another segment based on configuration, or a control protocol (subscription request messages).&lt;br /&gt;&lt;br /&gt;Ancient history. However, it formed the origin of the concept of numeric Categories. A message is sent to a single Category. A consumer subscribes. The Channel/Category/Subscription manager maintains the declared connectivity and routes the messages.&lt;br /&gt;&lt;br /&gt;So. Categories are used to optimize routing. They minimize the arrival of a message to a process. So far, this has nothing to do with what code is run when it arrives.&lt;br /&gt;&lt;br /&gt;Message types are also names but are used to identify the meaning of a message; what the message is telling or requesting of the destination; what code should run when the message arrives (or what code should not run). Without a message type, there would be only one generic handler. In the old days, that master-handler would be a switch statement, branching on some field(s) of the message (let's call that field the message type, and be done with it).&lt;br /&gt;&lt;br /&gt;There is some coded, static binding of a message type to a piece of code; the message handler. Handler X is for handling messages of type Y. A piece of code cannot process fields of a message different than what it was coded for. There is little reason to make that binding dynamic or data-driven. Static binding is "good". It leads to fewer errors, and those errors can be caught much earlier in the development cycle. Distributed systems are hard. You don't really want to catch a message-to-code mismatch after you've launched. One way to think about this static binding is as a Remote Procedure Call. You are telling a remote process to run the code bound to message Y. 
In fact, you can simplify your life by making the handler have the same name as the message type, and not even register the binding.&lt;br /&gt;&lt;br /&gt;A message can be sent to any Category regardless of the message's type. There is no checking in code that a choice is "legal". The Category can be computed, and the message is bound to that value dynamically. Instances of the same message type can be sent to one of any number of Categories. Consumers can subscribe to any Category whether they know how to process all the message types it contains or not.&lt;br /&gt;&lt;br /&gt;So. Back to the distinction. When code is declared to be able to handle messages of type Y, that does not imply that all message instances of type Y should arrive at the process with that handler. You may want to do something like load balancing where half the messages of type Y go to one process, and the other half go to a tandem process. So message types are independent of Categories. The two concepts are orthogonal.&lt;br /&gt;&lt;br /&gt;When a process is subscribed to a Category, there is no guarantee to the subscriber about the message types that a producer sends to that Category. It is easy to imagine a process receiving messages it does not know how to handle. The sender can't force the receiver to write code, but the sender can put any Category on a message it wants. So Categories are independent of message types. The two concepts are orthogonal.&lt;br /&gt;&lt;br /&gt;Now. With respect to hierarchy. Message type names can be declared within a hierarchical namespace. That can be pretty useful. At the end of the day, however, they are simply some strings, or bit strings. In a sophisticated system that maps message types to message classes (code), the class hierarchy may mirror the type name hierarchy, and have interesting semantics (like a handler for a base message class being able to handle a derived message class). 
But mostly, message type name hierarchy is useful to avoid collisions.&lt;br /&gt;&lt;br /&gt;In systems like JMS, Categories (Topics) are also hierarchical. This is also done to avoid collisions in the topic namespace, and for organization. But it is also useful for wildcard subscription.&lt;br /&gt;&lt;br /&gt;Now "the" question: are Categories within the Message Type Hierarchy, or are Message Types within the Category hierarchy? Or are they orthogonal to one another? I submit that a message of a given type means the same thing no matter which Category it arrived on. Further, the same message type can be sent to any Category and a Category can transport any number of different message types.&lt;br /&gt;&lt;br /&gt;Since there is only one message exchange system, Categories cannot be reused for two purposes without merging the message streams. That leads to inefficiency. If you reuse a message type name for two different purposes, you run the risk of breaking handler code with what appears to be a malformed message. That leads to crashes. You could permit that kind of reuse, and institute policy and testing to keep those things from mingling (e.g. reuse message types, but only on different topics), but it is a looming disaster. I would put in some coordination mechanism or name spacing to keep the mingling from happening at all.&lt;br /&gt;&lt;br /&gt;So what are the consequences:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;There is no need to include Category when registering a message handler.&amp;nbsp;&lt;/li&gt;&lt;li&gt;Category subscription occurs separately from handler-to-message-type mapping, and affects the entire process.&lt;/li&gt;&lt;li&gt;There is no need to build a message dispatcher that looks at Categories.&lt;/li&gt;&lt;/ul&gt;Well. That was pretty long winded. For those of you still here, I have an analogy. I haven't thought it through a lot, but it looks like it fits (although it is about a pull system, not a push system). URLs. 
The hostname and domain name represent a hierarchical Category or Topic. The path portion is the message type and identifies the handler (web service), and is also hierarchical. You can host your web site on any host on any domain, and the functionality would be the same. You can host any web site on your host. You can host any number of web sites on your host, provided the paths don't collide. If they do collide, you are going to get strange behavior as links refer to the wrong services, or pass the wrong parameters. One would need more hierarchy. Or you don't host the colliding web sites together. You put them on different addresses. But the service code doesn't care what address you choose.&lt;br /&gt;&lt;br /&gt;Unless you talk about virtual hosts, or virtual processes, multiple independent connections to the message system, thread-local subscriptions. You can do *anything* in software. But should you?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-2906368106187448206?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/2906368106187448206/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2011/01/topics-are-not-message-types.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/2906368106187448206'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/2906368106187448206'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2011/01/topics-are-not-message-types.html' title='Topics are not Message Types'/><author><name>Darrin 
West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-3735330882437850445</id><published>2010-12-22T23:58:00.000-08:00</published><updated>2010-12-22T23:58:12.401-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Industry'/><category scheme='http://www.blogger.com/atom/ns#' term='Software Engineering'/><title type='text'>The Real Priorities of Online Game Engineering</title><content type='html'>I was trying to communicate to management that server developers have different priorities than game developers. As a means to show the importance of laying in administrative infrastructure, and other software engineering "overhead", I put this list together. Hope it helps you to think about making the right investment in making the system sustainable, and make those points to the powers that be.&lt;br /&gt;&lt;br /&gt;This is a list of priorities in absolute order of importance. While it is good to address all of them, if we don’t have one of the higher priority requirements solved to a reasonable degree, there is not much point in having the lower ones.&lt;br /&gt;&lt;br /&gt;I made this to help us focus on what is important, what order to do things in, and what we might cut initially. I’d love to debate this over lunch with anyone. I’m hoping others think of more of these kinds of driving requirements.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Don’t get sued. In particular, always deal with child safety. We also need to abide by our IP contract obligations (sometimes including the shipping date). Better to not ship than get sued into oblivion or go to jail.&lt;/li&gt;&lt;li&gt;Protect the game’s reputation. 
Even if the game is awesome, if the public thinks it isn’t or the service is poor, then we lose. This is especially important early in the lifecycle. This implies not shipping too early.&lt;/li&gt;&lt;li&gt;Be able to collect money. Even if there is no game.&lt;/li&gt;&lt;li&gt;Be able to roll out a new version periodically. Even if the game is broken or not finished, this means we can fix it. This implies:&lt;/li&gt;&lt;ol&gt;&lt;li&gt;You can make a build&lt;/li&gt;&lt;li&gt;You can QA it&lt;/li&gt;&lt;li&gt;You can deploy it without destroying what is already there, or at least roll back&lt;/li&gt;&lt;/ol&gt;&lt;li&gt;Support effort is sustainable. If the game and system are perfect, but it needs so much handholding that our staff burns out, or we don’t have the resources to extend it, we still fail. This implies lots of stuff:&lt;/li&gt;&lt;ol&gt;&lt;li&gt;It is stable enough that staff is not working night and day to hold its hand.&lt;/li&gt;&lt;li&gt;There is enough automated maintenance to limit busy work&lt;/li&gt;&lt;li&gt;There is enough automated logging, metrics and alarms to limit time spent hovering&lt;/li&gt;&lt;/ol&gt;&lt;li&gt;The cost of operating is not excessive. I.e. it sits lightly enough on the hardware that we don’t need massive amounts, or exotic types. (Special warning to engineers: it is all the way down here before we start to care about performance. And the only reason to care about performance is operating cost.)&lt;/li&gt;&lt;li&gt;Enough players can connect. This implies lots of stuff:&lt;/li&gt;&lt;ol&gt;&lt;li&gt;The cluster hardware exists at all, the network is set up, etc&lt;/li&gt;&lt;li&gt;There is a web site&lt;/li&gt;&lt;li&gt;Key platform and login features exist&lt;/li&gt;&lt;li&gt;There are enough related server and game features&lt;/li&gt;&lt;/ol&gt;&lt;li&gt;The server is sufficiently stable that players can remain connected long enough. 
This implies lots of stuff:&lt;/li&gt;&lt;ol&gt;&lt;li&gt;It stays up.&lt;/li&gt;&lt;li&gt;There are no experience-ruining bugs or tuning problems.&lt;/li&gt;&lt;li&gt;Not too much lost progress when things crash.&lt;/li&gt;&lt;li&gt;The load is not allowed to get too high (population caps).&lt;/li&gt;&lt;/ol&gt;&lt;ul&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;This is probably about where we need to get before Closed Beta.&lt;/b&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;li&gt;Revenues exceed cost of operation. And eventually, cost of development. This implies not shipping too late. Note that you don't *have* to get to this point immediately. And that this is more important than having a fun game.&lt;/li&gt;&lt;li&gt;The game is fun. This implies so much stuff, I won’t write it all down. Note that the requirement to not ruin the game's reputation can move some of this stuff earlier. But don't fool yourself. If you are making money on a game that is not fun, is that bad? I'm sure you can think of some examples of this. Here are some server-specific implications:&lt;/li&gt;&lt;ol&gt;&lt;li&gt;You aren’t put in a login queue for too long. You don’t have trouble finding a good time to play.&lt;/li&gt;&lt;li&gt;You aren’t dropped out of the game too often.&lt;/li&gt;&lt;li&gt;The feeling of lag is not that bad.&lt;/li&gt;&lt;li&gt;You can find people to play with. It is an online game, after all.&lt;/li&gt;&lt;/ol&gt;&lt;li&gt;Players cannot ruin one another’s fun. Note that making the game cheat-proof is not the requirement here. 
The only reason you care about cheating is if other players perceive it badly enough (reputation), or if the players are keeping you from making money.&lt;/li&gt;&lt;ol&gt;&lt;li&gt;They cannot grief one another, especially newbs.&lt;/li&gt;&lt;li&gt;They cannot bring down the server.&lt;/li&gt;&lt;li&gt;They cannot ruin the gameplay or economy, making swaths of gameplay pointless or boring.&lt;/li&gt;&lt;/ol&gt;&lt;li&gt;The server can scale to very large numbers of players. This is the profit multiplier.&lt;/li&gt;&lt;/ol&gt;Be honest with yourself. Are you over-engineering? Solving fun technical problems that don't actually address any of these Real Priorities? Doing things in the right order? Remember, as the online engineer, you represent these priorities to management. They may not (yet) understand why this order is important.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-3735330882437850445?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/3735330882437850445/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2010/12/real-priorities-of-online-game.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/3735330882437850445'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/3735330882437850445'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2010/12/real-priorities-of-online-game.html' title='The Real Priorities of Online Game Engineering'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-5765201225184559345</id><published>2010-12-14T23:30:00.000-08:00</published><updated>2010-12-14T23:30:10.860-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Off topic'/><title type='text'>"Restraint in what you ask for is the key to success" - Arnold's Aphorism</title><content type='html'>A good friend of mine sent me an email about our current sprint (we use a relatively formal Scrum process), encouraging us to define our stories such that the team could experience a success. He is also a great student of military history. I assumed the quote below was from some past general or philosopher. He claims not. So I've named it after him and am spreading the word.&lt;br /&gt;&lt;br /&gt;"Restraint in what you ask for is the key to success" - Arnold's Aphorism&lt;br /&gt;&lt;br /&gt;I can picture Napoleon fighting with himself over asking his men to achieve a little too much and fail. Or ask less and wind up with better morale. Applying this to software engineering teams makes a lot of sense. If you always ask for more than can be successfully completed, the team may feel unsuccessful/underachieving, and you may feel disappointed. When in fact, they are doing almost the same work whether you expected more or not.&lt;br /&gt;&lt;br /&gt;Here's the thing, though. It's been said before. In all things, moderation. Time management, and interpersonal relationship books talk about expectation management. "Under promise and over deliver". The pessimist is never disappointed and often surprised. Live beneath your means. "Let go". 
Thou shalt not covet.&lt;br /&gt;&lt;br /&gt;This is probably more true of things we ask of ourselves: get this, do that, convince them...&lt;br /&gt;&lt;br /&gt;What struck me in the phrasing of this old-new aphorism is how clearly it shows that you have the *choice* in setting the expectations. And yet, that goal is the very thing that defines success. And for many goal-oriented folks like us, that success defines our happiness. Ergo, we choose to be happy. Or not.&lt;br /&gt;&lt;br /&gt;But it requires self-discipline. Not so much in the exercise of effort, but in the restraining of wanting.&lt;br /&gt;&lt;br /&gt;You may have heard analyses of the great American marketing machine. It generates unrequited desires, while offering to fulfill them (for a price). The promise of happiness. Oddly, it is more commonly the restraint of those desires that leads to happiness, not the fulfillment.&lt;br /&gt;&lt;br /&gt;I think you are going to see many more instances of this wisdom over the next while. 
See if you can't identify the aspect that you control.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-5765201225184559345?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/5765201225184559345/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2010/12/restraint-in-what-you-ask-for-is-key-to.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/5765201225184559345'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/5765201225184559345'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2010/12/restraint-in-what-you-ask-for-is-key-to.html' title='&quot;Restraint in what you ask for is the key to success&quot; - Arnold&apos;s Aphorism'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-2096777264171125793</id><published>2010-11-10T20:12:00.000-08:00</published><updated>2010-11-10T20:12:31.615-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Processing models'/><category scheme='http://www.blogger.com/atom/ns#' term='Scalability'/><category scheme='http://www.blogger.com/atom/ns#' term='Software Engineering'/><title type='text'>Big load testing</title><content type='html'>I am a real fan of using the real client to do load testing. 
Your QA engineers will spend a lot of time building regression tests that verify the behavior of the game is still the same and no new bugs have been introduced. That entails adding scripting or player behavior "simulation" to the game code, but also includes creating the scripts that test the functionality of the game. Those test cases are really important and ideally cover almost all of the game functionality. And they have to be kept up to date as the code in the game changes.&lt;br /&gt;&lt;br /&gt;Why not reuse all that work to help load test the server? The scaffolding, client hooks, and test cases?&lt;br /&gt;&lt;br /&gt;One of my favorite ways of doing this is to have a test driver that picks random test cases and throws them at the server as fast as possible. Even if the test case involves sleeping or waiting for something like the character walking across some area of the game, if you run enough of them at the same time, you can generate significant load. And it is going to be more realistic than any other kind of test prior to having zillions of real players. It also saves you from having to reproduce the protocol and behavior of the real client and maintain it as the game team evolves everything.&lt;br /&gt;&lt;br /&gt;Why not? Even if you take the time to make a headless version of the client, it is probably going to be so resource heavy that you will have trouble finding enough machinery to really ramp up. Most games are designed to tick as fast as possible to give the best framerate, but a headless client doesn't draw, so that is a waste of CPU. Some games rely on the timing intrinsic in animations to control walk speed or action/reaction times for interactions. But you want to strip out as much content as possible to save memory. Clearly there is a bunch of work needed to reduce the footprint of even a headless client. 
But they really are useful.&lt;br /&gt;&lt;br /&gt;One thing you can do to make them more useful is construct a mini server cluster and see how it stands up to as many clients as you can scavenge.&lt;br /&gt;&lt;br /&gt;You can get hold of more hardware than you might think by "borrowing" it at night from the corporate pool of workstations. You will need permission, and you will want foolproof packaging so your clients can be installed (and auto-updated) without manual intervention or a sophisticated user. There is nothing like a robot army to bring your server to its knees. IT doesn't like this idea very much because they like to use night time network bandwidth for doing backup and stuff.&lt;br /&gt;&lt;br /&gt;Another important trick is to observe the *slope* of performance changes relative to the change in load you throw at the server. If the marginal effect (incremental server load divided by incremental client load) is &amp;gt; 1 you have a problem. Some people call this non-linear or non-scalable performance. Although, to be technical, it is non-unitary. Non-linear means it is even worse than y = a*x + b. E.g. polynomial (y = x^2), or exponential (y = a^x). Generally you can find the low hanging fruit pretty easily. If the first 500 connected clients caused a memory increase of 100 MB, but the second 500 consumed 200 MB, you have a problem. Obviously this also applies to CPU, bandwidth and latency. And don't forget to observe DB latency as you crank up both the number of clients and the amount of data already in the DB. You may have forgotten an index.&lt;br /&gt;&lt;br /&gt;But you may still not have enough insight, even given all this. The next step could be what I call a light-weight replay client or a wedge-client. The idea is to instrument the headless client, or graphical client, and record the parameters being passed into key functions like message send, or web service calls. You are inserting a wedge between the game code and the message passing code. 
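&lt;br /&gt;&lt;br /&gt;Here is a minimal sketch of such a wedge in Python. It is an illustration only, not the real client code: send_message, make_wedge, replay, and the one-JSON-object-per-line log format are all made-up names and choices.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;import io
import json

def make_wedge(send_fn, log_file):
    """Wrap send_fn so every call is recorded to log_file before being forwarded."""
    def wedged(*args, **kwargs):
        # One JSON object per line: the function name plus the parameters it was given.
        record = {"fn": send_fn.__name__, "args": list(args), "kwargs": kwargs}
        log_file.write(json.dumps(record) + "\n")
        return send_fn(*args, **kwargs)
    return wedged

def replay(log_file, functions):
    """Feed recorded parameters back into same-named functions, without understanding them."""
    results = []
    for line in log_file:
        record = json.loads(line)
        fn = functions[record["fn"]]
        results.append(fn(*record["args"], **record["kwargs"]))
    return results

# Hypothetical stand-in for the client's real message-send function.
def send_message(topic, payload):
    return "sent %s to %s" % (payload, topic)

# Record one call through the wedge, then replay it from the log.
log = io.StringIO()
wedged_send = make_wedge(send_message, log)
wedged_send("room.42", {"move": "north"})

log.seek(0)
replayed = replay(log, {"send_message": send_message})
&lt;/code&gt;&lt;/pre&gt;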
The real client can then be used to create a log of all the interesting data that is needed to stress the server. You would then create a replay client that uses only the lower level libraries. It would read the logs, passing the recorded parameters into a generic function that reproduces the message traffic and web requests. It doesn't have to understand what it is doing. The next step is to replace the values of key parameters to simulate a variety of players. You could use random player ids, or spend some more time having the replay client understand the sequences of logs and server responses. E.g. it could copy a session ID from a server response into all further requests.&lt;br /&gt;&lt;br /&gt;Since you are wedging into existing source code, this approach is way easier than doing a network level recording and playback. That would require writing packet parsing code, and creating a state machine to try to simulate what the real client was doing. Very messy.&lt;br /&gt;&lt;br /&gt;You might still not be able to replay enough load. Perhaps you don't have enough front end boxes purchased yet, but you want to stress your core server: the DB or event processing system. We use a JMS bus (it is great for publish/subscribe semantics that allow for loose coupling between components) to tie most things together on the back end. We built a record/replay system that pulls apart the JMS messages and does parameter replacement much like the wedge client described above. It is pretty simple to simulate thousands of players banging away. 
Not every client event results in a back end event that affects the DB.&lt;br /&gt;&lt;br /&gt;So what we are planning on doing is:&lt;br /&gt;a) build a mini-cluster with just a few front end boxes&lt;br /&gt;b) use QA's regression test cases to drive them to their knees looking for bad marginal resource usage&lt;br /&gt;c) use wedge recordings and replay if needed for even more load on the front end boxes&lt;br /&gt;d) use the JMS message replay system to drive the event system and DB to its knees, also looking for bad marginal usage.&lt;br /&gt;e) do some shady arithmetic to convince ourselves that the simulated client count that resulted in X% utilization of our test cluster will allow us to get to our target client count in the remaining 100-X% utilization available and the new hardware we plan to have in production.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-2096777264171125793?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/2096777264171125793/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2010/11/big-load-testing.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/2096777264171125793'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/2096777264171125793'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2010/11/big-load-testing.html' title='Big load testing'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-5431773058955298812</id><published>2010-10-28T00:16:00.000-07:00</published><updated>2010-10-28T00:16:18.417-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Scalability'/><category scheme='http://www.blogger.com/atom/ns#' term='Architecture'/><category scheme='http://www.blogger.com/atom/ns#' term='Software Engineering'/><title type='text'>Use of public/private keys to avoid a race condition</title><content type='html'>I inherited an interesting technique when I took over the server of SHSO. The use of signatures to avoid a race condition that can occur when a player is handed off between hosts. The use case is: the client requests that the Matchmaker connect them to a game "room" server that has a new game session starting. That room needs correct information about the player, what level they are, what they own, etc. How do you get it to the room before the client connects to that room and the data is needed? A couple of approaches:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;You don't. You just leave the client connected but in limbo until their data arrives through the back end asynchronously. This is a pretty normal approach. Sometimes the data arrives before the client makes the connection. So you have to cover both of those cases.&lt;/li&gt;&lt;li&gt;You don't. You leave the client in limbo while the room fetches the data through the back end synchronously. This is also pretty normal, but blocks the thread the room is running on, which can suck, especially if other rooms on the same host and process also get blocked. Yes, you could multithread everything, but that is not easy (see my manifesto of multithreading!). Or you could create a little state machine that tries to remember what kind of limbo the client is in: just-connected, connected-with-data-fetch-in-progress, etc. 
Personally, I don't allow the processes that run the game logic to open a connection directly to the DB and do blocking queries. DB performance is pretty erratic in practice, and that makes for uneven user experience.&lt;/li&gt;&lt;li&gt;Or have the data arrive *with* the connection. From the client. Interesting idea. But can you trust the client? That is where signed data comes in.&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;A quick review of cryptography. Yes, I am totally oversimplifying it, and skipping a lot of the interesting bits, and optimization and stuff. However, it is fun to talk about...&lt;br /&gt;&lt;br /&gt;A public/private key pair works by taking advantage of mathematics that is easy to compute in one direction, but really hard to compute in the other direction. The most common example is factoring very large integers that are the product of two very large prime numbers: multiplying the primes is easy, but recovering them from the product is not. There is only one way to factor the product, and the naive way to find it is to try pretty much every prime number up to the square root of the product. Better algorithms exist, but for large enough keys they still take a looong time.&lt;br /&gt;&lt;br /&gt;A public key can be used to encrypt plain text, and the private key is the only thing that can be used to unencrypt it. That means only the owner of the private key can read the plain text (and not anyone else who merely has access to the public key).&lt;br /&gt;&lt;br /&gt;On the other hand, a signature is created by *unencrypting* the plain text (or, in practice, a hash of it) using the private key. The public key can then be used to *encrypt* the signature and test if the result equals the plain text again, thereby verifying the signature, and proving that the signature and plain text came from the owner of the private key exactly as they sent it.&lt;br /&gt;&lt;br /&gt;Back to the story...the player data is signed using the private key by the Matchmaker and delivered to the client when the Matchmaker directs the player to the correct room. The client cannot tamper with their data without getting caught. 
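&lt;br /&gt;&lt;br /&gt;To see why, here is a toy sketch in Python using textbook RSA with tiny numbers. It is purely illustrative and not secure: the key values (n = 3233, e = 17, d = 2753) are a small classroom example rather than anything from the real system, digest, sign and verify are made-up helper names, and a real server would use a vetted crypto library with proper padding.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;import hashlib

# Toy key pair: N = 61 * 53. Real keys are thousands of bits long.
N = 3233
PUBLIC_E = 17     # known to the room server
PRIVATE_D = 2753  # known only to the Matchmaker

def digest(data):
    """Reduce the data to a small number so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def sign(data):
    """Matchmaker side: apply the private exponent to the digest."""
    return pow(digest(data), PRIVATE_D, N)

def verify(data, signature):
    """Room server side: apply the public exponent and compare with the digest."""
    return pow(signature, PUBLIC_E, N) == digest(data)

# Hypothetical player data, signed once by the Matchmaker.
player_data = b"player=1234;level=7"
signature = sign(player_data)
&lt;/code&gt;&lt;/pre&gt;With these definitions, verify(player_data, signature) succeeds for the untouched data. If the client alters the signature, or alters the data (barring an unlucky collision under this tiny modulus), the check fails.&lt;br /&gt;&lt;br /&gt;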
The client then sends the data to the game room with the signature when it connects. The room server checks the signature using the public key, and can tell that the data is unmodified and came indirectly from the Matchmaker.&lt;br /&gt;&lt;br /&gt;Why not just encrypt the data? The room server could have been set up to be able to unencrypt. Answer: The client wants to read and use its player's data. It wouldn't be able to do that if it were encrypted. And sending it twice (plain, and encrypted) is a waste.&lt;br /&gt;&lt;br /&gt;One interesting thing to note is that the client never saw either the public or the private key.&lt;br /&gt;&lt;br /&gt;OK. I know it seems pretty wasteful to send all this data down to the client, just to have it sent back up to a different host in the server. After all, we are talking about bandwidth in and out of the data center, and more importantly, bandwidth in and out of the client's home ISP connection. Only half bad. The client needed the data anyway. It is not the greatest approach, but it is what we have. As Patton is quoted: &lt;b&gt;A good solution applied with vigor now is better than a perfect solution applied ten minutes later.&lt;/b&gt; &lt;br /&gt;&lt;br /&gt;BTW, another reason this was built this way was that originally the system couldn't predict which room the player would wind up in, so the client needed to bring their data with them.&lt;br /&gt;&lt;br /&gt;And that is a segue to the topic of scaling across multiple data centers. 
It might start to make sense of that extra bandwidth question.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-5431773058955298812?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/5431773058955298812/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2010/10/use-of-publicprivate-keys-to-avoid-race.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/5431773058955298812'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/5431773058955298812'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2010/10/use-of-publicprivate-keys-to-avoid-race.html' title='Use of public/private keys to avoid a race condition'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-4509752759874613485</id><published>2010-10-25T00:05:00.000-07:00</published><updated>2010-10-25T00:05:24.833-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Load balancing'/><category scheme='http://www.blogger.com/atom/ns#' term='Processing models'/><category scheme='http://www.blogger.com/atom/ns#' term='Scalability'/><category scheme='http://www.blogger.com/atom/ns#' term='Software Engineering'/><title type='text'>Architect it for Horizontal DB Scalability</title><content type='html'>The performance of 
the database in an MMO tends to be the most common limiting factor in determining how many simultaneous players an individual Cluster can support. Beyond that number, the common approach is to start creating exact copies of the Cluster, and call them Shards or Realms. The big complaint about sharding is that two friends may not be able to play together if their stuff happens to be on different shards. Behind the scenes, what is going on is that their stuff is in separate database instances. Each shard only accesses one DB instance because the DB engine can only handle so much load.&lt;br /&gt;&lt;br /&gt;There are a couple of approaches that can almost completely get rid of these limits, both of which depend on creating many DB instances, and routing requests to the right instance.&lt;br /&gt;&lt;br /&gt;The first is an "automated" version of a character transfer. When a player logs in, they are assigned to an arbitrary cluster of machines, and all their stuff is transferred to the DB for that cluster. The player has no idea which cluster they were connected to, and the clusters don't have names. There are a couple of problems with this:&lt;br /&gt;1) you can't do this if you want the state of your world around the player to be dynamic and persist; the stuff you throw on the ground; whether that bridge or city has been burned. A player would be confused if each time they logged in, the dynamic state of the world was different. This isn't all that common these days, however. Interesting problem, though.&lt;br /&gt;2) the player might not be able to find their friends. "Hey I'm beside the fountain that looks like a banana-slug. Yeah, me too. Well, I can't see you!"&lt;br /&gt;You might be able to deal with this by automatically reconnecting and transferring the friends to the same cluster and DB, but that gets tricky. 
In the worst case, you might wind up transferring *everyone* to the same cluster if they are all friendly.&lt;br /&gt;&lt;br /&gt;Another approach provides horizontal scalability and is one that doesn't assume anything about how you shard your world, do dungeon instancing, what DB engine you use, or many other complications. That is a nice property, and makes the DB system loosely coupled, and useful across a broad spectrum of large scale persistent online games.&lt;br /&gt;&lt;br /&gt;What dynamic data are you persisting anyway? The stuff you want to come back after a power-failure. Most games these days only care about the state of the player, and their stuff. They may have multiple characters, lots of data properties, inventory of items, skills learned, quests in progress, friend relationships, ... If you sort through all your data, you'll find that you have "tuning" or design data that is pretty much static between releases. And you have data that is "owned" by a player.&lt;br /&gt;&lt;br /&gt;To a DB programmer, that fact means that the bulk of the data can be indexed by the player_id for at least part of its primary key. So here is the obvious trick:&lt;br /&gt;Put all data that belongs to a given player into a single DB. It doesn't matter which DB or how many there are. You have thousands or millions of players. Spreading them horizontally across a large number of DB instances is trivial. You could use modulus (even player_ids to the left, odd to the right). Better would be to put the first 100,000 players in the first DB, then the next 100,000 into a second. As your game gets more successful, you add new DB instances.&lt;br /&gt;&lt;br /&gt;A couple of simple problems to solve:&lt;br /&gt;1) you have to find the right DB. Given that any interaction a player has is with a game server (not the DB directly), your server code can compute which DB to consult based on the player_id it is currently processing. 
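&lt;br /&gt;&lt;br /&gt;As a rough sketch of that lookup in Python: the 100,000-player range size comes from the example above, while PLAYERS_PER_DB, db_index_for, connection_for and the connection list are hypothetical names, and player_ids are assumed to be handed out sequentially starting at 0.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;PLAYERS_PER_DB = 100000  # range size taken from the example above

def db_index_for(player_id, db_count):
    """Range partitioning: players 0..99999 go to DB 0, the next 100,000 to DB 1, and so on."""
    index = player_id // PLAYERS_PER_DB
    if index not in range(db_count):
        raise LookupError("no DB instance provisioned yet for player %d" % player_id)
    return index

# Hypothetical pool: one already-open connection per DB instance, reused for every request.
connections = ["conn-to-db-0", "conn-to-db-1", "conn-to-db-2"]

def connection_for(player_id):
    """Pick the existing connection for the DB that owns this player's data."""
    return connections[db_index_for(player_id, len(connections))]
&lt;/code&gt;&lt;/pre&gt;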
Once it decides, it will use an existing connection to the correct DB. (It is hard to imagine a situation where you would need to maintain connections to 100 DB instances, but that isn't really a very large number of file descriptors in any case.)&lt;br /&gt;2) If players interact and exchange items, or perform some sort of "transaction", you have to co-persist both sides of the transaction, and the two players might be on different DB instances. It is easy to solve the transactional exchange of items using an escrow system. A third party "manager" takes ownership of the articles in the trade from both parties. Only when that step is complete will the escrow object hand each article over to the other party. The escrow object is persisted as necessary, and can pick up the transaction after a failure. The performance of this system is not great. But this kind of interaction should be rare. You could do a lot of this sort of trade through an auction house, or in-game email where ownership of an item is removed from a player and their DB and transferred to a whole different system.&lt;br /&gt;3) High-speed exchange of stuff like hit points, or buffs, doesn't seem like the kind of thing players would care about if there was a catastrophic server failure. They care about whether they still have the sword-of-uberness, but not whether they are at full health after a server restart.&lt;br /&gt;&lt;br /&gt;Some people might consider functional-decomposition to get better DB performance. E.g. split the DBs by their function: eCommerce, inventory, player state, quest state, ... But that only gets you maybe 10 or 12 instances. And the inventory DB will have half the load, making 90% of the rest of the hardware a waste of money. On the other hand, by splitting the DB with data-decomposition (across player_id), you get great parallelism, and scale up the cost of your hardware linearly with the number of players that are playing. 
And paying.&lt;br /&gt;&lt;br /&gt;Another cool thing about this approach is that you don't have to use expensive DB tech, nor expensive DB hardware. You don't have to do fancy master-master replication. Make use of the application knowledge  that player to player interaction is relatively rare, so you don't need transactionality on every request. Avoid that hard problem. It costs money and time to solve.&lt;br /&gt;&lt;br /&gt;There is a phrase I heard from a great mentor thirty years ago: "embarrassingly parallel". You have an incredible number of players attached, and most actions that need to be persisted are entirely independent. Embarrassing, isn't it?&lt;br /&gt;&lt;br /&gt;Now your only problem is how big to make the rest of the cluster around this monster DB. Where is the next bottleneck? I'll venture to say it is the game design. How many players do you really want all stuffed into one back alley or ale house? And how much content can your team produce? If you admit that you have a fixed volume of content, and a maximum playable density of players, what then?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-4509752759874613485?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/4509752759874613485/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2010/10/architect-it-for-horizontal-db.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/4509752759874613485'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/4509752759874613485'/><link rel='alternate' type='text/html' 
href='http://onlinegametechniques.blogspot.com/2010/10/architect-it-for-horizontal-db.html' title='Architect it for Horizontal DB Scalability'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-7187906276356277975</id><published>2010-10-21T13:16:00.000-07:00</published><updated>2010-10-21T13:16:21.924-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Scalability'/><category scheme='http://www.blogger.com/atom/ns#' term='Software Engineering'/><category scheme='http://www.blogger.com/atom/ns#' term='Networking'/><title type='text'>A lightweight MMO using web tech</title><content type='html'>I've been really heads-down on &lt;a href="http://www.heroup.com/"&gt;Super Hero Squad Online&lt;/a&gt;. It has been getting really great reviews, and is very fun to play. Beta is coming up, and we are expecting to go live early next year.&lt;br /&gt;&lt;br /&gt;So I thought I'd try to carve out some time to record my thoughts about its architecture. But not today. 
We are getting ready for an internal release tomorrow.&lt;br /&gt;&lt;br /&gt;But here are some topics I'd like to pursue:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;scaling across multiple data centers&lt;/li&gt;&lt;li&gt;horizontal DB scalability&lt;/li&gt;&lt;li&gt;an approach that avoids back end race conditions when handing off a client between front end boxes&lt;/li&gt;&lt;li&gt;big load testing&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-7187906276356277975?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/7187906276356277975/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2010/10/lightweight-mmo-using-web-tech.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/7187906276356277975'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/7187906276356277975'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2010/10/lightweight-mmo-using-web-tech.html' title='A lightweight MMO using web tech'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-5368641062241015595</id><published>2010-02-14T17:08:00.000-08:00</published><updated>2010-02-14T17:09:43.730-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Industry'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Scalability'/><category scheme='http://www.blogger.com/atom/ns#' term='Content Development'/><title type='text'>Not seamless and not fun</title><content type='html'>A lot of the techniques I've discussed have been intended to address the difficult challenges in creating seamless worlds: interest management, load balancing, coordinate systems, consistency, authoritative servers,... But I want to take a moment to say: just because you can solve, and perhaps have solved, these hard technical problems doesn't mean you will be successful. You can avoid the problems by making other game design choices (compromises?) like instancing.&lt;br /&gt;&lt;br /&gt;In fact, something like instancing may be inevitable in your game, if you want it to be fun. For example, a fun quest might require that no one else be able to spoil things by stealing kills or your hard-won loot.&lt;br /&gt;&lt;br /&gt;And even if you have a nice seamless world, and you've squeezed out difficult latency issues in the seamless areas, you can still screw up the game so badly people drop out. I played DDO the other day. Turbine is famous for having a pretty good architecture. But boy, is it a pain to play that game. Line up on the crate; hit the crate; walk forward; click on the coins; repeat; repeat; repeat. Line up on a monster, press the mouse and hold it, and hold it, and hold it. Wander around in circles in a dungeon and get lost. Run from one corner of the dungeon to the other pulling levers. Do a quest, report back to the same dude. Gah! Didn't they ask a newb if all that was fun? No wonder people are micro-trans-ing their way up levels.&lt;br /&gt;&lt;br /&gt;The newb experience requires repeated entering of instances. It happens two or three times in the first 15 minutes. And it takes 20 seconds or so to load (I'm trying not to exaggerate). But 20 seconds in an "action" game is forever. Ok. Your server architecture and your game design require instances. Fine. 
But what did the newbs say about how fun the experience was? I, for one, didn't like it. Did the developers sit around and say: well, tough, that's the way it has to work? Maybe. But you *can* fix the experience.&lt;br /&gt;&lt;br /&gt;Have auto-loot turned on by default. Don't hide the loot in 50 crates; give out more on a kill or a quest completion. Pre-load the instances in the background whenever the player gets near the entrance. It doesn't have to be seamless, but at least *try* to make it fun. Or less teeth-gratingly aggravating. Sorry, I'm not patient enough for it to "start getting fun after level 20". Even if it is free.&lt;br /&gt;&lt;br /&gt;--end first rant--&lt;br /&gt;&lt;br /&gt;I heard someone the other day say "hey we can't give away the good stuff, the players should have to earn it (or pay for it)." Why? Give the newbs the good stuff right away. Let them get hooked. I think it was Warhammer 40k where the first mission let you play some of the best vehicles and blow a lot of stuff to scrap. Pretty fun. Then the story line took it away. But at least you knew it was going to be a fun game. After only 5 minutes.&lt;br /&gt;&lt;br /&gt;Let's get creative. Just because there appear to be insurmountable limitations (instancing, giving away the good content...) 
doesn't mean they really *are* insurmountable.&lt;br /&gt;&lt;br /&gt;Good luck out there.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-5368641062241015595?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/5368641062241015595/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2010/02/not-seamless-and-not-fun.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/5368641062241015595'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/5368641062241015595'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2010/02/not-seamless-and-not-fun.html' title='Not seamless and not fun'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-3741090434741495360</id><published>2010-01-18T16:39:00.000-08:00</published><updated>2010-01-18T20:04:16.437-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Load balancing'/><category scheme='http://www.blogger.com/atom/ns#' term='Scalability'/><title type='text'>Dynamic Load Balancing a Large Scale Online Game</title><content type='html'>Why bother designing your server to support dynamic load balancing? You can load test, measure and come up with a static load balance at some point before going live, or periodically when live. 
But...&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Your measurements and estimates will be wrong. Be honest, the load you simulated was at best an educated guess. Even a live Beta is not going to fully represent a real live situation.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Hardware specs change. It takes time to finish development, and who knows what hardware is going to be the most cost effective by the time you are done. You definitely don't want to have to change code to decompose your system a different way just because of that.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Your operations or data center may impose something unexpected, or may not have everything available that you asked for. You might think "throw more hardware at the problem". But if they are doing their jobs, they won't let you. And if you are being honest with yourself, you know that probably wouldn't have worked anyway.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Hardware fails. You may lose a couple of machines and not be able to replace them immediately. Even if you shut down, reconfigure, and restart a shard, the change to the load balance must be trivial and quick. The easiest way is to have the system itself adjust.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Your players are going to do many unexpected things. Like all rushing toward one interesting location in the game. Maybe the designers choose to do this on purpose with a holiday event. They would really appreciate it if your system could stand up to such a rush so they *could* please the players with one.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The load changes in interesting sine waves. Late at night and during weekdays, the load will be substantially less than at peak times. That is a lot of hardware just idling. If your system can automatically migrate load to fewer machines, and give back leased machines (e.g. emergency overload hardware), you might be able to cut a deal with your hosting service to save some money. Anybody know whether the "cloud" services support this? 
What if you are supporting multiple titles whose load profiles are offset? You could reallocate machines from one to another dynamically.&lt;/li&gt;&lt;li&gt;Early shards tend to have much higher populations than newly opened ones. As incentives to transfer to new shards start having effect, hardware could be transferred so that responsiveness can remain constant while population and density change.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The design is going to change. Both during development and as a result of tuning, patches and expansions. If you want to let the designers make the game as fun (and successful) as possible, you don't want to give them too many restrictions.&lt;/li&gt;&lt;li&gt;It may be painful to think about but your game is going to eventually wind down. There is a long tail of committed players that will stay, but the population will drop. If you can jettison unneeded hardware, you can save money and make that time more profitable. (And you should encourage your designers to support merging of shards.) I am convinced that there is a lot of money left on the table by games that prematurely close their doors.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;So what can you actually "balance"? You can't decompose a process, reallocate objects, computation and data structures between processes. Not without reprogramming. Or programming it in from the beginning. So load balancing entire processes is not likely to cut it.&lt;br /&gt;&lt;br /&gt;The best way to get something you can balance is to design for parallelism. Functional parallelism only goes so far. E.g. if you have only one process that deals with all banking, you can't split it when there is a run on the bank.&lt;br /&gt;&lt;br /&gt;So what kinds of things are heavy users of resources? What things are relatively easy to decompose into lots of small bits (then recombine into sensibly sized chunks using load balancing)?&lt;br /&gt;&lt;br /&gt;Here are some ideas:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Entities. 
Using interest management (discussed in other blog entries), an Entity can be located in any simulator process on any host. There are communication overheads to consider, but those are within the data center. If you are creative, many of the features that compose a server can be represented as an Entity, even though we often limit our thinking to them as game Entities. E.g. a quest, a quest party/group, a guild, an email, a zone, the weather, game and system metrics, ... And of course, characters, monsters, and loot. The benefit of making more things an Entity is that you can use the same development tools, DB representation, execution environment, scripting/behavior system and SDK, ... And of course, load balancing. There are usually a very large number of Entities in a game, making it pretty easy to find a good balance (e.g. bin-packing). Picking up an Entity is often a simple matter of grabbing a copy of its Properties (assuming you've designed your Entities like this to begin with; with load balancing in mind). This can be fast because Entities tend to be small. Another thing I like about Entity migration is that there are lots of times when an Entity goes idle, making it easy to migrate without a player being affected at all. Larger "units" of decomposition are likely to never be dormant, so when a migration occurs, players feel a lag.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Zones. This is a pretty common approach, often with a number of zones allocated to a single process on a host. As load passes a threshold, the zone is restarted on another simulator on another machine. This is a bigger chunk of migration than an Entity, and doesn't allow for an overload within one zone. The designers have to add game play mechanisms to discourage too much crowding together. The zone size has to be chosen appropriately ahead of time. Hopefully load-balancing-zone is not the same as game-play-zone, or the content team will really hate you. 
Can you imagine asking them to redesign and lay out a zone because there was a server overload?&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Modules. You will decompose your system design into modules, systems, or functional units. Making each of these mappable to a different process or host requires little extra work. But there are usually a limited number of systems (functional parallelism), and there is almost always a "hog" (see &lt;a href="http://en.wikipedia.org/wiki/Amdahl%27s_law"&gt;Amdahl's law&lt;/a&gt;). Extracting a Module and moving it requires quite a bit more unwiring than an Entity. Not my first choice. But you might rely on your fault tolerance system and just shut something down in one place, and have it restart elsewhere.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Processes. You may be in a position where your system cannot easily have chunks broken off and compiled into another process. In this case, only whole processes can be migrated (assuming they do not share memory or files). Process migration is pretty complicated and slow, given how much memory is involved. Again, your fault tolerance mechanism might help you. If you have enough processes that you can load balance by moving them around, you may also have a lot of overhead from things like messages crossing process boundaries (usually via system calls).&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Virtual Machines. Modern data centers provide (for a price) the ability to re-host a virtual machine, even on the fly. Has anyone tested what the latency of this is? Seems like a lot of data to transmit. The benefit of this kind of thinking is that you can configure your shard in the lab without knowing how many machines you are going to have, and run multiple VMs on a single box. But you can't run a single VM on multiple boxes. 
So you have that tradeoff of too many giving high overhead, and too few giving poor balancing options.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Remember, these things are different from one another:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Decomposition for good software engineering.&lt;/li&gt;&lt;li&gt;Decomposition for good parallel performance.&lt;/li&gt;&lt;li&gt;Initial static load balancing.&lt;/li&gt;&lt;li&gt;Creating new work on the currently least loaded machine.&lt;/li&gt;&lt;li&gt;Dynamically migrating work.&lt;/li&gt;&lt;li&gt;Dynamically migrating work from busy to less loaded machines.&lt;/li&gt;&lt;li&gt;Doing it efficiently, and quickly (without lags)&lt;/li&gt;&lt;li&gt;And having it be effective.&lt;/li&gt;&lt;/ul&gt;I think balancing load is a hard enough problem that it can't really be predicted and "solved" ahead of time. So I like to give myself as much flexibility ahead of time, and good tools. Even if you don't realize full dynamic migration at first, at least don't box yourself into a corner that requires rearchitecting.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-3741090434741495360?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/3741090434741495360/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2010/01/dynamic-load-balancing-large-scale.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/3741090434741495360'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/3741090434741495360'/><link rel='alternate' type='text/html' 
href='http://onlinegametechniques.blogspot.com/2010/01/dynamic-load-balancing-large-scale.html' title='Dynamic Load Balancing a Large Scale Online Game'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-6708856680457065041</id><published>2009-12-20T10:18:00.000-08:00</published><updated>2009-12-20T13:30:15.089-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Industry'/><category scheme='http://www.blogger.com/atom/ns#' term='Off topic'/><title type='text'>(Real) Science IS political, sorry to disillusion you</title><content type='html'>There is the big "climate gate" hoo-haw in the media right now. Reporters are acting surprised that some leading scientists were caught manipulating scientific literature to silence skeptics and dissenters. They convinced peers to knock skeptics' articles, journals to reject skeptics' papers, remove peers from paper review committees if they passed skeptics' papers, and even shut down journals that published dissenting views. Maybe even fudged their data.&lt;br /&gt;&lt;br /&gt;&lt;irony&gt; Hey! Science is supposed to be awesome. It always eventually gets it right. Real science is about reproducible experiments and validated results. &lt;/irony&gt; Actually, no, it is political. Like most other human endeavors.&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Galileo and other historical scientists were shut down by their community. Granted, their scientific peers were not basing their views on empirical data. But many modern scientists still base their views on what they've been taught, not what they've measured. 
This is understandable: you have to stand on the shoulders of giants to have time to make advances.&lt;/li&gt;&lt;li&gt;Views are often validated by currently understood standards. If you plot the published speed of light against the year of publication, you will see sequences of flat spots where a value is almost identical to what was previously "measured". And then there is a jump; followed by another flat period of many years. Is this because they were using the same equipment and experimental procedure? Or because anyone who tried to publish a different "answer" was considered a skeptic and shut down? Again, this is understandable. Humans tend to try to be consistent, and not be antisocial and go against the crowd. It requires extra diligence to disprove a well respected master in their field. Like the great Einstein when quantum physics popped up. (Wait! Maybe God changes the real speed of light periodically as a joke!)&lt;/li&gt;&lt;li&gt;Scientists are funded. Based on whether they get published or referenced. Or if they agree with the "sponsoring" corporation. So the system manipulates them into agreeing with the crowd. But the underbelly of the scientific community is less pretty than this kind of indirect pressure would suggest. &lt;/li&gt;&lt;li&gt;Journals are funded. If the larger community of scientists doesn't subscribe to that journal, or schools or corporations pull their funding (or advertising!), a venue of dissent/questioning dies.&lt;/li&gt;&lt;li&gt;I've seen public "attacks" during a presentation of research. In the form of a "question". Being honest, these questions are self-aggrandizing. E.g. "what makes you think that you are right when I have already published the opposite". It can embarrass a young scientist and discourage them from disagreeing in the future. Only the thick-skinned "crazies" keep at it. Like Tesla.&lt;/li&gt;&lt;li&gt;I've been on paper review committees where papers are summarily discarded. 
There are so many that only one or two of the reviewers are assigned to read a given paper before the meeting. If they didn't understand it, or disagreed with the findings (based on their own experience/bias), it can get tossed very quickly. There are a lot to get through. Even when it is "marginal", the shepherding process can be taxing, discouraging a reviewer from volunteering. After all, they are contributing their expertise, but don't get their name on the paper. (Suggestion: maybe they should. If they pass something that proves incorrect, they lose points. As an incentive to get it right. Or would that discourage participation?). For "workshops" (not full Journals) an author's name is sometimes on the paper being reviewed, so their reputation is considered.&lt;/li&gt;&lt;li&gt;As science gets more "fine", some experiments cannot be reproduced except on the original equipment or by the original experts. Think CERN and supercolliders. (Or cold fusion?) Who has another billion dollars just to *reproduce* a result? Unless the larger community thinks a result is hogwash and feels motivated to pool their resources and dump on the results. So who is going to disagree? It almost sounds like a religion at that point.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;I bet you've done the same things. Maybe at work. Tried to shut down "the competition". Competing for attention, or a raise, or recognition... You know what would be better? Listen to those that disagree and make sure they know you have heard them. They are trying to make the best decisions they can given their background. No one tries to make dumb decisions. If they are wrong, I'm sure they would appreciate learning something they don't know. Or, maybe they have something to teach you.&lt;br /&gt;&lt;br /&gt;Imagine! Learning something from someone who disagrees with you and whom you find irritating. A dissenter. A skeptic. Seems like those that shut down dissent are not just closed minded, but unwilling to learn. 
Such a scientist should be embarrassed for themselves. Isn't IDEAL science supposed to be about discovery? Too bad that in reality there is so little ideal science, and so much science influenced by the politics of "real-life science".&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-6708856680457065041?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/6708856680457065041/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/12/real-science-is-political-sorry-to.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/6708856680457065041'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/6708856680457065041'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/12/real-science-is-political-sorry-to.html' title='(Real) Science IS political, sorry to disillusion you'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-259516607971783210</id><published>2009-12-08T13:21:00.000-08:00</published><updated>2009-12-08T15:46:30.132-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Content Development'/><category scheme='http://www.blogger.com/atom/ns#' term='Software Engineering'/><title type='text'>Data Driven Entities and Rapid Iteration</title><content 
type='html'>It is clearly more difficult to develop content for an online game than a single player game. (For one, sometimes the entities you want to interact with aren't in your process.) So starting with the right techniques and philosophies is critical. Then you need to add tools, a little magic and shake.&lt;br /&gt;&lt;br /&gt;There are several hard problems you hit when developing and debugging an online game:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Getting the game to fail at all. A lot of times bugs are timing related. Of course, once you ship it, to players it will seem like it happens all the time.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Getting the same failure to happen twice is really hard. E.g. if the problem is caused by multiplayer interaction, how are you going to get all players or testers to redo exactly the same thing? And in the spirit of Heisenbugs, if you attach a debugger or add logging, good luck getting it to fail under those new conditions.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Testing a fix is really hard, because you want to get the big distributed system back into the original state and test your fix. Did you happen to snapshot that state?&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Starting up a game can take a long time (content loading). Starting an online game takes even longer because it also includes deployment to multiple machines, remote restarting, loading some entities from a DB, logging in, ...&lt;br /&gt;&lt;/li&gt;&lt;li&gt;If you are a novice content developer plagued by such a bug or a guy in QA trying to create repro steps to go along with the bug report, it will probably end badly.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Consequently, what do you need to do to make things palatable?&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Don't recompile your application after a "change". Doing that leads (on multiple machines) to shutdown, deploy, restart, "rewind" to the failure point. You'd like to have edit and continue of some sort. 
To do that, almost certainly you'd need a scripting language (or at least one that does just in time compilation, and understands edit and continue).&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Don't even restart your application. Even if you can avoid recompilation, it can take a loooong time to load up all the game assets. Especially early in production, your pipeline may not support optimized assets (e.g. packed files). For a persistent world, there can be an awful lot of entities stored in the database to load and recreate. Especially if you are working against a live or a snapshot of a live shard. At the very least, only load the assets you need.&lt;/li&gt;&lt;li&gt;Yes, I'm talking about rapid iteration and hot loading. When you can limit most changes to data, there is no reason you can't leave everything running, and just load the assets that changed. In some cases when things change enough you might have to "reset" by clearing all assets from memory and reloading, but at least you didn't have to restart the server.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Rapid iteration is particularly fun on consoles, which often have painfully long deployment steps. Bear in mind that you don't have an editor in-game on a console; it is too clumsy. So the content you see in your editor on your workstation is just a "preview". You would swivel-chair to the console to see what it really looks like on-target.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Try to make "everything" data driven. For example, you can specify the properties and behaviors of your entities in a tool and use a "bag of properties" kind of Entity in most cases. After things have settled down, you can optimize the most used ones, but during content development, there is a huge win to making things data-driven. 
Of course, there is nothing stopping you from doing both at once.&lt;/li&gt;&lt;li&gt;Another benefit of a data-driven definition of an Entity is that it is so much easier to extract information needed to automatically create a database schema. Wouldn't you rather build a tool to do this than to teach your content developers how to write SQL?&lt;/li&gt;&lt;/ul&gt;Don't forget that most of the time and money in game development pours through the content editor GUIs. The more efficient you can make that, the more great content will appear. If you want to hire the best content developers, make your content development environment better than anyone else's.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-259516607971783210?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/259516607971783210/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/12/data-driven-entities-and-rapid.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/259516607971783210'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/259516607971783210'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/12/data-driven-entities-and-rapid.html' title='Data Driven Entities and Rapid Iteration'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-2083489824424708891</id><published>2009-11-19T13:00:00.000-08:00</published><updated>2009-12-08T15:23:50.534-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Industry'/><category scheme='http://www.blogger.com/atom/ns#' term='Software Engineering'/><title type='text'>That's not an MMO, its a...</title><content type='html'>Some people call almost anything "big" an MMO. Is Facebook an MMO? Is a chess ladder an MMO? Is Pogo? Not to me.&lt;br /&gt;&lt;br /&gt;What about iMob and all the art-swapped stuff by The Godfather on the iPhone? You have an account with one character. Your level/score, your money, and which buttons you've pressed on the "quest" screen all persist. As much as they want to call that an MMO, it is something else. Is it an RPG? Well, there is a character. But you don't even get to see him. Or are you a her?&lt;br /&gt;&lt;br /&gt;These super-lightweight online games are not technically challenging. You can build one out of web-tech or some other transaction server. If you are all into optimizing a problem that scalable web services solved years ago, cool. Your company probably even makes money. But it doesn't push the envelope. Someone is going to eat your lunch.&lt;br /&gt;&lt;br /&gt;Maybe I should have called this blog "Interesting Online Game Technologies".&lt;br /&gt;&lt;br /&gt;Me? I want to build systems that let studios build the "hard" MMOs, like large seamless worlds. I don't want a landscape that only has WoW. If that tech were already available, we'd be seeing much lower budgets, more experimentation, games that are more fun, lower subscription fees, more diversity, and better content. All at the same time. 
I certainly don't want to build the underlying infrastructure over and over.&lt;br /&gt;&lt;br /&gt;Of course, I'd love it if the tech solved all the &lt;a href="http://onlinegametechniques.blogspot.com/search?q=hard+problems"&gt;hard problems&lt;/a&gt; so my team could build something truly advanced while still landing the funding. Unfortunately, today, people &lt;a href="http://www.eldergame.com/2009/04/smartfoxserver-the-mmo-engine-for-indies/"&gt;have to compromise&lt;/a&gt;. But maybe not for long.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-2083489824424708891?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/2083489824424708891/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/11/thats-not-mmo-its.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/2083489824424708891'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/2083489824424708891'/><link 
rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/11/thats-not-mmo-its.html' title='That&apos;s not an MMO, its a...'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-5574602859052417858</id><published>2009-08-21T08:10:00.000-07:00</published><updated>2009-08-21T08:29:58.333-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Processing models'/><category scheme='http://www.blogger.com/atom/ns#' term='Software Engineering'/><title type='text'>What is software architecture (in 15 pages or less)?</title><content type='html'>Kruchten's "4+1 views of software architecture" is one of my favorite papers of all time. It shows four views of software architecture (logical/functional, process, implementation/development, and physical). The plus one is use-cases/scenarios.&lt;br /&gt;&lt;br /&gt;Being 15 pages long, it is an incredibly efficient use of your time if you want to disentangle the different aspects of designing complex systems. And it gives you terminology for explaining what view of the system you mean, and which you have temporarily abstracted away.&lt;br /&gt;&lt;br /&gt;Don't get hung up on the UML terminology:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Logical is what is commonly referred to as a bubble diagram. 
What components are in your system and what are they responsible for?&lt;/li&gt;&lt;li&gt;Process is which OS processes/threads will be running and how they communicate.&lt;/li&gt;&lt;li&gt;Implementation is the files and libraries you used to build the system.&lt;/li&gt;&lt;li&gt;Physical is your hardware and where you decided to map your processes.&lt;/li&gt;&lt;/ul&gt; &lt;a href="http://en.wikipedia.org/wiki/4%2B1_Architectural_View_Model"&gt;http://en.wikipedia.org/wiki/4%2B1_Architectural_View_Model&lt;/a&gt;&lt;br /&gt;I prefer the original paper: &lt;a href="http://www.cs.ubc.ca/%7Egregor/teaching/papers/4+1view-architecture.pdf"&gt;http://www.cs.ubc.ca/~gregor/teaching/papers/4+1view-architecture.pdf&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-5574602859052417858?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/5574602859052417858/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/08/what-is-software-architecture-in-15.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/5574602859052417858'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/5574602859052417858'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/08/what-is-software-architecture-in-15.html' title='What is software architecture (in 15 pages or less)?'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-7941199051398999276</id><published>2009-08-10T15:25:00.000-07:00</published><updated>2009-08-10T16:25:01.720-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Security'/><category scheme='http://www.blogger.com/atom/ns#' term='Fault Tolerance'/><category scheme='http://www.blogger.com/atom/ns#' term='Networking'/><title type='text'>Fail over is actually kind of hard (and expensive)</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://media.photobucket.com/image/redundancy%20you%20can%20never%20be%20too%20sure/Akumaka/redundancy.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer; width: 296px; height: 234px;" src="http://i109.photobucket.com/albums/n58/Akumaka/redundancy.jpg" alt="" border="0" /&gt;&lt;/a&gt;You are building a peer-to-peer or peered-server small-scale online game. There are players who purposely disconnect as soon as someone starts beating them. They think their meta score will be better, or whatever. Or maybe they have a crappy internet connection. In any case, the master simulator/session host goes down abruptly; now what?&lt;br /&gt;&lt;br /&gt;At the least, you would want to keep playing with the guys you've matched with; the rest of your clan or whatever. At best, you'd like to keep playing from exactly the point of the "failure" without noticing &lt;span style="font-style: italic;"&gt;any&lt;/span&gt; hiccup (good luck). 
The steps for session host migration/fail over to a new simulator:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Reestablish network interconnection&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Pick a new master simulator&lt;/li&gt;&lt;li&gt;Coordinate with the Live Service about who is the master (or do this first)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Have the entity data already there, or somehow gather it&lt;/li&gt;&lt;li&gt;Convert replicated entities into real owned/master entities&lt;/li&gt;&lt;li&gt;Reestablish interest subscriptions, or whatever distributed configuration settings are needed&lt;/li&gt;&lt;li&gt;"Unpause"&lt;/li&gt;&lt;/ul&gt;Some challenges:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;To reconnect, someone needs the complete list of IP/ports used by the players. But that is considered a security issue. E.g. someone could use that info to DoS attack an opponent. Let's assume the Live Service renegotiates and handshakes your way back into business.&lt;/li&gt;&lt;li&gt;If you aren't connected, how do you elect a new master? If you don't have a master yet, how would all the clients (in a strict client/server network topology) know who to connect to? So the answer has to be precomputed. E.g. designate a backup simulator before there is a fault (maybe the earliest joiner, lowest IP address...)&lt;/li&gt;&lt;li&gt;If your game session service supports this, it can solve both of the previous issues by exposing IP addresses only to the master simulator, and since the service has a fixed network address, each client can always make a connection to it and be told who is the new master.&lt;/li&gt;&lt;li&gt;If the authoritative data is lost on the fault, you may as well restart the level, go back to the lobby or whatever. So instead, you have to send the entity state data to the backup simulator(s) as you go. 
This is actually &lt;span style="font-style: italic;"&gt;more&lt;/span&gt; data than is necessary to exchange for an online game that is not fault tolerant, since you'd have to send hidden and possibly temporary data for master Entities. Otherwise you couldn't completely reconstruct the dead Entities. There may be data that only existed on the master, so gathering it from the remaining clients isn't going to be a great solution. Spreading the responsibility is that much more complicated.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;So the backup master starts converting replicated Entities into authoritative Entities. Any Entities it didn't know about can't be recreated, so the backup master has to have a full set of Entities. Think about the bandwidth of that. You should &lt;span style="font-style: italic;"&gt;really&lt;/span&gt; want this feature before just building it. Now we hit a hard problem. If the Entities being recreated had in-flight Behaviors (e.g. you were using coroutines to model behavior), they can't be reconstructed. It is prohibitively expensive to continuously replicate the Behavior execution context. So you wind up "resetting" the Entities, and hoping their OnRecreate behavior can get them running again. You may have a self-driven Behavior that reschedules itself periodically. Something has to restart that sequence. Another thing to worry about: did the backup simulator have a truly-consistent image of the entity states, or was anything missing or out of order? At best this is an approximation of the state on the original session host.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Unless you are broadcasting all state everywhere, you are going to have to redo interest management subscriptions to keep bandwidth within limits. This is like a whole bunch of late-joining clients coming in. They would get a new copy of each entity state. Big flurry of messages, especially if you do this naively.&lt;/li&gt;&lt;li&gt;Now you are ready to go. 
Notify the players, give them a count-down...FIRE!&lt;/li&gt;&lt;/ul&gt;What did we forget? What defines "dropped out": a maximum unresponsiveness time? What if the "dead" simulator comes back right then? What if the network "partitioned"? Would you restart &lt;span style="font-style: italic;"&gt;two&lt;/span&gt; replacement sessions? Do you deal with simultaneous dropouts (have you ever logged out when the server went down? I have.)?&lt;br /&gt;&lt;br /&gt;Note that the problem gets a &lt;span style="font-style: italic;"&gt;lot&lt;/span&gt; easier if all you support is clean handoff from a &lt;span style="font-style: italic;"&gt;running &lt;/span&gt;master to the new master. Would that be good enough for your game?&lt;br /&gt;&lt;br /&gt;So is it worth the complexity, the continuous extra bandwidth and load on the backup simulator? Just to get an approximate recreation? With enough work, and game design tweaking, you could probably get something acceptable. Maybe give everyone a flash-bang to mask any error.&lt;br /&gt;&lt;br /&gt;Or maybe you just reset the level, or go back to the lobby to vote on the next map. And put the bad player on your ban list.&lt;br /&gt;&lt;br /&gt;Me? I'd probably instead invest the time of my network and simulator guys in something else, like smoothness, fun gameplay, voice, performance. 
Or ship earlier.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-7941199051398999276?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/7941199051398999276/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/08/fail-over-is-actually-kind-of-hard-and.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/7941199051398999276'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/7941199051398999276'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/08/fail-over-is-actually-kind-of-hard-and.html' title='Fail over is actually kind of hard (and expensive)'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-5950254419127017706</id><published>2009-07-30T07:25:00.000-07:00</published><updated>2009-07-30T07:41:25.336-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Industry'/><category scheme='http://www.blogger.com/atom/ns#' term='Software Engineering'/><title type='text'>Incremental release vs. 
sequel</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://media.photobucket.com/image/dizzying%20intellect/daksin/IMG/3476588511_36edb6c004.jpg?o=2"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer; width: 309px; height: 166px;" src="http://i60.photobucket.com/albums/h27/daksin/IMG/3476588511_36edb6c004.jpg" alt="" border="0" /&gt;&lt;/a&gt;(Warning: plenty of irony below, as usual)&lt;br /&gt;&lt;br /&gt;There was a time when the MMO developer community thought that the ideal was to stand up your world, and then start feeding the dragon. As quickly as possible, get new content into the players' hands. The more new content, the more fascinated they would be, the stickier their subscriptions would be and the more money you would make.&lt;br /&gt;&lt;br /&gt;So we put a lot of effort into techniques to manage continuous development of content, test it, and roll it out with the minimum possible maintenance window. Some got good enough they could release content or patches every week. Didn't we have automated client patchers? Why not use those to continuously deliver content? Not just streaming content as you move around in virtual space, but as you move forward in real time.&lt;br /&gt;&lt;br /&gt;Then someone noticed that Walmart took their game box off the shelf because it had been there for a year, and new titles were showing up. Surely consumers want the new stuff more? Besides, you don't need the latest release; you get all the new stuff when you patch. Then new subscriptions drop because of that lack of visibility at retail. Why would Gamestop sell prepaid cards if it is so easy to pay online?&lt;br /&gt;&lt;br /&gt;So the light goes bing, and it is suddenly obvious that sequels would be a much better approach, since you'll get shelf space if it is a new SKU. Clearly this is the best approach, since it works so well for Blizzard. 
(So clearly, I cannot choose the glass in front of you. Where was I?) All you have to do is patch a few bugs, and set up a parallel team to work on the sequel.&lt;br /&gt;&lt;br /&gt;But then everyone piles onto Steam. Definitely the end of brick and mortar. Maybe we go back to the low-latency content pipeline so our game is fresher than the sequel-only guys. But wait, Steam sales and free trials increase traffic at Gamestop.&lt;br /&gt;&lt;br /&gt;Clearly, I cannot choose the glass in front of me... Wait til I get started.&lt;br /&gt;&lt;br /&gt;Not really. As you can see, the point is that technologists are very unlikely to see the future of sales and distribution mechanisms. And if we did, it would take a year to adapt our development practices and product design to take optimal advantage of it.&lt;br /&gt;&lt;br /&gt;The answer? Be flexible. Don't assume you've got the one and only magic bullet. Requirements change. And for an MMO the development timespan is large enough that a lot of things will change before you are finished. 
Don't implement your company into a corner with a big complex optimal single point solution, and keep your mind open.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-5950254419127017706?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/5950254419127017706/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/07/incremental-release-vs-sequel.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/5950254419127017706'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/5950254419127017706'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/07/incremental-release-vs-sequel.html' title='Incremental release vs. 
sequel'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i60.photobucket.com/albums/h27/daksin/IMG/th_3476588511_36edb6c004.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-3048624050046358328</id><published>2009-07-13T11:55:00.001-07:00</published><updated>2009-07-13T15:33:16.092-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Processing models'/><category scheme='http://www.blogger.com/atom/ns#' term='Scalability'/><category scheme='http://www.blogger.com/atom/ns#' term='Networking'/><title type='text'>Web tech for "game services"</title><content type='html'>I hold the opinion that every disagreement is a matter of different axioms, values or definitions.&lt;br /&gt;&lt;br /&gt;I believe definitions are what is at issue in this post by "Kressilac" (Derek Licciardi?):&lt;br /&gt;&lt;a href="http://blogs.elysianonline.com/blogs/derek/archive/2009/05/29/6400.aspx"&gt;http://blogs.elysianonline.com/blogs/derek/archive/2009/05/29/6400.aspx &lt;/a&gt;I'd guess we do hold the same values.&lt;br /&gt;&lt;br /&gt;Derek argues that portions of an MMO server are well suited to, and best implemented with, web technology. I absolutely agree. I call these parts of the system "Game Services". Most would be accessed directly from the client. 
Examples:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;profanity filtering&lt;/li&gt;&lt;li&gt;shard status (open, full, down, locked, capped)&lt;/li&gt;&lt;li&gt;in-game search/player online&lt;/li&gt;&lt;li&gt;clan/guild management&lt;/li&gt;&lt;li&gt;item trade&lt;/li&gt;&lt;li&gt;auction&lt;/li&gt;&lt;li&gt;voting/elections&lt;/li&gt;&lt;li&gt;chat&lt;/li&gt;&lt;li&gt;match making/lobby&lt;/li&gt;&lt;li&gt;leaderboards&lt;/li&gt;&lt;li&gt;persistent messages/email&lt;/li&gt;&lt;li&gt;reputation management/community services&lt;/li&gt;&lt;li&gt;in-game advertising&lt;/li&gt;&lt;li&gt;Search&lt;/li&gt;&lt;li&gt;authentication&lt;/li&gt;&lt;li&gt;CSR account locking&lt;/li&gt;&lt;li&gt;patching, streaming patching&lt;/li&gt;&lt;li&gt;microtransactions&lt;/li&gt;&lt;li&gt;petitions&lt;/li&gt;&lt;li&gt;custom content&lt;/li&gt;&lt;li&gt;character annotation, friend lists&lt;/li&gt;&lt;li&gt;knowledge base&lt;/li&gt;&lt;li&gt;voice chat&lt;/li&gt;&lt;li&gt;Maybe: inventory, quests, crafting (touches in-game entities)&lt;/li&gt;&lt;/ul&gt;Anyone got more for this list?&lt;br /&gt;&lt;br /&gt;Most of these systems are "decorative" and are for the community aspects of the game.&lt;br /&gt;&lt;br /&gt;The complication arises where the data managed by these services is affected by or used by the simulator (i.e. in-game logic). E.g. the number of members of your clan changes Mana recharge rate. I'd suggest that most communications of that kind don't need to be transactional or low-latency, or that the game design can be bent to accommodate that restriction.&lt;br /&gt;&lt;br /&gt;There are a couple of those game services (especially those interacting with items) that &lt;span style="font-weight: bold;"&gt;are&lt;/span&gt; entangled. 
The easiest way to deal with those is to transfer ownership of the Entities in question to one system or the other such that there is no synchronization needed other than at the transfer. I'm betting that is how WoW does auctions and mailing of items.&lt;br /&gt;&lt;br /&gt;My "run screaming; it sucks" article is my thinking about the core gameplay/simulator manipulated Entities. What Derek calls Real Time Data. To me that is the "hard problem". All the rest of the stuff can be handled by web-tech, and that is a solved problem (waves hand dismissively), and not so interesting.&lt;br /&gt;&lt;br /&gt;Well. There are a &lt;span style="font-style: italic;"&gt;few &lt;/span&gt;interesting issues, like coordinating authentication. But the coolest payoff (as Derek states) is that these things automatically become available offline via browsers, mobile devices, etc.&lt;br /&gt;&lt;br /&gt;BTW, I'm working on another contentious article that more fully details the issues that drive my opinion about DB-centric game state management.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-3048624050046358328?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/3048624050046358328/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/07/web-tech-for-game-services.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/3048624050046358328'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/3048624050046358328'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/07/web-tech-for-game-services.html' title='Web 
tech for &quot;game services&quot;'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-3802049169560515894</id><published>2009-06-29T12:41:00.000-07:00</published><updated>2009-06-29T12:50:47.233-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Software Engineering'/><category scheme='http://www.blogger.com/atom/ns#' term='Networking'/><title type='text'>Google's Protocol Buffers (for messages and files?)</title><content type='html'>This is an interesting package:&lt;br /&gt;&lt;a href="http://code.google.com/apis/protocolbuffers/docs/overview.html"&gt;&lt;br /&gt;http://code.google.com/apis/protocolbuffers/docs/overview.html&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;It can be used for both on-the-wire and file formats. It is much more efficient than XML, and has APIs for multiple languages (e.g. making it easier to send a message between apps written in different languages). It deals with versioning, and has automated class and serialization code generation.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I haven't used it yet. Any thoughts? How good is it for archive files/pack files? 
How good is its historical version handling wrt up-converting or semantic changes (like feet to meters)?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-3802049169560515894?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/3802049169560515894/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/06/googles-protocol-buffers-for-messages.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/3802049169560515894'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/3802049169560515894'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/06/googles-protocol-buffers-for-messages.html' title='Google&apos;s Protocol Buffers (for messages and files?)'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-8530075808812438118</id><published>2009-06-23T07:47:00.000-07:00</published><updated>2009-06-23T08:15:07.438-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Processing models'/><category scheme='http://www.blogger.com/atom/ns#' term='Software Engineering'/><title type='text'>Entity Concurrency</title><content type='html'>Ever since Simula and Demos in the 60's, object-oriented (or process-oriented) simulation has been considered the most natural and intuitive 
approach to representing a system. Certainly it is the way we think when we write programs using imperative languages (like C++), even when taking advantage of multithreading.&lt;br /&gt;&lt;br /&gt;Oversimplifying things, process-oriented entities have a "main loop" which continues to be in scope, on the stack but possibly suspended, even as the entity is idle or blocked waiting for something. An example would be a vehicle that stops at a red light. After the light goes green, the program continues with the next line of code (perhaps navigating to the nearest gas station). Conversely, in an event-oriented simulation, the car would be unscheduled and the light would have to signal the car to change its state from waiting to driving the next leg of the route. In the one case, the programmer can write all the code from the perspective of the entities (the vehicle, the light...). In the other, they write a soup of events and state changes that is very hard to visualize and check for correct logic.&lt;br /&gt;&lt;br /&gt;The easiest way to realize process-oriented entities is to take advantage of coroutines built into the scripting language of your choice. Using a thread per object can get horribly expensive, even if they were cooperative threads. Using a coroutine allows a program to choose to block between two statements and go idle, context switching to another entity. When the resource being blocked on is available, the system can switch back to the suspended entity.&lt;br /&gt;&lt;br /&gt;So the challenge is coordinating the objects. You can look at a previous article on bin/res for some ideas.&lt;br /&gt;&lt;br /&gt;You can even have multiple concurrent activities running on an entity. E.g. monitoring your fuel level, driving a route, and listening for new orders from a taxi-dispatcher. Each would consume another coroutine. A complication here is that when a coroutine blocks and goes idle, the others might run and change the value of states in the entity. 
Fortunately this does not result in race conditions because coroutines are cooperative and only switch at a point that the programmer chooses. The programmer can then be very aware that other processing might change things before the coroutine wakes up again.&lt;br /&gt;&lt;br /&gt;Coroutines can also be used to spread computation across multiple ticks without having to refactor the algorithm you are using. If you have a long AI or path planning algorithm, you could suspend at any point and resume during the next tick. The exact context is restored by the scripting language coroutine system. It might also make it easier to cancel such a computation at those suspend points.&lt;br /&gt;&lt;br /&gt;When thinking about online games, load balancing is critical. We do it by migrating entities. But if an entity has a suspended context in a coroutine at the point you want to do the migration, things get harder. How do you pick up the stack context and reconstruct it on the target machine?&lt;br /&gt;&lt;br /&gt;One way is to refuse to migrate until the entire context tears down (e.g. it returns to the "main loop"). Better would be to make use of the scripting language facilities to serialize and restore a coroutine. Various folks have shown that Stackless Python is capable of pickling an entity that has an outstanding context, and reconstructing it (either after a save/load, or after migration).&lt;br /&gt;&lt;br /&gt;Has anyone tried "pickling" a coroutine in Lua? It's been on my todo list for a while. The question is whether references to (global) variables outside the coroutine can be "reattached" when it is deserialized on the other side. 
And in what format is the coroutine "printed" and transmitted?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-8530075808812438118?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/8530075808812438118/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/06/entity-concurrency.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/8530075808812438118'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/8530075808812438118'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/06/entity-concurrency.html' title='Entity Concurrency'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-1272656078468040929</id><published>2009-06-16T10:16:00.000-07:00</published><updated>2009-06-16T11:08:46.472-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Off topic'/><title type='text'>Off Topic: Most common solid waste?</title><content type='html'>OK. Weird thought... If you went to a landfill, what would be the most common solid waste you saw? A number of years back, a greeny asked that question and challenged people to go see for themselves. It's been bugging me ever since. 
Here are some quick Googled results.&lt;br /&gt;&lt;br /&gt;The most common waste product is paper (about 40 percent of the total). Other common components are: yard waste (green waste), plastics, metals, wood, glass and food waste. The composition of the municipal wastes can vary from region to region and from season to season. (U of Cal)&lt;br /&gt;&lt;br /&gt;Paper, Organics (in Canada, from an Amazon "search inside" book)&lt;br /&gt;&lt;br /&gt;Malaysia: Plastic waste is the most common solid waste that we generate in the country accounting for 7-12 percent by weight and 18-30 percent by volume of the total residential waste generated.&lt;br /&gt;&lt;br /&gt;Throwaways (diapers) comprise 2 percent of the nation's solid waste by weight, making them the third most common solid waste item after newspapers and beverage and food containers. (diaper "activist" site, NY)&lt;br /&gt;&lt;br /&gt;So "paper" is #1 and is quite recyclable, and energy rich. Plastic and organics too (possibly #2 and #3, if you include diapers; heh!).&lt;br /&gt;&lt;br /&gt;Still doesn't answer what the "source" of the paper and plastic is. Fast food bags and wrappers? Retail boxes/packaging, grocery packaging, industrial/wholesale packaging? Books, junk mail, printouts?&lt;br /&gt;&lt;br /&gt;I'd love to have a reference to a more detailed discussion of the source data. Not the reinterpreted-for-an-agenda summaries. And then make my own conclusions about reducing. 
(Maybe I should just look in *my* trash can at the end of the week).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-1272656078468040929?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/1272656078468040929/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/06/most-common-solid-waste.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/1272656078468040929'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/1272656078468040929'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/06/most-common-solid-waste.html' title='Off Topic: Most common solid waste?'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-4043597861551282311</id><published>2009-06-15T08:35:00.000-07:00</published><updated>2009-06-15T10:01:35.655-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Scalability'/><category scheme='http://www.blogger.com/atom/ns#' term='Software Engineering'/><title type='text'>Online Hard Problems</title><content type='html'>I've tried to keep a list of "hard problems" for online games. 
Sometimes it is to impress folks with how much work is involved in trying to build the technology itself (and redirect them toward middleware), and sometimes it is to remind me of how much work we have left as a middleware company. Here is my current collection. It does not cover every system needed, only the ones that are particularly tricky or challenging and require the right architectural decisions early. I'd appreciate it if folks would add to this.&lt;br /&gt;&lt;br /&gt;Hopefully, over time I'll be able to publish my conclusions about how to actually solve these problems. Many interact, and need a unified approach to be solvable at all.&lt;br /&gt;&lt;br /&gt;Distributed and parallel nature&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Load balancing&lt;/li&gt;&lt;li&gt;Latency hiding&lt;/li&gt;&lt;li&gt;Timing, synchronization and ordering&lt;/li&gt;&lt;li&gt;Anticheating/security&lt;/li&gt;&lt;/ul&gt;Performance&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Threading&lt;/li&gt;&lt;li&gt;Execution performance&lt;/li&gt;&lt;li&gt;Memory performance&lt;/li&gt;&lt;li&gt;Network performance and message streaming&lt;/li&gt;&lt;li&gt;Persistence performance&lt;/li&gt;&lt;li&gt;Bandwidth reduction&lt;/li&gt;&lt;li&gt;Content streaming/background loading&lt;/li&gt;&lt;/ul&gt;Scale and seamlessness&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Scale for both casual and intensive styles (reusing the same infrastructure)&lt;/li&gt;&lt;li&gt;Seamlessness at runtime&lt;/li&gt;&lt;li&gt;Entity interaction across server "boundaries"&lt;/li&gt;&lt;li&gt;Seamlessness at edit time&lt;/li&gt;&lt;li&gt;Transactional entity interaction&lt;/li&gt;&lt;li&gt;Shardless worlds&lt;/li&gt;&lt;/ul&gt;Ease of Use/Development/Operations&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Ease of use for designers, game developers (code and assets)&lt;/li&gt;&lt;li&gt;Ease of developing large scale content
&lt;/li&gt;&lt;li&gt;Development efficiency and rapid iteration (assets and code)&lt;/li&gt;&lt;li&gt;Fault tolerance, debuggability of large distributed system&lt;/li&gt;&lt;li&gt;Performance debugging&lt;/li&gt;&lt;li&gt;Script/behavior debugging, script/behavior ease of use&lt;/li&gt;&lt;li&gt;Hosting, operations ease of use (server deployment, control, monitoring, ...)&lt;/li&gt;&lt;li&gt;In game microtransactions (security)&lt;/li&gt;&lt;li&gt;Integration to multiple platform services&lt;/li&gt;&lt;li&gt;Client deployment, patching, binary patching&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-4043597861551282311?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/4043597861551282311/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/06/online-hard-problems.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/4043597861551282311'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/4043597861551282311'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/06/online-hard-problems.html' title='Online Hard Problems'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-1344489892356589627</id><published>2009-06-05T12:01:00.000-07:00</published><updated>2009-06-05T13:45:55.769-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Processing models'/><category scheme='http://www.blogger.com/atom/ns#' term='Software Engineering'/><title type='text'>Bin/Res process synch</title><content type='html'>Back in the 80's Graham Birtwistle developed Demos (ala Greek citizenry, not that canned thing the publisher is shown), a library extension to Simula. It is one of the earliest and most intuitive process-oriented simulation systems around. Here's one link:&lt;br /&gt;&lt;a href="http://cs.ubishops.ca/ljensen/simulation/sync.htm"&gt;http://cs.ubishops.ca/ljensen/simulation/sync.htm&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;There is a particular synchronization mechanism I have always liked about Demos. It is called &lt;a href="http://www.iro.umontreal.ca/%7Evaucher/DEMOS/Demos8"&gt;bin/res&lt;/a&gt;. These are two queue-like objects that are most easily described as elements in an assembly line. A res is a resource that can be obtained and returned. (Yes, like a semaphore.) A bin is a collection into which items can be placed and removed. The items can contain arbitrary information.&lt;br /&gt;&lt;br /&gt;What is interesting here is that when an entity pulls from a bin or a res when it is empty, the entity blocks until something becomes available. If an entity puts something into a bin or res that exceeds its configured maximum the producing entity blocks.&lt;br /&gt;&lt;br /&gt;In process-oriented entity modeling, this is a great simple synchronization technique. In the middle of a complex bit of behavior logic, the Entity can block until something important happens.&lt;br /&gt;&lt;br /&gt;Another cool thing about bin/res is that it has an event-oriented optimization. 
When an Entity is blocked, it inserts itself into an entity queue and blocks. When the other Entity changes the bin/res, it uses that queue to notify and wake up the blocked entity. This avoids all polling. It also works between processes using messages. That can be very important and useful in an online game.&lt;br /&gt;&lt;br /&gt;Also note that bin/res can have a lot of other extensions. There can be multiple producers/consumers. The queues can be LIFO, FIFO or randomly serviced. And it is easy to put statistics like wait-times on them.&lt;br /&gt;&lt;br /&gt;There are a million uses for bin/res. You can make your behavior scripts wait until some property reaches a desired value without polling. You can make two entities wait for each other before proceeding no matter what order they get to the staging area. You can deal with fairness and race conditions between multiple players. Any kind of queuing can be automatically flow controlled.&lt;br /&gt;&lt;br /&gt;I've been a fan for almost 30 years. Ever since I came across Demos at University of Calgary. 
Hi Graham!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-1344489892356589627?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/1344489892356589627/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/06/binres-process-synch.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/1344489892356589627'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/1344489892356589627'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/06/binres-process-synch.html' title='Bin/Res process synch'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-7136782113865106139</id><published>2009-06-01T13:47:00.000-07:00</published><updated>2009-06-01T14:02:36.012-07:00</updated><title type='text'>Smart pointer leak code worked</title><content type='html'>I have to brag on a prototype I finished last week. I'd mentioned previously a way to track &lt;a href="http://onlinegametechniques.blogspot.com/2009/03/debugging-leaked-references.html"&gt;smart pointer leaks&lt;/a&gt;. 
Well, I used a buddy's stack trace snapshot library and was able to inspect where in code all the smart pointers were set for an arbitrary object.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&lt;verbatim&gt;&lt;br /&gt;void LoggerTest::testInit()&lt;br /&gt;{&lt;br /&gt;   void *temp = (void*)efd::GetLogger();&lt;br /&gt; if (true) {&lt;br /&gt;    efd::SmartPointer&amp;lt;efd::ILogger&amp;gt; thing;&lt;br /&gt;    thing = efd::GetLogger(); // assignment should be recorded here.&lt;br /&gt;&lt;br /&gt;    DarrinTestDumpStack(temp);&lt;br /&gt; }&lt;br /&gt; DarrinTestDumpStack(temp); // The reference above disappears in this printout.&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;And the output:&lt;br /&gt;efd::SmartPointer::SmartPointer + 44&lt;br /&gt;LoggerTest::testInit + 80 &lt;&lt;&lt;&lt;------ See, this was where I grabbed the ref.&lt;br /&gt;TestDescription_LoggerTest_testInit::runTest + 22&lt;br /&gt;CxxTest::RealTestDescription::run + 67&lt;br /&gt;CxxTest::TestRunner::runTest + 116&lt;br /&gt;CxxTest::TestRunner::runSuite + 170&lt;br /&gt;CxxTest::TestRunner::runWorld + 172&lt;br /&gt;CxxTest::TestRunner::runAllTests + 80&lt;br /&gt;CxxTest::ErrorFormatter::run + 21&lt;br /&gt;main + 106&lt;br /&gt;__tmainCRTStartup + 424&lt;br /&gt;mainCRTStartup + 15&lt;br /&gt;&lt;br /&gt;efd::LoggerSingleton::Initialize + 459 &lt;&lt;&lt;&lt;------ See. This is the only other reference (the factory).&lt;br /&gt;
InitializeTestApp + 300&lt;br /&gt;EEGlobalFixture::setUpWorld + 22&lt;br /&gt;CxxTest::RealWorldDescription::setUp + 94&lt;br /&gt;CxxTest::TestRunner::runWorld + 103&lt;br /&gt;CxxTest::TestRunner::runAllTests + 80&lt;br /&gt;CxxTest::ErrorFormatter::run + 21&lt;br /&gt;main + 106&lt;br /&gt;__tmainCRTStartup + 424&lt;br /&gt;mainCRTStartup + 15&lt;br /&gt;RegisterWaitForInputIdle + 73&lt;br /&gt;0xffffffffcccccccc&lt;br /&gt;&lt;br /&gt;***************************************************************&lt;br /&gt;Second dump (only the factory reference is left):&lt;br /&gt;efd::LoggerSingleton::Initialize + 459&lt;br /&gt;InitializeTestApp + 300&lt;br /&gt;EEGlobalFixture::setUpWorld + 22&lt;br /&gt;CxxTest::RealWorldDescription::setUp + 94&lt;br /&gt;CxxTest::TestRunner::runWorld + 103&lt;br /&gt;CxxTest::TestRunner::runAllTests + 80&lt;br /&gt;CxxTest::ErrorFormatter::run + 21&lt;br /&gt;main + 106&lt;br /&gt;__tmainCRTStartup + 424&lt;br /&gt;mainCRTStartup + 15&lt;br /&gt;RegisterWaitForInputIdle + 73&lt;br /&gt;0xffffffffcccccccc&lt;br /&gt;&lt;/verbatim&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-7136782113865106139?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/7136782113865106139/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/06/smart-pointer-leak-code-worked.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/7136782113865106139'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/7136782113865106139'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/06/smart-pointer-leak-code-worked.html' title='Smart pointer leak code worked'/><author><name>Darrin 
West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-3737597855367239177</id><published>2009-05-21T07:47:00.000-07:00</published><updated>2009-05-21T11:01:21.683-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Scalability'/><category scheme='http://www.blogger.com/atom/ns#' term='Networking'/><category scheme='http://www.blogger.com/atom/ns#' term='Interest Management'/><title type='text'>Adaptive/Dynamic Categorization</title><content type='html'>Any static partitioning of data flows using a static algorithm (like a geometric grid) will break down, either due to something intrinsic, or because the players are evil and delight in ignoring your simplifying assumptions.&lt;br /&gt;&lt;br /&gt;So for stability and performance/scale reasons a categorization policy that is dynamic will be better. Periodically a centralized or distributed computation can inspect the in-use categories and determine whether there is a better mapping.&lt;br /&gt;&lt;br /&gt;Imagine a flash crowd arrives to watch an epic battle. The initial mapping had that area mapped to one category. The area can be split into two areas, each given a distinct category. At a specific point in time, the new mapping will be instituted. Call that the next epoch. We need to do that across all users of those categories. The easiest way would be to broadcast the new mapping. Once all participants have acknowledged they have the new mapping and have re-subscribed using it, producers can start using the new categories. 
Once all participants have acknowledged adopting the new mapping, the old categories can be unsubscribed from.&lt;br /&gt;&lt;br /&gt;There are many interesting dynamic categorization policies. They are based on live measurement of the system as the game is being played. The new categorization is then applied live. Clearly this can provide better efficiency than offline analysis and updating a static mapping, since in-game patterns can change minute to minute.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Split and merge geometric grid cells or triangles. This is how many MMO systems work today, so if that is your preference you can keep it while migrating to using categories.&lt;/li&gt;&lt;li&gt;Compute "nearness" between producers and consumers, and identify clusters of entities. The set of entities identified is given a category. As entities move around, they subscribe to other clusters of entities. Once the interconnections become too "stretched", a new clustering can be computed. There are pretty simple clustering algorithms, such as inspecting R-squared for an Entity against various proposed clusters to find the nearest one inside some threshold. Note that this approach can be applied to individual entities incrementally as they move.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;A similar clustering algorithm can be applied to actual communication patterns, as opposed to assuming that geometric nearness always indicates high interaction rates. The recategorization system would record and analyze metrics of interactions as they change in real time.&lt;/li&gt;&lt;li&gt;The hosting software can detect overloads of various resources, and indicate to the recategorization system that splitting off instances of busy areas would be valuable (assuming your game design supports that approach).&lt;/li&gt;&lt;li&gt;As dungeons or PvP arenas are spawned, a new set of categories especially for the players involved will be computed and used. 
Because the subscribers indicate which instance they are currently assigned to, the categorization policy will compute the categories for that instance, and a player will not see anything from a different instance.&lt;/li&gt;&lt;li&gt;A game may have unique capabilities like being able to add, remove or move land masses. As coordinate systems drift around, the categorization policy would need to take this into account. It may be able to do that using relative coordinates, or it may need to completely recategorize space.&lt;/li&gt;&lt;li&gt;Changes in team or guild size may trigger recategorization.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;As a clarification, in contrast to dynamic categorization, consider this scenario: As density increases in an area, the policy can adjust the maximum sight distance. While this doesn't directly change the decomposition of the data, it will result in fewer categories being consumed. (This is really a subscription policy change, not a categorization policy change.)&lt;/li&gt;&lt;/ul&gt;Really, the key point is that real-time feedback can be used to improve the data distribution performance of your system.&lt;br /&gt;&lt;br /&gt;Obviously an algorithmic policy is better than an enumerated one, since distributing new parameters to factor into a computation requires less data than a complete enumeration of the descriptions for each category.&lt;br /&gt;&lt;br /&gt;Like almost any distributed computation, recategorization can benefit from incremental changes. Local measurements and performance triggers can instigate a categorization change to just a few mappings, which may only need to be distributed to a few machines. These incremental changes can quickly ripple through the category space and converge on an approximately optimal mapping. 
One benefit of the incremental approach over the periodic approach is that the trigger to recompute can be an event as opposed to a polling of metrics and a recomputation that may have limited or no effect.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-3737597855367239177?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/3737597855367239177/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/05/adaptivedynamic-categorization.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/3737597855367239177'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/3737597855367239177'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/05/adaptivedynamic-categorization.html' title='Adaptive/Dynamic Categorization'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-1144292146551122407</id><published>2009-05-14T07:42:00.000-07:00</published><updated>2009-05-14T11:43:03.733-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Scalability'/><category scheme='http://www.blogger.com/atom/ns#' term='Interest Management'/><title type='text'>Static Categorization Policy</title><content type='html'>A Static Categorization policy is one that partitions the data flows in an 
online game using a fixed function (usually a function of the values of the data being partitioned). The algorithm used does not change, and the tuning values do not change during gameplay. The performance of the system is determined by the designer during development testing. There are many examples:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;A 2-d grid with infinite altitude. Any entity in a grid cell transmits to the associated category. Any entity that can see into that grid cell subscribes to that category. And the grid size does not change. This can lead to lots of categories (not a real problem if your categorization implementation is designed for that), too many entities in a cell (if the cells are tuned too large), or too many subscription changes (if the cells are too small), and so on.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Variations of the grid approach include non-uniform grids (quad-trees), or arbitrary polygon "tiling".&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The designer could identify rooms, or spaces between rendering portals. Each would be assigned a unique category, and all data within them would be grouped. Only when near doors would a consumer subscribe to more than one category.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;FPS engines have used BSP trees to optimize rendering and visibility (and auditory effects) calculations (e.g. http://developer.valvesoftware.com/wiki/BSP_Map_Optimization). Whether automatically computed ahead of time or defined with hints, these 3-d volumes could have a category each. It is easy to query the data structure to determine the category corresponding to the current position. An approximate spherical or convex volume query of the BSP tree should also make it easy for consumers to find all the categories they need.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Data limited to team, quest party, or clan can each be separated into its own category. 
A consumer (client) always knows which "entity set" they are in, so can determine where to publish and where to consume. Same with chat channels.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Chat channels are interesting in that you may want to use a 3rd party chat infrastructure, but use in-game position or group membership to determine connectivity. Categories can be very useful for that, or if you want, could be used to route chat through the game engine. That way gameplay mechanics could affect chat (jamming, ambient "noise", battery charge, ...)&lt;/li&gt;&lt;/ul&gt;Again, all these are considered static because the algorithm and tuning do not change. If something gets overloaded there is no recourse to retune at runtime. Best you can do is change the policy in the next patch.&lt;br /&gt;&lt;br /&gt;Note that membership is dynamic, however. That is how the system is able to limit what data is received (for security and bandwidth reduction), but still get everything that is needed to the right consumers.&lt;br /&gt;&lt;br /&gt;Dynamic categorization is even more fun, but has some interesting challenges. 
More next time.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-1144292146551122407?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/1144292146551122407/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/05/static-categorization-policy.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/1144292146551122407'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/1144292146551122407'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/05/static-categorization-policy.html' title='Static Categorization Policy'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-6770865891338906548</id><published>2009-05-07T15:05:00.000-07:00</published><updated>2009-05-07T16:04:14.137-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Interest Management'/><title type='text'>Subscribing ahead</title><content type='html'>Interest management is the process of delivering interesting data to those interested. An entity ensures that its sensor model has all the correct entities available to it by subscribing to the set of categories known to contain those entities. That subscription fills the local cache with replicas of the interesting entities.  
Then the sensor can be implemented to immediately query the cache knowing that it contains accurate data.&lt;br /&gt;&lt;br /&gt;The policy the application uses to fill the cache can be as simple as a grid, or as complex as a dynamically determined set of clusters of nearby entities. In either case, the consumer knows the category being used by any entities that might be within the consumer's area of interest, field of view, line of sight, range of hearing or whatnot. Note that the producer generally produces into a single category.&lt;br /&gt;&lt;br /&gt;So now producers send updates, and interested consumers get them. But what if someone moves? They will change the category they produce into, or the set of categories they consume from. But what about latency? That is where it gets "fun". A consumer can use its maximum velocity and an estimate of subscription latency to compute the extra distance needed to guarantee that the cache contains those moving entities even when the producer and consumer are moving toward each other at maximum velocity. That will ensure that its cache is populated with any entities that will be interesting in a few moments. The cache will necessarily have more entities than are logically visible, so the sensor algorithm must filter out ones that are too far away but have been delivered "just in case".&lt;br /&gt;&lt;br /&gt;But what about producers that are moving? We use latency hiding and bandwidth optimization to reduce the number of updates being sent, and then predict where the producer is at the current time. But that increases the latency of the produced data. OK. We increase the subscription range even farther. But you have to factor in the producer's maximum velocity. And you don't know what type of entity it is. So you have to use the maximum velocity of the fastest entity in the game.&lt;br /&gt;&lt;br /&gt;That can suck if there are a few jet fighters, but tons of infantry. The solution? 
Separate the fast movers from the slow movers and use two distinct "layers" of categories. You would subscribe out quite a bit further for fast movers, but wouldn't get any slow movers in that wide subscription because they are produced into a different layer. And there won't be very many fast movers to consume. If you think about it, there is just no way to consume all the data from fast movers that are far away and might suddenly move toward you if there are a lot of them. The worst case example is an entity that can teleport anywhere instantly. Consumers would have to be subscribed to everything. Or you would have to change the game design to hide the latency from subscribing on-demand (like making a poof of smoke, or invulnerable just after teleport, or something).&lt;br /&gt;&lt;br /&gt;The beauty of category based subscription is that these game-specific factors can be reduced to a set of integer values that can be computed by both producer and consumer. The consumer doesn't need to know that there are entities in the interesting category. If there are, the publish/subscribe system will deliver them. All that system does is match up integer values: hey this thing goes to these guys. The system doesn't need to assume that the categories are related to geometry, or that they are a linear function, or anything. You can use them as sets. Or unique addresses for multi-point to point communication, or for radio broadcasts on a single station. Or anything.&lt;br /&gt;&lt;br /&gt;But my point: you have to use prediction to reduce bandwidth. 
But you also have to use prediction when subscribing or you might miss something "interesting".&lt;br /&gt;&lt;br /&gt;&lt;a href="http://onlinegametechniques.blogspot.com/2009/02/simulator-is-authoritative-db-centric.html"&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-6770865891338906548?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/6770865891338906548/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/02/subscribing-ahead.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/6770865891338906548'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/6770865891338906548'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/02/subscribing-ahead.html' title='Subscribing ahead'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-6317366112404370286</id><published>2009-04-30T07:25:00.000-07:00</published><updated>2009-04-30T11:02:04.758-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Processing models'/><category scheme='http://www.blogger.com/atom/ns#' term='Scalability'/><category scheme='http://www.blogger.com/atom/ns#' term='Networking'/><title type='text'>Why would you want gateway boxes?</title><content type='html'>An important aspect 
of an MMO shard is where and how clients connect. My preference is to have a set of Gateway boxes that are responsible for managing those connections. Their responsibilities include:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Authentication handshakes with the client and account system&lt;/li&gt;&lt;li&gt;Rapid filtering of malicious traffic (IP filtering or more), denial of service attacks, smurfing (billions of identical, apparently innocuous/legal requests), and so on&lt;/li&gt;&lt;li&gt;Separation of responsibility. This is valuable in a memory and cache efficiency sense, in that a single machine is focusing on just one thing. Simulators don't have to keep track of clients. They only connect to GW boxes.&lt;/li&gt;&lt;li&gt;Instead of N clients each connecting to up to K simulators (N*K connections), you have only N client connections, each to only one of J gateways and the K simulators only connect to the J gateways (N+J*K). Since N is &lt;span style="font-weight: bold;"&gt;much&lt;/span&gt; bigger than J, this is a huge advantage in connection count, connection management processing overhead, and memory on both client and server hosts. Normally N is quite a bit bigger than J (say 4 times).&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Message exploding and the majority of connections are happening in the data center over high speed switches, and in a secure environment on the backend switch.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;By focusing only on GW functions, and not running game logic, they can be made much more reliable than simulator boxes/processes. This can help a lot with player experience during fault recovery.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The Gateway boxes are the only ones with public IP addresses, so it allows a large fraction of your shard to be secure by having no direct route from the WAN. 
The idea here is that GW boxes have (at least) two NICs, one on the switch with the main firewall, the other on the backend network.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;This also has physical network topology benefits, since the back end hosts can be on their own switch.&lt;/li&gt;&lt;li&gt;Message header overhead is reduced when sending to a client, since all data to one client is from a single shard host, and it can do bundling for over-the-WAN messages (most important).&lt;/li&gt;&lt;li&gt;Gateways can also "be" the lobby or character selector prior to entering the game world.&lt;/li&gt;&lt;li&gt;Non-simulation messages like chat or game service stuff (auctions, guild management, email, patching/streaming of content, ...) don't bother the simulators.&lt;/li&gt;&lt;li&gt;The sizing and configuration to optimize for the size of your peak player connections are now independent of that for the simulators and load balancing.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;I also subscribe to the philosophy of persistent client connections (or with connectionless protocols, staying with one assigned gateway). The major benefit of this is that a client does not have to reauthenticate and renegotiate their connection with another host in the shard when their character moves around in the world, or some other load balancing activity changes the simulators they need to interact with.&lt;br /&gt;&lt;br /&gt;To do this, the GW is also responsible for routing messages between the client and the simulators that are "of interest". This gets back to category based routing and channel managers discussed earlier. Data from multiple simulators is sent to the GW box and forwarded to each interested client.&lt;br /&gt;&lt;br /&gt;Data from the client tends to be routed to the one Simulator that client is currently using. I.e. 
the one that owns its "controller" entity where client requests are validated, and (normally) their player character is owned/simulated.&lt;br /&gt;&lt;br /&gt;You want multiple gateway processes (not multi-threaded) per gateway box to avoid losing as many player connections if something should crash (and then reauthenticating, etc). This also helps deal with file descriptor limitations per process for the connections if your OS configuration limits you.&lt;br /&gt;&lt;br /&gt;There are downsides, but not overwhelming:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;An extra hop for most messages. This hop is on a datacenter switch, and will be very fast.&lt;/li&gt;&lt;li&gt;There are extra machines to buy. Well, not really, the same message handling work is being done but not directly by the simulators, so they can get more done each (and that has other more subtle communication benefits). We just partition it, and use the same number of machines.&lt;/li&gt;&lt;li&gt;An extra switch and extra NICs. You can use two VLANs on any decent switch, if you have to.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;In summary, you are just moving some work from one place to another in the same sized shard, but getting a lot of system simplicity, security, and communication benefits.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-6317366112404370286?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/6317366112404370286/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/04/why-would-you-want-gateway-boxes.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/6317366112404370286'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/1757856588756891210/posts/default/6317366112404370286'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/04/why-would-you-want-gateway-boxes.html' title='Why would you want gateway boxes?'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-3182961200986190958</id><published>2009-04-24T06:21:00.000-07:00</published><updated>2009-04-24T11:37:21.092-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Processing models'/><category scheme='http://www.blogger.com/atom/ns#' term='Scalability'/><category scheme='http://www.blogger.com/atom/ns#' term='Networking'/><title type='text'>Peer to What?</title><content type='html'>Ever notice how people use exactly the same word to make an important differentiation? Shorthand, laziness, different backgrounds, or &lt;a href="http://en.wikipedia.org/wiki/Hanlon%27s_razor"&gt;maliciousness&lt;/a&gt;?&lt;br /&gt;&lt;br /&gt;So in our industry what do people really mean, or think they mean when they ask "Hey. Does that support peer to peer?" Here are a couple of definitions and more clear terminology that I prefer. Maybe I can find some sources to back me up. BTW, I'm thinking of small-scale online today. Although many/most of my architectural biases apply here as well. (Other than whether the server is hosted or not).&lt;br /&gt;&lt;br /&gt;Academically speaking, peer-to-peer is a communication topology that, in general, indicates that clients or players communicate directly with each other. Like peer-to-peer file sharing. 
It is used in contrast with the client-server topology where each client has a single connection to the server, and any interaction between clients is performed by means of the server. One benefit of the peer to peer topology is that data can move between clients with one hop of latency instead of two. One benefit of client-server is that each client can be presented the same data in the same order (deterministically).&lt;br /&gt;&lt;br /&gt;However, many people are less concerned with whether two clients communicate directly, but instead use the term peer-to-peer to indicate they desire other features:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;There is no part of the game hosted in a central location like a data center. Note that central-server or authoritative-server is not the same thing as hosted-server.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;There is no stand-alone server that needs to be stood up ahead of time, possibly occupying another piece of hardware. When the player starts their client, the multiplayer game is automatically ready to go. There is no separate server process consuming extra resources on one of the user's machines.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;There is no single point of failure in the distributed system. If the master (usually the first player to start) were to drop out of the game, there is no reason for the remaining players to go through matchmaking again.&lt;/li&gt;&lt;/ol&gt;It turns out that a client-server topology can be used and still achieve all three of the above features. Obviously, "hosted" or not in #1 is easy. Just run the server on someone's machine. The long-running server in #2 is easy. Either start one up and shut it down when the client is started or stopped, or you join someone else's game session. Or if you are resource constrained (like on a console), and can't abide the second process, make a dual-purpose client. The first player in would have their client become the master/authoritative server. 
#3 requires a little bit of technology. When the master drops out of the game, another of the clients must become the master, and the other clients connect directly to them. This gets a little tricky in a non-uniform network, but it is a problem that is solvable with automatable algorithms, so the players don't know that it is happening.&lt;br /&gt;&lt;br /&gt;So even if you have some of the three problems above, you don't have to jump to the conclusion that client-server is therefore not appropriate. And given how much easier it is to get a system running that assumes there is a single authoritative server, I'd recommend you look into it, and prove (i.e. measure) whether you have an intractable problem with client server.&lt;br /&gt;&lt;br /&gt;In the meantime, I'm going to continue to be skeptical when folks say "I need peer-to-peer". My question will be "what feature of peer-to-peer do you think you need?". My "favorite" topology for small scale non-hosted multiplayer is what I call "peered-server" to contrast it to hosted-server. Client server communication topology, single authoritative simulator, no central persistent hosted server process (other than a match maker, which is not the same thing as a simulator).&lt;br /&gt;&lt;br /&gt;Really, the thing to keep in mind is that the process communication topology does not have to equate to the network communication topology. 
It could be a little more efficient if they match, but could be a lot harder to get your project finished.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-3182961200986190958?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/3182961200986190958/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/04/peer-to-what.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/3182961200986190958'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/3182961200986190958'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/04/peer-to-what.html' title='Peer to What?'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-5431117816954704384</id><published>2009-04-20T07:50:00.001-07:00</published><updated>2009-04-20T10:48:59.132-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Off topic'/><category scheme='http://www.blogger.com/atom/ns#' term='Software Engineering'/><title type='text'>How to ignore most emails (safely)</title><content type='html'>At work, I use Outlook. It has "organization" tools. One of the best is found here: View/ArrangeBy/Custom/AutomaticFormatting. It gives you a set of conditions and formatting for each line in the list view. 
I use the default of unread mail formatting with a bold font.&lt;br /&gt;&lt;br /&gt;1) I use my favorite font color for emails that are sent only to me: select "where I am:" "the only person on the To line". Even if others are CC'd on this email, it was sent directly to you. You'd better read it. No one else will.&lt;br /&gt;&lt;br /&gt;2) My second favorite font color for emails sent specifically to me (and others): select "where I am:" "on the To line with other people". Someone took the time to direct this email directly to you and some of your peers. You should probably read it. Maybe one of your peers will respond, but you never know.&lt;br /&gt;&lt;br /&gt;3) My third favorite font color for emails that I was specifically CC'd on: select "where I am:" "on the CC line with other people." (also covers where you were the only person on the cc line). Someone took the time to include you on the thread, but you know it is FYI (for your information) only, so you can almost certainly ignore it without getting in trouble, or until you have some time. This allows you to decide based on the subject line, for example. Many #3's turn into someone else's thread, and I can get by with skimming only a couple of those.&lt;br /&gt;&lt;br /&gt;4) My least favorite color is "all the rest", and will be the default formatting. These will almost always be delivered to you because you are on an email group. These can almost always be ignored for several days. If you happen to be on one or two where that is not the case, make a custom rule to highlight these. If you are on a ton of groups that each need immediate attention, you are probably doing something else wrong that I can't give you empirical advice about. I know I wouldn't survive long under that kind of stress. Don't borrow that kind of trouble. Seriously consider whether your response in those forums is honestly needed so urgently. 
I'll bet if you ignore it and you are the only one that can respond, you are eventually going to get a #1 or #2 email on it.&lt;br /&gt;&lt;br /&gt;I also use a custom font color for "followup" emails that I have specifically marked as needing my attention later. Not only do they show up as red, but they sort to the "bottom" of my list view (I read my list like a console window scrolls).&lt;br /&gt;&lt;br /&gt;I don't empty my inbox. I think that is "busy work", and just puts important items out of view in other mail boxes where I wind up forgetting about them. I only open half or 2/3rds of my emails, and only when I decide I have time.  I'll flip the window open (left on list view) and see if there are any #1's and then just close it. That means almost half my emails are left in the "bold" state, and I don't feel guilty about that. Those are emails that were broadcast, not sent "to" me expecting me to solve a problem. They were FYI to me. If it was an important broadcast, it probably had the "high priority" flag set.&lt;br /&gt;&lt;br /&gt;I also use the preview window and set auto-read on (marks the item as read if you "preview" it for more than a few seconds). If I get the sense I need to read this more carefully or later, I'll manually flip it to unread so it stands out and I'll reevaluate it later.&lt;br /&gt;&lt;br /&gt;These tricks work well in meetings when you think you need to glance at your queued emails, or any other time when you are very pressed for time. A simple glance at the font colors gets you focused on the one or two emails you should look inside. 
Often there is little or nothing to do with even those, so mark them unread for when you have a block of time to "clear" the backlog.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-5431117816954704384?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/5431117816954704384/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/04/how-to-ignore-most-emails-safely.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/5431117816954704384'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/5431117816954704384'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/04/how-to-ignore-most-emails-safely.html' title='How to ignore most emails (safely)'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-8719644752782357322</id><published>2009-04-16T14:32:00.000-07:00</published><updated>2009-04-16T15:00:10.558-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Processing models'/><category scheme='http://www.blogger.com/atom/ns#' term='Scalability'/><category scheme='http://www.blogger.com/atom/ns#' term='Software Engineering'/><title type='text'>What does a "Flexible" architecture look like?</title><content type='html'>We've all heard the maxim that the only constant 
is change. This is true on a lot of levels. During the development of a title, the designer is going to come up with some doozies..."Hey, what if the player controls the character *and* his minions?" Maybe it is an important change, but if you have a big complicated (and maybe purpose-built) MMO system, things like that can give you nightmares.&lt;br /&gt;&lt;br /&gt;So even if you know (or think that you know) what the game is going to look like, 3-5 years down the road, it won't, and you'll have a lot of difficult refactoring if you don't plan for change. Put in insulation layers so that big changes don't propagate very far. For example, don't assume you are going to be using SQL, and start coding Entity behaviors with embedded queries. Instead, define a persistence interface that could be implemented by any number of technologies. Maybe it starts as a flat file, or an XML DB. But by the time you go live, Microsoft bought your company and they want the DB on SQL Server. And while we are talking about the DB, don't *ever* talk to the DB with a synchronous query that blocks gameplay. You might be surprised by the variance in response times even a good Oracle DB will give. My horror story: chances were 50:50 that when we deployed a new version on TSO that had a schema change, the DBAs would migrate the data in some hand-cobbled way. And (get this) an index file would disappear. Things looked ok until you got a few hundred live customers connected.&lt;br /&gt;&lt;br /&gt;Where was I? Insulation. An architecture that can withstand or localize pretty radical changes is simultaneously flexible. You can use the parts in different ways, or replace things you don't like. And when you replace them the change doesn't ripple very far. I'm arguing you need to do this even for a purpose-built MMO engine, so doing the same thing for a middleware MMO engine results in no net impact on performance or usability. 
And since a middleware developer wants to sell their engine to studios doing a variety of titles, that flexibility is required just to make the sale.&lt;br /&gt;&lt;br /&gt;I like to think about sitting across from a hard-nosed tech-lead in a sales meeting who absolutely *NEEDS* a certain feature that we don't have. I can say: no, we didn't do that, but it's flexible, and you can swap that piece out. They can start with what we have, and do the swap out if they have time; which I cynically think doesn't happen very often. When have you had "extra" time during development? Well, at least we made the sale. It's a lot better than saying no we don't do that, our way is more efficient, and it will be wicked-hard to change. OK, I have to share this too: GDC '07 or '06, I heard Tim Sweeney say: "Modularity is overrated". Makes you wonder why folks love to hate developing with Unreal.&lt;br /&gt;&lt;br /&gt;How do you create insulation? Look into PIMPL (pointer to implementation) or Interfaces (pure virtual base classes) to hide all implementation detail from users of a module. Make the modules loosely coupled. Don't assume too much about ordering or synchronous interaction between modules. One of my favorite patterns is the publish/subscribe or producer/consumer pattern. One module sends a message that is categorized, and has no idea who might consume it. Modules that want that data or notification subscribe ahead of time, and run a handler when that message is sent. Now you can add new modules without even recompiling the sender code, much less adding a compile-time dependency. Turns out this approach is good for improved compile times too.&lt;br /&gt;&lt;br /&gt;Loose coupling between software modules is really great. Take it to the next step, and avoid (run screaming from?) synchronous interaction between processes. E.g. I wouldn't use CORBA. Too easy to create deadlocks. Instead, prefer sending an asynchronous message. 
Don't use critical sections in blocks of code, or locks on all kinds of data structures. Too easy to create deadlocks. Instead, prefer sending an asynchronous message.&lt;br /&gt;&lt;br /&gt;I see a pattern developing here. It leads to the ability to map logical processes to other threads or other remote processes with very little change to your code. The event-oriented, message-based approach to distributed systems is very successful. I guess I've already talked about CSP (Communicating Sequential Processes) in an early post, so I'll just drift off.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-8719644752782357322?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/8719644752782357322/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/04/what-does-flexible-architecture-look.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/8719644752782357322'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/8719644752782357322'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/04/what-does-flexible-architecture-look.html' title='What does a &quot;Flexible&quot; architecture look like?'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-730136833505191759</id><published>2009-04-07T10:03:00.000-07:00</published><updated>2009-04-07T10:38:36.264-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Software Engineering'/><title type='text'>Never Too Many Asserts</title><content type='html'>My friend Keisuke says an assert is like a circuit breaker. It does no harm to be in the circuit when things are operating normally. But when the voltage goes out of bounds, your program stops.&lt;br /&gt;&lt;br /&gt;We all accept that a defect costs more to fix the later it is discovered. If you have already checked in, it affects your coworkers. If you have already shipped you will have to go through a new release or public patch. The tenets of extreme programming are that if something is good, taking it to an extreme is probably better. So if we catch a defect in an assert while we are actually developing nothing could be better. If our coworkers "protect" their modules with asserts, we can be more confident that our use of their stuff is legitimate. And we can go faster, and have a more reliable system.&lt;br /&gt;&lt;br /&gt;I like to use a number of different kinds of asserts:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;External Interface (ASSERT_EI): validates that the parameters to a method that is expected to be used by other major modules or by the customer are legal and in range. But they should not be used to validate user input. This is a super-critical assert. The more of these the better.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Internal Interface (ASSERT_II): validates the parameters to methods that exist for good code structure and are intended for use only within the current module. Private methods would use these. They are for defensive programming, and reminding yourself what your assumptions were when you designed this method. 
They can provide a kind of documentation. These are less important than ASSERT_EI, but are great for debugging old code that you don't remember very well or that you didn't write. Again, the hope is that you can go faster. What if you took a short cut, and haven't finished a method for one use case? You can leave an assert such that if you accidentally use the method that way, you will obviously be reminded of the missing work, as opposed to having the method silently do nothing, or fail with some bizarre side effect.&lt;/li&gt;&lt;li&gt;Internal Consistency (ASSERT_IC): validates that the state of a class remains consistent as it is manipulated. It is an invariant check. The design of your module makes assumptions about its data structures (e.g. a data item is in a list only once). Assert this periodically. I like to add a SanityCheck method to almost every class I build. It executes all the invariant checks I can think of (and yes, it can be pretty slow). It is especially useful to sprinkle around if you are currently tracking down a bug. It makes sense to verify your invariants going into and out of a method. Centralizing those invariants in a SanityCheck function can be pretty useful.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;External Consistency (ASSERT_EC): I don't use this very much, since nicely modular systems should not have tight interdependencies. For example, your module may be dependent on the configuration file being parsed before it is initialized. An ASSERT_EC can check (and document) that assumption.&lt;/li&gt;&lt;/ul&gt;I've seen game programmers take a couple of positions against asserts:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;They slow down the execution of debug builds so much that the app becomes unusable. OK. It really is too slow. There should be easy ways to disable the asserts especially in heavily executed bits of code. In fact, one should think twice about putting asserts inside tight loops. 
Also, your assert implementation should be able to skip the predicate at runtime based on runtime configuration files, and do it fairly efficiently (e.g. check a global boolean). Disabling per module would be a good start. Providing a level-of-detail argument to the asserts should be possible.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;I keep having to skip some assert in someone else's code that I don't understand, and that interferes with my workflow. Everyone should be careful not to use asserts to remind us of work that needs doing. Some other mechanism should be used. Or wrap those checks in a per-developer ifdef. On the other hand, if a developer is hitting asserts they don't understand, maybe they are breaking code they don't understand. Using either comments or change-control software, a developer should find the author of the assert, and learn about that bit of the system, or get them to change the assert if it is now obviated.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;I changed one little thing and all these asserts started going off. This one is pretty funny. Given that the asserts are there for a reason, there is a pretty good chance that the one little change had much further reaching implications than expected. For example: someone now wants to handle a member being NULL. But ASSERT_IC's start going off that the member shouldn't be NULL. The thing is, if the rest of the class was built assuming that member can never be NULL, it could easily dereference the pointer without checking. An argument that "it should have checked for NULL" doesn't fly. 
The assumption was built in.&lt;/li&gt;&lt;/ul&gt;Adopting these "extreme" uses of asserts might even take the place of unit tests for that class.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-730136833505191759?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/730136833505191759/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/04/never-too-many-asserts.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/730136833505191759'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/730136833505191759'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/04/never-too-many-asserts.html' title='Never Too Many Asserts'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-4548916404564752539</id><published>2009-04-01T14:43:00.000-07:00</published><updated>2009-04-01T14:43:00.736-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Load balancing'/><category scheme='http://www.blogger.com/atom/ns#' term='Processing models'/><title type='text'>In Game "Transactions"</title><content type='html'>There are times when two players (or even other kinds of Entities) need to trade an object or perform some other action that is critical to occur transactionally. 
Generally, this means that no matter what kind of error occurs, the transaction either occurred completely or did not occur at all. E.g. I paid you 10K, and got a gold bar.&lt;br /&gt;&lt;br /&gt;Support for transactions needs to work across server hosts. It is a very disruptive experience to be able to interact in one spot, but not 2 feet to the left. You have to assume that the players can purposely crash one or the other host at the worst possible point in the interaction (&lt;a href="http://research.microsoft.com/en-us/um/people/lamport/pubs/byz.pdf"&gt;like Byzantine Generals&lt;/a&gt;). Assuming you can trust your database, the idea is simply to get both sides of the interaction saved to the DB simultaneously (i.e. in one DB transaction).&lt;br /&gt;&lt;br /&gt;What if the two Entities are on different hosts? You have to simultaneously change remote variables and get the persistence request from two places to the DB, blah, blah, distributed handshake, Byzantine...brain fry. They do this in COBOL for banking systems. How hard can it be? Well, it is too hard to be worth it (if you include failure/backout/retry, etc.), even if you think it would be a fun challenge.&lt;br /&gt;&lt;br /&gt;So here is a fantastic consequence of the non-geometric load balancing mechanism we've been talking about. Each simulator is single threaded (if you use multiple threads, you have multiple simulators). An Entity Behavior will execute without preemption. So straight-line code will run to completion, a DB save request can occur, and all is good. All you need to do is get both Entities onto the same simulator, run that straight-line code, and co-persist the two local Entities (i.e. send the DB save request with both Entities' states). Boom. Done. Either the transaction makes it to disk or it doesn't. Not just half. 
A half-saved transaction is the origin of a lot of the dupe bugs you've heard about.&lt;br /&gt;&lt;br /&gt;The fact that migration policy and mechanism are separate means that adding a new policy is easy. E.g. migrate the guy I'm about to interact with over here (or me over there). Once the migration finishes, the transactional behavior can be scheduled. If things are a little crazy, it may take a while, but it won't fail and start giving away money. And it won't tell the user the transaction succeeded when it didn't.&lt;br /&gt;&lt;br /&gt;Clearly, this is not something you want to do all the time. It would be too slow. Just for the stuff that *really* matters to the users.&lt;br /&gt;&lt;br /&gt;If you don't like the thought of your Entities migrating around, build an Escrow Entity. Place one side of the transaction into it and co-persist. Migrate to the other side, place the other half in and co-persist. Move out one half, persist, migrate, move out the other half, persist, done. If there is a failure at any point, the Escrow object is known about by the DB and it can continue, or return the goods. 
Just like a Lawyer, but not as expensive.&lt;br /&gt;&lt;br /&gt;Either of these approaches can support multi-Entity Transactions.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-4548916404564752539?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/4548916404564752539/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/04/in-game-transactions.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/4548916404564752539'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/4548916404564752539'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/04/in-game-transactions.html' title='In Game &quot;Transactions&quot;'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-8038801533219933311</id><published>2009-03-31T07:57:00.000-07:00</published><updated>2009-03-31T08:06:43.139-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Networking'/><category scheme='http://www.blogger.com/atom/ns#' term='Interest Management'/><title type='text'>Networking Resource</title><content type='html'>Here's an article with lots of links to classic networking techniques:&lt;br /&gt;&lt;a 
href="http://gafferongames.com/2009/01/25/game-networking-resources/"&gt;http://gafferongames.com/2009/01/25/game-networking-resources/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Here are some research papers leaning toward large scale issues:&lt;br /&gt;&lt;a href="http://maggotranch.com/biblio.html"&gt;http://maggotranch.com/biblio.html&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-8038801533219933311?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/8038801533219933311/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/03/networking-resource.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/8038801533219933311'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/8038801533219933311'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/03/networking-resource.html' title='Networking Resource'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-7238439507503306452</id><published>2009-03-27T11:03:00.000-07:00</published><updated>2009-03-27T07:57:19.920-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Load balancing'/><category scheme='http://www.blogger.com/atom/ns#' term='Scalability'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Interest Management'/><title type='text'>Entity Migration Mechanism</title><content type='html'>From earlier posts, we see that to achieve scalability, we need:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;large numbers of simulators&lt;/li&gt;&lt;li&gt;an ability to load balance based on load (not just geography)&lt;/li&gt;&lt;li&gt;an authoritative simulator to avoid DB bottlenecks&lt;/li&gt;&lt;li&gt;a single-writer paradigm to avoid overly complex synchronization&lt;/li&gt;&lt;/ul&gt;And since our world scales with the number of Entities, not the number of functions in the game, load balancing, and thus scalability, is realized using Entity migration.&lt;br /&gt;&lt;br /&gt;Setting aside the policy and impetus for initiating an Entity migration, let's talk about the mechanics. By separating policy and mechanism, we can experiment with or customize the policy to use application-specific information, resulting in a closer-to-optimal solution. And we won't have to reimplement the mechanism each time.&lt;br /&gt;&lt;br /&gt;We know we can run an Entity on any host by using interest management as discussed earlier to feed an Entity everything it needs to operate correctly. So all we really need to realize a migration is:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;getting the Entity state onto the new host&lt;/li&gt;&lt;li&gt;getting the data flowing to that host that is needed by that Entity&lt;/li&gt;&lt;li&gt;doing this quickly enough that there are no hiccups visible to the players&lt;/li&gt;&lt;li&gt;avoiding all ordering and race conditions so there is no game logic difference compared to not migrating (no side effects)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;surviving crashes of any component at the worst possible moment (i.e. preserving important transactionality) without significant impact to the players&lt;/li&gt;&lt;/ul&gt;First, we use a data-driven means to identify which state variables need to be transferred to the new host. 
There is no reason to transfer truly temporary variables, but there are reasons to transmit variables that are not needed in the persistent database. E.g. current target. There are many mechanisms to serialize an entity and reconstitute it. One challenging aspect is whether to transfer the execution context (e.g. the stack and program counter) if your simulator uses coroutines to support blocking and waiting in a Behavior script. For example, Stackless Python is famous for pickling coroutines and reconstituting them.&lt;br /&gt;&lt;br /&gt;There is a handshake needed to get this state across:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;suspend further execution of the entity so things don't change during the migration&lt;/li&gt;&lt;li&gt;transmit the state&lt;/li&gt;&lt;li&gt;recreate the entity on the target host&lt;/li&gt;&lt;li&gt;resume execution of the entity.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Seems simple, but consider that update messages to players from the original and destination host might get out of order, the DB request queue might get backed up on the source (after all you are migrating away from a busy simulator) and save requests to the DB might get out of order, replicated state on the source and target might be at different versions (the entity may see a neighbor jump backward or forward in time).&lt;br /&gt;&lt;br /&gt;So we need to add some steps:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Get the target subscriptions set up and acknowledged before the transfer so when the Entity arrives, all data is available there that it had in its original location&lt;/li&gt;&lt;li&gt;Have the original simulator "flush" its DB queue so the DB never sees out of order persistence requests, and then stop persisting that entity until after the migration.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;increment an "epoch" counter to allow us to discard any replication messages or requests from the past&lt;/li&gt;&lt;li&gt;Given the increase in time and complexity, it may be worth optimizing the process 
by pre-loading the target host without actually pausing the original entity. Then once everything is set up, resend states that may have changed during the preload. Of course, you might also make use of any state that was previously replicated to the target.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;There are quite a few tricks available and needed to get this to come out right and be efficient. But distributed transactions like this happen reliably in a lot of "critical" systems in other industries, so it is quite solvable. You can see how the need for migration and the requirement to do it quickly without player-visible hitches require us to adopt many of the design principles already accepted: authoritative simulator, interest management, data-driven persistence and replication traits describing entity state variables, ... All of these key features are intertwined, so if your system goes off track somewhere there, you may be buying a lot of trouble elsewhere.&lt;br /&gt;&lt;br /&gt;One of the coolest features of interest management is that you can choose to not migrate and the game still runs the same (but may use more datacenter-only networking). So if you can't migrate an entity until it finishes a behavior (because you can't migrate your stack), no problem, just wait. Program that into your policy. 
If you find that hitches are visible but only when a player is in a heavy combat situation, your policy can delay initiating the migration until the participants have been quiescent for a while.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-7238439507503306452?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/7238439507503306452/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/02/entity-migration-mechanism.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/7238439507503306452'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/7238439507503306452'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/02/entity-migration-mechanism.html' title='Entity Migration Mechanism'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-7923697960485442777</id><published>2009-03-16T09:00:00.001-07:00</published><updated>2009-03-19T10:37:20.022-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Software Engineering'/><title type='text'>Debugging leaked references</title><content type='html'>Everyone likes using smart pointers and reference counting to manage the lifetime of objects that are passed through modular interfaces. 
They can avoid a lot of copying and callers don't have to read a lot of documentation to figure out if they are responsible for deleting the object.&lt;br /&gt;&lt;br /&gt;One big problem with using ref counted objects is when they "leak". It is extremely difficult to find which code has taken an extra reference to an object that you think should have a single remaining reference. Why didn't this object destroy itself when I cleared this "last" smart pointer? When an object is passed through many layers and stored in various containers, there can be quite a few reference increments and decrements occurring, so putting in breakpoints is not very useful. I have found this to be one of the most difficult and tedious problems to fix in a large application like a game.&lt;br /&gt;&lt;br /&gt;Regular memory leak detectors are not too useful, since they just say "it leaked", not why. What you really want to know is the location of every outstanding reference to a block that you think should have already been deleted. You can ask this question at the end of a run on the unexpectedly outstanding blocks, or in the middle of a run when you have found something strange and want to backtrack.&lt;br /&gt;&lt;br /&gt;It is "easy", but not much use, to get the memory address of every smart pointer that points at the problem object. Make a multi-map keyed on object addresses. Each time you assign or clear a smart pointer (increment or decrement a reference on an object), modify that map to include or exclude the address of that smart pointer. Done.&lt;br /&gt;&lt;br /&gt;Making this "useful" is a matter of recording information about each smart pointer for later logging. Ideally, you would record a partial stack trace so you can see where the smart pointer was affected and what caused that. 
This requires another chunk of data beyond the smart pointer's memory location.&lt;br /&gt;&lt;br /&gt;If you are only concerned about extra references, you could probably get away with recording the stack trace only on addRef. Note that for each smart pointer, there is only a single outstanding reference, so you only need the one stack trace per smart pointer.&lt;br /&gt;&lt;br /&gt;If you have a situation where you have too many decRefs, you may need to record the stack trace of each addRef and decRef for the broken object. Garbage collecting that data is going to be interesting, because you can never be sure when an extra decRef might happen.&lt;br /&gt;&lt;br /&gt;Putting this together, you can use regular memory leak detection to identify unexpectedly outstanding ref counted objects. Then ask this ref count debugging system for the stack trace(s) showing where the smart pointers that are still holding a reference to that object were initialized.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-7923697960485442777?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/7923697960485442777/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/03/debugging-leaked-references.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/7923697960485442777'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/7923697960485442777'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/03/debugging-leaked-references.html' title='Debugging leaked references'/><author><name>Darrin 
West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-240119393160757467</id><published>2009-03-13T11:47:00.001-07:00</published><updated>2009-03-14T12:39:45.929-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Software Engineering'/><title type='text'>What is a memory leak and why do we care?</title><content type='html'>I am a big fan of &lt;a href="http://www.ibm.com/developerworks/rational/library/06/0822_satish-giridhar/"&gt;Purify&lt;/a&gt;. It is a tool that helps programmers deal with the challenging memory systems of C and C++. In those languages, we can "leak" memory.&lt;br /&gt;&lt;br /&gt;We try to get rid of all the leaks (usually a few days before we ship). I've seen this take quite a lot of effort. Superficially, it seems like wasted effort since modern operating systems keep track of your memory and recover it when your process exits. So, really, we could just exit and kill the process. The app would shut down a whole lot faster.&lt;br /&gt;&lt;br /&gt;Why does that sound like such a bad idea? Why do we even care about leaks?&lt;br /&gt;&lt;ul&gt;&lt;li&gt;We don't want to have our process use more resources than it needs. It could crash or make other apps unhappy.&lt;/li&gt;&lt;li&gt;It makes for higher quality software. We are strictly disciplined about who owns what.&lt;/li&gt;&lt;li&gt;Managing the scope/lifetime of an object is occasionally very important. For example, we may be holding some I/O that needs to be flushed.&lt;br /&gt;  &lt;/li&gt;&lt;/ul&gt; There are two competing definitions of leak. Obviously I like the Purify definition better. 
I'm going to argue that the other (more common) definition is an oversimplification and a compromise:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Precise: a leak is an allocated block of memory that you don't have a pointer to anymore. So you can't ever delete it. A potential leak is one where you only have a pointer to an address somewhere in the middle of the block. This might happen if you do some funky pointer arithmetic, and intend to undo it to later delete the block. Anything else is referred to as an &lt;span style="font-weight: bold;"&gt;in-use &lt;/span&gt;&lt;span style="font-weight: bold;"&gt;allocation&lt;/span&gt;.&lt;/li&gt;&lt;li&gt;Traditional: a leak is a block that is still allocated (outstanding) when the application shuts down.&lt;/li&gt;&lt;/ul&gt;Most custom memory allocators use the traditional definition, because developers have no good way of providing the more precise definition. They require the application writer to build good quality code that carefully deletes all the memory it allocates such that none is left outstanding at the end of execution. Certainly that definition subsumes the Precise definition. The traditional definition is saying there are no leaks, potential leaks, nor outstanding allocations. Nothing. Period. How can you argue with that?&lt;br /&gt;&lt;br /&gt;The tools used are simple. They keep track of outstanding allocations, but don't and can't subtract precise leaks. They have neat ways of showing where the memory was allocated, and so on. The app writer finds all those outstanding allocations and carefully clears them as the application shuts down.&lt;br /&gt;&lt;br /&gt;But you don't super-need-to clear the in-use memory. It is not hurting anything. You could delete it any time, if you really wanted to (sounds like an addict). OK. There is one way it can hurt something. You could accidentally have lists or free-lists that bloat up and it will look like you really are using that memory. 
However, since it is in-use, you should be able to instrument your lists and watch them bloat up.&lt;br /&gt;&lt;br /&gt;You &lt;span style="font-weight: bold;"&gt;do&lt;/span&gt; super-need-to clear precise leaks. They are caused by overwriting or clearing your pointers without doing the deallocate first. If you don't plug those leaks, you are going to sink, or crash, or lose oil pressure or air pressure, or some other analogy.&lt;br /&gt;&lt;br /&gt;How could Purify possibly differentiate between precise leaks and in-use blocks? Believe it or not, it watches every pointer assignment, and sees when the last reference to a block is lost. It does this with something called Object Code Instrumentation, so it doesn't care what compiler you are using. It inserts assembly code into your executable (e.g. around assignments), and fixes up the jumps and symbol addresses it changes when making room for the inserted code. It consequently &lt;span style="font-weight: bold;"&gt;knows &lt;/span&gt;what you have done or accidentally done to every pointer (including nasty things with pointer math).&lt;br /&gt;&lt;br /&gt;As a result it can focus the coder's attention on blocks that are unreferencable. It can even (theoretically) throw a breakpoint at the instruction that overwrites the pointer. I &lt;span style="font-weight: bold;"&gt;know&lt;/span&gt; it can break at the instruction where you read an uninitialized variable. At any point during debugging, you can make a call and have it dump unreferencable blocks of memory and where they were allocated. Of course you can also dump all in-use blocks by calling a function when paused in the debugger. I believe you can make a checkpoint and then later dump all blocks that are in-use that were allocated since the checkpoint.&lt;br /&gt;&lt;br /&gt;If you insert some code in your custom allocator, the Purify SDK can even use your allocator to do its magic. 
You could make calls to it at run time to dump metrics or react to issues.&lt;br /&gt;&lt;br /&gt;As you can see, unreferencable blocks are the real leaks. We only have to clear out all in-use blocks because we don't use tools like Purify, and have to overreact and compromise. I don't like the busy work of clearing every last legitimate in-use block. I don't think I should have-to-have-to. (As long as I'm careful about my "other" resources like I/O buffers.) It makes for better code if I do, but if I want to trade time for quality or one feature for another, I still have to clear the real leaks, and like the option of ignoring the other blocks swept up by the traditional definition.&lt;br /&gt;&lt;br /&gt;Purify has a bunch of other cool stuff too. Like knowing when you access memory that is uninitialized, or already deallocated. It knows when you run off the end of an array. It can instrument third party object code even if you don't have source, and it doesn't use your allocator.&lt;br /&gt;&lt;br /&gt;Of course there is a runtime performance hit, but it isn't super bad. And you can control which modules are instrumented, so you might exclude your rendering for example.&lt;br /&gt;&lt;br /&gt;It has a UI that integrates with Visual Studio and gives you a nice report of these various kinds of memory errors at the end of your run, or whenever you ask. It will even take you to the offending line of code.&lt;br /&gt;&lt;br /&gt;Don't balk at the price either. Just remember the amount of time spent by the poor guy who was stuck with clearing every last in-use block. 
It more than pays for itself &lt;span style="font-weight: bold;"&gt;very&lt;/span&gt; quickly.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-240119393160757467?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/240119393160757467/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/03/what-is-memory-leak-and-why-do-we-care.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/240119393160757467'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/240119393160757467'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/03/what-is-memory-leak-and-why-do-we-care.html' title='What is a memory leak and why do we care?'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-1470014553332301244</id><published>2009-02-28T12:18:00.000-08:00</published><updated>2009-08-27T11:56:09.416-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Load balancing'/><category scheme='http://www.blogger.com/atom/ns#' term='Processing models'/><category scheme='http://www.blogger.com/atom/ns#' term='Scalability'/><title type='text'>The Manifesto of Multithreading (for High Performance)</title><content type='html'>This document is a statement of principles that Emergent software 
developers adhere to concerning issues of concurrency, parallelism, multiprocessing and multithreading.&lt;br /&gt;Related Docs:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://twiki.emergent.net/bin/view/ServerDev/ServerProcessModel"&gt;http://twiki.emergent.net/bin/view/ServerDev/ServerProcessModel&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://cache-www.intel.com/cd/00/00/05/15/51534_developing_multithreaded_applications.pdf"&gt;http://cache-www.intel.com/cd/00/00/05/15/51534_developing_multithreaded_applications.pdf&lt;/a&gt; (Great comments on Granularity in section 3.2. Add comments on dynamic      agglomeration.)&lt;/li&gt;&lt;li&gt;Communicating Sequential Processes, C. A. R. Hoare. &lt;a href="http://www.usingcsp.com/"&gt;http://www.usingcsp.com/&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://en.wikipedia.org/wiki/Von_Neumann_architecture"&gt;http://en.wikipedia.org/wiki/Von_Neumann_architecture&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;"&gt;Principles:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Multi-threading on a single processor is beneficial only in rare circumstances such as when we expect a program to block repeatedly, such as doing I/O.&lt;/li&gt;&lt;ul&gt;&lt;li&gt; When concerned about performance, this is unlikely to be the case, so we should see only one or two threads created per core.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;Operating system context switching between threads can be very expensive. It involves saving and loading processor state, tends to result in complete cache invalidation, and most expensively entails updating of OS process management data structures. For systems that provide virtual memory, reprogramming the MMU and swapping out kernel resources such as file pointers can make switching even more expensive. Some system calls can force a scheduling quantum to end.&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Avoid context switching by keeping the number of threads low. 
Avoid system calls.&lt;/li&gt;&lt;li&gt;Decompose the problem into large segments to minimize the number of context switches and total overhead. This is most easily done by decomposing into as many pieces as possible, then using a policy to agglomerate them back into large sequential computations.&lt;/li&gt;&lt;li&gt;Reducing the number of threads will make the job of an SMP OS scheduler trivial and inexpensive. Ideally it could map a thread to a processor and leave it there indefinitely. Any load balancing needed can be accomplished by the application using app-specific knowledge without affecting the number of threads and their scheduling.&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;The efficiency of using available multiprocessors is measured by how much processing time is wasted or spent idle.&lt;/li&gt;&lt;ul&gt;&lt;li&gt;For ease of measurement and tuning, an application should not grab as much processing as it can; instead it should consume as little as possible and then idle, making inefficiencies much more visible.&lt;/li&gt;&lt;li&gt;Never use a spin lock.&lt;/li&gt;&lt;li&gt;Idle time of this sort can be filled by work scheduled to be done on processors that are already fully utilized or overloaded. This is the reason for load balancing and is one component of thread scheduling.&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;Processors interfere with each other’s efficiency when accessing shared data. This happens directly if the other processor hits a lock that is already acquired. If the blocked thread suspends, another thread on that processor may be able to take over. But that adds a context switch on top of the lock check. Mutual exclusion (mutex) mechanisms and other atomic instructions cause cache flushes on remote processors, or lock the entire memory bus. 
These costs are hidden when doing performance analysis since they don’t affect the requestor, and are not attributable to any single instruction on the remote processor (other than as apparently “spontaneous” cache misses). Even storage barriers used in out of order memory access modes can add overhead from cache effects.&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Avoid the need for shared data. Either partition the data so it is exclusively used by one processor, or hand off entire blocks of data to a processor so the data has a single-writer at any one time.&lt;/li&gt;&lt;li&gt;Minimize the number of interactions with shared objects, as each interaction bears an overhead. This relates to the (TBD) agglomeration policy mechanism discussed above.&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;Even computations that only read data owned/written to by another thread concurrently must be guarded.&lt;/li&gt;&lt;ul&gt;&lt;li&gt;In some cases managing a replica of the data for remote readers will simplify programming (removing the need for the guards), and have other benefits similar to double buffering such as removing a serial sequencing.&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;Guarding disjoint blocks of code with a critical section mutex mechanism is error prone because the coordination code is not collocated. Overly conservative exclusion will impact performance. Surprising interactions (side-effects or reading shared objects) can lead to errors/races.&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Avoid code-based mutual exclusion (critical sections) for commonly/widely accessed systems. If it is the “only” approach, consider centralizing it (e.g. 
single reader and writer), as opposed to requiring almost every method of an object to grab the mutex.&lt;/li&gt;&lt;li&gt;Consider whether the object being guarded must be a singleton or can be replicated per thread (in which case, no exclusion is required).&lt;/li&gt;&lt;li&gt;Don’t confuse thread exclusion with reentrancy.&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;Locks on data or code blocks that require more than one resource to be locked can lead to deadlocks, priority inversion, or self-blocking. If the deadlock is of low probability, it may not be observed in testing. Even though some consider it easy to detect and fix a deadlock, deadlocks are very risky since it is so hard to guarantee they don’t exist.&lt;/li&gt;&lt;ul&gt;&lt;li&gt;If you find yourself resolving conflicts from multiple locks, it is time to redesign the system.&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;Lock contention tends not to degrade gracefully in terms of performance and fairness/starvation. More sophisticated locking, such as a ticket mutex for fairness, is required. Simpler locks consume even more resources, mask their caller’s logical idleness, and cause memory performance side effects as threads busily contend for the lock.&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Avoid high contention resources. Duplicate them, or rethink whether they are as shared as you think.&lt;/li&gt;&lt;li&gt;In the spirit of CSP (Communicating Sequential Processes), assign the ownership of the resource to a single thread and use message communication to access it.&lt;/li&gt;&lt;li&gt;Before devising a more sophisticated and special purpose mechanism or data structure to address the high contention, reconsider the larger problem. Often a more coarse approach using CSP will fit better with the rest of the system. The task to be parallelized may not turn out to significantly contribute to overall performance. Optimize globally.&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;Scheduling preemption should not be relied on to provide performance. 
Each forced context switch is an unwanted overhead. Near 100% of the CPU can be used by a single thread that has enough load assigned to it. Preemption should only be used for rarely invoked latency sensitive functions like I/O or for long running low priority background processing where the preemption rate can be very low.&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;"&gt;Guidance:&lt;/span&gt;&lt;br /&gt;Hoare’s Communicating Sequential Processes (CSP) can be used to solve any concurrency problem (any concurrent computation can be realized/reimplemented as CSP) and has a very easily understood mental model. A logical process (LP) contains its own state variables and performs computation concurrently with other LPs. LPs interact only via messages, whose content and ownership are atomically transferred when they are sent.&lt;br /&gt;&lt;br /&gt;By leaving the computation sequential, and avoiding all concurrency within each sequential process, algorithm development returns to the familiar Von Neumann Architecture. This approach is better for our customers as we do not require them to be concurrency experts when coding. It is better for Emergent since not all the developers need to be aware of the concurrency consequences in various systems they are less familiar with. Performance analysis becomes trivial, since algorithms are sequential, and message queuing can be analyzed to inspect workload. Critical paths and idealized parallelism analysis of the communication graph can be used to determine if a better load balance is possible, or if the problem itself needs to be further decomposed to realize a performance gain, or if more processors would improve performance.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;A thread is treated as a CSP logical process. While it may share a single address space with other threads, by policy it does not share any writeable (and ideally, any readable) data.&lt;/li&gt;&lt;li&gt;The only point of concurrency is the message queue. 
A message with its content belongs to one logical process xor another.&lt;/li&gt;&lt;li&gt;A large amount of data that needs to be “shared” among threads is handed off in a message. Copying can be avoided by using a pointer in the message to the transferring data and adopting the policy that the data “belongs” to the message once it is attached, and belongs to the target logical process once it arrives. This effectively uses SMP shared memory as a communication medium.&lt;/li&gt;&lt;li&gt;The principles of CSP apply equally well in a multi-threaded shared-memory environment as in a multi-process SMP, in a NUMA or in a distributed processing environment. This future-proofs our software and allows reconfiguration and performance tuning by changing policies without rewriting code. This addresses application design and load changes as it evolves.&lt;/li&gt;&lt;li&gt;Minimizing the number of LPs reduces context switch overhead but requires better load balancing algorithms. A good static balance based on the expected application behavior can extract quite a lot of CPU capability, especially when the workload does not fluctuate very much over time. This should be a design goal even without considering concurrency to avoid frame rate jitter. Dynamic load balancing can use recent history to predict a better near term balance. However, to do this there is work-migration overhead. This takes the form of per-task “global” scheduling onto available processors (less desirable as this is a continuing cost), or periodic analysis, extracting and transferring workload between LPs. Note that in most cases it is not worth the effort of eking out the last amount of performance with sophisticated load balancing techniques.&lt;/li&gt;&lt;li&gt; Mapping similar work to an LP increases the likelihood of instruction cache hits. So when agglomerating small tasks, create large lists of similar tasks as opposed to doling them out round-robin.&lt;/li&gt;&lt;li&gt;Always optimize at the end. 
Getting perfect parallelism out of a system that is only 10% of the application computation is probably a waste of effort.&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;"&gt;Other Concerns&lt;/span&gt;&lt;br /&gt;Note that strict CSP suffers from communication overhead. For systems which are extremely fine-grained and latency sensitive, a custom approach might be considered if the extra work and risk are justified. But it should be sufficiently encapsulated to avoid having its tradeoffs unknowingly cause performance side effects elsewhere.&lt;br /&gt;&lt;br /&gt;Concurrency for utility reasons like background loading can also benefit from techniques used for high performance concurrency, but being less performance sensitive, the benefit is primarily in reuse of common facilities and ease of development.&lt;br /&gt;&lt;br /&gt;The CSP focused approach and infrastructure can be reused trivially to implement most kinds of parallelism: functional parallelism, data parallelism, master/worker thread parallelism, pipelined parallelism, etc. 
It is not appropriate for very fine grained parallelism such as parallel loops, or SIMD computation but that should be rarely needed when considering the application as a whole.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-1470014553332301244?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/1470014553332301244/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/02/manifesto-of-multithreading-for-high.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/1470014553332301244'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/1470014553332301244'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/02/manifesto-of-multithreading-for-high.html' title='The Manifesto of Multithreading (for High Performance)'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-4770611829884951816</id><published>2009-02-01T10:33:00.000-08:00</published><updated>2009-02-16T15:20:28.568-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Processing models'/><category scheme='http://www.blogger.com/atom/ns#' term='Scalability'/><title type='text'>The Simulator is Authoritative: DB-Centric  Sucks</title><content type='html'>I have seen several persistent online games that chose 
to rely on the game state persistence database to manage the concurrency that results from multiple simulators which are needed for scale. Their idea is to use DB abort-retry semantics to serialize all access to the Entity states. This sort of works and is pretty easy to implement, but introduces too many problems to justify it.&lt;br /&gt;&lt;br /&gt;The first problem to overcome is that read-modify-write for every state change through to the DB would be prohibitively slow. Such things would need to be in a single transaction otherwise you'd have race conditions for concurrent access by multiple simulators. (See our comments on single-writer). I suspect DB-centric implementations wind up adopting a single-writer policy to try to avoid this problem, leading to the question of why be DB centric in the first place?&lt;br /&gt;&lt;br /&gt;There are a couple of tantalizing benefits to DB centric that might make it seem attractive. The one I've seen most closely is that it makes Entity migration very simple. A simulator unloads the Entity; the DB becomes the only copy of the Entity; the new simulator locks the ownership of that Entity and loads it. Using a DB in this way, race conditions on this hand off are impossible.&lt;br /&gt;&lt;br /&gt;There are hidden costs of a DB-centric approach for migration. Supporting Entity migration through the DB means that *every* Entity state property must be persisted to the DB, otherwise the restoration onto the destination simulator will effectively "reset" the Entity. This can have significant undesirable performance implications, since data that is not needed for longer term persistence (e.g. when the player logs back in next weekend) must be written through to the DB in case the Entity migrates. It also means that an NPC or other Entity that would not otherwise be persisted across shard shutdown must nevertheless be persisted if it needs to be migratable. These things result in wasted space and wasted DB throughput. 
In my experience, DB throughput is the limiting factor in scaling a shard. Please, please, run screaming from DB-centric!&lt;br /&gt;&lt;br /&gt;Another justification is if a simulator crashes (and don't fool yourself, they *all* do!), then very little work is lost because every change is being written through. But consider the complication of cross-simulator transactions. For Entities to interact that are on different simulators, every DB interaction in the cluster must be serialized, and that can get super-slow. I judge that for almost everything, players won't quit over losing a few minutes of game play.&lt;br /&gt;&lt;br /&gt;At this point the DB-centric guys object and say: well, actually we don't pay the write-to-DB round trip, we use a distributed database with local caching in-memory for high-performance and more immediate local access. The problem there is that if the local in-memory DB crashes, you lose data anyway. And worse, the data that was persisted to disk may not be a shard-wide consistent snapshot of the Entities' states.&lt;br /&gt;&lt;br /&gt;I have observed many DB failures (mainly user error or hardware failure), even with high-cost Oracle installations. A DB admin does a query, or a developer writes some "custom" query, and it has unexpected performance implications, or an unsuspected table deadlock that only shows up at full load. The DB engine detects this deadlock (after a while), but there is a significant hiccup. A DB centric game shard locks up completely, since the simulators are not authoritative. Conversely, a simulator-centric architecture allows the shard to keep running even with the DB shut down! This is an MMO operator's dream come true and can be used for things like defragmenting table space or other maintenance like backups.&lt;br /&gt;&lt;br /&gt;Note that DB latency would live on the critical path of responsiveness in many cases. We've also put unnecessary extra load on the DB. 
Given that DB response times can be quite variable when there is a mixture of different query types, this puts a lot of pretty difficult consequences on the rest of the system.&lt;br /&gt;&lt;br /&gt;Bottom line: make the Simulator authoritative even over the game state database. The DB is just a backing-store for use when the shard restarts or a player logs back in. The DB holds just the part of the world that needs to persist. This obeys a rule of thumb for good scaling, particularly in distributed systems: make performance controlled by configuration, not by the application data. In this case, the rate of persistence and the resulting amount of lost game play on a simulator crash is controlled by a configuration file and is tunable no matter what the load on the shard. A DB-centric approach is at the mercy of the number of players, number of Entities, the rate of change of Properties determined by Entity Behavior scripts and all kinds of other things that are definitely not independently tunable.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-4770611829884951816?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/4770611829884951816/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/02/simulator-is-authoritative-db-centric.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/4770611829884951816'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/4770611829884951816'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2009/02/simulator-is-authoritative-db-centric.html' title='The 
Simulator is Authoritative: DB-Centric  Sucks'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-5792700509995598232</id><published>2008-12-12T13:00:00.000-08:00</published><updated>2008-12-12T14:43:16.652-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Processing models'/><category scheme='http://www.blogger.com/atom/ns#' term='Scalability'/><title type='text'>Single Writer Authoritative Simulator</title><content type='html'>We can now see how an Entity can have a local cache populated with all the remote Entity proxies it needs by using Interest Management and data distribution over a publish and subscribe message system.&lt;br /&gt;&lt;br /&gt;One tough problem remains. What do you do about race conditions if two simulators modify the same Property on an Entity at around the same time? Lightly digging into this reveals a decent solution where such writes would be resolved at a central location and distributed from there so all consumers see the same ordering. However, allowing multiple writers to a single Property can lead to inconsistencies that need sophisticated transaction management.&lt;br /&gt;&lt;br /&gt;The easier approach is to disallow multiple writers. Ensure that all properties of an Entity are modified only by that Entity. Any other Entity that wants to make a change must send a request. This boils down to "Communicating Sequential Processes", and is a well-understood computer science paradigm. Normally the Entity stays on one host for a good period of time, and the Entity is said to be Owned by the associated simulator.&lt;br /&gt;&lt;br /&gt;The owning Simulator is said to be the authoritative simulator. 
All computation that affects that Entity is performed on the owning simulator. The values it computes are pushed out to other interested simulators where they become proxies/reflections/replicas, and are read-only.&lt;br /&gt;&lt;br /&gt;The single writer paradigm allows a junior game content developer to remain blissfully unaware of concurrency. They think about one Entity at a time. An interaction with another Entity is not trying to read or write to a concurrently evolving set of variables. Instead it is sending a request to the other Entity, which will eventually get around to handling the request sequentially. The developer can think in single-threaded terms. Yay! In fact, the simulator is also made single-threaded so there can be no mistakes (note this still leaves ways to make good use of multiple cores).&lt;br /&gt;&lt;br /&gt;The behavior that is running on an Entity is able to immediately read any of the Properties of Entities in which it has already expressed an interest. Since the simulator is single threaded, this can be done without locks. The properties of the proxies are only updated when the message system is ticked, and since the simulator is single-threaded, that is done after the Entities are done executing. Note that because we use state push, the property values of the proxies have the lowest latency  *possible*.  We can also apply latency hiding techniques to further improve the proxy's estimate of the value on the authoritative simulator.&lt;br /&gt;&lt;br /&gt;All this results in a very accurate and familiar representation of a computing environment that appears to have all Entities on the same machine. But since it is actually distributed, its performance will scale. 
The distributed nature is abstracted away without impacting the developer.&lt;br /&gt;&lt;br /&gt;If you are thinking about multi-entity transactions, you'll have to wait for it...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-5792700509995598232?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/5792700509995598232/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2008/10/single-writer-authoritative-simulator.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/5792700509995598232'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/5792700509995598232'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2008/10/single-writer-authoritative-simulator.html' title='Single Writer Authoritative Simulator'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-6033142845213756290</id><published>2008-12-08T11:59:00.000-08:00</published><updated>2009-02-16T15:19:55.105-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Load balancing'/><category scheme='http://www.blogger.com/atom/ns#' term='Scalability'/><category scheme='http://www.blogger.com/atom/ns#' term='Interest Management'/><title type='text'>Publish/Subscribe Message Delivery</title><content type='html'>In a 
previous post, I argued that publish/subscribe was the only tricky thing needed for a totally flexible interest management based online game system.&lt;br /&gt;&lt;br /&gt;Publish/subscribe (producer/consumer) based message systems give semantics similar to multicast. A producer sends a message to a channel. All current consumers on that channel receive a copy of that message. To avoid becoming broadcast (where every consumer receives every message sent), the messages are decomposed into channels using a Category, one per channel (so you can think of it as a channel id). A Category is an integer so they are trivial to deal with at the lower level (as opposed to strings or something). For simplicity, each message is only sent to one Category.&lt;br /&gt;&lt;br /&gt;This system is very loosely coupled giving it a lot of flexibility and extensibility. A producer does not need to know the existence of any of the consumers. The set of consumers and their implementation can change without touching the producer. For example, a logging system could be attached to a channel without affecting the system, and would give good data for debugging.&lt;br /&gt;&lt;br /&gt;To implement the publish/subscribe system efficiently, we must manage the producer and consumer subscription requests efficiently. Broadcasting that a consumer is interested in some Category to each producer is too inefficient. So we introduce the notion of a channel manager that keeps track of the interests of all producers and consumers.&lt;br /&gt;&lt;br /&gt;The channel manager is responsible for redistributing each data message. A producer sends a message to the channel manager. The channel manager maintains the list of interested consumers, and forwards a copy of the producer's message to each consumer. We have exchanged the non-scalable broadcast of subscription messages for an extra hop of latency for each data message.&lt;br /&gt;&lt;br /&gt;The channel manager can easily be made scalable. 
The simplest approach is to use the integer Category value and a simple modulus operation to load balance across any number of channel manager processes. Both producers and consumers use the same computation. And all subscriber messages and all data messages on one Category travel through a single channel manager.&lt;br /&gt;&lt;br /&gt;This architecture is the obvious one. There are more sophisticated approaches that can reduce the two hop latency by using direct connections between producers and consumers. The subscription messages still need to route through a channel manager, but the producers need to maintain the list of interested consumers. This adds the requirement that producers subscribe to produce, and adds more subscription messages and more latency on a subscription. There are also subtle data message ordering problems.&lt;br /&gt;&lt;br /&gt;If you want to go nuts, you could use real multicast. The challenge there is that there are limited numbers of multicast groups. So you have to solve the problem of multiple channels sharing one multicast group.&lt;br /&gt;&lt;br /&gt;So you get to choose. Easy implementation or optimized but tricky implementation. Like most code. In this case I argue that the simple approach has good enough performance for the needs of online games. The producers and channel manager live in a data center on hosts attached to a high speed switch, so network latency is minuscule.&lt;br /&gt;&lt;br /&gt;The design philosophy of this system is to minimize unnecessary computation due to unwanted messages arriving on a host that are just thrown away. Hosts cost money. Bandwidth inside the data center is free. So good interest management is key.&lt;br /&gt;&lt;br /&gt;So. We have sliced off the publish/subscribe problem. 
All we have left is how to approach interest management policies which are application specific.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-6033142845213756290?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/6033142845213756290/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2008/12/publishsubscribe-message-delivery.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/6033142845213756290'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/6033142845213756290'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2008/12/publishsubscribe-message-delivery.html' title='Publish/Subscribe Message Delivery'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-2072118306714015462</id><published>2008-12-03T11:42:00.001-08:00</published><updated>2009-02-16T15:19:27.980-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Processing models'/><category scheme='http://www.blogger.com/atom/ns#' term='Scalability'/><title type='text'>Where are memory buses going?</title><content type='html'>In the 80's and 90's almost all supercomputers were distributed memory systems. Hypercubes, meshes, and a very few SIMD machines. Any concept of a shared address space was simulated. 
If there was limited support for remote memory access it was through slow and complex transport systems (one example is the BBN Butterfly).&lt;br /&gt;&lt;br /&gt;Recently we see desktop machines with "many" cores. For ease of use, these are symmetric multiprocessors. Each processor is equally able to access any address. The interconnect is referred to as a bus, even when not physically implemented that way. There are sophisticated cache coherency mechanisms and inter-processor synchronization instructions which "lock the bus", or invalidate remote cache lines to make it possible to have atomic access to a line of memory for at least one operation (e.g. atomic increment or swap).&lt;br /&gt;&lt;br /&gt;But these approaches don't scale (in the computer science sense). Even a "bus" that is a token ring or other network-like transport can only scale so far. Maybe 32 processors. I've seen SGI Origin 2000 and Sun Dragon machines (admittedly in the late 90's) that scaled this large and were still (mostly) symmetric. They used what amounted to a packet switched network and distributed systems techniques to provide atomicity and coherency.&lt;br /&gt;&lt;br /&gt;Regardless, the most efficient use of these machines, as determined by empirical study (and common sense), was to segregate the memory disjointly among the processors. This made the caches more effective since they accessed less of the address space (avoiding issues with Translation Lookaside Buffers), and avoided synchronization issues. One must keep in mind that doing an atomic operation even when there is no current contention can dramatically affect N-1 other processors because the operation flushes the bus or remote cache lines, etc. In the end, we tended to not make use of the symmetry aspects.&lt;br /&gt;&lt;br /&gt;So people now talk a lot about Non Uniform Memory Access. 
For example, blocks of RAM are tightly associated with a processor or a small number of cores, but there is also a "global" bus that allows access to the entire address space of the machine. So you have the appearance of a multiprocessor machine, but some addresses are a lot slower to access. The right way to use these machines is identical to what we used to do. Have disjoint blocks of memory per processor (or tightly coupled set of cores).&lt;br /&gt;&lt;br /&gt;What is interesting about this evolution is that you can see it is moving hardware architecture toward a distributed computing model. The memory "buses" themselves are networks that can have multiple in-flight packets containing memory access requests. Some will have routing or bridging between different disjoint buses/networks within the one machine. But to effectively use this architecture it must be programmed as a distributed system.&lt;br /&gt;&lt;br /&gt;Fortunately, we know how to do that. You use a collection of processes (distinct address spaces), and pass messages (optimized to use the high speed memory access bus/net). Communicating Sequential Processes. The beauty here is that such a software system can much more easily be tuned and reconfigured than a "monolithic" multithreaded application as hardware specs change (more processors, different local/remote memory access speeds...).&lt;br /&gt;&lt;br /&gt;If you step back another step and think about physics, you can also easily convince yourself that this evolution is permanent. How much compute power can fit into a cubic block of space? It is limited by distance, heat, complexity density... The only way to "grow" that computing power will eventually be increasing the amount of space consumed. In terms of computer science scalability (i.e. taking it to the extreme), space grows as N^3. 
So we can see that at best, communication distance (the radius to the furthest part of the one computer), and with it the delay once you factor in the speed of light, will grow linearly while computing power will grow as the cube. Thus the scaling eventually is dominated by the communication. Direct communication and *no* synchronization would give the best performance. So we can conclude that distributed memory systems connected with networks (even if they act like memory buses) will provide optimal performance.&lt;br /&gt;&lt;br /&gt;That is where we are going. I say develop our software with that in mind. Threading seems like a good idea, but it is really a cheap hack that allows two separate processes to have a bunch of shared data structures. Eventually those shared data structures will have to be synchronized over a longer distance, so let's start doing that (e.g. create duplicates, watch for edits and send out updates). Using application specific knowledge this can be done *much* more efficiently than a symmetric memory system can, which sends every changed byte and more.&lt;br /&gt;&lt;br /&gt;The connection to online games? I've just described a distributed object replication system. 
And that is the basis of the scalable online game architecture I've been outlining the whole time.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-2072118306714015462?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/2072118306714015462/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2008/12/where-are-memory-buses-going.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/2072118306714015462'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/2072118306714015462'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2008/12/where-are-memory-buses-going.html' title='Where are memory buses going?'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-5092493157917395527</id><published>2008-10-19T16:00:00.000-07:00</published><updated>2009-02-16T15:18:49.796-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Scalability'/><category scheme='http://www.blogger.com/atom/ns#' term='Interest Management'/><title type='text'>Back to First Principles: Interest Management</title><content type='html'>When you think of an Entity needing to interact with its environment, you don't tend to think about arbitrary lines running through the geometry. 
(There is no colored line on the ground between countries.) In fact, players delight in trying to find those exceptions and do things while standing on either side, or jumping back and forth as fast as possible.&lt;br /&gt;&lt;br /&gt;The way to think of the problem is that geometric decomposition is solely to support load balancing. Stuff on this side runs on this host, stuff on that side runs on that host. Much of the rest of the system just takes advantage of that assumption. (And it is not such a great assumption.)&lt;br /&gt;&lt;br /&gt;But what if we ignore load balancing, and just think of the Entities all over the place trying to interact? At the extreme, each Entity would be on its own host. Now we have a classical distributed systems problem, and can tap into that knowledge.&lt;br /&gt;&lt;br /&gt;Distributed object technologies, like CORBA, hide the fact that some objects are remote by using a local smart-proxy. Interactions by a locally owned/executed Entity with the local proxy are forwarded to the remote-original object. The big problem here is that CORBA can block the requestor, and the request has a round-trip latency.&lt;br /&gt;&lt;br /&gt;The better way to solve this is to ensure the local proxy is already up to date before the local Entity starts interacting. This allows the proxy's state to be as accurate as physically possible (the local proxy is out of date by at most one one-way network latency).&lt;br /&gt;&lt;br /&gt;Now we have to solve the interest management problem. The system wouldn't scale if we broadcast Entity updates (in both network consumption and in space for storing the proxies). Here we rely on a few restrictions that we think are not too onerous. The Entity must declare what it is interested in, and it must never write directly to a local proxy.&lt;br /&gt;&lt;br /&gt;The simplest interest management approach is to break the world into tiles. 
If an Entity can see into a tile at all, it is interested in all of that tile. If another Entity is currently located in that tile, it publishes its state updates to that tile. Using a publish-subscribe communication mechanism, all interested Entities consume every Entity's state that they can see. (There are much more interesting interest management approaches we will discuss later).&lt;br /&gt;&lt;br /&gt;The result is, we don't have the nasty load balancing problems of other systems. The host on which an Entity is running doesn't matter. Two Entities can interact with each other no matter where they are hosted. And the simulation operates the same way it would in other systems.&lt;br /&gt;&lt;br /&gt;The only remaining technical challenge is building a publish/subscribe system that is reliable and efficient.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-5092493157917395527?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/5092493157917395527/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2008/09/back-to-first-principles-interest.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/5092493157917395527'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/5092493157917395527'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2008/09/back-to-first-principles-interest.html' title='Back to First Principles: Interest Management'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-6329505037052461619</id><published>2008-09-18T13:00:00.000-07:00</published><updated>2008-09-18T15:26:47.825-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Load balancing'/><title type='text'>Forced immediate migration is nasty</title><content type='html'>Let's say we have a solution that somehow doesn't suffer from poor decomposition of space (too many small pieces, or no way to break the load into balanceable pieces -- i.e. having overloads). There is still another very difficult technical problem. When an Entity just crosses a geometric boundary, some of these systems will require the Entity to migrate onto the new host *immediately*. That is because there are assumptions built in elsewhere about where to look for an Entity, or how far an Entity can see/be seen.&lt;br /&gt;&lt;br /&gt;The problem with a forced immediate migration is that an Entity might be working on something tricky just then. Some use cases and consequences:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Engaged in battle; delays/hitches during marshal/transmit/unmarshal during one of the most critical gameplay experiences&lt;/li&gt;&lt;li&gt;Running a script; how to pack up an in-flight Lua or Python script? Turns out Stackless Python supports in-flight script pickling. Another option is to write your own language, and build pickling into your virtual machine. Pretty complicated, and possibly slow. What about temporary or global variables, or external references? I believe Eve does this, but I'm not sure if they do in-flight migration.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Being persisted to the DB, running a financial transaction; anything considered critical and fault-sensitive should not be made even more complex by injecting a synchronous but distributed action. 
You are asking for deadlocks and race conditions in what is by definition the most critical aspect of the system.&lt;/li&gt;&lt;/ul&gt;There are two possible solutions that both allow a simpler migration system, but as a very beneficial side-effect allow Entities to interact across host boundaries:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Do everything event-oriented. This means that there are never any Behaviors outstanding at the end of ticking an Entity. When the migration service runs, each Entity has become just a set of Properties. The problem with this is that content developers find event-oriented programming confusing and complicated. They have to explicitly manage a logical context (what is this Entity doing over a period of several events?), or fall back to a state-machine mechanism that adds a different kind of complexity (and more tools). To make it worse, you can still have race conditions (who opened that chest first, vs. who pulled out the loot?).&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Don't migrate immediately. I think this is the silver bullet, and it is possible to realize. Even more interestingly, it is possible to *never* migrate, and that opens the door to using Entity Behavior technology that is not possible to migrate (e.g. C/C++ running on a POSIX thread with pointers hanging out all over the place; computation that is only sensible to run on certain kinds of hardware). And the thought of not paying a migration performance penalty is kind of tantalizing. I'm going too far. 
In practice you would want to do the migration; there are tons of benefits.&lt;/li&gt;&lt;/ul&gt;Again, I'm going to make you wait a bit longer before I tell you how all this can work.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-6329505037052461619?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/6329505037052461619/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2008/09/forced-immediate-migration-is-nasty.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/6329505037052461619'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/6329505037052461619'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2008/09/forced-immediate-migration-is-nasty.html' title='Forced immediate migration is nasty'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-3202238963629318842</id><published>2008-09-18T10:16:00.001-07:00</published><updated>2009-02-16T15:17:46.688-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Load balancing'/><title type='text'>More on Geometric Decomposition</title><content type='html'>I need to amplify something in my &lt;a href="http://onlinegametechniques.blogspot.com/2008/09/geometry-decomposition-is-bad-idea.html"&gt;previous comments 
about geometric decomposition&lt;/a&gt;. I said "there is a practical minimum size limit to even the dynamic splitting of a region." What does that mean?&lt;br /&gt;&lt;br /&gt;Decomposition doesn't affect the distance that a character can see, or the number of other Entities that it can see or interact with. That is controlled by tuning and the game design. If someone wants every Entity in the game to be within 5 m of each other, then every Entity will see every other one.&lt;br /&gt;&lt;br /&gt;However, that doesn't have anything to do with *load leveling*. Load leveling is the decision about where to execute the behavior for an Entity. There is no reason that the same host must execute all the Entities in one geometric area. There are many fairly easy ways to do that computation and get the same answer whether the two Entities are owned/executed on the same or different hosts.&lt;br /&gt;&lt;br /&gt;In our flash-crowd example, we have actually made things worse by decomposing into smaller pieces of space, and mapping each to a different host. Now when an Entity moves just a little way, it may have to migrate to a new host because it crossed a boundary. So our Entity migration rate goes through the roof, costing computation and communication overhead. I can show that it is unnecessary overhead, and unfortunately it is applied at the point in the virtual world that is the busiest.&lt;br /&gt;&lt;br /&gt;Someone may argue that Entities that are near one another *must* be on the same host to interact. That assumes they are directly reading *and* writing one another's state variables. The problem is that this approach precludes the direct interaction of any two Entities that are across a border (and maybe that border is shifting around if you do this dynamically). A designer and a player wouldn't understand why there were some places they couldn't do some things. 
Poor ease of use, mental models of computation that are too complicated.&lt;br /&gt;&lt;br /&gt;It turns out that there are pretty simple approaches that allow Entities to interact transparently across hosts. That fact makes all the difference.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-3202238963629318842?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/3202238963629318842/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2008/09/more-on-geometric-decomposition.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/3202238963629318842'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/3202238963629318842'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2008/09/more-on-geometric-decomposition.html' title='More on Geometric Decomposition'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-2686073299385790776</id><published>2008-09-18T10:16:00.000-07:00</published><updated>2009-06-16T11:08:21.958-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Off topic'/><title type='text'>Off Topic: Recycle Aluminum Cans?</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://lh3.ggpht.com/ObviousDWest/SNKz1VfnWNI/AAAAAAAAAC0/fph_aAntb0s/PICT0132.JPG"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer; width: 200px;" src="http://lh3.ggpht.com/ObviousDWest/SNKz1VfnWNI/AAAAAAAAAC0/fph_aAntb0s/PICT0132.JPG" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;A quick Google search will show a ton of pages that claim something like "recycling one aluminum can will save enough energy to power a 100 watt light bulb for almost 4 hours". That is actually an awful lot of energy. You could go burn your hand on it any time for 4 hours. Such statistics (all statistics?) make me go "what? really?".&lt;br /&gt;&lt;br /&gt;Probably what they mean, since it would be the easiest to measure/compute is power-to-create-from-bauxite minus power-to-create-from-a-can. That doesn't even begin to answer the question.&lt;br /&gt;&lt;br /&gt;What about mining and delivery of the bauxite, and other overheads on that end? What about the energy used to collect or separate the cans, and bring them to the plant? I assume things like cleaning and sterilizing are included in power-to-create-from-a-can. And I would discount the human cost of picking them out of the trash, or dropping them in the recycling bin.&lt;br /&gt;&lt;br /&gt;Just try to find that information with Google! Or even determine which data is included in the quoted stat.  The numbers and wording make me wonder whether all the pages are quoting each other. 
Maybe the first guy to say it made it up.&lt;br /&gt;&lt;br /&gt;Here is a skeptic:&lt;br /&gt;&lt;a href="http://www.perc.org/pdf/ps28.pdf"&gt;http://www.perc.org/pdf/ps28.pdf&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Back to stuff I know something about...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-2686073299385790776?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/2686073299385790776/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2008/09/recycle-aluminum-cans.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/2686073299385790776'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/2686073299385790776'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2008/09/recycle-aluminum-cans.html' title='Off Topic: Recycle Aluminum Cans?'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/ObviousDWest/SNKz1VfnWNI/AAAAAAAAAC0/fph_aAntb0s/s72-c/PICT0132.JPG' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-1060280070209828504</id><published>2008-09-11T16:08:00.000-07:00</published><updated>2008-09-18T15:25:15.759-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Load balancing'/><title 
type='text'>Geometric decomposition is a bad idea</title><content type='html'>The computational load in a game scales with the number of Entities. The more NPCs and players, the more the servers have to do.&lt;br /&gt;&lt;br /&gt;Load balancing is a good thing because having idle hosts is a waste of money. You'd rather not have bought the hardware, and paying for power, A/C, and maintenance on an unused host is pointless. Ideally, you would have the same (full) load on each host.&lt;br /&gt;&lt;br /&gt;Consequently, load balancing is all about mapping Entities to hosts. Dynamic load balancing is about migrating Entities to new hosts.&lt;br /&gt;&lt;br /&gt;Many MMOs use the naive approach of decomposing their world into chunks/zones, and then mapping those to different server hosts to provide some load balancing.  Smarter ones break the world into many more pieces than there are hosts, and rely on probabilities to provide some kind of load distribution. Really smart ones dynamically decompose and coalesce pieces of geometry as they fill up and empty out.&lt;br /&gt;&lt;br /&gt;However, these systems will still face the "flash crowd" problem. "Hey, dudes, there's a blue dragon downtown! Let's go see Thresh get thrashed!". And suddenly 500 players are standing within 100m of each other. Even the dynamically adjusted systems have a lower limit on the useful decomposition of geometry. If you slice down to 10m pieces, you will still be interacting with all the pieces within 100m and all the hosts running them.&lt;br /&gt;&lt;br /&gt;It would be better to load balance based on load. Distribute the Entities evenly among all available hosts. The challenge with this approach is how do you interact with Entities that are nearby in the game world if they are running off on some random host? 
More on that...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-1060280070209828504?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/1060280070209828504/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2008/09/geometry-decomposition-is-bad-idea.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/1060280070209828504'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/1060280070209828504'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2008/09/geometry-decomposition-is-bad-idea.html' title='Geometric decomposition is a bad idea'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-482450259755140831</id><published>2008-09-09T08:24:00.000-07:00</published><updated>2008-09-18T15:27:47.494-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Processing models'/><title type='text'>Failed Replicated Computing on The Sims Online</title><content type='html'>Another example of how Replicated Computing didn't work in a large scale client/server game...&lt;br /&gt;&lt;br /&gt;The Sims content is implemented in a custom scripting language referred to as Edith script. We needed a way to migrate the tons of single-player script content online. 
Normally you would develop scripts for a client/server architecture by separating logical actions that need an authoritative result out from client actions that add decorative, interactive display. But the mass of existing single-player content had them commingled.&lt;br /&gt;&lt;br /&gt;The other aspect of gameplay was that the user would select a game Entity and choose one of several actions that they wanted to perform.&lt;br /&gt;&lt;br /&gt;Some of the lead engineers reasoned that since we had identical initial state (in the form of a save file), we could route the events requested by a user through the server and have each client play the associated script to result in the same final state (rinse-repeat). Of course you couldn't play graphical actions on the server, so the idea was to make those script builtins no-ops on the server, and only do something client-side. Since we had control of the script VM we should be able to make the computation deterministic. Right? Uh. No.&lt;br /&gt;&lt;br /&gt;The first test of this approach resulted in drift within seconds. In a level that was empty. The character began choosing "fidget" actions randomly, and wound up heading in different directions. To synchronize the random number generators the seed had to start the same, making it now part of the initial state. But the number of calls to the generator was determined by frame rate, OS scheduling and other client-side environmental issues that couldn't be controlled.&lt;br /&gt;&lt;br /&gt;So the slippery slope began. We found butterfly effects all over the place. Actions were run in different orders. Action requests had side effects before they were routed through the server. We didn't initially disable game-pause, and buffers backed up and overflowed. ...&lt;br /&gt;&lt;br /&gt;The result was that the design team could not work. They tried doing development single-player, but this was an online game. No online content was working. 
We built a manual resync mechanism so the playtesters could get a full state snapshot sent down from the server (ctrl-L; like in Emacs!). And we noticed they would hit ctrl-L every 10 seconds "just in case". But that reset every client, and other playtesters got upset when their workflow was interrupted (every 10 seconds).&lt;br /&gt;&lt;br /&gt;So we built an automatic resync that detected drift. But for large levels the state snapshot was bigger than the message system could handle. And on and on. Drift-fix. Sync-fix. Timing-fix. Side-effect fix...&lt;br /&gt;&lt;br /&gt;We actually *shipped* with a resync that grabbed a state snapshot for each Entity involved in each action and applied it before each action was played out on the client. The only thing that allowed this to work was that the interaction rate was so much lower than a first-person shooter that we didn't swamp the server-to-client network connection.&lt;br /&gt;&lt;br /&gt;It was hard, time-consuming, and technically embarrassing. So what is the "right" way to do it? 
More on that...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-482450259755140831?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/482450259755140831/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2008/09/failed-replicated-computing-on-sims.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/482450259755140831'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/482450259755140831'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2008/09/failed-replicated-computing-on-sims.html' title='Failed Replicated Computing on The Sims Online'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-3909547800803469585</id><published>2008-09-08T15:15:00.000-07:00</published><updated>2008-09-18T15:27:33.155-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Processing models'/><title type='text'>Replicated Computing (10k Archers)</title><content type='html'>It is essential for fairness that each player in a multiplayer game sees the same data. A game designer may choose to have some hidden state (see Game Theory), but all the public state must be kept in sync to have a fair shared experience. No matter whether latency is different for each player. 
No matter if they have different peak bandwidth available.&lt;br /&gt;&lt;br /&gt;Some data doesn't matter and is only decorative. Where the gibs fall usually doesn't affect later gameplay. There is only a small loss of shared experience if one player experiences some awesome or amusing effect, but the others don't.&lt;br /&gt;&lt;br /&gt;About 4 years ago I heard a GDC talk [reference] that explained how a Microsoft (I think) dev team built a multiplayer RTS and kept all the players' games in sync. They used "replicated computing". They assumed that if two clients start from an identical initial state and apply a repeatable/deterministic operation/state change, both clients will end up with the same resulting state. While this is true in computing theory, it is almost never true in real life.&lt;br /&gt;&lt;br /&gt;Why?&lt;br /&gt;* The states are *not* identical. The operation/event is *not* deterministic.&lt;br /&gt;* The timing of the event is not the same (due to network latency issues), and somehow that timing affects the repeatability of the event (e.g. the event is applied during the next "turn" for one client).&lt;br /&gt;* The machines have different processors. In particular, floating point processors do *not* always return the same results as one another. You can get different results when the computation happens in registers vs. in memory, since values tend to have more bits of precision in registers. This leads to the butterfly/chaos effect. A little drift, a little more, and suddenly you are talking about real money!&lt;br /&gt;* Any interaction with an outside system (I/O, time of day, keyboard input, kernel operations...) can return radically different results on the two clients.&lt;br /&gt;* (Pseudo) Random number generation sequences take on radically different values even if you only call the generator one extra time. 
Keeping the seeds in sync call by call is hugely expensive, and so is controlling the replicated execution so that exactly the same number of calls is made.&lt;br /&gt;* And many other reasons that are too painful to control.&lt;br /&gt;&lt;br /&gt;And that is the moral of the story. They got it working (miraculously), but spent an admittedly *huge* amount of time finding all the reasons things would drift, and finding workarounds for them.&lt;br /&gt;&lt;br /&gt;They argued that there was no way to synchronize the state of all the game Entities, because there were so many, and the state size would swamp the network.&lt;br /&gt;&lt;br /&gt;Even so, nobody wants to do that kind of cleanup or heroic debugging effort each time you ship a game. And all your work is out the window if a novice script writer breaks some rules.&lt;br /&gt;&lt;br /&gt;So what is a better way? More on that...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-3909547800803469585?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/3909547800803469585/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2008/09/replicated-computing-10k-archers.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/3909547800803469585'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/3909547800803469585'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2008/09/replicated-computing-10k-archers.html' title='Replicated Computing (10k Archers)'/><author><name>Darrin 
West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1757856588756891210.post-2038150013044323808</id><published>2008-09-08T11:10:00.000-07:00</published><updated>2008-09-08T11:19:35.377-07:00</updated><title type='text'>Online Game Techniques</title><content type='html'>I want to share some basic and advanced techniques that I favor for use in implementing online games. They apply generally to both small scale and large scale online games. The postings reflect my philosophy and I will try to include a justification of each approach.&lt;br /&gt;&lt;br /&gt;I also want to share techniques and best practices for parallel and distributed systems. Building them is intrinsically a very hard problem. Imposing some simple constraints can make such systems possible to implement and debug with a reasonable amount of effort, and can also make it possible to hide almost all the complexity from a user of the system. 
I want to make these systems approachable by novices but also appreciated by experts.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1757856588756891210-2038150013044323808?l=onlinegametechniques.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://onlinegametechniques.blogspot.com/feeds/2038150013044323808/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://onlinegametechniques.blogspot.com/2008/09/online-game-techniques.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/2038150013044323808'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1757856588756891210/posts/default/2038150013044323808'/><link rel='alternate' type='text/html' href='http://onlinegametechniques.blogspot.com/2008/09/online-game-techniques.html' title='Online Game Techniques'/><author><name>Darrin West</name><uri>http://www.blogger.com/profile/08387670564219526228</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
