Tuesday, May 29, 2012

I need a name for the Ultimate Online Game Framework we've been discussing, so people know what I'm referring to (crazy idea, or something for that project). I want it to be the greatest thing in the world. And as everyone knows, that is a nice MLT (Mutton, Lettuce and Tomato sandwich). I was toying with calling it Quantum, because I like the thought of Quantum Entanglement representing entity state replication. So maybe the Quantum Distributed Entity System (QDES), and later the Quantum this and that.
I'm also really stuck on what language to write it in: C++ or Java. I think it is critical that a game development team use the same language on client and server. Otherwise the team gets split in half, and winds up not talking enough. Being able to swap people onto whatever needs work is really important for keeping to a schedule. And hiring from two piles of expertise is just that much harder. Java has so many great libraries. But C++ is so much more popular in the game industry. On mobile, you've got Objective-C. If you're using Unity, you've got C#.
To be honest, most of your game logic should probably be written (at least initially) in your favorite scripting language. Java and C++ can both embed Python, Lua, or JavaScript, for example. IKVM appears to allow Mono to be embedded in Java, so you could maybe even do C# on both sides. This argument (http://www.mono-project.com/Scripting_With_Mono) is certainly compelling. Ever heard of Gecko? It embeds a C++ compiler so you can compile and load/run C++ "scripts" at run time. The reason I bring up all these options is: you could implement client and server in different languages, but embed the same scripting engine in both sides, so the game developers get the benefit of reusing their staff on both sides.
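To make that concrete, here is a minimal sketch of a C++ host embedding Lua (assuming Lua 5.2 or later); the gameplay.lua script and its on_damage entry point are hypothetical. A Java server could load the very same script through a binding like LuaJava, which is the whole point:

```cpp
// Sketch: a C++ host running the same gameplay script the server would run.
// Assumes Lua 5.2+; link with -llua. Script and function names are invented.
#include <lua.hpp>
#include <cstdio>

int main() {
    lua_State* L = luaL_newstate();
    luaL_openlibs(L);

    // Load the shared gameplay script (hypothetical file).
    if (luaL_dofile(L, "gameplay.lua") != LUA_OK) {
        std::fprintf(stderr, "script error: %s\n", lua_tostring(L, -1));
        lua_close(L);
        return 1;
    }

    // Call a hypothetical entry point: on_damage(current_hp, amount) -> new_hp.
    lua_getglobal(L, "on_damage");
    lua_pushnumber(L, 100);  // current hp
    lua_pushnumber(L, 25);   // damage amount
    if (lua_pcall(L, 2, 1, 0) == LUA_OK) {
        std::printf("new hp: %g\n", lua_tonumber(L, -1));
        lua_pop(L, 1);
    }
    lua_close(L);
    return 0;
}
```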
If the architecture and design of QDES is awesome enough, someone might reimplement it in their favorite language anyway. Or port it to a very different platform that doesn't support whatever language choice we make. So maybe the choice I'm stuck on isn't that serious right now. Maybe I should just start with the architecture and design.
I'm drafting some high level requirements and am thinking I'll post them here on a "page" instead of a blog article. Not sure how comments and such would work with that however. But I wanted something stickier than a post and something I could update and have a single reference to. Same with the designs as they start to come.
Anyway. Thanks for being a sounding board. Maybe some of you and your friends will have time to contribute to (or try out) all this. (And I haven't completely forgotten about doing that tech survey to see what other similar projects are out there in the open source world. I found a few. Some are a little dusty. Some are a little off track from what I'm thinking.)
Friday, May 25, 2012
Essential Components of the New Framework
What are the components that are absolutely essential to get right in the Ultimate MMO Framework (still need a catchy name)? If you had those components today, you'd be using them in your current game, even if you had to integrate them with a bunch of other stuff you have to get a full solution. If we could build those essential pieces, and they were free, I think they would slowly take over the industry, shouldering less effective solutions out of the way. If each piece can be made independent of the others, there is a better chance the system will be adopted even if someone thinks it isn't all a perfect fit.
It is tempting to start with message passing. After all, it isn't an online game without that. But there are a lot of great message passing systems. There might even be some decent event handling systems. And there are definitely some good socket servers that combine the two. While I see problems with these that could be improved, and I see missing features, I think message passing is a second priority. You could make do with an existing system and swap it out later. Although, I don't know how you get away from the assumption that there is a publish/subscribe API the higher levels can use.
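That assumption is small enough to sketch, though. Something like the following interface (the names are mine, and this is only an illustration) is all the higher levels would need; the transport behind it could be any existing message passing system, swapped out later:

```cpp
// Sketch of the publish/subscribe API the higher layers would code against.
// Local-only delivery here; a real bus would also route to remote processes.
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

struct Message {
    std::string topic;
    std::vector<unsigned char> payload;  // opaque to the bus itself
};

class MessageBus {
public:
    using Handler = std::function<void(const Message&)>;

    void subscribe(const std::string& topic, Handler h) {
        subscribers_[topic].push_back(std::move(h));
    }

    void publish(const Message& msg) {
        auto it = subscribers_.find(msg.topic);
        if (it == subscribers_.end()) return;
        for (auto& h : it->second) h(msg);
    }

private:
    std::unordered_map<std::string, std::vector<Handler>> subscribers_;
};
```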
More essential is the Entity System. It is where game play programmers interact with your system, and when it is done well, much of the rest of the system gets abstracted away. The decisions made here are what enable a server to scale, a client to stay in sync, content to get reused in interesting ways, and game play to be developed efficiently. BTW, what I mean by an Entity is a game object: something that represents a thing or concept in the game world. The term comes from discrete event simulation. Jumping ahead a bit, the Entity System needs to be Component oriented, such that an Entity is aggregated from a collection of Components. The Entity State system is the basis of replication (to support distributed computation and load balancing) and of persistence. Done well, the Entity System can be used for more than just visible game objects; it could support administrative and configuration objects as well.
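A rough sketch of what I mean by component aggregation, with names and structure invented purely for illustration:

```cpp
// Sketch: an Entity is an id plus a bag of Components; gameplay code asks an
// entity for the components it carries rather than subclassing game objects.
#include <cstdint>
#include <memory>
#include <typeindex>
#include <unordered_map>

struct Component {
    virtual ~Component() = default;
};

struct Position : Component { float x = 0, y = 0, z = 0; };
struct Health   : Component { int hp = 100; };

class Entity {
public:
    explicit Entity(std::uint64_t id) : id_(id) {}

    template <typename C, typename... Args>
    C& add(Args&&... args) {
        auto c = std::make_unique<C>(std::forward<Args>(args)...);
        C& ref = *c;
        components_[typeid(C)] = std::move(c);
        return ref;
    }

    template <typename C>
    C* get() {  // null if the entity doesn't carry this component
        auto it = components_.find(typeid(C));
        return it == components_.end() ? nullptr : static_cast<C*>(it->second.get());
    }

    std::uint64_t id() const { return id_; }

private:
    std::uint64_t id_;
    std::unordered_map<std::type_index, std::unique_ptr<Component>> components_;
};
```

Replication then falls out of the state side: if each Component knows how to serialize its own state, the Entity System can ship dirty components to other nodes without game play code getting involved.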
Related, but probably separable, is the Behavior system. How do you get Entities to do something, and how do they interact with one another? I don't mean AI; I mean behavior in the OO encapsulated-state-and-behavior sense. It will be interesting to distinguish between, and tie together, AI "plans", complex sequences of actions, and individual behavior "methods". And, of course, the questions of languages, scripting, and debugging land right here. (A second priority that relates to Behaviors is time management: how do you synchronize client and server execution, can you pause, can you persist an in-flight behavior?)
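Continuing the entity sketch above, a Behavior might be just another Component with an update step the scheduler ticks; again, pure illustration:

```cpp
// Sketch: Behaviors as Components (builds on the Entity/Component/Health
// types sketched earlier). Keeping behavior state inside the component is
// what would let an in-flight behavior be replicated or persisted.
struct Behavior : Component {
    virtual void update(Entity& self, double dt) = 0;
};

struct RegenBehavior : Behavior {
    double accumulated = 0;  // state that persists with the entity

    void update(Entity& self, double dt) override {
        accumulated += dt;
        if (accumulated >= 1.0) {  // once per simulated second
            accumulated -= 1.0;
            if (auto* h = self.get<Health>()) h->hp += 1;
        }
    }
};
```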
Content Tools are the next big-ticket item. If the Framework is stable enough, a new game could be made almost exclusively using these tools; in some imaginary perfect world, close to 100% of your budget would be spent pushing content through them. These tools allow for rapid iteration and debugging across multiple machines and multiple platforms. It is not clear how much of the tools can be made independent of Entity state and behavior choices.
What other systems are absolutely essential? Ones that drive everything else, that don't already exist, and that you feel like you always have to rebuild?
What do you think? Wouldn't it be cool to have a standalone Entity System that you could use in your next game? What if it came with a collection of Components that solved a lot of major problems like movement, collision, visibility, containment?
Labels: Architecture, Content Development, QDES, Software Engineering
Saturday, May 19, 2012
Why I don't like threads
People say I'm crazy because I don't like threads in this day and age of multicore processors. But I have good reason based on years of experience, so hear me out.
Doing threading well is harder than most people think. A decision to multithread your application imposes on all developers, including those less experienced with threading than the ones making that decision. I like to think that framework and server developers are much more experienced with concurrency than the game developers who might be plugging stuff into the server. And framework developers are building their framework to be used by those game play programmers. Play to your audience. Wouldn't it be better if they never had to know about threading issues? I believe it is possible to build a framework that insulates regular developers from such concerns (the communicating sequential processes model).
Now, I'm a big fan of using threading for what it is good for: dealing with interrupts that should be serviced ASAP without polluting your main application with lots of polling and checking. Examples would be a background thread for servicing the network (you want to drain the network buffers quickly so they don't overflow, back things up, and cause retransmissions); background file or keyboard I/O, which needs to wake up only rarely to service an incoming or outgoing I/O buffer; and remote requests that block for a long time and would otherwise stall the app (like a DB request or HTTP request). Note in particular that none of these are high performance computations. They are dominated by blocking/waiting time. The use of a thread in these cases is really about an easier programming model: the background thread can be written as a loop that does blocking reads or writes, and since that loop is not on the main thread, the main thread doesn't have checks and polling sprinkled in.
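Here is the shape of that pattern, sketched with C++ threads; recv_blocking() is a stand-in for whatever blocking call actually drains the network:

```cpp
// Sketch: a background thread written as a plain blocking loop, handing
// completed messages to the main thread through a locked queue. The main
// loop stays free of polling and checking.
#include <chrono>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

std::queue<std::string> inbox;
std::mutex inbox_mutex;

std::string recv_blocking() {
    // Stand-in: a real implementation would block in recv()/read().
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    return "message";
}

void network_thread() {
    for (;;) {
        std::string msg = recv_blocking();  // almost all time spent blocked
        std::lock_guard<std::mutex> lock(inbox_mutex);
        inbox.push(std::move(msg));
    }
}

void main_tick() {
    std::queue<std::string> batch;
    {
        std::lock_guard<std::mutex> lock(inbox_mutex);
        std::swap(batch, inbox);  // grab everything that arrived, then unlock
    }
    while (!batch.empty()) {
        // dispatch batch.front() to its handler here
        batch.pop();
    }
}

int main() {
    std::thread net(network_thread);  // lives for the whole process
    for (int i = 0; i < 3; ++i) {
        std::this_thread::sleep_for(std::chrono::milliseconds(250));
        main_tick();
    }
    net.detach();  // sketch only; a real app would shut down cleanly
}
```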
Most of the time, when you have some heavy computation to do, it will eventually scale up to require more than a single machine anyway. So you are going to have to write your system to be distributed, with the machines communicating using messages. Once you have done that work, you can easily use it within a single machine that has many cores. If instead you build one system that uses threads to exploit the many cores within a machine, and another that solves the distributed case, you've doubled your work and maintenance effort. One of the best ways to decompose a problem for worker threads is to deliver the work in a task queue and have the workers contend for it; as each task is pulled off, it is processed by a handler function. This is exactly the approach you would use in the distributed case. The only difference is whether messages pass between processes on the same machine or task-messages pass between threads. Yes, I understand the performance difference. But if your app is that performance sensitive, the inter-process message passing can be implemented using shared memory, avoiding the kernel switches when delivering a message on the local machine. The intent here is to save you from the double implementation, save your framework users from having to deal with thread programming, and the performance difference is pretty small.
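A minimal sketch of that task-queue shape; the point is that the handler code is identical whether the queue is fed by threads in one process or by messages from another machine:

```cpp
// Sketch: workers contend for task-messages from a queue. Swap the queue's
// feed for an IPC or network channel and the handlers don't change.
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class TaskQueue {
public:
    void push(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

    std::function<void()> pop() {  // blocks until a task is available
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !tasks_.empty(); });
        auto t = std::move(tasks_.front());
        tasks_.pop();
        return t;
    }

private:
    std::queue<std::function<void()>> tasks_;
    std::mutex m_;
    std::condition_variable cv_;
};

int main() {
    TaskQueue q;
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)
        workers.emplace_back([&q] { for (;;) q.pop()(); });  // contend for work
    for (int i = 0; i < 100; ++i)
        q.push([i] { /* handler function: process task i */ });
    std::this_thread::sleep_for(std::chrono::seconds(1));  // let the queue drain
    for (auto& w : workers) w.detach();  // sketch only; no clean shutdown
}
```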
There is also a big problem with heavily threaded apps in production: the tools for figuring out thread-related performance problems are pretty lousy. When you are only dealing with background interrupt-handling threads, there won't be serious performance problems unless one of the background threads starts polling wildly and consuming 100% CPU. But if a highly multithreaded app starts using too much CPU, or starts being unresponsive, how do you tell what is actually happening among the various threads? They don't have names, and the kernel isn't very good about helping you keep track of what work happens on each thread. You wind up having to build your own instrumentation into the application every time there is such a problem. And doing a new build and getting it into production is a lot of effort. On the other hand, if you follow the distributed model, you can easily see which process is spiking CPU. You can easily instrument the message flows between processes to see if there is too much or too little inter-process communication. Often you wind up logging all such traffic for post-mortem analysis anyway. Remember, you are not likely to have the luxury of attaching a debugger to a production process to grab stack traces or whatnot. So you are going to be staring at a monolithic multithreaded app and trying to guess what is going on inside.
Problems in threaded apps tend to be subtle, they wind up being hard to debug, and often only show up after the app has been running at production loads for quite a while. Writing good threaded software is the responsibility of every programmer that touches an app that adopts it, and the least experienced programmer in that app is the one you have to worry about. Operating and debugging the app in production is not easy. These are the reasons I don't like threads. I'm not afraid of them. I understand them all too well. I think there are better ways (CSP) that are actually easier and faster to develop for in the first place. And you are likely to have to adopt those ways in any case as you scale beyond a single machine.
More thoughts on this subject here: http://onlinegametechniques.blogspot.com/2009/02/manifesto-of-multithreading-for-high.html
(Any position statement like this is going to sound a little nuts if you try to apply it to every conceivable situation. What I was thinking about when I wrote this was a server application; in particular, something large scale and event oriented. If, for example, you are writing a graphical client on a 360, using multiple processes would be loony. Multiple processes listening on the same socket can be a problem (with some workarounds). You might not be able to abide even a shared-memory message passing delay between components, like in a rendering pipeline. Interestingly, these examples are all amenable to the same analysis: what response time is required, what resources are being shared, how much will the computation have to scale up, is the physical hardware intrinsically distributed anyway? My point is the default answer should be to encapsulate any threading you *have to* do, so that the bulk of your development doesn't pay the daily overhead of always asking: is that line of code accessing anything shared? is this data structure thread safe? It slows down important conversations, and it leaves shadows of doubt everywhere.)
Tuesday, May 15, 2012
Ultimate Open Source Game Development System
I'm between jobs again. It happens a lot in this industry. One of the frustrating things about that is you leave behind investments you've made in building infrastructure that was supposed to save you effort on future projects. If the company you're leaving fails, that infrastructure investment won't help anyone. So what do you do? Just build another one for the next company? Rinse, repeat. Buy something? Can't influence that much. Build a company that sells infrastructure? Pretty hard to make a profit selling to us picky developers. Open source? Let's think about that...
If I were to build an open source game development system, what would I focus on? I'm not a graphics whiz, but I know about simulation, OO, distributed systems, MMOs, and picky developers. Let's see here...
- Pick a primary development language. But realize that not all developers will love it. Is there a way to support multiple languages?
- Rapid iteration is key, so game logic must be able to be written in a scripting language. But it must also be possible to hard code parts of it for performance.
- Tools are key. In a good game development project the majority of the effort of the team should feed through the content tools, not the compiler. Wouldn't it be ideal if you could build a great game without any programmers?
- It must be debuggable. In a distributed system, this requires some thought.
- The world size must be able to scale. A lot of projects are bending their game design to avoid this problem, and that is definitely the least expensive approach. But if your infrastructure supports large scale on day one, what could you do?
- You want reuse of game logic, realizing different developers/designers have different skills. This means you want a hierarchy of game elements that are developed by appropriate experts and snapped together by others. This game object and level design effort should be efficient and fun.
- The team size must be able to scale; both up and down. This is a matter of content management. You don't want central locked files everyone contends for, and you don't want burdensome processes if you are a small team.
- It should be runtime efficient in space and time. This enables use on games that have huge numbers of game object instances.
- It is easy to ask for dynamic object definition, but that can work against performance. How often is it actually used? And what other ways are there to realize that effect?
- The framework should be intuitive to as many people as possible. This means being very careful about terminology.
- Now we enter an area I don't know much about. How do you structure the infrastructure development project itself? You want buy in from lots of people, but you also need a single vision so the result is consistent. You need a means to adjust the vision without blowing up the whole effort, and without encouraging people to fork the effort.
- What will be the relationship to other open source projects? We can choose to use various utility libraries. But what about using other game projects for graphics? Issues will arise at the boundaries.
- The system should be useful and get some adoption even without being finished. Because it will never be finished.
- It should be modular enough. That allows a developer to use the parts they want, and replace the parts they must. It allows broken parts to be replaced.
- We will need a demo game. How ambitious should it be?
- We need a name. And a vision statement.
I think the first thing to do is to survey the current state of open source game systems.
Thursday, June 2, 2011
Techniques for Handling Cheating (Part 1)
Cheating is fun for some people. It is a game on top of your game. "Can I find a path through the maze of security mechanisms you have laid in my path?"
First, why does a developer care about cheating in online games?
- They spent a lot of effort making content so they want to make sure players experience it instead of skipping over it and "stealing" the reward. The idea being that the players will have more fun facing the challenges and beating them. They'll appreciate it more if they have to work for it. Maybe. Some people are weird, and get a sense of appreciation out of working through the cheats.
- Cheating can directly interfere with other players' enjoyment of the content. E.g. griefing, stealing their stuff,...
- The perception of unfairness (everyone else has all the goodies, and you don't; you can't win PvP without also cheating; ...). Players can get frustrated by this and leave, and the developer loses money.
- It can interfere with the operation of the servers, and that interferes with other players' enjoyment of the game.
- Cheaters can actually steal something of value. If they sell it (e.g. gold farming), that can affect the in-game economy, or more directly, the profitability of the company.
The interaction between cheaters and developers has been called an arms race. And there are a lot more players than developers. Developers can't really hope to keep up and close every possible issue. So at some point it becomes a cost benefit thing. There will always be some cheating. You'll want to hit the big ones, and pick your battles.
There are a number of aspects to consider:
- Detection: what is a cheat? Maybe it is gaining XP or loot too quickly. Test for this on the fly by adding logic to the game server? Run metrics queries against the DB or event logs periodically?
- Reporting: put something in the server logs; send an alert email; weekly report out of the metrics system?
- Mitigation: take away what they gained? ban them (and lose their subscription money)? Reimburse other players that have been harmed?
- Prevention: do your best to secure the attack points of your system; check all client requests for sanity; do summary-level real-time rate limiting (this detects your own bugs that cheaters might exploit, speed hacks, bots/farming, aim-bots...); don't trust the client (see the sketch below)
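To make the "don't trust the client" point concrete, here is a sketch of a server-side sanity check on a movement request; the message fields, speed limit, and tolerance are all invented for illustration:

```cpp
// Sketch: validate a movement request against what is physically possible
// before applying it. Catches speed hacks and teleport cheats regardless of
// how the client was tampered with.
#include <cmath>

struct MoveRequest { float x, y, z; double timestamp; };
struct PlayerState { float x = 0, y = 0, z = 0; double last_move_time = 0; };

constexpr float kMaxSpeed = 10.0f;  // world units/second, from game design

bool validate_move(const PlayerState& p, const MoveRequest& req) {
    double dt = req.timestamp - p.last_move_time;
    if (dt <= 0) return false;  // out-of-order or replayed message
    float dx = req.x - p.x, dy = req.y - p.y, dz = req.z - p.z;
    float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
    return dist <= kMaxSpeed * static_cast<float>(dt) * 1.1f;  // 10% jitter tolerance
}
```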
I think one of the best mitigation strategies is public shaming. It leaves cheaters thinking that "everyone" is watching them, and it lets non-cheaters see that you as a developer are paying attention. You can let players report on other players. Ban the egregious cheaters, especially if they are griefing other players. Of course, they will be back with a different email address if their goal in life is to cause trouble. But this is a slippery slope, susceptible to gaming as well. If you provide a means for the community to use social pressure against perceived cheaters, it can also be exploited by cheaters for griefing. E.g. if you show the community the number of reports against a player, you might think it would highlight those that should be avoided. But some might consider it a badge of honor (among thieves), or worse, use it for extortion against unempowered innocents.
You will want some form of "ignore", however, that each player can apply to those they consider a cheater. It could be used to make sure a player never gets matched into a dungeon instance or PvP match with someone, or never has to listen to their obnoxious chat. Ideally, it would stop them from interacting with your character at all, and make them invisible. Just imagine being in kindergarten, and all the other kids ignored you. You aren't kicking them out of the game, but almost. Again, this might be exploited. What if someone ignored every player that was better than them at PvP? It would artificially inflate their win rating, and your leaderboards would be unfair.
But let's talk about the technical aspects of cheat prevention. (Let's ignore server intrusion problems.) Ultimately, the way a player manipulates the system is through the messages their client sends to the server. If your client is bug free and has not been tampered with, all is well: the messages are the result of a human operating the UI as the designers intended, and the difference between two players is their skill and knowledge of the game. But how can the server be sure all is well? It can only look at the messages and try to differentiate between an untampered client and one that has been tampered with or replaced with a script.
I'll post this and come back later with a discussion of different kinds of attacks and ways to deal with them.
Sunday, May 15, 2011
Super Hero Squad (our latest title) is now live
Things have been quiet here because all my attention was focused on Super Hero Squad (www.heroup.com). It is a Marvel title developed at The Amazing Society in Seattle, a studio of Gazillion. It is a lightweight MMO; it uses the Unity graphics engine, SmartFox, Apache, some Java apps on the back end, and MySQL. It is shardless, and the architecture scales horizontally with the number of concurrent players, including the database. The back end components are loosely coupled based on JMS publish/subscribe.
It has definitely been a fun project, and I'm working with a team with lots of deep experience. Load is ramping up, but not yet near the load tests we ran ahead of time. So I'm paying attention, but not anxious about it.
Along the way, we found ways to ship early and still have a fun and stable game. But as with all MMOs that actually launch, there is a lot of work left to do when you are "done". The context switch is challenging right now: going from "we have to ship; we are not going to do that" to "remember those things we cut to simplify things; it's time to put them back on the table". Now we have the fun of changing things without breaking a running service. And monitoring and fixing the service cuts into development. So things slow down at the same time they get more reactionary.
Sunday, February 27, 2011
Running branches for continuous publishing
I am a very strong proponent of what are called running branches for development of software, and for the stabilization and publication of online games. One of the more important features of large scale online games is that they live a long time, and have new content, bug fixes and new features added over time. It is very difficult to manage that much change with a relatively large amount of code and content. And since you continue to develop more after any release, you will want your developers to be able to continue working on the next release while the current one is still baking in QA, and rolling toward production.
I will skip the obvious first step of making the argument that version control systems (aka source code change control, revision control) are a good idea. I like Perforce. It has some nice performance advantages over Subversion for large projects, and has recently incorporated ease of use features like shelving and sandbox development. I like to call the main line of development mainline. I also like to talk about the process of cutting a release and deploying it into production as a "train". It makes you think about a long slow moving object that is really hard to stop, and really difficult to add things to and practically impossible to pull out and pass. And if you get in the way, it will run you down, and someone will lose a leg. Plus it helps with my analogy of mainline and branch lines.
So imagine you are preparing your first release. You make a build called Release Candidate 1 (RC1), and hand it off to QA. You don't want your developers to go idle, so you have two choices, they can pitch in on testing, or they can start working on release 2. You will probably do a bit of each, especially early in the release cycle, since you often dig up some obvious bugs, and can keep all your developers busy fixing those. But at some point they will start peeling off and need something to do. So you sic them on Release 2 features, and they start checking in code.
Then you find a bug. A real showstopper. It takes a day to find and fix. Then you do another build and you have RC1.1. But you don't want any code from Release 2 that has been getting checked in for several days. It has new features you don't want to release, and has probably introduced bugs of its own. So you want to use your change control system to make a branch. And this is where the philosophy starts. You either make a new branch for every release, or you make a single Release Candidate branch and, for each release, branch on top of it.
Being prepared ahead of time for branching can really save you time and confusion, especially during the high stress periods of pushing a release or making a hotfix to production. So I'm really allergic to retroactive branching, where you only make a branch if you find a bug and have to go back and patch something.
Here's why: the build system has to understand where this code is coming from, or you will be doing a lot of manual changes right when things are the most stressed. If you have already decided to make branches, you will also have your build system prepared and tested to know how to build off the branch. You will also have solved little problems like how to name versions, how to prepare unambiguous version strings so you can track back from a build to the source it came from, and many more little surprises.
The build system is another reason why I prefer running branches as opposed to a new branch per release. You don't have to change any build configuration when a new release comes along. The code for RC2 is going to be in exactly the same place as RC1. You just hit the build button. That kind of automation and repeatability is key to avoiding "little" mistakes. Like accidentally shipping the DB schema from last release, or wasting time testing the old level up mechanism, or missing the new mission descriptions.
And then there is the aesthetic reason. If you cut a branch for every release, your source control depot is going to start looking pretty ugly. You are planning on continuous release, right? Every month. After 5 years that would be 60 complete copies of the source tree. Why not just 2: ML and RC (and maybe LIVE, but let's save that for another time).
Finally, as a developer, if you are lucky enough to be the one making the hotfix, you will want to get a copy of the branch onto your machine. Do you really want another full copy for each release that comes along? Or do you just want to do an update to the one RC branch you've prepared ahead of time? It sure makes it easier to switch back and forth.
An aside about labels: you might argue you could label the code that went into a particular build, and that is a good thing. But one problem with labels that has always made me very nervous is that labels themselves are not change controlled. Someone might move a label to a different version of a file, or accidentally delete or reuse it, and then you would lose all record of what actually went into a build. You can't do that with a branch. And if you tried, you would at least have the change control records to undo it.
One more minor thought: if you want to compare all the stuff that changed between RC1 and RC2, it is much easier to do in a running branch. You simply look at the file history on the RC branch and see what new stuff came in. Doing that with a branch per release requires a custom diff each time, e.g. dragging a file from one branch onto the same file on the other. Pretty clumsy.
Also note that these arguments don't apply as well for a product that has multiple versions shipped and in the wild simultaneously. An online game pretty universally replaces the previous version with the new one at some point in time. The concurrency of their existence is only during the release process.
Summary:
- You want to branch so you can stabilize without stopping ongoing work for the next release
- You want a branch so you are ready to make hot fixes
- You want a running branch so your build system doesn't have to get all fancy, and so your repo looks simpler.
I may revisit the topic of branching in the form of sandbox development which is useful for research projects and sharing between developers without polluting the mainline.