Monday, September 8, 2008

Replicated Computing (10k Archers)

It is essential for fairness that each player in a multiplayer game sees the same data. A game designer may choose to have some hidden state (see Game Theory), but all the public state must be kept in sync to have a fair shared experience. No matter that latency differs for each player. No matter that they have different peak bandwidth available.

Some data doesn't matter and is only decorative. Where the gibs fall usually doesn't affect later gameplay. There is only a small loss of shared experience if one player experiences some awesome or amusing effect, but the others don't.

About 4 years ago I heard a GDC talk [reference] that explained how a Microsoft (I think) dev team built a multiplayer RTS and kept all the players' games in sync. They used "replicated computing". They assumed that if two clients start from an identical initial state and apply the same repeatable/deterministic operation/state change, both clients will end up in the same resulting state. While this is true in computing theory, it is almost never true in real life.
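The assumption can be sketched in a few lines. This is a minimal illustration of the idea, not the talk's actual design; all the names and commands here are made up:

```python
# Replicated-computing assumption: identical initial state + identical
# command stream, applied in the same order by a deterministic update
# function, yields identical final state on every client.

def apply_command(state, cmd):
    # A deterministic state transition: move a unit by a delta.
    unit, dx, dy = cmd
    x, y = state[unit]
    state[unit] = (x + dx, y + dy)

def run_client(initial_state, commands):
    state = dict(initial_state)
    for cmd in commands:
        apply_command(state, cmd)
    return state

initial = {"archer_1": (0, 0), "archer_2": (5, 5)}
commands = [("archer_1", 1, 0), ("archer_2", 0, -1), ("archer_1", 2, 3)]

client_a = run_client(initial, commands)
client_b = run_client(initial, commands)
assert client_a == client_b  # holds here; the rest of the post is about why it fails in practice
```

Only the tiny command stream crosses the wire; each client recomputes the whole world locally.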

Why?
* The states are *not* identical. The operations/events are *not* deterministic.
* The timing of the event is not the same (due to network latency issues), and somehow that timing affects the repeatability of the event (e.g. the event is applied during the next "turn" for one client).
* The machines have different processors. In particular, floating point processors do *not* always return the same results as one another. You can get different results when the computation happens in registers vs. in memory, since they tend to have more bits of precision in registers. This leads to the butterfly/chaos effect. A little drift, a little more, and suddenly you are talking about real money!
* Any interaction with an outside system (I/O, time of day, keyboard input, kernel operations...) can return radically different results on the two clients.
* (Pseudo) Random number generation sequences take on radically different values even if you only call the generator one extra time. Keeping the seeds in sync call by call is hugely expensive, and so is controlling the replicated execution so that it makes exactly the same number of calls.
* And many other reasons that are too painful to control.
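The PRNG point is easy to demonstrate. A sketch (the "cosmetic effect" call is an invented example): both clients seed identically, but one makes a single extra call, and every subsequent "shared" roll disagrees:

```python
import random

# Both clients seed the same PRNG identically...
rng_a = random.Random(42)
rng_b = random.Random(42)

# ...but client B makes one extra, unsynchronized call,
# say for a purely cosmetic particle effect.
rng_b.random()

# Now every "shared" roll comes from a shifted sequence.
rolls_a = [rng_a.randrange(100) for _ in range(5)]
rolls_b = [rng_b.randrange(100) for _ in range(5)]
assert rolls_a != rolls_b  # the clients have diverged for good
```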

And that is the moral of the story: they got it working (miraculously), but spent an admittedly *huge* amount of time finding all the reasons things would drift, and finding workarounds for them.
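One common way to catch that drift early (my assumption; I don't know what that team actually did) is to checksum the game state every turn and compare digests across clients, so a mismatch pins the desync to a specific turn instead of surfacing hours later:

```python
import hashlib

def state_checksum(state):
    # Serialize deterministically (sorted keys) before hashing, so the
    # digest depends only on the state, not on dict iteration order.
    blob = repr(sorted(state.items())).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

# Hypothetical per-turn snapshots from two clients.
turn_state_a = {"archer_1": (3, 3), "archer_2": (5, 4)}
turn_state_b = {"archer_1": (3, 3), "archer_2": (5, 4)}
assert state_checksum(turn_state_a) == state_checksum(turn_state_b)

turn_state_b["archer_2"] = (5, 5)  # a tiny drift on one client
assert state_checksum(turn_state_a) != state_checksum(turn_state_b)
```

The checksum is small enough to exchange every turn, unlike the state itself.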

They argued that there was no way to synchronize the state of all the game Entities, because there were so many, and the state size would swamp the network.
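A back-of-the-envelope sketch of that bandwidth argument (all numbers are illustrative assumptions, not from the talk):

```python
# Full-state sync vs. command sync for 10,000 units
# at 10 simulation turns per second.

units = 10_000
bytes_per_unit = 32          # position, velocity, health, target, ...
turns_per_second = 10

state_sync = units * bytes_per_unit * turns_per_second
# 3,200,000 bytes/sec of state -- hopeless on a connection of that era.

commands_per_turn = 20       # player orders issued this turn
bytes_per_command = 16
command_sync = commands_per_turn * bytes_per_command * turns_per_second
# 3,200 bytes/sec of commands -- easily affordable.

assert state_sync // command_sync == 1000
```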

Even so, nobody wants to do that kind of cleanup or heroic debugging effort each time you ship a game. And all your work is out the window if a novice script writer breaks some rules.

So what is a better way? More on that...

1 comment:

  1. A couple of friends and I played multiplayer StarTrek Armada on a LAN with all three of us winning the same game instance!

    What we hadn't realised is that after half an hour of play, our clients had disconnected from the server and seamlessly launched their own disconnected servers.

    This was an RTS that (presumably) used replicated computing and probably got out of sync before disconnecting. Since all three clients had complete copies of the game state we carried on playing for another couple of hours and I remember being surprised when I won and there was no reaction from the others. They were still busy winning their own instances.

    We never played that game again multi-player.
