At the very least, you want to keep playing with the guys you've matched with: the rest of your clan or whatever. At best, you'd like to keep playing from exactly the point of the "failure" without noticing any hiccup (good luck). The steps to get session host migration, i.e. failover to a new simulator:
- Reestablish network interconnection
- Pick a new master simulator
- Coordinate with the Live Service about who is the master (or do this first)
- Have the entity data already there, or somehow gather it
- Convert replicated entities into real owned/master entities
- Reestablish interest subscriptions, or whatever distributed configuration settings are needed
- To reconnect, someone needs the complete list of IP addresses and ports used by the players. But that is considered a security issue. E.g. someone could use that info to launch a DoS attack on an opponent. Let's assume the Live Service renegotiates and handshakes your way back into business.
- If you aren't connected, how do you elect a new master? If you don't have a master yet, how would all the clients (in a strict client/server network topology) know who to connect to? So the answer has to be precomputed. E.g. designate a backup simulator before there is a fault (maybe the earliest joiner, the lowest IP address...)
- If your game session service supports this, it can solve both of the previous issues by exposing IP addresses only to the master simulator. And since the service itself has a fixed network address, each client can always make a connection to it and be told who the new master is.
- If the authoritative data is lost on the fault, you may as well restart the level, go back to the lobby or whatever. So instead, you have to send the entity state data to the backup simulator(s) as you go. This is actually more data than is necessary to exchange for an online game that is not fault tolerant, since you'd have to send hidden and possibly temporary data for master Entities. Otherwise you couldn't completely reconstruct the dead Entities. There may be data that only existed on the master, so gathering it from the remaining clients isn't going to be a great solution. Spreading the responsibility is that much more complicated.
- So the backup master starts converting replicated Entities into authoritative Entities. Any Entities it didn't know about couldn't get recreated, so the backup master has to have a full set of Entities. Think about the bandwidth of that. You should really want this feature before just building it. Now we hit a hard problem. If the Entities being recreated had in-flight Behaviors (e.g. you were using coroutines to model behavior), they can't be reconstructed. It is prohibitively expensive to continuously replicate the Behavior execution context. So you wind up "resetting" the Entities, and hoping their OnRecreate behavior can get it running again. You may have a self-driven Behavior that reschedules itself periodically. Something has to restart that sequence. Another thing to worry about: did the backup simulator have a truly-consistent image of the entity states, or was anything missing or out of order? At best this is an approximation of the state on the original session host.
- Unless you are broadcasting all state everywhere, you are going to have to redo the interest management subscriptions that keep you within your bandwidth limits. This is like a whole bunch of late-joining clients coming in at once. They would each get a new copy of each entity state. Big flurry of messages, especially if you do this naively.
- Now you are ready to go. Notify the players, give them a count-down...FIRE!
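Two of the steps above, precomputing the backup and converting replicas, can be sketched roughly as follows. This is only an illustration under assumptions, not a real engine API: `Peer`, `Entity`, `pick_backup`, `promote_replicas` and the `on_recreate` hook are hypothetical names, and the ordering rule (earliest joiner, lowest IP address as tie-breaker) is just the example rule mentioned above.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Peer:
    player_id: str
    join_order: int  # assumption: the session service hands out a join order
    address: str     # used only as a tie-breaker


@dataclass
class Entity:
    entity_id: int
    state: Dict[str, object]        # last replicated state, hidden fields included
    authoritative: bool = False     # False means this copy is only a replica
    behavior_running: bool = False  # in-flight Behaviors are never replicated

    def on_recreate(self) -> None:
        # Hypothetical hook: restart self-driven Behaviors from a known point,
        # because the original execution context died with the old master.
        self.behavior_running = True


def pick_backup(peers: List[Peer], master_id: str) -> Optional[Peer]:
    """Deterministic choice every peer can compute before any fault:
    earliest joiner wins, lowest IP address breaks ties."""
    candidates = [p for p in peers if p.player_id != master_id]
    if not candidates:
        return None
    return min(candidates, key=lambda p: (p.join_order, int(ip_address(p.address))))


def promote_replicas(replicas: Dict[int, Entity]) -> List[int]:
    """Turn every replica into an authoritative Entity and reset its Behaviors.
    Returns the ids that were promoted, in ascending order."""
    promoted = []
    for entity_id in sorted(replicas):
        entity = replicas[entity_id]
        if not entity.authoritative:
            entity.authoritative = True
            entity.on_recreate()
            promoted.append(entity_id)
    return promoted
```

Because `pick_backup` depends only on data every peer already holds, all peers arrive at the same answer without talking to each other, which is the point of precomputing it. Note that `promote_replicas` can only promote what is in `replicas`: any Entity the backup never received is simply gone.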
Note that the problem gets a lot easier if all you support is a clean handoff from a running master to the new master. Would that be good enough for your game?
So is it worth the complexity, the continuous extra bandwidth and load on the backup simulator? Just to get an approximate recreation? With enough work, and game design tweaking, you could probably get something acceptable. Maybe give everyone a flash-bang to mask any error.
Or maybe you just reset the level, or go back to the lobby to vote on the next map. And put the bad player on your ban list.
Me? I'd probably instead invest the time of my network and simulator guys in something else, like smoothness, fun gameplay, voice, performance. Or ship earlier.