Thursday, October 28, 2010

Use of public/private keys to avoid a race condition

I inherited an interesting technique when I took over the server of SHSO. The use of signatures to avoid a race condition that can occur when a player is handed off between hosts. The use case is: the client requests that the Matchmaker connect them to a game "room" server that has a new game session starting. That room needs correct information about the player, what level they are, what they own, etc. How do you get it to the room before the client connects to that room and the data is needed? A couple of approaches:

  1. You don't. You just leave the client connected but in limbo until their data arrives through the back end asynchronously. This is a pretty normal approach. Sometimes the data arrives before the client makes the connection. So you have to cover both of those cases.
  2. You don't. You leave the client in limbo while the room fetches the data through the back end synchronously. This is also pretty normal, but blocks the thread the room is running on, which can suck, especially if other rooms on the same host and process also get blocked. Yes, you could multithread everything, but that is not easy (see my manifesto of multithreading!). Or you could create a little state machine that tries to remember what kind of limbo the client is in: just-connected, connected-with-data-fetch-in-progress, etc. Personally, I don't allow the processes that run the game logic to open a connection directly to the DB and do blocking queries. DB performance is pretty erratic in practice, and that makes for uneven user experience.
  3. Or have the data arrive *with* the connection. From the client. Interesting idea. But can you trust the client? That is where signed data comes in.

A quick review of cryptography. Yes, I am totally oversimplifying it, and skipping a lot of the interesting bits, and optimization and stuff. However, it is fun to talk about...

A public/private key works by taking advantage of mathematics that is easy to compute in one direction, but really hard to compute in the other direction. The most common is factoring very large integers that are the product of two very large prime numbers. There is only one way to factor the product, but you have to compute and try pretty much every prime number up to the square root of the product, and that can take a looong time.

A public key can be used to encrypt plain text, and the private key is the only thing that can be used to unencrypt it. That means only the owner of the private key can read the plain text (including any else that had access to the public key).

On the other hand, a signature is created by *unencrypting* the plain text using the private key. The public key can then be used to *encrypt* the signature and test if the result equals the plain text again, thereby verifying the signature, and proving that the signature and plain text came from the owner of the private key exactly as they sent it.

Back to the story...the player data is signed using the private key by the Matchmaker and delivered to the client when the Matchmaker directs the player to the correct room. The client cannot tamper with their data without getting caught. The client then sends the data to the game room with the signature when it connects. The room server checks the signature using the public key, and can tell that the data is unmodified and came indirectly from the Matchmaker.

Why not just encrypt the data? The room server could have been set up to be able to unencrypt. Answer:  The client wants to read and use its player's data. It wouldn't be able to do that if it were encrypted. And sending it twice (plain, and encrypted) is a waste.

One interesting thing to note is that the client never saw the public, nor the private key.

OK. I know it seems pretty wasteful to send all this data down to the client, just to have it sent back up to a different host in the server. After all, we are talking about bandwidth in and out of the data center, and more importantly, bandwidth in and out of the client's home ISP connection. Only half bad. The client needed the data anyway. It is not the greatest approach, but it is what we have. As Patton is quoted: A good solution applied with vigor now is better than a perfect solution applied ten minutes later.

BTW, another reason this was built this way was that originally the system couldn't predict which room the player would wind up in, so the client needed to bring their data with them.

And that is a segue to the topic of scaling across multiple data centers. It might start to make sense of that extra bandwidth question.

2 comments:

  1. Glad to see you're back to posting. This technique is essentially what is used when implementing Federation for web site identity. You would do well to look at Windows Communication Foundation and Windows Identity Foundation because they implement a similar technique called Claims Aware applications. In your scenario, the game data secured by the certificate would be a claim and using the WS-Federation standard and the WS-Trust standard, one could modify a Secure token service to deal with the signing and securing of the data. (a.k.a. claims)

    ReplyDelete