Thursday, June 6, 2019

Spring Boot Applications in Kubernetes

Goals

Spring Boot is a very convenient framework for developing Java applications. It supports lots of useful features just by adding a dependency to your Gradle or Maven build files.

Building the application into a container and running the image in Docker gives you a very consistent experience across development, testing, and production. It also solves a lot of configuration and networking issues that used to be handled by custom scripting that usually broke *during* a maintenance window.

Running a distributed system (like a game server) requires deployment, scaling, and monitoring, collectively called orchestration. The best solution for that right now is Kubernetes (K8S). It runs your app in containers and watches their health. It can restart failed instances, and can dynamically and automatically scale up new instances based on load. When integrated with a cloud provider like Google or AWS, it can also spin up new servers (and you get charged a bit more, or a bit less when they scale down again). Another awesome benefit of K8S is that you can redeploy your whole server without a maintenance window: it will shut down an old Pod, and spin up a replacement with your new build. There are challenges to doing that in some cases that I'll talk through in a later post.

But there are a few tricks I've had to discover to make all this work well together:

  • Have the Docker command be able to use variables so you can control things like startup memory usage.
  • Have the JVM catch a signal and pass it to the application so it can shut down cleanly when K8S decides to scale down or replace a Pod. K8S goes through a two-step process: it sends SIGTERM first, then waits, then sends SIGKILL if the app doesn't shut down on its own.
  • Get Spring Boot configuration into the application, and be able to override it with cluster-specific settings from a K8S ConfigMap.

Config Printing

The first challenge is *knowing* what configuration your app started with. There is no turn-key solution for this, so I include a little code that queries Spring and logs the config that was used to start the app. Very useful for debugging deployments. I also expose it to JMX (a sketch of that follows the code below):


    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;
    import java.util.TreeMap;

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.core.env.ConfigurableEnvironment;
    import org.springframework.core.env.EnumerablePropertySource;
    import org.springframework.core.env.MutablePropertySources;
    import org.springframework.core.env.PropertySource;

    @Autowired
    ConfigurableEnvironment env;
    ...

        MutablePropertySources sources = env.getPropertySources();

        // Find the name of every property, across all enumerable property sources.
        Set<String> uniqueNames = new HashSet<>();
        for (PropertySource<?> source : sources){
            if (source instanceof EnumerablePropertySource){
                uniqueNames.addAll(Arrays.asList(((EnumerablePropertySource<?>) source).getPropertyNames()));
            }
        }

        // Use a TreeMap so the output is sorted by property name.
        TreeMap<String, String> sortedProps = new TreeMap<>();

        for (String name : uniqueNames){
            // Read the property value, using Spring's resolution rules.
            String value = env.getProperty(name);
            if (name.toLowerCase().contains("pass") || name.toLowerCase().contains("secret")){
                value = "****"; // Don't show passwords or other secrets.
            }
            sortedProps.put(name, value);
        }

        StringBuilder sb = new StringBuilder();
        sb.append("# Merged application properties.\n");

        // TODO: this doesn't really preserve the "file" exactly. E.g. special characters, continuations, comments, ...
        for (Map.Entry<String, String> el : sortedProps.entrySet()){
            sb.append(el.getKey()).append("\t=").append(el.getValue()).append("\n");
        }

        String res = sb.toString();


This grabs all the configuration names, then uses Spring's configuration system to look up the current value of each (following its externalized-configuration precedence rules). It also hides passwords, because you shouldn't log such things.
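
Here is a rough sketch of the JMX side, using Spring's standard JMX annotations. The bean and object names are my own placeholders, it assumes the property-walking code above is factored into the method body, and it relies on Spring Boot's JMX auto-configuration being enabled:

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.core.env.ConfigurableEnvironment;
    import org.springframework.jmx.export.annotation.ManagedOperation;
    import org.springframework.jmx.export.annotation.ManagedResource;
    import org.springframework.stereotype.Component;

    @Component
    @ManagedResource(objectName = "com.protag.protoserver:type=Config,name=MergedProperties")
    public class ConfigDumper {

        @Autowired
        ConfigurableEnvironment env;

        // Appears as an invokable operation in JConsole or VisualVM.
        @ManagedOperation(description = "Dump the merged, sorted application properties")
        public String dumpProperties() {
            // ...run the property-walking code shown above and return res...
            return "";
        }
    }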

Config Override

This is an example ConfigMap definition:

# kubectl apply -f server-config.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: server-config

data:
  server.port: "8090"
  api.server.port: "8090"
  management.server.port: "8090"

  logging.file: protoserver.log
  logging.level.root: INFO
  logging.level.com.protag.protoserver: INFO

  # The environment variable in the dockerfile that controls -Xmx${HEAP_SIZE} when the jvm starts in the container.
  HEAP_SIZE: 2400m

Deployment


This is an example Deployment that picks up that ConfigMap:

# kubectl apply -f server-deployment.yml
# With "parameter" substitution:
# sed -e 's/BUILD_TAG/99/g' < server-deployment.yml | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: server
  template:
    metadata:
      labels:
        app: server
        configmap-version: v8 # Fake label to force rolling update upon configMap change. Could use its md5 hash too.
    spec:
      containers:
        - name: server
          image: localhost:5000/ourorg/server:BUILD_TAG
          # Pull config data in as environment variables, where springboot will merge them w application.properties.
          # There is currently no way to auto-redeploy the pod when this config changes. Some people hash the config.yml
          # file, and sed the hash into the deployment.yml, which is recognized as a big enough change to trigger
          # a RollingUpdate. Sigh.
          envFrom:
          - configMapRef:
              name: server-config
          # Make sure to use "bash" where needed: "sh" strips dotted env vars, which spoils application.properties
          # overrides coming through ConfigMaps and "env:" settings.
          ports:
          # the service
          - containerPort: 8090
            name: web
          - containerPort: 9010
            name: jmx

This uses envFrom to pull the ConfigMap above into the environment variables of the container. When Spring Boot starts up, those override what is in application.properties.

One odd thing here is that "sh" will discard variable names that contain a dot. So use "bash", as you will see in the Dockerfile below.

The configmap-version label is used to force a rolling update when I change the configuration: I bump that label and re-apply the deployment, and K8S then restarts each Pod so it picks up the new config.

The string BUILD_TAG is substituted at deploy time to ensure that the image produced by Jenkins (or whatever your build system is) is the one K8S actually pulls. Using latest is not reliable, and with an explicit tag you can see exactly which image was used when running "kubectl describe".

Dockerfile

This is an example Dockerfile, used to create an image containing your Spring Boot app:

# Creates ourorg/server
FROM openjdk:11-jre-stretch
# May need a new base image. I don't see it on dockerhub anymore (only a few weeks later)

# Pick up variable passed in from gradle.build file
ARG JAR_FILE
ENV JAR_FILE=${JAR_FILE}

# JVM heap size. Without -Xmx the JVM picks its own (often small) default. Make this less than the resource request in statefulSet.yml
ARG HEAP_SIZE=900m
ENV HEAP_SIZE=${HEAP_SIZE}

# We want the application.properties file to be loose, so it can be seen and edited more easily.
COPY ${JAR_FILE} application.properties /app/

# Run in /app to pick up application.properties, and drop logs there.
WORKDIR /app

# App and jmx ports
EXPOSE 8090/tcp 9010/tcp

# Let SIGTERM signals hit the java app directly (by exec'ing over the initial shell), but still use a shell to
# expand variables. SIGTERM is what k8s sends to trigger graceful shutdown.
# Don't use "sh", it will strip dotted env vars, and you'll lose the configMap settings that override application.properties.
# Note that exec replaces the shell with the jvm, so nothing chained after the java command would ever run.
ENTRYPOINT [ "bash", "-c", \
    "exec java -Xmx${HEAP_SIZE} \
               -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.local.only=false \
               -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false \
               -jar ${JAR_FILE}" ]

This Dockerfile lets the build system specify the jar name and the starting Java heap size. The way we use these variables allows them to be overridden by K8S directly, or via a ConfigMap. Using variables in an ENTRYPOINT is not normally possible in Docker's exec form because it doesn't invoke a shell. Here, however, we explicitly run the command through "bash", which also allows the dotted environment variables to be passed through.

The reason for using "exec" in the ENTRYPOINT is so that the JVM is the process that receives the TERM and KILL signals from K8S. Without it, the wrapping shell would not pass the TERM signal to our application, and that signal is what tells Spring Boot to start shutting down gracefully.
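
Spring Boot's default JVM shutdown hook does the rest, but if you have cleanup of your own, a @PreDestroy method is a simple place for it. A minimal sketch (my own illustration, assuming the javax.annotation API is on the classpath, which Spring Boot web apps pull in):

    import javax.annotation.PreDestroy;

    import org.springframework.stereotype.Component;

    @Component
    public class ShutdownCleanup {

        // Once SIGTERM reaches the jvm (thanks to exec above), Spring closes
        // the application context and runs @PreDestroy callbacks like this one.
        @PreDestroy
        public void onShutdown() {
            // Drain in-flight work, deregister from the cluster, flush state...
            System.out.println("SIGTERM received; shutting down cleanly.");
        }
    }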

Summary


Well, I think that covers all my tricks for getting a Spring Boot app to work properly in Docker and Kubernetes. Let me know if you have questions, or trouble getting this working as I suggest. Or if there are other difficulties you've hit that you'd like help with.

Tuesday, May 26, 2015

My new company: Combine Games; My new game: Rails and Riches

Thought I'd share the news, if you hadn't heard. A few friends and I just started a new company called Combine Games based in the Seattle area. We are running a Kickstarter to fund a game called Rails and Riches. You can think of it as a successor to Railroad Tycoon, because Franz Felsl, the designer and artist on Railroad Tycoon 2 and 3 is our lead designer and CCO. You can find us at http://www.railsandriches.com/, or on Facebook: https://www.facebook.com/RailsAndRiches. I just posted a long blog over there: http://www.railsandriches.com/blog/view/6. Come make an account, share or like on Facebook. And keep an ear out for when the Kickstarter goes live.

Tuesday, February 17, 2015

Thread pools, event dispatching, and avoiding synchronization

Writing ad hoc threaded code is almost always more difficult than it is worth. You might be able to get it to work, but the next guy in your code is probably going to break it by accident, or find it so hard to deal with that they will want to rewrite it. So you want some kind of simplifying policy about how to develop threaded code. It needs to perform well (the primary reason for doing threading in the first place, other than dealing with blocking), require little mental overhead or boilerplate code, and ideally be usable by more junior developers than yourself.

Functional parallelism only extracts a small fraction of available performance, at least in larger, "interesting" problems. Data parallelism is often better at extracting more performance. In the case of large online game servers, the ideal thing to parallelize across is game entities. There are other types of entities that we can generalize to as well, once this is working.

Here's what I've come up with...

Generally, you want to avoid ticking entities as fast as possible. While this makes a certain amount of sense on the client (one tick per graphics frame) to provide the smoothest visual experience, on the server it will consume 100% of the CPU without significantly improving the player's experience. One big downside is that you can't tell when you need to add more CPUs to your cluster. So I prefer event driven models of computing. Schedule an event to occur when there is something necessary to do. In some cases, you may want to have periodic events, but decide what rate is appropriate, and schedule only those. In this way, you can easily see the intrinsic load on a machine rise as more work (entities) are being handled. So the core assumption is: we have a collection of entities, handling a stream of events.

We built an event dispatcher. It is responsible for distributing events (that usually arrive as messages) to their target entities. I've discussed a number of means to route those messages and events in other posts. When an event is available, the system has the entity consume the event, and runs an event handler designed for that type of event.

You don't want to spawn a thread per entity. There could be thousands of entities per processor, and that would cause inefficient context switching, and bloat memory use for all those threads sitting around when only a few can run at a time. You also don't want to create and destroy threads every time an entity is created or destroyed. Instead, you want to be in control of the number of threads spawned regardless of the workload. That allows you to tune the number of threads for maximum performance and adjust to the current state of affairs in the data center. You create a thread pool ahead of time, then map the work to those threads.

It would be easy enough to use an Executor (java), and Runnables. But this leads to an unwanted problem. If there are multiple events scheduled for a single entity, this naive arrangement might map two events for the same target to different threads. Consequently, you would have to put in a full set of concurrency controls, guarding all data structures that were shared, including the entity itself. This would counteract the effect of running on two threads, and make the arrangement worthless.

Instead, what I did was create a separate event queue per entity, and make a custom event dispatcher that pulls only a single event off a queue at a time. The Runnables the Executor manages in this better setup represent entities (ones with one or more events on their private queues). Any entity can be chosen to run, but each entity will only work on a single event at a time. This makes more efficient use of the threads in the pool (no blocking between them), and obviates the need for concurrency control (other than in the queues). Event handlers can now be written as if they were single threaded. This makes game logic development easier for more junior programmers, and for those less familiar with server code or with Java (e.g. client programmers). A sketch of the idea is below.
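
Here is a minimal sketch of that arrangement (my names and simplifications, not the production code): each entity owns a private queue, and is only handed to the pool while it has work, one event at a time.

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.atomic.AtomicBoolean;

    class Entity {
        private final Queue<Runnable> events = new ConcurrentLinkedQueue<>();
        private final AtomicBoolean scheduled = new AtomicBoolean(false);

        void post(Runnable event, ExecutorService pool) {
            events.add(event);
            trySchedule(pool);
        }

        private void trySchedule(ExecutorService pool) {
            // Only one task per entity is ever in flight, so handlers run
            // single threaded with respect to this entity's state.
            if (!events.isEmpty() && scheduled.compareAndSet(false, true)) {
                pool.execute(() -> {
                    Runnable event = events.poll();
                    if (event != null) {
                        event.run(); // the handler; no locks needed on entity data
                    }
                    scheduled.set(false);
                    trySchedule(pool); // pick up events that arrived meanwhile
                });
            }
        }
    }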

Obviously, this applies to data that is owned by a single entity. Anything shared between entities still needs to be guarded. In general, you want to avoid that kind of sharing between entities, especially if you have a large scale game where entities might be mapped across multiple machines. You can't share data between two entities if they are in different processes on different machines (well, there is one way, but it requires fancy hardware or device drivers; that is another story). So you will be building for the distributed case anyway. Why not do it the same way when the entities are local?

I found a number of posts about executors and event dispatching that touched on these ideas, but there was nothing official, and there seemed to be a lot of debate about good ways to do it. I'm here to say it worked great. I'd love to post the code for this. Again, maybe when Quantum is available, you'll see some of this.



Monday, February 16, 2015

Dynamic Serialization of Messages

Call it message marshalling, streaming, or serialization: you'll have to convert from a message data structure to a series of bytes and back if you want to send a message between two processes. You might cast the message pointer to a byte pointer, take the size, and slam the bytes into the socket, but you won't be able to deal with different cpu architectures (byte ordering), there may be hidden padding between fields, or unwanted fields, and there may be tighter representations on the wire than you have in memory. There are many reasons to have some kind of serialization system. The following version has some of the best trade-offs I've come across. It was partially inspired by simple JSON parsers like LitJSON, and partially by a library called MessagePack, both of which discover the fields of a data structure automatically.

So I kind of hate the boilerplate and manual drudgery of ToStream/FromStream functions added to each message subclass. It always seemed like there should be an automatic way of implementing that code. Google Protocol Buffers, Thrift, and others make you specify your data structures in a separate language, then run a compiler to generate the message classes and serialization code. That always seemed clumsy to me, and was extra work: excluding the generated files from source control, more custom stuff in your maven, makefile, or ms proj files. Plus I always think of the messages as *my* code, not something you generate. These are personal preferences that led to the energy needed to come up with the idea, not necessarily full justification for what resulted. In the end, the continued justification is that it is super easy to maintain, and has a very desirable side effect of fixing a long-standing problem of mismatching protocols between client and server. So here's the outline (maybe I'll post the code as part of the Quantum system some day).

We have a large number of message classes, but are lazy, and don't want to write serialization code, and we always had bugs where the server's version of a class didn't match the client's. The server is in Java, and the client is in C#. Maybe the byte orders of the client and server are different. Ideally, we could just say Send(m), where m is a reference to any message object, and the message is sent. Here's how (a sketch follows the list):
- Use introspection (Java calls it reflection) to determine the most derived class of m. If you have a reflection-based serializer constructed for that type, use it; else create one.
- To create one, walk each field of the type, construct a ReaderWriter instance for that field's type, and append it to a list for the message type. Do this recursively. ReaderWriter classes are created for each atomic type, and can be created for any custom type (like lists, dictionaries, or application classes that you think you can serialize more efficiently). Cache the result so you only do this once. You may want to do this for all message types ahead of time, but that is optional; you could do it on the first call to Send(m).
- As you send each message, find its serializer, and hand the message instance in. The system will walk the list of ReaderWriters and serialize the whole message. To make this work, the ReaderWriter classes must use reflection to access the fields (reading during send, and writing during message arrival). This is pretty easy to do in Java and C#, or in interpreted languages like Python, Lua, and Ruby. C++ would be a special case where code generation or template and macro tricks would be needed. Or good old fashioned boilerplate. Sigh.
- As a message arrives, allocate an instance of the type (again, using reflection), then look up the serializer, and fill the instance in from the incoming byte buffer.
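
A rough sketch of the send side in Java (my names and simplifications; the real thing builds a ReaderWriter per field rather than switching on types at write time):

    import java.lang.reflect.Field;
    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    final class ReflectionSerializer {
        // One cached field list per message class; built on first use.
        private static final Map<Class<?>, List<Field>> CACHE = new ConcurrentHashMap<>();

        private static List<Field> fieldsOf(Class<?> type) {
            return CACHE.computeIfAbsent(type, t -> {
                List<Field> fields = new ArrayList<>();
                for (Field f : t.getDeclaredFields()) { // superclass fields omitted for brevity
                    f.setAccessible(true);
                    fields.add(f);
                }
                return fields;
            });
        }

        static void write(Object message, ByteBuffer out) throws IllegalAccessException {
            for (Field f : fieldsOf(message.getClass())) {
                Class<?> ft = f.getType();
                if (ft == int.class) {
                    out.putInt(f.getInt(message));
                } else if (ft == long.class) {
                    out.putLong(f.getLong(message));
                } // ...other atomic types, strings, lists, nested messages...
            }
        }
    }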

This works fine across languages and architectures. There is a small performance hit in reading or setting the fields using reflection, but it is not too bad, since you only scan the type once at startup, and you keep the accessor classes around so you don't have to recreate anything for each message.

Once you have all this metadata about each message type, it is easy to see how you can make a checksum that will change any time you modify the message class definition. That checksum can be sent when the client first connects to the server. If you connect to an old build, you will get a warning or be disconnected. It can include enough detail that the warning will tell you exactly which class doesn't match, and which field is wrong. You may not want this debugging info in your shipping product (why make it easy for hackers by giving them your protocol description), but the checksums could be retained. The checksum would include the message field names, and types, so any change will trigger a warning. We chose to sort the field serialization alphabetically, and ignored capitalization. That way differences in field order on the client and server didn't matter, and capitalization differences due to language naming conventions were ignored. And atomic types were mapped appropriately.
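
To give the flavor of the checksum (again my own sketch, not the original code): hash the sorted, case-normalized field names and types, so field order and naming conventions don't change the result.

    import java.lang.reflect.Field;
    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;
    import java.util.zip.CRC32;

    final class ProtocolChecksum {
        static long of(Class<?> messageType) {
            Field[] fields = messageType.getDeclaredFields();
            // Sort alphabetically, ignoring case, so field order and
            // language naming conventions don't change the checksum.
            Arrays.sort(fields, (a, b) -> a.getName().compareToIgnoreCase(b.getName()));
            CRC32 crc = new CRC32();
            for (Field f : fields) {
                crc.update(f.getName().toLowerCase().getBytes(StandardCharsets.UTF_8));
                // Real code would map language-specific types to shared atomic names.
                crc.update(f.getType().getSimpleName().getBytes(StandardCharsets.UTF_8));
            }
            return crc.getValue();
        }
    }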

Another consideration was to delay deserialization as long as possible. That way intermediate processes (like an Edge Server) didn't have to have every message class compiled in. Message buffers could be forwarded as byte buffers without having to pay deserialization/serialization costs. This also allowed deserialization to occur on the target thread of a multi-threaded event handling system.

One necessary code overhead in this system is that the class type of each arriving message has to be pre-registered with the system, otherwise we can't determine which constructor to run with reflection, and we don't know which reflection based serializer to use (or construct, if this is the first use of it since the app started). We need a mapping between the message type identifier in the message header, and the run time type. This registration code allows message types to be compact (an enum, or integer), instead of using the type name as a string (which could be used for reflection lookup, but seemed too much overhead per message send). It has a nice side effect, which is we know every type that is going to be used, so the protocol checking system can make all the checksums ahead of time and verify them when a client makes a connection (instead of waiting for the first instance of each message to detect the mismatch). We might have been able to make use of message handler registration to deduce this message type list, and ignore any messages that arrived that had no handlers.
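
The registration piece might look something like this (hypothetical names): a compact wire id mapped to the runtime type, so an arriving header can find the constructor and serializer.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    final class MessageRegistry {
        private final Map<Integer, Class<?>> byId = new ConcurrentHashMap<>();

        void register(int typeId, Class<?> messageType) {
            byId.put(typeId, messageType);
        }

        // The arriving header carries the compact typeId; reflection does the rest.
        Object newInstance(int typeId) throws ReflectiveOperationException {
            Class<?> type = byId.get(typeId);
            if (type == null) {
                throw new IllegalArgumentException("Unregistered message type: " + typeId);
            }
            return type.getDeclaredConstructor().newInstance();
        }
    }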

Some of these features exist in competing libraries, but not all of them were available when we built our system. For example, MessagePack didn't have checksums to validate type matching.


Wednesday, August 15, 2012

Time Management and Synchronization

A key principle of online games is synchronization. Keeping the various players in synch means giving them practically identical and consistent experiences of the virtual world. The reason this is hard is that they are separated by network delays, and worse, those delays are variable.

The ideal situation would be that all players saw exactly the same thing, and that there were no delays. How do we approach that ideal? Consider a use case: there is an observer that is watching two other players (A and B) racing toward a finish line. The observer should see the correct player cross first, and see that happen at the correct time delay from the beginning of the race. To make that sentence make sense, we need a consistent notion of time. I call this Virtual Time, and my views are drawn from computer science theory of the same name, and from the world of Distributed Interactive Simulation.

Let's say the start of the race is time zero. Both racers' clients make the start signal go green at time zero, and they begin accelerating. They send position updates to each other and the observer throughout the race, and try to render the other's position "accurately". What would that look like? Without any adjustment, on A's screen, he would start moving forward. After some network latency, B would start moving forward, and remain behind A all the way to the finish line. But on B's screen, B would start moving forward. After some network latency, B would start receiving updates from A, and would render him back at the start line, then moving forward, always behind. Who won? The Observer would see A's and B's updates arrive at approximately the same time, and pass the finish line at approximately the same time (assuming they accelerate nearly equally). Three different experiences. Unacceptable.

Instead, we add time stamps to every update message. When A starts moving the first update message has a virtual time stamp of zero. Instead of just a position, the update has velocity, and maybe acceleration information as well as the time stamp. When it arrives at B, some time has passed. Let's say it is Virtual Time 5 when it arrives. B will dead reckon A using the initial position, time stamp, and velocity of that time zero update to predict A's position for time 5. B will see himself and A approximately neck and neck. Much better. The Observer would see the same thing. As would A (by dead reckoning B's position).

This approach still has latency artifacts, but they are much reduced. For example, if A veered off from a straight-line acceleration to the finish line, B would not know until after a little network delay. In the meantime, B would have rendered A "incorrectly". We expect this. When B gets the post-veering update, it will use it, dead reckon A's position to B's current Virtual Time, and compute the best possible approximation of A's current position. But the previous frame, B would have rendered A as being still on a straight-line course. To adjust for this unexpected turn, B will have to start correcting A's rendered position smoothly. The dead reckoned position becomes a Goal or Target position that the consumer incrementally corrects toward. If the delta is very large, you may want to "pop" the position to the correct estimate and get it over with. You can use the vehicle physics limits (or a little above) to decide how rapidly to correct its position. In any case, there are lots of ways to believably correct the rendered position to the estimated position.
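
In code, the prediction and the smooth correction might look like this (a simplified 2D sketch of the idea, my names):

    // Last authoritative update received for a remote entity.
    final class DeadReckoner {
        double posX, posY;   // position at updateTime
        double velX, velY;   // velocity at updateTime
        double updateTime;   // virtual time stamp of the update

        // Predict (dead reckon) where the entity should be at virtual time t.
        double[] goalAt(double t) {
            double dt = t - updateTime;
            return new double[] { posX + velX * dt, posY + velY * dt };
        }

        // Move the rendered position a bounded step toward the goal,
        // popping straight there if the error is too large to hide.
        double[] correctToward(double[] rendered, double[] goal,
                               double maxStep, double popThreshold) {
            double dx = goal[0] - rendered[0], dy = goal[1] - rendered[1];
            double dist = Math.hypot(dx, dy);
            if (dist > popThreshold || dist < 1e-9) {
                return goal; // pop, or already close enough
            }
            double step = Math.min(maxStep, dist);
            return new double[] { rendered[0] + dx / dist * step,
                                  rendered[1] + dy / dist * step };
        }
    }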

Note that during the start of the race, at first B would still see A sitting still for a period of time, since it won't get the first update until after a network delay. But after the first delay, it will see that the estimated position should be up near where B has already moved itself. The correction mechanism will quickly match A's rendered position to the dead reckoned one. This may be perceived as impossibly large acceleration for that vehicle. But incorrect acceleration is less easily detected by a person than incorrect position, or teleporting.

Another use case is if A and B are firing at each other. Imagine that B is moving across in front of A, and A fires a missile right when B appears directly in A's path. Let's assume that the client that owns the target always determines if a missile hits. If we were not using dead reckoning, by the time the missile update arrived back at B, B would have been 2 network delays off to the side, and the missile would surely miss. If we are using dead reckoning, A would be firing when B's estimated position was on target. There might be some discrepancy if B was veering from the predictable path. But there is still one more network latency to be concerned about. By the time the firing message arrives at B, B would be one network latency off to the side. B could apply some dead reckoning for the missile, flying it out a few Virtual Time ticks, but the angle from the firing point might be unacceptable, and would definitely not be the same as what A experienced.

A better, more synchronized, more fair experience would be to delay the firing of the missile on A, while sending the firing message immediately. In other words, when A presses the fire button, schedule the execution of the fire event 5 Virtual Time units in the future. Send that future event to B, and to itself. On both A and B, the fire event will occur simultaneously. I call this a "warm up" animation. The mage winds up, the missile smokes a bit before it fires, or whatever. The player has to learn to account for that little delay if they want the missile to hit.
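
Scheduling that future event is simple once everything runs on Virtual Time. A sketch (my own, with a made-up 5-unit warm-up delay):

    import java.util.PriorityQueue;

    final class VirtualTimeScheduler {
        static final long FIRE_DELAY = 5; // virtual-time units of "warm up"

        static final class TimedEvent {
            final long fireAt;
            final Runnable action;
            TimedEvent(long fireAt, Runnable action) { this.fireAt = fireAt; this.action = action; }
        }

        private final PriorityQueue<TimedEvent> queue =
            new PriorityQueue<>((a, b) -> Long.compare(a.fireAt, b.fireAt));

        // When the fire button is pressed: schedule locally in the future,
        // and send the same event (same timestamp) to the other clients.
        void scheduleFire(long nowVt, Runnable fireMissile) {
            queue.add(new TimedEvent(nowVt + FIRE_DELAY, fireMissile));
            // ...also replicate {fireAt, event} to peers here...
        }

        // Called as virtual time advances; runs everything that is now due.
        void advanceTo(long nowVt) {
            while (!queue.isEmpty() && queue.peek().fireAt <= nowVt) {
                queue.poll().action.run();
            }
        }
    }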

If you have a game where firing is instantaneous (a rifle, or laser), this is a pretty much impossible problem. You have to put in custom code for this. The shooter decides ahead of time where the bullet will hit, and tells the target. By the time the target is told, both shooter and target have moved, so the rendering of the bullet fly out is not in the same location in space, and will possibly go through a wall. The point of impact is also prone to cheating, of course. I don't like games that have aiming for this reason. They are very difficult to make fair and cheat proof. I prefer target based games. You select a target, then use your weapons on it. The distributed system can then determine whether it worked. Aiming, and determining hits, shooting around corners, etc. is a tough problem. FPS engines are really tough. You should seriously consider whether your game needs to bite off that problem, or if you can make it fun enough using a target based system, and a little randomness. Then add decorative effects that express what the dice said.

The key take away with Time Management, Virtual Time, and synchronization is that whichever client actions are occurring on (A, B, or Observer), they occur in the same time axis. There are nice time synchronization algorithms (look up NTP), that rely on periodically exchanging time stamps using direct messages. And you can solve a lot of problems using events that are scheduled to occur in the future based on Virtual Time timestamps. One challenge though is how to keep the simulation that is being run using Virtual Time in synch with animation and physics that you might be tempted to run using real time. Or what you think is real time. You may have a game that you can pause. How does that work? Do you freeze the animation or not? You are already dealing with different concepts of time. Consider adding Virtual Time. And the notion of Goal positions.

Tuesday, May 29, 2012

Let's Call it MLT.

I need a name for the Ultimate Online Game Framework we've been discussing so people know what I'm referring to (crazy idea, or something for that project). I want it to be the greatest thing in the world. And as everyone knows, that is a nice MLT (Mutton, Lettuce and Tomato sandwich). I was toying with calling it Quantum, because I like the thought of Quantum Entanglement to represent entity state replication. So maybe the Quantum Distributed Entity System (QDES), and later the Quantum this and that.

I'm also really stuck on what language to write it in. C++ or Java. I think it is critical that a game development team use the same language on client and server. Otherwise the team gets split in half, and winds up not talking enough. Being able to swap people onto what ever needs work is really important for keeping to a schedule. And hiring from two piles of expertise is just that much harder. Java has so many great libraries. But C++ is so much more popular in the game industry. On mobile, you've got Objective-C. If you're using Unity, you've got C#.

To be honest, most of your game logic should probably be written (at least initially) in your favorite scripting language. Java and C++ both support embedding Python, Lua, and JavaScript, for example. IKVM appears to allow Mono to be embedded in Java, so you could maybe even do C# on both sides. This argument (http://www.mono-project.com/Scripting_With_Mono) is certainly compelling. Ever heard of Gecko? It embeds a C++ compiler so you can compile and load/run C++ "scripts" at run time. The reason I bring up all these options is: you could implement client and server in different languages, but embed the same scripting engine in both sides, so the game developers get the benefit of reusing their staff on both sides.

If the architecture and design of QDES is awesome enough, someone might reimplement it in their favorite language anyway. Or port it to a very different platform that doesn't support whatever language choice we make. So maybe the choice I'm stuck on isn't that serious right now. Maybe I should just start with the architecture and design.

I'm drafting some high level requirements and am thinking I'll post them here on a "page" instead of a blog article. Not sure how comments and such would work with that however. But I wanted something stickier than a post and something I could update and have a single reference to. Same with the designs as they start to come.

Anyway. Thanks for being a sounding board. Maybe some of you and your friends will have time to contribute to (or try out) all this. (And I haven't completely forgotten about doing that tech survey to see what other similar projects are out there in the open source world. I found a few. Some are a little dusty. Some are a little off track from what I'm thinking.)

Friday, May 25, 2012

Essential Components of the New Framework

What are the components that are absolutely essential to get right in the Ultimate MMO Framework (still need a catchy name)? If you had those components today, you'd be using them in your current game, even if you had to integrate them with a bunch of other stuff to get a full solution. If we could build those essential pieces, and they were free, I think they would slowly take over the industry, shouldering less effective solutions out of the way. If each piece can be made independent of the others, there is a better chance the system will be adopted even if someone thinks it isn't all a perfect fit.

It is tempting to start with message passing. After all, it isn't an online game without that. But there are a lot of great message passing systems. There might even be some decent event handling systems. And there are definitely some good socket servers that combine the two. While I see problems with these that could be improved, and I see missing features, I think message passing is a second priority. You could make do with an existing system, and swap it out later. Although, I don't know how you can get away from the assumption that you have a publish/subscribe api that the higher levels can use.

More essential is the Entity System. It is where game play programmers interact with your system, and when done well, much of the rest of the system gets abstracted away. The decisions made here are what enable a server to scale, a client to stay in synch, content to get reused in interesting ways, and game play to be developed efficiently. BTW, what I mean by an Entity is a game object. Something that represents a thing or concept in the game world. The term comes from discrete event simulation. Jumping ahead a bit, the Entity System needs to be Component oriented such that an Entity is aggregated from a collection of Components. The Entity State system is the basis of replication to support distributed computation and load balancing, and of persistence. Done well, the Entity System can be used for more than just visible game objects, but could support administrative and configuration objects as well.

Related, but probably separable, is the Behavior system. How do you get Entities to do something, and how do they interact with one another? I don't mean AI, I mean behavior in the OO encapsulated-state-and-behavior sense. It will be interesting to distinguish between and tie together AI "plans", complex sequences of actions, and individual behavior "methods". And, of course, the questions of languages, scripting, and debugging land right here. (A second priority that relates to Behaviors is time management. How do you synchronize client and server execution, can you pause, can you persist an in-flight behavior?)

Content Tools are the next big ticket item. If the Framework is stable enough, a new game could be made almost exclusively using these tools. Close to 100% of your budget would be spent pushing content through them in some imaginary perfect world. These tools allow for rapid iteration and debugging across multiple machines, and multiple platforms. It is not clear how much of the tools can be made independent of Entity state and behavior choices.

What other systems are absolutely essential? Ones that drive everything else, that don't already exist, and that you feel like you always have to rebuild?

What do you think? Wouldn't it be cool to have a standalone Entity System that you could use in your next game? What if it came with a collection of Components that solved a lot of major problems like movement, collision, visibility, containment?