Monday, March 16, 2009

Debugging leaked references

Everyone likes using smart pointers and reference counting to manage the lifetime of objects that are passed through modular interfaces. They can avoid a lot of copying and callers don't have to read a lot of documentation to figure out if they are responsible for deleting the object.

One big problem with using ref counted objects is when they "leak". It is extremely difficult to debug what code has taken an extra reference to an object that you think should have a single remaining reference. Why didn't this object destroy itself when I cleared this "last" smart pointer? When an object is passed through many layers and stored in various containers, there can be quite a few increment/decrement references occurring, so putting in breakpoints is not very useful. I have found this to be one of the most difficult and tedious problems to fix in a large application like a game.

Regular memory leak detectors are not too useful, since they just say "it leaked", not why. What you really want to know is the location of every outstanding reference to a block that you think should have already been deleted. You can ask this question at the end of a run on the unexpectedly outstanding blocks, or in the middle of a run when you have found something strange and want to backtrack.

It is "easy" but not much use getting the memory address of every smart pointer that points at the problem object. Make multi-map keyed on object addresses. Each time you assign or clear a smart pointer (increment or decrement a reference on an object), modify that map to include or exclude the address of that smart pointer. Done.

Making this "useful" is a matter of recording information about each smart pointer for later logging. Ideally, you would record a partial stack trace so you can see where the smart pointer was affected and what caused that. This requires another chunk of data beyond the smart pointer's memory location.

If you are only concerned about extra references, you could probably get away with recording the stack trace only on addRef. Note that for each smart pointer, there is only a single outstanding reference, so you only need the one stack trace per smart pointer.

If you have a situation where you have too many decRefs, you may need to record the stack trace of each addRef and decRef for the broken object. Garbage collecting that data is going to be interesting, because you can never be sure when an extra decRef might happen.

Putting this together, you can use regular memory leak detection to identying unexpectedly outstanding ref counted objects. Then ask this ref count debugging system for the stack trace(s) of the where the smart pointers where initialized that are still holding a reference to that object.

No comments:

Post a Comment