The 49 GB Was a Cache, Not a Leak

Watching Activity Monitor while Aoede synthesized a long book, I saw its memory climb to 49 GB. That looks like a catastrophic leak, the kind that ends with the app getting killed. It was not a leak at all.

The cause was MLX, the framework running Kokoro on the GPU. It keeps freed buffers in an uncapped reuse pool, which grows to match the peak working set of synthesis and then just sits there. The memory was reclaimable the whole time, never lost, but an app that idles at tens of gigabytes is still a bad neighbor on the machine. The fix is small: cap the pool at 512 MB when the model loads, and clear it when the engine goes idle. Live memory during synthesis is tiny, so the footprint drops to 1 to 2 GB with no change in behavior or speed.
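To make the mechanism concrete, here is a toy model of a reuse pool in Python. It is only a sketch, not MLX's allocator: the `ReusePool` class, the `idle_footprint` helper, and the per-chunk working-set sizes are all made up for illustration, and the pool is simplified to a byte count rather than individual buffers. Sizes are in MB.

```python
class ReusePool:
    """Toy allocator that keeps freed memory for reuse (hypothetical, not MLX's real one)."""

    def __init__(self, cache_limit=None):
        self.cache_limit = cache_limit  # MB; None means uncapped
        self.active = 0                 # MB currently in use
        self.cached = 0                 # MB freed but held for reuse

    def alloc(self, size):
        # Serve as much of the request as possible from the cache.
        reused = min(size, self.cached)
        self.cached -= reused
        self.active += size

    def free(self, size):
        # Freed memory goes back to the pool, not to the system.
        self.active -= size
        self.cached += size
        # With a cap, anything over the limit is released to the system.
        if self.cache_limit is not None and self.cached > self.cache_limit:
            self.cached = self.cache_limit

    def clear_cache(self):
        # Release everything the pool is holding.
        self.cached = 0

    @property
    def footprint(self):
        # What a process monitor would report: live memory plus cache.
        return self.active + self.cached


def idle_footprint(pool, working_sets):
    """Run each chunk's allocate-then-free cycle and return the footprint afterwards."""
    for size in working_sets:
        pool.alloc(size)
        pool.free(size)
    return pool.footprint


# Made-up per-chunk working sets in MB; one chunk is much larger than the rest.
WORKING_SETS = [800, 3000, 1200, 6000, 900]

uncapped = ReusePool()
capped = ReusePool(cache_limit=512)

uncapped_idle = idle_footprint(uncapped, WORKING_SETS)  # 6000: the peak, held after work ends
capped_idle = idle_footprint(capped, WORKING_SETS)      # 512: never more than the cap

capped.clear_cache()
cleared_idle = capped.footprint                         # 0: nothing live, nothing cached
```

In this model nothing is ever leaked: `active` returns to zero after every chunk in both runs. The only difference is how much freed memory the pool is allowed to keep, which is exactly the number the cap controls.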

The lesson I keep relearning is that a number going up is not the same as a number leaking. The honest fix was not chasing a phantom leak but telling the framework how much cache I was willing to pay for.