Unity Issue Tracker - CPU-PLM On-Demand Baking leaks memory on Linux

Status: Fixed (fixed in 2023.2.0a17)
Votes: 0
Found in: 2023.1.0a18, 2023.2.0a1
Issue ID: UUM-19081
Regression: Yes

CPU-PLM On-Demand Baking leaks memory on Linux

Nov 11, 2022

The Unity process leaks memory when baking many times.

Repro:
* Open Tests\EditModeAndPlayModeTests\GI\LightBaker
* Open the Test Runner
* Bake all tests
* Observe that the memory allocated by the Unity process keeps increasing
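
The last step can also be checked from outside the editor by sampling the resident set size of the Unity process while the tests run. Below is a minimal sketch, assuming a Linux machine with the /proc filesystem. The function names are illustrative and are not part of Unity's test tooling.

```python
import os
import time


def rss_bytes_from_statm(statm_text, page_size):
    """Parse the contents of /proc/<pid>/statm and return resident memory in bytes."""
    # statm fields are page counts: size resident shared text lib data dt
    resident_pages = int(statm_text.split()[1])
    return resident_pages * page_size


def read_rss_bytes(pid):
    """Read the resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/statm") as f:
        return rss_bytes_from_statm(f.read(), os.sysconf("SC_PAGE_SIZE"))


def sample_rss(pid, samples, interval_s, read=read_rss_bytes, sleep=time.sleep):
    """Collect `samples` RSS readings taken `interval_s` seconds apart."""
    readings = []
    for i in range(samples):
        readings.append(read(pid))
        if i + 1 < samples:
            sleep(interval_s)
    return readings


def total_growth(readings):
    """Net change in bytes between the first and the last reading."""
    return readings[-1] - readings[0]
```

Calling `sample_rss(unity_pid, 60, 5.0)` while the bakes run, where `unity_pid` is the editor's process ID, and then looking at `total_growth` gives a rough number to compare between runs. RSS is a coarse signal, so a steady rise across many bakes is more telling than any single reading.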

{noformat}
Important
Re-enable LightBaker, ProgressiveLightmapper, and Performance tests when you fix this. Look for UUM-19081 in the source code.
{noformat}

Info from Pema:

We recently switched the CPU lightmapper to destroy all its state and recreate it on each bake, primarily to avoid leaking state between bakes. This was what introduced the leak. The CPU lightmapper relies on a library called OpenRL. If you haven't heard of it, it's an ancient piece of tech that uses LLVM to JIT 'shaders' into SIMD CPU code. It was basically dead on arrival, but has stuck around in Unity. One thing to note about it is that it is very tedious to rebuild, and doing so involves loading a 10-year-old system image into a VM.

The reason I mention OpenRL is that I've narrowed the leak down to being (probably indirectly) caused by it. Near the start of a bake, we call OpenRLCreateContext on a background thread, and at the end of the bake we call OpenRLDestroyContext on that same background thread. If I remove all other calls that interact with OpenRL, the leak still happens. If I remove the OpenRLCreateContext call, the leak stops happening. Also, if I move the call to the main thread, it stops leaking as well.
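
The shape of that experiment can be sketched as follows: run a create/destroy pair repeatedly, either on a background thread or on the calling thread, and compare memory before and after. This is only an illustration in Python. `create_context` and `destroy_context` are hypothetical callables standing in for the native OpenRLCreateContext and OpenRLDestroyContext calls, and `read_memory` is any function that returns the current memory usage.

```python
import threading


def run_on_background_thread(fn):
    """Run fn() on a fresh thread, wait for it, and re-raise any exception."""
    outcome = {}

    def target():
        try:
            outcome["value"] = fn()
        except BaseException as exc:  # hand the failure back to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("value")


def bake_cycle(create_context, destroy_context):
    """One stripped-down 'bake': create a context, then destroy it on the same thread."""
    context = create_context()   # hypothetical stand-in for OpenRLCreateContext
    destroy_context(context)     # hypothetical stand-in for OpenRLDestroyContext


def measure_growth(cycle, iterations, read_memory, on_background_thread=True):
    """Return how much read_memory() changed after running `cycle` repeatedly."""
    before = read_memory()
    for _ in range(iterations):
        if on_background_thread:
            run_on_background_thread(cycle)
        else:
            cycle()
    return read_memory() - before
```

Running `measure_growth` once with `on_background_thread=True` and once with `False` mirrors the comparison described above: in the report, the growth disappears when the call is moved to the main thread.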

Based on that, one might infer that the leak is caused by calling OpenRLCreateContext on a background thread at all, but this doesn't seem to be the case. In fact, if I move the call further down the callstack, very close to the main function, it stops leaking, even when running on a background thread! I bisected a bit manually, but it is pretty tricky to do. It seems (?) to me like moving the call to somewhere after the initial setup of Mono causes it to start leaking.

I've tried several memory diagnostic tools on 2 separate Linux machines, but none of them have worked for me:
* Valgrind leakcheck: Segfaults on Mono initialization
* Valgrind massif: Segfaults on Mono initialization or immediately
* Bytehound: Segfaults immediately
* Heaptrack: Segfaults on Mono initialization on one machine, gives completely bogus results on the other
* ASan/UBSan: Segfaults immediately
* Dr. Memory: Segfaults immediately
* MTuner: Tried building this on Linux for a few hours, but failed

I tried each of these both with and without the -systemallocator flag.

Considering how many tools segfault at around the same point, and that the leak doesn't happen if I put the repro before that point, i.e. Mono initialization, my best guess is that Mono is corrupting the memory used by OpenRL somehow. Though in fairness, I haven't narrowed it down as much as I could. I made a few observations that seem to support my hypothesis:
* The leak isn't 100% consistent. There are 2 scenarios: either it never leaks in an editor session, or it leaks on every bake during an editor session. Which scenario you hit seems random, with about an 80% chance of leaking.
* It's leaking way more memory per bake than it should be allocating at all.
* As I've described, whether it leaks depends on how far down the callstack you put the repro.
* When the leak actually triggers a crash, the crash callstack seems to be in Mono-related code most of the time. Sometimes it crashes even when you aren't out of memory yet and are only using around 60%.

In other words, this smells like some kind of memory corruption rather than just a simple leak.
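
Because a given editor session either leaks on every bake or never leaks, a single clean session says little about whether a change actually removed the leak. Here is a back-of-the-envelope sketch, assuming sessions are independent and each one leaks with a probability of roughly 0.8. That figure is the rough estimate from the observations above; the independence assumption and the function name belong to this sketch only and are not established by the report.

```python
def sessions_needed(p_leak, confidence):
    """Smallest number of consecutive clean sessions after which the chance of
    having merely been lucky (no fix, yet no leak seen) is at most 1 - confidence."""
    if not 0.0 < p_leak < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("p_leak and confidence must be strictly between 0 and 1")
    chance_all_clean = 1.0  # probability that every session so far was clean by luck
    sessions = 0
    while chance_all_clean > 1.0 - confidence:
        chance_all_clean *= 1.0 - p_leak
        sessions += 1
    return sessions
```

With `p_leak=0.8`, one clean session still leaves a 20% chance of a false negative, and two leave 4%, so each bisection step needs a few fresh editor sessions before a "no leak" verdict can be trusted.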
