Fixed

Votes: 4
Found in: 5.3.3f1
Issue ID: 788356
Regression: No

[GLCore] Low performance due to synchronous OpenGL buffer access

OpenGL

This issue has no description.

Comments (2)

  1. Andy-Korth

    Apr 22, 2016 21:07

    Hi ---,

    Thanks for the reply and the explanation. I tested the projects in 5.3.4 and 5.4.0b14.

    The good news is that the synchronous waits seen in the old traces are gone on the Intel HD 5000 and Radeon HD 6970M Macs that we tested! So this has been fixed in 5.4.0b14 (and probably earlier, as you said). The overall performance of all the test cases is vastly improved on these computers! The new traces definitely reflect the internal changes. Awesome!

    Unfortunately, I'm still getting a lot of stuttering on my MacBook Pro with a GeForce GT 650M. I focused most of my testing on 5.4.0b14 since I figured that would be most useful to you, so all the examples and traces included are for that version unless otherwise noted.

    Update:
    Okay, so on the suggestion of a coworker, I updated the MacBook Pro to OS X 10.11, and all the problems disappeared. It went from quite stuttery to buttery smooth. Apple ships driver updates with its system updates, so for the GeForce GT 650M, I guess this was a driver issue.

    I made a bunch of traces, but, in light of this development, you probably don't need to see them:
    http://kortham.net/temp/10.10-Short-ReplicationTraceUnity540b14.txt
    * Made in 10.10, exhibits performance issue.
    * Note this has a nice fast glFenceSync
    * There were no delays anywhere in the whole trace other than CGLFlushDrawable (which reflects the fix since 5.3.3)
    * Huge CGLFlushDrawable time on the last frame, longer than a vblsync wait should ever be.

    In 10.11, the average CGLFlushDrawable is ~350 microseconds, the absolute worst one is 8 ms, and there's just a handful above 3 ms. Even those are quite infrequent compared to the 40 to 150 ms flushes that were happening with the old drivers.

    Thanks for your time, and hopefully my pain is helpful so people can be reminded to update their OS.

    -Andy

  2. Andy-Korth

    Apr 15, 2016 17:32

    I'd like to publicly share the bug report we filed:

    Low performance due to synchronous OpenGL buffer access

    Replication project and gl traces are included.

    We've been debugging an issue that occurs in the OpenGLCore renderer. The issue can be missed on faster machines since they can finish rendering a frame with enough remaining time to swallow the extra delay. On slower machines that should still be more than capable of rendering a game at a smooth framerate, significant stuttering or stalling can occur.

    Looking through an OpenGL trace from OS X's OpenGL profiler, it's easy to confirm that there is a synchronous wait at the beginning of each frame. The renderer reuses buffers from one frame to the next, and the first time a buffer is used in a frame, the renderer executes a synchronous call to glMapBufferRange(). Even on a top of the line Mac, the render thread will regularly stall for 8 ms when rendering a scene with a single object. This was confirmed on several different machines.

    When using a buffer in a streaming manner, the renderer needs to either orphan the existing buffer by creating a new one or use the discard flag. This looks like it's safe to do since the buffer is never mapped as readable. More information can be found here: https://www.opengl.org/wiki/Buffer_Object_Streaming

    Accessing the buffer synchronously causes the GPU and the render thread to run serially. The renderer cannot prepare a new frame while the GPU is drawing the previous one, and the GPU has nothing to do while the render thread is preparing the next frame. Even in relatively mild cases, this causes severe stuttering and can compound with other delays.

    Our annotated trace is here:
    https://gist.github.com/andykorth/299f7be4626548aac34e4d2da23165c8

    The attached replication project was built and used to generate the above gl trace. Due to differences in GPUs and drivers, we haven't been able to make a replication project that demonstrates the issue *visually*. However, even when drawing a single sprite, it is easy to replicate the cause of the problem when you look at the profile trace.

    Switching to the OpenGL2 renderer exhibits a similar synchronization issue that causes a visually identical stuttering problem on the same hardware, but shows up differently in the trace. (There's an explicit glFinishFenceAPPLE)

    Thanks for your time and all your great work on Unity! We love working with it!
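
The serialization described in the bug report above can be illustrated with a small timing model. This is only a sketch under stated assumptions, not Unity's renderer and not derived from the attached traces: the function `simulate`, its parameters, and the 8 ms and 10 ms per-frame costs are all hypothetical. The model assumes every frame costs the same, and that the only difference between the two modes is whether the render thread waits for the GPU to finish with the reused buffer before writing to it.

```python
def simulate(frames, cpu_ms, gpu_ms, synchronous):
    """Return the time (in ms) at which the GPU finishes each frame.

    cpu_ms: hypothetical render-thread cost to prepare one frame
    gpu_ms: hypothetical GPU cost to draw one frame
    synchronous: if True, the render thread waits for the GPU to finish the
        previous frame before touching the reused buffer (the reported
        behavior); if False, it gets fresh storage immediately (as with
        orphaning or a discard flag) and can work ahead.
    Simplification: the render thread may run arbitrarily far ahead of the
    GPU; real drivers cap the number of frames in flight.
    """
    cpu_free = 0.0  # when the render thread can start the next frame
    gpu_free = 0.0  # when the GPU finishes everything submitted so far
    finish_times = []
    for _ in range(frames):
        start = cpu_free
        if synchronous:
            # Mapping the in-use buffer blocks until the GPU is idle.
            start = max(start, gpu_free)
        submit = start + cpu_ms            # frame is handed to the GPU
        gpu_start = max(submit, gpu_free)  # GPU draws frames in order
        gpu_free = gpu_start + gpu_ms
        cpu_free = submit
        finish_times.append(gpu_free)
    return finish_times


for label, sync in (("synchronous map", True), ("orphan/discard", False)):
    done = simulate(10, cpu_ms=8.0, gpu_ms=10.0, synchronous=sync)
    print(f"{label}: {done[-1] - done[-2]:.1f} ms between frames")
```

With these made-up costs, the synchronous case completes a frame every 18 ms (the sum of both costs, which misses a 16.7 ms budget at 60 Hz), while the overlapped case completes one every 10 ms (the larger of the two costs). This matches the report's point that a machine fast enough to render smoothly can still stutter when the render thread and the GPU take turns instead of overlapping.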
