This "gl-context vs main thread" is not new. OGL was not designed to work multithreaded
externally. Internally it does a lot of multithreading, but you can not control it, unless you leave OGL and use Vulcan (or alike).
There is a major rule: only one thread at a time can be set as current to a gl-context. So far so good. What's the issue? OGL is designed for drawing, so it's perfectly logical to associate a gl-context with a window. Again, so far so good. The issue arises when not only OGL uses the screen, but the OS does too. And the OS likely wants all of its own drawing to be done in the main thread. For example, user controls (sliders, textctrls, buttons, etc.) created from a secondary thread will probably make the OS complain.
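To make that rule concrete, here is a minimal sketch (GLFW is my assumption for window/context handling; the same idea applies with wglMakeCurrent, glXMakeCurrent, etc.). A context must be released by the thread that holds it before another thread may claim it:

#include <GLFW/glfw3.h>
#include <thread>

void workerJob(GLFWwindow* win)
{
    glfwMakeContextCurrent(win);      // now this thread owns the context
    // ... GL calls from the worker thread ...
    glfwMakeContextCurrent(nullptr);  // release it before handing it back
}

int main()
{
    glfwInit();
    GLFWwindow* win = glfwCreateWindow(640, 480, "demo", nullptr, nullptr);
    glfwMakeContextCurrent(win);      // the main thread owns the context
    // ... GL calls from the main thread ...
    glfwMakeContextCurrent(nullptr);  // release, or the worker's claim fails
    std::thread t(workerJob, win);
    t.join();
    glfwMakeContextCurrent(win);      // the main thread takes it back
    glfwTerminate();
    return 0;
}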
This seems to be the case with Big Sur: only the main thread can draw (read: use the graphics card), whether the drawing comes from the OS or from OGL. And remember Apple has deprecated OGL, so don't expect any care in the Mac's OGL implementation. I don't know if they keep this main-thread-only restriction in Metal too.
Let's see if we can do all OGL stuff in the main thread, or at least most of it.
If everything happens in a millisecond, there's no problem at all. If it takes too much time, we can suffer UI blocking due to some needed glFlush or glFinish command. And surely we should allow the user to cancel the current rendering. And thus we start thinking of multithreading.
There are four jobs:
1) Prepare data for the GPU. This can be done in an ordinary worker thread.
2) Transfer the data to the GPU.
3) The GPU renders; no user control over this.
4) Display, or transfer buffers from the GPU back to RAM.
Jobs 2 and 4 are where we can play some tricks.
We have some strategy points:
A) Don't submit too many commands to the driver at once. Perhaps we can measure how long a group of commands takes to complete, so we can determine a "good number" of commands per group; then we run the sequence 'some commands' ==>> 'yield for UI' in a loop over the groups. This allows the sequence to be cancelled at any time, as in the sketch below.
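A minimal sketch of that loop. drawOneGroup(), processUIEvents() and g_cancelRequested are assumptions for illustration; in wxWidgets, processUIEvents() could be wxTheApp->Yield(), and the cancel flag would be set by a Cancel button handler.

#include <atomic>
#include <cstddef>
// (the usual GL headers / loader are assumed to be included)

std::atomic<bool> g_cancelRequested{false};  // set from a Cancel button

void drawOneGroup(std::size_t groupIndex);   // issues one "good number" of GL commands
void processUIEvents();                      // e.g. wxTheApp->Yield()

// Returns false if the user cancelled before all groups were submitted.
bool drawInGroups(std::size_t numGroups)
{
    for (std::size_t i = 0; i < numGroups; ++i) {
        drawOneGroup(i);        // 'some commands'
        glFlush();              // push this group to the driver now
        processUIEvents();      // 'yield for UI': keep the app responsive
        if (g_cancelRequested)
            return false;       // cancelled mid-sequence
    }
    return true;
}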
B) Data transfer is optimized by using "buffer streaming". See
https://www.khronos.org/opengl/wiki/Buffer_Object_Streaming and Chapter 28 of
http://openglinsights.com/
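As a taste of the simplest form described in that wiki page, buffer re-specification ("orphaning"), a sketch; vbo is assumed to be a buffer object created earlier with glGenBuffers:

// Re-specifying the store each frame (same size/usage, null data) lets the
// driver detach the old storage that the GPU may still be reading and hand
// us a fresh block, so the upload doesn't stall waiting for the GPU.
void streamVertices(GLuint vbo, const void* data, GLsizeiptr size)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, size, nullptr, GL_STREAM_DRAW);  // orphan
    glBufferSubData(GL_ARRAY_BUFFER, 0, size, data);               // fill the fresh storage
}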
C) If we need to know when a GPU job is done, and we don't want to wait for it (blocking a CPU thread), we can use a "fence". See
https://www.khronos.org/opengl/wiki/Synchronization and
https://www.khronos.org/opengl/wiki/Sync_Object
The trick is to use a timer in the main thread; let's say it fires every 0.17 seconds (about six checks per second). When the timer fires we call glClientWaitSync with a 'timeout' of around 0.05 seconds and read its return value. If it returns GL_TIMEOUT_EXPIRED we know the GPU is still working, and thus we'll check again on the next timer event; meanwhile we are not blocking the CPU, only for those 0.05 seconds at most.
You see, a fence can be used in many places, including buffer data transfers.
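Putting strategy C together, a sketch of the timer-driven polling; the onTimer() callback is assumed to come from the GUI toolkit (e.g. a wxTimer handler), and note that glClientWaitSync takes its timeout in nanoseconds:

GLsync g_fence = nullptr;

void startGpuJob()
{
    // ... issue the GL commands for the job ...
    g_fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glFlush();  // make sure the fence itself reaches the GPU
}

void onTimer()  // fires every 0.17 s in the main thread
{
    if (g_fence == nullptr)
        return;                 // no GPU job pending
    // Block at most 0.05 s = 50000000 ns waiting for the fence.
    GLenum res = glClientWaitSync(g_fence, 0, 50000000);
    if (res == GL_TIMEOUT_EXPIRED)
        return;                 // GPU still working; check again next tick
    // GL_ALREADY_SIGNALED or GL_CONDITION_SATISFIED: the job is done.
    // (GL_WAIT_FAILED would indicate an error.)
    glDeleteSync(g_fence);
    g_fence = nullptr;
    // ... now display, or transfer the buffers from GPU to RAM ...
}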