Compute shaders SSBOs behaviour

Alessandro · November 11, 2022, 4:12pm

Hello there, here is something I don’t understand about running multiple compute shaders.
Here’s the code:

  import org.intellij.lang.annotations.Language
  import org.openrndr.application
  import org.openrndr.color.ColorRGBa
  import org.openrndr.draw.*
  import org.openrndr.extra.noise.uniform
  import org.openrndr.math.Vector2
  
  
  fun main() = application {
      configure {
          width = 1000
          height = 1000
          // fullscreen = Fullscreen.CURRENT_DISPLAY_MODE
      }
      program {
          var t = 0.0
  
          @Language("GLSL")
          val glslA = """
              #version 430
      
            layout(local_size_x = 4, local_size_y = 1) in;
            
            uniform float t;
            
            layout(std430, binding=0) buffer particlesColor {
               float col[];
           };
          
            
            void main(){
            
              const uint id = gl_GlobalInvocationID.x + gl_GlobalInvocationID.y * 4;
              col[id] = abs(cos(t * 2.0 * id));
             
            }
          """.trimIndent()
  
          val computeA = ComputeShader.fromCode(glslA, "computeA")
  
          @Language("GLSL")
          val glslB = """
              #version 430
      
            layout(local_size_x = 4, local_size_y = 1) in;
            
            uniform float t;
           
           struct Part {
            vec2 pos;
           };
     
           
           layout(std430, binding=1) buffer particlesBuffer {
               Part positions[];
           };
           
            void main(){
              const uint id = gl_GlobalInvocationID.x + gl_GlobalInvocationID.y * 4;
              positions[id].pos.x = 0.5 + 0.2 * sin( id * 0.1 * t);
              positions[id].pos.y = 0.5 + 0.2 * cos( id * 0.1 * t);
              
            }
          """.trimIndent()
  
          val computeB = ComputeShader.fromCode(glslB, "computeB")
  
  
  
  
          val bufferA = shaderStorageBuffer(shaderStorageFormat {
              member("cols", BufferMemberType.FLOAT, 4)
          }).also{
              it.put {
                  for (i in 0 until 4) {
                      write((1.0).toFloat())
                  }
              }
          }
  
          val bufferB = shaderStorageBuffer(shaderStorageFormat {
              member("parts", BufferMemberType.VECTOR2_FLOAT, 4)
          }).also{
              it.put {
                  for (i in 0 until 4){
                      write(Vector2.uniform(0.0, 1.0))
                  }
              }
          }
  
  
          //computeA.buffer("particlesBuffer", bufferB)
          computeA.buffer("particlesColor", bufferA)
          computeB.buffer("particlesBuffer", bufferB)
  
  
          extend {
              computeA.uniform("t", t)
              computeB.uniform("t", t)
  
              drawer.stroke = null
              drawer.fill = ColorRGBa.WHITE
              drawer.shadeStyle = shadeStyle {
                  fragmentTransform = """
                      vec2 uv = c_boundsPosition.xy;
                      
                      float c = 0.0;
                      for (int i = 0; i < b_parts.parts.length(); i++){
                          float l = length(b_parts.parts[i] - uv);
                          float d = 1.0 - smoothstep(0.01, 0.081, l);
                          c += d * b_cols.cols[i];
                      }
                      
                      x_fill.rgb = vec3(c);
                      
                  """.trimIndent()
  
                  buffer("cols", bufferA)
                  buffer("parts", bufferB)
  
              }
  
              computeA.execute(1, 1)
              computeB.execute(1, 1)
              drawer.rectangle(drawer.bounds)
              t += 0.01
          }
      }
  }

Here’s a little bit of context. I have two compute shaders, computeA and computeB. The shader computeA writes to a buffer called bufferA, which controls the brightness, and the shader computeB writes to a buffer called bufferB, which controls the position of each particle. These particles are then rendered in the fragment shader. Notice that computeA does not have a binding for bufferB, and computeB does not have a binding for bufferA.
Now, this is the strange thing that happens: if you run the code, you’ll see the particles moving but not “blinking”. On the other hand, if you uncomment the line

//computeA.buffer("particlesBuffer", bufferB)

the blinking will appear, even though, as said above, computeA has no binding to bufferB.
I suspect this has to do with some weird memory management behaviour, but it is nevertheless very confusing. For some added fun, just swap the lines

computeA.buffer("particlesColor", bufferA)
computeA.buffer("particlesBuffer", bufferB)

and you’ll get a different behaviour…
Let me know if you can replicate the issue.

Alessandro · November 14, 2022, 7:37pm

I think I now have an idea of what is going on here.
This is the code I showed at the 2nd OpenRNDR meetup in Berlin to demonstrate the issue:

  import org.intellij.lang.annotations.Language
  import org.openrndr.application
  import org.openrndr.color.ColorRGBa
  import org.openrndr.draw.*
  import org.openrndr.extra.noise.uniform
  import org.openrndr.math.Polar
  import org.openrndr.math.Vector2
  import org.openrndr.math.Vector4
  
  
  fun main() = application {
      configure {
          width = 1000
          height = 1000
      }

      program {
          var t = 0.0
          val nBalls = 4
          val computeWidth = 1
          val computeHeight = 1
          @Language("GLSL")
          val glslA = """
              #version 430
      
            layout(local_size_x = $nBalls, local_size_y = 1) in;
            
            uniform float t;
            
            struct Col {
               vec4 c;
            };
            
            layout( binding = 0 ) buffer particlesColor {
               Col col[];
           };
  
            void main(){
            
              const uint id = gl_GlobalInvocationID.x + gl_GlobalInvocationID.y * $nBalls;
              col[id].c = vec4(1.0, 1.0, 0.5 - 0.3 * cos(t * 1.2 * (1 + id)), 1.0);         
             
            }
          """.trimIndent()
  
          val computeA = ComputeShader.fromCode(glslA, "computeA")
  
          @Language("GLSL")
          val glslB = """
              #version 430
      
            layout(local_size_x = $nBalls, local_size_y = 1) in;
            
            uniform float t;
           
           struct Part {
               float[2] pos;
           };
           
           struct Prop {
               float[2] vel;
           };
           
           layout( binding = 14) buffer particlesBuffer {
               Part[4] positions;
           };
           
           layout( binding = 23 ) buffer particlesVels {
               Prop[4] velocities;
           };
           
            void main(){
              const uint id = gl_GlobalInvocationID.x + gl_GlobalInvocationID.y * $nBalls;
              Part p = positions[id];
              Prop pv = velocities[id];
              vec2 pos = vec2(p.pos[0], p.pos[1]);
              vec2 vel = vec2(pv.vel[0], pv.vel[1]);
              
              pos += vel;
             
             if (pos.x < 0.0){
               pos.x = 0.0;
               vel.x = -vel.x;
             };
             
             if (pos.x > 1.0){
               pos.x = 1.0;
               vel.x = -vel.x;
             };
             
             if (pos.y < 0.0){
               pos.y = 0.0;
               vel.y = -vel.y;
             };
             
              if (pos.y > 1.0){
               pos.y = 1.0;
               vel.y = -vel.y;
             };
             
             positions[id].pos[0] = pos.x;
             positions[id].pos[1] = pos.y;
             velocities[id].vel[0] = vel.x;
             velocities[id].vel[1] = vel.y;
              
            }
          """.trimIndent()
  
          val computeB = ComputeShader.fromCode(glslB, "computeB")
  
  
          val bufferA = shaderStorageBuffer(shaderStorageFormat {
              member("cols", BufferMemberType.VECTOR4_FLOAT, nBalls)
          }).also{
              it.put {
                  for (i in 0 until nBalls){
                      write(Vector4(1.0, 0.0, 0.0, 1.0))
                  }
              }
          }
  
          val bufferB = shaderStorageBuffer(shaderStorageFormat {
              member("pos", BufferMemberType.VECTOR2_FLOAT, nBalls)
          }).also{
              it.put {
                  for (i in 0 until nBalls) {
                      write(Vector2(0.5, 0.5))
                  }
              }
          }
  
          val bufferC = shaderStorageBuffer(shaderStorageFormat {
              member("vels", BufferMemberType.VECTOR2_FLOAT, nBalls)
          }).also{
              it.put {
                  for (i in 0 until nBalls) {
  
                      write(Vector2.fromPolar(Polar(Double.uniform(0.0, 360.0), 0.01)))
                  }
              }
          }

          computeA.buffer("particlesColor", bufferA)
          //computeB.buffer("particlesColor", bufferA)
          computeB.buffer("particlesBuffer", bufferB)
          computeB.buffer("particlesVels", bufferC)
  
          extend {
              computeA.uniform("t", t)
              computeB.uniform("t", t)
  
              drawer.stroke = null
              drawer.fill = ColorRGBa.WHITE
              drawer.shadeStyle = shadeStyle {
                  fragmentTransform = """
                      vec2 uv = c_boundsPosition.xy;
                      
                      vec4 c = vec4(0.0);
                      for (int i = 0; i < b_parts.pos.length(); i++){
                          float l = length(b_parts.pos[i] - uv);
                          
                          float d = 1.0 - smoothstep(0.1, 0.18, l);
                          c += d * b_cols.cols[i];
                      }
                      float d = smoothstep(0.1, 0.4, dot(c.xyz, vec3(1.0))/4);
                      vec3 col =  c.xyz;
                      x_fill.rgb = col;
                      
                  """.trimIndent()
  
                  buffer("cols", bufferA)
                  buffer("parts", bufferB)
  
              }
  
  
              computeA.execute(computeWidth, computeHeight)
              computeB.execute(computeWidth, computeHeight)
  
              drawer.rectangle(drawer.bounds)
              t += 0.01
          }
      }
  }

I think what is happening here is that each instance of ComputeShader keeps its own internal counter for SSBO binding points (here there’s a nice diagram about block indices and binding points), and at the same time overrides the binding points declared in the compute shaders’ layout qualifiers (probably due to this). More specifically, the following lines

computeA.buffer("particlesColor", bufferA)
//computeB.buffer("particlesColor", bufferA)
computeB.buffer("particlesBuffer", bufferB)
computeB.buffer("particlesVels", bufferC)

will do the following

computeA will set its internal binding counter to 0 and bind bufferA to binding point 0;
computeB will set its internal binding counter to 0 and bind bufferB to binding point 0;
computeB will increase its internal binding counter by 1 and bind bufferC to binding point 1.

Most probably the binding happens via glBindBufferBase with the GL_SHADER_STORAGE_BUFFER target, and the counter is not shared amongst the instances of ComputeShader. To test this, try substituting the above lines with the following

computeB.buffer("particlesColor", bufferA)
computeA.buffer("particlesColor", bufferA)
computeB.buffer("particlesBuffer", bufferB)
computeB.buffer("particlesVels", bufferC)

According to my educated guess above, these lines will do the following

computeB will set its internal binding counter to 0 and bind bufferA to binding point 0;
computeA will set its internal binding counter to 0 and bind bufferA to binding point 0;
computeB will increase its internal binding counter by 1 and bind bufferB to binding point 1;
computeB will increase its internal binding counter by 1 and bind bufferC to binding point 2.

and indeed, if you run the code you will see that you get the expected behavior.
If you get a chance to run the code, or have a counter-example to my reverse engineering, let me know.
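
To make the guess above concrete, here is a minimal plain-Kotlin model of it. It does not touch OPENRNDR or OpenGL at all: FakeGl, FakeComputeShader, seenBy and the two helper functions are hypothetical names invented for this sketch, and the per-instance counter is my assumption, not the confirmed implementation.

```kotlin
// Toy model of the hypothesis, NOT the real OPENRNDR implementation.
// Hypothetical: FakeGl stands in for the global GL_SHADER_STORAGE_BUFFER
// binding points, FakeComputeShader for one ComputeShader instance.
class FakeGl {
    // binding point -> name of the buffer currently bound there
    val bindingPoints = mutableMapOf<Int, String>()
}

class FakeComputeShader(private val gl: FakeGl) {
    // Assumption: every instance hands out binding points from its own counter.
    private var nextBinding = 0
    private val blockBindings = mutableMapOf<String, Int>()

    fun buffer(blockName: String, bufferName: String) {
        // A block seen for the first time gets the next free index of THIS instance.
        val binding = blockBindings.getOrPut(blockName) { nextBinding++ }
        // Plays the role of glBindBufferBase: overwrites the shared binding point.
        gl.bindingPoints[binding] = bufferName
    }

    // Which buffer the given block would actually read or write at dispatch time.
    fun seenBy(blockName: String): String? =
        blockBindings[blockName]?.let { gl.bindingPoints[it] }
}

// The binding order from the failing program.
fun originalOrder(): String? {
    val gl = FakeGl()
    val computeA = FakeComputeShader(gl)
    val computeB = FakeComputeShader(gl)
    computeA.buffer("particlesColor", "bufferA")
    computeB.buffer("particlesBuffer", "bufferB")
    computeB.buffer("particlesVels", "bufferC")
    return computeA.seenBy("particlesColor")
}

// The binding order that produced the expected behaviour.
fun workaroundOrder(): String? {
    val gl = FakeGl()
    val computeA = FakeComputeShader(gl)
    val computeB = FakeComputeShader(gl)
    computeB.buffer("particlesColor", "bufferA")
    computeA.buffer("particlesColor", "bufferA")
    computeB.buffer("particlesBuffer", "bufferB")
    computeB.buffer("particlesVels", "bufferC")
    return computeA.seenBy("particlesColor")
}
```

Under these assumptions originalOrder() returns "bufferB" (the colour block of computeA ends up looking at the position buffer), while workaroundOrder() returns "bufferA". Whether the real implementation behaves like this is exactly what still needs checking.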

abe · November 15, 2022, 1:25pm

In my programs I always call .buffer before .execute and that seems to produce the expected result.

          extend {
              computeA.buffer("particlesColor", bufferA)
              computeA.uniform("t", t)
              computeA.execute(computeWidth, computeHeight)

              computeB.buffer("particlesBuffer", bufferB)
              computeB.buffer("particlesVels", bufferC)
              computeB.uniform("t", t)
              computeB.execute(computeWidth, computeHeight)
              ...
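
One possible reading of why this works, sketched as a toy model in plain Kotlin (no OPENRNDR or OpenGL involved): if the SSBO binding points are shared state that any shader instance can overwrite, then re-binding right before each dispatch guarantees they hold the right buffers at the moment the shader runs. ToyShader, its constructor parameters and both helper functions are hypothetical, and so is the assumption that every shader numbers its blocks starting from 0.

```kotlin
// Toy model, NOT OPENRNDR code. Assumptions: binding points are shared
// global state, and every shader numbers its own blocks starting from 0.
class ToyShader(
    private val bindingPoints: MutableMap<Int, String>, // shared between shaders
    private val blocks: List<String>                    // slot = index in this list
) {
    // Bind a buffer to the slot that belongs to the given block.
    fun buffer(block: String, buffer: String) {
        val slot = blocks.indexOf(block)
        require(slot >= 0) { "unknown block $block" }
        bindingPoints[slot] = buffer
    }

    // "Dispatch": report which buffer each block sees at this very moment.
    fun execute(): Map<String, String?> =
        blocks.withIndex().associate { (slot, block) -> block to bindingPoints[slot] }
}

// Bind everything once, then dispatch: the last binder of slot 0 wins.
fun bindOnceAtSetup(): Map<String, String?> {
    val shared = mutableMapOf<Int, String>()
    val computeA = ToyShader(shared, listOf("particlesColor"))
    val computeB = ToyShader(shared, listOf("particlesBuffer", "particlesVels"))
    computeA.buffer("particlesColor", "bufferA")
    computeB.buffer("particlesBuffer", "bufferB")
    computeB.buffer("particlesVels", "bufferC")
    return computeA.execute()
}

// Re-bind immediately before every dispatch, as in the snippet above.
fun bindBeforeEachExecute(): Map<String, String?> {
    val shared = mutableMapOf<Int, String>()
    val computeA = ToyShader(shared, listOf("particlesColor"))
    val computeB = ToyShader(shared, listOf("particlesBuffer", "particlesVels"))
    computeA.buffer("particlesColor", "bufferA")
    val seenByA = computeA.execute()
    computeB.buffer("particlesBuffer", "bufferB")
    computeB.buffer("particlesVels", "bufferC")
    val seenByB = computeB.execute()
    return seenByA + seenByB
}
```

In this model bindOnceAtSetup() leaves particlesColor pointing at bufferB, while bindBeforeEachExecute() gives every block the buffer it was meant to have.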

Alessandro · November 15, 2022, 3:16pm

Oooh, great! Now I’m curious to know whether .execute somewhere sets the binding index starting point for the next compute shader. In any case, could this go in the documentation somewhere? Probably even a pointer to this post would do.

abe · November 15, 2022, 3:54pm

This is the file to study: openrndr/ComputeShaderGL43.kt at master · openrndr/openrndr · GitHub

Yes, the compute shader page in the guide needs an update. It doesn’t even mention SSBOs.

Maybe in our next (first) Jam in Berlin I could show how to update the guide.

Alessandro · November 15, 2022, 4:52pm

Yeah, that’s a good idea for the first Jam!
