Simple GPU Path Tracing, Part 10 : Little Optimizations


So I tried to render some shots of the famous Sponza scene with our renderer, and it turned out to be very slow... So in this part, I just want to refactor some bits and bobs to speed up the path tracer a little.

It's going to be a set of quite random changes that improve the performance and memory footprint of our app.

Struct packing

For now, our BVH structs are quite big, and there are some improvements we can make there, especially on the triangle and bvhNode structs, because they're the ones that will take the most space: a scene contains many, many triangles and BLAS nodes.
 
struct triangle
{
    // Each float4 packs a vertex position (xyz) with that vertex's UV.x in w
    glm::vec4 PositionUvX0;
    glm::vec4 PositionUvX1;
    glm::vec4 PositionUvX2;
   
    // Each float4 packs a vertex normal (xyz) with that vertex's UV.y in w
    glm::vec4 NormalUvY0;
    glm::vec4 NormalUvY1;
    glm::vec4 NormalUvY2;
   
    glm::vec4 Tangent0;
    glm::vec4 Tangent1;  
    glm::vec4 Tangent2;
   
    glm::vec3 Centroid;
    float padding3;
};

struct bvhNode
{
    glm::vec3 AABBMin;
    float LeftChildOrFirst; // Left child index if interior, first triangle index if leaf
    glm::vec3 AABBMax;
    float TriangleCount;
    bool IsLeaf();
};
 
I've now packed all the triangle data into the triangle struct, getting rid of the triangleExtraData struct.
I also packed the LeftChildOrFirst and TriangleCount fields of bvhNode as float values, reducing its size by 16 bytes.
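As a sanity check on this layout, here's a minimal, self-contained sketch of how the packed fields get read back. Plain structs stand in for the glm types, and the `TriangleCount == 0` convention for interior nodes (which gives `IsLeaf()` its body) is my assumption, following the common BVH layout these field names suggest:

```cpp
#include <cassert>

// Self-contained stand-ins for the glm types, just for this sketch.
struct vec2 { float x, y; };
struct vec3 { float x, y, z; };
struct vec4 { float x, y, z, w; };

// With the packing above, the UV of vertex 0 is split across two float4s:
// u lives in PositionUvX0.w, v in NormalUvY0.w.
vec2 GetUV0(const vec4 &PositionUvX0, const vec4 &NormalUvY0)
{
    return vec2{ PositionUvX0.w, NormalUvY0.w };
}

struct bvhNode
{
    vec3 AABBMin;
    float LeftChildOrFirst; // first-triangle index if leaf, left-child index otherwise
    vec3 AABBMax;
    float TriangleCount;    // assumed to be 0 for interior nodes
    bool IsLeaf() const { return TriangleCount > 0; }
};
```

Since both fields are stored as floats, the consumer casts them back to integers when indexing the child or triangle arrays.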

This change didn't impact performance, but it reduced GPU memory usage a bit.

Bsdf & Light Sampling

At the moment, we decide whether we sample a light or the BSDF lobe based on a random number:
if(RandomUnilateral(Isect.RandomState) < 0.5f)
The aim is to pick one or the other with a probability of 0.5. What we could do instead is alternate between the two every other frame, which gives the same 50/50 split on average, without having to generate a random number.
There's a big performance gain behind this change: now all the rays generated in an area sample either the BSDF or the light within the same frame, so the next bounce rays will all likely point in roughly the same direction. This makes the shader execution more coherent, and therefore faster.
I'm not sure it's 100% correct to do this, but I didn't see a difference in the render, and it gave an almost 2x speed-up, so I'll take it!
Here's the new code:
if(GET_ATTR(Parameters, CurrentSample) % 2 == 0)
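The alternation can be isolated into a tiny predicate; `CurrentSample` here stands in for the post's `GET_ATTR(Parameters, CurrentSample)` counter, and the function name is illustrative:

```cpp
#include <cassert>

// Even samples do light sampling, odd samples do BSDF sampling.
// Averaged over frames, each strategy is still taken half the time,
// which is why this is expected to match the random 0.5 split.
bool SampleLightThisFrame(unsigned int CurrentSample)
{
    return CurrentSample % 2 == 0;
}
```

The win comes from every ray in a warp taking the same branch in a given frame, instead of diverging on a per-ray random number.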

I also tried to optimize the BSDF and light sampling routines so that the direction sampling also returns the pdf of the generated direction, but I didn't see a substantial change in performance, so I kept it the way it was before. Maybe it's totally wrong to do it that way, I don't know.


Thoughts for future optimizations

There are other things that could speed up our path tracer, which I may implement in the future if I find the time:
  • If the camera stays static, we don't need to re-generate primary rays every frame. We could just store the first ray-scene hit information for each pixel, and use that as a base for subsequent frames. This would spare quite a lot of scene intersections, but would require more memory (we would need to store another float4 image)
  • Wavefront path tracing : Instead of having a bounce recursion inside our GPU kernel, we could have multiple kernel invocations, each one processing one set of scene bounces and storing its results into GPU buffers. That's how a lot of GPU path tracers are implemented (pbrt-v4 and Blender's Cycles, for example), so there must be a reason for it! However, it would become quite a complex system, and would no longer be the "Simple" path tracer our series title claims. (Here are some links for reference)
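For the first idea, here's a tiny sketch of what the per-pixel cache logic might look like. Everything here is hypothetical; the real renderer would store this in a float4 image alongside the accumulation buffer:

```cpp
#include <cassert>

// Hypothetical cached primary hit: position packed with the ray t, which
// would fit in one float4 per pixel. A negative t means "nothing cached".
struct primaryHit
{
    float HitPosition[3];
    float HitDistance;
};

struct primaryHitCache
{
    bool CameraMoved = true;

    // The cached hit is only valid if the camera hasn't moved since it was
    // stored and the pixel actually hit the scene.
    bool CanReuse(const primaryHit &Cached) const
    {
        return !CameraMoved && Cached.HitDistance >= 0.0f;
    }
};
```

When `CanReuse()` returns true, the kernel would skip the primary ray-scene intersection entirely and start shading from the stored hit, spending the saved time on the bounce rays instead.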


Next post : Simple GPU Path Tracing, Part 11 : Multiple Importance Sampling 
