Some weeks ago I decided to fix once and for all this bloody hdrr. As it requires the framebuffer average log luminance, I decided to compute it be generating a pixel accurate mipmap. So for a 320×240 framebuffer we will have the following downsampling steps : (4,4) (4,4) (5,5) (4,3).

Don’t ask me why but instead of working on a way to decompose a number, I decided that I needed to find the greatest common divisor of width and height! 10 seconds later thanks to the almighty web search engine I started reading the Wikipedia entry about Binary GCD. And after some interweb jumps I ended on this article. The Steinâ€™s algorithm looked cool.

If you look at Sputsoft code, you’ll realize that the **shift_to_uneven** function removes trailing zeros. This can be optimized using GCC builtins. The builtins we need is **__builtin_ctz** that counts trailing zeros.

We can then replace **shift_to_uneven** by:

u >>= __builtin_ctz(u);

And the GCD function now looks like this:

uint32_t gcd( uint32_t c, uint32_t d )

{

uint32_t u,v,s;

u = __builtin_ctz(c);

v = __builtin_ctz(d);

s = (u < v) ? u : v;
u = c >> u;

v = d >> v;

while(u != v)

{

if(u > v)

{

u -= v;

u >>= __builtin_ctz(u);

}

else

{

v -= u;

v >>= __builtin_ctz(v);

}

}

return u << s;
}[/sourcecode]
Tadam!

You have an infinite loop for gcd(x, 0), x != 0.

Yep, I forgot to add a simple sanity check.

And to make things worse __builtin_ctz(d) result is undefined for d = 0 (source).