Some weeks ago I decided to fix once and for all this bloody hdrr. As it requires the framebuffer average log luminance, I decided to compute it be generating a pixel accurate mipmap. So for a 320×240 framebuffer we will have the following downsampling steps : (4,4) (4,4) (5,5) (4,3).
Don’t ask me why but instead of working on a way to decompose a number, I decided that I needed to find the greatest common divisor of width and height! 10 seconds later thanks to the almighty web search engine I started reading the Wikipedia entry about Binary GCD. And after some interweb jumps I ended on this article. The Stein’s algorithm looked cool.
If you look at Sputsoft code, you’ll realize that the shift_to_uneven function removes trailing zeros. This can be optimized using GCC builtins. The builtins we need is __builtin_ctz that counts trailing zeros.
We can then replace shift_to_uneven by:
u >>= __builtin_ctz(u);
And the GCD function now looks like this:
uint32_t gcd( uint32_t c, uint32_t d )
{
uint32_t u,v,s;
u = __builtin_ctz(c);
v = __builtin_ctz(d);
s = (u < v) ? u : v;
u = c >> u;
v = d >> v;
while(u != v)
{
if(u > v)
{
u -= v;
u >>= __builtin_ctz(u);
}
else
{
v -= u;
v >>= __builtin_ctz(v);
}
}
return u << s;
}[/sourcecode]
Tadam!
You have an infinite loop for gcd(x, 0), x != 0.
Yep, I forgot to add a simple sanity check.
And to make things worse __builtin_ctz(d) result is undefined for d = 0 (source).