Bugzilla – Bug 824
Last modified: 2015-03-04 17:06:32 UTC
using "acos" function near 1.0 provides a terrible inaccuracy in result angle. Small inacuracy of dot product or quaternion length produce awesome angle error.
I recommend to use atan2 function having good accuracy in whole range
fix can look like
const QuaternionBase<OtherDerived> d = *this * other.conjugate();
return static_cast<Scalar>( Scalar(2) * atan2( d.vec().norm(), d.w() ));
the "w()" must be positive for positive angle
atan2( d.vec().norm(), fabs(d.w()) ));
Yes, that's certainly more stable and I guess the extra cost is justifiable.
However, it should be:
abs(d.w()) instead of fabs
I'll submit that if nobody objects.
Because of the quaternion product it is more than 5 times slower, though I admit its superior accuracy.
We can use branch, and apply slower algorithm near 1.0 and -1.0, or provide both methods separately.
(In reply to comment #3)
> Because of the quaternion product it is more than 5 times slower, though I
> admit its superior accuracy.
I admit that I did not benchmark this -- I wrongly assumed that acos or atan would be the significant cost producers here.
Then the question is, if there are people for which the speed of this method is more important than the accuracy.
I'll make this dependent on bug 560 so maybe we can offer different paths for unit and non-unit quaternions.
Unfortunately, even for unit quaternions we still have numerical inaccuracies for quaternions that are close to each other.
(In reply to comment #4)
> We can use branch, and apply slower algorithm near 1.0 and -1.0, or provide
> both methods separately.
I'm not really in favor of adding a branch for this.
I think this method is likely to be applied to quaternions close to each other, so the slower branch would be applied most of the time. We'd also introduce a discontinuity for values near the branch.
I observed a significant decrease of accuracy only when the quaternions are very very close. In my benchmark I generated pairs of quaternions from a random angle-axis and a slightly perturbed one as follow:
float a1 = internal::random<float>(-M_PI, M_PI);
Vector3f v1 = Vector3f::Random().normalized();
float a2 = a1+internal::random<float>(-1e-6,1e6);
Vector3f v2 = (v1+1e-6*Vector3f::Random()).normalized();
Then I compute the angular distance using float and the two variants and I compare them to the result of the stable version using double precision. For 100000 runs and using the absolute difference as the error measure, I get:
acos-based ; atan2-based
% >1e-5: 0.53% ; 0
% >1e-6: 5.3% ; 0
max : 0.000716858 ; 3.66889e-07
sum : 0.0565818 ; 0.00610966
"% >1e-X" is the percentage of runs with an error greater than 1e-X. So that's not that bad.
A compromise would be to make angularDistance accurate using atan2, and offer a "cosineDistance" function which is enough when the distance is used for comparison to some thresholds, and if someone want speed, he has to take his responsibilities and simply do: std::acos(q1.cosineDistance(q2)).
For the record, the new function would return:
return min(Scalar(1), Scalar(2) * numext::abs2(q1.dot(q2)) - Scalar(1))
If the precision-loss is not that bad either, how about implementing the more precise function under a different name?
The precision loss does somehow depend on the form of error, however.
E.g., if you have two near-unit-quaternions whose product of norms is y with angular distance of x, the error of the current method is approximately
Hi all, sorry for pause.
Gael, there is a problem with your benchmark, cause it uses just normalized quaternions.
Often , there is a quaternions with small "magnitude" drift from 1.0. For example , after quaternion multiplication, lerping and so on. All this inaccuracy in quaternion length, will produce significant error in angle calculation. So i propose take into account not only acos() inaccuracy but quaternion length too.
And we return back to the discussion about unit quaternions.
(In reply to Michael Norel from comment #10)
> Often , there is a quaternions with small "magnitude" drift from 1.0.
That's exactly what I wanted to illustrate with y=norm(q1)*norm(q2) in comment 9.
After this changeset: https://bitbucket.org/eigen/eigen/commits/4984501c3db8/, I observe a factor 2x only between the two versions.
If I enforce a re-normalization of the input quaternion in the acos based version, then both versions are as fast (or as slow) to each other.
After-all, I'm not against changing for the accurate version by default, and, if needed, add a fast-version under a new name.
I would prefer accuracy over speed here.
And if the atan and acos versions are equally fast/slow, then your benchmark in comment 7 suggests that the atan2 path is preferable.
I'm also ok with backporting to 3.2
We could think about specializing the implementation of q1*q2.conjugate() and q1.conjugate()*q2, but I guess this is hardly worth the effort, now that conjugate() is vectorized.
Before vectorizing conjugate() I tried a fully vectorized version of angularAngle which is only marginally faster if any, so vectorizing conjugate product seems to be pointless.
For the record, here it is:
Packet4f a = q1.coeffs().packet<Aligned>(0);
Packet4f b = q2.coeffs().packet<Aligned>(0);
Packet4f c = padd(pmul( vec4f_swizzle1(a,3,3,3,3), vec4f_swizzle1(b,3,0,1,2)),
pmul( vec4f_swizzle1(a,1,1,2,0), vec4f_swizzle1(b,1,2,0,1)));
__m128 mask = _mm_set_ss(-0.f);
c = _mm_xor_ps(c, mask);
__m128 d = psub(padd(pmul(vec4f_swizzle1(a,2,0,1,1),vec4f_swizzle1(b,2,3,3,0)),
c = _mm_andnot_ps(mask, d);
d = pmul(d,d);
d = _mm_sqrt_ss(_mm_add_ss(
return 2. * std::atan2( pfirst(d), pfirst(c) );