This bugzilla service is closed. All entries have been migrated to https://gitlab.com/libeigen/eigen

Bug 1318

Summary: More robust quaternion from matrix
Product: Eigen
Component: Geometry
Status: NEW
Severity: Accuracy Problem
Priority: Normal
Version: 3.4 (development)
Hardware: All
OS: All
Reporter: Michael Norel <minorlogic>
Assignee: Nobody <eigen.nobody>
CC: chtz, gael.guennebaud, hauke.heibel, jacob.benoit.1
Whiteboard:
Bug Depends on: 458
Bug Blocks: 814

Description Michael Norel 2016-09-29 15:27:36 UTC
The conversion code assumes a strictly orthonormal 3x3 matrix as input. But in real data the matrix can have small errors, and these errors produce a non-unit quaternion.

To demonstrate how important it is to keep the quaternion normalized, we can use this code:

#include <iostream>
#include <Eigen/Geometry>

int main()
{
    Eigen::Quaterniond qc = Eigen::Quaterniond(1.1, 2.2, 3.3, 4.4).normalized();
    for (int i = 0; i < 1000; i++)
    {
        qc *= qc; // repeated squaring amplifies any deviation from unit norm
        // round-trip through the (not exactly orthonormal) rotation matrix
        qc = Eigen::Quaterniond(qc.toRotationMatrix() * qc.toRotationMatrix());

        std::cout << "1.0 - qc.squaredNorm(): " << 1.0 - qc.squaredNorm() << std::endl;
    }
}

Using VC12, I get an INF result after only 1000 iterations.


I propose using normalization instead of the analytic scaling of the quaternion in this conversion. It can be slower, but it produces a unit quaternion. The performance depends on the performance of "normalize" and its vectorization.
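For reference, the current analytic scaling, paraphrased below as a standalone helper (the function name is mine, not Eigen's), divides the coefficients by 2*sqrt(trace + 1), which yields a unit quaternion only when the input is exactly orthonormal:

#include <cmath>
#include <Eigen/Geometry>

// Paraphrase of the trace-positive branch of the current conversion:
// coefficients are scaled analytically by 0.5/sqrt(trace + 1), so the
// result is unit-norm only if mat is exactly orthonormal.
inline Eigen::Quaterniond fromMatrixTracePositive(const Eigen::Matrix3d& mat)
{
    Eigen::Quaterniond q;
    double t = std::sqrt(mat.trace() + 1.0);
    q.w() = 0.5 * t;
    t = 0.5 / t;
    q.x() = (mat.coeff(2,1) - mat.coeff(1,2)) * t;
    q.y() = (mat.coeff(0,2) - mat.coeff(2,0)) * t;
    q.z() = (mat.coeff(1,0) - mat.coeff(0,1)) * t;
    return q;
}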

With normalization, it could look like this in Eigen:
{
      ...
      q.w() = t + Scalar(1.0);
      q.x() = mat.coeff(2,1) - mat.coeff(1,2);
      q.y() = mat.coeff(0,2) - mat.coeff(2,0);
      q.z() = mat.coeff(1,0) - mat.coeff(0,1);

      // same for the other cases
      ...

      q.normalize(); // normalize on exit
}
I can provide a full version of the function if required.
Comment 1 Gael Guennebaud 2016-09-30 07:31:38 UTC
hm, with clang or gcc, after 10000 iterations, I get:

1.0 - qc.squaredNorm(): -4.44089e-16
1.0 - qc.squaredNorm(): 1.11022e-16
1.0 - qc.squaredNorm(): -2.22045e-16
1.0 - qc.squaredNorm(): -2.22045e-16
1.0 - qc.squaredNorm(): 0
1.0 - qc.squaredNorm(): -2.22045e-16
1.0 - qc.squaredNorm(): 1.11022e-16
1.0 - qc.squaredNorm(): 0
1.0 - qc.squaredNorm(): 1.11022e-16
1.0 - qc.squaredNorm(): 0
1.0 - qc.squaredNorm(): 0
1.0 - qc.squaredNorm(): 0
1.0 - qc.squaredNorm(): 2.22045e-16
1.0 - qc.squaredNorm(): 0
1.0 - qc.squaredNorm(): 1.11022e-16
1.0 - qc.squaredNorm(): 1.11022e-16
1.0 - qc.squaredNorm(): 0
Comment 2 Gael Guennebaud 2016-09-30 08:04:04 UTC
Arf, sorry for the noise, I still had the fix for bug 458 applied when testing your code snippet (i.e., the implicit re-normalization in Quaternion::toRotationMatrix()).

Actually, simply doing qc *= qc; is enough to get a significant accuracy loss, as it quickly converges to qc = 0.
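(A minimal illustration of that drift, added here for clarity: the quaternion product satisfies |q1*q2| = |q1||q2|, so repeated squaring compounds any rounding error in the norm double-exponentially, and a norm slightly below 1 collapses to 0 within a few dozen iterations.)

#include <iostream>
#include <Eigen/Geometry>

int main()
{
    Eigen::Quaterniond qc = Eigen::Quaterniond(1.1, 2.2, 3.3, 4.4).normalized();
    for (int i = 0; i < 60; i++)
    {
        qc *= qc; // norm roughly squares each step: (1 - eps)^(2^i) -> 0
        std::cout << i << ": qc.norm() = " << qc.norm() << std::endl;
    }
}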

It is still unclear to me how to address these issues without a huge impact on performance when the unitarity constraints of quaternions and matrices are already handled by the user.

My rationale is that if a method can be generalized with a reasonable impact on performance (say 20%), then let's go for it. Otherwise, e.g., for Quaternion::toRotationMatrix() as in bug 458, we could add some variants, optional parameters, or decorators to indicate whether we guarantee the input is unitary or not. For instance, if we default to slow but safe:

Matrix3 m;
Quaternion q;

q = m; // slow conversion
q = tagged<unitary>(m); // fast conversion
AngleAxis aa(theta, tagged<unit>(vec)); // fast
AngleAxis aa(theta, vec); // shortcut for AngleAxis(theta, tagged<unit>(vec.normalized()))
m = q.toRotationMatrix();
m = tagged<unit>(q).toRotationMatrix();

This is a bit verbose and might look cumbersome, but it has the advantage of working with operator=, and it is easily extensible and generalizable.
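(To make the idea concrete, here is a purely hypothetical sketch of such a tag wrapper; none of these names exist in Eigen:)

#include <Eigen/Geometry>

// Hypothetical sketch only: Tagged, Guarantee, and tagged<> are not Eigen APIs.
enum class Guarantee { unitary };

template<Guarantee G, typename Xpr>
struct Tagged { const Xpr& xpr; };

template<Guarantee G, typename Xpr>
Tagged<G, Xpr> tagged(const Xpr& xpr) { return Tagged<G, Xpr>{xpr}; }

// Overload resolution would then pick the fast path only when the user
// vouches for unitarity, e.g.:
//   q = m;                              // safe: converts, then re-normalizes
//   q = tagged<Guarantee::unitary>(m);  // fast: trusts the input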
Comment 3 Michael Norel 2016-09-30 08:22:39 UTC
(In reply to Gael Guennebaud from comment #2)
> It is still unclear to me how to address these issues without a huge impact
> on performance when the unitarity constraints of quaternions and matrices
> are already handled by the user.

Here we can have a different situation than with "toRotationMatrix"; I expect a slowdown lower than 2x. Sorry, I can't run benchmarks with different compilers this week. But my code introduces a few more operations that can be vectorized (normalize()).

If you can check its performance, I'll be happy. It should look like:

template<typename Other>
struct quaternionbase_assign_impl<Other,3,3>
{
  typedef typename Other::Scalar Scalar;
  typedef DenseIndex Index;
  template<class Derived> static inline void run(QuaternionBase<Derived>& q, const Other& mat)
  {
    using std::sqrt;
    // This algorithm comes from  "Quaternion Calculus and Fast Animation",
    // Ken Shoemake, 1987 SIGGRAPH course notes
    Scalar t = mat.trace();
    if (t > Scalar(0))
    {
      q.w() = t + Scalar(1.0);
      q.x() = mat.coeff(2,1) - mat.coeff(1,2);
      q.y() = mat.coeff(0,2) - mat.coeff(2,0);
      q.z() = mat.coeff(1,0) - mat.coeff(0,1);
    }
    else
    {
      DenseIndex i = 0;
      if (mat.coeff(1,1) > mat.coeff(0,0))
        i = 1;
      if (mat.coeff(2,2) > mat.coeff(i,i))
        i = 2;
      DenseIndex j = (i+1)%3;
      DenseIndex k = (j+1)%3;

      q.coeffs().coeffRef(i) = mat.coeff(i,i)-mat.coeff(j,j)-mat.coeff(k,k) + Scalar(1.0);
      q.w() = mat.coeff(k,j)-mat.coeff(j,k);
      q.coeffs().coeffRef(j) = mat.coeff(j,i)+mat.coeff(i,j);
      q.coeffs().coeffRef(k) = mat.coeff(k,i)+mat.coeff(i,k);
    }

    q.normalize();
  }
};
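For illustration (not part of the original patch), a small usage sketch: with the final normalize() in place, converting a slightly perturbed rotation matrix would still yield an exactly unit-norm quaternion:

#include <iostream>
#include <Eigen/Geometry>

int main()
{
    // Build a rotation matrix and perturb it slightly, as real data would be.
    Eigen::Matrix3d R = Eigen::Quaterniond(1.1, 2.2, 3.3, 4.4)
                            .normalized().toRotationMatrix();
    R += 1e-6 * Eigen::Matrix3d::Random();

    Eigen::Quaterniond q(R); // with the proposed run(), q is normalized on exit
    std::cout << "1.0 - q.squaredNorm(): " << 1.0 - q.squaredNorm() << std::endl;
}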
Comment 4 Michael Norel 2016-09-30 08:26:11 UTC
P.S. I wonder if the trick with indices is really faster than explicitly hardcoding all 4 cases. I mean, the index magic can break the CPU branch prediction pipeline.

I mean some kind of explicit code like this:
.... // cut from public sources; the matrix may be transposed relative to Eigen's convention
if (tr > 0.0f) {
    set( m[1][2] - m[2][1], m[2][0] - m[0][2], m[0][1] - m[1][0], tr + 1.0f );
} else if ( (m[0][0] > m[1][1]) && (m[0][0] > m[2][2]) ) {
    set( 1.0f + m[0][0] - m[1][1] - m[2][2], m[1][0] + m[0][1],
         m[2][0] + m[0][2], m[1][2] - m[2][1] );
} else if ( m[1][1] > m[2][2] ) {
    set( m[1][0] + m[0][1], 1.0f + m[1][1] - m[0][0] - m[2][2],
         m[2][1] + m[1][2], m[2][0] - m[0][2] );
} else {
    set( m[2][0] + m[0][2], m[2][1] + m[1][2],
         1.0f + m[2][2] - m[0][0] - m[1][1], m[0][1] - m[1][0] );
}
...

Could that be faster?
Comment 5 Gael Guennebaud 2016-09-30 13:39:12 UTC
It's even worse, more than 3x slower:

Current:
float:  0.000559502s
double: 0.00140823s

Proposal:
float:  0.0026088s
double: 0.00404474s

There is no measurable difference between the index trick and if/then/else.
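
(For reference, a rough sketch of how such a micro-benchmark could be set up; this harness is hypothetical, not the one that produced the numbers above:)

#include <chrono>
#include <iostream>
#include <vector>
#include <Eigen/Geometry>

int main()
{
    // Pre-generate rotation matrices so only the conversion is timed.
    std::vector<Eigen::Matrix3d> mats(10000);
    for (Eigen::Matrix3d& m : mats)
        m = Eigen::Quaterniond::UnitRandom().toRotationMatrix();

    Eigen::Quaterniond q;
    double sink = 0.0; // keep the loop from being optimized away
    auto t0 = std::chrono::high_resolution_clock::now();
    for (int rep = 0; rep < 100; ++rep)
        for (const Eigen::Matrix3d& m : mats)
        {
            q = m; // matrix-to-quaternion conversion under test
            sink += q.w();
        }
    auto t1 = std::chrono::high_resolution_clock::now();
    std::cout << std::chrono::duration<double>(t1 - t0).count()
              << "s (checksum " << sink << ")" << std::endl;
}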
Comment 6 Gael Guennebaud 2016-09-30 13:44:36 UTC
and if you're puzzled, here is the generated ASM for the q.normalize() part:

	movaps	(%rdi), %xmm0
	movaps	%xmm0, %xmm1
	mulps	%xmm1, %xmm1
	haddps	%xmm1, %xmm1
	haddps	%xmm1, %xmm1
	xorps	%xmm2, %xmm2
	ucomiss	%xmm2, %xmm1
	jbe	LBB1_7
	xorps	%xmm2, %xmm2
	movss	%xmm1, %xmm2            ## xmm2 = xmm1[0],xmm2[1,2,3]
	sqrtss	%xmm2, %xmm2
	shufps	$0, %xmm2, %xmm2        ## xmm2 = xmm2[0,0,0,0]
	divps	%xmm2, %xmm0
	movaps	%xmm0, (%rdi)
LBB1_7:
Comment 7 Michael Norel 2016-09-30 15:21:44 UTC
Sorry you wasted time on this.
Again a big surprise: it's the same situation as with "toRotationMatrix".
Comment 8 Nobody 2019-12-04 16:23:07 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/1318.