Currently SSE module has a packet sqrt function only for Packet4f, not for Packet2d, so any double-type coefficient-wise expression containing sqrt has to be calculated using single-data commands. I suggest you add template<> EIGEN_STRONG_INLINE Packet2d psqrt<Packet2d>(const Packet2d& a) { return _mm_sqrt_pd(a); } so that sqrt can be vectorized. Also, it is my understanding that the algorithm used to calculate sqrt in the current implementation of psqrt<Packet4f> (via inverse sqrt) has reduced accuracy, so maybe use it only under EIGEN_FAST_MATH flag, letting the user use the full accuracy of _mm_sqrt_ps otherwise?
I fully agree. I guess the main problem with the fast implementation are wrong results for negative values (should be NaN), +inf (should be inf) and denormalized values (rsqrt treats them as 0.0f, resulting in inf which leads to a NaN after the refinement).
Related discussion on the ML (old!): http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2009/03/msg00114.html And I agree with Christoph, the main pb is not on the accuracy but on the handling of negative, inf and denorms.
https://bitbucket.org/eigen/eigen/commits/97d9a7772106/ Changeset: 97d9a7772106 User: ggael Date: 2013-08-19 16:02:27 Summary: Fix bug 642: add vectorization of sqrt for doubles, and make sqrt really safe if EIGEN_FAST_MATH is disabled
-- GitLab Migration Automatic Message -- This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/642.