Bug 642 - Add vectorization of sqrt for doubles
Add vectorization of sqrt for doubles
Status: RESOLVED FIXED
Product: Eigen
Classification: Unclassified
Component: Core - vectorization
3.2
x86 - SSE All
: Normal enhancement
Assigned To: Nobody
:
Depends on:
Blocks: 3.3
  Show dependency treegraph
 
Reported: 2013-08-08 20:34 UTC by panda-34
Modified: 2013-08-19 16:01 UTC (History)
3 users (show)



Attachments

Description panda-34 2013-08-08 20:34:26 UTC
Currently SSE module has a packet sqrt function only for Packet4f, not for Packet2d, so any double-type coefficient-wise expression containing sqrt has to be calculated using single-data commands. I suggest you add

template<> EIGEN_STRONG_INLINE Packet2d psqrt<Packet2d>(const Packet2d& a) { return _mm_sqrt_pd(a); }

so that sqrt can be vectorized. Also, it is my understanding that the algorithm used to calculate sqrt in the current implementation of psqrt<Packet4f> (via inverse sqrt) has reduced accuracy, so maybe use it only under EIGEN_FAST_MATH flag, letting the user use the full accuracy of _mm_sqrt_ps otherwise?
Comment 1 Christoph Hertzberg 2013-08-13 14:41:45 UTC
I fully agree. I guess the main problem with the fast implementation are wrong results for negative values (should be NaN), +inf (should be inf) and denormalized values (rsqrt treats them as 0.0f, resulting in inf which leads to a NaN after the refinement).
Comment 2 Gael Guennebaud 2013-08-19 15:43:39 UTC
Related discussion on the ML (old!): http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2009/03/msg00114.html

And I agree with Christoph, the main pb is not on the accuracy but on the handling of negative, inf and denorms.
Comment 3 Gael Guennebaud 2013-08-19 16:01:59 UTC
https://bitbucket.org/eigen/eigen/commits/97d9a7772106/
Changeset:   97d9a7772106
User:        ggael
Date:        2013-08-19 16:02:27
Summary:     Fix bug 642: add vectorization of sqrt for doubles, and make sqrt really safe if EIGEN_FAST_MATH is disabled

Note You need to log in before you can comment on or make changes to this bug.