New user self-registration is disabled due to spam. Please email eigen-core-team @ if you need an account.
Bug 642 - Add vectorization of sqrt for doubles
Summary: Add vectorization of sqrt for doubles
Alias: None
Product: Eigen
Classification: Unclassified
Component: Core - vectorization (show other bugs)
Version: 3.2
Hardware: x86 - SSE All
: Normal enhancement
Assignee: Nobody
Depends on:
Blocks: 3.3
  Show dependency treegraph
Reported: 2013-08-08 20:34 UTC by panda-34
Modified: 2013-08-19 16:01 UTC (History)
3 users (show)


Description panda-34 2013-08-08 20:34:26 UTC
Currently SSE module has a packet sqrt function only for Packet4f, not for Packet2d, so any double-type coefficient-wise expression containing sqrt has to be calculated using single-data commands. I suggest you add

template<> EIGEN_STRONG_INLINE Packet2d psqrt<Packet2d>(const Packet2d& a) { return _mm_sqrt_pd(a); }

so that sqrt can be vectorized. Also, it is my understanding that the algorithm used to calculate sqrt in the current implementation of psqrt<Packet4f> (via inverse sqrt) has reduced accuracy, so maybe use it only under EIGEN_FAST_MATH flag, letting the user use the full accuracy of _mm_sqrt_ps otherwise?
Comment 1 Christoph Hertzberg 2013-08-13 14:41:45 UTC
I fully agree. I guess the main problem with the fast implementation are wrong results for negative values (should be NaN), +inf (should be inf) and denormalized values (rsqrt treats them as 0.0f, resulting in inf which leads to a NaN after the refinement).
Comment 2 Gael Guennebaud 2013-08-19 15:43:39 UTC
Related discussion on the ML (old!):

And I agree with Christoph, the main pb is not on the accuracy but on the handling of negative, inf and denorms.
Comment 3 Gael Guennebaud 2013-08-19 16:01:59 UTC
Changeset:   97d9a7772106
User:        ggael
Date:        2013-08-19 16:02:27
Summary:     Fix bug 642: add vectorization of sqrt for doubles, and make sqrt really safe if EIGEN_FAST_MATH is disabled

Note You need to log in before you can comment on or make changes to this bug.