This bugzilla service is closed. All entries have been migrated to https://gitlab.com/libeigen/eigen
Bug 1678 - _mm_load_pd1 not defined in VS2019 Preview2 emmintrin using AVX512
Status: RESOLVED FIXED
Alias: None
Product: Eigen
Classification: Unclassified
Component: Core - vectorization
Version: 3.3 (current stable)
Hardware: x86 - AVX512 Windows
Importance: Normal Compilation Problem
Assignee: Nobody
Reported: 2019-02-12 14:27 UTC by neumann
Modified: 2019-12-04 18:28 UTC

Attachments
git-patch-for-avx512 (2.80 KB, patch)
2019-02-13 14:14 UTC, neumann

Description neumann 2019-02-12 14:27:46 UTC
_mm_load_pd1 is not defined in the emmintrin.h shipped with VS2019. I have already reported this as a bug to Visual Studio.

My workaround (added at the beginning of AVX512/PacketMath.h):

inline auto _mm_load_pd1(double const* _Dp) {
    return _mm_load1_pd(_Dp);
}


After fixing this error I see: 
error C2676: binary '+': 'const Packet' does not define this operator or a conversion to a type acceptable to the predefined operator

According to the output, the instantiation chain is:
From: src/Core/GenericPacketMath.h(162)
Packet = PacketI
Via: src\Core\arch\SSE\../Default/GenericPacketMathFunctions.h(403) & (446)
Packet = Eigen::internal::Packet16f
To: src\Core/arch/AVX512/MathFunctions.h(387)
typedef __m512 Packet16f;
Comment 1 Christoph Hertzberg 2019-02-12 15:04:18 UTC
For gcc/clang it seems to make no difference whether we use `_mm_load_pd1` or `_mm_load1_pd`, so I'm OK with just replacing all uses of the former with the latter. (Btw: Avoid using `auto` inside the Eigen code base. We try to stay C++03 compatible as far as possible.)
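
A minimal sketch of how the workaround could be written without `auto`, so that it also compiles as C++03. The helper name `load1_pd_compat` is hypothetical (it comes from neither Eigen nor the intrinsics headers); a distinct name is used here because some compilers already declare `_mm_load_pd1` themselves.

```cpp
#include <emmintrin.h>  // SSE2: __m128d, _mm_load1_pd

// Hypothetical helper, not part of Eigen: loads one double and broadcasts it
// to both lanes of a 128-bit register. The return type is spelled out
// instead of using `auto`, which keeps the function valid C++03.
static inline __m128d load1_pd_compat(const double* p) {
    return _mm_load1_pd(p);
}
```

Call sites that used `_mm_load_pd1(ptr)` would then call `load1_pd_compat(ptr)` or, more simply, `_mm_load1_pd(ptr)` directly.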

For the follow-up error: Could you attach the complete stack of that error?
Comment 2 Christoph Hertzberg 2019-02-12 15:31:37 UTC
(In reply to Christoph Hertzberg from comment #1)
> For gcc/clang it seems to make no difference if we use `_mm_load_pd1` or
> `_mm_load1_pd`, so I'm ok with just replacing all first by the second.

In fact all usages of `_mm_load_pd1` followed by a broadcast should be replaced by a simple _mm512_set1_pd or _mm256_set1_pd. No compiler (which supports AVX512) produces worse results with that:

https://godbolt.org/z/Qx7IFo
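
As a rough illustration of the replacement suggested above, the following sketch broadcasts a single double with the `set1` intrinsics at each register width. The function names `broadcast2`, `broadcast4` and `broadcast8` are hypothetical and only serve the example. The wider variants are guarded so that the snippet still compiles when AVX or AVX512F is not enabled.

```cpp
#include <immintrin.h>  // SSE2, AVX and AVX512 intrinsics

// 128-bit: broadcast *p to both lanes (SSE2, always available on x86-64).
static inline __m128d broadcast2(const double* p) {
    return _mm_set1_pd(*p);
}

#if defined(__AVX__)
// 256-bit: broadcast *p to all four lanes.
static inline __m256d broadcast4(const double* p) {
    return _mm256_set1_pd(*p);
}
#endif

#if defined(__AVX512F__)
// 512-bit: broadcast *p to all eight lanes, replacing a separate
// 128-bit load followed by an explicit broadcast instruction.
static inline __m512d broadcast8(const double* p) {
    return _mm512_set1_pd(*p);
}
#endif
```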

I don't have AVX512f available, so I can't run any tests on that (I could set up an emulator, if I find the time ...)
Comment 3 neumann 2019-02-12 15:41:34 UTC
I can also give you a list of functions that VS2019 does not inline even in /Ob3 mode. The funniest one was a call to: Eigen::internal::ignore_unused_variable<__int64>

Example output of the follow-up error:
5>C:\Sources\Extern\vcpkg\installed\x64-windows\include\Eigen\src/Core/GenericPacketMath.h(162): error C2676: binary '+': 'const Packet' does not define this operator or a conversion to a type acceptable to the predefined operator
5>        with
5>        [
5>            Packet=PacketI
5>        ] (compiling source file C:\Sources\Repos\Everything\Particle_Simulation\Simulator\SimulationApplication.cpp)
5>C:\Sources\Extern\vcpkg\installed\x64-windows\include\Eigen\src\Core\arch\SSE\../Default/GenericPacketMathFunctions.h(403): note: see reference to function template instantiation 'Packet Eigen::internal::padd<PacketI>(const Packet &,const Packet &)' being compiled
5>        with
5>        [
5>            Packet=PacketI
5>        ] (compiling source file C:\Sources\Repos\Everything\Particle_Simulation\Simulator\SimulationApplication.cpp)
5>C:\Sources\Extern\vcpkg\installed\x64-windows\include\Eigen\src\Core\arch\SSE\../Default/GenericPacketMathFunctions.h(446): note: see reference to function template instantiation 'Packet Eigen::internal::psincos_float<true,Packet>(const Packet &)' being compiled
5>        with
5>        [
5>            Packet=Eigen::internal::Packet16f
5>        ] (compiling source file C:\Sources\Repos\Everything\Particle_Simulation\Simulator\SimulationApplication.cpp)
5>C:\Sources\Extern\vcpkg\installed\x64-windows\include\Eigen\src/Core/arch/AVX512/MathFunctions.h(387): note: see reference to function template instantiation 'Packet Eigen::internal::psin_float<Eigen::internal::Packet16f>(const Packet &)' being compiled
5>        with
5>        [
5>            Packet=Eigen::internal::Packet16f
5>        ] (compiling source file C:\Sources\Repos\Everything\Particle_Simulation\Simulator\SimulationApplication.cpp)
Comment 4 neumann 2019-02-12 15:44:10 UTC
(In reply to neumann from comment #3)
> [...]

Extra definitions set with CMake:
  add_definitions(-DEIGEN_ENABLE_AVX512;-DEIGEN_FAST_MATH;-D__AVX512F__;-D__FMA__;-D__AVX512DQ__;-D__AVX512ER__)
Comment 5 Christoph Hertzberg 2019-02-13 10:50:09 UTC
For the non-inlined functions, could you try adding EIGEN_STRONG_INLINE to these and make a pull request or provide a patch? (If you provide a patch, better open a new bug for that)

For the other part, can you try adding the following to Eigen/src/Core/arch/AVX512/PacketMath.h (after replacing all uses of _mm_load_pd1):

  template <>
  EIGEN_STRONG_INLINE Packet16i padd<Packet16i>(const Packet16i& a,
                                                const Packet16i& b) {
    return _mm512_add_epi32(a, b);
  }

If that works, try also to implement psub, pmul, etc (implementations should be more or less equivalent to AVX or SSE). I don't have MSVC (nor AVX512), so I can't check this.
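
A standalone sketch of what the analogous subtraction and multiplication could look like, written outside Eigen so that it compiles on its own. The names `Packet16i_sketch`, `psub_sketch` and `pmul_sketch` are hypothetical stand-ins for the Eigen specializations, and this is not the code that was eventually committed. The block is guarded because these intrinsics require AVX512F to be enabled.

```cpp
#include <immintrin.h>  // AVX512F intrinsics

#if defined(__AVX512F__)
// Hypothetical stand-in for Eigen's Packet16i (sixteen 32-bit integers).
typedef __m512i Packet16i_sketch;

// Lane-wise 32-bit integer subtraction, mirroring the padd example above.
static inline Packet16i_sketch psub_sketch(const Packet16i_sketch& a,
                                           const Packet16i_sketch& b) {
    return _mm512_sub_epi32(a, b);
}

// Lane-wise 32-bit integer multiplication; keeps the low 32 bits of each
// product, which matches ordinary wrap-around int multiplication.
static inline Packet16i_sketch pmul_sketch(const Packet16i_sketch& a,
                                           const Packet16i_sketch& b) {
    return _mm512_mullo_epi32(a, b);
}
#endif  // __AVX512F__
```

Inside Eigen these would presumably be template specializations marked EIGEN_STRONG_INLINE, in the same form as the padd snippet above.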
Comment 6 Christoph Hertzberg 2019-02-13 10:54:39 UTC
(In reply to neumann from comment #4)
> (In reply to neumann from comment #3)
> > [...]
> Extra Definitions set with CMAKE:
>  
> add_definitions(-DEIGEN_ENABLE_AVX512;-DEIGEN_FAST_MATH;-D__AVX512F__;-
> D__FMA__;-D__AVX512DQ__;-D__AVX512ER__)

Instead of `-D__AVX512F__` does `/arch:AVX512` work? (is there an equivalent to gcc's `-march=native`?)
I could not find anything for `__FMA__`, but perhaps for MSVC we could activate `__FMA__` automatically, if AVX512 is enabled.


Btw: Please don't Full-Quote. Everyone can see previous comments.
Comment 7 neumann 2019-02-13 11:00:52 UTC
(In reply to Christoph Hertzberg from comment #6)
> (In reply to neumann from comment #4)
> > (In reply to neumann from comment #3)

I use /arch:AVX512, but it did not seem to set __AVX512F__; the same goes for the others. I set them manually because I thought I saw them used somewhere other than in Macros.h.

BTW: please repair preview ;)

Software error:

Can't locate JSON.pm in @INC (you may need to install the JSON module) (@INC contains: . lib/i586-linux-gnu-thread-multi-64int lib /etc/perl /usr/local/lib/i386-linux-gnu/perl/5.20.2 /usr/local/share/perl/5.20.2 /usr/lib/i386-linux-gnu/perl5/5.20 /usr/share/perl5 /usr/lib/i386-linux-gnu/perl/5.20 /usr/share/perl/5.20 /usr/local/lib/site_perl) at lib/JSON/RPC/Legacy/Server.pm line 7.
BEGIN failed--compilation aborted at lib/JSON/RPC/Legacy/Server.pm line 7.
Compilation failed in require at lib/JSON/RPC/Legacy/Server/CGI.pm line 6.
BEGIN failed--compilation aborted at lib/JSON/RPC/Legacy/Server/CGI.pm line 6.
Compilation failed in require at Bugzilla/WebService/Server/JSONRPC.pm line 22.
BEGIN failed--compilation aborted at Bugzilla/WebService/Server/JSONRPC.pm line 25.
Compilation failed in require at /data/web/75/91/37/eigen.tuxfamily.org/htdocs/bz/jsonrpc.cgi line 24.
BEGIN failed--compilation aborted at /data/web/75/91/37/eigen.tuxfamily.org/htdocs/bz/jsonrpc.cgi line 24.

For help, please send mail to the webmaster (modo@staff.tuxfamily.org), giving this error message and the time and date of the error.
Comment 8 Christoph Hertzberg 2019-02-13 13:07:56 UTC
(In reply to neumann from comment #7)
> I use /arch:AVX512 but it seemed not to set __AVX512F__. The same for the
> others. I set it because i thought i saw it usage somewhere else than in
> Macros.h

On Godbolt, __AVX512F__ and __AVX512DQ__ get defined with MSVC 19.10 or later:
https://godbolt.org/z/JRAAlx

I think EIGEN_VECTORIZE_AVX512ER is actually not used anywhere, so we can just remove it. As for FMA, I suggested always assuming that FMA is available if AVX512 is activated (at least on MSVC; on GCC/Clang it can be activated using `-mfma`).
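
To check locally which of these macros a given compiler and flag combination actually defines, something like the following sketch could be used. The function `visible_simd_macros` and the macro `SKETCH_HAS_FMA` are hypothetical names for this example only. The FMA rule shown is just one way to express "assume FMA on MSVC when AVX512F is enabled" and is not necessarily how Eigen implements it.

```cpp
#include <string>

// Returns a space-separated list of the SIMD feature macros that are
// visible in this translation unit with the current compiler flags.
static std::string visible_simd_macros() {
    std::string s;
#if defined(__AVX512F__)
    s += "__AVX512F__ ";
#endif
#if defined(__AVX512DQ__)
    s += "__AVX512DQ__ ";
#endif
#if defined(__FMA__)
    s += "__FMA__ ";
#endif
    return s;
}

// Hypothetical macro: treat FMA as available if the compiler says so, or
// if this is MSVC with AVX512F enabled (MSVC was not observed to define
// __FMA__ in this thread).
#if defined(__FMA__) || (defined(_MSC_VER) && defined(__AVX512F__))
#define SKETCH_HAS_FMA 1
#else
#define SKETCH_HAS_FMA 0
#endif
```

Printing the result of `visible_simd_macros()` once with `/arch:AVX512` and once without would show whether the manual `-D__AVX512F__` definitions are needed at all.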


> BTW: please repair preview ;)

I noticed this some time ago as well. Maybe it's possible to deactivate that button entirely. No high priority for me ...
Comment 9 neumann 2019-02-13 14:14:33 UTC
Created attachment 924 [details]
git-patch-for-avx512

I just have git (no mercurial installed).
Comment 10 Gael Guennebaud 2019-02-14 23:55:12 UTC
With MSVC 19.14 I had to define __FMA__ (nothing more) and disable pdiv on Packet16i:

https://godbolt.org/z/cgC3jC
Comment 11 neumann 2019-02-15 08:04:41 UTC
Sorry about pdiv. I didn't check whether _mm512_div_epi32 is actually part of AVX512.
As it turns out, it belongs to SVML according to https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm512_div_epi32&expand=2102
Since VS2019 provides the SVML intrinsics, I did not notice that.

It seems I have to wait for https://bugs.llvm.org/show_bug.cgi?id=40701#c1
to actually use SVML intrinsics and also have fast runtimes...
Comment 12 Christoph Hertzberg 2019-02-15 08:20:13 UTC
Sorry for not getting back to this yet.

As you also found, the _mmXX_div_epiXX intrinsics are only part of the SVML library, so I agree on not including them. For all other architectures pdiv<PacketXi> asserts.
We may activate those optionally, if SVML is available. We may also (optionally) take their math functions instead of implementing our own (I never checked how they perform regarding speed and accuracy) -- but that is a different issue.

Shall we automatically define __FMA__ (or just EIGEN_ENABLE_FMA) on MSVC if __AVX512F__ is defined?

Also, for ploadquad<Packet8d> we should probably replace the first _mm512_insertf64x4 by a _mm512_castpd256_pd512 (unfortunately, this is not available on GCC 4.9, though we could work around that):
https://godbolt.org/z/QZYE2e
Comment 13 Gael Guennebaud 2019-02-15 09:50:08 UTC
https://bitbucket.org/eigen/eigen/commits/8bd064534bb6/
Summary:     Bug 1678: workaround MSVC compilation issues with AVX512

https://bitbucket.org/eigen/eigen/commits/76a63ae63269/
Summary:     Bug 1678: Fix lack of __FMA__ macro on MSVC with AVX512

Should be enough for this issue.

More generally, many functions in AVX512/PacketMath.h look sub-optimal to me. For instance, ploadquad should likely be implemented using a single _mm512_permutexvar_ps (I already have a patch), but I'm waiting to receive a Skylake machine to properly check and benchmark before pushing this kind of change.
Comment 14 Nobody 2019-12-04 18:28:34 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/1678.
