Bug 590 - NEON Duplicate lane load (minor optimization)
Status: RESOLVED FIXED
Product: Eigen
Classification: Unclassified
Component: Core - vectorization
Version: unspecified
Hardware: ARM - NEON All
Importance: Normal enhancement
Assigned To: Nobody
Reported: 2013-04-23 18:44 UTC by Simon Pilgrim
Modified: 2013-06-23 14:18 UTC (History)



Attachments
NEON Duplicate lane load (717 bytes, patch)
2013-04-23 18:44 UTC, Simon Pilgrim

Description Simon Pilgrim 2013-04-23 18:44:08 UTC
Created attachment 332
NEON Duplicate lane load

NEON implementations of ploaddup can be improved by using the vld1_dup_*() intrinsics instead of splitting the scalar loads from the vdup_n_*() splat/duplication. Patch for Eigen/src/Core/arch/NEON/PacketMath.h attached.

I found gcc 4.6.3 to go from (pseudo asm):

ldmia.w r0, {r2, r3}
vdup.32 d0, r2
vdup.32 d1, r3

to

vld1.32 {d0[]}, [r0]!
vld1.32 {d1[]}, [r0]
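
For readers without the attachment, the two shapes being compared can be sketched as follows. This is only an illustration of the technique, not the attached patch: `Lanes4`, `loaddup_split`, `loaddup_fused` and `check_loaddup` are hypothetical names, and the non-ARM branch is a plain scalar fallback added so the sketch builds on any host. The NEON calls used (`vdup_n_f32`, `vld1_dup_f32`, `vcombine_f32`, `vst1q_f32`) are the standard `arm_neon.h` intrinsics.

```cpp
#include <cassert>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical helper type (not from Eigen): four float lanes in plain memory.
struct Lanes4 { float v[4]; };

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
// "Before" shape: scalar loads followed by vdup_n splats. The compiler may
// route the scalars through general-purpose or VFP registers first.
inline Lanes4 loaddup_split(const float* from) {
  float32x2_t lo = vdup_n_f32(from[0]);
  float32x2_t hi = vdup_n_f32(from[1]);
  Lanes4 out;
  vst1q_f32(out.v, vcombine_f32(lo, hi));
  return out;
}

// "After" shape: vld1_dup loads one element from memory and broadcasts it
// to every lane of a D register in a single intrinsic.
inline Lanes4 loaddup_fused(const float* from) {
  float32x2_t lo = vld1_dup_f32(from);
  float32x2_t hi = vld1_dup_f32(from + 1);
  Lanes4 out;
  vst1q_f32(out.v, vcombine_f32(lo, hi));
  return out;
}
#else
// Portable fallback (an assumption for non-ARM hosts, no NEON involved).
inline Lanes4 loaddup_split(const float* from) {
  return Lanes4{{from[0], from[0], from[1], from[1]}};
}
inline Lanes4 loaddup_fused(const float* from) {
  return loaddup_split(from);
}
#endif

// Both variants must yield {a, a, b, b} for the input {a, b}.
inline void check_loaddup(const float* from) {
  const Lanes4 s = loaddup_split(from);
  const Lanes4 f = loaddup_fused(from);
  for (int i = 0; i < 4; ++i) {
    assert(s.v[i] == from[i / 2]);
    assert(f.v[i] == from[i / 2]);
  }
}
```

Calling `check_loaddup` on a two-element array confirms that both variants produce the same lanes; any difference between them lies in the generated instructions, which depend on the compiler and its version.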
Comment 1 Gael Guennebaud 2013-06-10 16:14:17 UTC
I don't know ARM and NEON well enough, so I'm not sure I understand why this version is better. vdup seems to be exactly what we want, and the fact that GCC added a register load instruction seems to be unrelated?
Comment 2 Simon Pilgrim 2013-06-22 16:55:29 UTC
Sorry for the slow response.

I admit this patch is very minor, but the vld1_dup_*() intrinsics were provided with exactly the ploaddup-style operation in mind.

They discourage the compiler from loading through the general-purpose registers (which adds a transfer cost to the NEON registers) and from loading scalar floats in a way that may use the VFP pipeline (which causes stalls when the NEON pipeline takes over again).
Comment 3 Gael Guennebaud 2013-06-23 14:18:53 UTC
Alright:

https://bitbucket.org/eigen/eigen/commits/03c0153b9f2f/
Changeset:   03c0153b9f2f
User:        Simon Pilgrim
Date:        2013-06-23 14:13:21
Summary:     Fix bug 590: NEON Duplicate lane load
