This bugzilla service is closed. All entries have been migrated to https://gitlab.com/libeigen/eigen

Bug 590

Summary: NEON Duplicate lane load (minor optimization)
Product: Eigen
Reporter: Simon Pilgrim <rk_eigen>
Component: Core - vectorization
Assignee: Nobody <eigen.nobody>
Status: RESOLVED FIXED
Severity: enhancement
CC: gael.guennebaud, jacob.benoit.1
Priority: Normal    
Version: unspecified   
Hardware: ARM - NEON   
OS: All   
Whiteboard:
Attachments:
Description: NEON Duplicate lane load; Flags: none

Description Simon Pilgrim 2013-04-23 18:44:08 UTC
Created attachment 332
NEON Duplicate lane load

NEON implementations of ploaddup can be improved by using the vld1_dup_*() intrinsics, which load and duplicate in one step, instead of doing scalar loads followed by a separate vdup_n_*() splat/duplication. A patch for Eigen/src/Core/arch/NEON/PacketMath.h is attached.

With gcc 4.6.3, I found the generated code goes from (pseudo asm):

ldmia.w r0, {r2, r3}
vdup.32 d0, r2
vdup.32 d1, r3

to

vld1.32 {d0[]}, [r0]!
vld1.32 {d1[]}, [r0]
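
For illustration, the change described above might look roughly like the following for the float packet. This is a sketch under assumptions, not the contents of attachment 332: the function names ploaddup_vdup, ploaddup_vld1_dup, store4 and check_ploaddup, and the float4 fallback type, are hypothetical. Only the NEON intrinsics themselves (vdup_n_f32, vld1_dup_f32, vcombine_f32, vst1q_f32) are real. The non-ARM branch exists only so the sketch compiles on other hosts.

```cpp
// Sketch only: illustrates the two ploaddup styles, not Eigen's actual code.
#include <cassert>
#include <cstring>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
typedef float32x4_t float4;

// "Before" style: scalar loads, then a separate splat per half.
static inline float4 ploaddup_vdup(const float* from) {
  float32x2_t lo = vdup_n_f32(from[0]);
  float32x2_t hi = vdup_n_f32(from[1]);
  return vcombine_f32(lo, hi);
}

// "After" style: load one element and duplicate it to all lanes in one intrinsic.
static inline float4 ploaddup_vld1_dup(const float* from) {
  float32x2_t lo = vld1_dup_f32(from);
  float32x2_t hi = vld1_dup_f32(from + 1);
  return vcombine_f32(lo, hi);
}

static inline void store4(float* out, float4 p) { vst1q_f32(out, p); }
#else
// Portable stand-in (assumption) so the example builds without NEON.
struct float4 { float v[4]; };

static inline float4 ploaddup_vdup(const float* from) {
  float4 r = {{from[0], from[0], from[1], from[1]}};
  return r;
}

static inline float4 ploaddup_vld1_dup(const float* from) {
  return ploaddup_vdup(from);
}

static inline void store4(float* out, const float4& p) {
  std::memcpy(out, p.v, sizeof p.v);
}
#endif

// Both variants should yield {from[0], from[0], from[1], from[1]}.
static inline void check_ploaddup(const float* from) {
  float a[4], b[4];
  store4(a, ploaddup_vdup(from));
  store4(b, ploaddup_vld1_dup(from));
  assert(std::memcmp(a, b, sizeof a) == 0);
  assert(a[0] == from[0] && a[1] == from[0]);
  assert(a[2] == from[1] && a[3] == from[1]);
}
```

The two variants are meant to be functionally identical; any difference would be in the instructions the compiler chooses to emit.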

Comment 1 Gael Guennebaud 2013-06-10 16:14:17 UTC
I don't know ARM & NEON well enough, so I'm not sure I understand why this version is better. vdup seems to be exactly what we want. The fact that GCC added a register load instruction seems to be unrelated?

Comment 2 Simon Pilgrim 2013-06-22 16:55:29 UTC
Sorry for the slow response.

I admit this patch is very minor, but the vld1_dup_*() intrinsics were provided with exactly the ploaddup style operation in mind.

They discourage the compiler from going through the general-purpose registers (which adds the cost of transferring the values to NEON registers) and from loading scalar floats in a way that may use the VFP pipeline (which causes stalls when the NEON pipeline takes over again).

Comment 3 Gael Guennebaud 2013-06-23 14:18:53 UTC
Alright:

https://bitbucket.org/eigen/eigen/commits/03c0153b9f2f/
Changeset:   03c0153b9f2f
User:        Simon Pilgrim
Date:        2013-06-23 14:13:21
Summary:     Fix bug 590: NEON Duplicate lane load

Comment 4 Nobody 2019-12-04 12:16:33 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug via this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/590.