This Bugzilla service is closed. All entries have been migrated to
Bug 590 - NEON Duplicate lane load (minor optimization)
Summary: NEON Duplicate lane load (minor optimization)
Alias: None
Product: Eigen
Classification: Unclassified
Component: Core - vectorization
Version: unspecified
Hardware: ARM - NEON All
Importance: Normal enhancement
Assignee: Nobody
Depends on:
Reported: 2013-04-23 18:44 UTC by Simon Pilgrim
Modified: 2019-12-04 12:16 UTC
CC List: 2 users

NEON Duplicate lane load (717 bytes, patch)
2013-04-23 18:44 UTC, Simon Pilgrim
no flags

Description Simon Pilgrim 2013-04-23 18:44:08 UTC
Created attachment 332
NEON Duplicate lane load

The NEON implementations of ploaddup can be improved by using the vld1_dup_*() intrinsics, which load a scalar from memory and duplicate it across all lanes in a single instruction, instead of performing a separate scalar load followed by a vdup_n_*() splat. The patch for Eigen/src/Core/arch/NEON/PacketMath.h is attached.

With gcc 4.6.3, I found the generated code went from (pseudo asm):

ldmia.w r0, {r2, r3}
vdup.32 d0, r2
vdup.32 d1, r3

to:

vld1.32 {d0[]}, [r0]!
vld1.32 {d1[]}, [r0]
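A minimal sketch of the two approaches, assuming a 4 x float packet whose ploaddup result is {from[0], from[0], from[1], from[1]}. The function names ploaddup_split, ploaddup_dup and ploaddup4 are hypothetical and this is not the attached patch; the scalar fallback exists only so the lane layout can be checked on a non-ARM host.

```c
/*
 * Sketch: ploaddup-style load for a 4 x float packet.
 * Result lanes: {from[0], from[0], from[1], from[1]}.
 * Function names are hypothetical, not Eigen's actual code.
 */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Split form: two scalar loads, each splatted with vdup_n_f32().
   The compiler is free to load the scalars through general-purpose
   registers (e.g. ldmia) and then transfer them to NEON registers. */
float32x4_t ploaddup_split(const float *from)
{
    float32x2_t lo = vdup_n_f32(from[0]);
    float32x2_t hi = vdup_n_f32(from[1]);
    return vcombine_f32(lo, hi);
}

/* Dup-load form: vld1_dup_f32() loads one float from memory directly
   into both lanes of a 64-bit NEON register (typically vld1.32 {dN[]}). */
float32x4_t ploaddup_dup(const float *from)
{
    float32x2_t lo = vld1_dup_f32(from);
    float32x2_t hi = vld1_dup_f32(from + 1);
    return vcombine_f32(lo, hi);
}
#endif

/* Portable wrapper: writes the duplicated packet to out[0..3]. */
void ploaddup4(const float *from, float out[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    vst1q_f32(out, ploaddup_dup(from));
#else
    /* Scalar fallback with the same lane layout as the NEON path. */
    out[0] = out[1] = from[0];
    out[2] = out[3] = from[1];
#endif
}
```

On an ARM target both ploaddup_split and ploaddup_dup produce the same lanes; the difference described in this report is only in the instructions the compiler emits, which depends on the compiler and its version.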
Comment 1 Gael Guennebaud 2013-06-10 16:14:17 UTC
I don't know enough ARM & NEON, so I'm not sure I understand why this version is better. vdup seems to be exactly what we want. Isn't the fact that GCC added a register load instruction unrelated?
Comment 2 Simon Pilgrim 2013-06-22 16:55:29 UTC
Sorry for the slow response.

I admit this patch is very minor, but the vld1_dup_*() intrinsics were provided with exactly this ploaddup-style operation in mind.

They discourage the compiler from loading the values through the general-purpose (GP) registers, which adds the cost of transferring them to the NEON registers, and from loading scalar floats in a way that may use the VFP pipeline, which will cause stalls when the NEON pipeline takes over again.
Comment 3 Gael Guennebaud 2013-06-23 14:18:53 UTC
Changeset:   03c0153b9f2f
User:        Simon Pilgrim
Date:        2013-06-23 14:13:21
Summary:     Fix bug 590: NEON Duplicate lane load
Comment 4 Nobody 2019-12-04 12:16:33 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance:
