New user self-registration is disabled due to spam. Please email eigen-core-team @ lists.tuxfamily.org if you need an account.
Bug 884 - Eigen::Ref stores redundant data
Summary: Eigen::Ref stores redundant data
Status: DECISIONNEEDED
Alias: None
Product: Eigen
Classification: Unclassified
Component: Core - general (show other bugs)
Version: 3.2
Hardware: All All
: Low Internal Design
Assignee: Nobody
URL:
Whiteboard:
Keywords: performance
Depends on: 58
Blocks: 912
  Show dependency treegraph
 
Reported: 2014-09-24 14:20 UTC by Emily
Modified: 2014-12-05 15:03 UTC (History)
3 users (show)



Attachments

Description Emily 2014-09-24 14:20:49 UTC
Consider the following example:

    Eigen::ArrayXXd data = Eigen::ArrayXXd::Random(1000, 1000);

    typedef Eigen::Ref<const Eigen::ArrayXXd> datacref_t;
    auto block = data.block(500, 500, 5, 5);
    Eigen::internal::set_is_malloc_allowed(false);
    const datacref_t dataref = block;
    const datacref_t ref = dataref; // Calls malloc with size = 0
    Eigen::internal::set_is_malloc_allowed(true);

The copy assign operator causes a call to malloc with what looks to be a zero sized alloc. This is still a system call and is causing performance issues for me. 

Also we're using application verifier to catch memory handling problems early, it works by putting each memory allocation in it's own page and doing some verifications. A lot of 0 size requests will cause the virtual address space to be rapidly depleted with this tool running.
Comment 1 Christoph Hertzberg 2014-09-25 00:37:08 UTC
The straight-forward solution would be to short-cut
  template<typename T, bool Align> 
  inline T* conditional_aligned_new_auto(size_t size)
(located in src/Core/util/Memory.h) and immediately let it return 0, if size==0.
The overhead for valid allocs should be negligible.

Alternative/better solutions/suggestions, e.g., can we catch this earlier?
Comment 2 Christoph Hertzberg 2014-09-25 16:28:09 UTC
Fixed in devel and 3.2 branches. Also extended nomalloc unit tests.

https://bitbucket.org/eigen/eigen/commits/cbe72de483f9
https://bitbucket.org/eigen/eigen/commits/cc70c8c56d04
Comment 3 Emily 2014-09-26 09:15:45 UTC
> Alternative/better solutions/suggestions, e.g., can we catch this earlier?

I'm not confident enough around the Eigen internals so this is just an observation. Looking at the data structure of Eigen::Ref in the visual studio debugger shows a member called "m_object" that seems to be un-used (0 data pointer, 0 rows, 0 cols) while the data values in "m_data", "m_rows" etc from Eigen::MapBase are filled with valid data. Is it possible to remove the "m_object" all together? This would make my data structures smaller and more cache friendly (which is another performance hit I'm looking at).
Comment 4 Christoph Hertzberg 2014-09-26 11:40:53 UTC
The m_object is for cases where Ref refers to an object which is not direct access or which has an incompatible storage layout (different strides or majorness).
Solving Bug 58 we could save the extra m_rows, m_cols members. We could/should also consider having a Ref alternative which never makes a copy (and fails to compile if the storage layout is incompatible) -- this would also avoid introducing hidden costs.

I'm renaming and reopening this bug.
Comment 5 Emily 2014-09-26 12:00:51 UTC
> Alternative/better solutions/suggestions, e.g., can we catch this earlier?

I have had some time to debug this closer. Here's what happens:

    const datacref_t ref = dataref; // Calls malloc with size = 0

causes the compiler generated default copy constructor on Eigen::Ref to be invoked. Instead of the constructor at Ref.h@227: Ref(const DenseBase<Derived>& expr) which I believe is the intended constructor.

The compiler generated copy ctor calls the copy ctor of m_object, which eventually trickles down to DenseStorage(0, 0, 0) which causes the zero sized malloc.

I have solved this locally by adding:

    Ref(const Ref& that){
        construct(that.derived(), internal::true_type);
    }

to Ref.h which leaves m_object default constructed and empty, just like the original. It seems to be working with the limited testing I've done this far. Although I'm not sure if it will work properly if "that" has a non-empty m_object.
Comment 6 Christoph Hertzberg 2014-09-26 13:05:25 UTC
(In reply to Emily from comment #5)
> [...] Although I'm not sure if it will work properly if "that" has a
> non-empty m_object.

You are right, I have not thought about copying Refs with non-empty m_objects, either.

Your suggestion should work in general, the problem here is if the copied Ref object is used after the original Ref object gets out of scope and gets destructed. 
OTOH, the name `Ref' actually implies that it (usually) becomes invalid as soon as the original object gets out of scope, so I would agree that calling `Ref(const Ref&)' shall not copy the m_object member in any case. After all, if you explicitly want a copy, use MatrixXd directly.
Comment 7 Christoph Hertzberg 2014-09-26 13:53:13 UTC
(In reply to Emily from comment #5)
>     Ref(const Ref& that){
>         construct(that.derived(), internal::true_type);
>     }

Just adding this is not sufficient, because there shall also be no malloc if the parameters of the copy a compatible with (but not identical to) the original.
E.g., if one copies a Ref without strides to a Ref with strides, or an aligned Ref to an unaligned Ref.
Implementing that is not too complicated either:
  template<typename OtherRef>
  inline Ref(const RefBase<OtherRef>& other) {
    construct(other.derived(), typename Traits::template match<OtherRef>::type());
  }
I found, however, that assigning a transposed object to a Ref with Stride<Dynamic,Dynamic> requires a copy, which technically is not necessary (by setting the outer and inner stride their counterparts of the original object).

Another question: Shall we provide a Ref object, which checks at compile-time that the layout "might" match, but only asserts at run-time that it actually does?
For example, this would allow passing M.block(...) to an Ref<Matrix, Aligned> as long as the user takes care that the start address of the block is aligned.
Also, we could introduce a variant which does not assert, but (at run-time) decides to copy if the parameters are in fact incompatible.
Comment 8 Emily 2014-09-26 14:59:53 UTC
In general, as the name of the class is "Ref", as a user I expect that no copies or memory allocations will ever be performed. And I want to be able to store Ref's in containers and expect them to work as long as the original is not destructed. If that means I'll have to check if it is column or row major to order my loops for example, I can live with that (personal opinion) by sticking to column major.

I understand the issue with incompatible types and why they would cause a copy. Can these incompatibilities be detected and made to cause errors on compile time? Can the reference simply mirror the properties? I don't have a clear image of what would be the best way to deal with it to be honest. 

From a general perspective, I think that allowing references that "might" be compatible during runtime is a good idea but it would have to be clearly labeled that you have to make sure it is compatible. One possible application is for example if an ArrayXXd is used to represent an image, then using RefMaybe(data.block(...)) to split off processing chunks of the image to different threads. Preserving alignment here is important for many reasons but it is difficult (impossible?) to determine at compile time for example.


Also, thank you for your quick replies on this.
Comment 9 Christoph Hertzberg 2014-09-26 17:18:13 UTC
Ok, first of all we need to distinguish between const and non-const references.
For Ref<const Matrix>& as well as a simple (const Matrix&) the C++ programmer could/would/should expect that a temporary is created if the passed type is not exactly the expected type but (implicitly) convertible to it. Simplest case, if you pass an int to a function expecting (const double&) a temporary object is created.
If a function takes a non-const (Matrix& m) it will only work, if the passed type (or one of its base classes) matches exactly to Matrix. Currently its similar for Ref<Matrix, Options, Stride>: only if the passed type directly (and at compile-time) "fits" into the specification the program compiles.

Regarding your actual questions: All that is generally possible to implement, the basic question is what the API shall be and what shall be the default behavior.
API-wise, I would suggest setting additional bits in the Options-template-parameter and letting the current behavior be the default.
Basically, we could introduce an "NoCopy" flag, which will prohibit copies in general (implicitly set for non-const Ref) and a CheckAtRuntime flag which only checks at run-time if the passed type actually fits into the Ref-object without a temporary.
So with:
  Options=(NoCopy & CheckAtRuntime) an assertion would be raised at runtime if types are incompatible
  Options=(NoCopy) code will not compile, if it can't be guaranteed at compile-time that options are compatible.
  Options=CheckAtRuntime it will be determined at runtime if a copy is necessary
and with
  Options=0 the current behavior is kept (i.e. copy, if types might be incompatible)

The Align-bit would be orthogonal to these new flags.

A non-const Ref will always be NoCopy (We briefly discussed once that it would be possible to pass a temporary and copy it back at in the destructor of the Ref object, but I doubt that will be practical).
Comment 10 Gael Guennebaud 2014-09-26 22:34:39 UTC
I think that we already had this kind of discussions when designing Ref<>. Regarding the no-copy variant, i.e., compilation error if the types are not compatible, then what you actually want here is a Map<>. Indeed, the role of Map<> is to wrap existing data into an Eigen compatible object. To make it more convenient we already considered adding a constructor taking DenseBase<> objects, and I'd be ok to do so. Here, alignment requirement is already checked at runtime through an assert, and I don't see needs for another behavior.

So if we agree that Map<> is good for the NoCopy variant, it only remains the question on runtime versus compile-time branching for Ref<>. If that only concerns alignment, then we should probably decide at runtime by default, and honestly, I'm not convinced there is real needs to make this behavior configurable.
Comment 11 Christoph Hertzberg 2014-09-27 14:17:07 UTC
I agree that the NoCopy variant boils down to a Map with a DenseBase<> constructor, but so does Ref<Matrix> at the moment.
But conceptually, I think the name `Map' more implies mapping to raw data, whereas `Ref' sounds more like referring to something which is already an Eigen object. So API-wise I like the name Ref more. 
Reading the old discussion, I remember having suggested a no-copy alternative back then, either using Ref<const Matrix&> (instead of Ref<const Matrix>) or by an additional Option flag. I slightly prefer the Option flag now, since the meaning of the additional & is harder to guess.

The mailing list discussion can be found here (initially dates back to January 2012, our archive does not interlink between months, unfortunately):
http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2012/06/msg00028.html
http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2012/07/msg00003.html
The related bug was Bug 481.
Comment 12 Gael Guennebaud 2014-09-27 15:05:29 UTC
It is easier to read the ML archive on gmane: 
http://thread.gmane.org/gmane.comp.lib.eigen/2717

The advantage of extending Map is that we do not have to introduce new options, on the other hand, I concede that users will more likely search around Ref<> rather than Map<> for such a purpose, so why not.
Comment 13 Christoph Hertzberg 2014-09-29 13:25:25 UTC
Ok, I will start trying to make Ref(const RefBase<...>&) constructors avoid to copy the m_object whenever possible (what Emily requested in comment 5). 
There will still be some border cases, e.g., copying a Ref<...,Stride<Dynamic,Dynamic>> to an ordinary Ref<> where the original Ref already holds a copy (therefore has compatible strides at run-time but not guaranteed at compile-time), or initializing a Ref<MatrixXd,Aligned> from a Ref<Matrix3d,Aligned> which technically can't be Aligned (at compile-time).

Also, does someone see any problems making Ref<...,Stride<Dynamic,Dynamic>> accept references with incompatible ordering?

For the remaining optimization, we'll need to decide about the API first:
 A) Add a NoCopy flag (other name proposals?)
 B) A new class with a different name
 C) Use Ref<const Matrix&, ...> for no-copy variant

And orthogonal to that:
 X) Add a CheckAtRuntime flag (I admit the usage is limited).
Actually, dimensions are already checked at runtime only, unless they are guaranteed not to fit:
  MatrixXd A1(3,3), A2(3,4);
  Matrix2d A3;
  Ref<const Matrix3d, NoCopy> A=Ax; // `x`={1,2,3}
    // ok for A1;
    // A2 fails at runtime
    // A3 fails at compile-time
And I agree that use cases for making A1 and A2 fail at compile-time are very limited, so indeed CheckAtRuntime would mostly concern alignment (passing a dynamic stride object to a fixed stride object is also quite rare).

To actually implement either solution, I think it makes sense to fix bug 58 first (something I admittedly intended to do for a long time ...)
Comment 14 Gael Guennebaud 2014-09-29 14:37:01 UTC
(In reply to Christoph Hertzberg from comment #13) 
> There will still be some border cases, e.g., copying a
> Ref<...,Stride<Dynamic,Dynamic>> to an ordinary Ref<> where the original Ref
> already holds a copy (therefore has compatible strides at run-time but not
> guaranteed at compile-time),

This will cost one "if" for users who already accepted to loose speed by dealing with non compile-time inner strides.

> or initializing a Ref<MatrixXd,Aligned> from a
> Ref<Matrix3d,Aligned> which technically can't be Aligned (at compile-time).

That one is pretty odd, and we have to make sure that Ref<Matrix3d,Aligned> do not claim to aligned, ignoring the Aligned requirement. This has to be documented because if the user plays with ref.data() and expect it to be aligned... 

> Also, does someone see any problems making Ref<...,Stride<Dynamic,Dynamic>>
> accept references with incompatible ordering?

This essentially means that we will iterate over the data in the wrong order (wrt memory caches) every time we access the Ref object. Copying once seems to be a better option to me. Moreover, I'm not 100% sure that it is ok to have innerstrides larger than the outerstride.

> For the remaining optimization, we'll need to decide about the API first:
>  A) Add a NoCopy flag (other name proposals?)
>  B) A new class with a different name
>  C) Use Ref<const Matrix&, ...> for no-copy variant

if B) then I'd rather extend/generalize Map<> , so not B).

I'm tempted by C) even though it might be a bit opaque for readers at the first read.

> And orthogonal to that:
>  X) Add a CheckAtRuntime flag (I admit the usage is limited).
> Actually, dimensions are already checked at runtime only, unless they are
> guaranteed not to fit:
>   MatrixXd A1(3,3), A2(3,4);
>   Matrix2d A3;
>   Ref<const Matrix3d, NoCopy> A=Ax; // `x`={1,2,3}
>     // ok for A1;
>     // A2 fails at runtime
>     // A3 fails at compile-time
> And I agree that use cases for making A1 and A2 fail at compile-time are
> very limited, so indeed CheckAtRuntime would mostly concern alignment
> (passing a dynamic stride object to a fixed stride object is also quite
> rare).

I'm still against a CheckAtRuntime option. First of all, this should be the default behavior, so let's think about a MatchAtCompileTime option. What would be the motivations to declare a Ref<> with such an option? If the input type can be validated at compile-time, then no overhead. Otherwise, if the user knows the input is compatible, he will have to play with the expression flags only to make the compiler and Eigen happy, for instance using Perhaps something If someone declares a function argument with Ref<?,MatchAtCompileTime> to possibly and calls it with an object which cannot be validated at compiletime he knows to be compatible,
Comment 15 Gael Guennebaud 2014-09-29 14:42:04 UTC
(please forget the last paragraph of my previous reply)


I'm still against a CheckAtRuntime option. First of all, this should be the default behavior, so let's think about a MatchAtCompileTime option. What would be the motivations to declare a Ref<> with such an option? If the input type can be validated at compile-time, then no overhead. Otherwise, if the user knows the input is compatible, he will have to play with the expression flags only to make the compiler and Eigen happy, for instance using a Map<> to overwrite the strides and alignement or even a Ref<> without the MatchAtCompileTime option to then pass it to the one having the option... thus losing the initial purpose of the MatchAtCompileTime option.

In other words I think that avoiding the possible minor runtime overhead of runtime checks is the responsibility of the input, not of the Ref<> object.
Comment 16 Christoph Hertzberg 2014-09-29 15:26:23 UTC
(In reply to Gael Guennebaud from comment #14)
> (In reply to Christoph Hertzberg from comment #13) 
> > There will still be some border cases, e.g., copying a
> > Ref<...,Stride<Dynamic,Dynamic>> to an ordinary Ref<> where the original Ref
> > already holds a copy (therefore has compatible strides at run-time but not
> > guaranteed at compile-time),
> 
> This will cost one "if" for users who already accepted to loose speed by
> dealing with non compile-time inner strides.

Ok, admittedly a very rare border case, I agree that we can simply always copy here.

> > or initializing a Ref<MatrixXd,Aligned> from a
> > Ref<Matrix3d,Aligned> which technically can't be Aligned (at compile-time).
> 
> That one is pretty odd, and we have to make sure that Ref<Matrix3d,Aligned>
> do not claim to aligned, ignoring the Aligned requirement. This has to be
> documented because if the user plays with ref.data() and expect it to be
> aligned... 

Agreed. I guess checking something like IMPLIES(Aligned, MatrixType::IsAligned) should be sufficient.

> > Also, does someone see any problems making Ref<...,Stride<Dynamic,Dynamic>>
> > accept references with incompatible ordering?
> 
> This essentially means that we will iterate over the data in the wrong order
> (wrt memory caches) every time we access the Ref object. Copying once seems
> to be a better option to me. Moreover, I'm not 100% sure that it is ok to
> have innerstrides larger than the outerstride.

I guess there are use cases, e.g., for rather small (dynamic) matrices where avoiding malloc is more important than inefficient data access. I think allowing dynamic InnerStride in general expresses that efficient data access is not the primary concern of the user. 
About problems with InnerStride > OuterStride I'm not sure either, even though I can't think of anything that would not work in general. (But that's hard to test extensively)

> > For the remaining optimization, we'll need to decide about the API first:
> >  A) Add a NoCopy flag (other name proposals?)
> >  B) A new class with a different name
> >  C) Use Ref<const Matrix&, ...> for no-copy variant
> 
> if B) then I'd rather extend/generalize Map<> , so not B).
> 
> I'm tempted by C) even though it might be a bit opaque for readers at the
> first read.

I'm ok with C). Apparently, we need to make the documentation more clear anyways, as it did not seem to be evident why the m_object was required after all (comment 3).


(In reply to Gael Guennebaud from comment #15)
> I'm still against a CheckAtRuntime option. First of all, this should be the
> default behavior, so let's think about a MatchAtCompileTime option. [...]

Making it default behavior is fine for me, too. Regarding alignment this requires only one simple check of a pointer which we need to calculate anyways. It does however avoid a malloc+copy in about 25%/50% of all cases (for float/double). If the compiler knows enough about the expression, it might even be smart enough to optimize that check away.

Regarding matching strides at runtime, this has admittedly barely any use cases. Basically the only practical benefit would arise when converting a Stride<Dynamic,Dynamic>(1, ?) to an OuterStride<Dynamic> Ref, but wanting optimal code here might better be achieved by reconsidering one's storage/input mechanisms.
OTOH, it would be inconsistent making runtime checks for size and alignment but not for strides and chances that someone deliberately passes (or even uses) Refs with Dynamic inner strides to other strides is not too big either.
Comment 17 Emily 2014-09-29 21:58:57 UTC
> For Ref<const Matrix>& as well as a simple (const Matrix&) the C++ 
> programmer could/would/should expect that a temporary is created if the 
> passed type is not exactly the expected type but (implicitly) convertible to 
> it. Simplest case, if you pass an int to a function expecting (const 
> double&) a temporary object is created.
> If a function takes a non-const (Matrix& m) it will only work, if the passed 
> type (or one of its base classes) matches exactly to Matrix. Currently its 
> similar for Ref<Matrix, Options, Stride>: only if the passed type directly 
> (and at compile-time) "fits" into the specification the program compiles.

This is only true for PODs. For classes that manage dynamic memory, I do not expect a copy to be made. None of the STL classes do this either.
Comment 18 Christoph Hertzberg 2014-09-30 17:21:52 UTC
We should not have unnecessary copies when initializing a Ref from another Ref anymore (except for some border cases, which I guess will barely happen in practice):
https://bitbucket.org/eigen/eigen/commits/0a993a2


(In reply to Emily from comment #17)
> This is only true for PODs. For classes that manage dynamic memory, I do not
> expect a copy to be made. None of the STL classes do this either.

That is indeed true for std containers (I guess STL is basically the same in that aspect). They avoid copies by making every constructor which could copy or transfer ownership explicit.
For example std::queue has only an explicit constructor from std::deque. Therefore: 
  std::queue<int> q1(std::deque<int>());  // possible, explicit copy
  std::queue<int> q2 = std::deque<int>(); // not possible


If we made every constructor in Eigen explicit then things like:

  Matrix3d A = Matrix3d::Map(data);

would not be possible. Also passing an expression to a function expecting a const MatrixXd& would require explicit conversion.
I think that would make using Eigen a lot less convenient. Also, we can't change that anyways, since we would break a lot of working code.
Comment 19 Emily 2014-10-01 10:21:59 UTC
I agree it's not possible to change, and it would make usage difficult.

I just wanted to comment that the motivational example of why one should expect a memory copy as was given in comment #9 didn't convince me.

Good job on fixing this, it's really nice to see a project move with such swiftness.

Note You need to log in before you can comment on or make changes to this bug.