This includes matrix products because they may hold the result of the product, but we have a similar issue with solve expressions and the like. Matrix and Array objects are also nested by reference. The problem arises when someone wants to return an expression object containing some temporaries, for instance the expression of a chained product:

XXX foo(const MatA& A, const MatB& B) { return A * (B * A); }

See for instance this thread: http://forum.kde.org/viewtopic.php?f=74&t=91119

Perhaps there exists a solution allowing products to always be nested by value while avoiding memory and performance overheads. Some ideas in the case of dynamically sized objects:

* we could add a "temporary" runtime flag enabling shallow copies
* in ProductBase, we could try to explicitly do a shallow copy in the copy ctor using swap

For statically allocated objects, however, I'm stuck. I don't see any alternative to relying on the smartness of the compiler.

Also note that temporaries might be created at different stages; here is a tricky example that has to be seriously tested before pushing any changes:

(A+B) * C + D + E + F;

For general products, the expression (A+B) is evaluated during the evaluation of the product, so this temporary is not an issue. However, for simple coefficient-based products (small ones) this might be an issue.

Ideas that work only for dynamically sized objects can't work for us :( In the worst case, I guess we can switch to the NT2 approach of handling these evaluations of temporaries at evaluation time, not at construction time. However, there are two major problems with this:

1. it would prevent us from taking advantage of evaluating temporaries to limit expression tree depth;
2. while the main advantage would be to allow writing functions returning matrix product expressions, these would be less efficient, because:

auto myProductXpr = my_product_xpr_returning_function(A, B);
C = myProductXpr + D;
E = myProductXpr + F;

Here, if we evaluated the temporaries only at evaluation time, they would get evaluated twice, instead of only once as in our current approach of evaluating them at construction time.

... perhaps more fundamentally, that wouldn't solve the issues with Solve expressions which have to store a reference to the Decomposition object anyway, even though they don't evaluate any temporary. This problem prevented me from adding a solveLeastSquares() method to MatrixBase (returning jacobiSvd().solve()).

Would it be an acceptable policy to nest a) stack objects by value and b) heap objects via some sort of smart pointer?

Comparing the old and proposed approaches yields:

1) With const references, d = a*(b*c) needs 6 temporaries: 4 for (a-d), 1 per product.
2) When nesting by value, this changes to 8 temporaries: 4 for (a-d), 2 per product (i.e. 4).

At least with MSVC, no compiler optimizations are carried out. For heap objects nothing should change at all. The penalty we would thus take is 1 temporary per product for stack objects.

There are several difficulties here, the biggest being that it's hard to contain the cost of having these temporaries. For example, consider this expression:

a*b + c + d + e

The temporary for a*b would then be copied again and again when nested in each subsequent +. Is that worth it? The only problem with our current code is that one can't write functions returning nested products, right? Is that really a big deal?

Also, as mentioned in comment 2, the problem is not specific to matrix products; we basically have this problem every time we do expensive (i.e. O(n^3)) operations that benefit from using a temporary. In this respect, matrix products are closer to the various matrix decompositions than they are to the other (cheaper) matrix arithmetic expressions. I think we shouldn't be misled into trying to treat matrix products as just another matrix arithmetic expression; it's too special a case. Really we are fighting against a basic language limitation here (that one can't pass stack-allocated temporaries across function boundaries).

Naive question -- does C++0x change anything here? Are rvalue references relevant here?

Created attachment 28 [details] Test case for NestByValue

Thanks for opening this bug and taking the time to discuss it in so much depth. While the test case I originally posted is very simplistic, the real reason I want to do this is to be able to use boost::proto, in order to be able to write high-level expressions that are specific to our set of problems, but that may evaluate to an Eigen matrix expression in the end. When evaluating the (non-Eigen) expression in boost::proto, a set of nested function calls is made, invalidating any references to temporary matrices inside Eigen expressions such as GeneralProduct. My options are to either store the temporaries somewhere (I don't know if this is possible, and certainly not how), or pass the temporaries by value (which works now, see below). None of this should be of consequence for "normal" Eigen usage.

Thanks to Gael's code with the specialization for NestByValue, I was able to make long chained products work from within proto. Attached is a test case that mimics my boost::proto test case. It induces a compile error, showing the type of one of the nested products. I had to add an ei_nested specialization for NestByValue, otherwise GeneralProduct would still store its operands by reference.

Regarding compiler behavior, I never had any problems as long as I built using -O3. This seems to inline all the boost::proto stuff, so everything works. I'm not sure if this inlining will be preserved with the NestByValue approach, but if it is, I would assume that the copying of temporaries would also be avoided?

Thanks for explaining what your use case is, that helps a lot.

The point of this whole comment is to put back on the table the idea of an EXPRESSION EVALUATOR. I have been reconsidering my earlier stated opinion: I now believe that we need to switch to a top-down expression evaluator, as suggested by Gael in his forum reply, and as I objected to in comments 1 and 2.

First let me clarify what I mean by a top-down expression evaluator. In this approach, all Eigen arithmetic would return trivial expression types, e.g. just Product<Lhs, Rhs> for the expression a*b; nothing at all would happen at expression construction time. Then, when evaluating this expression (or using it in another loop such as a dot product), it would go through an "expression evaluator" metaprogram that would evaluate the needed temporaries at the places in the expression tree where temporaries are useful.

*** KILLING MY ABOVE ARGUMENTS ***

Let me then revisit my objections in comment 1, comment 2, and comment 4:

* (comment 1) about expression tree depth: not a powerful argument, since all other expressions (other than product) already allow arbitrarily deep trees, e.g. a+b+c+d+e.
* (comment 1) about the fact that directly using such an expression would be inefficient since it wouldn't by itself have evaluated intermediate steps into temporaries where that is an optimization: this is just a generalization of the fact that using matrix products as expressions (accessing them coeff by coeff) is inefficient. Moreover, for advanced users who really want to evaluate such expressions multiple times, we could let them construct an *evaluator object* (see below) and use that multiple times.
* (comment 2) about the fact that this doesn't solve other problems with dying temporaries: the big difference here is that products create temporaries automatically, without the user explicitly asking for it, so that's much harder to justify.
In other cases, when the user creates his own temporaries, he can understand that there are things he can't do.

* (comment 4) about product being too special an expression: two counterarguments: 1) users _expect_ it to be an expression like the others, and matrix algebra is all about products being algebraic expressions as usual; 2) there are special product cases, such as inner and outer products, where they actually are usual expressions with no need for temporaries, which makes it only more confusing that in other cases they would be special.

*** PROS/CONS ***

Let me list the pros/cons of going the expression-evaluator route:

Pros:
1) gives users like Bart what they expect, and allows Bart to easily implement what he wants.
2) the return type of any operator/expression is always easy to guess. Hiding complexity behind typedefs was helpful, but having the type itself be simple is even better (debugging, etc).
3) solves this whole bug 99 -- no temporaries at all nested into xprs anymore.
4) enables whole-expression optimizations on expressions involving products, such as optimizing chained products: e.g. matrix*matrix*vector becomes faster when transformed into matrix*(matrix*vector).

Cons:
1) nontrivial to implement; more work for us.
2) (false disadvantage) in the presence of nested products, the constructed expressions are no longer pre-optimized at construction time. The user would need to explicitly evaluate, or use an evaluator object, to get them optimized with temporaries. That is a false disadvantage, because plain products are already exactly like that: it's not efficient to use a plain product as an abstract expression and access it coeff by coeff anyway.
*** IMPLEMENTATION ***

Since the only real disadvantage is difficulty of implementation, let's discuss that: it's actually not that bad. Introduce a new recursive templated struct, call it evaluator<>, whose role is, given a NEW expression (not nesting any temporary), to return the corresponding OLD expression (nesting temporaries as we currently do). In other words, do exactly what Bart says in comment 5 he's trying to do --- except that it's our job, not his.

Of course, such an evaluator object can't be returned by value (that would kill the temporaries that it holds, which is the whole point of this bug 99), but the point is that we would never need to: evaluator objects would only be used as local variables inside of the function where they are constructed. For example, in operator= we would do

template<typename T>
template<typename U>
void DenseBase<T>::operator=(const U& other)
{
  assign_impl(*this, evaluator<U>(other));
}

and similarly in reductions, etc. So the whole point is that one would never need to return an evaluator, only use it locally.

The evaluator<> struct would have to be implemented for each Eigen expression class. In almost all cases it would be trivial, for example in Transpose:

template<typename MatrixType>
struct evaluator<Transpose<MatrixType> >
{
  typedef Transpose<typename evaluator<MatrixType>::type> type;
  type m_expr;
  evaluator(const Transpose<MatrixType>& x)
    : m_expr(typename evaluator<MatrixType>::type(x.nestedExpr()))
  {}
};

In the case of matrix products, it would become more subtle, as the evaluator would have to store the potentially nested Lhs and Rhs as member variables. However, the template logic here is already implemented: this is our nested<> template as used in Product.h.

Shall we in the meantime make the copy constructor of ProductBase protected? Of course with a comment... This should allow the current Eigen internal functions to keep on working as expected but at the same time it should prevent code which directly uses Product<Lhs,Rhs> from compiling.

(In reply to comment #7) > Shall we in the meantime make the copy constructor of ProductBase protected? Of > course with a comment... > > This should allow the current Eigen internal functions to keep on working as > expected but at the same time it should prevent code which directly uses > Product<Lhs,Rhs> from compiling. Sure, go ahead!

What I proposed does not work, since we are relying on public copy ctors in this function:

template<typename Derived>
template<typename OtherDerived>
inline const typename ProductReturnType<Derived,OtherDerived>::Type
MatrixBase<Derived>::operator*(const MatrixBase<OtherDerived> &other) const;

Befriending all ProductReturnTypes with MatrixBase seems to be a bit overkill, in particular when we are going for the "evaluator" fix.

Created attachment 30 [details] evaluator proof of concept

Here's a proof-of-concept patch. It implements the evaluator for a few types: Matrix, Array, MatrixWrapper, ArrayWrapper, Transpose.

Here's a test program that compiles and runs correctly with this patch applied:

#include <Eigen/Dense>
#include <iostream>
using namespace Eigen;
using namespace std;

int main()
{
  Vector2d v(1,2);
  Vector2d w(3,4);
  cout << w << endl << endl;
  w = v;
  cout << w << endl << endl;
  Array<double,1,2> u = w.transpose().array();
  cout << u << endl << endl;
}

Business plan:
1. Specialize evaluator<> for the other non-product dense expressions. See the implementation e.g. in Transpose to see how it's done. This part is easy, just 5 lines of code per expression.
2. Specialize evaluator<> for Product (dense matrix products). This is normally hard, except that all the hard template logic (e.g. to decide when to evaluate lhs/rhs into temporaries) is already there, so it's mostly a matter of correctly understanding the current code. Gael, will you help here?
3. Currently the evaluator is only used in Assign.h. We need to use it in other places such as reductions/visitors too, basically everywhere expressions get evaluated.
4. ???
5. Profit!!!

I'm glad you changed your mind, and great job on the proof of concept.

However, I'm wondering if we could not try to better decouple the semantics and the implementation. Typically, at construction time we should ideally only build a tree of classes that are as simple as possible, representing the semantics of the operation, doing some basic checks (sizes), and providing the respective common API (i.e., choosing the correct base class). And then the evaluator<SemanticExpr<>> specializations could directly be our implementation classes, i.e., our current ExprImpl<> classes.

Let me recall that currently we have for instance Transpose<>, which only provides a nestedExpression() method and which inherits TransposeImpl<>; the latter is specialized for the different storage types, and this implements the mechanism to select the appropriate base class (sparse, matrix or array).

But I agree that your current proposal has some advantages: simplicity, since it preserves all our current mechanisms. I'm just feeling that there is perhaps an opportunity for some simplifications, and to integrate the "solve" objects into the expression tree. Currently an expression is:

- a traits class
- a semantic class
- an implementation class
- an evaluator specialization...

plus some nesting rules, some special matrices with different rules, ....

(In reply to comment #11) > However, I'm wondering if we could not try to better decouple the semantic and > the implementation. Typically, at construction time we should ideally only > build a tree of as simple as possible classes representing the semantic of the > operation, doing some basic checks (sizes), and proving the respective common > API (i.e., chose the correct base class). I agree that the expressions constructed need to avoid carrying any of the evaluator complexity; and most importantly the expression _types_ need to be as simple and transparent as possible. > And then the > evaluator<SemanticExpr<>> specializations could directly be our implementation > classes, i.e., our current ExprImpl<> classes. Be careful to keep direct coeff access on expressions working! as in (a+b)(i,j). In your approach, the only way to make that work would be to have operator() in DenseBase construct a temporary evaluator object. Which I believe is acceptable (it will get inlined away; sure such coeff based access will be slow when there are temporaries but such xprs are not suitable for coeff access anyway). > > Let me recall that currently we have for instance Transpose<> that only > provides a nestedExpression() method and that inherits TransposeImpl<> which is > specialized for the different storage types, and this implements the mechanism > to select the appropriate base class (sparse, matrix or array). > > But I agree that you current proposal has some advantages : simplicity since it > preserves all our current mechanisms. I'm just feeling that there is perhaps an > opportunity for some simplifications and to integrate the "solve" objects into > the expression tree. Currently an expression is: > > - a traits class > - a semantic class > - an implementation class > - an evaluator specialization.... > > plus some nesting rules, some special matrices with different rules, .... So, I agree with all of that, but at the same time we need to release 3.0 quickly. 
Would you agree to go with my approach for now, and then proceed with your proposal in 3.1? The crucial thing to check is that it wouldn't break the API, but afaics it's OK.

Wait, I have a couple more objections :)

In order to be able to produce good compile-time errors, we still need to have expressions know their RowsAtCompileTime etc., so actually we still need traits<>. Together with the need for expressions to be coeff-addressable, and hence to keep inheriting from and implementing DenseBase, that's quite a lot of stuff that can't be taken out of them. I'm still interested to see what we can do in 3.1, but at this point I still believe that right now we should proceed with my plan.

(In reply to comment #7) > Shall we in the meantime make the copy constructor of ProductBase protected? Of > course with a comment... I would vote against this, as the current workaround with NestByValue allows me to continue, while the more elegant solution matures. Let me know when you want me to test stuff!

(In reply to comment #13)
> Wait, I have a couple more objections :)
>
> In order to be able to produce good compile-time errors, we still needs to have
> expressions know their RowsAtCompileTime etc. So actually we still need
> traits<>. Together with the need for expressions to be coeff-addressable, hence
> keep inherit and implement DenseBase, that's quite a lot of stuff that can't be
> taken out of them. Still interested to see what we can do in 3.1 but at this
> point I still believe that right now we should proceed with my plan.

Yes, after sending my previous comment I've been thinking that your approach is not that bad after all; it is a good way to have the best of each approach. There are also some ambiguities for expressions with storage, like Matrix, which anyway have to provide both the semantics and the implementation.

Side note: operator[] could return an expression... (e.g., a degenerate block)

After long thinking... I think we should do this for 3.1, not 3.0, unless you have already worked on it. It's a really far-reaching change: did you consider that it completely changes internal::nested, as this should no longer do any cost estimation at all? Basically, nested would only take care of nesting xprs by value and plain objects by reference; the cost estimation would then go into a separate helper, should_evaluate_temporary, computing a boolean value, used mostly in the evaluator<> for certain xprs like products.

Since it's far-reaching, I wanted to try to do it in 3.0, but I personally don't have time. I'm afraid of delaying 3.0 more, and I'm afraid that it could take some time for such deep changes to stabilize. Let's release 3.0 ASAP, and work on this for 3.1. The sad part is that this kind of blocks writing internal documentation on these aspects of our expression templates.

I've been thinking about the nested thing too, and it also has some consequences for the way users have to deal with function arguments that can be expressions, and probably many more...

I agree to delay this feature to 3.1, but we have to make sure to propose a mechanism for handling function arguments that is flexible enough to support the 3.0->3.1 changes. I remember we already discussed this, so I'll try to find that discussion back and open a new bug for it.

While delaying, we can still continue the discussion. For instance, I'm thinking of some temporary-elimination optimization in complex expressions such as:

E.noalias() = A*B + C*D;

which can be evaluated without any temporary as:

E.noalias() = A*B;
E.noalias() += C*D;

This implies that the evaluator<> has the destination object in hand.

(In reply to comment #17) > I've been thinking about the nested thing too, and this also has some > consequences about the way users have to deal with function arguments that can > be expressions, and probably many more... > > I agree to delay this feature for 3.1, but we have to make sure to propose a > mechanism to handle function arguments that is flexible enough to support the > 3.0->3.1 changes. I remember we already discussed about this, so I'll try to > find it back and open a new bug of it. I dont really understand the concern, an example would help me. Currently people do: template<typename Derived> void myfunction(const DenseBase<Derived> & xpr) { typename Derived::Nested x(xpr); // do some work with x ... } With the proposed change, this Nested thing here would become completely unnecessary, but it would still work --- it would just make a reference in case of a plain object, or just copy in case of an abstract xpr object. So the documentation on how to write functions taking Eigen types would become simpler, but the old way would still work, right? I am scared by your "many more" above, please clarify :) > > While delaying, we can still continue the discussion. For instance I'm thinking > to some temporary elimination optimization in complex expressions such as: > > E.noalias() = A*B + C*D; > > that can be evaluated without any temporary as: > > E.noalias() = A*B; > E.noalias() += C*D; > > This implies that the evaluator<> has in hand the destination object. Cool, I see you filed bug 110 about that. Great idea. This is only an optimization, with no API consequences, right?

(In reply to comment #18) > (In reply to comment #17) > > While delaying, we can still continue the discussion. For instance I'm thinking > > to some temporary elimination optimization in complex expressions such as: > > > > E.noalias() = A*B + C*D; > > > > that can be evaluated without any temporary as: > > > > E.noalias() = A*B; > > E.noalias() += C*D; > > > > This implies that the evaluator<> has in hand the destination object. > > Cool, I see you filed bug 110 about that. Great idea. This is only an > optimization, with no API consequences, right? sure.

Created attachment 143 [details] new proof of concept creating and storing evaluators, not expressions

Live from the Eigen meeting today (with Gael and Thomas):

Ongoing work at: https://bitbucket.org/bjacob/eigen-bug-99

*** TODO items ***

1. Implement the evaluator for all expressions except Product and ReturnByValue.
   -> who: Benoit, Thomas
   -> in progress in the above repo
2. Implement the evaluator for Product.
   -> who: Gael
3. Implement the evaluator for solvers.
   -> depends on 1
4. Implement the evaluator for inverse().
   -> depends on 3
5. Finish the refactoring around ReturnByValue.
   -> depends on 2 and 4
6. Figure out if this allows us to refactor TransposeImpl, perhaps moving stuff to evaluator<Transpose>.
   -> depends on 1
7. Refactor the BLAS traits.
   -> depends on 2
8. Allow optimizations like D=A+B*C --> D=A; D+=B*C.
   -> who: Gael
   -> depends on 2
   -> we agreed that this will be best done by partially specializing the implementation of operator= as a template struct (like the current assign_selector/assign_impl).
9. Allow optimizations like A.inverse()*B --> A.lu().solve(B).
   -> depends on 4
10. General tree optimizations, chained-products optimizations.

given the complexity of this task, the technical plan is now updated and refined on this wiki page: http://eigen.tuxfamily.org/index.php?title=Working_notes_-_Expression_evaluator

I did some work on the train last night. As I understand it, copy_using_evaluator should in the long term support vectorization and unrolling, because it needs to do what assign_impl is doing now. With this in mind, I changed copy_using_evaluator to traverse the matrix/array by inner/outer indices instead of row/column, and to use the evaluator only, not the expression objects. See the repo.

At that point, I hit a problem. The working notes say that the expression object computes the size (number of rows / columns). However, copy_using_evaluator also needs to know this, and it can only get this information from the evaluator. So, the evaluator should provide rows() and cols() methods. Perhaps I missed this discussion, but it is not clear to me how we should implement these. Possibilities I see:

1. The evaluator computes the size itself -- this duplicates the work from the expression object.
2. The rows() / cols() methods are moved from the expression object to the evaluator. This means that the expression object cannot do any sanity checks on the size.
3. The evaluator stores a reference to the expression object. Does this have implications for performance, or will the compiler optimize everything away?

What about a 4th possibility, which is to adopt the multiple-pass strategy of NT²? Well, not exactly, but in theory the assignment does take expressions as input, not evaluators, and then it creates the evaluators:

template<typename Lhs, typename Rhs>
void assign(Lhs& lhs, Rhs& rhs)
{
  Index rows = rhs.rows();
  Index cols = rhs.cols();
  // at this stage the lhs should have been properly resized, but if not
  // we can still do it:
  // lhs.resize(rows, cols);
  Evaluator<Lhs> lhs_eval(lhs);
  Evaluator<Rhs> rhs_eval(rhs);
  for(i,j)
    lhs_eval.coeffRef(i,j) = rhs_eval.coeff(i,j);
  // done :)
}

I don't really know what NT² is doing. Gael's last comment solves the problem with assign, but how can evaluators do the actual evaluation without knowing the size? For instance, how does the Product evaluator work? It seems the coeff() function needs to know the size of the arguments of the product:

template<typename LhsType, typename RhsType>
struct evaluator<Product<LhsType, RhsType> >
{
  typedef Product<LhsType, RhsType> ProductType;

  evaluator(const LhsType& lhs, const RhsType& rhs)
    : m_lhs_eval(lhs), m_rhs_eval(rhs)
  {}

  typename ProductType::CoeffReturnType coeff(Index i, Index j) const
  {
    Scalar res = 0;
    for (Index k = 0; k < /* ??? WHAT TO PUT HERE ??? */; ++k)
      res += m_lhs_eval.coeff(i,k) * m_rhs_eval.coeff(k,j);
    return res;
  }

protected:
  typename evaluator<LhsType>::type m_lhs_eval;
  typename evaluator<RhsType>::type m_rhs_eval;
};

I've just imported the bug-99 repo into the main Eigen devel branch (changeset 5faaae183ff7). The files are:

Eigen/src/Core/AssignEvaluator.h
Eigen/src/Core/CoreEvaluators.h
test/evaluators.cpp

I'm starting to get vectorization going. It's rather less scary than I'd expected; I guess that indicates that the implementation in Eigen3 is well designed. My current work is ugly in that a lot of code is replicated. I think that some of this can be avoided if the evaluator inherits from DenseCoeffBase. However, I don't understand the overall design well enough to decide whether this is a good idea. Comments, anybody?

If I remember correctly, I think that the only methods that can be factored out are the copy* methods. Currently they are only used by the self-binary-op (for optimized +=, -=, ...) and by swap. For these use cases, perhaps a cleaner design would be to generalize the dense traversal routines with a little struct allowing one to customize the assignments. There would be a default implementation simply doing dst.coeffRef(i,j) = src.coeff(i,j), and then swap/self-cwise-binary-op could implement their own. Basically, this boils down to moving the copy* methods from DenseBase into a separate, well-defined class.

Indeed, when I look at the actual methods provided by DenseCoeffBase, it becomes clear that it does not make sense for evaluators to inherit from there. But your explanation was very useful in explaining why the copyCoeff() and copyPacket() methods are there. I was wondering about that, because they seem to do something completely trivial, but they are apparently used to implement += etc.

When I tried to implement an evaluator for the Block expression (needed to test slice vectorization), I hit a snag: Block does not expose its nested argument, i.e. the expression that the block is taken out of. Would it be okay to add a nested() method for this purpose, surrounded by #ifdef EIGEN_ENABLE_EVALUATORS guards? That would not break any tests.

Sure, feel free to add a nestedExpression() method, and no need to guard it.

Created attachment 156 [details] Use evaluators in assignments and VERIFY_IS_APPROX

A small progress report. At the moment, copy_using_evaluator() supports vectorization and unrolling. It's a bit ugly, because the responsibilities between evaluator and expression objects are not clearly separated at the moment. It would help if the evaluator objects could map from (outer,inner) indices to (row,col) indices. Indeed, they should be able to do so eventually, because this depends on storage order, and storage order is the responsibility of the evaluator object.

Some of the core classes have evaluators: Matrix, Array, Transpose, CwiseNullaryOp, CwiseUnaryOp, CwiseBinaryOp, CwiseUnaryView, Map, Block, Select, Replicate, PartialReduxExpr.

The attached patch makes the assignment operator and VERIFY_IS_APPROX use evaluators. This breaks all tests that use expressions for which there are no evaluators. However, at least two unit tests pass: array and basicstuff.

The biggest missing evaluator is the one for products, and that is also the most interesting / difficult case, because it is where there is a use for temporary objects for intermediate results. I'm still leaving this for Gael.

Created attachment 165 [details] Use evaluators in assignments and VERIFY_IS_APPROX

After changeset 3e4f673bd316, .swap() and compound assignments (operator+= etc.) also work with evaluators, and the expressions MatrixWrapper, ArrayWrapper, Reverse and Diagonal also have evaluators. The attached patch is the updated version for changing lazyAssign() and VERIFY_IS_APPROX to use evaluators. After this, about half the tests run fine (see http://eigen.tuxfamily.org/CDash/buildSummary.php?buildid=5970 ), so we are making some progress. Most of the failing tests are caused by products, which are still not supported, or by some of the expressions in the Geometry module.

I don't know how to continue, so I would like some guidance on what to do next. There is some more low-hanging fruit, but I am not sure whether it's wise to pick it, because we may have to change the design of the evaluators when we try to do products. Thus, I'd like to wait till somebody tackles products. I feel rather unsure about whether I'm able to do so myself; for instance, it's not clear to me where the decision should be taken whether a product should be evaluated or not. Perhaps a first step would be to decide this in the evaluator for the expression in which the product is nested, just as it is done now.

I just committed a bunch of stuff to do with evaluators for products (revisions dee6dd368db3 - 178f7f2e8375). I felt it necessary to change the design of the evaluators, and I want to explain my reasoning here. Probably the design will require further changes; for one, it's a bit more complicated than I'd like, but perhaps that's necessary. Consider the statements:

MatrixXf A, B, C;
A = B * C;            // without noalias
A.noalias() = B * C;  // with noalias

The expression B * C evaluates to an expression object of type Product which contains references to B and C but is not responsible for evaluating the product. The assignment operator constructs a product evaluator which takes care of the evaluation. In the first assignment (without noalias), the product evaluator should evaluate the product into a temporary; in the second assignment (with noalias), it should not, but instead write the result immediately into A. However, the current design says that all evaluators need to implement .coeff().

I propose - and implemented - the following change to the design:

*** There are two kinds of evaluators:       ***
*** one kind which implements .coeff(), and  ***
*** another kind which implements .evalTo()  ***

Products (at least in the dynamic-size case) are of the latter kind, and the ReturnByValue class will probably be merged with this. An evaluator_traits class is used to distinguish between the two cases. In the assignment operator, we check the traits of the evaluator of the right-hand side, and either traverse coefficient-wise if the evaluator implements .coeff(), or call .evalTo() if the evaluator implements that.

There remain some details. Firstly, in an assignment without noalias(), if the right-hand side is a product, it should be evaluated into a temporary so that stuff like A = A * B still works.
For this, I introduced an EvalToTemp expression object which indicates that the expression should be evaluated into a temporary, and I added a function to the assignment operator for adding this EvalToTemp if necessary. Secondly, in expressions like A * B + C, the evaluator for the sum needs to access the coefficients of both arguments, A * B and C. However, the evaluator for A * B does not implement .coeff() but only .evalTo(). So, in this case we also need to evaluate the product A * B into a temporary using EvalToTemp, and the evaluator for the EvalToTemp expression exposes .coeff() . Comments welcome!
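A minimal sketch of this two-kinds-of-evaluators design, with toy vector expressions standing in for Eigen's types (all names here -- CoeffBasedTag, EvalToTag, SumEvaluator, etc. -- are illustrative, not Eigen's actual identifiers):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

struct CoeffBasedTag {};  // evaluator implements coeff(i)
struct EvalToTag {};      // evaluator implements evalTo(dst)

// A coefficient-wise expression (a + b), evaluated one coefficient at a time.
struct SumExpr { const Vec &a, &b; };
struct SumEvaluator {
    using Kind = CoeffBasedTag;
    const SumExpr& xpr;
    std::size_t size() const { return xpr.a.size(); }
    double coeff(std::size_t i) const { return xpr.a[i] + xpr.b[i]; }
};

// A "product-like" expression whose evaluator writes the whole result at once.
struct ScaleExpr { const Vec& a; double s; };
struct ScaleEvaluator {
    using Kind = EvalToTag;
    const ScaleExpr& xpr;
    void evalTo(Vec& dst) const {
        dst.resize(xpr.a.size());
        for (std::size_t i = 0; i < dst.size(); ++i) dst[i] = xpr.s * xpr.a[i];
    }
};

// The assignment routine dispatches on the evaluator's kind.
template <class Ev>
void assign_impl(Vec& dst, const Ev& ev, CoeffBasedTag) {
    dst.resize(ev.size());
    for (std::size_t i = 0; i < dst.size(); ++i) dst[i] = ev.coeff(i);
}
template <class Ev>
void assign_impl(Vec& dst, const Ev& ev, EvalToTag) {
    ev.evalTo(dst);  // the evaluator owns the whole evaluation
}
template <class Ev>
void assign(Vec& dst, const Ev& ev) {
    assign_impl(dst, ev, typename Ev::Kind{});
}
```

With this split, the assignment loop never needs to know what a product is; it only inspects the evaluator's declared kind.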

Sorry, I will need some time to remember all the details, but wouldn't it be possible to simply rely on partial specialization to handle these use cases? I mean something like: template<typename Dest, typename ProdLhs, typename ProdRhs> struct assign<Dest, Product<ProdLhs,ProdRhs> > { ...; }; Of course this requires some changes in how assignments are handled. I will elaborate more on this later. Regarding EvalToTemp, I have to look more closely at how it is used, but at the moment it sounds good.

btw, regarding temporaries, we already had a quite precise plan for them: http://eigen.tuxfamily.org/index.php?title=Working_notes_-_Expression_evaluator#Temporaries

(In reply to comment #35) > btw, regarding temporaries, we already had a quite precise plan for them: > > http://eigen.tuxfamily.org/index.php?title=Working_notes_-_Expression_evaluator#Temporaries Yes, I tried to follow that as closely as possible. In particular, every evaluator stores its "sub" evaluator(s) by value, and temporaries are also stored by value. However, I was unsure how to implement: "Then, when an expression knows that it has to be evaluated itself, it can simply store a PlainObject result member ..." How can an expression know that it has to be evaluated itself? The EvalToTemp evaluator is, as I see it, the combination Lhs::PlainObject + evaluator<Lhs::PlainObject> in the final example in that section but wrapped into one object.

Created attachment 295 [details] proof of concept v2 I attached a new proof of concept with evaluators, tree optimizers, a generic Assignment<> mechanism, etc.

I did an important refactoring there: https://bitbucket.org/eigen/eigen/commits/c0b27cc027eb/ Changeset: c0b27cc027eb User: ggael Date: 2013-11-06 18:17:59 Summary: Bug 99: refactor assignment and compound assignment mechanism through "assignment functors" and "assignment kernels". The former is very low level and generic. The latter abstracts the former for dense expressions. This refactoring permits getting rid of the very ugly SwapWrapper and SelfCwiseBinaryOp classes. In the future, this will also permit simplifying all these evaluation loops and perhaps reusing them for reductions. It will also permit specializing operations like expr1 += expr2 outside Eigen, and thus for any kind of expressions (dense, sparse, tensor, etc.) This change will also permit some more cleaning. After that I plan to work on the products.

Created attachment 400 [details] Proof of concept V3 New version of the minimalist proof-of-concept example. This version is closer to what we currently have in Eigen, but more importantly it shows how to handle different kinds of evaluation mechanisms (dense, sparse, etc.). Indeed, it is important to be able to specialize both evaluators and assignments for the different kinds of storage. To this end, the proposal is to add a template parameter to the Evaluator and Assignment classes. For the former, I currently see only two possibilities: - one with read-write Coeff/Packet accessors for sequential storage (dense, triangular, band, etc.) - one based on iterators for reading, and "push back" for writing. Perhaps "eval-to-temp" is one more evaluator possibility. The proof of concept uses enums to distinguish the two cases. In Eigen I think we should use types so that this mechanism can be more easily extended outside Eigen. For Assignment, the situation is more complicated because there are many more situations to handle, as the destination and source can each be Dense, Diagonal, Band, Triangular, SelfAdjoint, Sparse, etc. This makes 36 possibilities for the six kinds listed alone, among which some are not allowed (e.g., Dense to Sparse, or Dense to Diagonal) and some can probably be handled with the same general mechanism (e.g., Band, Triangular, SelfAdjoint). So here the idea is to select the kind of assignment based on assignment_kind<DstEvaluatorKind,SrcEvaluatorKind>::Kind. Again, here Kind should be a type and not an enum as in the proof of concept. The big question is whether these two additional template parameters (one for Evaluator, and one for Assignment) are enough. What if someone wants to specialize an evaluator for a given scalar type, e.g., for autodiff at the matrix level? What if someone wants to specialize on any other attribute?

In order to keep the devel branch quiet with respect to compilation issues, the development of this feature is now happening there: https://bitbucket.org/ggael/eigen-evaluators

I'm making good progress. Most of the features of the Core module are now supported by the evaluators and I've just synced the evaluator branch with the default one. Regarding the Core module, it mostly remains to port the 'solve' interface, which should not be difficult. However, I'm facing a tough problem regarding coefficient-based products. Indeed, in some configurations, evaluator<Product<...> >::coeff(i,j) is best implemented as: lhs.row(i).transpose().cwiseProduct(rhs.col(j)).sum() See for instance src/Core/ProductEvaluators.h, line 504 (https://bitbucket.org/ggael/eigen-evaluators/src/84b068fb83db03f9ec4a60b58fecc7d61dc3ddfd/Eigen/src/Core/ProductEvaluators.h?at=default#cl-504) The problem is that here 'lhs' and 'rhs' have to be evaluators, not expressions. Therefore the row(i) and col(j) methods are not available, and we do not want to re-implement this reduction here because there are many cases to handle regarding vectorization... I have no idea how to address this issue elegantly, so any ideas are welcome. I'll try a kind of evaluator-to-expression wrapper, but that seems rather ugly.

Finally, I worked around the issue by simplifying the implementation of evaluator<Product<...> >::coeff(i,j): https://bitbucket.org/ggael/eigen-evaluators/commits/419051e454bad67410139614fc9dfa6b125ef1ee
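For illustration, a simplified coeff(i,j) of this sort can be written as a plain accumulation loop over the nested evaluators, which sidesteps the missing row()/col() API entirely (toy types and hypothetical names, not the actual commit):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for evaluator<PlainObject>: plain row-major storage with coeff access.
struct DenseEvaluator {
    std::vector<double> data;
    std::size_t rows, cols;
    double coeff(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Product evaluator: coeff(i,j) is a plain dot product over the inner
// dimension, written directly against the sub-evaluators.
struct ProductEvaluator {
    DenseEvaluator lhs, rhs;  // sub-evaluators stored by value
    double coeff(std::size_t i, std::size_t j) const {
        double res = 0.0;
        for (std::size_t k = 0; k < lhs.cols; ++k)
            res += lhs.coeff(i, k) * rhs.coeff(k, j);
        return res;
    }
};
```

The trade-off is that this loop forgoes the vectorized reduction kernels one would get from row(i).cwiseProduct(col(j)).sum() on expressions.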

Open question: what should be the return type of A.solve(B)? Here 'A' can be any object having a solve method (TriangularView, LDLT, JacobiSVD, etc.). I'm tempted to return a combined expression of the form: Product<Inverse<type_of_A>, type_of_B> because we want to support such an API anyway. In other words, A.solve(B) would be an alias for A.inverse() * B. The problem is that A.solve(B) does not always mean A.inverse() * B. Some examples: * For SVD or QR, it's rather A.pseudoInverse() * B. * For an iterative solver, inverse does not really make sense. Same for a preconditioner. * For a non-linear solver, this is even worse. So it seems we would need a Solve expression anyway, but what about the solve methods that really are aliases of an inverse product? Shall they return a Solve expression to be consistent across all solve methods, or shall they return an inverse product to avoid redundancy and get a more uniform treatment across the two APIs?

First of all, thanks for the great work you are doing on this! Since you explicitly added me to the cc-list, I'll try to give some half-educated suggestions (maybe they raise even more questions, though). Basically, your alternatives for the Solve expression seem to be Product<Inverse<type_of_A>, type_of_B> and Solution<type_of_A, type_of_B> (or maybe "Backslash<type_of_A, type_of_B>", following Matlab nomenclature?). I assume that from the evaluators' viewpoint the second comes more naturally. Would it be a big problem to let A.inverse() * B automatically return Solution<A,B>? A pro of the first alternative, I assume, would be that expressions like A.inverse()*B.inverse()*C* (...) will be more natural. Furthermore, B*A.inverse() does not require its own ReturnType, nor some complicated tree such as Transpose<Solution<Transpose<type_of_A>, Transpose<type_of_B> > >. And the evaluator will have to decide in which order to best evaluate that anyway? E.g., `B * A.inverse() * x` should usually first evaluate A.solve(x) and then multiply by B (unless B is smaller than x). Is there a point where we need to distinguish between Inverse and PseudoInverse? If the inverse exists, it is equal to the pseudo-inverse. However, the pseudo-inverse usually needs some kind of additional "epsilon" parameter, and the iterative solvers need some kind of stopping criterion. And I think the non-linear solvers require a different interface anyway, starting from the way the non-linear functor is passed.

Somewhat related is bug 631: What shall happen if a failed decomposition is used for solve() or inverse()?

(In reply to comment #43) > Open question: what should be the return type of A.solve(B) ? > [...] > I'm tempted to return a combined expression of the form: > > Product<Inverse<type_of_A>, type_of_B> > [...] [ this was written before I read Christoph's reply ] I did not study what you did in your branch yet, so this may well be off the mark, but my first thought is almost opposite to yours. I would have A.solve(B) return a Solve expression and A.inverse() * B return a Product<Inverse> expression which is then rewritten as a Solve expression before evaluation. I think this would separate the three phases (build expression tree, rewrite tree, evaluate) most cleanly. In my mind, building the tree should be trivial, the maths should go in the rewriting, and the computer science in the evaluation. (I know this is an over-simplification and especially the last two phases are not that well separated). Even if A is a TriangularView, there is a difference between computing its inverse and multiplying it with b and solving without computing the inverse. If I remember correctly, there are different implementations for A.inverse() * b depending on the size of A? Another concern with having A.solve(B) return Product<Inverse> is what if we decide later that we want A.solve(B) to have a different result than A.inverse() * B in some exceptional circumstances (e.g., A not invertible)? This is perhaps theoretical as I can't see a good reason for such a decision at the moment, but it would be a shame if we had to change the return type.
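The rewrite step suggested here can be sketched at the type level with a partial specialization that maps the naively built tree onto a Solve node (all type names, including MatA/MatB, are illustrative):

```cpp
#include <cassert>
#include <type_traits>

// illustrative expression node tags
template <class T> struct Inverse {};
template <class L, class R> struct Product {};
template <class A, class B> struct Solve {};

struct MatA {};  // hypothetical matrix types for the demonstration
struct MatB {};

// default: a node is left unchanged by the rewrite phase
template <class Xpr> struct rewrite { using type = Xpr; };

// A.inverse() * B is rewritten as Solve(A, B) before evaluation
template <class A, class B>
struct rewrite<Product<Inverse<A>, B> > { using type = Solve<A, B>; };
```

More rules (e.g., for B * A.inverse(), inserting the needed transposes) would be added in the same style, keeping all of the maths in the rewrite phase.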

(In reply to comment #41) > However, I'm facing a tough problem regarding coefficient-based products. > Indeed, in some configurations, evaluator<Product<...> >::coeff(i,j) is best > implemented as: > > lhs.row(i).transpose().cwiseProduct(rhs.col(j)).sum() > > See for instance src/Core/ProductEvaluators.h, line 504 > (https://bitbucket.org/ggael/eigen-evaluators/src/84b068fb83db03f9ec4a60b58fecc7d61dc3ddfd/Eigen/src/Core/ProductEvaluators.h?at=default#cl-504) > > The problem is that here 'lhs' and 'rhs' have to be evaluators, not > expressions. Therefore the row(i) and col(j) methods are not available and we > do not want to re-implement here this reduction because there are many cases to > handle regarding vectorization... I encountered a similar problem with the evaluator for A.colwise().sum() (type PartialRedux or something like that), which I did not know how to solve. I think I did the same as you, storing the expression in the evaluator object, which does not feel good but seems to work.

At some point, we definitely want to support the more natural API: B * A.inverse() A.inverse() * B and even: A.transpose().inverse() * B Again, here 'A' can be a decomposition, e.g.: PartialPivLU<MatrixXd> lu(mat); B * lu.transpose().inverse(); which is much more readable than a Lapack-like style: lu.solve<MakeTranspose,OnTheRight>(B) Actually, I like to think of the decomposition classes as just a different kind of storage for matrices, and in the future I'd like to extend their API to reflect that vision... I was just wondering about the return type of the 'solve' API... Perhaps I'm just overthinking this, and so far I've taken the simple route of introducing a Solve<> expression.
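A Solve<> expression along these lines might look as follows -- a sketch with a toy diagonal "decomposition", where the expression only records its operands and the decomposition supplies the algorithm at evaluation time (names like SolveXpr and solveImpl are hypothetical, not Eigen's API):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// The Solve expression stores only references; no computation happens here.
template <class Dec>
struct SolveXpr {
    const Dec& dec;
    const Vec& rhs;
};

// Toy "decomposition" of a diagonal matrix: solving is element-wise division.
// An LDLT, QR, or iterative solver would plug in its own solveImpl here.
struct DiagonalDec {
    Vec d;
    SolveXpr<DiagonalDec> solve(const Vec& b) const {
        return SolveXpr<DiagonalDec>{*this, b};
    }
    void solveImpl(const Vec& b, Vec& x) const {
        x.resize(b.size());
        for (std::size_t i = 0; i < b.size(); ++i) x[i] = b[i] / d[i];
    }
};

// Assignment triggers the evaluation, deferring the algorithm to the decomposition.
template <class Dec>
Vec evaluate(const SolveXpr<Dec>& xpr) {
    Vec x;
    xpr.dec.solveImpl(xpr.rhs, x);
    return x;
}
```

Because the algorithm lives in the decomposition, the same Solve<> node works uniformly for direct solvers, pseudo-inverses, and iterative solvers.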

(In reply to comment #47) > (In reply to comment #41) > > The problem is that here 'lhs' and 'rhs' have to be evaluators, not > > expressions. Therefore the row(i) and col(j) methods are not available and we > > do not want to re-implement here this reduction because there are many cases to > > handles regarding vectorization... > > I encountered a similar problem with the evaluator for A.colwise().sum() (type > PartialRedux or something like that), which I did not know how to solve. I > think I did the same as you, storing the expression in the evaluator object, > which does not feel good but seems to work. Yes, I cannot see any better option at the moment.

(In reply to comment #46) > I did not study what you did in your branch yet, so this may well be off the > mark, but my first thought is almost opposite to yours. I would have A.solve(B) > return a Solve expression and A.inverse() * B return a Product<Inverse> > expression which is then rewritten as a Solve expression before evaluation. Good, finally that's the path I followed.

Question: - Shall we keep the packet/writePacket accessors? They are documented as internal, so in theory they are not part of the "public/stable" API. In Eigen they are only used internally to implement some SSE specializations for inverse and quaternion, but there we can explicitly create an evaluator. If we keep them, their implementation should look like the implementation of coeff: return typename internal::evaluator<Derived>::type(derived()).packet(index);

(In reply to comment #51) > - Shall we keep the packet/writePacket accessors? They are documented as > internal, so in theory they are not part of the "public/stable" API. [...] In my user code I sometimes use them as convenience functions when writing the results of custom SSE operations into Eigen::Arrays. But I can easily replace writePacket() with the corresponding SSE function along with `&(arr.coeff(i,j))` if you don't want to keep them anymore. Actually, it would be even nicer to make it easier to implement custom unary/binary functors with vectorization -- including functors where input and output operands don't all have the same size.

I forgot to update this thread, but I'm glad to announce that as of the 1st of August, all unit tests compile and run fine with evaluators enabled and deprecated code disabled :) I benched a few representative unit tests and I did not observe any change in compilation speed or running time, which is pretty good too ;) However, this does not mean that everything has been properly ported to the new evaluator mechanism. For instance, Transform still uses custom types as the return type of products. So there is still a lot of work to do! Nonetheless, this is still a great achievement, as this branch can now be tested on real-world user code. We can now safely continue the migration step by step.

(In reply to Gael Guennebaud from comment #53) > I forgot to update this thread, but I'm glad to announce that as the 1st of > August, all unit tests compile and run fine with evaluator enabled and > deprecated code disabled Great news! Time to merge?

Merge done. The old code has been removed too. Next steps: - get rid of the remaining pseudo expressions implemented on top of ReturnByValue<> - migrate the Geometry module, in particular the handling of products with Transform<> - update unsupported modules such as MatrixFunctions and KroneckerProduct (they are working, but their design can be simplified to take advantage of the new engine) - more cleaning and documentation

Hi Gael, There seems to be a bug in the recent big merge. The following no longer compiles: Eigen::Vector3d a = Eigen::Vector3d::Random(), b = Eigen::Vector3d::Random(); Eigen::Matrix3d M = Eigen::Matrix3d::Random(); Eigen::Vector3d c = a.cross(M*b) << std::endl; I am using g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2 and changeset 6440:c18103ee31bd The error message is: eigen/Eigen/src/Core/util/StaticAssert.h:114:9: error: ‘THIS_METHOD_IS_ONLY_FOR_INNER_OR_LAZY_PRODUCTS’ is not a member of ‘Eigen::internal::static_assertion<false>’ if (Eigen::internal::static_assertion<static_cast<bool>(CONDITION)>::MSG) {} ^ eigen/Eigen/src/Core/Product.h:199:7: note: in expansion of macro ‘EIGEN_STATIC_ASSERT’ EIGEN_STATIC_ASSERT(EnableCoeff, THIS_METHOD_IS_ONLY_FOR_INNER_OR_LAZY_PRODUCTS); Thank you!

Sorry, there was a mistake in my code snippet. The line causing the compilation error should have been: Eigen::Vector3d c = a.cross(M*b);

(In reply to jsmt3491 from comment #56) Thank you for the report, fixed: https://bitbucket.org/eigen/eigen/commits/714148561d4f/ Changeset: 714148561d4f User: ggael Date: 2014-09-24 07:39:09+00:00 Summary: Fix nested_eval<Product<> > which wrongly returned a Product<> expression

Thanks Gael. Here is another one: Eigen::Vector3d a = Eigen::Vector3d::Random(), b = Eigen::Vector3d::Random(); Eigen::SimplicialLLT<Eigen::SparseMatrix<double>,Eigen::Upper> M; // ... a = M.matrixU()*b; The error message is eigen/Eigen/src/Core/ProductEvaluators.h:138:69: error: incomplete type ‘Eigen::internal::generic_product_impl<Eigen::TriangularView<Eigen::Transpose<const Eigen::SparseMatrix<double> >, 2u>, Eigen::Matrix<double, 3, 1>, Eigen::internal::SparseTriangularShape, Eigen::DenseShape, 1>’ used in nested name specifier generic_product_impl<Lhs, Rhs>::evalTo(dst, src.lhs(), src.rhs()); There is also a minor issue with the conditional ? : operator. I am now using eval() for both results if they are expressions, but previously it wasn't necessary...

(In reply to jsmt3491 from comment #59) > Here is another one: Fixed: https://bitbucket.org/eigen/eigen/commits/c8f4a2e8b571/ Changeset: c8f4a2e8b571 User: ggael Date: 2014-10-06 14:11:26+00:00 Summary: Re-enable products with triangular views of sparse matrices: we simply have to treat them as a sparse matrix. > There is also a minor issue with the conditional ? : operator. I am now > using eval() for both results if they are expressions, but previously it > wasn't necessary... I would recommend using an if/else construct instead, because the ternary operator does not play nicely with expression templates: both the then and else operands must have the same type, which in general forces evaluating the expressions into temporaries.
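To illustrate why the ternary operator is problematic with expression templates, here is a toy example (SumXpr/DiffXpr are made-up stand-ins for Eigen's expression types):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Two distinct expression types, as produced by a + b and a - b.
struct SumXpr {
    const Vec &a, &b;
    Vec eval() const {
        Vec r(a.size());
        for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] + b[i];
        return r;
    }
};
struct DiffXpr {
    const Vec &a, &b;
    Vec eval() const {
        Vec r(a.size());
        for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] - b[i];
        return r;
    }
};

Vec pick(bool useSum, const Vec& a, const Vec& b) {
    // Vec r = useSum ? SumXpr{a, b} : DiffXpr{a, b};  // ill-formed: the two
    // branches have distinct types with no common conversion.
    Vec r;
    if (useSum) r = SumXpr{a, b}.eval();  // only the chosen branch is built
    else        r = DiffXpr{a, b}.eval(); // and evaluated
    return r;
}
```

The commented-out ternary line does not compile; the if/else form stays lazy in the sense that only the selected expression is ever constructed and evaluated.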

At this stage, we can close this entry and open new specific ones if needed.

-- GitLab Migration Automatic Message -- This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/99.