New user self-registration is disabled due to spam. Please email eigen-core-team @ lists.tuxfamily.org if you need an account.
Before reporting a bug, please make sure that your Eigen version is up-to-date!
Bug 1001 - infer dimensions of Dynamic-sized temporaries from the entire expression (if possible)
Summary: infer dimensions of Dynamic-sized temporaries from the entire expression (if ...
Status: RESOLVED WONTFIX
Alias: None
Product: Eigen
Classification: Unclassified
Component: Core - matrix products (show other bugs)
Version: 3.4 (development)
Hardware: All All
: Low Optimization
Assignee: Nobody
URL:
Whiteboard:
Keywords:
Depends on: ExprEval
Blocks: 3.4
  Show dependency treegraph
 
Reported: 2015-04-20 15:33 UTC by Emily
Modified: 2019-02-18 21:27 UTC (History)
2 users (show)



Attachments

Description Emily 2015-04-20 15:33:50 UTC
I'm calculating "scalar = vector1.transpose() * matrix * vector2". This is the inner most time critical portion of my code, the expression is evaluated using a temporary matrix on the heap. Vectors 1 & 2 are compile-time fixed size, and the matrix is a block to a dynamically allocated matrix. There should be no need to allocate a temporary on the heap in this case. The heap allocation attributes to over 50% of my total runtime.

I would very much like if this could be optimized to not cause a temporary on the heap.

The following code illustrates my use case:

    Eigen::internal::set_is_malloc_allowed(true);
    Eigen::ArrayXXd data(1000, 1000);
    data.setRandom();

    Eigen::internal::set_is_malloc_allowed(false);

    Eigen::Ref<const Eigen::Array<double, Eigen::Dynamic, Eigen::Dynamic>> m_coeff(data.block(30,30,4, 4));

    double x0 = 0.5;
    double x1 = 0.5;

    Eigen::Vector4d X0;
    X0[0] = x0*(4 - 3 * x0) - 1;
    X0[1] = x0*(9 * x0 - 10);
    X0[2] = x0*(8 - 9 * x0) + 1;
    X0[3] = x0*(3 * x0 - 2);

    Eigen::Vector4d X1;
    X1[0] = x1*((2 - x1)*x1 - 1);
    X1[1] = x1*x1*(3 * x1 - 5) + 2;
    X1[2] = x1*((4 - 3 * x1)*x1 + 1);
    X1[3] = x1*x1*(x1 - 1);

    // Will trigger assert as transpose() is evaluated to a temporary on the heap.
    double ans = double(X0.transpose() * (m_coeff.matrix() * X1)) / 4.0;
Comment 1 Christoph Hertzberg 2015-04-22 02:23:41 UTC
X0.transpose() is not allocated on the heap, but m_coeff.matrix() * X1 is because it results in a Dynamic-sized vector.
This will eventually be fixed with evaluator-tree optimizations, but certainly not in 3.2.
If you know the size of the block (and the Ref) at compile-time, write:
  Ref<const Array<double, 4, 4> > m_coeff(data.block<4,4>(30,30));
If you access m_coeff always in a matrix-like manner, you can also use
  Ref<const Matrix4d> m_coeff(data.block<4,4>(30,30));

Furthermore, you may want to replace the last line by
  X0.dot(m_coeff.matrix() * X1) / 4.0;
Comment 2 Emily 2015-04-22 09:14:40 UTC
Thanks, those tips gave me another 30% on top of the 50% or so I gained from removing the temporary by hand (evaluated into a vector4d temporary).

I love you guys!
Comment 3 Christoph Hertzberg 2015-04-22 09:19:22 UTC
I renamed the bug to describe the remaining issue.

Further examples, that might eventually be influenced:
  MatrixXd A, B;
  Matrix4d C;
  Vector4d x,y;

  C+A*B;   // A*B must be 4x4
  x+A*y;   // A must be 4x4, A*y must be 4x1


Probably not important enough to block 3.3
Comment 4 Gael Guennebaud 2019-02-18 21:27:07 UTC
I'm not sure this is worth the effort: increase in internal logic complexity, increase compilation time for very little gain in practice because this is something that is expected to be handled by user code.

There are many more interesting optimization opportunities, e.g., mat*mat*vec just to name one.

Note You need to log in before you can comment on or make changes to this bug.