Expression templates

From Eigen
Revision as of 14:16, 25 June 2008 by Bjacob (Talk | contribs)

Jump to: navigation, search

Expression templates refers to a c++ coding technique that was discovered in the 90's, which can greatly improve the performance and the API cleanness and expressiveness of certain kinds of c++ template libraries, and is especially useful for a vector/matrix library such as Eigen. First, some links: wikibooks and the links given there.

Motivation : better performance through lazy evaluation

Suppose that we want to write a 3D Vector class. The standard approach is:

class Vector
{
    float x, y, z;
  public:
    Vector() {}
    Vector(float _x, float _y, float _z) : x(_x), y(_y), z(_z) {}
    Vector operator+(const Vector& other) const
    {
      return Vector(x+other.x, y+other.y, z+other.z);
    }
    void operator=(const Vector& other)
    {
      x = other.x;
      y = other.y;
      z = other.z;
    }
};

With such a class, we can compute sums of vectors using operator+, like this:

Vector v1, v2, v3, v4;
v1 = v2 + v3 + v4;

As operator+ and operator= get inlined, the compiler will produce this code:

Vector tmp1, tmp2;
tmp1.x = v2.x + v3.x;
tmp1.y = v2.y + v3.y;
tmp1.z = v2.z + v3.z;
tmp2.x = tmp1.x + v4.x;
tmp2.y = tmp1.y + v4.y;
tmp2.z = tmp1.z + v4.z;
v1.x = tmp2.x;
v1.y = tmp2.y;
v1.z = tmp2.z;

This is performing no less than 9 float copies, while it was possible to do the same with only 3 copies:

v1.x = v2.x + v3.x + v4.x;
v1.y = v2.y + v3.y + v4.y;
v1.z = v2.z + v3.z + v4.z;

Depending on cases, it is possible that the compiler can do that optimization automatically, but in our experience, outside of really simple operations (like perhaps the one we are discussing), they can't. This is a difficult operation for the compiler to make, as it requires to reorder instructions.

Thus, the removal of temporaries, also known as "lazy evaluation", can be a major optimization. For example, for arithmetic on 3D vectors, we typically measure speed improvements between x1.5 and x2. We will see that expression templates allow that.

Motivation : better API through lvalue-returning methods

Now suppose that we have a 2D vector class, like this:

class Vector2
{
    float x, y;
  public:
    Vector2() {}
    Vector2(float _x, float _y) : x(_x), y(_y) {}
    Vector2 operator+(const Vector2& other) const
    {
      return Vector(x+other.x, y+other.y);
    }
    void operator=(const Vector2& other)
    {
      x = other.x;
      y = other.y;
    }
    void operator+=(const Vector2& other)
    {
      x = other.x;
      y = other.y;
    }
};

We now want to write a 3D vector class that can interplay with the 2D vector class to manipulate the two first coefficients, which we call the "head". The standard approach is as follows:

class Vector3
{
    float x, y, z;
  public:
    Vector3() {}
    Vector3(float _x, float _y, float _z) : x(_x), y(_y), z(_z) {}
    Vector3 operator+(const Vector3& other) const
    {
      return Vector3(x+other.x, y+other.y, z+other.z);
    }
    void operator=(const Vector3& other)
    {
      x = other.x;
      y = other.y;
      z = other.z;
    }
    Vector2 head() const
    {
      return Vector2(x, y);
    }
    void setHead(const Vector2& other)
    {
      x = other.x;
      y = other.y;
    }
};

Let us leave aside the performance issues with head() returning a vector by value: we already discussed them in the previous section. Instead, let us focus on the API's cleanness and expressiveness.

It is clearly impossible to set the head of a vector by writing

Vector2 v2; Vector3 v3;
v3.head() = v2;

as this would copy v2 into the temporary vector returned by head(), not into v3 itself. Hence, this would have no effect on v3. This is why we introduced setHead() so that it is possible to do:

v3.setHead(v2);

Thus, we already have introduced two different methods to reach the current API expressiveness. Now the problem is, even with that, the API is still not very powerful. Suppose that we now want to add v2 into the head of v3. Again, ideally we would like to do

v3.head() += v2;

but again this is not possible, for the same reason. The object returned by head() is not a lvalue -- at least it makes no sense to use it as such, and it should probably be marked as const to make sure that trying to use it as a lvalue generates a compiler error.

So if we want to add v2 to the head of v3, either we can do

Vector2 tmp = v3.head();
tmp += v2;
v3.setHead(tmp);

which is slow for the same reason discussed in the previous section, or we need to add another method Vector3::addToHead(), which makes our API more complicated. Vector libraries taking this way end up with dozens of such methods to handle many kinds of usual operations.

The ideal solution is to make head() return an actual lvalue so that

v3.head() += v2;

just works. This is made possible by expression templates.

Simple Vector class with expression classes, without templates

Simple Vector class with expression templates, without inheritance

Simple Vector class with expression templates in Eigen's style

Guided tour of Eigen's expression templates