Expression templates

From Eigen
Jump to: navigation, search

Expression templates refers to a c++ coding technique that was discovered in the 90's, which can greatly improve the performance and the API cleanness and expressiveness of certain kinds of c++ template libraries, and is especially useful for a vector/matrix library such as Eigen. First, some links: wikibooks and the links given there.

Motivation : better performance through lazy evaluation

Suppose that we want to write a 3D Vector class. The standard approach is:

class Vector
{
    float x, y, z;
  public:
    Vector() {}
    Vector(float _x, float _y, float _z) : x(_x), y(_y), z(_z) {}
    Vector operator+(const Vector& other) const
    {
      return Vector(x+other.x, y+other.y, z+other.z);
    }
    void operator=(const Vector& other)
    {
      x = other.x;
      y = other.y;
      z = other.z;
    }
};

With such a class, we can compute sums of vectors using operator+, like this:

Vector v1, v2, v3, v4;
v1 = v2 + v3 + v4;

The problem here is that operator+ stores the result of its computation in a temporary variable. Introducing temporaries here is in fact useless and inefficient. Indeed, assuming that operator+ and operator= get inlined, the compiler will produce this code:

Vector tmp1, tmp2;
tmp1.x = v2.x + v3.x;
tmp1.y = v2.y + v3.y;
tmp1.z = v2.z + v3.z;
tmp2.x = tmp1.x + v4.x;
tmp2.y = tmp1.y + v4.y;
tmp2.z = tmp1.z + v4.z;
v1.x = tmp2.x;
v1.y = tmp2.y;
v1.z = tmp2.z;

This is performing no less than 9 float copies, while it was possible to do the same with only 3 copies:

v1.x = v2.x + v3.x + v4.x;
v1.y = v2.y + v3.y + v4.y;
v1.z = v2.z + v3.z + v4.z;

Depending on cases, it is possible that the compiler will do that optimization automatically, but in our experience, outside of really simple operations (like perhaps the one we are discussing), it can't. This is a difficult optimization for the compiler to make, as it involves reordering instructions.

Thus, the removal of temporaries, also known as "lazy evaluation", can be a major optimization. For example, for arithmetic on 3D vectors, we typically measure speed improvements around x1.5. We will see that expression templates allow that.

Motivation : better API through lvalue-returning methods

Now suppose that we have a fixed-size vector class, with the size as template parameter, and with start() and setStart() methods to manipulate sub-vectors formed of the first coefficients, like this:

template<int Size>
class Vector
{
    float data[Size];
  public:
    Vector() {}
    void operator=(const Vector& other)
    {
      for(int i=0;i<Size;i++) data[i] = other.data[i];
    }
    void operator+=(const Vector& other)
    {
      for(int i=0;i<Size;i++) data[i] += other.data[i];
    }
    template<int N> Vector<N> start() const
    {
      Vector<N> result;
      for(int i=0;i<N;i++) result.data[i] = data[i];
      return result;
    }
    template<int N> void setStart(const Vector<N>& other)
    {
      for(int i=0;i<N;i++) data[i] = other.data[i];
    }
};

Let us leave aside the performance issues with start() returning a vector by value: we already discussed them in the previous section. Instead, let us focus on the API's cleanness and expressiveness.

It is clearly impossible to set the start of a vector by writing

Vector<2> v2; Vector<3> v3;
v3.start<2>() = v2;

as this would copy v2 into the temporary vector returned by start(), not into v3 itself. Hence, this would have no effect on v3. This is why we introduced setStart() so that it is possible to do:

v3.setStart(v2);

Thus, we already have introduced two different methods to reach the current API expressiveness. Now the problem is that the API still is not very powerful. Suppose that we now want to add v2 into the start of v3. Again, ideally we would like to do

v3.start<2>() += v2;

but again this is not possible, for the same reason. The object returned by start() is not a lvalue -- at least it makes no sense to use it as such, and it should probably be marked as const to make sure that trying to use it as a lvalue generates a compiler error.

So if we want to add v2 to the start of v3, we can either do

Vector2 tmp = v3.start<2>();
tmp += v2;
v3.setStart(tmp);

which is slow for the same reason discussed in the previous section, or we need to add another method Vector3::addToStart(), which makes our API more complicated. Vector libraries taking this route end up with dozens of such methods to handle many kinds of usual operations, and still aren't very expressive.

The ideal solution would be to make start() return an actual lvalue so that

v3.start<2>() += v2;

just works. Now it is certainly possible to achieve that by letting start() return a "proxy" class implementing its own operator+=, but what we really want is to reuse the operator+= from class Vector2. Indeed, if we want this to be a powerful solution, we must make it possible to use any vector method on the result of start(). Obviously, it would be far too heavy to rewrite all vector methods for all kinds of expressions such as start().

As we will see, all these issues are solved by expression templates.

Simple Vector class with expression templates, without inheritance

Let us now modify the Vector class from the previous section to implement start() and operator+() in a way that addresses the issues raised in the two previous sections.

template<int Size>
class Vector
{
    float data[Size];
  public:
    Vector() {}
    void operator=(const Vector& other)
    {
      for(int i=0;i<Size;i++) data[i] = other.data[i];
    }
    void operator+=(const Vector& other)
    {
      for(int i=0;i<Size;i++) data[i] += other.data[i];
    }
    template<int N> Vector<N> start() const
    {
      Vector<N> result;
      for(int i=0;i<N;i++) result.data[i] = data[i];
      return result;
    }
    template<int N> void setStart(const Vector<N>& other)
    {
      for(int i=0;i<N;i++) data[i] = other.data[i];
    }
};


Simple Vector class with expression templates in Eigen's style

Guided tour of Eigen's expression templates

---