Difference between revisions of "Expression templates"

From Eigen
Jump to: navigation, search
m
 
(11 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
''Expression templates'' refers to a c++ coding technique that was discovered in the 90's, which can greatly improve the performance and the API cleanness and expressiveness of certain kinds of c++ template libraries, and is especially useful for a vector/matrix library such as Eigen. First, some links: [http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Expression-template wikibooks] and [http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Expression-template#References the links given there].
 
''Expression templates'' refers to a c++ coding technique that was discovered in the 90's, which can greatly improve the performance and the API cleanness and expressiveness of certain kinds of c++ template libraries, and is especially useful for a vector/matrix library such as Eigen. First, some links: [http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Expression-template wikibooks] and [http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Expression-template#References the links given there].
  
== General discussion ==
+
== Motivation : better performance through lazy evaluation ==  
  
You can safely skip this section, it is less precise and more demanding than the rest of this article. Concrete examples are discussed in the next sections below.
+
Suppose that we want to write a 3D Vector class. The standard approach is:
 
+
C++ method chaining is nice, but for certain use cases it suffers from limitations. Suppose we have a class as follows:
+
  
 
<source lang="cpp">
 
<source lang="cpp">
class C
+
class Vector
 
{
 
{
  // ...
+
    float x, y, z;
 
+
 
   public:
 
   public:
     C();
+
     Vector() {}
     C a();
+
     Vector(float _x, float _y, float _z) : x(_x), y(_y), z(_z) {}
     C b();
+
     Vector operator+(const Vector& other) const
     C& operator=(const C&);
+
    {
 +
      return Vector(x+other.x, y+other.y, z+other.z);
 +
     }
 +
    void operator=(const Vector& other)
 +
    {
 +
      x = other.x;
 +
      y = other.y;
 +
      z = other.z;
 +
    }
 
};
 
};
 
</source>
 
</source>
  
We can then chain the methods as follows,
+
With such a class, we can compute sums of vectors using operator+, like this:
  
 
<source lang="cpp">
 
<source lang="cpp">
C c1, c2;
+
Vector v1, v2, v3, v4;
c1 = c2.a().b().a();
+
v1 = v2 + v3 + v4;
 
</source>
 
</source>
  
However there are two problems with that. The first problem is that the methods a() and b() create everytime a new object of C and return it by value, copying it onto the stack. This can introduce a large overhead. The second problem is a corollary of the first one: since the return value of a() and of b() is a new temporary object, it is impossible to use them like this,
+
The problem here is that operator+ stores the result of its computation in a temporary variable. Introducing temporaries here is in fact useless and inefficient. Indeed, assuming that operator+ and operator= get inlined, the compiler will produce this code:
  
 
<source lang="cpp">
 
<source lang="cpp">
C c1, c2;
+
Vector tmp1, tmp2;
c1.a() = c2.b();
+
tmp1.x = v2.x + v3.x;
 +
tmp1.y = v2.y + v3.y;
 +
tmp1.z = v2.z + v3.z;
 +
tmp2.x = tmp1.x + v4.x;
 +
tmp2.y = tmp1.y + v4.y;
 +
tmp2.z = tmp1.z + v4.z;
 +
v1.x = tmp2.x;
 +
v1.y = tmp2.y;
 +
v1.z = tmp2.z;
 
</source>
 
</source>
  
because this would assign to the temporary object of C returned by a() which has nothing to do with c1 -- hence this would have no effect on c1.
+
This is performing no less than 9 float copies, while it was possible to do the same with only 3 copies:
 
+
The usual idea to overcome the second problem, is to let a() and b() return "proxy objects" instead of objects of C. However this is not optimal as these objects could not be used as objects of C when that is what one wants, and in particular that wouldn't allow method chaining like this,
+
  
 
<source lang="cpp">
 
<source lang="cpp">
C c1, c2;
+
v1.x = v2.x + v3.x + v4.x;
c1.a().b() = c2;
+
v1.y = v2.y + v3.y + v4.y;
 +
v1.z = v2.z + v3.z + v4.z;
 
</source>
 
</source>
  
unless of course the API of class C is replicated in every proxy class, which also address the first problem but at the cost of massive code duplication.
+
Depending on cases, it is possible that the compiler will do that optimization automatically, but in our experience, outside of really simple operations (like perhaps the one we are discussing), it can't. This is a difficult optimization for the compiler to make, as it involves reordering instructions.
  
Now you are probably starting to think about a trick with templates and inheritance to do this automatically... and that is expression templates.
+
Thus, the removal of temporaries, also known as "lazy evaluation", can be a major optimization. For example, for arithmetic on 3D vectors, we typically measure speed improvements around x1.5. We will see that expression templates allow that.
  
Let us first see what we could do with templates only:
+
== Motivation : better API through lvalue-returning methods ==
  
<source lang="cpp">
+
Now suppose that we have a fixed-size vector class, with the size as template parameter, and with start() and setStart() methods to manipulate sub-vectors formed of the first coefficients, like this:
template<typename T> class A;
+
template<typename T> class B;
+
template<typename T>
+
class C
+
{
+
  // ...
+
  
  public:
 
    C();
 
    C<A<T> > a();
 
    C<B<T> > b();
 
    template<typename U> C& operator=(const C<U>&);
 
};
 
</source>
 
 
Now one could imagine that C<int> is the actual class C that we referred to above; class C<A<T> > could be the "proxy" for a() and could be implemented by partial template specialization.
 
 
There is, however, a big problem: every specialization of C<T> needs to reimplement all the methods, there is no code reuse between specializations even though typically many methods could be shared.
 
 
Our solution is given by the [http://en.wikipedia.org/wiki/Curiously_Recurring_Template_Pattern CRT inheritance pattern]. The general idea is sketched in the following code snippet, read the rest of this article for more implementation details.
 
 
<source lang="cpp">
 
<source lang="cpp">
template<typename Derived>
+
template<int Size>
class CBase
+
class Vector
 
{
 
{
 +
    float data[Size];
 
   public:
 
   public:
     // implement most (if not all) of the API here.
+
     Vector() {}
    // when you need some code/data that is specific to the Derived class, implement it as follows:
+
     void operator=(const Vector& other)
    Derived& derived() { return *static_cast<Derived*>(this); }
+
     void someMethodSpecificToDerived() { return derived()._someMethodSpecificToDerived(); }
+
 
+
    A<Derived> a();
+
    B<Derived> b();
+
    template<typename U> Derived& operator=(const CBase<U>&);
+
};
+
 
+
class C : public CBase<C>
+
{
+
    // implement here what is specific to this derived class
+
    void _someMethodSpecificToDerived()
+
 
     {
 
     {
       // ...
+
       for(int i=0;i<Size;i++) data[i] = other.data[i];
 
     }
 
     }
 
+
     void operator+=(const Vector& other)
  public:
+
     // constructors aren't inherited and are specific anyway
+
    C()
+
 
     {
 
     {
       // ...
+
       for(int i=0;i<Size;i++) data[i] += other.data[i];
 
     }
 
     }
 
+
     template<int N> Vector<N> start() const
    // assignment operators are not inherited, unfortunately. This can be automated by a macro.
+
     template<typename U>
+
    C& operator=(const CBase<U>& other) { return CBase<C>::operator=(other); }
+
};
+
 
+
template<typename Derived>
+
class A : public CBase<A<Derived> >
+
{
+
  // ... same remarks as in class C ...
+
};
+
 
+
template<typename Derived>
+
class B : public CBase<B<Derived> >
+
{
+
  // ... same remarks as in class C ...
+
};
+
</source>
+
 
+
== Motivation : better performance through lazy evaluation ==
+
 
+
Suppose that we want to write a 3D Vector class. The standard approach is:
+
 
+
<source lang="cpp">
+
class Vector
+
{
+
    float x, y, z;
+
  public:
+
    Vector() {}
+
    Vector operator+(const Vector& other) const
+
 
     {
 
     {
       return Vector(x+other.x, y+other.y, z+other.z);
+
       Vector<N> result;
 +
      for(int i=0;i<N;i++) result.data[i] = data[i];
 +
      return result;
 
     }
 
     }
     void operator=(const Vector& other)
+
     template<int N> void setStart(const Vector<N>& other)
 
     {
 
     {
       x = other.x;
+
       for(int i=0;i<N;i++) data[i] = other.data[i];
      y = other.y;
+
      z = other.z;
+
 
     }
 
     }
 
};
 
};
 
</source>
 
</source>
  
With such a class, we can compute sums of vectors using operator+, like this:
+
Let us leave aside the performance issues with start() returning a vector by value: we already discussed them in the previous section. Instead, let us focus on the API's cleanness and expressiveness.
 +
 
 +
It is clearly impossible to set the start of a vector by writing
  
 
<source lang="cpp">
 
<source lang="cpp">
Vector v1, v2, v3, v4;
+
Vector<2> v2; Vector<3> v3;
v1 = v2 + v3 + v4;
+
v3.start<2>() = v2;
 
</source>
 
</source>
  
This is in fact a case of method chaining, since it amounts to
+
as this would copy v2 into the temporary vector returned by start(), not into v3 itself. Hence, this would have no effect on v3. This is why we introduced setStart() so that it is possible to do:
  
 
<source lang="cpp">
 
<source lang="cpp">
v1 = v2.operator+(v3).operator+(v4);
+
v3.setStart(v2);
 
</source>
 
</source>
  
As operator+ and operator= get inlined, the compiler will produce this code:
+
Thus, we already have introduced two different methods to reach the current API expressiveness. Now the problem is that the API still is not very powerful. Suppose that we now want to ''add'' v2 into the start of v3. Again, ideally we would like to do
  
 
<source lang="cpp">
 
<source lang="cpp">
Vector tmp1, tmp2;
+
v3.start<2>() += v2;
tmp1.x = v2.x + v3.x;
+
tmp1.y = v2.y + v3.y;
+
tmp1.z = v2.z + v3.z;
+
tmp2.x = tmp1.x + v4.x;
+
tmp2.y = tmp1.y + v4.y;
+
tmp2.z = tmp1.z + v4.z;
+
v1.x = tmp2.x;
+
v1.y = tmp2.y;
+
v1.z = tmp2.z;
+
 
</source>
 
</source>
  
This is performing no less than 9 float copies, while it was possible to do the same with only 3 copies:
+
but again this is not possible, for the same reason. The object returned by start() is not a lvalue -- at least it makes no sense to use it as such, and it should probably be marked as const to make sure that trying to use it as a lvalue generates a compiler error.
 +
 
 +
So if we want to add v2 to the start of v3, we can either do
  
 
<source lang="cpp">
 
<source lang="cpp">
v1.x = v2.x + v3.x + v4.x;
+
Vector2 tmp = v3.start<2>();
v1.y = v2.y + v3.y + v4.y;
+
tmp += v2;
v1.z = v2.z + v3.z + v4.z;
+
v3.setStart(tmp);
 
</source>
 
</source>
  
Depending on cases, it is possible that the compiler can do that optimization automatically, but in our experience, outside of really simple operations (like perhaps the one we are discussing), they can't. This is a difficult operation for the compiler to make, as it requires to reorder instructions.
+
which is slow for the same reason discussed in the previous section, or we need to add another method Vector3::addToStart(), which makes our API more complicated. Vector libraries taking this route end up with dozens of such methods to handle many kinds of usual operations, and still aren't very expressive.
  
Thus, the removal of temporaries, also known as "lazy evaluation", can be a major optimization. For example, for arithmetic on 3D vectors, we typically measure speed improvements between x1.5 and x2. We will see that expression templates allow that.
+
The ideal solution would be to make start() return an actual lvalue so that
  
== Motivation : better API through lvalue-returning methods ==
+
<source lang="cpp">
 +
v3.start<2>() += v2;
 +
</source>
  
== Simple Vector class with expression classes, without templates ==
+
just works. Now it is certainly possible to achieve that by letting start() return a "proxy" class implementing its own operator+=, but what we really want is to reuse the operator+= from class Vector2. Indeed, if we want this to be a powerful solution, we must make it possible to use any vector method on the result of start(). Obviously, it would be far too heavy to rewrite all vector methods for all kinds of expressions such as start().
 +
 
 +
As we will see, all these issues are solved by expression templates.
  
 
== Simple Vector class with expression templates, without inheritance ==
 
== Simple Vector class with expression templates, without inheritance ==
 +
 +
Let us now modify the Vector class from the previous section to implement start() and operator+() in a way that addresses the issues raised in the two previous sections.
 +
 +
<source lang="cpp">
 +
template<int Size>
 +
class Vector
 +
{
 +
    float data[Size];
 +
  public:
 +
    Vector() {}
 +
    void operator=(const Vector& other)
 +
    {
 +
      for(int i=0;i<Size;i++) data[i] = other.data[i];
 +
    }
 +
    void operator+=(const Vector& other)
 +
    {
 +
      for(int i=0;i<Size;i++) data[i] += other.data[i];
 +
    }
 +
    template<int N> Vector<N> start() const
 +
    {
 +
      Vector<N> result;
 +
      for(int i=0;i<N;i++) result.data[i] = data[i];
 +
      return result;
 +
    }
 +
    template<int N> void setStart(const Vector<N>& other)
 +
    {
 +
      for(int i=0;i<N;i++) data[i] = other.data[i];
 +
    }
 +
};
 +
</source>
 +
 +
  
 
== Simple Vector class with expression templates in Eigen's style ==
 
== Simple Vector class with expression templates in Eigen's style ==
  
 
== Guided tour of Eigen's expression templates ==
 
== Guided tour of Eigen's expression templates ==
 +
---

Latest revision as of 17:25, 11 July 2008

Expression templates refers to a c++ coding technique that was discovered in the 90's, which can greatly improve the performance and the API cleanness and expressiveness of certain kinds of c++ template libraries, and is especially useful for a vector/matrix library such as Eigen. First, some links: wikibooks and the links given there.

Motivation : better performance through lazy evaluation

Suppose that we want to write a 3D Vector class. The standard approach is:

class Vector
{
    float x, y, z;
  public:
    Vector() {}
    Vector(float _x, float _y, float _z) : x(_x), y(_y), z(_z) {}
    Vector operator+(const Vector& other) const
    {
      return Vector(x+other.x, y+other.y, z+other.z);
    }
    void operator=(const Vector& other)
    {
      x = other.x;
      y = other.y;
      z = other.z;
    }
};

With such a class, we can compute sums of vectors using operator+, like this:

Vector v1, v2, v3, v4;
v1 = v2 + v3 + v4;

The problem here is that operator+ stores the result of its computation in a temporary variable. Introducing temporaries here is in fact useless and inefficient. Indeed, assuming that operator+ and operator= get inlined, the compiler will produce this code:

Vector tmp1, tmp2;
tmp1.x = v2.x + v3.x;
tmp1.y = v2.y + v3.y;
tmp1.z = v2.z + v3.z;
tmp2.x = tmp1.x + v4.x;
tmp2.y = tmp1.y + v4.y;
tmp2.z = tmp1.z + v4.z;
v1.x = tmp2.x;
v1.y = tmp2.y;
v1.z = tmp2.z;

This is performing no less than 9 float copies, while it was possible to do the same with only 3 copies:

v1.x = v2.x + v3.x + v4.x;
v1.y = v2.y + v3.y + v4.y;
v1.z = v2.z + v3.z + v4.z;

Depending on cases, it is possible that the compiler will do that optimization automatically, but in our experience, outside of really simple operations (like perhaps the one we are discussing), it can't. This is a difficult optimization for the compiler to make, as it involves reordering instructions.

Thus, the removal of temporaries, also known as "lazy evaluation", can be a major optimization. For example, for arithmetic on 3D vectors, we typically measure speed improvements around x1.5. We will see that expression templates allow that.

Motivation : better API through lvalue-returning methods

Now suppose that we have a fixed-size vector class, with the size as template parameter, and with start() and setStart() methods to manipulate sub-vectors formed of the first coefficients, like this:

template<int Size>
class Vector
{
    float data[Size];
  public:
    Vector() {}
    void operator=(const Vector& other)
    {
      for(int i=0;i<Size;i++) data[i] = other.data[i];
    }
    void operator+=(const Vector& other)
    {
      for(int i=0;i<Size;i++) data[i] += other.data[i];
    }
    template<int N> Vector<N> start() const
    {
      Vector<N> result;
      for(int i=0;i<N;i++) result.data[i] = data[i];
      return result;
    }
    template<int N> void setStart(const Vector<N>& other)
    {
      for(int i=0;i<N;i++) data[i] = other.data[i];
    }
};

Let us leave aside the performance issues with start() returning a vector by value: we already discussed them in the previous section. Instead, let us focus on the API's cleanness and expressiveness.

It is clearly impossible to set the start of a vector by writing

Vector<2> v2; Vector<3> v3;
v3.start<2>() = v2;

as this would copy v2 into the temporary vector returned by start(), not into v3 itself. Hence, this would have no effect on v3. This is why we introduced setStart() so that it is possible to do:

v3.setStart(v2);

Thus, we already have introduced two different methods to reach the current API expressiveness. Now the problem is that the API still is not very powerful. Suppose that we now want to add v2 into the start of v3. Again, ideally we would like to do

v3.start<2>() += v2;

but again this is not possible, for the same reason. The object returned by start() is not a lvalue -- at least it makes no sense to use it as such, and it should probably be marked as const to make sure that trying to use it as a lvalue generates a compiler error.

So if we want to add v2 to the start of v3, we can either do

Vector2 tmp = v3.start<2>();
tmp += v2;
v3.setStart(tmp);

which is slow for the same reason discussed in the previous section, or we need to add another method Vector3::addToStart(), which makes our API more complicated. Vector libraries taking this route end up with dozens of such methods to handle many kinds of usual operations, and still aren't very expressive.

The ideal solution would be to make start() return an actual lvalue so that

v3.start<2>() += v2;

just works. Now it is certainly possible to achieve that by letting start() return a "proxy" class implementing its own operator+=, but what we really want is to reuse the operator+= from class Vector2. Indeed, if we want this to be a powerful solution, we must make it possible to use any vector method on the result of start(). Obviously, it would be far too heavy to rewrite all vector methods for all kinds of expressions such as start().

As we will see, all these issues are solved by expression templates.

Simple Vector class with expression templates, without inheritance

Let us now modify the Vector class from the previous section to implement start() and operator+() in a way that addresses the issues raised in the two previous sections.

template<int Size>
class Vector
{
    float data[Size];
  public:
    Vector() {}
    void operator=(const Vector& other)
    {
      for(int i=0;i<Size;i++) data[i] = other.data[i];
    }
    void operator+=(const Vector& other)
    {
      for(int i=0;i<Size;i++) data[i] += other.data[i];
    }
    template<int N> Vector<N> start() const
    {
      Vector<N> result;
      for(int i=0;i<N;i++) result.data[i] = data[i];
      return result;
    }
    template<int N> void setStart(const Vector<N>& other)
    {
      for(int i=0;i<N;i++) data[i] = other.data[i];
    }
};


Simple Vector class with expression templates in Eigen's style

Guided tour of Eigen's expression templates

---