Difference between revisions of "Expression templates"

From Eigen
Jump to: navigation, search
(General discussion)
m
 
(21 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
''Expression templates'' refers to a c++ coding technique that was discovered in the 90's, which can greatly improve the performance and the API cleanness and expressiveness of certain kinds of c++ template libraries, and is especially useful for a vector/matrix library such as Eigen. First, some links: [http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Expression-template wikibooks] and [http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Expression-template#References the links given there].
 
''Expression templates'' refers to a c++ coding technique that was discovered in the 90's, which can greatly improve the performance and the API cleanness and expressiveness of certain kinds of c++ template libraries, and is especially useful for a vector/matrix library such as Eigen. First, some links: [http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Expression-template wikibooks] and [http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Expression-template#References the links given there].
  
== General discussion ==
+
== Motivation : better performance through lazy evaluation ==  
  
You can safely skip this section, it is less precise and more demanding than the rest of this article. A concrete example is discussed [[#A simple Vector class|below]].
+
Suppose that we want to write a 3D Vector class. The standard approach is:
 
+
C++ method chaining is nice, but for certain use case it suffers from limitations. Suppose we have a class as follows:
+
  
 
<source lang="cpp">
 
<source lang="cpp">
class C
+
class Vector
 
{
 
{
  // ...
+
    float x, y, z;
 
+
 
   public:
 
   public:
     C();
+
     Vector() {}
     C a();
+
     Vector(float _x, float _y, float _z) : x(_x), y(_y), z(_z) {}
     C b();
+
     Vector operator+(const Vector& other) const
     C& operator=(const C&);
+
    {
 +
      return Vector(x+other.x, y+other.y, z+other.z);
 +
     }
 +
    void operator=(const Vector& other)
 +
    {
 +
      x = other.x;
 +
      y = other.y;
 +
      z = other.z;
 +
    }
 
};
 
};
 
</source>
 
</source>
  
We can then chain the methods as follows,
+
With such a class, we can compute sums of vectors using operator+, like this:
  
 
<source lang="cpp">
 
<source lang="cpp">
C c1, c2;
+
Vector v1, v2, v3, v4;
c1 = c2.a().b().a();
+
v1 = v2 + v3 + v4;
 
</source>
 
</source>
  
However there are two problems with that. The first problem is that the methods a() and b() create everytime a new object of C and return it by value, copying it onto the stack. This can introduce a large overhead. The second problem is a corollary of the first one: since the return value of a() and of b() is a new temporary object, it is impossible to use them like this,
+
The problem here is that operator+ stores the result of its computation in a temporary variable. Introducing temporaries here is in fact useless and inefficient. Indeed, assuming that operator+ and operator= get inlined, the compiler will produce this code:
  
 
<source lang="cpp">
 
<source lang="cpp">
C c1, c2;
+
Vector tmp1, tmp2;
c1.a() = c2.b();
+
tmp1.x = v2.x + v3.x;
 +
tmp1.y = v2.y + v3.y;
 +
tmp1.z = v2.z + v3.z;
 +
tmp2.x = tmp1.x + v4.x;
 +
tmp2.y = tmp1.y + v4.y;
 +
tmp2.z = tmp1.z + v4.z;
 +
v1.x = tmp2.x;
 +
v1.y = tmp2.y;
 +
v1.z = tmp2.z;
 
</source>
 
</source>
  
because this would assign to the temporary object of C returned by a() which has nothing to do with c1 -- hence this would have no effect on c1.
+
This is performing no less than 9 float copies, while it was possible to do the same with only 3 copies:
 
+
The usual idea to overcome the second problem, is to let a() and b() return "proxy objects" instead of objects of C. However this is not optimal as these objects could not be used as objects of C when that is what one wants, and in particular that wouldn't allow method chaining like this,
+
  
 
<source lang="cpp">
 
<source lang="cpp">
C c1, c2;
+
v1.x = v2.x + v3.x + v4.x;
c1.a().b() = c2;
+
v1.y = v2.y + v3.y + v4.y;
 +
v1.z = v2.z + v3.z + v4.z;
 
</source>
 
</source>
  
unless of course the API of class C is replicated in the proxy classes. This would also address the first problem.
+
Depending on cases, it is possible that the compiler will do that optimization automatically, but in our experience, outside of really simple operations (like perhaps the one we are discussing), it can't. This is a difficult optimization for the compiler to make, as it involves reordering instructions.
  
Now you are probably starting to think about a trick with templates and inheritance to do this automatically... and that is expression templates.
+
Thus, the removal of temporaries, also known as "lazy evaluation", can be a major optimization. For example, for arithmetic on 3D vectors, we typically measure speed improvements around x1.5. We will see that expression templates allow that.
  
Let us first see what we could do with templates only:
+
== Motivation : better API through lvalue-returning methods ==
 +
 
 +
Now suppose that we have a fixed-size vector class, with the size as template parameter, and with start() and setStart() methods to manipulate sub-vectors formed of the first coefficients, like this:
  
 
<source lang="cpp">
 
<source lang="cpp">
template<typename T> class A;
+
template<int Size>
template<typename T> class B;
+
class Vector
template<typename T>
+
class C
+
 
{
 
{
  // ...
+
    float data[Size];
 
+
 
   public:
 
   public:
     C();
+
     Vector() {}
     C<A<T> > a();
+
    void operator=(const Vector& other)
     C<B<T> > b();
+
    {
     template<typename U> C& operator=(const C<U>&);
+
      for(int i=0;i<Size;i++) data[i] = other.data[i];
 +
     }
 +
    void operator+=(const Vector& other)
 +
    {
 +
      for(int i=0;i<Size;i++) data[i] += other.data[i];
 +
    }
 +
    template<int N> Vector<N> start() const
 +
     {
 +
      Vector<N> result;
 +
      for(int i=0;i<N;i++) result.data[i] = data[i];
 +
      return result;
 +
    }
 +
     template<int N> void setStart(const Vector<N>& other)
 +
    {
 +
      for(int i=0;i<N;i++) data[i] = other.data[i];
 +
    }
 
};
 
};
 
</source>
 
</source>
  
Now one could imagine that C<int> is the actual class C that we referred to above; class C<A<T> > could be the "proxy" for a() and could be implemented by partial template specialization.
+
Let us leave aside the performance issues with start() returning a vector by value: we already discussed them in the previous section. Instead, let us focus on the API's cleanness and expressiveness.
  
There is, however, a big problem: every partial specialization of C<T> needs to reimplement all the methods, there is no code reuse between specializations even though typically many methods could be shared.
+
It is clearly impossible to set the start of a vector by writing
  
Our solution is given by the [http://en.wikipedia.org/wiki/Curiously_Recurring_Template_Pattern CRT inheritance pattern]. The general idea is sketched in the following code snippet, read the rest of this article for more implementation details.
 
 
<source lang="cpp">
 
<source lang="cpp">
template<typename Derived>
+
Vector<2> v2; Vector<3> v3;
class CBase
+
v3.start<2>() = v2;
{
+
</source>
  public:
+
    // implement most (if not all) of the API here.
+
    // when you need some code/data that is specific to the Derived class, implement it as follows:
+
    Derived& derived() { return *static_cast<Derived*>(this); }
+
    void someMethodSpecificToDerived() { return derived()._someMethodSpecificToDerived(); }
+
  
    C();
+
as this would copy v2 into the temporary vector returned by start(), not into v3 itself. Hence, this would have no effect on v3. This is why we introduced setStart() so that it is possible to do:
    C<A<Derived> > a();
+
    C<B<Derived> > b();
+
    template<typename U> C& operator=(const C<U>&);
+
};
+
  
class C : public CBase<C>
+
<source lang="cpp">
{
+
v3.setStart(v2);
  // only need to implement here what is specific to this derived class
+
</source>
  
  void _someMethodSpecificToDerived() { /* ... */ }
+
Thus, we already have introduced two different methods to reach the current API expressiveness. Now the problem is that the API still is not very powerful. Suppose that we now want to ''add'' v2 into the start of v3. Again, ideally we would like to do
};
+
  
template<typename Derived>
+
<source lang="cpp">
class A : public CBase<A<Derived> >
+
v3.start<2>() += v2;
{
+
</source>
  // ...
+
};
+
  
template<typename Derived>
+
but again this is not possible, for the same reason. The object returned by start() is not a lvalue -- at least it makes no sense to use it as such, and it should probably be marked as const to make sure that trying to use it as a lvalue generates a compiler error.
class A : public CBase<A<Derived> >
+
 
{
+
So if we want to add v2 to the start of v3, we can either do
  // ...
+
 
};
+
<source lang="cpp">
 +
Vector2 tmp = v3.start<2>();
 +
tmp += v2;
 +
v3.setStart(tmp);
 
</source>
 
</source>
  
== A simple Vector class ==
+
which is slow for the same reason discussed in the previous section, or we need to add another method Vector3::addToStart(), which makes our API more complicated. Vector libraries taking this route end up with dozens of such methods to handle many kinds of usual operations, and still aren't very expressive.
  
Suppose that we want to write a Vector class. The standard approach is,
+
The ideal solution would be to make start() return an actual lvalue so that
  
 
<source lang="cpp">
 
<source lang="cpp">
class Vector {
+
v3.start<2>() += v2;
     float m_x, m_y, m_z;
+
</source>
 +
 
 +
just works. Now it is certainly possible to achieve that by letting start() return a "proxy" class implementing its own operator+=, but what we really want is to reuse the operator+= from class Vector2. Indeed, if we want this to be a powerful solution, we must make it possible to use any vector method on the result of start(). Obviously, it would be far too heavy to rewrite all vector methods for all kinds of expressions such as start().
 +
 
 +
As we will see, all these issues are solved by expression templates.
 +
 
 +
== Simple Vector class with expression templates, without inheritance ==
 +
 
 +
Let us now modify the Vector class from the previous section to implement start() and operator+() in a way that addresses the issues raised in the two previous sections.
 +
 
 +
<source lang="cpp">
 +
template<int Size>
 +
class Vector
 +
{
 +
     float data[Size];
 
   public:
 
   public:
     Vector(float x, float y, float z) : m_x(x), m_y(y), m_z(z) {}
+
     Vector() {}
    float x() const { return m_x; }
+
     void operator=(const Vector& other)
    float y() const { return m_y; }
+
    float z() const { return m_z; }
+
     Vector operator+(const Vector& other)
+
 
     {
 
     {
       return Vector(x()+other.x(), y()+other.y(), z()+other.z());
+
       for(int i=0;i<Size;i++) data[i] = other.data[i];
 +
    }
 +
    void operator+=(const Vector& other)
 +
    {
 +
      for(int i=0;i<Size;i++) data[i] += other.data[i];
 +
    }
 +
    template<int N> Vector<N> start() const
 +
    {
 +
      Vector<N> result;
 +
      for(int i=0;i<N;i++) result.data[i] = data[i];
 +
      return result;
 +
    }
 +
    template<int N> void setStart(const Vector<N>& other)
 +
    {
 +
      for(int i=0;i<N;i++) data[i] = other.data[i];
 
     }
 
     }
 
};
 
};
 
</source>
 
</source>
 +
 +
 +
 +
== Simple Vector class with expression templates in Eigen's style ==
 +
 +
== Guided tour of Eigen's expression templates ==
 +
---

Latest revision as of 17:25, 11 July 2008

Expression templates refers to a c++ coding technique that was discovered in the 90's, which can greatly improve the performance and the API cleanness and expressiveness of certain kinds of c++ template libraries, and is especially useful for a vector/matrix library such as Eigen. First, some links: wikibooks and the links given there.

Motivation : better performance through lazy evaluation

Suppose that we want to write a 3D Vector class. The standard approach is:

class Vector
{
    float x, y, z;
  public:
    Vector() {}
    Vector(float _x, float _y, float _z) : x(_x), y(_y), z(_z) {}
    Vector operator+(const Vector& other) const
    {
      return Vector(x+other.x, y+other.y, z+other.z);
    }
    void operator=(const Vector& other)
    {
      x = other.x;
      y = other.y;
      z = other.z;
    }
};

With such a class, we can compute sums of vectors using operator+, like this:

Vector v1, v2, v3, v4;
v1 = v2 + v3 + v4;

The problem here is that operator+ stores the result of its computation in a temporary variable. Introducing temporaries here is in fact useless and inefficient. Indeed, assuming that operator+ and operator= get inlined, the compiler will produce this code:

Vector tmp1, tmp2;
tmp1.x = v2.x + v3.x;
tmp1.y = v2.y + v3.y;
tmp1.z = v2.z + v3.z;
tmp2.x = tmp1.x + v4.x;
tmp2.y = tmp1.y + v4.y;
tmp2.z = tmp1.z + v4.z;
v1.x = tmp2.x;
v1.y = tmp2.y;
v1.z = tmp2.z;

This is performing no less than 9 float copies, while it was possible to do the same with only 3 copies:

v1.x = v2.x + v3.x + v4.x;
v1.y = v2.y + v3.y + v4.y;
v1.z = v2.z + v3.z + v4.z;

Depending on cases, it is possible that the compiler will do that optimization automatically, but in our experience, outside of really simple operations (like perhaps the one we are discussing), it can't. This is a difficult optimization for the compiler to make, as it involves reordering instructions.

Thus, the removal of temporaries, also known as "lazy evaluation", can be a major optimization. For example, for arithmetic on 3D vectors, we typically measure speed improvements around x1.5. We will see that expression templates allow that.

Motivation : better API through lvalue-returning methods

Now suppose that we have a fixed-size vector class, with the size as template parameter, and with start() and setStart() methods to manipulate sub-vectors formed of the first coefficients, like this:

template<int Size>
class Vector
{
    float data[Size];
  public:
    Vector() {}
    void operator=(const Vector& other)
    {
      for(int i=0;i<Size;i++) data[i] = other.data[i];
    }
    void operator+=(const Vector& other)
    {
      for(int i=0;i<Size;i++) data[i] += other.data[i];
    }
    template<int N> Vector<N> start() const
    {
      Vector<N> result;
      for(int i=0;i<N;i++) result.data[i] = data[i];
      return result;
    }
    template<int N> void setStart(const Vector<N>& other)
    {
      for(int i=0;i<N;i++) data[i] = other.data[i];
    }
};

Let us leave aside the performance issues with start() returning a vector by value: we already discussed them in the previous section. Instead, let us focus on the API's cleanness and expressiveness.

It is clearly impossible to set the start of a vector by writing

Vector<2> v2; Vector<3> v3;
v3.start<2>() = v2;

as this would copy v2 into the temporary vector returned by start(), not into v3 itself. Hence, this would have no effect on v3. This is why we introduced setStart() so that it is possible to do:

v3.setStart(v2);

Thus, we already have introduced two different methods to reach the current API expressiveness. Now the problem is that the API still is not very powerful. Suppose that we now want to add v2 into the start of v3. Again, ideally we would like to do

v3.start<2>() += v2;

but again this is not possible, for the same reason. The object returned by start() is not a lvalue -- at least it makes no sense to use it as such, and it should probably be marked as const to make sure that trying to use it as a lvalue generates a compiler error.

So if we want to add v2 to the start of v3, we can either do

Vector2 tmp = v3.start<2>();
tmp += v2;
v3.setStart(tmp);

which is slow for the same reason discussed in the previous section, or we need to add another method Vector3::addToStart(), which makes our API more complicated. Vector libraries taking this route end up with dozens of such methods to handle many kinds of usual operations, and still aren't very expressive.

The ideal solution would be to make start() return an actual lvalue so that

v3.start<2>() += v2;

just works. Now it is certainly possible to achieve that by letting start() return a "proxy" class implementing its own operator+=, but what we really want is to reuse the operator+= from class Vector2. Indeed, if we want this to be a powerful solution, we must make it possible to use any vector method on the result of start(). Obviously, it would be far too heavy to rewrite all vector methods for all kinds of expressions such as start().

As we will see, all these issues are solved by expression templates.

Simple Vector class with expression templates, without inheritance

Let us now modify the Vector class from the previous section to implement start() and operator+() in a way that addresses the issues raised in the two previous sections.

template<int Size>
class Vector
{
    float data[Size];
  public:
    Vector() {}
    void operator=(const Vector& other)
    {
      for(int i=0;i<Size;i++) data[i] = other.data[i];
    }
    void operator+=(const Vector& other)
    {
      for(int i=0;i<Size;i++) data[i] += other.data[i];
    }
    template<int N> Vector<N> start() const
    {
      Vector<N> result;
      for(int i=0;i<N;i++) result.data[i] = data[i];
      return result;
    }
    template<int N> void setStart(const Vector<N>& other)
    {
      for(int i=0;i<N;i++) data[i] = other.data[i];
    }
};


Simple Vector class with expression templates in Eigen's style

Guided tour of Eigen's expression templates

---