Difference between revisions of "Expression templates"

From Eigen
Jump to: navigation, search
(General discussion)
Line 3: Line 3:
 
== General discussion ==
 
== General discussion ==
  
 +
{{{
 
You can safely skip this section, it is less precise and more demanding than the rest of this article. Concrete examples are discussed in the next sections below.
 
You can safely skip this section, it is less precise and more demanding than the rest of this article. Concrete examples are discussed in the next sections below.
  
Line 117: Line 118:
 
};
 
};
 
</source>
 
</source>
 +
}}}
  
 
== Motivation : better performance through lazy evaluation ==  
 
== Motivation : better performance through lazy evaluation ==  

Revision as of 13:37, 25 June 2008

Expression templates refers to a c++ coding technique that was discovered in the 90's, which can greatly improve the performance and the API cleanness and expressiveness of certain kinds of c++ template libraries, and is especially useful for a vector/matrix library such as Eigen. First, some links: wikibooks and the links given there.

General discussion

{{{ You can safely skip this section, it is less precise and more demanding than the rest of this article. Concrete examples are discussed in the next sections below.

C++ method chaining is nice, but for certain use cases it suffers from limitations. Suppose we have a class as follows:

class C
{
  // ...
 
  public:
    C();
    C a();
    C b();
    C& operator=(const C&);
};

We can then chain the methods as follows,

C c1, c2;
c1 = c2.a().b().a();

However there are two problems with that. The first problem is that the methods a() and b() create everytime a new object of C and return it by value, copying it onto the stack. This can introduce a large overhead. The second problem is a corollary of the first one: since the return value of a() and of b() is a new temporary object, it is impossible to use them like this,

C c1, c2;
c1.a() = c2.b();

because this would assign to the temporary object of C returned by a() which has nothing to do with c1 -- hence this would have no effect on c1.

The usual idea to overcome the second problem, is to let a() and b() return "proxy objects" instead of objects of C. However this is not optimal as these objects could not be used as objects of C when that is what one wants, and in particular that wouldn't allow method chaining like this,

C c1, c2;
c1.a().b() = c2;

unless of course the API of class C is replicated in every proxy class, which also address the first problem but at the cost of massive code duplication.

Now you are probably starting to think about a trick with templates and inheritance to do this automatically... and that is expression templates.

Let us first see what we could do with templates only:

template<typename T> class A;
template<typename T> class B;
template<typename T>
class C
{
  // ...
 
  public:
    C();
    C<A<T> > a();
    C<B<T> > b();
    template<typename U> C& operator=(const C<U>&);
};

Now one could imagine that C<int> is the actual class C that we referred to above; class C<A<T> > could be the "proxy" for a() and could be implemented by partial template specialization.

There is, however, a big problem: every specialization of C<T> needs to reimplement all the methods, there is no code reuse between specializations even though typically many methods could be shared.

Our solution is given by the CRT inheritance pattern. The general idea is sketched in the following code snippet, read the rest of this article for more implementation details.

template<typename Derived>
class CBase
{
  public:
    // implement most (if not all) of the API here.
    // when you need some code/data that is specific to the Derived class, implement it as follows:
    Derived& derived() { return *static_cast<Derived*>(this); }
    void someMethodSpecificToDerived() { return derived()._someMethodSpecificToDerived(); }
 
    A<Derived> a();
    B<Derived> b();
    template<typename U> Derived& operator=(const CBase<U>&);
};
 
class C : public CBase<C>
{
    // implement here what is specific to this derived class
    void _someMethodSpecificToDerived()
    {
      // ...
    }
 
  public:
    // constructors aren't inherited and are specific anyway
    C()
    {
      // ...
    }
 
    // assignment operators are not inherited, unfortunately. This can be automated by a macro.
    template<typename U>
    C& operator=(const CBase<U>& other) { return CBase<C>::operator=(other); }
};
 
template<typename Derived>
class A : public CBase<A<Derived> >
{
  // ... same remarks as in class C ...
};
 
template<typename Derived>
class B : public CBase<B<Derived> >
{
  // ... same remarks as in class C ...
};

}}}

Motivation : better performance through lazy evaluation

Suppose that we want to write a 3D Vector class. The standard approach is:

class Vector
{
    float x, y, z;
  public:
    Vector() {}
    Vector operator+(const Vector& other) const
    {
      return Vector(x+other.x, y+other.y, z+other.z);
    }
    void operator=(const Vector& other)
    {
      x = other.x;
      y = other.y;
      z = other.z;
    }
};

With such a class, we can compute sums of vectors using operator+, like this:

Vector v1, v2, v3, v4;
v1 = v2 + v3 + v4;

This is in fact a case of method chaining, since it amounts to

v1 = v2.operator+(v3).operator+(v4);

As operator+ and operator= get inlined, the compiler will produce this code:

Vector tmp1, tmp2;
tmp1.x = v2.x + v3.x;
tmp1.y = v2.y + v3.y;
tmp1.z = v2.z + v3.z;
tmp2.x = tmp1.x + v4.x;
tmp2.y = tmp1.y + v4.y;
tmp2.z = tmp1.z + v4.z;
v1.x = tmp2.x;
v1.y = tmp2.y;
v1.z = tmp2.z;

This is performing no less than 9 float copies, while it was possible to do the same with only 3 copies:

v1.x = v2.x + v3.x + v4.x;
v1.y = v2.y + v3.y + v4.y;
v1.z = v2.z + v3.z + v4.z;

Depending on cases, it is possible that the compiler can do that optimization automatically, but in our experience, outside of really simple operations (like perhaps the one we are discussing), they can't. This is a difficult operation for the compiler to make, as it requires to reorder instructions.

Thus, the removal of temporaries, also known as "lazy evaluation", can be a major optimization. For example, for arithmetic on 3D vectors, we typically measure speed improvements between x1.5 and x2. We will see that expression templates allow that.

Motivation : better API through lvalue-returning methods

Simple Vector class with expression classes, without templates

Simple Vector class with expression templates, without inheritance

Simple Vector class with expression templates in Eigen's style

Guided tour of Eigen's expression templates