Tensor support

From Eigen
Revision as of 15:49, 24 October 2016 by Benoitsteiner (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

IMPORTANT NOTE: The current developement version of Eigen (post-3.2) supports Tensors. This support is experimental and a moving target.

For a TODO list / discussion on further features, see Working notes - Tensor module.

Requirements

Since using objects with an (in principle) arbitrary number of indices requires variadic templates, Eigen's tensor module requires C++11. The initial version of the Tensor module has been tested with g++ from 4.6 as well as clang 3.2 and 3.3. Intel's 13.1 compiler also works in principle (13.0 crashes during compilation), but produces non-optimal code. For these compilers, please supply at least -std=c++0x, -std=c++11 or even -std=c++1y to the compiler as a flag so that they switch to C++11/C++14 mode.

It has not yet been tested with other compilers.

Supported features

Currently, the Tensor module only supports the following few features:

  • basic storage for dynamic-sized and statically sized objects
  • setting to zero (.setZero()) for POD types
  • simple assignments to other tensors of the same type
  • getting the dimensionality of the tensor
  • Coefficient extraction and manipulation through slicing, shuffling, and more...
  • Coefficient-wise operations (addition, subtraction, ...)
  • Reductions
  • Contractions
  • Parallelization over multiple CPU cores though multi-threading
  • Offloading of the computation to a CUDA GPU or an OpenCL acccelerator.

The README file contains a complete documentation of the module and its API.

Using the Tensor module

In order to use Tensors, include the Tensor module with the following include statement:

#include <unsupported/Eigen/CXX11/Tensor>

The following example demonstrates how the Tensor class can be used:

Eigen::Tensor<double, 3> epsilon(3,3,3);
epsilon.setZero();
epsilon(0,1,2) = 1;
epsilon(1,2,0) = 1;
epsilon(2,0,1) = 1;
epsilon(1,0,2) = -1;
epsilon(2,1,0) = -1;
epsilon(0,2,1) = -1;
Eigen::Tensor<double, 4> grassmannIdentity(3,3,3,3);
grassmannIdentity.setZero();
// this is not the most efficient way to write such a product,
// but is the only way possible with the current feature set
for (int i = 0; i < 3; i++) {
  for (int j = 0; j < 3; j++) {
    for (int k = 0; k < 3; k++) {
      for (int l = 0; l < 3; l++) {
        for (int m = 0; m < 3; m++) {
          grassmannIdentity(i,j,l,m) += epsilon(i,j,k) * epsilon(k,l,m);
        }
      }
    }
  }
}

// verify
for (int i = 0; i < 3; i++) {
  for (int j = 0; j < 3; j++) {
    for (int l = 0; l < 3; l++) {
      for (int m = 0; m < 3; m++) {
        assert(grassmannIdentity(i,j,l,m) == (int(i == l) * int(j == m) - int(i == m) * int(j == l)));
      }
    }
  }
}

// dimensionalities
assert(epsilon.dimension(0) == 3);
assert(epsilon.dimension(1) == 3);
assert(epsilon.dimension(2) == 3);
auto dims = epsilon.dimensions();
assert(dims[0] == 3);
assert(dims[1] == 3);
assert(dims[2] == 3);

By default, Tensors are stored in "column-major" order, &t(x+1, ...) == &t(x, ...) + 1, &t(x, y+1, ...) == &t(x, y, ...) + t.dimension(0), etc. But by using a flag, they can also be changed to "row-major" ordering:

Eigen::Tensor<double, 4, Eigen::RowMajor> rowMajorTensor(4, 6, 3, 2);

Caveats and known issues

  • In order to run the tests for this module, one should specify -DEIGEN_TEST_CXX11=ON. Furthermore, one need to specify -DEIGEN_TEST_NVCC=ON to enable the CUDA tests.

Exploiting tensor symmetries

Often, Tensors have intrinsic symmetries between indices, which may be exploited. There is an additional module TensorSymmetry that contains support for this. It may be included via

#include <unsupported/Eigen/CXX11/TensorSymmetry>

The current implementation works in the following way:

  • every symmetry is assumed to be a relation of the tensor to itself with two swapped indices, e.g. identities such as t(a,b,c) == t(b,a,c) or epsilon(a,b,c) == -epsilon(b,a,c) (currently supported are symmetry, antisymmetry, hermiticity and antihermiticity)
  • the tensor itself stores the full values
  • a single assignment sets all elements of the tensor related through symmetry

At a later point in time, the code could be extended to use symmetries to reduce the required storage (while at the same time making memory layout much more complicated), but there are no concrete plans so far to do so.

Constructing the symmetry group

The first step is to construct the full symmetry group that relates all the given symmetries to each other.

To take the example of the epsilon tensor, it has two elementary symmetries: antisymmetry between the first two and the last two indices (or equivalently, the first two and the first and the last index). The following four ways show how to create the corresponding group:

StaticSGroup<AntiSymmetry<0,1>, AntiSymmetry<1,2>> ssym1;
StaticSGroup<AntiSymmetry<0,1>, AntiSymmetry<0,2>> ssym2;
DynamicSGroup dsym; dsym.addAntiSymmetry(0,1); dsym.addAntiSymmetry(1,2);
SGroup<AntiSymmetry<0,1>, AntiSymmetry<1,2>> sym;

In their result, all four are equivalent. The difference between StaticSGroup template and the DynamicSGroup class is that the StaticSGroup template generates the whole group at compile time (through some very involved template logic), while the DynamicSGroup class performs the group computation at run time.

The SGroup template is a wrapper that will instantiate the static case if the number of symmetries specified is at most four and the number of elements in the group is at most 16. If there are more than four symmetry specifiers (i.e. generators of the group apart from the identity), this might cause the compiler to use up GB of RAM and take hours/days/weeks to compile.

Trivially zero tensors / elements

Different combinations of symmetries / antisymmetries are not necessarily useful. For example, any tensor that has the property that the first two indices are antisymmetric and the second and third indices are symmetric has to be zero by definition. The code detects this properties of a symmetry group, but does not react to it by default.

The user can check after the construction of a symmetry group whether any tensor with this symmetry is trivially zero, trivially real or trivially imaginary:

if (sym.globalFlags() & Eigen::GlobalZeroFlag) {
  // any tensor with this symmetry is trivially zero
} else if (sym.globalFlags() & Eigen::GlobalRealFlag) {
  // any tensor with this symmetry is trivially real
} else if (sym.globalFlags() & Eigen::GlobalImagFlag) {
  // any tensor with this symmetry is trivially imaginary
} else {
  // the symmetry does not pose any restrictions on the tensor in
  // general
}

Note that currently Eigen::GlobalZeroFlag == Eigen::GlobalRealFlag | Eigen::GlobalImagFlag (since 0 is the only number that is both on the real and imaginary axis), therefore it is necessary to check it first.

Alternatively, the user can

#define EIGEN_TENSOR_SYMMETRY_CHECK_VALUES

before including the TensorSymmetry module. This will activate checks upon assignment to a tensor that the values assigned fulfill these criteria exactly (i.e. whether they are zero, purely real or purely imaginary) and fails an assertion if not.

Even if the symmetry itself does not restrict the tensor, some elements may be restricted. For example, if there is an antisymmetry between two indices, all elements where both are equal are zero. Similarly, all elements with equal indices that have a hermiticity relation are real. The define EIGEN_TENSOR_SYMMETRY_CHECK_VALUES also activates checks for this.

Note that the EIGEN_TENSOR_SYMMETRY_CHECK_VALUES checks are expensive (and incur quite a bit of runtime cost) and should therefore only be used for developement and debugging.

Applying to a tensor

The initialization of the epsilon tensor can be rewritten as:

Eigen::Tensor<double, 3> epsilon(3,3,3);
epsilon.setZero();
Eigen::SGroup<Eigen::AntiSymmetry<0,1>, Eigen::AntiSymmetry<1,2>> sym;
sym(epsilon, 0, 1, 2) = 1;

Here, the last line sets all related elements (i.e. the specified element (0,1,2) and all 5 other permutations of these indices) to the proper value. Since the group here is small enough (2 generators, 6 elements), SGroup chooses the StaticSGroup implementation for this group. Therefore, the entire group is generated at compile time and the assignment incurs no runtime penalty compared to just having 6 assignments (when compiled with optimization on a decent compiler), because the loop over the group elements is unrolled at compile time.

For larger groups, it can be better to not unroll the loop at compile time in order to reduce the code size to better fit it in the processor cache. For this reason, the SGroup template will choose the DynamicSGroup variant if the group has more than 16 elements, even if it does not have more than four generators.