Difference between revisions of "Working notes - Indexing++"

From Eigen
Jump to: navigation, search
(Some references: added SO questions/feature requests)
(Some references: reference to libigl)
Line 18: Line 18:
 
* This [http://ericniebler.com/2014/12/07/a-slice-of-python-in-c post] about the Range v3 library.
 
* This [http://ericniebler.com/2014/12/07/a-slice-of-python-in-c post] about the Range v3 library.
 
* This [http://ericniebler.com/2014/12/07/a-slice-of-python-in-c/#comment-130118 comment] describes an interesting API for range and slices inspired by [https://en.wikipedia.org/wiki/APL_(programming_language) APL].
 
* This [http://ericniebler.com/2014/12/07/a-slice-of-python-in-c/#comment-130118 comment] describes an interesting API for range and slices inspired by [https://en.wikipedia.org/wiki/APL_(programming_language) APL].
 +
* [https://libigl.github.io/libigl/matlab-to-eigen.html libigl] extends Eigen by a colon and a slice function
 
* Several questions/feature request on StackOverflow: [http://stackoverflow.com/questions/13540147/submatrices-and-indices-using-eigen] [http://stackoverflow.com/questions/16253415/eigen-boolean-array-slicing] [http://stackoverflow.com/questions/40074738/eigen-extracting-submatrix-from-vector-of-indices]
 
* Several questions/feature request on StackOverflow: [http://stackoverflow.com/questions/13540147/submatrices-and-indices-using-eigen] [http://stackoverflow.com/questions/16253415/eigen-boolean-array-slicing] [http://stackoverflow.com/questions/40074738/eigen-extracting-submatrix-from-vector-of-indices]
  

Revision as of 14:39, 22 December 2016

The goal of this page is to summarize the different ideas and working plan to (finally!) provide support for flexible row/column indexing in Eigen. See this bug report.

Needs

We aim to support various indexing mechanisms. For each dimension, we would like to be able to have any of:

  • all: as in A(:)
  • singleton: as in A(3) (already supported!)
  • index-array: as in A([3 1 5])
  • range-based: as in A(3:9), but sometimes it is also convenient to specify the start+length similar to what Eigen::Block offers.
  • slicing: as in A(3:2:9). Same as for range, we sometime also want to specify start+length+stride
  • negative indices: as in A(end-3) or A(3:2:end-1)
  • boolean masking: as in A(A>0)

All of this with a concise and unambiguous API! We should also be able to provide compile-time information such as lengths and maybe strides.

Some references

  • This post about the Range v3 library.
  • This comment describes an interesting API for range and slices inspired by APL.
  • libigl extends Eigen by a colon and a slice function
  • Several questions/feature request on StackOverflow: [1] [2] [3]

C++11 API

Since achieving a lean API in C++03 seems to be rather impossible, let's first see what could be done in C++11 and provide more verbose fallbacks for C++03.

Better than a long story, here is a showcase demo.


Source code

Click expand to see the implementation details.

#include <iostream>
#include <array>
#include <valarray>
#include <vector>
#include <Eigen/Dense>
using namespace Eigen;
using namespace std;
 
struct all_t { all_t() {} };
static const all_t all{};
 
struct shifted_last {
  int offset;
};
 
struct last_t {
  shifted_last operator- (int offset) const { return shifted_last{offset}; }
};
static const last_t last{};
 
struct Range {
  Range(Index f, Index l) : m_first(f), m_last(l) {}
  Index m_first, m_last;
  Index size() const { return m_last-m_first+1; }
  Index operator[] (Index k) { return m_first + k; }
};
 
struct RangeLast {
  RangeLast(Index f, last_t) : m_first(f) {}
  Index m_first;
};
 
struct RangeShiftedLast {
  RangeShiftedLast(Index f, shifted_last l) : m_first(f), m_last(l) {}
  Index m_first;
  shifted_last m_last;
};
 
struct Slice {
  Slice(Index f, Index s, Index l) : m_first(f), m_step(s), m_last(l) {}
  Index m_first, m_step, m_last;
  Index size() const { return (m_last-m_first+m_step)/m_step; }
  Index operator[] (Index k) { return m_first + k*m_step; }
};
struct SliceLast {
  SliceLast(Index f, Index s, last_t) : m_first(f), m_step(s) {}
  Index m_first, m_step;
};
 
struct SliceShiftedLast {
  SliceShiftedLast(Index f, Index s, shifted_last l) : m_first(f), m_step(s), m_last(l) {}
  Index m_first, m_step;
  shifted_last m_last;
};
 
Range             span(Index f, Index l)                  { return Range(f,l); }
RangeLast         span(Index f, last_t l)                 { return RangeLast(f,l); }
RangeShiftedLast  span(Index f, shifted_last l)           { return RangeShiftedLast(f,l); }
Slice             span(Index f, Index s, Index l)         { return Slice(f,s,l); }
SliceLast         span(Index f, Index s, last_t l)        { return SliceLast(f,s,l); }
SliceShiftedLast  span(Index f, Index s, shifted_last l)  { return SliceShiftedLast(f,s,l); }
 
template<int Size=Dynamic,int Step=1,int Start=0>
struct iota_t {
  iota_t(Index size=Size) : m_size(size), m_step(Step), m_start(Start) {}
  iota_t(Index size,Index step,Index start) : m_size(size), m_step(step), m_start(start) {}
  internal::variable_if_dynamic<Index,Size> m_size;
  internal::variable_if_dynamic<Index,Step> m_step;
  internal::variable_if_dynamic<Index,Start> m_start;
 
  operator Slice() const {
    return Slice(m_start.value(), m_step.value(),
                 m_start.value()+(m_size.value()-1)*m_step.value()); }
 
  friend iota_t<Size,Dynamic,Start> operator*(int stride,const iota_t &x) {
    return iota_t<Size,Dynamic,Start>{x.m_size.value(),x.m_step.value()*stride,x.m_start.value()};
  }
 
  iota_t<Size,Step==Dynamic?Dynamic:-Step,Start> operator-() {
    return iota_t<Size,Step==Dynamic?Dynamic:-Step,Start>{m_size.value(),-m_step.value(),m_start.value()};
  }
 
  iota_t<Size,Step,Dynamic> operator+(int offset) const {
    return iota_t<Size,Step,Dynamic>{m_size.value(),m_step.value(),m_start.value()+offset};
  }
 
  friend iota_t<Size,Step,Dynamic> operator+(int offset,const iota_t &x) {
    return iota_t<Size,Step,Dynamic>{x.m_size.value(),x.m_step.value(),x.m_start.value()+offset};
  }
 
  friend iota_t<Size,Step==Dynamic?Dynamic:-Step,Dynamic> operator-(int offset,const iota_t &x) {
    return iota_t<Size,Step==Dynamic?Dynamic:-Step,Dynamic>{x.m_size.value(),-x.m_step.value(),x.m_start.value()+offset};
  }
};
 
iota_t<> iota(Index size) { return iota_t<>(size); }
template<int Size>
iota_t<Size> iota() { return iota_t<Size>(); }
 
// NOTE that declaring value_type here is needed to make SFINAE works properly in case T does not exhibit value_type
// Another solution would be to specialize has_boolean_value_type and has_index_value_type for Range, Slice, etc.
template<typename T,typename value_type=typename T::value_type>
struct has_boolean_value_type {
  enum {
    value =  bool(internal::is_same<value_type,bool>::value)
#if EIGEN_COMP_CLANG
    // Workaround clang's issue:
    //  (valarray::operator<)::value_type is int instead of bool,
    //  so we check the return type of (valarray::operator<).operator[], which is bool!
          || bool(internal::is_same<typename internal::remove_all<decltype(declval<T>()[0])>::type,bool>::value)
#endif
  };
};
 
template<typename T> struct is_index { enum { value = std::is_integral<T>::value }; };
template<> struct is_index<bool> { enum { value = false }; };
 
template<typename T,typename value_type=typename T::value_type>
struct has_index_value_type {
  enum {
    value = is_index<value_type>::value
  };
};
 
struct MyMat {
 
  void operator()(Index k) { cout << "singleton: "; cout << k << '\n'; }
 
  void operator()(const char*) { cout << "all\n"; }
 
  void operator()(all_t) { cout << "all\n"; }
 
  void operator()(Range ids) { cout << "range-based: ";  print_indices(ids);  }
  void operator()(RangeLast ids) { return operator()({ids.m_first,m_size-1});  }
  void operator()(RangeShiftedLast ids) { return operator()({ids.m_first,m_size-ids.m_last.offset-1});  }
 
  void operator()(Slice ids) { cout << "slice-based: ";  print_indices(ids);  }
  void operator()(SliceLast ids) { return operator()(Slice{ids.m_first,ids.m_step,m_size-1});  }
  void operator()(SliceShiftedLast ids) { return operator()(Slice{ids.m_first,ids.m_step,m_size-ids.m_last.offset-1});  }
 
  template<typename Indices>
  typename internal::enable_if<has_index_value_type<Indices>::value>::type
  operator()(const Indices &ids) { cout << "index-based: "; print_indices(ids); }
 
  template<typename Mask>
  typename internal::enable_if<has_boolean_value_type<Mask>::value>::type
  operator()(const Mask &mask) {
    cout << "mask-based: ";
    for(int k=0; k<mask.size(); ++k)
      if(mask[k]) cout << k << " ";
    cout << "\n";
  }
 
  template<typename T>
  void F(T) {}
 
protected:
  template<typename T>
  void print_indices(T& ids) {
    for(int k=0; k<ids.size(); ++k)
      cout << ids[k] << " ";
    cout << "\n";
  }
 
  Index m_size = 11;
};
 
#define PRINT(X,P) cout << #X << P << "  -> "; X
 
int main()
{
  ArrayXd eia(10); eia.setRandom();
  ArrayXi eii(4); eii << 3, 1, 7, 5;
  valarray<double> vala(10); Map<ArrayXd>(&vala[0],10) = eia;
  valarray<int> vali(4); Map<ArrayXi>(&vali[0],4) = eii;
  vector<int> veci(4); Map<ArrayXi>(veci.data(),4) = eii;
 
  MyMat mat;
  cout << "\nc++98 API:\n"
       <<   "-------------\n";
  PRINT( mat(5),"                     " );  // singleton
  PRINT( mat(all),"                   " );  // all
  PRINT( mat(""),"                    " );  // all
  PRINT( mat(span(3,9)),"             " );  // range
  PRINT( mat(span(3,last-2)),"        " );  // range
  PRINT( mat(span(9,3)),"             " );  // empty range (as in MatLab), need to use a slice:
  PRINT( mat(span(9,-1,3)),"          " );  // slice
  PRINT( mat(span(3,3,last-3)),"      " );  // slice
  PRINT( mat(eii),"                   " );  // index-based
  PRINT( mat(vali),"                  " );  // index-based
  PRINT( mat(veci),"                  " );  // index-based
  PRINT( mat(eia>0),"                 " );  // mask
  PRINT( mat(vala>0.0),"              " );  // mask
 
  cout << "\nc++11 shortcuts:\n"
       <<   "-------------\n";
  PRINT( mat({3,9}),"                 " );  // range
  PRINT( mat({3,last}),"              " );  // range
  PRINT( mat({3,last-2}),"            " );  // range
  PRINT( mat({9,3}),"                 " );  // empty range (as in MatLab), need to use a slice:
  PRINT( mat({9,-1,3}),"              " );  // slice
  PRINT( mat({9,-2,1}),"              " );  // slice
  PRINT( mat({3,2,9}),"               " );  // slice
  PRINT( mat({3,2,last}),"            " );  // slice
  PRINT( mat({3,3,last-3}),"          " );  // slice
  PRINT( mat(valarray<int>{5,2,5,6}),"" );  // index-based
  PRINT( mat(array<int,4>{5,2,5,6})," " );  // index-based
  PRINT( mat( vector<bool>{false,true,true,false}),"  " );  // mask
  PRINT( mat(!valarray<bool>{false,true,true,false}),"" );  // mask
 
  cout << "\nAPL-like API (c++98 compatible):\n"
       <<   "--------------------------------\n";
  PRINT( mat(iota(3)),"               " );
  PRINT( mat(2+iota(3)),"             " );
  PRINT( mat(3+2*iota(3)),"           " );
  PRINT( mat(9-iota(3)),"             " );
  PRINT( mat(9-2*iota(3)),"           " );
  PRINT( mat(9-2*iota<3>()),"         " );
 
}

Generated output

c++98 API:
----------
mat(5)                       -> singleton: 5
mat(all)                     -> all
mat("")                      -> all
mat(span(3,9))               -> range-based: 3 4 5 6 7 8 9
mat(span(3,last-2))          -> range-based: 3 4 5 6 7 8
mat(span(9,3))               -> range-based:
mat(span(9,-1,3))            -> slice-based: 9 8 7 6 5 4 3
mat(span(3,3,last-3))        -> slice-based: 3 6
mat(eii)                     -> index-based: 3 1 7 5
mat(vali)                    -> index-based: 3 1 7 5
mat(veci)                    -> index-based: 3 1 7 5
mat(eia>0)                   -> mask-based: 2 4 7 8 9
mat(vala>0.0)                -> mask-based: 2 4 7 8 9

c++11 shortcuts:
----------------
mat({3,9})                   -> range-based: 3 4 5 6 7 8 9
mat({3,last})                -> range-based: 3 4 5 6 7 8 9 10
mat({3,last-2})              -> range-based: 3 4 5 6 7 8
mat({9,3})                   -> range-based:
mat({9,-1,3})                -> slice-based: 9 8 7 6 5 4 3
mat({9,-2,1})                -> slice-based: 9 7 5 3 1
mat({3,2,9})                 -> slice-based: 3 5 7 9
mat({3,2,last})              -> slice-based: 3 5 7 9
mat({3,3,last-3})            -> slice-based: 3 6
mat(valarray<int>{5,2,5,6})  -> index-based: 5 2 5 6
mat(array<int,4>{5,2,5,6})   -> index-based: 5 2 5 6
mat( vector<bool>{false,true,true,false})    -> mask-based: 1 2
mat(!valarray<bool>{false,true,true,false})  -> mask-based: 0 3

APL-like API (c++98 compatible):
--------------------------------
mat(iota(3))                 -> slice-based: 0 1 2
mat(2+iota(3))               -> slice-based: 2 3 4
mat(3+2*iota(3))             -> slice-based: 3 5 7
mat(9-iota(3))               -> slice-based: 9 8 7
mat(9-2*iota(3))             -> slice-based: 9 7 5
mat(9-2*iota<3>())           -> slice-based: 9 7 5

Discussions

  1. For concision, this demo only considers the 1D case, but of course the idea is to implement it in the nD settings. For instance, with a 3D tensor, we could imagine:
    mat(span(0,2,last), "", vec>0)
    which is why an API like mat.slice(...) is a no go. This is also a crucial limitation of the current Eigen::Block-based API.
  2. We use the word 'last' instead of Matlab's 'end' to (1) reduce collision with std::end, and (2) make it more consistent with STL's end as 'last' refer to the last element whereas 'end' refers to the last+1.
  3. Using "" for all might looks odd, but it would avoid introducing a polluting identifier all (nobody wants to write Eigen::all). We have the same issue with last though.
  4. c++98 span(first,[step,]last) API:
    • Pros:
      • mimics MatLab first:[step:]last API
      • enable indexing from the end
    • Cons:
      • the argument order is not explicit
      • we could specify compile time size with span<3>(4,6) or span<3>(4,2,8) but then we have redundant information...
  5. APL-like API:
    • Pros:
      • easy to specify compile-time info
      • no argument-order confusion
      • can be built on top of the range/slice API (hence, can be done outside Eigen)
    • Cons:
      • cumbersome?
      • needs to specify length instead of first-last
  6. c++11 mat({first,[step,]last}) API:
    • Pros:
      • very compact
      • closely mimics MatLab first:step:last API
      • no need to import a possibly polluting name as Eigen::span
      • enable indexing from the end
    • Cons:
      • might be confusing with mat({i0,i1,i2}) (which is why mat({i0,i1,i2,i3,...}) is not allowed)
      • does not permit to specify compile-time size
      • the argument order is not explicit
  7. APL-like API:
    • Pros:
      • easy to specify compile-time info
      • no argument-order confusion
      • can be built on top of the range/slice API (hence, can be done outside Eigen)
    • Cons:
      • cumbersome?
      • needs to specify length instead of first-last
  8. What does not work:
mat({5,2,6,7,1});                                // needs to explicitly use a valarray, array, or ...
mat({true,false,false,true,true,false,true});    // needs to explicitly use a valarray, array, or ...

C++03 fallback API

Internal implementation details

Performance

One could be tempted to leverage AVX2 gather/scatter instructions, but those are horribly slow. Better emulate them for now.