Bugzilla – Bug 329

Feature request: Ability to get a "view" into a sub-matrix by indexing it with a vector or matrix of indices

Last modified: 2016-01-28 21:13:44 UTC

Created attachment 204 [details] A prototype for "views" that would allow Matlab-style indexing of sub-matrices I'd like to be able to index into Eigen matrices with an array of indices, in a way similar to Matlab. I've implemented a prototype of this capability. As a starting point, I've tried to follow the conventions established in Matlab, since I think these will be familiar to many people. I expect that the code I'm posting here will still need some work before it would be suitable for including in the library, but I'm hoping this will help provide a starting point for further discussion. Here's a quick summary of the files in the attachment: include/MyEigen/MatrixViewSubscript.h: implements indexing into a matrix by a set of row indices and a set of column indices. include/MyEigen/MatrixViewIndex.h: implements indexing into a matrix by a single set of indices. This follows the Matlab convention, which allows an element of a 2d matrix to be accessed with a single index (treats the matrix as though it were a vector). include/MyEigen/IndexRange.h: a class for defining ranges of indices, based on a start, and end, and a step size. A negative value for the start or end will define an index relative to the last element of a matrix, with -1 being the last element, -2 being the second to last, and so on. A #define is used to provide shorthand for the common case of IndexRange(0,-1,1) . include/MyEigen/SubToInd.h: a class for combining two matrices containing row and column indices, respectively, into a single set of indices. include/MyEigen/MatrixBaseAddons.h: implements the () operator for various types of indices. include/MyEigen/Core: an include header for including Eigen/Core plus the additional features src/test_views.cpp: a very simple test of the new features

It seems to me that it would make more sense to have the indices implemented as Arrays instead of Matrices, and the corresponding indexing for Array types would also be needed. I was going to file a RFE for this today, but it's nice to see that you've already started on a patch!

Hi Michael, this is a very good starting point! Of course I have many remarks though. The first is that there is of course a large overlap between the current "Block" API (block, head, tail, segment, corner*, middle*, left*, right*) and what can be done with ranges. Personally I consider this as a plus because, depending on the context, one is more convenient than the other. Ranges are also more flexible because they include a step to extract what people sometimes call "slices", and they can be directly combined with col/row extraction and/or indices based extraction. On the other hand, "ranges" are currently slower than "blocks" because they lost direct access, vectorization and unrolling abilities. Unrolling requires to know the size at compile-time. I guess it is fine to give up on this because this would lead to a cumbersome API: Range(start,end) Range<Size>(start) without clear gain compared to what offers the current block based API. DirectAccess is however important to preserve. This is possible when the nested expression has DirectAccess and when we only have single indices and/or ranges. In this case, MatrixViewSubscript should be based on MapBase<> with proper strides computed from the steps. Knowing the "steps" at compile time is also very important, especially when step==1 for vectorization. I'd suggest to have the step as a template parameter of the class IndexRange defaulting to 1, and to add convenient functions to create ranges: template<int Step> IndexRange<Step> Range(int s, int e) { return IndexRange<Step>(s,e); } IndexRange<> Range(int s, int e) { return IndexRange<>(s,e); } IndexRange<Dynamic> Range(int s, int e, int i) { return IndexRange<Dynamic>(s,e,i); } In most cases, users would only have to deal with these Range() functions: Range(1,5) Range(1,5,2) Range<2>(1,5) The advantage of inheriting MapBase<> is that all the vectorization logic is already implemented. Regarding the implementation of MatrixViewSubscript, I'm pretty sure that all the numerous variants could be factorized using some simple meta programming at a lower level. MatrixViewIndex sounds quite dangerous since its result depend on the storage order.

I second this! I could use it for following case: I store a mesh of n Vector3f in a VectorXf of size 3*n. Parts of my code need to select a subset of these vertices (i.e. those in a certain bounding box), and then multiply a VectorXf constructed from them ({v1.x,v1.y,v1.z, v2.x,v2.y,v2.z, ...}) with a matrix. Having the ability to use an index array to form a view on the original VectorXf would simplify my task quite a bit. I would suggest to also support using a std::vector<int> in addition to a VectorXi to store the indices. This seems natural to me and would allow using iterators to construct it (e.g. I use a boost::filter_iterator<... bounding_box_filter ...> to create the vector).

What's the status on this feature? This is a central feature in MATLAB and Armadillo: non-contiguous views (added in version 3.0): X.elem( vector_of_indices ) X( vector_of_indices ) (added in version 3.810) X.cols( vector_of_column_indices ) X.rows( vector_of_row_indices ) X( vector_of_row_indices, vector_of_column_indices ) X.submat( vector_of_row_indices, vector_of_column_indices ) I would appreciate if this went into Eigen as well. We use this kind of indexed views a lot.

(In reply to comment #4) > What's the status on this feature? It appears that nobody is actively working on this at the moment, maybe mostly because a clean solution is blocked by bug 99. We could implement an ad-hoc solution with no guarantee that the API may change later. I increased the priority of this bug.

The biggest difference from Armadillo and MATLAB is in my experience the lack of indexed views in Eigen. That's the only missing killer-feature. I vote for an ad-hoc solution that might change in the future. Software is iterative, so I don't care if the API changes later on.

In our use we would also appreciate this feature. I think an ad-hoc implementation would be fine, one could use a new method name to avoid stepping on other parts of Eigen if it sounds too dangerous to overload the () operator for this. As long as it is clearly stated to be experimental, we are OK with it changing later.

How is this bug going? I'm working with FEM, so I solve part of a matrix and then add the results to a vector. In Armadillo I could do this: vector(vector_of_indices) += matrix(vector_of_indices, vector_of_indices).solve(vector(vector_of_indices)); In Eigen I currently have to split this into like 20 rows of code where I actually have to create a new (dynamic) matrix and copy the entries according to vector_of_indices in it. The same for the right-hand-side vector. Then I have to loop over the results and add it to the resulting vector. This is not as performant as it could be with this feature added. I know it's no use to nag but I just wanted to give you a real world example of how this could improve performance and simplicity of some projects.

(In reply to comment #8) > How is this bug going? Sorry to disappoint, but I'm afraid nobody is working on it yet. > vector(vector_of_indices) += matrix(vector_of_indices, > vector_of_indices).solve(vector(vector_of_indices)); In your case, you will merely safe one matrix-copy, because the decomposition must be on a copy anyways (you will save the copy from your temporary). As copying is O(N^2) but the decomposition is O(N^3), you will hardly notice a difference. Also, solving the system will be much more efficient on an (aligned) temporary vector -- you might again save a copy operation here (but again copying is O(N), solving is O(N^2)). The most important improvement is that you save your 20 rows of code (and I do think that is important). > I know it's no use to nag [...] It's actually like the second best way, to force a feature to be implemented. The best way would be to implement it and provide a patch :)

We need to decide in what form vector_of_indices can be provided. The API-wise simplest solution would be to accept anything that has a size() and operator[] implemented. But we should ensure that operator[] returns any kind of integer types.

I agree that we should accept anything with a size() and operator[] returning integers. We should probably also specialize for operator[] returning bools to select the respective columns/rows. To this end, without C++11 I guess that the only robust solution would be to check for either value_type or Scalar. Maybe we could only check for value_type and make value_type an alias of Scalar in Eigen (this was suggested to leverage more STL compatibility).

A simple test without requiring value_type would be: template<typename T> void assert_integer_type(const T&) { /* static assertion here */}; Usage: template<class Indexes> ... foo(const Indexes& ind) { if(ind.size()>0) assert_integer(ind[0]); ... I'm making this depend on Bug 360, anyways. Another thing: An optimization we'd want to have is to check if Indexes itself has a fixed size, e.g., if we pass an std::array or an Array<int,Size,1> as index vector we should return an expression which itself has a fixed size. (This is definitely not required for the first versions ...)

The problem is that we probably want to specialize wrt integers or bools at compile time.

Yes, you are right. Checking for Scalar and value_type seems a reasonable solution for that. If we implement a traits class for that, this could also provide a SizeAtCompileTime value.