At least the most widely used vector extensions, SSE and AVX, have comparison instructions. Eigen users currently cannot use them in a useful way.
To use them one has to consider:
--Different formats for booleans
In AVX only the highest bit counts when using select assembler instructions
In SSE one uses bitmasks
Without using extensions an int is true if it's nonzero
--Different Sizes for booleans
booleans have the size of the scalar
--Using the FPU/Integer Unit
xor/or/and operations with these booleans can be done in either the integer or the floating point unit of the processor. Getting them from the integer unit to the fp unit takes up to 2 cycles. The programmer should have a way to control this.
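The three boolean formats listed above can be illustrated without any intrinsics. This is only a portable sketch: the helper names (bits_of, is_true_bitmask, is_true_highbit, is_true_scalar) are made up for illustration and are not Eigen functions, and 32-bit lanes are assumed.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit float lanes");

// Bit pattern of a float, copied out without breaking aliasing rules.
inline std::uint32_t bits_of(float x) {
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// SSE-style bitmask: a comparison yields all-ones (true) or all-zeros (false)
// per lane, so "true" is modelled here as every bit of the lane being set.
inline bool is_true_bitmask(std::uint32_t lane) { return lane == 0xFFFFFFFFu; }

// AVX-select-style mask: only the highest bit of the lane is inspected.
inline bool is_true_highbit(std::uint32_t lane) { return (lane >> 31) != 0; }

// Plain C++ without extensions: any nonzero int is true.
inline bool is_true_scalar(int v) { return v != 0; }
```

For instance, the bit pattern of -1.0f counts as true under the high-bit rule but is not an all-ones bitmask, which is why the format has to be part of the type's description.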
In order to address these issues, the boolean type in Eigen needs to have a format description. The best way to do this may be templates.
template <Scalartype (int/float/double), Vectorengine>
This addresses the first two issues. One can partially specialize these to get the size right.
This type can then be used with some useful operations like
Basically the Scalartype of the EigenBool selects the Processor unit to use (FPU/Integer), so the typical user is not bothered by problem 3.
One should be able to store the EigenBools in memory.
And/Or/Xor should be usable
If only the highest bit counts, as in AVX, then one could convert float to EigenBool<float,AVX> without any cost, and this should be possible.
These operations can all be Expression Templates.
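A minimal scalar sketch of what such a type could look like is given here. Everything in it is an assumption made for illustration: the ScalarEngine tag, the mask_int trait and the member names do not exist in Eigen, and the operators are evaluated eagerly rather than as expression templates.

```cpp
#include <cstdint>

// Hypothetical engine tag standing in for "Vectorengine"; not part of Eigen.
struct ScalarEngine {};

// Hypothetical trait: unsigned integer type with the same size as Scalar.
template <typename Scalar> struct mask_int;
template <> struct mask_int<float>  { using type = std::uint32_t; };
template <> struct mask_int<double> { using type = std::uint64_t; };
template <> struct mask_int<int>    { using type = unsigned int; };

template <typename Scalar, typename Engine = ScalarEngine>
class EigenBool {
public:
    using Mask = typename mask_int<Scalar>::type;
    static_assert(sizeof(Mask) == sizeof(Scalar),
                  "the boolean must have the size of the scalar");

    EigenBool() : m_bits(0) {}
    // true is stored as all-ones, false as all-zeros (bitmask format).
    explicit EigenBool(bool b)
        : m_bits(b ? static_cast<Mask>(~Mask(0)) : Mask(0)) {}

    explicit operator bool() const { return m_bits != 0; }
    Mask bits() const { return m_bits; }  // raw mask, e.g. for storing to memory

    friend EigenBool operator&(EigenBool a, EigenBool b) { return from_bits(a.m_bits & b.m_bits); }
    friend EigenBool operator|(EigenBool a, EigenBool b) { return from_bits(a.m_bits | b.m_bits); }
    friend EigenBool operator^(EigenBool a, EigenBool b) { return from_bits(a.m_bits ^ b.m_bits); }

private:
    static EigenBool from_bits(Mask m) { EigenBool r; r.m_bits = m; return r; }
    Mask m_bits;
};

// Hypothetical comparison helper returning a boolean in the scalar's own format.
template <typename Scalar>
EigenBool<Scalar> less_than(Scalar a, Scalar b) { return EigenBool<Scalar>(a < b); }
```

A vectorized specialization would hold a packet of masks instead of a single one; the interface could stay the same.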
I think if everything is done right, then the user has very fast comparisons in the usual clean-and-easy-to-use Eigen way.
So the question is: how exactly should this be implemented? One could add operators | & ^ to MatrixBase, but using a special type for this is much more elegant.
For the select() operator, we need to specify whether it is guaranteed to evaluate lazily (as the (_ ? _ : _) operator does), e.g., do we guarantee that the following is always possible:
ArrayXi A, B, C;
B = (A!=0).select(10/A, A);
This would unfortunately disallow vectorization in many cases.
Certainly, it would be possible to ensure lazy evaluation if unsafe operations are involved (integer division is not supported by SSE anyway).
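What a lazy guarantee means per element can be sketched in plain C++, with std::vector standing in for ArrayXi. The function name lazy_select is made up for this illustration and is not an Eigen API.

```cpp
#include <cstddef>
#include <vector>

// Lazy select: for each element only the chosen branch is evaluated,
// mirroring B = (A != 0).select(10 / A, A).
inline std::vector<int> lazy_select(const std::vector<int>& A) {
    std::vector<int> B(A.size());
    for (std::size_t i = 0; i < A.size(); ++i) {
        // 10 / A[i] is never computed when A[i] == 0, so there is no division by zero.
        B[i] = (A[i] != 0) ? (10 / A[i]) : A[i];
    }
    return B;
}
```

An eager version would first compute 10/A for every element and only then pick results by mask, which is what a vectorized blend does and why it cannot be used safely here.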
At the moment, it is mostly lazy evaluation, except if either expression has side effects:
B = (A!=0).select(C=10/A, C=A); // not a good idea anyways ...
This crashes if A contains zero-valued elements, and C ends up being entirely 10/A or entirely A.
Gael once suggested making assignment operators expressions as well (bug 110); this might theoretically make it possible for the above expression to behave as:
for(Index i=0; ...)
B(i) = (A(i)!=0) ? (C(i)=10/A(i)) : (C(i)=A(i));
Admittedly, that might be a bit overkill ...
As stated in the original post, we need to hide the masks in an internal structure. I don't think that Vectorengine should be part of the type, though -- e.g., it would not really make sense to have SSE-masks when compiling for NEON.
Of course, SSE/AVX blend operations shall be used when available.
I also doubt that the high-bit-only optimization is worth the effort. This would only be beneficial for x<=-0.0 comparisons (and ignoring NaNs).
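To make the caveat concrete, here is a small sketch comparing the sign bit of a float with the result of x <= -0.0f. The helper names are made up for illustration, and IEEE 754 single precision is assumed.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes IEEE 754 single precision");

// Highest (sign) bit of the float's bit pattern.
inline bool high_bit(float x) {
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);
    return (u >> 31) != 0;
}

// The comparison that the high bit would stand in for.
inline bool le_neg_zero(float x) { return x <= -0.0f; }
```

The two agree for negative values, for -0.0f and for positive values. They differ for +0.0f, which compares equal to -0.0f but has a clear sign bit, and for NaNs, where the comparison is always false whatever the sign bit is.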
However, to be compatible with Meta-Packets (Bug 692) the number of comparison results should be customizable.
We then need to implement conversions from EigenBool<float,4> to EigenBool<double,4>, etc. (using register shuffling).
Also, storing booleans in memory should best be architecture independent (and basically the same as Array<bool, M, N>), i.e., EigenBool will need specialized pload/pstore instructions.
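The conversions and the architecture-independent storage could look roughly like this in scalar form. This is only a sketch under assumptions: the function names are hypothetical (they are not Eigen's pload/pstore), lanes are modelled as plain integer arrays rather than registers, and masks are assumed to be all-ones or all-zeros.

```cpp
#include <cstddef>
#include <cstdint>

// Store N 32-bit lane masks as plain bools, i.e. the same layout as an
// array of bool, independent of the mask format used in registers.
template <std::size_t N>
void store_mask_as_bool(const std::uint32_t (&lanes)[N], bool* out) {
    for (std::size_t i = 0; i < N; ++i) out[i] = (lanes[i] != 0);
}

// Load plain bools back into all-ones/all-zeros 32-bit lane masks.
template <std::size_t N>
void load_bool_as_mask(const bool* in, std::uint32_t (&lanes)[N]) {
    for (std::size_t i = 0; i < N; ++i) lanes[i] = in[i] ? 0xFFFFFFFFu : 0u;
}

// Widen float-sized masks to double-sized masks. A vectorized version would
// use register shuffles; this loop only shows the intended result.
template <std::size_t N>
void widen_mask(const std::uint32_t (&in)[N], std::uint64_t (&out)[N]) {
    for (std::size_t i = 0; i < N; ++i)
        out[i] = in[i] ? ~std::uint64_t(0) : std::uint64_t(0);
}
```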
Furthermore, the name should probably be something like
template<class Scalar, int N>