Because of the differences in how std::min/std::max and the Intel SIMD instructions used for pmin/pmax handle the presence of a NaN in one of the arguments, you end up with inconsistent values on the outputs produced by from the vectorized and scalar code paths of scalar_min_op/scalar_max_op.
For example, std::max(a, b) is defined to have the same value as the ternary expression (a < b) ? b : a, in other words, if any or the arguments are NaN, a is returned.
Conversely, the MAXPS instruction invoked by _mm_max_ps(a, b) intrinsic always returns the value of the second argument b if either a or b is NaN.
I propose to swap the arguments to pmin/pmax in scalar_*_op::packetOp to get consistent results.
One could also swap the arguments in the implementation of scalar_*_op::operator(), but it turns out that the former approach makes the Eigen Tensor library properly propagate NaNs for the Relu operator, which is performance critical for neural net models.
Do you approve of this solution or do you have a better idea?
Thinking about this again, perhaps it would make more sense to make the low-level pmin/pmax implementations consistent with std::min/std::max by swapping the arguments to the intrinsics?
right, this is also related to bug 564. swapping the arguments to _mm_max_ps looks harmless, so feel free to propose a patch or PR.
Pull request in https://bitbucket.org/eigen/eigen/pull-requests/291/make-nan-propagation-consistent-between/diff