In LU/Determinant.h, one can read:
// trick by Martin Costabel to compute 4x4 det with only 30 muls
Another project (CGAL/determinant.h) needs only 28 muls for the same computation, with the same number of additions/subtractions. Its license is not compatible, so I won't copy the code here, but the algorithm is simple: expand along the last column, recursively, and reuse the common subexpressions. In other words:
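* compute the six 2x2 subdeterminants of the first 2 columns (2 muls each, 12 in total)
* compute the four 3x3 subdeterminants of the first 3 columns, each expanded along its last column using the 2x2 ones (3 muls each, 12 in total)
* conclude by expanding the 4x4 determinant along the last column (4 muls)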
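A minimal sketch of the three steps, written from the description above rather than from the CGAL source. `det4` is a hypothetical free function taking a row-major `m[row][col]` array; it is not Eigen's API. It needs C++14 for the multi-statement `constexpr` function.

```cpp
// Hypothetical helper illustrating the 28-mul expansion; not Eigen or CGAL code.
// m is row-major: m[row][col]. Requires C++14 (multi-statement constexpr).
template <typename T>
constexpr T det4(const T (&m)[4][4]) {
  // Step 1: the six 2x2 minors on columns 0,1, one per pair of rows i<j
  // (2 muls each, 12 in total): dij = m[i][0]*m[j][1] - m[j][0]*m[i][1].
  const T d01 = m[0][0] * m[1][1] - m[1][0] * m[0][1];
  const T d02 = m[0][0] * m[2][1] - m[2][0] * m[0][1];
  const T d03 = m[0][0] * m[3][1] - m[3][0] * m[0][1];
  const T d12 = m[1][0] * m[2][1] - m[2][0] * m[1][1];
  const T d13 = m[1][0] * m[3][1] - m[3][0] * m[1][1];
  const T d23 = m[2][0] * m[3][1] - m[3][0] * m[2][1];

  // Step 2: the four 3x3 minors on columns 0,1,2, one per triple of rows
  // i<j<k, each expanded along column 2 and reusing the 2x2 minors
  // (3 muls each, 12 in total): dijk = m[i][2]*djk - m[j][2]*dik + m[k][2]*dij.
  const T d012 = m[0][2] * d12 - m[1][2] * d02 + m[2][2] * d01;
  const T d013 = m[0][2] * d13 - m[1][2] * d03 + m[3][2] * d01;
  const T d023 = m[0][2] * d23 - m[2][2] * d03 + m[3][2] * d02;
  const T d123 = m[1][2] * d23 - m[2][2] * d13 + m[3][2] * d12;

  // Step 3: expand the 4x4 determinant along column 3 (4 muls).
  // Cofactor signs are (-1)^(row+3), so they start with '-' for row 0.
  return -m[0][3] * d123 + m[1][3] * d023 - m[2][3] * d013 + m[3][3] * d012;
}
```

For example, with rows `{1,2,3,4}, {5,6,7,8}, {2,6,4,8}, {3,1,1,2}` it returns 72. Making it a `constexpr` template is only a convenience for this sketch: with integer matrices the arithmetic is exact, so results can be checked with `static_assert`.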
(the same strategy is still profitable for dimension 5 IIRC)
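For an n x n matrix, the same scheme computes, for each k from 2 to n, all $\binom{n}{k}$ minors on the first k columns, each costing k multiplications (k products of an entry of column k with a (k-1)x(k-1) minor; for k = 2, products of entries). Using $\sum_{k=0}^{n} k\binom{n}{k} = n\,2^{n-1}$ and subtracting the k = 1 term gives:

$$
\mathrm{muls}(n) \;=\; \sum_{k=2}^{n} k\binom{n}{k} \;=\; n\left(2^{n-1}-1\right),
\qquad \mathrm{muls}(4) = 4\cdot 7 = 28,
\qquad \mathrm{muls}(5) = 5\cdot 15 = 75 .
$$

This only counts the scheme's own multiplications; it does not by itself show how the 5x5 case compares with other methods. The count grows exponentially in n, so the scheme can only be competitive for small sizes.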
The new version is vectorized better by Clang. GCC vectorizes neither version.