Bug 200 - ploaddup using _mm_load_sd, which is generally miscompiled on gcc/i386
As we found out in bug 195, GCC (at least up to 4.4) on i386 (i.e. with -m32) miscompiles the _mm_load_sd intrinsic: it adds redundant x87 fldl/fstpl instructions, which should hurt performance (in bug 195 it even produced a wrong result, but that's a different story).

Our ploaddup function still uses _mm_load_sd, so it would be nice to have a workaround for gcc/i386 that avoids it.
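
One possible shape for such a workaround, as a rough sketch only: the function name load_dup_pd is a hypothetical stand-in for ploaddup on a packet of two doubles, and the preprocessor check is an assumption about how to detect gcc/i386. On that target it broadcasts the scalar with _mm_set1_pd instead of going through _mm_load_sd. Whether this actually avoids the redundant fldl/fstpl would still need to be confirmed in the generated assembly.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical stand-in for ploaddup on two doubles:
// returns a vector holding {from[0], from[0]}.
static inline __m128d load_dup_pd(const double* from)
{
#if defined(__GNUC__) && !defined(__clang__) && defined(__i386__)
  // Assumed workaround path for gcc/i386: skip _mm_load_sd and
  // broadcast the scalar directly.
  return _mm_set1_pd(*from);
#else
  // Other targets: load the low double, then duplicate it into both lanes.
  __m128d lo = _mm_load_sd(from);
  return _mm_unpacklo_pd(lo, lo);
#endif
}
```

Both branches should produce the same value, so the change would only affect which instructions the compiler emits.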
