How to Properly Use SSE3 and SSSE3

Ian Ollmann, a member of Apple's Vector and Numerics Group and one of the chief architects of the Accelerate framework, has an article on MacResearch:
Intel keeps extending its vector ISA, which means more headaches for more programmers trying to make use of the latest and greatest stuff in their apps. While it is easiest to be a grumpy Luddite and stick to SSE2 (the last extension guaranteed to be everywhere by Apple), it turns out there is actually (some) legitimately useful stuff in the new vector extensions. While the complex arithmetic / horizontal add stuff in SSE3 is pretty useless, and more often than not a performance trojan horse (SSE2 based approaches, done properly, typically lead to less permute overhead and use cheaper instructions), SSSE3 contains at least two highly useful instructions, pshufb and pmulhrs. With more useful stuff in SSE4 around the corner, it seems that a little good programming practice now will save you lots of headaches later.
