Sc00bz wrote: One problem would be that bit slicing needs a lot of registers.
Sadly, yes. Either that or efficient register spilling, which may help, though I'm not sure it can be done. atom & I both keep seeing references to "fast as hell" bitsliced DES on GPUs, but neither of us can find any actual code that does it.
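To make the register problem concrete, here is a minimal sketch of bit slicing in plain C. It is not DES and not anyone's actual kernel; it is a toy 8-bit adder, and the names (`bitslice8`, `add8_sliced`, `unslice8`, `LANES`, `BITS`) are made up for illustration. The point is that every bit of state becomes its own machine word holding that bit for 32 independent instances. For DES, a 64-bit block plus 56 key bits would already be 120 such words before any temporaries.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 32  /* one 32-bit word carries one bit from each of 32 instances */
#define BITS  8   /* toy width; DES would need 64 block words plus 56 key words */

/* Transpose: in[i] is instance i's value; out[b] collects bit b of every instance. */
void bitslice8(const uint8_t in[LANES], uint32_t out[BITS])
{
    for (int b = 0; b < BITS; b++) {
        uint32_t w = 0;
        for (int i = 0; i < LANES; i++)
            w |= (uint32_t)((in[i] >> b) & 1u) << i;
        out[b] = w;
    }
}

/* Ripple-carry add of 32 independent 8-bit pairs using only AND/OR/XOR.
 * Each loop step handles one bit position for all 32 lanes at once. */
void add8_sliced(const uint32_t a[BITS], const uint32_t b[BITS], uint32_t sum[BITS])
{
    uint32_t carry = 0;
    for (int i = 0; i < BITS; i++) {
        uint32_t x = a[i] ^ b[i];   /* propagate */
        uint32_t g = a[i] & b[i];   /* generate  */
        sum[i] = x ^ carry;
        carry = g | (carry & x);
    }
}

/* Pull one instance's 8-bit result back out of the sliced words. */
uint8_t unslice8(const uint32_t s[BITS], int lane)
{
    assert(lane >= 0 && lane < LANES);
    uint8_t v = 0;
    for (int b = 0; b < BITS; b++)
        v |= (uint8_t)(((s[b] >> lane) & 1u) << b);
    return v;
}
```

Even this toy keeps 24 words live for 8-bit operands. Scale that up to a full DES round and it is easy to see why a GPU thread would run out of registers and start spilling.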
CPUs can do 256 LM hashes at once per core (with AVX [Sandy Bridge and Bulldozer Q3 2011]) and GPUs can only do 32 LM hashes at once per core.
AVX does not yet support integer/boolean ops. Check your sources.

I wish it did, and I was hoping it would, but it doesn't. Not until Ivy Bridge. So it's still 128-bit registers for hash stuff... sadly.

Nvidia GPUs are, what, like 10-ish times faster than CPUs. With the CPU doing 8 times as many hashes at once per core (256 vs. 32), GPUs would only be around 1.25 times faster. The only benefit now is being able to have multiple GPUs in one computer.
ATI is twice that, so... 2.5 times faster. Oh wait, ATI will probably run out of registers, so it will need to use fewer cores than it has.
It's one of those "Try it & work on it" things. For now, I will be using lookup tables & shared memory. I may try for bit slicing to see if I can do it.
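
For what it's worth, the back-of-the-envelope numbers quoted above work out like this. This is only a sketch of that arithmetic: `adjusted_speedup` is a made-up helper, and the "10-ish" and "twice that" figures are the rough guesses from the quote, not measurements.

```c
#include <assert.h>

/* Scale a raw GPU-vs-CPU speed ratio by how many hashes each side
 * processes at once per core when bit slicing. */
double adjusted_speedup(double raw_speedup, int cpu_lanes, int gpu_lanes)
{
    assert(cpu_lanes > 0 && gpu_lanes > 0);
    return raw_speedup * (double)gpu_lanes / (double)cpu_lanes;
}
```

With 256 hashes at once on the CPU and 32 on the GPU, `adjusted_speedup(10.0, 256, 32)` gives 1.25 and `adjusted_speedup(20.0, 256, 32)` gives 2.5, matching the quoted estimates. If the CPU is really limited to 128-bit registers for this, the same formula gives 2.5 and 5.0 instead.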
