Alright, now that I'm on a keyboard, here are some hardware thoughts.
For the multiforcer, all other things being equal (will expand on this later), performance scales linearly with (stream processor count * shader clock). This is where the "performance number" comes from in the code - and it's mostly accurate.
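As a rough illustration of that scaling rule, here is a minimal Python sketch. The helper name `performance_number` and the use of MHz as the clock unit are assumptions for illustration, not the multiforcer's actual code. The stream processor counts and shader clocks are NVIDIA's reference specs for the GTX 470 and GTX 480.

```python
# Hypothetical helper: relative throughput under the linear-scaling rule
# (stream processor count x shader clock). Units are arbitrary; only ratios matter.
def performance_number(stream_processors: int, shader_clock_mhz: int) -> int:
    return stream_processors * shader_clock_mhz

# Reference specs: (stream processors, shader clock in MHz)
cards = {
    "GTX 470": (448, 1215),
    "GTX 480": (480, 1401),
}

scores = {name: performance_number(sp, clk) for name, (sp, clk) in cards.items()}
for name, score in scores.items():
    print(f"{name}: {score}")
# GTX 470: 544320
# GTX 480: 672480

ratio = scores["GTX 480"] / scores["GTX 470"]
print(f"GTX 480 vs GTX 470: {ratio:.2f}x")
# GTX 480 vs GTX 470: 1.24x
```

Under this model a stock 480 should land roughly 24% ahead of a stock 470, assuming nothing else (memory, host, dispatch) becomes the bottleneck.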
However, the "all other things being equal" caveat applies to the 460s. They have a different dispatch unit setup that reduces CUDA efficiency on non-vector code, so the 460s basically behave like a 256-stream-processor card under *most* workloads.
Details here:
http://www.anandtech.com/show/3809/nvid ... 200-king/2
I don't know how my code runs on a 460, so benchmarks are welcome. The cards may work well, but I wouldn't expect great performance from them. I may be able to tune my code to perform better on the 460, possibly around the time I move to OpenCL, but for now, unless benchmarks prove otherwise, I would consider the 460 a suboptimal choice for running my code.
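To see what the dispatch limitation means under the same linear model, the sketch below compares a GTX 460 using all 336 of its stream processors against the roughly 256 effective stream processors estimated above, and against a GTX 470. The counts and shader clocks are NVIDIA reference specs; treating the 460 as a 256-SP card is the rough estimate from this post, not a measured number, and the helper name is hypothetical.

```python
# Hypothetical helper implementing the linear-scaling model.
def performance_number(stream_processors: int, shader_clock_mhz: int) -> int:
    return stream_processors * shader_clock_mhz

GTX460_SHADER_MHZ = 1350
gtx460_nominal = performance_number(336, GTX460_SHADER_MHZ)    # if every SP stayed busy
gtx460_effective = performance_number(256, GTX460_SHADER_MHZ)  # post's estimate for non-vector code
gtx470 = performance_number(448, 1215)

print(gtx460_nominal, gtx460_effective, gtx470)
# 453600 345600 544320
print(f"460 (effective) vs 470: {gtx460_effective / gtx470:.2f}")
# 460 (effective) vs 470: 0.63
```

If the estimate holds, a 460 would deliver a bit under two thirds of a 470's throughput on this kind of code, which is why benchmarks matter before buying.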
The 470s, on the other hand, are pretty sweet: $250 per card, decent clocks, and generally respectable units.
As for host CPU, RAM, etc.: optimum performance *right now* comes from one CPU core per GPU, running at 100% in a spin wait on the GPU so it can launch the next kernel as soon as the previous one finishes.
Long term, this will change. With multi-GPU support in place, adding CPU cracking is not hard at all. So a good CPU will soon be beneficial, and I'll change the code to wait for an interrupt on kernel completion instead of spin waiting. I'll lose a bit of GPU performance, but more than make it up in CPU thread performance.
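The difference between the two host-side waiting strategies can be sketched in Python, with a worker thread standing in for a GPU kernel. Spin waiting polls a completion flag in a tight loop and keeps a core busy; a blocking wait sleeps until signaled, freeing the core for CPU cracking threads at the cost of some wake-up latency. This is only an analogy: in CUDA itself, the corresponding choice is made with `cudaSetDeviceFlags`, e.g. `cudaDeviceScheduleSpin` versus `cudaDeviceScheduleBlockingSync`. The `fake_kernel` function and the timings below are made up for illustration.

```python
import threading
import time

def fake_kernel(done: threading.Event, duration_s: float) -> None:
    """Hypothetical stand-in for a GPU kernel: works for a while, then signals completion."""
    time.sleep(duration_s)
    done.set()

def spin_wait(duration_s: float) -> int:
    """Poll the completion flag in a tight loop; returns the number of polls (wasted CPU work)."""
    done = threading.Event()
    threading.Thread(target=fake_kernel, args=(done, duration_s)).start()
    polls = 0
    while not done.is_set():
        polls += 1  # this host thread stays busy the whole time the "kernel" runs
    return polls

def blocking_wait(duration_s: float) -> bool:
    """Sleep until the worker signals completion; the core is free in the meantime."""
    done = threading.Event()
    threading.Thread(target=fake_kernel, args=(done, duration_s)).start()
    return done.wait(timeout=10 * duration_s + 1)  # True once the "kernel" finishes

if __name__ == "__main__":
    print("spin-wait polls:", spin_wait(0.05))
    print("blocking wait completed:", blocking_wait(0.05))
```

The poll count from `spin_wait` is a rough proxy for the CPU time that a blocking wait would hand back to other threads.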
However, GPUs will still be the primary contributor, so when trading off money spent on GPUs against money spent on CPUs, go with faster GPUs.
Also, while I don't recommend this, overclocking well-cooled GPUs will gain a good bit of performance... if you have random water blocks or something lying around.
As for RAM, it really doesn't matter. As long as it's a reasonable amount, you're fine.
For the rainbow table systems, you'll want a fast GPU, a lot of RAM, and fast disks. However, SSDs don't benefit my code as much as one might hope, because with the indexes in place, the table search is just a bunch of quick linear reads, which spindle disks handle well. If you don't have indexes, yes, SSDs are beneficial, but I wrote my code with big spindle disk arrays in mind. I *suspect* (but haven't proven) that a RAID1 mirror with a few disks would be the fastest option, since it can read from multiple places at once. I'm also thinking of adding a prefetching thread, which would help even more with this.
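Why indexes turn a table search into short linear reads can be shown with a small Python sketch. It assumes a hypothetical table layout, not the actual on-disk format used by this code: fixed-size records sorted by chain end hash, plus an in-memory index mapping the first byte of each end hash to the range of records starting with that byte. A lookup then does one seek and one contiguous read of that range, which is the access pattern spinning disks handle well.

```python
import bisect
import io
import struct

RECORD = struct.Struct(">8s8s")  # hypothetical record: (end hash, start point), 16 bytes

def build_table(chains):
    """Sort (end_hash, start) pairs by end hash and serialize them as fixed-size records."""
    chains = sorted(chains)
    data = b"".join(RECORD.pack(end, start) for end, start in chains)
    # Index: first byte of end hash -> (first record, one past last record)
    index = {}
    for i, (end, _) in enumerate(chains):
        lo, _ = index.get(end[0], (i, i))
        index[end[0]] = (lo, i + 1)
    return io.BytesIO(data), index

def lookup(table, index, end_hash):
    """Seek once, read one contiguous block, then binary-search it in memory."""
    if end_hash[0] not in index:
        return None
    lo, hi = index[end_hash[0]]
    table.seek(lo * RECORD.size)
    block = table.read((hi - lo) * RECORD.size)  # single linear read
    records = [RECORD.unpack_from(block, off) for off in range(0, len(block), RECORD.size)]
    ends = [end for end, _ in records]
    pos = bisect.bisect_left(ends, end_hash)
    if pos < len(ends) and ends[pos] == end_hash:
        return records[pos][1]
    return None

table, index = build_table([(b"\x10" * 8, b"start001"), (b"\x20" * 8, b"start002")])
print(lookup(table, index, b"\x20" * 8))
```

A real table would live in a file rather than a `BytesIO`, and a finer index (more prefix bits) shrinks each read further; a prefetching thread could issue the next block's read while the current one is being searched.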