Picch wrote: Very cool. Since this will be a semi-distributed model, will each "node" be able to use its own bits, threads, and blocks settings? Are you using the CUDA 4.0 RCs or are we safe still being on 3.2?
I'm still on 3.2. I don't typically upgrade CUDA versions until they're public. I have enough trouble on my hands without beta testing...
As for each node's settings, I'm still hammering out details, but the general flow is worked out.
Each node will use the same "bits per workunit" setting as the master node. The client nodes get the hashlist, charset, general information, and workunits from the master node, perform the operations, and feed completed workunits/passwords back to the master.
This means that, once I write it, you will be able to use separate threads/blocks settings on the client nodes, but will be locked into the same charset/bits per workunit/hash lists/etc. They are all working on the same problem, just on different systems.
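To make that split concrete, here's a rough sketch of what the master hands out versus what each node tunes locally. All of these structs and names are just illustrative, not the actual code:

    // Sketch of the shared-vs-local split. Names are hypothetical.
    #include <cstdint>
    #include <string>
    #include <vector>

    // Sent once by the master to every client: everyone works the same problem.
    struct JobDescription {
        std::vector<std::string> hashList;  // target hashes
        std::string charset;                // candidate character set
        uint32_t bitsPerWorkunit;           // keyspace slice size, fixed by the master
    };

    // Chosen locally on each client to match its own GPU.
    struct NodeTuning {
        uint32_t threads;  // CUDA threads per block
        uint32_t blocks;   // CUDA blocks per grid
    };

    // Handed out by the master: a contiguous slice of the keyspace.
    struct Workunit {
        uint64_t id;
        uint64_t startOffset;  // first candidate index in this slice
        uint64_t count;        // 2^bitsPerWorkunit candidates
    };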
Long term, my plan is to eliminate a lot of the command line fiddling by adding the auto-tune capability back in, and by automatically sizing the "large bitmap" as large as the video card allows. This adds significant performance; even a 128/256 meg bitmap would be a huge improvement over not using one. I'd like to get it as close to "fire and forget" as possible. I'll still allow people to tweak settings, but I'd prefer they not *have* to in order to get good performance.
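As a rough illustration of the bitmap auto-sizing idea: query free GPU memory through the CUDA runtime and pick the biggest bitmap that fits with some headroom. The headroom fraction and the 512 meg cap here are assumptions for the sketch, not the tool's actual policy:

    // Sketch: size the "large bitmap" from free GPU memory. Numbers are illustrative.
    #include <cuda_runtime.h>
    #include <cstdio>

    size_t chooseBitmapBytes() {
        size_t freeBytes = 0, totalBytes = 0;
        if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess) {
            return 0;  // fall back to running without the large bitmap
        }
        // Leave ~25% headroom for hashlists, candidate buffers, etc. (assumption).
        size_t budget = freeBytes - freeBytes / 4;
        // Largest power of two under budget, capped at 512 meg (illustrative cap).
        size_t bitmap = 1;
        while (bitmap * 2 <= budget && bitmap * 2 <= (512u << 20)) {
            bitmap *= 2;
        }
        return bitmap;
    }

    int main() {
        size_t bytes = chooseBitmapBytes();
        printf("Large bitmap: %zu MB\n", bytes >> 20);
        return 0;
    }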

The other thing missing is a more robust workunit system. Right now, if a workunit is handed out, it is assumed to be completed; I don't check for non-completed workunits (if the client disconnects or there's a network glitch). I will be updating the workunit system to handle this, since I designed it from the beginning to be able to handle this case when needed.
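The fix is conceptually simple: track when each workunit was issued, and requeue anything that's been out too long. Something along these lines (the timeout and names are just illustrative, not the real design):

    // Sketch of workunit tracking with reissue on timeout.
    #include <chrono>
    #include <cstdint>
    #include <deque>
    #include <unordered_map>

    using Clock = std::chrono::steady_clock;

    struct Tracker {
        std::deque<uint64_t> pending;                              // not yet handed out
        std::unordered_map<uint64_t, Clock::time_point> assigned;  // id -> issue time

        // Hand out the next pending workunit, recording when it was issued.
        bool dispatch(uint64_t& id) {
            if (pending.empty()) return false;
            id = pending.front();
            pending.pop_front();
            assigned[id] = Clock::now();
            return true;
        }

        // Client reported results (completed unit or found passwords).
        void complete(uint64_t id) { assigned.erase(id); }

        // Requeue anything a client has held too long (disconnect,
        // network glitch, crashed node).
        void requeueStale(std::chrono::seconds timeout) {
            auto now = Clock::now();
            for (auto it = assigned.begin(); it != assigned.end(); ) {
                if (now - it->second > timeout) {
                    pending.push_back(it->first);
                    it = assigned.erase(it);
                } else {
                    ++it;
                }
            }
        }
    };

The master would call requeueStale() on a timer, so a dropped client costs one timeout instead of a permanently lost slice of the keyspace.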