Re: Progress on the GPU accelerated RTs!

Posted:
Tue Feb 23, 2010 3:22 pm
by Bitweasil
Sc00bz, thanks. That's sort of what I thought.
Per my previous research on BOINC, I'm designing the generators to be file based. Input comes from the command line (and I'll probably add support for a workunit file/config file), and goes to an output file on disk. This keeps the platform-specific (read, Windows) code to a bare minimum, and should make it easy to deploy on a variety of platforms.
I'm also planning on writing in some basic TCP/IP support, but that will be more for my own/private cluster use than for general internet-distributed use.
Either way, the code is fairly straightforward at this point, so modifying it for distributed generation shouldn't be a big problem. I'm really simplifying a lot of areas as I go through.
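To make the file-based design above concrete, here is a minimal sketch in Python. It is not the actual generator's interface: the flag names (--hash, --chain-length, --num-chains, --output) and the on-disk record layout (two little-endian 64-bit integers per chain) are assumptions made up purely for illustration.

```python
import argparse
import struct

def parse_args(argv):
    """Parse a hypothetical work description from the command line."""
    p = argparse.ArgumentParser(description="hypothetical file-based table generator")
    p.add_argument("--hash", choices=["md5", "ntlm"], default="md5")
    p.add_argument("--chain-length", type=int, required=True)
    p.add_argument("--num-chains", type=int, required=True)
    p.add_argument("--output", required=True)
    return p.parse_args(argv)

def write_chains(path, chains):
    """Write (start_index, end_index) pairs as little-endian 64-bit unsigned ints.

    The record layout is an assumption for this sketch, not a real table format.
    """
    with open(path, "wb") as f:
        for start, end in chains:
            f.write(struct.pack("<QQ", start, end))

def read_chains(path):
    """Read back the pairs written by write_chains."""
    with open(path, "rb") as f:
        data = f.read()
    return [struct.unpack_from("<QQ", data, off) for off in range(0, len(data), 16)]
```

Because everything goes in through arguments and comes out through a plain file, the only platform-specific part left is how the process gets launched, which is the point of the design.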
Re: Progress on the GPU accelerated RTs!

Posted:
Sat Feb 27, 2010 11:11 pm
by Bitweasil
Looks like on a GTX260/216SP, generate rates are going to be around 400M links/sec for MD5 tables, 550M links/sec for NTLM tables.
Or, roughly 3x the existing CUDA RTGEN rate. And I bet I can do some fancy stuff with SSE2 at some point. My reduction function should be efficient with that too.
I'm working out how to do state saving for my chains. Right now, to generate long chains efficiently, kernel times are WAY too long for GUI desktop use. So I'm going to break up the work, just need to figure out the finer details of that.
I'm hoping to have something out for beta testing within the next week or two. There should be Win/Linux/Mac binaries available for the table generation and verification (slow, simple CPU-based code that just verifies the tables are being made correctly).
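One way to picture the "break up the work" idea is sketched below in plain CPU Python. This is an illustration under stated assumptions, not the GPU code: the charset, the password length, and the reduction function reduce_hash are all hypothetical. The only state that has to survive between slices is the current password and the link index.

```python
import hashlib

CHARSET = "abcdefghijklmnopqrstuvwxyz"  # assumed charset, illustration only
PW_LEN = 6                               # assumed password length

def reduce_hash(digest: bytes, step: int) -> str:
    """Hypothetical reduction: map a digest plus the link index to a password."""
    n = int.from_bytes(digest[:8], "little") + step
    chars = []
    for _ in range(PW_LEN):
        n, r = divmod(n, len(CHARSET))
        chars.append(CHARSET[r])
    return "".join(chars)

def run_slice(password: str, start_step: int, steps: int) -> str:
    """Advance one chain by `steps` links, starting at link index `start_step`."""
    for step in range(start_step, start_step + steps):
        digest = hashlib.md5(password.encode("ascii")).digest()
        password = reduce_hash(digest, step)
    return password

def generate_chain(start: str, chain_len: int, slice_len: int) -> str:
    """Build a full chain out of short slices, carrying (password, step) between them."""
    password, step = start, 0  # this pair is the saved state
    while step < chain_len:
        n = min(slice_len, chain_len - step)
        password = run_slice(password, step, n)  # one short "kernel launch"
        step += n
    return password
```

The slice length only changes how often control returns to the caller, not the endpoint: generate_chain("aaaaaa", 1000, 64) gives the same result as running all 1000 links in one go.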
Re: Progress on the GPU accelerated RTs!

Posted:
Sun Feb 28, 2010 6:01 am
by Bitweasil
Ok, I have preliminary generation code working that "works" with a GUI.
It's not as fast as a dedicated video card, but that's no real surprise.
I'm seeing around 370-390M MD5 links per second, length 8, with a GTX260/216SP card, while keeping a moderately usable GUI.
This is certainly progress, and still 3x faster than the old stuff.
Re: Progress on the GPU accelerated RTs!

Posted:
Sun Feb 28, 2010 8:12 am
by Bitweasil
437M MD5 links/sec per core of a GTX295. So the dual GTX295 box I'm testing on can do ~1.7B links/sec for MD5, will be well over 2B/sec for NTLM.
This is with chain lengths of 200 000. There are a few more merges than I'd like... I need to see if I can work out how to fix that, though if you're covering the space well enough, things /will/ merge.
Fortunately, a lot of the rest of my code is closer to standalone, so I'm making good progress.
I plan to release a set of utilities that does MD4/MD5/NTLM, with SHA1 coming soon after, and possibly other algorithms.
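For anyone wondering how merges show up in practice: two chains that hit the same value at the same position follow the same path from there on and finish on the same endpoint, so merged chains appear as duplicate endpoints. A minimal sketch, assuming chains are held as (start, end) pairs (that representation and both helper names are just for illustration):

```python
from collections import Counter

def count_merged(chains):
    """Count chains that are redundant because another chain shares their endpoint."""
    ends = Counter(end for _, end in chains)
    return sum(count - 1 for count in ends.values())

def drop_merged(chains):
    """Keep only the first chain seen for each endpoint."""
    seen = set()
    kept = []
    for start, end in chains:
        if end not in seen:
            seen.add(end)
            kept.append((start, end))
    return kept

chains = [(0, 17), (1, 42), (2, 17), (3, 99), (4, 17)]
print(count_merged(chains), "of", len(chains), "chains are merged duplicates")
print(drop_merged(chains))
```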
Re: Progress on the GPU accelerated RTs!

Posted:
Sun Feb 28, 2010 10:07 am
by blazer
1.7B links/sec does that mean ur 2x 295 > Entire FRT team?
if so then DAMN nice work
Re: Progress on the GPU accelerated RTs!

Posted:
Sun Feb 28, 2010 6:03 pm
by Bitweasil
Looks like it...
Based on the 71M chains completed in the last 24h (ignoring the 0.03B links/sec, that seems low), and assuming length 10 000 chains, that looks like a generate rate of ~8.2M links per second
(71 000 000 * 10 000 / (24 * 3600)) ~= 8 200 000
This may be a bit low if tables with longer chains are being worked on, but it's in the ballpark.
One of my GTX295 cores (they're dual core GPUs) generates 437 000 000 links per second. And I have 4 GT200 cores in that system.
If I ran those cards flat out for 24h generating length 10k chains, I'd have:
(437 000 000 * 4 * 24 * 3600 / 10 000) ~= 15 000 000 000 chains.
Since I'm actually generating length 200k+ chains, I can generate:
(437 000 000 * 4 * 24 * 3600 / 200 000) ~= 755 100 000 length 200k chains.
... didn't FRT optimize their code or something? These numbers seem somehow spectacularly wrong, or I misestimated the scope of FRT.
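The estimates in this post are easy to re-run. The sketch below just redoes the arithmetic with the figures quoted above (71M chains in 24h, assumed chain length 10 000, 437M links/sec per GT200 core, 4 cores); the function names are made up for the example.

```python
SECONDS_PER_DAY = 24 * 3600

def links_per_second(chains_per_day, chain_len):
    """Link rate implied by finishing `chains_per_day` chains of `chain_len` links."""
    return chains_per_day * chain_len / SECONDS_PER_DAY

def chains_per_day(links_per_sec, chain_len):
    """Chains of `chain_len` links finished in 24h at a given link rate."""
    return links_per_sec * SECONDS_PER_DAY / chain_len

CORE_RATE = 437_000_000   # links/sec for one GT200 core, from the post
CORES = 4                 # two GTX295 cards, two cores each

print("FRT estimate:", links_per_second(71_000_000, 10_000), "links/sec")
print("4 cores, length 10k:", chains_per_day(CORE_RATE * CORES, 10_000), "chains/day")
print("1 core, length 200k:", chains_per_day(CORE_RATE, 200_000), "chains/day")
print("4 cores, length 200k:", chains_per_day(CORE_RATE * CORES, 200_000), "chains/day")
```

At 200 000 links per chain, one core works out to about 188.8M chains per day and all four cores together to about 755M.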
Re: Progress on the GPU accelerated RTs!

Posted:
Mon Mar 01, 2010 9:51 pm
by Sc00bz
Bitweasil wrote:didn't FRT optimize their code or something?
Well no, but FRT is currently down; they normally do 1.? BLinks/s. I think it was 1.2-ish for the last MD5 or NTLM table they did.
Re: Progress on the GPU accelerated RTs!

Posted:
Wed Mar 03, 2010 2:25 pm
by the_drag0n
What OS/GPU combos do I have available for testing?
you have my sword, and my axe xD
linux x64 machine (with GUI) / win xp x64 - 9600GT AMD Phenom 9550
win 7 x64 - gtx 260 Intel i7 920
win 7 x64 - gtx 260 intel core2quad.
nice to see some progress in this section

Re: Progress on the GPU accelerated RTs!

Posted:
Sat Mar 06, 2010 10:49 am
by mastergamer
I've got a Vista x64 box with a 9800GT available.
Re: Progress on the GPU accelerated RTs!

Posted:
Tue Mar 09, 2010 3:12 pm
by sapling
That's great work there. Right now I have a quad-core Xeon 3.0GHz with a GTS 250 in it, and RAID 1+0 on 10k drives for read/write speedups... It has 64-bit XP on it and Linux running. However, as soon as the Fermi cards are released I will be purchasing the new rig, which will run an i7 2.6GHz with quad SLI Fermi cards. (Work is paying for this for the purpose of hash cracking.)
I just wanted to comment that it's great you can generate tables this quickly now. The next big step will be running rainbow crack on GPUs, because creating tables is great, but when you're trying to crack large numbers of hashes with rcrack, it's very slow during the precalc stages on any processor... It would be great to speed this up by using a GPU to handle the compares and the precalcs.